
Data Cleaning Pocket Primer: Pocket Primer [Paperback]

  • Format: Paperback / softback, 188 pages, weight: 290 g
  • Series: Pocket Primer
  • Publication date: 07-Feb-2018
  • Publisher: Mercury Learning & Information
  • ISBN-10: 1683922174
  • ISBN-13: 9781683922179
Campesato introduces a powerful, flexible, and free set of data manipulation and cleansing commands, developed over decades in the Unix/Linux environment, that are now available on any operating system with minimal setup effort. He writes for data scientists, data analysts, and others who perform data cleaning tasks and have a modest knowledge of shell programming but may be relatively new to a bash environment. His examples and scripts use the bash command set, but many of the concepts translate to other shells (ksh, sh, csh), including piping data between commands, regular expression substitution, and the sed and awk commands. Distributed in North America by Stylus Publishing and Distribution. Annotation ©2018 Ringgold, Inc., Portland, OR (protoview.com)

As part of the best-selling Pocket Primer series, this book aims to give programmers enough knowledge of data cleaning to work on their own projects. It is designed as a practical introduction to using flexible, powerful (and free) Unix/Linux shell commands to perform common data cleaning tasks. The book is packed with realistic examples and numerous commands that illustrate both the syntax and how the commands work together. Companion files with source code are available for download from the publisher.

Features:
- A practical introduction to using flexible, powerful (and free) Unix/Linux shell commands to perform common data cleaning tasks

- Covers piping data between commands, regular expression substitution, and the sed and awk commands

- Packed with realistic examples and numerous commands that illustrate both the syntax and how the commands work together

- Assumes the reader has no prior experience, but the topic is covered comprehensively enough to teach a pro some new tricks

- Includes companion files with all of the source code examples (download from the publisher).
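The blurb names three core techniques: piping data between commands, regular expression substitution with sed, and field processing with awk. The following is a minimal sketch that chains all three to tidy a tiny comma-separated sample. It assumes a bash shell with standard sed and awk. The function name `clean_csv` and the sample data are hypothetical and are not taken from the book.

```bash
#!/usr/bin/env bash
# Minimal sketch of a shell data-cleaning pipeline.
# Hypothetical: the function name clean_csv and the sample data below
# are illustrative only, not code from the book.

# Read comma-separated data from stdin and:
#   1. use sed to trim whitespace around commas and at the end of each line
#   2. use awk to skip the header row (NR > 1) and upper-case the first field
clean_csv() {
  sed -e 's/[[:space:]]*,[[:space:]]*/,/g' -e 's/[[:space:]]*$//' |
    awk -F, 'NR > 1 { print toupper($1) "," $2 }'
}

# Usage: pipe a small messy sample through the function
printf 'Name, City\nalice , Paris\nBOB,  london\n' | clean_csv
```

Run as shown, the pipeline prints `ALICE,Paris` and `BOB,london` on separate lines. Each stage reads the previous stage's output through a pipe, so a stage can be swapped or extended without touching the others.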

Preface: Data Cleaning Pocket Primer xiii
What Is the Goal? xiii
Is This Book for Me and What Will I Learn? xiii
How Were the Code Samples Created? xiv
What You Need to Know for This Book xiv
Which bash Commands are Excluded? xiv
How Do I Set Up a Command Shell? xv
What Are the "Next Steps" after Finishing This Book? xv
About the Technical Reviewer xvii
Chapter 1 Introduction
1(38)
What Is Unix?
2(1)
Available Shell Types
2(1)
What Is bash?
3(3)
Getting Help for bash Commands
4(1)
Navigating Around Directories
4(1)
The history Command
5(1)
Listing Filenames with the ls Command
6(1)
Displaying Contents of Files
7(5)
The cat Command
8(1)
The head and tail Commands
9(1)
The Pipe Symbol
10(1)
The fold Command
11(1)
File Ownership: Owner, Group, and World
12(1)
Hidden Files
12(1)
Handling Problematic Filenames
13(1)
Working with Environment Variables
14(2)
The env Command
14(1)
Useful Environment Variables
14(1)
Setting the PATH Environment Variable
15(1)
Specifying Aliases and Environment Variables
15(1)
Finding Executable Files
16(1)
What Are Shell Scripts?
17(2)
A Simple Shell Script
18(1)
Using a Semicolon to Separate Commands
19(1)
The printf Command and the echo Command
20(1)
The echo Command and Whitespaces
20(1)
Command Substitution ("back tick")
21(1)
Setting Environment Variables via Shell Scripts
22(2)
Sourcing or "Dotting" a Shell Script
23(1)
Working with Arrays
24(3)
Working with Nested Loops
27(2)
The paste Command
29(1)
Inserting Blank Lines with the paste Command
29(1)
The cut Command
30(1)
Working with Metacharacters
31(1)
Working with Character Classes
32(1)
The "pipe" Symbol and Multiple Commands
33(1)
A Simple Use Case
34(1)
Another Simple Use Case
35(2)
Summary
37(2)
Chapter 2 Useful Commands
39(26)
The join Command
40(1)
The fold Command
41(1)
The split Command
41(1)
The sort Command
42(2)
The uniq Command
44(1)
How to Compare Files
45(1)
The od Command
45(1)
The tr Command
46(4)
A Simple Use Case
50(1)
The find Command
51(1)
The tee Command
52(1)
File Compression Commands
52(3)
The tar Command
52(1)
The cpio Command
53(1)
The gzip and gunzip Commands
54(1)
The bunzip2 Command
54(1)
The zip Command
55(1)
Commands for zip Files and bz Files
55(1)
Internal Field Separator (IFS)
56(1)
Data from a Range of Columns in a Dataset
56(2)
Working with Uneven Rows in Datasets
58(1)
Working with Functions in Shell Scripts
59(2)
Recursion and Shell Scripts
61(1)
Iterative Solutions for Factorial Values
62(2)
Summary
64(1)
Chapter 3 Filtering Data with grep
65(26)
What Is the grep Command?
66(1)
Metacharacters and the grep Command
67(1)
Escaping Metacharacters with the grep Command
68(1)
Useful Options for the grep Command
69(5)
Character Classes and the grep Command
73(1)
Working with the -c Option in grep
74(1)
Matching a Range of Lines
75(2)
Using Back References in the grep Command
77(2)
Finding Empty Lines in Datasets
79(1)
Using Keys to Search Datasets
80(1)
The Backslash Character and the grep Command
81(1)
Multiple Matches in the grep Command
81(1)
The grep Command and the xargs Command
81(3)
Searching zip Files for a String
83(1)
Checking for a Unique Key Value
84(1)
Redirecting Error Messages
85(1)
The egrep Command and the fgrep Command
85(3)
Displaying "Pure" Words in a Dataset with egrep
86(2)
The fgrep Command
88(1)
A Simple Use Case
88(2)
Summary
90(1)
Chapter 4 Transforming Data with sed
91(24)
What Is the sed Command?
91(1)
The sed Execution Cycle
92(1)
Matching String Patterns Using sed
92(1)
Substituting String Patterns Using sed
93(3)
Replacing Vowels from a String or a File
95(1)
Deleting Multiple Digits and Letters from a String
96(1)
Search and Replace with sed
96(3)
Datasets with Multiple Delimiters
99(1)
Useful Switches in sed
99(1)
Working with Datasets
100(4)
Printing Lines
101(1)
Character Classes and sed
102(1)
Removing Control Characters
103(1)
Counting Words in a Dataset
104(1)
Back References in sed
104(1)
Displaying Only "Pure" Words in a Dataset
105(2)
One-Line sed Commands
107(7)
Summary
114(1)
Chapter 5 Doing Everything Else with awk
115(36)
The awk Command
116(2)
Built-in Variables That Control awk
116(1)
How Does the awk Command Work?
117(1)
Aligning Text with the printf Command
118(1)
Conditional Logic and Control Statements
119(3)
The while Statement
119(1)
A for loop in awk
120(1)
A for loop with a break Statement
121(1)
The next and continue Statements
121(1)
Deleting Alternate Lines in Datasets
122(1)
Merging Lines in Datasets
122(4)
Printing File Contents as a Single Line
123(1)
Joining Groups of Lines in a Text File
124(1)
Joining Alternate Lines in a Text File
125(1)
Matching with Metacharacters and Character Sets
126(1)
Printing Lines Using Conditional Logic
127(1)
Splitting Filenames with awk
128(1)
Working with Postfix Arithmetic Operators
129(1)
Numeric Functions in awk
130(2)
One-Line awk Commands
132(1)
Useful Short awk Scripts
133(2)
Printing the Words in a Text String in awk
135(1)
Count Occurrences of a String in Specific Rows
135(1)
Printing a String in a Fixed Number of Columns
136(1)
Printing a Dataset in a Fixed Number of Columns
137(1)
Aligning Columns in Datasets
138(1)
Aligning Columns and Multiple Rows in Datasets
139(1)
Removing a Column from a Text File
140(1)
Subsets of Columns of Even Rows in Datasets
141(1)
Counting Word Frequency in Datasets
142(2)
Displaying Only "Pure" Words in a Dataset
144(2)
Working with Multiline Records in awk
146(1)
A Simple Use Case
147(1)
Another Use Case
148(1)
Summary
149(2)
Appendix: Other Code Samples
151(32)
Examples for Chapter 1
151(1)
Examples for Chapter 2
151(1)
Calculating Fibonacci Numbers
152(1)
Calculating the GCD of Two Positive Integers
153(2)
Calculating the LCM of Two Positive Integers
155(1)
Calculating Prime Divisors
156(2)
Examples for Chapter 3
158(6)
Simulating Relational Data with the grep Command
164(3)
Checking Updates in a Logfile
167(2)
Examples for Chapter 4
169(1)
Examples for Chapter 5
169(1)
Processing Multiline Records
169(2)
Adding the Contents of Records
171(1)
Using the split Function in awk
171(1)
Scanning Diagonal Elements in Datasets
172(3)
Adding Values from Multiple Datasets (1)
175(1)
Adding Values from Multiple Datasets (2)
176(2)
Adding Values from Multiple Datasets (3)
178(2)
Calculating Combinations of Field Values
180(1)
Summary
181(2)
Index 183
Oswald Campesato (San Francisco, CA) is an adjunct instructor at UC-Santa Clara and specializes in Deep Learning, Java, Android, TensorFlow, and NLP. He is the author or co-author of over twenty-five books, including TensorFlow 2 Pocket Primer, Python 3 for Machine Learning, and the NLP Using R Pocket Primer (all Mercury Learning and Information).