Preface: Data Cleaning Pocket Primer |
|
xiii | |
What Is the Goal? |
|
xiii | |
Is This Book for Me and What Will I Learn? |
|
xiii | |
How Were the Code Samples Created? |
|
xiv | |
What You Need to Know for This Book |
|
xiv | |
Which bash Commands are Excluded? |
|
xiv | |
How Do I Set Up a Command Shell? |
|
xv | |
What Are the "Next Steps" after Finishing This Book? |
|
xv | |
About the Technical Reviewer |
|
xvii | |
|
|
1 | (38) |
|
|
2 | (1) |
|
|
2 | (1) |
|
|
3 | (3) |
|
Getting Help for bash Commands |
|
|
4 | (1) |
|
Navigating Around Directories |
|
|
4 | (1) |
|
|
5 | (1) |
|
Listing Filenames with the ls Command |
|
|
6 | (1) |
|
Displaying Contents of Files |
|
|
7 | (5) |
|
|
8 | (1) |
|
The head and tail Commands |
|
|
9 | (1) |
|
|
10 | (1) |
|
|
11 | (1) |
|
File Ownership: Owner, Group, and World |
|
|
12 | (1) |
|
|
12 | (1) |
|
Handling Problematic Filenames |
|
|
13 | (1) |
|
Working with Environment Variables |
|
|
14 | (2) |
|
|
14 | (1) |
|
Useful Environment Variables |
|
|
14 | (1) |
|
Setting the PATH Environment Variable |
|
|
15 | (1) |
|
Specifying Aliases and Environment Variables |
|
|
15 | (1) |
|
|
16 | (1) |
|
|
17 | (2) |
|
|
18 | (1) |
|
Using a Semicolon to Separate Commands |
|
|
19 | (1) |
|
The printf Command and the echo Command |
|
|
20 | (1) |
|
The echo Command and Whitespaces |
|
|
20 | (1) |
|
Command Substitution ("back tick") |
|
|
21 | (1) |
|
Setting Environment Variables via Shell Scripts |
|
|
22 | (2) |
|
Sourcing or "Dotting" a Shell Script |
|
|
23 | (1) |
|
|
24 | (3) |
|
Working with Nested Loops |
|
|
27 | (2) |
|
|
29 | (1) |
|
Inserting Blank Lines with the paste Command |
|
|
29 | (1) |
|
|
30 | (1) |
|
Working with Metacharacters |
|
|
31 | (1) |
|
Working with Character Classes |
|
|
32 | (1) |
|
The "pipe" Symbol and Multiple Commands |
|
|
33 | (1) |
|
|
34 | (1) |
|
|
35 | (2) |
|
|
37 | (2) |
|
Chapter 2 Useful Commands |
|
|
39 | (26) |
|
|
40 | (1) |
|
|
41 | (1) |
|
|
41 | (1) |
|
|
42 | (2) |
|
|
44 | (1) |
|
|
45 | (1) |
|
|
45 | (1) |
|
|
46 | (4) |
|
|
50 | (1) |
|
|
51 | (1) |
|
|
52 | (1) |
|
File Compression Commands |
|
|
52 | (3) |
|
|
52 | (1) |
|
|
53 | (1) |
|
The gzip and gunzip Commands |
|
|
54 | (1) |
|
|
54 | (1) |
|
|
55 | (1) |
|
Commands for zip Files and bz Files |
|
|
55 | (1) |
|
Internal Field Separator (IFS) |
|
|
56 | (1) |
|
Data from a Range of Columns in a Dataset |
|
|
56 | (2) |
|
Working with Uneven Rows in Datasets |
|
|
58 | (1) |
|
Working with Functions in Shell Scripts |
|
|
59 | (2) |
|
Recursion and Shell Scripts |
|
|
61 | (1) |
|
Iterative Solutions for Factorial Values |
|
|
62 | (2) |
|
|
64 | (1) |
|
Chapter 3 Filtering Data with grep |
|
|
65 | (26) |
|
What Is the grep Command? |
|
|
66 | (1) |
|
Metacharacters and the grep Command |
|
|
67 | (1) |
|
Escaping Metacharacters with the grep Command |
|
|
68 | (1) |
|
Useful Options for the grep Command |
|
|
69 | (5) |
|
Character Classes and the grep Command |
|
|
73 | (1) |
|
Working with the -c Option in grep |
|
|
74 | (1) |
|
Matching a Range of Lines |
|
|
75 | (2) |
|
Using Back References in the grep Command |
|
|
77 | (2) |
|
Finding Empty Lines in Datasets |
|
|
79 | (1) |
|
Using Keys to Search Datasets |
|
|
80 | (1) |
|
The Backslash Character and the grep Command |
|
|
81 | (1) |
|
Multiple Matches in the grep Command |
|
|
81 | (1) |
|
The grep Command and the xargs Command |
|
|
81 | (3) |
|
Searching zip Files for a String |
|
|
83 | (1) |
|
Checking for a Unique Key Value |
|
|
84 | (1) |
|
Redirecting Error Messages |
|
|
85 | (1) |
|
The egrep Command and the fgrep Command |
|
|
85 | (3) |
|
Displaying "Pure" Words in a Dataset with egrep |
|
|
86 | (2) |
|
|
88 | (1) |
|
|
88 | (2) |
|
|
90 | (1) |
|
Chapter 4 Transforming Data with sed |
|
|
91 | (24) |
|
|
91 | (1) |
|
|
92 | (1) |
|
Matching String Patterns Using sed |
|
|
92 | (1) |
|
Substituting String Patterns Using sed |
|
|
93 | (3) |
|
Replacing Vowels from a String or a File |
|
|
95 | (1) |
|
Deleting Multiple Digits and Letters from a String |
|
|
96 | (1) |
|
Search and Replace with sed |
|
|
96 | (3) |
|
Datasets with Multiple Delimiters |
|
|
99 | (1) |
|
|
99 | (1) |
|
|
100 | (4) |
|
|
101 | (1) |
|
Character Classes and sed |
|
|
102 | (1) |
|
Removing Control Characters |
|
|
103 | (1) |
|
Counting Words in a Dataset |
|
|
104 | (1) |
|
|
104 | (1) |
|
Displaying Only "Pure" Words in a Dataset |
|
|
105 | (2) |
|
|
107 | (7) |
|
|
114 | (1) |
|
Chapter 5 Doing Everything Else with awk |
|
|
115 | (36) |
|
|
116 | (2) |
|
Built-in Variables That Control awk |
|
|
116 | (1) |
|
How Does the awk Command Work? |
|
|
117 | (1) |
|
Aligning Text with the printf Command |
|
|
118 | (1) |
|
Conditional Logic and Control Statements |
|
|
119 | (3) |
|
|
119 | (1) |
|
|
120 | (1) |
|
A for Loop with a break Statement |
|
|
121 | (1) |
|
The next and continue Statements |
|
|
121 | (1) |
|
Deleting Alternate Lines in Datasets |
|
|
122 | (1) |
|
Merging Lines in Datasets |
|
|
122 | (4) |
|
Printing File Contents as a Single Line |
|
|
123 | (1) |
|
Joining Groups of Lines in a Text File |
|
|
124 | (1) |
|
Joining Alternate Lines in a Text File |
|
|
125 | (1) |
|
Matching with Metacharacters and Character Sets |
|
|
126 | (1) |
|
Printing Lines Using Conditional Logic |
|
|
127 | (1) |
|
Splitting Filenames with awk |
|
|
128 | (1) |
|
Working with Postfix Arithmetic Operators |
|
|
129 | (1) |
|
|
130 | (2) |
|
|
132 | (1) |
|
|
133 | (2) |
|
Printing the Words in a Text String in awk |
|
|
135 | (1) |
|
Count Occurrences of a String in Specific Rows |
|
|
135 | (1) |
|
Printing a String in a Fixed Number of Columns |
|
|
136 | (1) |
|
Printing a Dataset in a Fixed Number of Columns |
|
|
137 | (1) |
|
Aligning Columns in Datasets |
|
|
138 | (1) |
|
Aligning Columns and Multiple Rows in Datasets |
|
|
139 | (1) |
|
Removing a Column from a Text File |
|
|
140 | (1) |
|
Subsets of Columns of Even Rows in Datasets |
|
|
141 | (1) |
|
Counting Word Frequency in Datasets |
|
|
142 | (2) |
|
Displaying Only "Pure" Words in a Dataset |
|
|
144 | (2) |
|
Working with Multiline Records in awk |
|
|
146 | (1) |
|
|
147 | (1) |
|
|
148 | (1) |
|
|
149 | (2) |
|
Appendix: Other Code Samples |
|
|
151 | (32) |
|
|
151 | (1) |
|
|
151 | (1) |
|
Calculating Fibonacci Numbers |
|
|
152 | (1) |
|
Calculating the GCD of Two Positive Integers |
|
|
153 | (2) |
|
Calculating the LCM of Two Positive Integers |
|
|
155 | (1) |
|
Calculating Prime Divisors |
|
|
156 | (2) |
|
|
158 | (6) |
|
Simulating Relational Data with the grep Command |
|
|
164 | (3) |
|
Checking Updates in a Logfile |
|
|
167 | (2) |
|
|
169 | (1) |
|
|
169 | (1) |
|
Processing Multiline Records |
|
|
169 | (2) |
|
Adding the Contents of Records |
|
|
171 | (1) |
|
Using the split Function in awk |
|
|
171 | (1) |
|
Scanning Diagonal Elements in Datasets |
|
|
172 | (3) |
|
Adding Values from Multiple Datasets (1) |
|
|
175 | (1) |
|
Adding Values from Multiple Datasets (2) |
|
|
176 | (2) |
|
Adding Values from Multiple Datasets (3) |
|
|
178 | (2) |
|
Calculating Combinations of Field Values |
|
|
180 | (1) |
|
|
181 | (2) |
Index |
|
183 | |