Foreword | xiii | |
Preface | xv | |
1 Introduction | 1 | (10) |
| | 2 | (2) |
| | 3 | (1) |
| | 3 | (1) |
| | 3 | (1) |
| | 4 | (1) |
| | 4 | (1) |
| | 4 | (1) |
| What Is the Command Line? | 5 | (2) |
| Why Data Science at the Command Line? | 7 | (3) |
| The Command Line Is Agile | 7 | (1) |
| The Command Line Is Augmenting | 8 | (1) |
| The Command Line Is Scalable | 8 | (1) |
| The Command Line Is Extensible | 9 | (1) |
| The Command Line Is Ubiquitous | 9 | (1) |
| | 10 | (1) |
| | 10 | (1) |
2 Getting Started | 11 | (24) |
| | 11 | (1) |
| Installing the Docker Image | 12 | (1) |
| | 13 | (20) |
| | 14 | (1) |
| Executing a Command-Line Tool | 15 | (1) |
| Five Types of Command-Line Tools | 16 | (4) |
| Combining Command-Line Tools | 20 | (2) |
| Redirecting Input and Output | 22 | (4) |
| Working with Files and Directories | 26 | (2) |
| | 28 | (2) |
| | 30 | (3) |
| | 33 | (1) |
| | 33 | (2) |
3 Obtaining Data | 35 | (18) |
| | 36 | (1) |
| Copying Local Files to the Docker Container | 36 | (1) |
| Downloading from the Internet | 37 | (4) |
| | 37 | (1) |
| | 38 | (1) |
| | 39 | (1) |
| | 39 | (2) |
| | 41 | (2) |
| Converting Microsoft Excel Spreadsheets to CSV | 43 | (3) |
| Querying Relational Databases | 46 | (1) |
| | 47 | (5) |
| | 48 | (2) |
| | 50 | (2) |
| | 52 | (1) |
| | 52 | (1) |
4 Creating Command-Line Tools | 53 | (24) |
| | 54 | (1) |
| Converting One-Liners into Shell Scripts | 55 | (14) |
| | 58 | (3) |
| Step 2: Give Permission to Execute | 61 | (1) |
| | 62 | (3) |
| Step 4: Remove the Fixed Input | 65 | (1) |
| | 66 | (2) |
| | 68 | (1) |
| Creating Command-Line Tools with Python and R | 69 | (5) |
| | 70 | (2) |
| Processing Streaming Data from Standard Input | 72 | (2) |
| | 74 | (1) |
| | 74 | (3) |
5 Scrubbing Data | 77 | (30) |
| | 78 | (1) |
| Transformations, Transformations Everywhere | 78 | (3) |
| | 81 | (9) |
| | 81 | (5) |
| | 86 | (2) |
| Replacing and Deleting Values | 88 | (2) |
| | 90 | (11) |
| Bodies and Headers and Columns, Oh My! | 90 | (3) |
| Performing SQL Queries on CSV | 93 | (1) |
| Extracting and Reordering Columns | 94 | (1) |
| | 95 | (1) |
| | 96 | (3) |
| Combining Multiple CSV Files | 99 | (2) |
| Working with XML/HTML and JSON | 101 | (3) |
| | 104 | (1) |
| | 105 | (2) |
6 Project Management with Make | 107 | (12) |
| | 108 | (1) |
| | 109 | (1) |
| | 109 | (3) |
| | 112 | (1) |
| | 113 | (5) |
| | 118 | (1) |
| | 118 | (1) |
7 Exploring Data | 119 | (34) |
| | 120 | (1) |
| Inspecting Data and Its Properties | 120 | (6) |
| Header or Not, Here I Come | 120 | (1) |
| | 121 | (1) |
| Feature Names and Data Types | 122 | (2) |
| Unique Identifiers, Continuous Variables, and Factors | 124 | (2) |
| Computing Descriptive Statistics | 126 | (7) |
| | 126 | (3) |
| R One-Liners on the Shell | 129 | (4) |
| | 133 | (19) |
| Displaying Images from the Command Line | 133 | (5) |
| | 138 | (2) |
| | 140 | (2) |
| | 142 | (1) |
| | 143 | (1) |
| | 144 | (2) |
| | 146 | (1) |
| | 147 | (2) |
| | 149 | (1) |
| | 150 | (2) |
| | 152 | (1) |
| | 152 | (1) |
| | 152 | (1) |
8 Parallel Pipelines | 153 | (24) |
| | 154 | (1) |
| | 154 | (4) |
| | 155 | (1) |
| | 156 | (1) |
| | 157 | (1) |
| | 158 | (9) |
| | 160 | (2) |
| | 162 | (2) |
| Controlling the Number of Concurrent Jobs | 164 | (1) |
| | 164 | (2) |
| | 166 | (1) |
| | 167 | (7) |
| Get List of Running AWS EC2 Instances | 167 | (2) |
| Running Commands on Remote Machines | 169 | (1) |
| Distributing Local Data Among Remote Machines | 170 | (1) |
| Processing Files on Remote Machines | 171 | (3) |
| | 174 | (1) |
| | 175 | (2) |
9 Modeling Data | 177 | (22) |
| | 178 | (1) |
| | 178 | (4) |
| Dimensionality Reduction with Tapkee | 182 | (5) |
| | 183 | (1) |
| Linear and Nonlinear Mappings | 183 | (4) |
| Regression with Vowpal Wabbit | 187 | (6) |
| | 187 | (1) |
| | 188 | (2) |
| | 190 | (3) |
| Classification with SciKit-Learn Laboratory | 193 | (4) |
| | 193 | (1) |
| | 194 | (1) |
| | 195 | (2) |
| | 197 | (1) |
| | 198 | (1) |
10 Polyglot Data Science | 199 | (14) |
| | 200 | (1) |
| | 200 | (3) |
| | 203 | (2) |
| | 205 | (2) |
| | 207 | (1) |
| | 208 | (2) |
| | 210 | (1) |
| | 211 | (2) |
11 Conclusion | 213 | (6) |
| | 213 | (1) |
| | 214 | (1) |
| | 214 | (1) |
| | 215 | (1) |
| | 215 | (1) |
| | 215 | (2) |
| | 216 | (1) |
| | 216 | (1) |
| | 216 | (1) |
| | 216 | (1) |
| | 217 | (1) |
| | 217 | (2) |
List of Command-Line Tools | 219 | (30) |
Index | 249 | |