Preface | xi
| 1 | (12)
| 2 | (1)
| 2 | (2)
| 2 | (1)
| 3 | (1)
| 3 | (1)
| 3 | (1)
| 4 | (1)
| 4 | (1)
What Is the Command Line? | 5 | (2)
Why Data Science at the Command Line? | 7 | (2)
The Command Line Is Agile | 7 | (1)
The Command Line Is Augmenting | 7 | (1)
The Command Line Is Scalable | 8 | (1)
The Command Line Is Extensible | 8 | (1)
The Command Line Is Ubiquitous | 9 | (1)
| 9 | (3)
| 12 | (1)
| 13 | (16)
| 13 | (1)
Setting Up Your Data Science Toolbox | 13 | (4)
Step 1: Download and Install VirtualBox | 14 | (1)
Step 2: Download and Install Vagrant | 14 | (1)
Step 3: Download and Start the Data Science Toolbox | 15 | (1)
Step 4: Log In (on Linux and Mac OS X) | 16 | (1)
Step 4: Log In (on Microsoft Windows) | 17 | (1)
Step 5: Shut Down or Start Anew | 17 | (1)
Essential Concepts and Tools | 17 | (10)
| 18 | (1)
Executing a Command-Line Tool | 19 | (1)
Five Types of Command-Line Tools | 20 | (3)
Combining Command-Line Tools | 23 | (1)
Redirecting Input and Output | 24 | (1)
| 24 | (1)
| 25 | (2)
| 27 | (2)
| 29 | (12)
| 29 | (1)
Copying Local Files to the Data Science Toolbox | 30 | (1)
Local Version of Data Science Toolbox | 30 | (1)
Remote Version of Data Science Toolbox | 30 | (1)
| 31 | (1)
Converting Microsoft Excel Spreadsheets | 32 | (2)
Querying Relational Databases | 34 | (1)
Downloading from the Internet | 35 | (2)
| 37 | (2)
| 39 | (2)
4 Creating Reusable Command-Line Tools | 41 | (14)
| 42 | (1)
Converting One-Liners into Shell Scripts | 42 | (7)
| 44 | (1)
Step 2: Add Permission to Execute | 45 | (1)
| 46 | (1)
Step 4: Remove Fixed Input | 47 | (1)
| 47 | (1)
| 48 | (1)
Creating Command-Line Tools with Python and R | 49 | (4)
| 50 | (2)
Processing Streaming Data from Standard Input | 52 | (1)
| 53 | (2)
| 55 | (26)
| 56 | (1)
Common Scrub Operations for Plain Text | 56 | (6)
| 57 | (3)
| 60 | (2)
Replacing and Deleting Values | 62 | (1)
| 62 | (5)
Bodies and Headers and Columns, Oh My! | 62 | (5)
Performing SQL Queries on CSV | 67 | (1)
Working with HTML/XML and JSON | 67 | (5)
Common Scrub Operations for CSV | 72 | (8)
Extracting and Reordering Columns | 72 | (1)
| 73 | (2)
| 75 | (2)
Combining Multiple CSV Files | 77 | (3)
| 80 | (1)
6 Managing Your Data Workflow | 81 | (10)
| 82 | (1)
| 82 | (1)
| 82 | (2)
Obtain Top Ebooks from Project Gutenberg | 84 | (1)
Every Workflow Starts with a Single Step | 85 | (2)
| 87 | (2)
Rebuilding Specific Targets | 89 | (1)
| 90 | (1)
| 90 | (1)
| 91 | (24)
| 92 | (1)
Inspecting Data and Its Properties | 92 | (4)
Header or Not, Here I Come | 92 | (1)
| 92 | (1)
Feature Names and Data Types | 93 | (2)
Unique Identifiers, Continuous Variables, and Factors | 95 | (1)
Computing Descriptive Statistics | 96 | (6)
| 96 | (3)
Using R from the Command Line with Rio | 99 | (3)
| 102 | (12)
Introducing Gnuplot and feedgnuplot | 102 | (2)
| 104 | (3)
| 107 | (1)
| 108 | (2)
| 110 | (1)
| 111 | (1)
| 112 | (1)
| 113 | (1)
| 114 | (1)
| 114 | (1)
| 115 | (20)
| 116 | (1)
| 116 | (3)
| 116 | (1)
| 117 | (1)
| 118 | (1)
| 119 | (6)
| 121 | (1)
| 122 | (1)
Controlling the Number of Concurrent Jobs | 123 | (1)
| 123 | (1)
| 124 | (1)
| 125 | (7)
Get a List of Running AWS EC2 Instances | 126 | (1)
Running Commands on Remote Machines | 127 | (1)
Distributing Local Data Among Remote Machines | 128 | (1)
Processing Files on Remote Machines | 129 | (3)
| 132 | (1)
| 133 | (2)
| 135 | (24)
| 136 | (1)
| 136 | (3)
Dimensionality Reduction with Tapkee | 139 | (3)
| 140 | (1)
| 140 | (1)
Linear and Nonlinear Mappings | 141 | (1)
| 142 | (8)
| 143 | (1)
Taming Weka on the Command Line | 143 | (4)
Converting Between CSV and ARFF | 147 | (1)
Comparing Three Clustering Algorithms | 147 | (3)
Regression with SciKit-Learn Laboratory | 150 | (3)
| 150 | (1)
| 151 | (1)
| 151 | (2)
Classification with BigML | 153 | (3)
Creating Balanced Train and Test Data Sets | 153 | (2)
| 155 | (1)
| 155 | (1)
| 156 | (1)
| 156 | (3)
| 159 | (6)
| 159 | (1)
| 160 | (1)
| 160 | (1)
| 161 | (1)
| 161 | (1)
| 161 | (1)
| 161 | (1)
| 162 | (1)
| 162 | (1)
| 162 | (1)
| 162 | (3)
A List of Command-Line Tools | 165 | (18)
B Bibliography | 183 | (4)
Index | 187