Preface | ix
1 Introduction to Data Wrangling and Data Quality | 1
What Is "Data Wrangling"? | 2
Writing and "Running" Python | 8
Working with Python on Your Own Device | 11
Getting Started with the Command Line | 11
Installing Python, Jupyter Notebook, and a Code Editor | 14
Working with Python Online | 19
Using Atom to Create a Standalone Python File | 20
Using Jupyter to Create a New Python Notebook | 21
Using Google Colab to Create a New Python Notebook | 22
Documenting, Saving, and Versioning Your Work | 24
The Programming "Parts of Speech" | 38
Cooking with Custom Functions | 46
Libraries: Borrowing Custom Functions from Other Coders | 47
Taking Control: Loops and Conditionals | 47
Hitting the Road with Citi Bike Data | 62
3 Understanding Data Quality | 71
Necessary, but Not Sufficient | 81
4 Working with File-Based and Feed-Based Data in Python | 91
Structured Versus Unstructured Data | 93
Working with Structured Data | 97
File-Based, Table-Type Data--Take It to Delimit | 97
Wrangling Table-Type Data with Python | 99
Real-World Data Wrangling: Understanding Unemployment | 105
XLSX, ODS, and All the Rest | 107
Feed-Based Data--Web-Driven Live Updates | 118
Wrangling Feed-Type Data with Python | 120
Working with Unstructured Data | 134
Image-Based Text: Accessing Data in PDFs | 134
Wrangling PDFs with Python | 135
Accessing PDF Tables with Tabula | 139
5 Accessing Web-Based Data | 141
Accessing Online XML and JSON | 143
Basic APIs: A Search Engine Example | 146
Specialized APIs: Adding Basic Authentication | 148
Using Your API Key to Request Data | 150
Reading API Documentation | 151
Protecting Your API Key When Using Python | 153
Creating Your "Credentials" File | 155
Using Your Credentials in a Separate Script | 155
Getting Started with gitignore | 157
Specialized APIs: Working with OAuth | 159
Applying for a Twitter Developer Account | 160
Creating Your Twitter "App" and Credentials | 162
Encoding Your API Key and Secret | 167
Requesting an Access Token and Data from the Twitter API | 168
Web Scraping: The Data Source of Last Resort | 173
Carefully Scraping the MTA | 176
Using Browser Inspection Tools | 178
The Python Web Scraping Solution: Beautiful Soup | 180
Is It Dimensionally Structured? | 215
7 Cleaning, Transforming, and Augmenting Data | 225
Selecting a Subset of Citi Bike Data | 226
Regular Expressions: Supercharged String Matching | 229
Generating True CSVs from Fixed-Width Data | 242
Correcting for Spelling Inconsistencies | 244
The Circuitous Path to "Simple" Solutions | 250
Gotchas That Will Get Ya! | 252
8 Structuring and Refactoring Your Code | 257
Revisiting Custom Functions | 258
Will You Use It More Than Once? | 258
Is It Ugly and Confusing? | 258
Do You Just Really Hate the Default Functionality? | 259
Defining the Parameters for Function "Ingredients" | 262
Refactoring for Fun and Profit | 267
A Function for Identifying Weekdays | 267
Metadata Without the Mess | 270
Documenting Your Custom Scripts and Functions with pydoc | 277
The Case for Command-Line Arguments | 281
Where Scripts and Notebooks Diverge | 284
9 Introduction to Data Analysis | 287
What's Typical? Evaluating Central Tendency | 290
Think Different: Identifying Outliers | 292
Visualization for Data Analysis | 292
What's Our Data's Shape? Understanding Histograms | 296
The Significance of Symmetry | 297
Foundations for Visual Eloquence | 324
Making Your Data Statement | 326
Charts, Graphs, and Maps: Oh My! | 327
Elements of Eloquent Visuals | 345
The "Finicky" Details Really Do Make a Difference | 345
Trust Your Eyes (and the Experts) | 345
From Basic to Beautiful: Customizing a Visualization with seaborn and matplotlib | 349
Additional Tools for Data Review | 358
Additional Tools for Sharing and Presenting Data | 361
Image Editing for JPGs, PNGs, and GIFs | 361
Software for Editing SVGs and Other Vector Formats | 362
A More Python Programming Resources | 365
B A Bit More About Git | 369
C Finding Data | 375
D Resources for Visualization and Information Design | 381
Index | 383