About the Authors | xi |
Preface | xiii |
I Part I Introduction to Data Science | 1 | (180) |
1 Prologue: Why data science? | 3 | (6) |
1.1 What is data science? | 4 | (2) |
1.2 Case study: The evolution of sabermetrics | 6 | (1) |
| 7 | (1) |
| 8 | (1) |
| 9 | (26) |
2.1 The 2012 federal election cycle | 9 | (7) |
2.2 Composing data graphics | 16 | (8) |
2.3 Importance of data graphics: Challenger | 24 | (4) |
2.4 Creating effective presentations | 28 | (1) |
2.5 The wider world of data visualization | 29 | (2) |
| 31 | (1) |
| 32 | (1) |
2.8 Supplementary exercises | 33 | (2) |
| 35 | (32) |
3.1 A grammar for data graphics | 35 | (8) |
3.2 Canonical data graphics in R | 43 | (10) |
3.3 Extended example: Historical baby names | 53 | (9) |
| 62 | (1) |
| 62 | (3) |
3.6 Supplementary exercises | 65 | (2) |
4 Data wrangling on one table | 67 | (22) |
4.1 A grammar for data wrangling | 67 | (9) |
4.2 Extended example: Ben's time with the Mets | 76 | (8) |
| 84 | (1) |
| 84 | (4) |
4.5 Supplementary exercises | 88 | (1) |
5 Data wrangling on multiple tables | 89 | (14) |
| 89 | (2) |
| 91 | (1) |
5.3 Extended example: Manny Ramirez | 92 | (7) |
| 99 | (1) |
| 99 | (2) |
5.6 Supplementary exercises | 101 | (2) |
| 103 | (36) |
| 103 | (9) |
| 112 | (8) |
| 120 | (1) |
| 121 | (14) |
| 135 | (1) |
| 135 | (3) |
6.7 Supplementary exercises | 138 | (1) |
| 139 | (20) |
7.1 Vectorized operations | 139 | (3) |
7.2 Using across() with dplyr functions | 142 | (1) |
7.3 The map() family of functions | 143 | (1) |
7.4 Iterating over a one-dimensional vector | 144 | (2) |
7.5 Iteration over subgroups | 146 | (5) |
| 151 | (2) |
7.7 Extended example: Factors associated with BMI | 153 | (2) |
| 155 | (2) |
| 157 | (1) |
7.10 Supplementary exercises | 157 | (2) |
| 159 | (22) |
| 159 | (1) |
| 160 | (1) |
8.3 Role of data science in society | 161 | (2) |
8.4 Some settings for professional ethics | 163 | (4) |
8.5 Some principles to guide ethical action | 167 | (4) |
| 171 | (1) |
| 172 | (2) |
| 174 | (1) |
| 175 | (1) |
8.10 Professional guidelines for ethical conduct | 176 | (1) |
| 176 | (1) |
| 177 | (2) |
8.13 Supplementary exercises | 179 | (2) |
II Part II Statistics and Modeling | 181 | (118) |
9 Statistical foundations | 183 | (24) |
9.1 Samples and populations | 183 | (3) |
| 186 | (4) |
| 190 | (4) |
| 194 | (2) |
9.5 Statistical models: Explaining variation | 196 | (3) |
9.6 Confounding and accounting for other factors | 199 | (3) |
9.7 The perils of p-values | 202 | (2) |
| 204 | (1) |
| 205 | (1) |
9.10 Supplementary exercises | 206 | (1) |
| 207 | (22) |
| 208 | (1) |
10.2 Simple classification models | 209 | (7) |
| 216 | (7) |
10.4 Extended example: Who has diabetes? | 223 | (4) |
| 227 | (1) |
| 227 | (1) |
10.7 Supplementary exercises | 228 | (1) |
| 229 | (34) |
11.1 Non-regression classifiers | 229 | (16) |
| 245 | (1) |
11.3 Example: Evaluation of income models redux | 246 | (4) |
11.4 Extended example: Who has diabetes this time? | 250 | (5) |
| 255 | (3) |
| 258 | (1) |
| 259 | (2) |
11.8 Supplementary exercises | 261 | (2) |
| 263 | (18) |
| 263 | (7) |
| 270 | (8) |
| 278 | (1) |
| 278 | (1) |
12.5 Supplementary exercises | 279 | (2) |
| 281 | (18) |
13.1 Reasoning in reverse | 281 | (1) |
13.2 Extended example: Grouping cancers | 282 | (3) |
13.3 Randomizing functions | 285 | (1) |
13.4 Simulating variability | 286 | (7) |
| 293 | (1) |
13.6 Key principles of simulation | 293 | (3) |
| 296 | (1) |
| 296 | (2) |
13.9 Supplementary exercises | 298 | (1) |
III Part III Topics in Data Science | 299 | (192) |
14 Dynamic and customized data graphics | 301 | (24) |
14.1 Rich Web content using D3.js and htmlwidgets | 301 | (5) |
| 306 | (1) |
| 306 | (2) |
14.4 Interactive web apps with Shiny | 308 | (5) |
14.5 Customization of ggplot2 graphics | 313 | (4) |
14.6 Extended example: Hot dog eating | 317 | (5) |
| 322 | (1) |
| 322 | (2) |
14.9 Supplementary exercises | 324 | (1) |
15 Database querying using SQL | 325 | (38) |
| 325 | (4) |
| 329 | (2) |
| 331 | (1) |
15.4 The SQL data manipulation language | 332 | (20) |
15.5 Extended example: FiveThirtyEight flights | 352 | (8) |
| 360 | (1) |
| 360 | (1) |
| 360 | (2) |
15.9 Supplementary exercises | 362 | (1) |
16 Database administration | 363 | (14) |
16.1 Constructing efficient SQL databases | 363 | (6) |
| 369 | (2) |
16.3 Extended example: Building a database | 371 | (4) |
| 375 | (1) |
| 375 | (1) |
| 375 | (1) |
16.7 Supplementary exercises | 376 | (1) |
17 Working with geospatial data | 377 | (30) |
17.1 Motivation: What's so great about geospatial data? | 377 | (3) |
17.2 Spatial data structures | 380 | (2) |
| 382 | (9) |
17.4 Extended example: Congressional districts | 391 | (8) |
17.5 Effective maps: How (not) to lie | 399 | (2) |
| 401 | (1) |
17.7 Playing well with others | 402 | (1) |
| 403 | (1) |
| 404 | (1) |
17.10 Supplementary exercises | 405 | (2) |
18 Geospatial computations | 407 | (18) |
18.1 Geospatial operations | 407 | (9) |
18.2 Geospatial aggregation | 416 | (2) |
| 418 | (1) |
18.4 Extended example: Trail elevations at MacLeish | 419 | (4) |
| 423 | (1) |
| 423 | (1) |
18.7 Supplementary exercises | 424 | (1) |
| 425 | (26) |
19.1 Regular expressions using Macbeth | 425 | (6) |
19.2 Extended example: Analyzing textual data from arXiv.org | 431 | (14) |
| 445 | (3) |
| 448 | (1) |
| 448 | (2) |
19.6 Supplementary exercises | 450 | (1) |
| 451 | (26) |
20.1 Introduction to network science | 451 | (5) |
20.2 Extended example: Six degrees of Kristen Stewart | 456 | (9) |
| 465 | (2) |
20.4 Extended example: 1996 men's college basketball | 467 | (7) |
| 474 | (1) |
| 475 | (1) |
20.7 Supplementary exercises | 475 | (2) |
21 Epilogue: Towards "big data" | 477 | (14) |
| 477 | (2) |
21.2 Tools for bigger data | 479 | (10) |
| 489 | (1) |
| 489 | (1) |
| 490 | (1) |
| 491 | (82) |
A Packages used in this book | 493 | (6) |
| 493 | (1) |
| 493 | (5) |
| 498 | (1) |
B Introduction to R and RStudio | 499 | (20) |
| 499 | (1) |
| 500 | (1) |
B.3 Fundamental structures and objects | 501 | (7) |
| 508 | (6) |
| 514 | (1) |
| 515 | (2) |
B.7 Supplementary exercises | 517 | (2) |
| 519 | (12) |
| 519 | (1) |
| 519 | (3) |
C.3 Extended example: Law of large numbers | 522 | (3) |
C.4 Non-standard evaluation | 525 | (2) |
C.5 Debugging and defensive coding | 527 | (2) |
| 529 | (1) |
| 529 | (1) |
C.8 Supplementary exercises | 530 | (1) |
D Reproducible analysis and workflow | 531 | (10) |
D.1 Scriptable statistical computing | 532 | (1) |
D.2 Reproducible analysis with R Markdown | 532 | (3) |
D.3 Projects and version control | 535 | (2) |
| 537 | (1) |
| 537 | (3) |
D.6 Supplementary exercises | 540 | (1) |
| 541 | (22) |
E.1 Simple linear regression | 541 | (5) |
| 546 | (6) |
E.3 Inference for regression | 552 | (1) |
E.4 Assumptions underlying regression | 553 | (3) |
| 556 | (3) |
| 559 | (2) |
| 561 | (1) |
E.8 Supplementary exercises | 562 | (1) |
F Setting up a database server | 563 | (10) |
| 563 | (1) |
| 564 | (3) |
| 567 | (1) |
| 568 | (5) |
Bibliography | 573 | (16) |
Indices | 589 | (1) |
Subject index | 590 | (32) |
R index | 622 |