Preface |
|
xxv | |
Acknowledgments |
|
xxvii | |
Introduction |
|
xxix | |
|
1 Getting started with R and RStudio |
|
|
1 | (10) |
|
|
1 | (1) |
|
|
1 | (1) |
|
|
2 | (1) |
|
|
3 | (5) |
|
|
3 | (2) |
|
|
5 | (1) |
|
1.4.3 Running commands while editing scripts |
|
|
6 | (2) |
|
1.4.4 Changing global options |
|
|
8 | (1) |
|
1.5 Installing R packages |
|
|
8 | (3) |
I R |
|
11 | (74) |
|
|
13 | (32) |
|
2.1 Case study: US Gun Murders |
|
|
13 | (2) |
|
|
15 | (5) |
|
|
15 | (1) |
|
|
15 | (1) |
|
|
16 | (2) |
|
2.2.4 Other prebuilt objects |
|
|
18 | (1) |
|
|
18 | (1) |
|
2.2.6 Saving your workspace |
|
|
19 | (1) |
|
|
19 | (1) |
|
2.2.8 Commenting your code |
|
|
20 | (1) |
|
|
20 | (1) |
|
|
21 | (6) |
|
|
21 | (1) |
|
2.4.2 Examining an object |
|
|
21 | (1) |
|
|
22 | (1) |
|
2.4.4 Vectors: numerics, characters, and logical |
|
|
23 | (1) |
|
|
24 | (1) |
|
|
24 | (2) |
|
|
26 | (1) |
|
|
27 | (1) |
|
|
28 | (3) |
|
|
28 | (1) |
|
|
29 | (1) |
|
|
29 | (1) |
|
|
30 | (1) |
|
|
31 | (1) |
|
2.7.1 Not availables (NA) |
|
|
32 | (1) |
|
|
32 | (1) |
|
|
33 | (2) |
|
|
33 | (1) |
|
|
33 | (1) |
|
|
34 | (1) |
|
|
34 | (1) |
|
2.9.5 Beware of recycling |
|
|
35 | (1) |
|
|
35 | (1) |
|
|
36 | (2) |
|
2.11.1 Rescaling a vector |
|
|
37 | (1) |
|
|
37 | (1) |
|
|
38 | (1) |
|
|
38 | (2) |
|
2.13.1 Subsetting with logicals |
|
|
38 | (1) |
|
|
39 | (1) |
|
|
39 | (1) |
|
|
40 | (1) |
|
|
40 | (1) |
|
|
40 | (1) |
|
|
41 | (3) |
|
|
41 | (1) |
|
|
42 | (1) |
|
|
43 | (1) |
|
|
43 | (1) |
|
|
44 | (1) |
|
|
45 | (8) |
|
3.1 Conditional expressions |
|
|
45 | (2) |
|
|
47 | (1) |
|
|
48 | (1) |
|
|
49 | (1) |
|
3.5 Vectorization and functionals |
|
|
50 | (1) |
|
|
51 | (2) |
|
|
53 | (22) |
|
|
53 | (1) |
|
|
54 | (1) |
|
4.3 Manipulating data frames |
|
|
55 | (2) |
|
4.3.1 Adding a column with mutate |
|
|
55 | (1) |
|
4.3.2 Subsetting with filter |
|
|
56 | (1) |
|
4.3.3 Selecting columns with select |
|
|
56 | (1) |
|
|
57 | (1) |
|
|
58 | (1) |
|
|
59 | (1) |
|
|
60 | (4) |
|
|
60 | (2) |
|
|
62 | (1) |
|
4.7.3 Group then summarize with group_by |
|
|
63 | (1) |
|
|
64 | (1) |
|
|
64 | (1) |
|
|
65 | (1) |
|
|
65 | (1) |
|
|
66 | (3) |
|
4.10.1 Tibbles display better |
|
|
67 | (1) |
|
4.10.2 Subsets of tibbles are tibbles |
|
|
67 | (1) |
|
4.10.3 Tibbles can have complex entries |
|
|
68 | (1) |
|
4.10.4 Tibbles can be grouped |
|
|
68 | (1) |
|
4.10.5 Create a tibble using tibble instead of data.frame |
|
|
68 | (1) |
|
|
69 | (1) |
|
|
70 | (1) |
|
|
71 | (2) |
|
4.14 Tidyverse conditionals |
|
|
73 | (1) |
|
|
73 | (1) |
|
|
73 | (1) |
|
|
74 | (1) |
|
|
75 | (10) |
|
5.1 Paths and the working directory |
|
|
76 | (3) |
|
|
76 | (1) |
|
5.1.2 Relative and full paths |
|
|
77 | (1) |
|
5.1.3 The working directory |
|
|
77 | (1) |
|
5.1.4 Generating path names |
|
|
78 | (1) |
|
5.1.5 Copying files using paths |
|
|
78 | (1) |
|
5.2 The readr and readxl packages |
|
|
79 | (1) |
|
|
79 | (1) |
|
|
80 | (1) |
|
|
80 | (1) |
|
|
81 | (1) |
|
5.5 R-base importing functions |
|
|
82 | (1) |
|
|
82 | (1) |
|
5.6 Text versus binary files |
|
|
83 | (1) |
|
|
83 | (1) |
|
5.8 Organizing data with spreadsheets |
|
|
84 | (1) |
|
|
84 | (1) |
II Data Visualization |
|
85 | (128) |
|
6 Introduction to data visualization |
|
|
87 | (4) |
|
|
91 | (18) |
|
7.1 The components of a graph |
|
|
92 | (1) |
|
|
93 | (1) |
|
|
94 | (1) |
|
|
95 | (1) |
|
|
96 | (2) |
|
7.5.1 Tinkering with arguments |
|
|
97 | (1) |
|
7.6 Global versus local aesthetic mappings |
|
|
98 | (1) |
|
|
99 | (1) |
|
|
100 | (1) |
|
|
101 | (1) |
|
7.10 Annotation, shapes, and adjustments |
|
|
102 | (1) |
|
|
103 | (1) |
|
7.12 Putting it all together |
|
|
104 | (1) |
|
7.13 Quick plots with qplot |
|
|
105 | (1) |
|
|
106 | (1) |
|
|
106 | (3) |
|
8 Visualizing data distributions |
|
|
109 | (32) |
|
|
109 | (1) |
|
8.2 Case study: describing student heights |
|
|
110 | (1) |
|
8.3 Distribution function |
|
|
110 | (1) |
|
8.4 Cumulative distribution functions |
|
|
111 | (1) |
|
|
112 | (1) |
|
|
113 | (5) |
|
8.6.1 Interpreting the y-axis |
|
|
117 | (1) |
|
8.6.2 Densities permit stratification |
|
|
118 | (1) |
|
|
118 | (4) |
|
8.8 The normal distribution |
|
|
122 | (2) |
|
|
124 | (1) |
|
8.10 Quantile-quantile plots |
|
|
125 | (2) |
|
|
127 | (1) |
|
|
127 | (2) |
|
|
129 | (1) |
|
8.14 Case study: describing student heights (continued) |
|
|
129 | (2) |
|
|
131 | (1) |
|
|
132 | (8) |
|
|
133 | (1) |
|
|
134 | (1) |
|
|
135 | (1) |
|
|
136 | (1) |
|
|
136 | (1) |
|
|
137 | (1) |
|
|
138 | (2) |
|
|
140 | (1) |
|
9 Data visualization in practice |
|
|
141 | (30) |
|
9.1 Case study: new insights on poverty |
|
|
141 | (2) |
|
9.1.1 Hans Rosling's quiz |
|
|
142 | (1) |
|
|
143 | (1) |
|
|
144 | (3) |
|
|
146 | (1) |
|
9.3.2 Fixed scales for better comparisons |
|
|
147 | (1) |
|
|
147 | (4) |
|
9.4.1 Labels instead of legends |
|
|
150 | (1) |
|
|
151 | (4) |
|
|
151 | (2) |
|
|
153 | (1) |
|
9.5.3 Transform the values or the scale? |
|
|
154 | (1) |
|
9.6 Visualizing multimodal distributions |
|
|
155 | (1) |
|
9.7 Comparing multiple distributions with boxplots and ridge plots |
|
|
155 | (12) |
|
|
156 | (1) |
|
|
157 | (2) |
|
9.7.3 Example: 1970 versus 2010 income distributions |
|
|
159 | (5) |
|
9.7.4 Accessing computed variables |
|
|
164 | (3) |
|
|
167 | (1) |
|
9.8 The ecological fallacy and importance of showing the data |
|
|
167 | (4) |
|
9.8.1 Logistic transformation |
|
|
168 | (1) |
|
|
168 | (3) |
|
10 Data visualization principles |
|
|
171 | (34) |
|
10.1 Encoding data using visual cues |
|
|
171 | (3) |
|
10.2 Know when to include 0 |
|
|
174 | (3) |
|
10.3 Do not distort quantities |
|
|
177 | (2) |
|
10.4 Order categories by a meaningful value |
|
|
179 | (1) |
|
|
180 | (3) |
|
|
183 | (5) |
|
|
183 | (1) |
|
10.6.2 Align plots vertically to see horizontal changes and horizontally to see vertical changes |
|
|
184 | (1) |
|
10.6.3 Consider transformations |
|
|
185 | (2) |
|
10.6.4 Visual cues to be compared should be adjacent |
|
|
187 | (1) |
|
|
188 | (1) |
|
10.7 Think of the color blind |
|
|
188 | (1) |
|
10.8 Plots for two variables |
|
|
189 | (2) |
|
|
189 | (2) |
|
|
191 | (1) |
|
10.9 Encoding a third variable |
|
|
191 | (2) |
|
10.10 Avoid pseudo-three-dimensional plots |
|
|
193 | (2) |
|
10.11 Avoid too many significant digits |
|
|
195 | (1) |
|
|
196 | (1) |
|
|
196 | (5) |
|
10.14 Case study: vaccines and infectious diseases |
|
|
201 | (3) |
|
|
204 | (1) |
|
|
205 | (8) |
|
|
205 | (1) |
|
|
206 | (1) |
|
11.3 The interquartile range (IQR) |
|
|
206 | (1) |
|
11.4 Tukey's definition of an outlier |
|
|
207 | (1) |
|
11.5 Median absolute deviation |
|
|
208 | (1) |
|
|
208 | (1) |
|
11.7 Case study: self-reported student heights |
|
|
209 | (4) |
III Statistics with R |
|
213 | (172) |
|
12 Introduction to statistics with R |
|
|
215 | (2) |
|
|
217 | (24) |
|
13.1 Discrete probability |
|
|
217 | (1) |
|
13.1.1 Relative frequency |
|
|
217 | (1) |
|
|
218 | (1) |
|
13.1.3 Probability distributions |
|
|
218 | (1) |
|
13.2 Monte Carlo simulations for categorical data |
|
|
218 | (3) |
|
13.2.1 Setting the random seed |
|
|
220 | (1) |
|
13.2.2 With and without replacement |
|
|
220 | (1) |
|
|
221 | (1) |
|
13.4 Conditional probabilities |
|
|
221 | (1) |
|
13.5 Addition and multiplication rules |
|
|
222 | (1) |
|
13.5.1 Multiplication rule |
|
|
222 | (1) |
|
13.5.2 Multiplication rule under independence |
|
|
222 | (1) |
|
|
223 | (1) |
|
13.6 Combinations and permutations |
|
|
223 | (4) |
|
13.6.1 Monte Carlo example |
|
|
227 | (1) |
|
|
227 | (4) |
|
13.7.1 Monty Hall problem |
|
|
228 | (1) |
|
|
229 | (2) |
|
13.8 Infinity in practice |
|
|
231 | (1) |
|
|
232 | (2) |
|
13.10 Continuous probability |
|
|
234 | (1) |
|
13.11 Theoretical continuous distributions |
|
|
235 | (3) |
|
13.11.1 Theoretical distributions as approximations |
|
|
235 | (2) |
|
13.11.2 The probability density |
|
|
237 | (1) |
|
13.12 Monte Carlo simulations for continuous variables |
|
|
238 | (1) |
|
13.13 Continuous distributions |
|
|
239 | (1) |
|
|
239 | (2) |
|
|
241 | (20) |
|
|
241 | (1) |
|
|
242 | (1) |
|
14.3 The probability distribution of a random variable |
|
|
243 | (2) |
|
14.4 Distributions versus probability distributions |
|
|
245 | (1) |
|
14.5 Notation for random variables |
|
|
245 | (1) |
|
14.6 The expected value and standard error |
|
|
246 | (3) |
|
14.6.1 Population SD versus the sample SD |
|
|
248 | (1) |
|
14.7 Central Limit Theorem |
|
|
249 | (1) |
|
14.7.1 How large is large in the Central Limit Theorem? |
|
|
250 | (1) |
|
14.8 Statistical properties of averages |
|
|
250 | (2) |
|
14.9 Law of large numbers |
|
|
252 | (1) |
|
14.9.1 Misinterpreting law of averages |
|
|
252 | (1) |
|
|
252 | (2) |
|
14.11 Case study: The Big Short |
|
|
254 | (6) |
|
14.11.1 Interest rates explained with chance model |
|
|
254 | (3) |
|
|
257 | (3) |
|
|
260 | (1) |
|
|
261 | (26) |
|
|
261 | (3) |
|
15.1.1 The sampling model for polls |
|
|
262 | (2) |
|
15.2 Populations, samples, parameters, and estimates |
|
|
264 | (3) |
|
15.2.1 The sample average |
|
|
264 | (1) |
|
|
265 | (1) |
|
15.2.3 Polling versus forecasting |
|
|
265 | (1) |
|
15.2.4 Properties of our estimate: expected value and standard error |
|
|
266 | (1) |
|
|
267 | (1) |
|
15.4 Central Limit Theorem in practice |
|
|
268 | (4) |
|
15.4.1 A Monte Carlo simulation |
|
|
269 | (2) |
|
|
271 | (1) |
|
15.4.3 Bias: why not run a very large poll? |
|
|
271 | (1) |
|
|
272 | (2) |
|
15.6 Confidence intervals |
|
|
274 | (3) |
|
15.6.1 A Monte Carlo simulation |
|
|
276 | (1) |
|
15.6.2 The correct language |
|
|
277 | (1) |
|
|
277 | (1) |
|
|
278 | (1) |
|
|
279 | (1) |
|
|
280 | (6) |
|
|
281 | (1) |
|
15.10.2 Two-by-two tables |
|
|
282 | (1) |
|
|
282 | (1) |
|
|
283 | (1) |
|
15.10.5 Confidence intervals for the odds ratio |
|
|
284 | (1) |
|
15.10.6 Small count correction |
|
|
285 | (1) |
|
15.10.7 Large samples, small p-values |
|
|
285 | (1) |
|
|
286 | (1) |
|
|
287 | (34) |
|
|
288 | (5) |
|
|
290 | (2) |
|
|
292 | (1) |
|
|
293 | (2) |
|
|
295 | (3) |
|
|
298 | (1) |
|
|
298 | (1) |
|
16.5 Bayes' theorem simulation |
|
|
299 | (2) |
|
|
300 | (1) |
|
|
301 | (2) |
|
|
303 | (2) |
|
16.8 Case study: election forecasting |
|
|
305 | (12) |
|
|
306 | (1) |
|
|
307 | (1) |
|
16.8.3 Mathematical representations of models |
|
|
307 | (3) |
|
16.8.4 Predicting the electoral college |
|
|
310 | (4) |
|
|
314 | (3) |
|
|
317 | (1) |
|
|
318 | (3) |
|
|
321 | (14) |
|
17.1 Case study: is height hereditary? |
|
|
321 | (1) |
|
17.2 The correlation coefficient |
|
|
322 | (4) |
|
17.2.1 Sample correlation is a random variable |
|
|
324 | (2) |
|
17.2.2 Correlation is not always a useful summary |
|
|
326 | (1) |
|
17.3 Conditional expectations |
|
|
326 | (3) |
|
|
329 | (5) |
|
17.4.1 Regression improves precision |
|
|
330 | (1) |
|
17.4.2 Bivariate normal distribution (advanced) |
|
|
331 | (2) |
|
17.4.3 Variance explained |
|
|
333 | (1) |
|
17.4.4 Warning: there are two regression lines |
|
|
333 | (1) |
|
|
334 | (1) |
|
|
335 | (38) |
|
18.1 Case study: Moneyball |
|
|
335 | (9) |
|
|
336 | (1) |
|
|
337 | (1) |
|
|
338 | (1) |
|
18.1.4 Base on balls or stolen bases? |
|
|
339 | (2) |
|
18.1.5 Regression applied to baseball statistics |
|
|
341 | (3) |
|
|
344 | (4) |
|
18.2.1 Understanding confounding through stratification |
|
|
345 | (3) |
|
18.2.2 Multivariate regression |
|
|
348 | (1) |
|
18.3 Least squares estimates |
|
|
348 | (6) |
|
18.3.1 Interpreting linear models |
|
|
349 | (1) |
|
18.3.2 Least Squares Estimates (LSE) |
|
|
349 | (2) |
|
|
351 | (1) |
|
18.3.4 LSE are random variables |
|
|
352 | (1) |
|
18.3.5 Predicted values are random variables |
|
|
353 | (1) |
|
|
354 | (1) |
|
18.5 Linear regression in the tidyverse |
|
|
355 | (4) |
|
|
358 | (1) |
|
|
359 | (1) |
|
18.7 Case study: Moneyball (continued) |
|
|
360 | (7) |
|
18.7.1 Adding salary and position information |
|
|
364 | (1) |
|
18.7.2 Picking nine players |
|
|
365 | (2) |
|
18.8 The regression fallacy |
|
|
367 | (2) |
|
18.9 Measurement error models |
|
|
369 | (2) |
|
|
371 | (2) |
|
19 Association is not causation |
|
|
373 | (12) |
|
19.1 Spurious correlation |
|
|
373 | (3) |
|
|
376 | (2) |
|
19.3 Reversing cause and effect |
|
|
378 | (1) |
|
|
379 | (3) |
|
19.4.1 Example: UC Berkeley admissions |
|
|
379 | (1) |
|
19.4.2 Confounding explained graphically |
|
|
380 | (1) |
|
19.4.3 Average after stratifying |
|
|
381 | (1) |
|
|
382 | (1) |
|
|
383 | (2) |
IV Data Wrangling |
|
385 | (84) |
|
20 Introduction to data wrangling |
|
|
387 | (2) |
|
|
389 | (8) |
|
|
389 | (2) |
|
|
391 | (1) |
|
|
391 | (3) |
|
|
394 | (1) |
|
|
395 | (2) |
|
|
397 | (10) |
|
|
398 | (4) |
|
|
399 | (1) |
|
|
400 | (1) |
|
|
400 | (1) |
|
|
400 | (1) |
|
|
401 | (1) |
|
|
401 | (1) |
|
|
402 | (1) |
|
|
402 | (1) |
|
|
402 | (1) |
|
|
403 | (2) |
|
|
403 | (1) |
|
|
404 | (1) |
|
|
404 | (1) |
|
|
404 | (1) |
|
|
405 | (2) |
|
|
407 | (8) |
|
|
408 | (1) |
|
|
409 | (2) |
|
|
411 | (1) |
|
|
412 | (1) |
|
|
413 | (2) |
|
|
415 | (34) |
|
|
415 | (2) |
|
24.2 Case study 1: US murders data |
|
|
417 | (2) |
|
24.3 Case study 2: self-reported heights |
|
|
419 | (2) |
|
24.4 How to escape when defining strings |
|
|
421 | (2) |
|
|
423 | (7) |
|
24.5.1 Strings are a regexp |
|
|
423 | (1) |
|
24.5.2 Special characters |
|
|
423 | (2) |
|
|
425 | (1) |
|
|
426 | (1) |
|
|
426 | (1) |
|
|
427 | (1) |
|
24.5.7 Quantifiers: *, ?, + |
|
|
428 | (1) |
|
|
428 | (1) |
|
|
429 | (1) |
|
24.6 Search and replace with regex |
|
|
430 | (3) |
|
24.6.1 Search and replace using groups |
|
|
432 | (1) |
|
24.7 Testing and improving |
|
|
433 | (2) |
|
|
435 | (1) |
|
|
436 | (1) |
|
24.10 Case study 2: self-reported heights (continued) |
|
|
436 | (3) |
|
24.10.1 The extract function |
|
|
437 | (1) |
|
24.10.2 Putting it all together |
|
|
438 | (1) |
|
|
439 | (3) |
|
24.12 Case study 3: extracting tables from a PDF |
|
|
442 | (3) |
|
|
445 | (1) |
|
|
446 | (3) |
|
25 Parsing dates and times |
|
|
449 | (6) |
|
|
449 | (1) |
|
25.2 The lubridate package |
|
|
450 | (3) |
|
|
453 | (2) |
|
|
455 | (14) |
|
26.1 Case study: Trump tweets |
|
|
455 | (2) |
|
|
457 | (5) |
|
|
462 | (5) |
|
|
467 | (2) |
V Machine Learning |
|
469 | (176) |
|
27 Introduction to machine learning |
|
|
471 | (22) |
|
|
471 | (1) |
|
|
472 | (2) |
|
|
474 | (1) |
|
|
474 | (12) |
|
27.4.1 Training and test sets |
|
|
475 | (1) |
|
|
476 | (2) |
|
27.4.3 The confusion matrix |
|
|
478 | (1) |
|
27.4.4 Sensitivity and specificity |
|
|
479 | (2) |
|
27.4.5 Balanced accuracy and F1 score |
|
|
481 | (1) |
|
27.4.6 Prevalence matters in practice |
|
|
482 | (1) |
|
27.4.7 ROC and precision-recall curves |
|
|
483 | (1) |
|
|
484 | (2) |
|
|
486 | (1) |
|
27.6 Conditional probabilities and expectations |
|
|
486 | (3) |
|
27.6.1 Conditional probabilities |
|
|
487 | (1) |
|
27.6.2 Conditional expectations |
|
|
488 | (1) |
|
27.6.3 Conditional expectation minimizes squared loss function |
|
|
488 | (1) |
|
|
489 | (1) |
|
27.8 Case study: is it a 2 or a 7? |
|
|
489 | (4) |
|
|
493 | (14) |
|
|
495 | (2) |
|
|
497 | (1) |
|
28.3 Local weighted regression (loess) |
|
|
498 | (6) |
|
|
502 | (1) |
|
28.3.2 Beware of default smoothing parameters |
|
|
503 | (1) |
|
28.4 Connecting smoothing to machine learning |
|
|
504 | (1) |
|
|
504 | (3) |
|
|
507 | (16) |
|
29.1 Motivation with k-nearest neighbors |
|
|
507 | (6) |
|
|
509 | (1) |
|
|
510 | (1) |
|
29.1.3 Picking the k in kNN |
|
|
511 | (2) |
|
29.2 Mathematical description of cross validation |
|
|
513 | (1) |
|
29.3 K-fold cross validation |
|
|
514 | (3) |
|
|
517 | (1) |
|
|
518 | (3) |
|
|
521 | (2) |
|
|
523 | (6) |
|
30.1 The caret train function |
|
|
523 | (1) |
|
|
524 | (2) |
|
30.3 Example: fitting with loess |
|
|
526 | (3) |
|
31 Examples of algorithms |
|
|
529 | (44) |
|
|
529 | (2) |
|
31.1.1 The predict function |
|
|
530 | (1) |
|
|
531 | (2) |
|
|
533 | (6) |
|
31.3.1 Generalized linear models |
|
|
534 | (4) |
|
31.3.2 Logistic regression with more than one predictor |
|
|
538 | (1) |
|
|
539 | (1) |
|
|
540 | (1) |
|
|
541 | (1) |
|
|
541 | (8) |
|
|
542 | (1) |
|
31.7.2 Controlling prevalence |
|
|
543 | (2) |
|
31.7.3 Quadratic discriminant analysis |
|
|
545 | (2) |
|
31.7.4 Linear discriminant analysis |
|
|
547 | (2) |
|
31.7.5 Connection to distance |
|
|
549 | (1) |
|
31.8 Case study: more than three classes |
|
|
549 | (4) |
|
|
553 | (1) |
|
31.10 Classification and regression trees (CART) |
|
|
554 | (12) |
|
31.10.1 The curse of dimensionality |
|
|
554 | (1) |
|
|
555 | (3) |
|
|
558 | (6) |
|
31.10.4 Classification (decision) trees |
|
|
564 | (2) |
|
|
566 | (5) |
|
|
571 | (2) |
|
32 Machine learning in practice |
|
|
573 | (8) |
|
|
574 | (1) |
|
32.2 k-nearest neighbor and random forest |
|
|
575 | (3) |
|
|
578 | (1) |
|
|
579 | (1) |
|
|
579 | (1) |
|
|
580 | (1) |
|
|
581 | (58) |
|
|
581 | (10) |
|
|
582 | (2) |
|
33.1.2 Converting a vector to a matrix |
|
|
584 | (1) |
|
33.1.3 Row and column summaries |
|
|
585 | (1) |
|
|
586 | (1) |
|
33.1.5 Filtering columns based on summaries |
|
|
586 | (2) |
|
33.1.6 Indexing with matrices |
|
|
588 | (2) |
|
33.1.7 Binarizing the data |
|
|
590 | (1) |
|
33.1.8 Vectorization for matrices |
|
|
590 | (1) |
|
33.1.9 Matrix algebra operations |
|
|
591 | (1) |
|
|
591 | (1) |
|
|
591 | (4) |
|
33.3.1 Euclidean distance |
|
|
592 | (1) |
|
33.3.2 Distance in higher dimensions |
|
|
592 | (1) |
|
33.3.3 Euclidean distance example |
|
|
593 | (2) |
|
|
595 | (1) |
|
33.3.5 Distance between predictors |
|
|
595 | (1) |
|
|
595 | (1) |
|
|
596 | (13) |
|
33.5.1 Preserving distance |
|
|
596 | (3) |
|
33.5.2 Linear transformations (advanced) |
|
|
599 | (1) |
|
33.5.3 Orthogonal transformations (advanced) |
|
|
600 | (2) |
|
33.5.4 Principal component analysis |
|
|
602 | (2) |
|
|
604 | (3) |
|
|
607 | (2) |
|
|
609 | (1) |
|
33.7 Recommendation systems |
|
|
610 | (6) |
|
|
610 | (2) |
|
33.7.2 Recommendation systems as a machine learning challenge |
|
|
612 | (1) |
|
|
612 | (1) |
|
|
613 | (1) |
|
33.7.5 Modeling movie effects |
|
|
614 | (1) |
|
|
615 | (1) |
|
|
616 | (1) |
|
|
617 | (7) |
|
|
617 | (2) |
|
33.9.2 Penalized least squares |
|
|
619 | (3) |
|
33.9.3 Choosing the penalty terms |
|
|
622 | (2) |
|
|
624 | (1) |
|
33.11 Matrix factorization |
|
|
625 | (8) |
|
|
628 | (2) |
|
33.11.2 Connection to SVD and PCA |
|
|
630 | (3) |
|
|
633 | (6) |
|
|
639 | (6) |
|
34.1 Hierarchical clustering |
|
|
640 | (2) |
|
|
642 | (1) |
|
|
642 | (1) |
|
|
643 | (1) |
|
|
644 | (1) |
VI Productivity Tools |
|
645 | (50) |
|
35 Introduction to productivity tools |
|
|
647 | (2) |
|
|
649 | (18) |
|
|
649 | (1) |
|
|
650 | (1) |
|
|
650 | (3) |
|
36.3.1 Directories and subdirectories |
|
|
651 | (1) |
|
36.3.2 The home directory |
|
|
651 | (1) |
|
|
652 | (1) |
|
|
653 | (1) |
|
|
653 | (4) |
|
36.4.1 ls: Listing directory content |
|
|
654 | (1) |
|
36.4.2 mkdir and rmdir: make and remove a directory |
|
|
654 | (1) |
|
36.4.3 cd: navigating the filesystem by changing directories |
|
|
655 | (2) |
|
|
657 | (1) |
|
|
658 | (2) |
|
|
658 | (1) |
|
|
659 | (1) |
|
36.6.3 rm: removing files |
|
|
659 | (1) |
|
36.6.4 less: looking at a file |
|
|
659 | (1) |
|
36.7 Preparing for a data science project |
|
|
660 | (1) |
|
|
661 | (6) |
|
|
661 | (1) |
|
|
662 | (1) |
|
|
662 | (1) |
|
|
663 | (1) |
|
36.8.5 Environment variables |
|
|
663 | (1) |
|
|
664 | (1) |
|
|
664 | (1) |
|
36.8.8 Permissions and file types |
|
|
665 | (1) |
|
36.8.9 Commands you should learn |
|
|
665 | (1) |
|
36.8.10 File manipulation in R |
|
|
665 | (2) |
|
|
667 | (16) |
|
37.1 Why use Git and GitHub? |
|
|
667 | (1) |
|
|
667 | (3) |
|
|
670 | (1) |
|
|
671 | (5) |
|
|
672 | (4) |
|
37.5 Initializing a Git directory |
|
|
676 | (2) |
|
37.6 Using Git and GitHub in RStudio |
|
|
678 | (5) |
|
38 Reproducible projects with RStudio and R markdown |
|
|
683 | (12) |
|
|
683 | (3) |
|
|
686 | (4) |
|
|
688 | (1) |
|
|
688 | (1) |
|
|
689 | (1) |
|
|
689 | (1) |
|
38.2.5 More on R markdown |
|
|
690 | (1) |
|
38.3 Organizing a data science project |
|
|
690 | (5) |
|
38.3.1 Create directories in Unix |
|
|
690 | (1) |
|
38.3.2 Create an RStudio project |
|
|
691 | (1) |
|
38.3.3 Edit some R scripts |
|
|
692 | (1) |
|
38.3.4 Create some more directories using Unix |
|
|
693 | (1) |
|
|
693 | (1) |
|
38.3.6 Initializing a Git directory |
|
|
693 | (1) |
|
38.3.7 Add, commit, and push files using RStudio |
|
|
694 | (1) |
Index |
|
695 | |