|
|
1 | (18) |
|
1.1 What Is Data Science? |
|
|
1 | (2) |
|
|
3 | (2) |
|
1.3 Authors of the Federalist Papers |
|
|
5 | (1) |
|
1.4 Forecasting NASDAQ Stock Prices |
|
|
6 | (2) |
|
|
8 | (1) |
|
|
8 | (3) |
|
|
11 | (1) |
|
|
12 | (1) |
|
|
13 | (1) |
|
1.10 Terminology and Notation |
|
|
14 | (2) |
|
1.10.1 Matrices and Vectors |
|
|
14 | (2) |
|
|
16 | (3) |
|
|
|
2 Data Mapping and Data Dictionaries |
|
|
19 | (32) |
|
|
19 | (1) |
|
2.2 Political Contributions |
|
|
20 | (2) |
|
|
22 | (1) |
|
2.4 Tutorial: Big Contributors |
|
|
22 | (5) |
|
|
27 | (4) |
|
2.5.1 Notation and Terminology |
|
|
28 | (1) |
|
2.5.2 The Political Contributions Example |
|
|
29 | (1) |
|
|
30 | (1) |
|
2.6 Tutorial: Election Cycle Contributions |
|
|
31 | (7) |
|
|
38 | (5) |
|
|
41 | (2) |
|
2.8 Tutorial: Computing Similarity |
|
|
43 | (4) |
|
2.9 Concluding Remarks About Dictionaries |
|
|
47 | (1) |
|
|
48 | (3) |
|
|
48 | (1) |
|
|
49 | (2) |
|
3 Scalable Algorithms and Associative Statistics |
|
|
51 | (54) |
|
|
51 | (2) |
|
3.2 Example: Obesity in the United States |
|
|
53 | (1) |
|
3.3 Associative Statistics |
|
|
54 | (1) |
|
3.4 Univariate Observations |
|
|
55 | (5) |
|
|
57 | (1) |
|
3.4.2 Histogram Construction |
|
|
58 | (2) |
|
|
60 | (1) |
|
3.6 Tutorial: Histogram Construction |
|
|
61 | (13) |
|
|
74 | (1) |
|
|
74 | (6) |
|
3.7.1 Notation and Terminology |
|
|
75 | (1) |
|
|
76 | (3) |
|
3.7.3 The Augmented Moment Matrix |
|
|
79 | (1) |
|
|
80 | (1) |
|
3.8 Tutorial: Computing the Correlation Matrix |
|
|
80 | (8) |
|
|
87 | (1) |
|
3.9 Introduction to Linear Regression |
|
|
88 | (7) |
|
3.9.1 The Linear Regression Model |
|
|
89 | (1) |
|
|
90 | (3) |
|
3.9.3 Accuracy Assessment |
|
|
93 | (1) |
|
3.9.4 Computing R²_adjusted |
|
|
94 | (1) |
|
3.10 Tutorial: Computing β |
|
|
95 | (7) |
|
|
101 | (1) |
|
|
102 | (3) |
|
|
102 | (1) |
|
|
103 | (2) |
|
|
105 | (28) |
|
|
105 | (1) |
|
|
106 | (5) |
|
4.2.1 The Hadoop Distributed File System |
|
|
106 | (2) |
|
|
108 | (1) |
|
|
108 | (2) |
|
|
110 | (1) |
|
4.3 Developing a Hadoop Application |
|
|
111 | (1) |
|
|
111 | (2) |
|
4.5 The Command Line Environment |
|
|
113 | (1) |
|
4.6 Tutorial: Programming a MapReduce Algorithm |
|
|
113 | (11) |
|
|
116 | (4) |
|
|
120 | (3) |
|
|
123 | (1) |
|
4.7 Tutorial: Using Amazon Web Services |
|
|
124 | (4) |
|
|
128 | (1) |
|
|
128 | (5) |
|
|
128 | (1) |
|
|
128 | (5) |
|
Part II Extracting Information from Data |
|
|
|
|
133 | (28) |
|
|
133 | (2) |
|
5.2 Principles of Data Visualization |
|
|
135 | (3) |
|
|
138 | (10) |
|
|
139 | (3) |
|
5.3.2 Bivariate and Multivariate Data |
|
|
142 | (6) |
|
5.4 Harnessing the Machine |
|
|
148 | (10) |
|
|
151 | (1) |
|
|
152 | (1) |
|
|
153 | (1) |
|
|
154 | (1) |
|
|
155 | (1) |
|
|
156 | (1) |
|
|
157 | (1) |
|
|
158 | (3) |
|
6 Linear Regression Methods |
|
|
161 | (56) |
|
|
161 | (1) |
|
6.2 The Linear Regression Model |
|
|
162 | (14) |
|
6.2.1 Example: Depression, Fatalism, and Simplicity |
|
|
164 | (2) |
|
|
166 | (2) |
|
6.2.3 Confidence Intervals |
|
|
168 | (2) |
|
6.2.4 Distributional Conditions |
|
|
170 | (1) |
|
|
171 | (4) |
|
|
175 | (1) |
|
|
176 | (1) |
|
|
177 | (4) |
|
|
181 | (1) |
|
6.5 Tutorial: Large Data Sets and R |
|
|
181 | (6) |
|
|
187 | (8) |
|
|
189 | (3) |
|
6.6.2 The Extra Sums-of-Squares F-test |
|
|
192 | (3) |
|
|
195 | (5) |
|
6.7.1 An Incongruous Result |
|
|
200 | (1) |
|
6.8 Analysis of Residuals |
|
|
200 | (8) |
|
|
201 | (1) |
|
6.8.2 Example: The Bike Share Problem |
|
|
202 | (2) |
|
|
204 | (4) |
|
6.9 Tutorial: Residual Analysis |
|
|
208 | (3) |
|
|
210 | (1) |
|
|
211 | (6) |
|
|
211 | (1) |
|
|
212 | (5) |
|
|
217 | (36) |
|
|
217 | (2) |
|
7.2 The Behavioral Risk Factor Surveillance System |
|
|
219 | (3) |
|
7.2.1 Estimation of Prevalence |
|
|
220 | (1) |
|
7.2.2 Estimation of Incidence |
|
|
221 | (1) |
|
7.3 Tutorial: Diabetes Prevalence and Incidence |
|
|
222 | (9) |
|
7.4 Predicting At-Risk Individuals |
|
|
231 | (5) |
|
7.4.1 Sensitivity and Specificity |
|
|
234 | (2) |
|
7.5 Tutorial: Identifying At-Risk Individuals |
|
|
236 | (7) |
|
7.6 Unusual Demographic Attribute Vectors |
|
|
243 | (2) |
|
7.7 Tutorial: Building Neighborhood Sets |
|
|
245 | (4) |
|
|
247 | (2) |
|
|
249 | (4) |
|
|
249 | (1) |
|
|
250 | (3) |
|
|
253 | (26) |
|
|
253 | (1) |
|
8.2 Hierarchical Agglomerative Clustering |
|
|
254 | (1) |
|
|
255 | (3) |
|
8.4 Tutorial: Hierarchical Clustering of States |
|
|
258 | (8) |
|
|
264 | (2) |
|
8.5 The k-Means Algorithm |
|
|
266 | (2) |
|
8.6 Tutorial: The k-Means Algorithm |
|
|
268 | (6) |
|
|
273 | (1) |
|
|
274 | (5) |
|
|
274 | (1) |
|
|
274 | (5) |
|
Part III Predictive Analytics |
|
|
|
9 k-Nearest Neighbor Prediction Functions |
|
|
279 | (34) |
|
|
279 | (3) |
|
9.1.1 The Prediction Task |
|
|
280 | (2) |
|
9.2 Notation and Terminology |
|
|
282 | (1) |
|
|
283 | (1) |
|
9.4 The k-Nearest Neighbor Prediction Function |
|
|
284 | (2) |
|
9.5 Exponentially Weighted k-Nearest Neighbors |
|
|
286 | (1) |
|
9.6 Tutorial: Digit Recognition |
|
|
287 | (8) |
|
|
294 | (1) |
|
|
295 | (3) |
|
|
297 | (1) |
|
9.8 k-Nearest Neighbor Regression |
|
|
298 | (1) |
|
9.9 Forecasting the S&P 500 |
|
|
299 | (1) |
|
9.10 Tutorial: Forecasting by Pattern Recognition |
|
|
300 | (8) |
|
|
307 | (1) |
|
|
308 | (2) |
|
|
310 | (3) |
|
|
310 | (1) |
|
|
310 | (3) |
|
10 The Multinomial Naive Bayes Prediction Function |
|
|
313 | (30) |
|
|
313 | (1) |
|
10.2 The Federalist Papers |
|
|
314 | (1) |
|
10.3 The Multinomial Naive Bayes Prediction Function |
|
|
315 | (4) |
|
10.3.1 Posterior Probabilities |
|
|
317 | (2) |
|
10.4 Tutorial: Reducing the Federalist Papers |
|
|
319 | (6) |
|
|
325 | (1) |
|
10.5 Tutorial: Predicting Authorship of the Disputed Federalist Papers |
|
|
325 | (4) |
|
|
329 | (1) |
|
10.6 Tutorial: Customer Segmentation |
|
|
329 | (9) |
|
10.6.1 Additive Smoothing |
|
|
330 | (2) |
|
|
332 | (5) |
|
|
337 | (1) |
|
|
338 | (5) |
|
|
338 | (1) |
|
|
339 | (4) |
|
|
343 | (38) |
|
|
343 | (2) |
|
11.2 Tutorial: Working with Time |
|
|
345 | (5) |
|
|
350 | (4) |
|
|
350 | (1) |
|
11.3.2 Estimation of the Mean and Variance |
|
|
350 | (2) |
|
11.3.3 Exponential Forecasting |
|
|
352 | (1) |
|
|
353 | (1) |
|
11.4 Tutorial: Computing ρτ |
|
|
354 | (5) |
|
|
359 | (1) |
|
11.5 Drift and Forecasting |
|
|
359 | (1) |
|
11.6 Holt-Winters Exponential Forecasting |
|
|
360 | (3) |
|
|
362 | (1) |
|
11.7 Tutorial: Holt-Winters Forecasting |
|
|
363 | (4) |
|
11.8 Regression-Based Forecasting of Stock Prices |
|
|
367 | (1) |
|
11.9 Tutorial: Regression-Based Forecasting |
|
|
368 | (6) |
|
|
373 | (1) |
|
11.10 Time-Varying Regression Estimators |
|
|
374 | (1) |
|
11.11 Tutorial: Time-Varying Regression Estimators |
|
|
375 | (2) |
|
|
377 | (1) |
|
|
377 | (4) |
|
|
377 | (1) |
|
|
378 | (3) |
|
|
381 | (22) |
|
|
381 | (1) |
|
12.2 Forecasting with a NASDAQ Quotation Stream |
|
|
382 | (2) |
|
12.2.1 Forecasting Algorithms |
|
|
383 | (1) |
|
12.3 Tutorial: Forecasting the Apple Inc. Stream |
|
|
384 | (6) |
|
|
389 | (1) |
|
12.4 The Twitter Streaming API |
|
|
390 | (1) |
|
12.5 Tutorial: Tapping the Twitter Stream |
|
|
391 | (5) |
|
|
395 | (1) |
|
|
396 | (2) |
|
12.7 Tutorial: Sentiment Analysis of Hashtag Groups |
|
|
398 | (2) |
|
|
400 | (3) |
|
|
403 | (14) |
|
B Accessing the Twitter API |
|
|
417 | (2) |
References |
|
419 | (4) |
Index |
|
423 | |