Foreword | xix
Foreword | xxi
Preface to the Python Edition | xxiii
Acknowledgments | xxvii
Part I Preliminaries

Chapter 1 Introduction | 3 (12)
1.1 What Is Business Analytics? | 3 (2)
1.3 Data Mining and Related Terms | 5 (1)
1.6 Why Are There So Many Different Methods? | 8 (1)
1.7 Terminology and Notation | 9 (2)
1.8 Road Maps to This Book | 11 (4)
|
Chapter 2 Overview of the Data Mining Process | 15 (46)
2.2 Core Ideas in Data Mining | 16 (3)
    Association Rules and Recommendation Systems | 16 (1)
    Data Reduction and Dimension Reduction | 17 (1)
    Data Exploration and Visualization | 17 (1)
    Supervised and Unsupervised Learning | 18 (1)
2.3 The Steps in Data Mining | 19 (2)
    Predicting Home Values in the West Roxbury Neighborhood | 21 (1)
    Loading and Looking at the Data in Python | 22 (3)
    Oversampling Rare Events in Classification Tasks | 26 (1)
    Preprocessing and Cleaning the Data | 27 (7)
2.5 Predictive Power and Overfitting | 34 (6)
    Creation and Use of Data Partitions | 36 (4)
2.6 Building a Predictive Model | 40 (4)
2.7 Using Python for Data Mining on a Local Machine | 44 (1)
2.8 Automating Data Mining Solutions | 45 (2)
2.9 Ethical Practice in Data Mining | 47 (9)
    Data Mining Software: The State of the Market (by Herb Edelstein) | 52 (4)
Part II Data Exploration and Dimension Reduction

Chapter 3 Data Visualization | 61 (38)
    Example 1: Boston Housing Data | 64 (1)
    Example 2: Ridership on Amtrak Trains | 65 (1)
3.3 Basic Charts: Bar Charts, Line Graphs, and Scatter Plots | 65 (9)
    Distribution Plots: Boxplots and Histograms | 68 (3)
    Heatmaps: Visualizing Correlations and Missing Values | 71 (3)
3.4 Multidimensional Visualization | 74 (14)
    Adding Variables: Color, Size, Shape, Multiple Panels, and Animation | 74 (3)
    Manipulations: Rescaling, Aggregation and Hierarchies, Zooming, Filtering | 77 (4)
    Reference: Trend Lines and Labels | 81 (1)
    Scaling Up to Large Datasets | 82 (1)
    Multivariate Plot: Parallel Coordinates Plot | 83 (1)
    Interactive Visualization | 83 (5)
3.5 Specialized Visualizations | 88 (5)
    Visualizing Networked Data | 88 (2)
    Visualizing Hierarchical Data: Treemaps | 90 (1)
    Visualizing Geographical Data: Map Charts | 91 (2)
3.6 Summary: Major Visualizations and Operations, by Data Mining Goal | 93 (4)
Chapter 4 Dimension Reduction | 99 (26)
4.2 Curse of Dimensionality | 100 (1)
4.3 Practical Considerations | 100 (2)
    Example 1: House Prices in Boston | 101 (1)
    Aggregation and Pivot Tables | 104 (1)
4.6 Reducing the Number of Categories in Categorical Variables | 106 (2)
4.7 Converting a Categorical Variable to a Numerical Variable | 108 (1)
4.8 Principal Components Analysis | 108 (11)
    Example 2: Breakfast Cereals | 109 (5)
    Using Principal Components for Classification and Prediction | 117 (2)
4.9 Dimension Reduction Using Regression Models | 119 (1)
4.10 Dimension Reduction Using Classification and Regression Trees | 119 (1)
Part III Performance Evaluation

Chapter 5 Evaluating Predictive Performance | 125 (36)
5.2 Evaluating Predictive Performance | 126 (5)
    Naive Benchmark: The Average | 127 (1)
    Prediction Accuracy Measures | 127 (1)
    Comparing Training and Validation Performance | 128 (1)
    Cumulative Gains and Lift Charts | 128 (3)
5.3 Judging Classifier Performance | 131 (13)
    Benchmark: The Naive Rule | 132 (1)
    The Confusion (Classification) Matrix | 133 (1)
    Using the Validation Data | 134 (1)
    Propensities and Cutoff for Classification | 136 (2)
    Performance in Case of Unequal Importance of Classes | 138 (2)
    Asymmetric Misclassification Costs | 140 (4)
    Generalization to More Than Two Classes | 144 (1)
5.4 Judging Ranking Performance | 144 (5)
    Gains and Lift Charts for Binary Data | 144 (3)
    Gains and Lift Charts Incorporating Costs and Benefits | 148 (1)
    Cumulative Gains as a Function of Cutoff | 148 (1)
    Oversampling the Training Set | 152 (1)
    Evaluating Model Performance Using a Non-oversampled Validation Set | 152 (1)
    Evaluating Model Performance if Only Oversampled Validation Set Exists | 152 (3)
Part IV Prediction and Classification Methods

Chapter 6 Multiple Linear Regression | 161 (24)
6.2 Explanatory vs. Predictive Modeling | 162 (2)
6.3 Estimating the Regression Equation and Prediction | 164 (5)
    Example: Predicting the Price of Used Toyota Corolla Cars | 165 (4)
6.4 Variable Selection in Linear Regression | 169 (11)
    Reducing the Number of Predictors | 169 (1)
    How to Reduce the Number of Predictors | 170 (6)
    Regularization (Shrinkage Models) | 176 (3)
Appendix: Using Statsmodels | 179 (1)
|
Chapter 7 k-Nearest Neighbors (k-NN) | 185 (14)
7.1 The k-NN Classifier (Categorical Outcome) | 185 (8)
    k-NN with More Than Two Classes | 192 (1)
    Converting Categorical Variables to Binary Dummies | 193 (1)
7.2 k-NN for a Numerical Outcome | 193 (2)
7.3 Advantages and Shortcomings of k-NN Algorithms | 195 (2)
|
Chapter 8 The Naive Bayes Classifier | 199 (18)
    Cutoff Probability Method | 200 (1)
    Example 1: Predicting Fraudulent Financial Reporting | 201 (1)
8.2 Applying the Full (Exact) Bayesian Classifier | 201 (9)
    Using the "Assign to the Most Probable Class" Method | 202 (1)
    Using the Cutoff Probability Method | 202 (1)
    Practical Difficulty with the Complete (Exact) Bayes Procedure | 202 (1)
    The Naive Bayes Assumption of Conditional Independence | 204 (1)
    Using the Cutoff Probability Method | 204 (1)
    Example 2: Predicting Fraudulent Financial Reports, Two Predictors | 205 (1)
    Example 3: Predicting Delayed Flights | 206 (4)
8.3 Advantages and Shortcomings of the Naive Bayes Classifier | 210 (4)
|
Chapter 9 Classification and Regression Trees | 217 (34)
9.3 Evaluating the Performance of a Classification Tree | 228 (4)
    Example 2: Acceptance of Personal Loan | 228 (2)
    Sensitivity Analysis Using Cross Validation | 230 (2)
    Fine-tuning Tree Parameters | 234 (2)
    Other Methods for Limiting Tree Size | 236 (2)
9.5 Classification Rules from Trees | 238 (1)
9.6 Classification Trees for More Than Two Classes | 239 (1)
9.8 Improving Prediction: Random Forests and Boosted Trees | 243 (3)
9.9 Advantages and Weaknesses of a Tree | 246 (2)
|
Chapter 10 Logistic Regression | 251 (32)
10.2 The Logistic Regression Model | 253 (2)
10.3 Example: Acceptance of Personal Loan | 255 (6)
    Model with a Single Predictor | 255 (2)
    Estimating the Logistic Model from Data: Computing Parameter Estimates | 257 (2)
    Interpreting Results in Terms of Odds (for a Profiling Goal) | 259 (2)
10.4 Evaluating Classification Performance | 261 (3)
10.5 Logistic Regression for Multi-class Classification | 264 (5)
    Comparing Ordinal and Nominal Models | 267 (2)
10.6 Example of Complete Analysis: Predicting Delayed Flights | 269 (11)
Appendix: Using Statsmodels | 278 (2)
|
|
Chapter 11 Neural Nets | 283 (26)
11.2 Concept and Structure of a Neural Network | 284 (1)
11.3 Fitting a Network to Data | 285 (12)
    Computing Output of Nodes | 286 (3)
    Example 2: Classifying Accident Severity | 292 (3)
    Using the Output for Prediction and Classification | 297 (1)
11.5 Exploring the Relationship Between Predictors and Outcome | 299 (1)
    Convolutional Neural Networks (CNNs) | 300 (1)
11.7 Advantages and Weaknesses of Neural Networks | 305 (1)
|
Chapter 12 Discriminant Analysis | 309 (18)
    Example 2: Personal Loan Acceptance | 310 (1)
12.2 Distance of a Record from a Class | 311 (3)
12.3 Fisher's Linear Classification Functions | 314 (3)
12.4 Classification Performance of Discriminant Analysis | 317 (1)
12.6 Unequal Misclassification Costs | 319 (1)
12.7 Classifying More Than Two Classes | 319 (3)
    Example 3: Medical Dispatch to Accident Scenes | 319 (3)
12.8 Advantages and Weaknesses | 322 (2)
|
Chapter 13 Combining Methods: Ensembles and Uplift Modeling | 327 (18)
    Why Ensembles Can Improve Predictive Power | 329 (1)
    Bagging and Boosting in Python | 332 (1)
    Advantages and Weaknesses of Ensembles | 332 (2)
13.2 Uplift (Persuasion) Modeling | 334 (6)
    Modeling Individual Uplift | 337 (1)
    Computing Uplift with Python | 338 (1)
    Using the Results of an Uplift Model | 339 (1)
Part V Mining Relationships Among Records

Chapter 14 Association Rules and Collaborative Filtering | 345 (30)
    Discovering Association Rules in Transaction Databases | 346 (2)
    Example 1: Synthetic Data on Purchases of Phone Faceplates | 348 (1)
    Generating Candidate Rules | 348 (1)
    The Process of Rule Selection | 353 (1)
    Example 2: Rules for Similar Book Purchases | 357 (1)
14.2 Collaborative Filtering | 357 (11)
    Example 3: Netflix Prize Contest | 360 (1)
    User-Based Collaborative Filtering: "People Like You" | 361 (2)
    Item-Based Collaborative Filtering | 363 (1)
    Advantages and Weaknesses of Collaborative Filtering | 364 (2)
    Collaborative Filtering vs. Association Rules | 366 (2)
|
Chapter 15 Cluster Analysis | 375 (32)
    Example: Public Utilities | 377 (2)
15.2 Measuring Distance Between Two Records | 379 (6)
    Normalizing Numerical Measurements | 380 (1)
    Other Distance Measures for Numerical Data | 381 (2)
    Distance Measures for Categorical Data | 383 (1)
    Distance Measures for Mixed Data | 384 (1)
15.3 Measuring Distance Between Two Clusters | 385 (2)
15.4 Hierarchical (Agglomerative) Clustering | 387 (8)
    Dendrograms: Displaying Clustering Process and Results | 390 (1)
    Limitations of Hierarchical Clustering | 393 (2)
15.5 Non-Hierarchical Clustering: The k-Means Algorithm | 395 (6)
    Choosing the Number of Clusters (k) | 396 (5)
Part VI Forecasting Time Series

Chapter 16 Handling Time Series | 407 (16)
16.2 Descriptive vs. Predictive Modeling | 409 (1)
16.3 Popular Forecasting Methods in Business | 409 (1)
16.4 Time Series Components | 410 (5)
    Example: Ridership on Amtrak Trains | 411 (4)
16.5 Data-Partitioning and Performance Evaluation | 415 (4)
    Benchmark Performance: Naive Forecasts | 415 (1)
    Generating Future Forecasts | 416 (3)
|
Chapter 17 Regression-Based Forecasting | 423 (28)
17.2 A Model with Seasonality | 429 (3)
17.3 A Model with Trend and Seasonality | 432 (1)
17.4 Autocorrelation and ARIMA Models | 433 (9)
    Computing Autocorrelation | 434 (2)
    Improving Forecasts by Integrating Autocorrelation Information | 436 (4)
    Evaluating Predictability | 440 (2)
|
Chapter 18 Smoothing Methods | 451 (22)
    Centered Moving Average for Visualization | 452 (1)
    Trailing Moving Average for Forecasting | 453 (2)
    Choosing Window Width (w) | 455 (2)
18.3 Simple Exponential Smoothing | 457 (3)
    Choosing Smoothing Parameter α | 458 (2)
    Relation Between Moving Average and Simple Exponential Smoothing | 460 (1)
18.4 Advanced Exponential Smoothing | 460 (4)
    Series with a Trend and Seasonality | 461 (1)
    Series with Seasonality (No Trend) | 462 (2)
Part VII Data Analytics

Chapter 19 Social Network Analytics | 473 (22)
19.2 Directed vs. Undirected Networks | 475 (1)
19.3 Visualizing and Analyzing Networks | 476 (4)
    Using Network Data in Classification and Prediction | 479 (1)
19.4 Social Data Metrics and Taxonomy | 480 (5)
    Node-Level Centrality Metrics | 480 (1)
19.5 Using Network Metrics in Prediction and Classification | 485 (6)
19.6 Collecting Social Network Data with Python | 491 (1)
19.7 Advantages and Disadvantages | 491 (3)
|
|
Chapter 20 Text Mining | 495 (20)
20.2 The Tabular Representation of Text: Term-Document Matrix and "Bag-of-Words" | 496 (1)
20.3 Bag-of-Words vs. Meaning Extraction at Document Level | 497 (1)
20.4 Preprocessing the Text | 498 (8)
    Presence/Absence vs. Frequency | 501 (1)
    Term Frequency-Inverse Document Frequency (TF-IDF) | 502 (3)
    From Terms to Concepts: Latent Semantic Indexing | 505 (1)
20.5 Implementing Data Mining Methods | 506 (1)
20.6 Example: Online Discussions on Autos and Electronics | 506 (4)
    Importing and Labeling the Records | 507 (1)
    Text Preprocessing in Python | 508 (1)
    Producing a Concept Matrix | 508 (1)
    Fitting a Predictive Model | 508 (1)
Part VIII Cases

Chapter 21 Cases | 515 (34)
    Database Marketing at Charles Book Club | 516 (2)
21.3 Tayko Software Cataloger | 527 (4)
21.4 Political Persuasion | 531 (4)
    Predictive Analytics Arrives in US Politics | 531 (1)
21.6 Segmenting Consumers of Bath Soap | 537 (4)
21.7 Direct-Mail Fundraising | 541 (3)
21.8 Catalog Cross-Selling | 544 (2)
21.9 Time Series Case: Forecasting Public Transportation Demand | 546 (3)
References | 549 (2)
Data Files Used in the Book | 551 (4)
Python Utilities Functions | 555 (10)
Index | 565