Foreword  xvii
Preface  xix
Acknowledgments  xxi

Part I Preliminaries

1 Introduction  3
1.1 What Is Business Analytics?  3
Who Uses Predictive Analytics?  4
1.2 What Is Data Mining?  5
1.3 Data Mining and Related Terms  5
1.4 Big Data  6
1.5 Data Science  7
1.6 Why Are There So Many Different Methods?  7
1.7 Terminology and Notation  8
1.8 Road Maps to This Book  10
Order of Topics  11
Using JMP Pro, Statistical Discovery Software from SAS  11
|
2 Overview of the Data Mining Process  14
2.1 Introduction  14
2.2 Core Ideas in Data Mining  15
Classification  15
Prediction  15
Association Rules and Recommendation Systems  15
Predictive Analytics  16
Data Reduction and Dimension Reduction  16
Data Exploration and Visualization  16
Supervised and Unsupervised Learning  16
2.3 The Steps in Data Mining  17
2.4 Preliminary Steps  19
Organization of Datasets  19
Sampling from a Database  19
Oversampling Rare Events in Classification Tasks  19
Preprocessing and Cleaning the Data  20
Changing Modeling Types in JMP  20
Standardizing Data in JMP  25
2.5 Predictive Power and Overfitting  25
Creation and Use of Data Partitions  25
Partitioning Data for Crossvalidation in JMP Pro  27
Overfitting  27
2.6 Building a Predictive Model with JMP Pro  29
Predicting Home Values in a Boston Neighborhood  29
Modeling Process  30
Setting the Random Seed in JMP  34
2.7 Using JMP Pro for Data Mining  38
2.8 Automating Data Mining Solutions  40
Data Mining Software Tools: The State of the Market, by Herb Edelstein  41
Problems  44

Part II Data Exploration and Dimension Reduction

3 Data Visualization  51
3.1 Uses of Data Visualization  51
3.2 Data Examples  52
Example 1: Boston Housing Data  53
Example 2: Ridership on Amtrak Trains  53
3.3 Basic Charts: Bar Charts, Line Graphs, and Scatterplots  54
Using the JMP Graph Builder  54
Distribution Plots: Boxplots and Histograms  56
Tools for Data Visualization in JMP  59
Heatmaps (Color Maps and Cell Plots): Visualizing Correlations and Missing Values  59
3.4 Multidimensional Visualization  61
Adding Variables: Color, Size, Shape, Multiple Panels, and Animation  62
Manipulations: Rescaling, Aggregation and Hierarchies, Zooming, Filtering  65
Reference: Trend Lines and Labels  68
Adding Trendlines in the Graph Builder  69
Scaling Up: Large Datasets  70
Multivariate Plot: Parallel Coordinates Plot  71
Interactive Visualization  72
3.5 Specialized Visualizations  73
Visualizing Networked Data  74
Visualizing Hierarchical Data: More on Treemaps  75
Visualizing Geographical Data: Maps  76
3.6 Summary of Major Visualizations and Operations, According to Data Mining Goal  77
Prediction  77
Classification  78
Time Series Forecasting  78
Unsupervised Learning  79
Problems  79
|
|
4 Dimension Reduction  81
4.1 Introduction  81
4.2 Curse of Dimensionality  82
4.3 Practical Considerations  82
Example 1: House Prices in Boston  82
4.4 Data Summaries  83
Summary Statistics  83
Tabulating Data (Pivot Tables)  85
4.5 Correlation Analysis  87
4.6 Reducing the Number of Categories in Categorical Variables  87
4.7 Converting a Categorical Variable to a Continuous Variable  90
4.8 Principal Components Analysis  90
Example 2: Breakfast Cereals  91
Principal Components  95
Normalizing the Data  97
Using Principal Components for Classification and Prediction  100
4.9 Dimension Reduction Using Regression Models  100
4.10 Dimension Reduction Using Classification and Regression Trees  100
Problems  101

Part III Performance Evaluation

5 Evaluating Predictive Performance  105
5.1 Introduction  105
5.2 Evaluating Predictive Performance  106
Naive Benchmark: The Average  106
Prediction Accuracy Measures  107
Comparing Training and Validation Performance  108
5.3 Judging Classifier Performance  109
Benchmark: The Naive Rule  109
Class Separation  109
The Classification Matrix  109
Using the Validation Data  111
Accuracy Measures  111
Propensities and Cutoff for Classification  112
Changing the Cutoff Values for a Confusion Matrix in JMP  114
Performance in Unequal Importance of Classes  115
False-Positive and False-Negative Rates  116
Asymmetric Misclassification Costs  116
Asymmetric Misclassification Costs in JMP  119
Generalization to More Than Two Classes  120
5.4 Judging Ranking Performance  120
Lift Charts  120
Decile Lift Charts  122
Lift Curves Incorporating Costs and Benefits  122
5.5 Oversampling  123
Oversampling the Training Set  126
Stratified Sampling and Oversampling in JMP  126
Evaluating Model Performance Using a Nonoversampled Validation Set  126
Evaluating Model Performance If Only Oversampled Validation Set Exists  127
Applying Sampling Weights in JMP  128
Problems  129

Part IV Prediction and Classification Methods

6 Multiple Linear Regression  133
6.1 Introduction  133
6.2 Explanatory versus Predictive Modeling  134
6.3 Estimating the Regression Equation and Prediction  135
Example: Predicting the Price of Used Toyota Corolla Automobiles  136
Coding of Categorical Variables in Regression  138
Additional Options for Regression Models in JMP  140
6.4 Variable Selection in Linear Regression  141
Reducing the Number of Predictors  141
How to Reduce the Number of Predictors  142
Manual Variable Selection  142
Automated Variable Selection  142
Coding of Categorical Variables in Stepwise Regression  143
Working with the All Possible Models Output  145
When Using a Stopping Algorithm in JMP  147
Other Regression Procedures in JMP Pro: Generalized Regression  149
Problems  150
|
7 k-Nearest Neighbors (k-NN)  155
7.1 The k-NN Classifier (Categorical Outcome)  155
Determining Neighbors  155
Classification Rule  156
Example: Riding Mowers  156
Choosing k  157
k-Nearest Neighbors in JMP Pro  158
The Cutoff Value for Classification  159
k-NN Predictions and Prediction Formulas in JMP Pro  161
k-NN with More Than Two Classes  161
7.2 k-NN for a Numerical Response  161
7.3 Advantages and Shortcomings of k-NN Algorithms  163
Problems  164
|
8 The Naive Bayes Classifier  167
8.1 Introduction  167
Cutoff Probability Method  168
Conditional Probability  168
Example 1: Predicting Fraudulent Financial Reporting  168
8.2 Applying the Full (Exact) Bayesian Classifier  169
Using the "Assign to the Most Probable Class" Method  169
Using the Cutoff Probability Method  169
Practical Difficulty with the Complete (Exact) Bayes Procedure  170
Solution: Naive Bayes  170
Example 2: Predicting Fraudulent Financial Reports, Two Predictors  172
Using the JMP Naive Bayes Add-in  174
Example 3: Predicting Delayed Flights  174
8.3 Advantages and Shortcomings of the Naive Bayes Classifier  179
Problems  180
|
9 Classification and Regression Trees  183
9.1 Introduction  183
9.2 Classification Trees  184
Recursive Partitioning  184
Example 1: Riding Mowers  185
Tree Structure  186
Classifying a New Observation  188
Fitting Classification Trees in JMP Pro  191
9.4 Evaluating the Performance of a Classification Tree  192
Example 2: Acceptance of Personal Loan  192
9.5 Avoiding Overfitting  193
Stopping Tree Growth: CHAID  194
Growing a Full Tree and Pruning It Back  194
9.6 Classification Rules from Trees  196
9.7 Classification Trees for More Than Two Classes  198
9.8 Regression Trees  199
Prediction  199
Evaluating Performance  200
9.9 Advantages and Weaknesses of a Tree  200
9.10 Improving Prediction: Multiple Trees  204
Fitting Ensemble Tree Models in JMP Pro  206
9.11 CART and Measures of Impurity  207
Problems  207
|
|
10 Logistic Regression  211
10.1 Introduction  211
Logistic Regression and Consumer Choice Theory  212
10.2 The Logistic Regression Model  213
Example: Acceptance of Personal Loan (Universal Bank)  214
Indicator (Dummy) Variables in JMP  216
Model with a Single Predictor  216
Fitting One-Predictor Logistic Models in JMP  218
Estimating the Logistic Model from Data: Multiple Predictors  218
Fitting Logistic Models in JMP with More Than One Predictor  221
10.3 Evaluating Classification Performance  221
Variable Selection  222
10.4 Example of Complete Analysis: Predicting Delayed Flights  223
Data Preprocessing  225
Model Fitting, Estimation and Interpretation: A Simple Model  226
Model Fitting, Estimation and Interpretation: The Full Model  227
Model Performance  229
Variable Selection  230
Regrouping and Recoding Variables in JMP  232
10.5 Appendixes: Logistic Regression for Profiling  234
Appendix A: Why Linear Regression Is Problematic for a Categorical Response  234
Appendix B: Evaluating Explanatory Power  236
Appendix C: Logistic Regression for More Than Two Classes  238
Problems  241
|
|
11 Neural Networks  245
11.1 Introduction  245
11.2 Concept and Structure of a Neural Network  246
11.3 Fitting a Network to Data  246
Example 1: Tiny Dataset  246
Computing Output of Nodes  248
Preprocessing the Data  251
Activation Functions and Data Processing Features in JMP Pro  251
Training the Model  251
Fitting a Neural Network in JMP Pro  254
Using the Output for Prediction and Classification  256
Example 2: Classifying Accident Severity  258
Avoiding Overfitting  259
11.4 User Input in JMP Pro  260
Unsupervised Feature Extraction and Deep Learning  263
11.5 Exploring the Relationship between Predictors and Response  264
Understanding Neural Models in JMP Pro  264
11.6 Advantages and Weaknesses of Neural Networks  264
Problems  265
|
|
12 Discriminant Analysis  268
12.1 Introduction  268
Example 1: Riding Mowers  269
Example 2: Personal Loan Acceptance (Universal Bank)  269
12.2 Distance of an Observation from a Class  270
12.3 From Distances to Propensities and Classifications  272
Linear Discriminant Analysis in JMP  275
12.4 Classification Performance of Discriminant Analysis  275
12.5 Prior Probabilities  277
12.6 Classifying More Than Two Classes  278
Example 3: Medical Dispatch to Accident Scenes  278
Using Categorical Predictors in Discriminant Analysis in JMP  279
12.7 Advantages and Weaknesses  280
Problems  282
|
13 Combining Methods: Ensembles and Uplift Modeling  285
13.1 Ensembles  285
Why Ensembles Can Improve Predictive Power  286
Simple Averaging  287
Bagging  287
Boosting  288
Creating Ensemble Models in JMP Pro  289
Advantages and Weaknesses of Ensembles  289
13.2 Uplift (Persuasion) Modeling  290
A-B Testing  290
Uplift  290
Gathering the Data  291
A Simple Model  292
Modeling Individual Uplift  293
Using the Results of an Uplift Model  294
Creating Uplift Models in JMP Pro  294
Using the Uplift Platform in JMP Pro  295
13.3 Summary  295
Problems  297

Part V Mining Relationships Among Records

14 Cluster Analysis  301
14.1 Introduction  301
Example: Public Utilities  302
14.2 Measuring Distance between Two Observations  305
Euclidean Distance  305
Normalizing Numerical Measurements  305
Other Distance Measures for Numerical Data  306
Distance Measures for Categorical Data  308
Distance Measures for Mixed Data  308
14.3 Measuring Distance between Two Clusters  309
Minimum Distance  309
Maximum Distance  309
Average Distance  309
Centroid Distance  309
14.4 Hierarchical (Agglomerative) Clustering  311
Hierarchical Clustering in JMP and JMP Pro  311
Hierarchical Agglomerative Clustering Algorithm  312
Single Linkage  312
Complete Linkage  313
Average Linkage  313
Centroid Linkage  313
Ward's Method  314
Dendrograms: Displaying Clustering Process and Results  314
Validating Clusters  316
Limitations of Hierarchical Clustering  319
14.5 Nonhierarchical Clustering: The k-Means Algorithm  320
k-Means Clustering Algorithm  321
Initial Partition into k Clusters  322
K-Means Clustering in JMP  322
Problems  329

Part VI Forecasting Time Series

15 Handling Time Series  335
15.1 Introduction  335
15.2 Descriptive versus Predictive Modeling  336
15.3 Popular Forecasting Methods in Business  337
Combining Methods  337
15.4 Time Series Components  337
Example: Ridership on Amtrak Trains  337
15.5 Data Partitioning and Performance Evaluation  341
Benchmark Performance: Naive Forecasts  342
Generating Future Forecasts  342
Partitioning Time Series Data in JMP and Validating Time Series Models  342
Problems  343
|
16 Regression-Based Forecasting  346
16.1 A Model with Trend  346
Linear Trend  346
Fitting a Model with Linear Trend in JMP  348
Creating Actual versus Predicted Plots and Residual Plots in JMP  350
Exponential Trend  350
Computing Forecast Errors for Exponential Trend Models  352
Polynomial Trend  352
Fitting a Polynomial Trend in JMP  353
16.2 A Model with Seasonality  353
16.3 A Model with Trend and Seasonality  356
16.4 Autocorrelation and ARIMA Models  356
Computing Autocorrelation  356
Improving Forecasts by Integrating Autocorrelation Information  360
Fitting AR (Autoregression) Models in the JMP Time Series Platform  361
Fitting AR Models to Residuals  361
Evaluating Predictability  363
Summary: Fitting Regression-Based Time Series Models in JMP  365
Problems  366
|
|
17 Smoothing Methods  377
17.1 Introduction  377
17.2 Moving Average  378
Centered Moving Average for Visualization  378
Trailing Moving Average for Forecasting  379
Computing a Trailing Moving Average Forecast in JMP  380
Choosing Window Width (w)  382
17.3 Simple Exponential Smoothing  382
Choosing Smoothing Parameter α  383
Fitting Simple Exponential Smoothing Models in JMP  384
Creating Plots for Actual versus Forecasted Series and Residuals Series Using the Graph Builder  386
Relation between Moving Average and Simple Exponential Smoothing  386
17.4 Advanced Exponential Smoothing  387
Series with a Trend  387
Series with a Trend and Seasonality  388
Problems  390

Part VII Cases

18 Cases  401
18.1 Charles Book Club  401
The Book Industry  401
Database Marketing at Charles  402
Data Mining Techniques  403
Assignment  405
18.2 German Credit  409
Background  409
Data  409
Assignment  409
18.3 Tayko Software Cataloger  410
Background  410
The Mailing Experiment  413
Data  413
Assignment  413
18.4 Political Persuasion  415
Background  415
Predictive Analytics Arrives in US Politics  415
Political Targeting  416
Uplift  416
Data  417
Assignment  417
18.5 Taxi Cancellations  419
Business Situation  419
Assignment  419
18.6 Segmenting Consumers of Bath Soap  420
Business Situation  420
Key Problems  421
Data  421
Measuring Brand Loyalty  421
Assignment  421
18.7 Direct-Mail Fundraising  423
Background  423
Data  424
Assignment  425
18.8 Predicting Bankruptcy  425
Predicting Corporate Bankruptcy  426
Assignment  428
18.9 Time Series Case: Forecasting Public Transportation Demand  428
Background  428
Problem Description  428
Available Data  428
Assignment Goal  429
Assignment  429
Tips and Suggested Steps  429

References  431
Data Files Used in the Book  433
Index  435