Preface to Third Edition |
|
xxiii | |
Preface of Second Edition |
|
xxvii | |
Acknowledgments |
|
xxxi | |
Author |
|
xxxiii | |
|
|
1 | (12) |
|
1.1 The Personal Computer and Statistics |
|
|
1 | (2) |
|
1.2 Statistics and Data Analysis |
|
|
3 | (1) |
|
|
4 | (1) |
|
|
5 | (1) |
|
|
6 | (1) |
|
|
7 | (1) |
|
1.6.1 Data Size Characteristics |
|
|
7 | (1) |
|
1.6.2 Data Size: Personal Observation of One |
|
|
8 | (1) |
|
|
8 | (1) |
|
1.8 Statistics and Machine Learning |
|
|
9 | (1) |
|
1.9 Statistical Data Mining |
|
|
10 | (3) |
|
|
11 | (2) |
|
2 Science Dealing with Data: Statistics and Data Science |
|
|
13 | (12) |
|
|
13 | (1) |
|
|
13 | (2) |
|
2.3 The Statistics and Data Science Comparison |
|
|
15 | (6) |
|
2.3.1 Statistics versus Data Science |
|
|
15 | (6) |
|
2.4 Discussion: Are Statistics and Data Science Different? |
|
|
21 | (2) |
|
2.4.1 Analysis: Are Statistics and Data Science Different? |
|
|
22 | (1) |
|
|
23 | (1) |
|
|
23 | (2) |
|
|
23 | (2) |
|
3 Two Basic Data Mining Methods for Variable Assessment |
|
|
25 | (12) |
|
|
25 | (1) |
|
3.2 Correlation Coefficient |
|
|
25 | (2) |
|
|
27 | (1) |
|
|
28 | (2) |
|
|
28 | (1) |
|
|
29 | (1) |
|
|
30 | (3) |
|
3.6 General Association Test |
|
|
33 | (1) |
|
|
34 | (3) |
|
|
35 | (2) |
|
4 CHAID-Based Data Mining for Paired-Variable Assessment |
|
|
37 | (10) |
|
|
37 | (1) |
|
|
37 | (1) |
|
4.2.1 An Exemplar Scatterplot |
|
|
38 | (1) |
|
4.3 The Smooth Scatterplot |
|
|
38 | (1) |
|
|
39 | (1) |
|
4.5 CHAID-Based Data Mining for a Smoother Scatterplot |
|
|
40 | (5) |
|
4.5.1 The Smoother Scatterplot |
|
|
42 | (3) |
|
|
45 | (2) |
|
|
45 | (2) |
|
5 The Importance of Straight Data: Simplicity and Desirability for Good Model-Building Practice |
|
|
47 | (8) |
|
|
47 | (1) |
|
5.2 Straightness and Symmetry in Data |
|
|
47 | (1) |
|
5.3 Data Mining Is a High Concept |
|
|
48 | (1) |
|
5.4 The Correlation Coefficient |
|
|
48 | (2) |
|
5.5 Scatterplot of (xx3, yy3) |
|
|
50 | (1) |
|
5.6 Data Mining the Relationship of (xx3, yy3) |
|
|
50 | (3) |
|
5.6.1 Side-by-Side Scatterplot |
|
|
53 | (1) |
|
5.7 What Is the GP-Based Data Mining Doing to the Data? |
|
|
53 | (1) |
|
5.8 Straightening a Handful of Variables and a Baker's Dozen of Variables |
|
|
53 | (1) |
|
|
54 | (1) |
|
|
54 | (1) |
|
6 Symmetrizing Ranked Data: A Statistical Data Mining Method for Improving the Predictive Power of Data |
|
|
55 | (14) |
|
|
55 | (1) |
|
6.2 Scales of Measurement |
|
|
55 | (2) |
|
6.3 Stem-and-Leaf Display |
|
|
57 | (1) |
|
6.4 Box-and-Whiskers Plot |
|
|
58 | (1) |
|
6.5 Illustration of the Symmetrizing Ranked Data Method |
|
|
58 | (10) |
|
|
59 | (1) |
|
6.5.1.1 Discussion of Illustration 1 |
|
|
59 | (2) |
|
|
61 | (1) |
|
|
62 | (1) |
|
6.5.2.2 Looking at the Recoded Titanic Ordinal Variables CLASS_, AGE_, GENDER_, CLASS_AGE_, and CLASS_GENDER_ |
|
|
62 | (2) |
|
6.5.2.3 Looking at the Symmetrized-Ranked Titanic Ordinal Variables rCLASS_, rAGE_, rGENDER_, rCLASS_AGE_, and rCLASS_GENDER_ |
|
|
64 | (1) |
|
6.5.2.4 Building a Preliminary Titanic Model |
|
|
65 | (3) |
|
|
68 | (1) |
|
|
68 | (1) |
|
7 Principal Component Analysis: A Statistical Data Mining Method for Many-Variable Assessment |
|
|
69 | (12) |
|
|
69 | (1) |
|
7.2 EDA Reexpression Paradigm |
|
|
69 | (1) |
|
7.3 What Is the Big Deal? |
|
|
70 | (1) |
|
|
70 | (1) |
|
7.5 Exemplary Detailed Illustration |
|
|
71 | (1) |
|
|
71 | (1) |
|
7.6 Algebraic Properties of PCA |
|
|
72 | (1) |
|
7.7 Uncommon Illustration |
|
|
73 | (3) |
|
7.7.1 PCA of R_CD Elements (X1, X2, X3, X4, X5, X6) |
|
|
74 | (1) |
|
7.7.2 Discussion of the PCA of R_CD Elements |
|
|
74 | (2) |
|
7.8 PCA in the Construction of Quasi-Interaction Variables |
|
|
76 | (4) |
|
7.8.1 SAS Program for the PCA of the Quasi-Interaction Variable |
|
|
78 | (2) |
|
|
80 | (1) |
|
8 Market Share Estimation: Data Mining for an Exceptional Case |
|
|
81 | (16) |
|
|
81 | (1) |
|
|
81 | (1) |
|
8.3 Data Mining for an Exceptional Case |
|
|
82 | (1) |
|
8.3.1 Exceptional Case: Infant Formula YUM |
|
|
82 | (1) |
|
8.4 Building the RAL-YUM Market Share Model |
|
|
83 | (10) |
|
8.4.1 Decile Analysis of YUM_3mos MARKET-SHARE Model |
|
|
92 | (1) |
|
8.4.2 Conclusion of YUM_3mos MARKET-SHARE Model |
|
|
92 | (1) |
|
|
93 | (4) |
|
Appendix 8.A Dummify PROMO_Code |
|
|
93 | (1) |
|
Appendix 8.B PCA of PROMO_Code Dummy Variables |
|
|
94 | (1) |
|
Appendix 8.C Logistic Regression YUM_3mos on PROMO_Code Dummy Variables |
|
|
94 | (1) |
|
Appendix 8.D Creating YUM_3mos_wo_PROMO_CodeEff |
|
|
94 | (1) |
|
Appendix 8.E Normalizing a Variable to Lie Within [0, 1] |
|
|
95 | (1) |
|
|
96 | (1) |
|
9 The Correlation Coefficient: Its Values Range between Plus and Minus 1, or Do They? |
|
|
97 | (8) |
|
|
97 | (1) |
|
9.2 Basics of the Correlation Coefficient |
|
|
97 | (2) |
|
9.3 Calculation of the Correlation Coefficient |
|
|
99 | (1) |
|
|
99 | (2) |
|
9.5 Calculation of the Adjusted Correlation Coefficient |
|
|
101 | (1) |
|
9.6 Implication of Rematching |
|
|
102 | (1) |
|
|
102 | (3) |
|
10 Logistic Regression: The Workhorse of Response Modeling |
|
|
105 | (46) |
|
|
105 | (1) |
|
10.2 Logistic Regression Model |
|
|
106 | (3) |
|
|
106 | (1) |
|
|
107 | (2) |
|
|
109 | (1) |
|
10.3.1 Candidate Predictor and Dependent Variables |
|
|
110 | (1) |
|
10.4 Logits and Logit Plots |
|
|
110 | (2) |
|
10.4.1 Logits for Case Study |
|
|
111 | (1) |
|
10.5 The Importance of Straight Data |
|
|
112 | (1) |
|
10.6 Reexpressing for Straight Data |
|
|
112 | (3) |
|
|
113 | (1) |
|
|
114 | (1) |
|
10.6.3 Measuring Straight Data |
|
|
114 | (1) |
|
10.7 Straight Data for Case Study |
|
|
115 | (3) |
|
10.7.1 Reexpressing FD2_OPEN |
|
|
116 | (1) |
|
10.7.2 Reexpressing INVESTMENT |
|
|
116 | (2) |
|
10.8 Techniques when the Bulging Rule Does Not Apply |
|
|
118 | (1) |
|
|
118 | (1) |
|
10.8.2 Smooth Predicted-versus-Actual Plot |
|
|
119 | (1) |
|
10.9 Reexpressing MOS_OPEN |
|
|
119 | (4) |
|
10.9.1 Plot of Smooth Predicted versus Actual for MOS_OPEN |
|
|
120 | (3) |
|
10.10 Assessing the Importance of Variables |
|
|
123 | (2) |
|
10.10.1 Computing the G Statistic |
|
|
123 | (1) |
|
10.10.2 Importance of a Single Variable |
|
|
124 | (1) |
|
10.10.3 Importance of a Subset of Variables |
|
|
124 | (1) |
|
10.10.4 Comparing the Importance of Different Subsets of Variables |
|
|
124 | (1) |
|
10.11 Important Variables for Case Study |
|
|
125 | (2) |
|
10.11.1 Importance of the Predictor Variables |
|
|
126 | (1) |
|
10.12 Relative Importance of the Variables |
|
|
127 | (1) |
|
10.12.1 Selecting the Best Subset |
|
|
127 | (1) |
|
10.13 Best Subset of Variables for Case Study |
|
|
128 | (1) |
|
10.14 Visual Indicators of Goodness of Model Predictions |
|
|
129 | (7) |
|
10.14.1 Plot of Smooth Residual by Score Groups |
|
|
130 | (1) |
|
10.14.1.1 Plot of the Smooth Residual by Score Groups for Case Study |
|
|
130 | (2) |
|
10.14.2 Plot of Smooth Actual versus Predicted by Decile Groups |
|
|
132 | (1) |
|
10.14.2.1 Plot of Smooth Actual versus Predicted by Decile Groups for Case Study |
|
|
132 | (2) |
|
10.14.3 Plot of Smooth Actual versus Predicted by Score Groups |
|
|
134 | (1) |
|
10.14.3.1 Plot of Smooth Actual versus Predicted by Score Groups for Case Study |
|
|
134 | (2) |
|
10.15 Evaluating the Data Mining Work |
|
|
136 | (5) |
|
10.15.1 Comparison of Plots of Smooth Residual by Score Groups: EDA versus Non-EDA Models |
|
|
137 | (2) |
|
10.15.2 Comparison of the Plots of Smooth Actual versus Predicted by Decile Groups: EDA versus Non-EDA Models |
|
|
139 | (1) |
|
10.15.3 Comparison of Plots of Smooth Actual versus Predicted by Score Groups: EDA versus Non-EDA Models |
|
|
140 | (1) |
|
10.15.4 Summary of the Data Mining Work |
|
|
141 | (1) |
|
10.16 Smoothing a Categorical Variable |
|
|
141 | (4) |
|
10.16.1 Smoothing FD_TYPE with CHAID |
|
|
142 | (2) |
|
10.16.2 Importance of CH_FTY_1 and CH_FTY_2 |
|
|
144 | (1) |
|
10.17 Additional Data Mining Work for Case Study |
|
|
145 | (5) |
|
10.17.1 Comparison of Plots of Smooth Residual by Score Group: 4var-EDA versus 3var-EDA Models |
|
|
146 | (1) |
|
10.17.2 Comparison of the Plots of Smooth Actual versus Predicted by Decile Groups: 4var-EDA versus 3var-EDA Models |
|
|
147 | (1) |
|
10.17.3 Comparison of Plots of Smooth Actual versus Predicted by Score Groups: 4var-EDA versus 3var-EDA Models |
|
|
147 | (2) |
|
10.17.4 Final Summary of the Additional Data Mining Work |
|
|
149 | (1) |
|
|
150 | (1) |
|
11 Predicting Share of Wallet without Survey Data |
|
|
151 | (18) |
|
|
151 | (1) |
|
|
151 | (2) |
|
|
152 | (1) |
|
11.2.1.1 SOW_q Definition |
|
|
152 | (1) |
|
11.2.1.2 SOW_q Likeliness Assumption |
|
|
152 | (1) |
|
11.3 Illustration of Calculation of SOW_q |
|
|
153 | (5) |
|
|
153 | (1) |
|
11.3.2 DOLLARS and TOTAL DOLLARS |
|
|
153 | (5) |
|
11.4 Building the AMPECS SOW_q Model |
|
|
158 | (1) |
|
11.5 SOW_q Model Definition |
|
|
159 | (2) |
|
11.5.1 SOW_q Model Results |
|
|
160 | (1) |
|
|
161 | (8) |
|
|
162 | (2) |
|
Appendix 11.B Seven Steps |
|
|
164 | (3) |
|
|
167 | (2) |
|
12 Ordinary Regression: The Workhorse of Profit Modeling |
|
|
169 | (20) |
|
|
169 | (1) |
|
12.2 Ordinary Regression Model |
|
|
169 | (3) |
|
|
170 | (1) |
|
12.2.2 Scoring an OLS Profit Model |
|
|
171 | (1) |
|
|
172 | (8) |
|
12.3.1 Straight Data for Mini Case Study |
|
|
172 | (2) |
|
12.3.1.1 Reexpressing INCOME |
|
|
174 | (1) |
|
12.3.1.2 Reexpressing AGE |
|
|
175 | (2) |
|
12.3.2 Plot of Smooth Predicted versus Actual |
|
|
177 | (1) |
|
12.3.3 Assessing the Importance of Variables |
|
|
178 | (1) |
|
12.3.3.1 Defining the F Statistic and R-Squared |
|
|
179 | (1) |
|
12.3.3.2 Importance of a Single Variable |
|
|
179 | (1) |
|
12.3.3.3 Importance of a Subset of Variables |
|
|
179 | (1) |
|
12.3.3.4 Comparing the Importance of Different Subsets of Variables |
|
|
180 | (1) |
|
12.4 Important Variables for Mini Case Study |
|
|
180 | (2) |
|
12.4.1 Relative Importance of the Variables |
|
|
181 | (1) |
|
12.4.2 Selecting the Best Subset |
|
|
181 | (1) |
|
12.5 Best Subset of Variables for Case Study |
|
|
182 | (3) |
|
12.5.1 PROFIT Model with gINCOME and AGE |
|
|
183 | (2) |
|
|
185 | (1) |
|
12.6 Suppressor Variable AGE |
|
|
185 | (1) |
|
|
186 | (3) |
|
|
187 | (2) |
|
13 Variable Selection Methods in Regression: Ignorable Problem, Notable Solution |
|
|
189 | (14) |
|
|
189 | (1) |
|
|
189 | (3) |
|
13.3 Frequently Used Variable Selection Methods |
|
|
192 | (1) |
|
13.4 Weakness in the Stepwise |
|
|
193 | (1) |
|
13.5 Enhanced Variable Selection Method |
|
|
194 | (2) |
|
13.6 Exploratory Data Analysis |
|
|
196 | (4) |
|
|
200 | (3) |
|
|
200 | (3) |
|
14 CHAID for Interpreting a Logistic Regression Model |
|
|
203 | (16) |
|
|
203 | (1) |
|
14.2 Logistic Regression Model |
|
|
203 | (1) |
|
14.3 Database Marketing Response Model Case Study |
|
|
204 | (1) |
|
|
205 | (1) |
|
|
205 | (3) |
|
14.4.1 Proposed CHAID-Based Method |
|
|
206 | (2) |
|
14.5 Multivariable CHAID Trees |
|
|
208 | (2) |
|
14.6 CHAID Market Segmentation |
|
|
210 | (3) |
|
|
213 | (3) |
|
|
216 | (3) |
|
15 The Importance of the Regression Coefficient |
|
|
219 | (10) |
|
|
219 | (1) |
|
15.2 The Ordinary Regression Model |
|
|
219 | (1) |
|
|
220 | (1) |
|
15.4 Important Predictor Variables |
|
|
220 | (1) |
|
15.5 P-Values and Big Data |
|
|
221 | (1) |
|
15.6 Returning to Question 1 |
|
|
222 | (1) |
|
15.7 Effect of Predictor Variable on Prediction |
|
|
222 | (1) |
|
|
223 | (2) |
|
15.9 Returning to Question 2 |
|
|
225 | (1) |
|
15.10 Ranking Predictor Variables by Effect on Prediction |
|
|
225 | (1) |
|
15.11 Returning to Question 3 |
|
|
226 | (1) |
|
15.12 Returning to Question 4 |
|
|
227 | (1) |
|
|
227 | (2) |
|
|
228 | (1) |
|
16 The Average Correlation: A Statistical Data Mining Measure for Assessment of Competing Predictive Models and the Importance of the Predictor Variables |
|
|
229 | (10) |
|
|
229 | (1) |
|
|
229 | (2) |
|
16.3 Illustration of the Difference between Reliability and Validity |
|
|
231 | (1) |
|
16.4 Illustration of the Relationship between Reliability and Validity |
|
|
231 | (1) |
|
16.5 The Average Correlation |
|
|
232 | (5) |
|
16.5.1 Illustration of the Average Correlation with an LTV5 Model |
|
|
232 | (4) |
|
16.5.2 Continuing with the Illustration of the Average Correlation with an LTV5 Model |
|
|
236 | (1) |
|
16.5.3 Continuing with the Illustration with a Competing LTV5 Model |
|
|
236 | (1) |
|
16.5.3.1 The Importance of the Predictor Variables |
|
|
237 | (1) |
|
|
237 | (2) |
|
|
237 | (2) |
|
17 CHAID for Specifying a Model with Interaction Variables |
|
|
239 | (12) |
|
|
239 | (1) |
|
17.2 Interaction Variables |
|
|
239 | (1) |
|
17.3 Strategy for Modeling with Interaction Variables |
|
|
240 | (1) |
|
17.4 Strategy Based on the Notion of a Special Point |
|
|
240 | (1) |
|
17.5 Example of a Response Model with an Interaction Variable |
|
|
241 | (1) |
|
17.6 CHAID for Uncovering Relationships |
|
|
242 | (1) |
|
17.7 Illustration of CHAID for Specifying a Model |
|
|
243 | (3) |
|
|
246 | (1) |
|
17.9 Database Implication |
|
|
247 | (1) |
|
|
248 | (3) |
|
|
249 | (2) |
|
18 Market Segmentation Classification Modeling with Logistic Regression |
|
|
251 | (14) |
|
|
251 | (1) |
|
18.2 Binary Logistic Regression |
|
|
251 | (1) |
|
18.2.1 Necessary Notation |
|
|
252 | (1) |
|
18.3 Polychotomous Logistic Regression Model |
|
|
252 | (1) |
|
18.4 Model Building with PLR |
|
|
253 | (1) |
|
18.5 Market Segmentation Classification Model |
|
|
254 | (9) |
|
18.5.1 Survey of Cellular Phone Users |
|
|
254 | (1) |
|
|
255 | (3) |
|
|
258 | (3) |
|
18.5.4 Market Segmentation Classification Model |
|
|
261 | (2) |
|
|
263 | (2) |
|
19 Market Segmentation Based on Time-Series Data Using Latent Class Analysis |
|
|
265 | (22) |
|
|
265 | (1) |
|
|
265 | (5) |
|
19.2.1 K-Means Clustering |
|
|
265 | (1) |
|
|
266 | (1) |
|
|
266 | (1) |
|
|
267 | (1) |
|
19.2.3.2 FA Model Estimation |
|
|
267 | (1) |
|
19.2.3.3 FA versus OLS Graphical Depiction |
|
|
268 | (1) |
|
19.2.4 LCA versus FA Graphical Depiction |
|
|
268 | (2) |
|
|
270 | (2) |
|
19.3.1 LCA of Universal and Particular Study |
|
|
270 | (1) |
|
19.3.1.1 Discussion of LCA Output |
|
|
270 | (1) |
|
19.3.1.2 Discussion of Posterior Probability |
|
|
271 | (1) |
|
19.4 LCA versus k-Means Clustering |
|
|
272 | (2) |
|
19.5 LCA Market Segmentation Model Based on Time-Series Data |
|
|
274 | (8) |
|
|
274 | (2) |
|
|
276 | (2) |
|
19.5.2.1 Cluster Sizes and Conditional Probabilities/Means |
|
|
278 | (3) |
|
19.5.2.2 Indicator-Level Posterior Probabilities |
|
|
281 | (1) |
|
|
282 | (5) |
|
Appendix 19.A Creating Trend3 for UNITS |
|
|
282 | (2) |
|
Appendix 19.B POS-ZER-NEG Creating Trend4 |
|
|
284 | (1) |
|
|
285 | (2) |
|
20 Market Segmentation: An Easy Way to Understand the Segments |
|
|
287 | (6) |
|
|
287 | (1) |
|
|
287 | (1) |
|
|
288 | (1) |
|
20.4 Understanding the Segments |
|
|
289 | (1) |
|
|
290 | (3) |
|
Appendix 20.A Dataset SAMPLE |
|
|
290 | (1) |
|
Appendix 20.B Segmentor-Means |
|
|
291 | (1) |
|
Appendix 20.C Indexed Profiles |
|
|
291 | (1) |
|
|
292 | (1) |
|
21 The Statistical Regression Model: An Easy Way to Understand the Model |
|
|
293 | (14) |
|
|
293 | (1) |
|
|
293 | (1) |
|
21.3 EZ-Method Applied to the LR Model |
|
|
294 | (2) |
|
21.4 Discussion of the LR EZ-Method Illustration |
|
|
296 | (3) |
|
|
299 | (8) |
|
Appendix 21.A M65-Spread Base Means X10--X14 |
|
|
299 | (2) |
|
Appendix 21.B Create Ten Datasets for Each Decile |
|
|
301 | (1) |
|
Appendix 21.C Indexed Profiles of Deciles |
|
|
302 | (5) |
|
22 CHAID as a Method for Filling in Missing Values |
|
|
307 | (16) |
|
|
307 | (1) |
|
22.2 Introduction to the Problem of Missing Data |
|
|
307 | (2) |
|
22.3 Missing Data Assumption |
|
|
309 | (1) |
|
|
310 | (1) |
|
|
311 | (5) |
|
22.5.1 CHAID Mean-Value Imputation for a Continuous Variable |
|
|
312 | (1) |
|
22.5.2 Many Mean-Value CHAID Imputations for a Continuous Variable |
|
|
313 | (1) |
|
22.5.3 Regression Tree Imputation for LIFE_DOL |
|
|
314 | (2) |
|
22.6 CHAID Most Likely Category Imputation for a Categorical Variable |
|
|
316 | (4) |
|
22.6.1 CHAID Most Likely Category Imputation for GENDER |
|
|
316 | (2) |
|
22.6.2 Classification Tree Imputation for GENDER |
|
|
318 | (2) |
|
|
320 | (3) |
|
|
321 | (2) |
|
23 Model Building with Big Complete and Incomplete Data |
|
|
323 | (12) |
|
|
323 | (1) |
|
|
323 | (1) |
|
23.3 The CCA-PCA Method: Illustration Details |
|
|
324 | (2) |
|
23.3.1 Determining the Complete and Incomplete Datasets |
|
|
324 | (2) |
|
23.4 Building the RESPONSE Model with Complete (CCA) Dataset |
|
|
326 | (2) |
|
23.4.1 CCA RESPONSE Model Results |
|
|
327 | (1) |
|
23.5 Building the RESPONSE Model with Incomplete (ICA) Dataset |
|
|
328 | (1) |
|
|
329 | (1) |
|
23.6 Building the RESPONSE Model on PCA-BICA Data |
|
|
329 | (3) |
|
23.6.1 PCA-BICA RESPONSE Model Results |
|
|
330 | (1) |
|
23.6.2 Combined CCA and PCA-BICA RESPONSE Model Results |
|
|
331 | (1) |
|
|
332 | (3) |
|
|
333 | (1) |
|
Appendix 23.B Testing CCA Samsizes |
|
|
333 | (1) |
|
Appendix 23.C CCA-CIA Datasets |
|
|
333 | (1) |
|
Appendix 23.D Ones and Zeros |
|
|
333 | (1) |
|
|
334 | (1) |
|
24 Art, Science, Numbers, and Poetry |
|
|
335 | (6) |
|
|
335 | (1) |
|
|
336 | (1) |
|
|
336 | (2) |
|
24.4 The Statistical Golden Rule: Measuring the Art and Science of Statistical Practice |
|
|
338 | (2) |
|
|
338 | (1) |
|
24.4.1.1 The Statistical Golden Rule |
|
|
339 | (1) |
|
|
340 | (1) |
|
|
340 | (1) |
|
25 Identifying Your Best Customers: Descriptive, Predictive, and Look-Alike Profiling |
|
|
341 | (14) |
|
|
341 | (1) |
|
|
341 | (1) |
|
25.3 Illustration of a Flawed Targeting Effort |
|
|
342 | (1) |
|
25.4 Well-Defined Targeting Effort |
|
|
343 | (2) |
|
|
345 | (3) |
|
|
348 | (2) |
|
25.7 Look-Alike Profiling |
|
|
350 | (3) |
|
25.8 Look-Alike Tree Characteristics |
|
|
353 | (1) |
|
|
353 | (2) |
|
26 Assessment of Marketing Models |
|
|
355 | (12) |
|
|
355 | (1) |
|
26.2 Accuracy for Response Model |
|
|
355 | (1) |
|
26.3 Accuracy for Profit Model |
|
|
356 | (2) |
|
26.4 Decile Analysis and Cum Lift for Response Model |
|
|
358 | (1) |
|
26.5 Decile Analysis and Cum Lift for Profit Model |
|
|
359 | (1) |
|
26.6 Precision for Response Model |
|
|
360 | (2) |
|
26.7 Precision for Profit Model |
|
|
362 | (1) |
|
26.7.1 Construction of SWMAD |
|
|
363 | (1) |
|
26.8 Separability for Response and Profit Models |
|
|
363 | (1) |
|
26.9 Guidelines for Using Cum Lift, HL/SWMAD, and CV |
|
|
364 | (1) |
|
|
364 | (3) |
|
27 Decile Analysis: Perspective and Performance |
|
|
367 | (20) |
|
|
367 | (1) |
|
|
367 | (4) |
|
|
369 | (1) |
|
27.2.1.1 Discussion of Classification Table of RESPONSE Model |
|
|
370 | (1) |
|
27.3 Assessing Performance: RESPONSE Model versus Chance Model |
|
|
371 | (1) |
|
27.4 Assessing Performance: The Decile Analysis |
|
|
372 | (5) |
|
27.4.1 The RESPONSE Decile Analysis |
|
|
372 | (5) |
|
|
377 | (10) |
|
Appendix 27.A Incremental Gain in Accuracy: Model versus Chance |
|
|
378 | (1) |
|
Appendix 27.B Incremental Gain in Precision: Model versus Chance |
|
|
379 | (1) |
|
Appendix 27.C RESPONSE Model Decile PROB_est Values |
|
|
380 | (2) |
|
Appendix 27.D 2×2 Tables by Decile |
|
|
382 | (3) |
|
|
385 | (2) |
|
28 Net T-C Lift Model: Assessing the Net Effects of Test and Control Campaigns |
|
|
387 | (26) |
|
|
387 | (1) |
|
|
387 | (2) |
|
28.3 Building TEST and CONTROL Response Models |
|
|
389 | (5) |
|
28.3.1 Building TEST Response Model |
|
|
390 | (2) |
|
28.3.2 Building CONTROL Response Model |
|
|
392 | (2) |
|
|
394 | (4) |
|
28.4.1 Building the Net T-C Lift Model |
|
|
395 | (1) |
|
28.4.1.1 Discussion of the Net T-C Lift Model |
|
|
395 | (2) |
|
28.4.1.2 Discussion of Equal-Group Sizes Decile of the Net T-C Lift Model |
|
|
397 | (1) |
|
|
398 | (15) |
|
Appendix 28.A TEST Logistic with Xs |
|
|
400 | (2) |
|
Appendix 28.B CONTROL Logistic with Xs |
|
|
402 | (3) |
|
Appendix 28.C Merge Score |
|
|
405 | (1) |
|
Appendix 28.D NET T-C Decile Analysis |
|
|
406 | (4) |
|
|
410 | (3) |
|
29 Bootstrapping in Marketing: A New Approach for Validating Models |
|
|
413 | (16) |
|
|
413 | (1) |
|
29.2 Traditional Model Validation |
|
|
413 | (1) |
|
|
414 | (1) |
|
|
415 | (1) |
|
29.5 The Bootstrap Method |
|
|
416 | (1) |
|
29.5.1 Traditional Construction of Confidence Intervals |
|
|
416 | (1) |
|
|
417 | (2) |
|
29.6.1 Simple Illustration |
|
|
418 | (1) |
|
29.7 Bootstrap Decile Analysis Validation |
|
|
419 | (1) |
|
|
420 | (1) |
|
29.9 Bootstrap Assessment of Model Implementation Performance |
|
|
421 | (5) |
|
|
424 | (2) |
|
29.10 Bootstrap Assessment of Model Efficiency |
|
|
426 | (2) |
|
|
428 | (1) |
|
|
428 | (1) |
|
30 Validating the Logistic Regression Model: Try Bootstrapping |
|
|
429 | (2) |
|
|
429 | (1) |
|
30.2 Logistic Regression Model |
|
|
429 | (1) |
|
30.3 The Bootstrap Validation Method |
|
|
429 | (1) |
|
|
430 | (1) |
|
|
430 | (1) |
|
31 Visualization of Marketing Models: Data Mining to Uncover Innards of a Model |
|
|
431 | (22) |
|
|
431 | (1) |
|
31.2 Brief History of the Graph |
|
|
431 | (1) |
|
|
432 | (2) |
|
|
433 | (1) |
|
31.4 Star Graphs for Single Variables |
|
|
434 | (1) |
|
31.5 Star Graphs for Many Variables Considered Jointly |
|
|
435 | (2) |
|
31.6 Profile Curves Method |
|
|
437 | (1) |
|
31.6.1 Profile Curves Basics |
|
|
437 | (1) |
|
|
438 | (1) |
|
|
438 | (6) |
|
31.7.1 Profile Curves for RESPONSE Model |
|
|
440 | (2) |
|
31.7.2 Decile Group Profile Curves |
|
|
442 | (2) |
|
|
444 | (9) |
|
Appendix 31.A Star Graphs for Each Demographic Variable about the Deciles |
|
|
445 | (2) |
|
Appendix 31.B Star Graphs for Each Decile about the Demographic Variables |
|
|
447 | (3) |
|
Appendix 31.C Profile Curves: All Deciles |
|
|
450 | (2) |
|
|
452 | (1) |
|
32 The Predictive Contribution Coefficient: A Measure of Predictive Importance |
|
|
453 | (12) |
|
|
453 | (1) |
|
|
453 | (2) |
|
32.3 Illustration of Decision Rule |
|
|
455 | (2) |
|
32.4 Predictive Contribution Coefficient |
|
|
457 | (1) |
|
32.5 Calculation of Predictive Contribution Coefficient |
|
|
458 | (1) |
|
32.6 Extra-Illustration of Predictive Contribution Coefficient |
|
|
459 | (3) |
|
|
462 | (3) |
|
|
463 | (2) |
|
33 Regression Modeling Involves Art, Science, and Poetry, Too |
|
|
465 | (6) |
|
|
465 | (1) |
|
33.2 Shakespearean Modelogue |
|
|
465 | (1) |
|
33.3 Interpretation of the Shakespearean Modelogue |
|
|
466 | (3) |
|
|
469 | (2) |
|
|
469 | (2) |
|
34 Opening the Dataset: A Twelve-Step Program for Dataholics |
|
|
471 | (6) |
|
|
471 | (1) |
|
|
471 | (1) |
|
|
471 | (2) |
|
|
473 | (1) |
|
|
474 | (3) |
|
|
474 | (1) |
|
Appendix 34.B SamsizePlus |
|
|
475 | (1) |
|
Appendix 34.C Copy-Pasteable |
|
|
475 | (1) |
|
|
475 | (1) |
|
|
476 | (1) |
|
35 Genetic and Statistic Regression Models: A Comparison |
|
|
477 | (10) |
|
|
477 | (1) |
|
|
477 | (1) |
|
|
478 | (1) |
|
35.4 The GenIQ Model, the Genetic Logistic Regression |
|
|
478 | (2) |
|
35.4.1 Illustration of "Filling Up the Upper Deciles" |
|
|
479 | (1) |
|
35.5 A Pithy Summary of the Development of Genetic Programming |
|
|
480 | (2) |
|
35.6 The GenIQ Model: A Brief Review of Its Objective and Salient Features |
|
|
482 | (1) |
|
35.6.1 The GenIQ Model Requires Selection of Variables and Function: An Extra Burden? |
|
|
482 | (1) |
|
35.7 The GenIQ Model: How It Works |
|
|
483 | (3) |
|
35.7.1 The GenIQ Model Maximizes the Decile Table |
|
|
485 | (1) |
|
|
486 | (1) |
|
|
486 | (1) |
|
36 Data Reuse: A Powerful Data Mining Effect of the GenIQ Model |
|
|
487 | (8) |
|
|
487 | (1) |
|
|
487 | (1) |
|
36.3 Illustration of Data Reuse |
|
|
488 | (3) |
|
36.3.1 The GenIQ Profit Model |
|
|
488 | (1) |
|
36.3.2 Data-Reused Variables |
|
|
489 | (1) |
|
36.3.3 Data-Reused Variables GenIQvar_1 and GenIQvar_2 |
|
|
490 | (1) |
|
36.4 Modified Data Reuse: A GenIQ-Enhanced Regression Model |
|
|
491 | (2) |
|
36.4.1 Illustration of a GenIQ-Enhanced LRM |
|
|
491 | (2) |
|
|
493 | (2) |
|
37 A Data Mining Method for Moderating Outliers Instead of Discarding Them |
|
|
495 | (6) |
|
|
495 | (1) |
|
|
495 | (1) |
|
37.3 Moderating Outliers Instead of Discarding Them |
|
|
496 | (3) |
|
37.3.1 Illustration of Moderating Outliers Instead of Discarding Them |
|
|
496 | (2) |
|
37.3.2 The GenIQ Model for Moderating the Outlier |
|
|
498 | (1) |
|
|
499 | (2) |
|
|
499 | (2) |
|
38 Overfitting: Old Problem, New Solution |
|
|
501 | (8) |
|
|
501 | (1) |
|
|
501 | (2) |
|
38.2.1 Idiomatic Definition of Overfitting to Help Remember the Concept |
|
|
502 | (1) |
|
38.3 The GenIQ Model Solution to Overfitting |
|
|
503 | (5) |
|
38.3.1 RANDOM_SPLIT GenIQ Model |
|
|
505 | (1) |
|
38.3.2 RANDOM_SPLIT GenIQ Model Decile Analysis |
|
|
505 | (2) |
|
38.3.3 Quasi N-tile Analysis |
|
|
507 | (1) |
|
|
508 | (1) |
|
39 The Importance of Straight Data: Revisited |
|
|
509 | (4) |
|
|
509 | (1) |
|
39.2 Restatement of Why It Is Important to Straighten Data |
|
|
509 | (1) |
|
39.3 Restatement of Section 12.3.1.1 "Reexpressing INCOME" |
|
|
510 | (1) |
|
39.3.1 Complete Exposition of Reexpressing INCOME |
|
|
510 | (1) |
|
39.3.1.1 The GenIQ Model Detail of the gINCOME Structure |
|
|
511 | (1) |
|
39.4 Restatement of Section 5.6 "Data Mining the Relationship of (xx3, yy3)" |
|
|
511 | (1) |
|
39.4.1 The GenIQ Model Detail of the GenIQvar(yy3) Structure |
|
|
511 | (1) |
|
|
512 | (1) |
|
40 The GenIQ Model: Its Definition and an Application |
|
|
513 | (16) |
|
|
513 | (1) |
|
40.2 What Is Optimization? |
|
|
513 | (1) |
|
40.3 What Is Genetic Modeling? |
|
|
514 | (1) |
|
40.4 Genetic Modeling: An Illustration |
|
|
515 | (4) |
|
|
517 | (1) |
|
|
518 | (1) |
|
|
518 | (1) |
|
40.5 Parameters for Controlling a Genetic Model Run |
|
|
519 | (1) |
|
40.6 Genetic Modeling: Strengths and Limitations |
|
|
519 | (1) |
|
40.7 Goals of Marketing Modeling |
|
|
520 | (1) |
|
40.8 The GenIQ Response Model |
|
|
520 | (1) |
|
40.9 The GenIQ Profit Model |
|
|
521 | (1) |
|
40.10 Case Study: Response Model |
|
|
522 | (2) |
|
40.11 Case Study: Profit Model |
|
|
524 | (3) |
|
|
527 | (2) |
|
|
527 | (2) |
|
41 Finding the Best Variables for Marketing Models |
|
|
529 | (18) |
|
|
529 | (1) |
|
|
529 | (2) |
|
41.3 Weakness in the Variable Selection Methods |
|
|
531 | (1) |
|
41.4 Goals of Modeling in Marketing |
|
|
532 | (1) |
|
41.5 Variable Selection with GenIQ |
|
|
533 | (9) |
|
|
535 | (2) |
|
41.5.2 GenIQ Structure Identification |
|
|
537 | (2) |
|
41.5.3 GenIQ Variable Selection |
|
|
539 | (3) |
|
41.6 Nonlinear Alternative to Logistic Regression Model |
|
|
542 | (3) |
|
|
545 | (2) |
|
|
546 | (1) |
|
42 Interpretation of Coefficient-Free Models |
|
|
547 | (22) |
|
|
547 | (1) |
|
42.2 The Linear Regression Coefficient |
|
|
547 | (2) |
|
42.2.1 Illustration for the Simple Ordinary Regression Model |
|
|
548 | (1) |
|
42.2.2 Illustration for the Simple Logistic Regression Model |
|
|
548 | (1) |
|
42.3 The Quasi-Regression Coefficient for Simple Regression Models |
|
|
549 | (4) |
|
42.3.1 Illustration of Quasi-RC for the Simple Ordinary Regression Model |
|
|
549 | (1) |
|
42.3.2 Illustration of Quasi-RC for the Simple Logistic Regression Model |
|
|
550 | (1) |
|
42.3.3 Illustration of Quasi-RC for Nonlinear Predictions |
|
|
551 | (2) |
|
42.4 Partial Quasi-RC for the Everymodel |
|
|
553 | (7) |
|
42.4.1 Calculating the Partial Quasi-RC for the Everymodel |
|
|
554 | (1) |
|
42.4.2 Illustration for the Multiple Logistic Regression Model |
|
|
555 | (5) |
|
42.5 Quasi-RC for a Coefficient-Free Model |
|
|
560 | (7) |
|
42.5.1 Illustration of Quasi-RC for a Coefficient-Free Model |
|
|
560 | (7) |
|
|
567 | (2) |
|
43 Text Mining: Primer, Illustration, and TXTDM Software |
|
|
569 | (24) |
|
|
569 | (1) |
|
|
569 | (2) |
|
43.2.1 Text Mining Software: Free versus Commercial versus TXTDM |
|
|
570 | (1) |
|
43.3 Primer of Text Mining |
|
|
571 | (2) |
|
43.4 Statistics of the Words |
|
|
573 | (1) |
|
43.5 The Binary Dataset of Words in Documents |
|
|
574 | (1) |
|
43.6 Illustration of TXTDM Text Mining |
|
|
575 | (9) |
|
43.7 Analysis of the Text-Mined GenIQ_FAVORED Model |
|
|
584 | (1) |
|
43.7.1 Text-Based Profiling of Respondents Who Prefer GenIQ |
|
|
584 | (1) |
|
43.7.2 Text-Based Profiling of Respondents Who Prefer OLS-Logistic |
|
|
585 | (1) |
|
|
585 | (1) |
|
43.9 Clustering Documents |
|
|
586 | (7) |
|
43.9.1 Clustering GenIQ Survey Documents |
|
|
586 | (6) |
|
43.9.1.1 Conclusion of Clustering GenIQ Survey Documents |
|
|
592 | (1) |
|
|
593 | (1) |
|
|
593 | (52) |
|
Appendix 43.A Loading Corpus TEXT Dataset |
|
|
594 | (1) |
|
Appendix 43.B Intermediate Step Creating Binary Words |
|
|
594 | (1) |
|
Appendix 43.C Creating the Final Binary Words |
|
|
595 | (1) |
|
Appendix 43.D Calculate Statistics TF, DF, NUM_DOCS, and N(=Num of Words) |
|
|
596 | (1) |
|
Appendix 43.E Append GenIQ_FAVORED to WORDS Dataset |
|
|
597 | (1) |
|
Appendix 43.F Logistic GenIQ_FAVORED Model |
|
|
598 | (1) |
|
Appendix 43.G Average Correlation among Words |
|
|
599 | (1) |
|
Appendix 43.H Creating TF-IDF |
|
|
600 | (2) |
|
Appendix 43.I WORD_TF-IDF Weights by Concat of WORDS and TF-IDF |
|
|
602 | (2) |
|
Appendix 43.J WORD_RESP WORD_TF-IDF RESP |
|
|
604 | (1) |
|
|
604 | (1) |
|
Appendix 43.L WORD Times TF-IDF |
|
|
604 | (1) |
|
Appendix 43.M Dataset Weighted with Words for Profile |
|
|
605 | (1) |
|
Appendix 43.N VARCLUS for Two-Class Solution |
|
|
606 | (1) |
|
Appendix 43.O Scoring VARCLUS for Two-Cluster Solution |
|
|
606 | (1) |
|
Appendix 43.P Direction of Words with Its Cluster 1 |
|
|
607 | (2) |
|
Appendix 43.Q Performance of GenIQ Model versus Chance Model |
|
|
609 | (1) |
|
Appendix 43.R Performance of Liberal-Cluster Model versus Chance Model |
|
|
609 | (2) |
|
|
610 | (1) |
|
44 Some of My Favorite Statistical Subroutines |
|
|
611 | (34) |
|
|
611 | (1) |
|
44.2 Smoothplots (Mean and Median) of Chapter 5---X1 versus X2 |
|
|
611 | (4) |
|
44.3 Smoothplots of Chapter 10---Logit and Probability |
|
|
615 | (3) |
|
44.4 Average Correlation of Chapter 16---Among Var1 Var2 Var3 |
|
|
618 | (2) |
|
44.5 Bootstrapped Decile Analysis of Chapter 29---Using Data from Table 23.4 |
|
|
620 | (7) |
|
44.6 H-Spread Common Region of Chapter 42 |
|
|
627 | (3) |
|
44.7 Favorite---Proc Corr with Option Rank, Vertical Output |
|
|
630 | (1) |
|
44.8 Favorite---Decile Analysis---Response |
|
|
631 | (4) |
|
44.9 Favorite---Decile Analysis---Profit |
|
|
635 | (3) |
|
44.10 Favorite---Smoothing Time-Series Data (Running Medians of Three) |
|
|
638 | (5) |
|
44.11 Favorite---First Cut Is the Deepest---Among Variables with Large Skew Values |
|
|
643 | (2) |
Index |
|
645 | |