Preface |
|
vii | |
Part I Basics of R |
|
|
|
3 | (8) |
|
|
3 | (1) |
|
|
4 | (1) |
|
|
5 | (1) |
|
|
6 | (1) |
|
|
6 | (4) |
|
|
6 | (1) |
|
|
7 | (1) |
|
|
8 | (1) |
|
1.5.4 When Things Go Wrong |
|
|
9 | (1) |
|
|
10 | (1) |
|
2 An Overview of the R Language |
|
|
11 | (36) |
|
|
11 | (2) |
|
|
11 | (1) |
|
|
12 | (1) |
|
2.2 A Quick Tour of R's Capabilities |
|
|
13 | (4) |
|
2.3 Basics of Working with R Commands |
|
|
17 | (1) |
|
|
18 | (12) |
|
|
19 | (2) |
|
2.4.2 Help! A Brief Detour |
|
|
21 | (3) |
|
2.4.3 More on Vectors and Indexing |
|
|
24 | (2) |
|
2.4.4 aaRgh! A Digression for New Programmers |
|
|
26 | (1) |
|
2.4.5 Missing and Interesting Values |
|
|
26 | (2) |
|
2.4.6 Using R for Mathematical Computation |
|
|
28 | (1) |
|
|
28 | (2) |
|
|
30 | (4) |
|
2.6 Loading and Saving Data |
|
|
34 | (4) |
|
|
36 | (1) |
|
|
36 | (2) |
|
2.7 Writing Your Own Functions* |
|
|
38 | (4) |
|
2.7.1 Language Structures* |
|
|
40 | (1) |
|
2.7.2 Anonymous Functions* |
|
|
41 | (1) |
|
|
42 | (1) |
|
|
43 | (1) |
|
|
44 | (3) |
Part II Fundamentals of Data Analysis |
|
|
|
47 | (30) |
|
|
47 | (5) |
|
3.1.1 Store Data: Setting the Structure |
|
|
48 | (2) |
|
3.1.2 Store Data: Simulating Data Points |
|
|
50 | (2) |
|
3.2 Functions to Summarize a Variable |
|
|
52 | (4) |
|
|
52 | (2) |
|
3.2.2 Continuous Variables |
|
|
54 | (2) |
|
3.3 Summarizing Data Frames |
|
|
56 | (5) |
|
|
57 | (1) |
|
|
58 | (1) |
|
3.3.3 Recommended Approach to Inspecting Data |
|
|
59 | (1) |
|
|
59 | (2) |
|
3.4 Single Variable Visualization |
|
|
61 | (13) |
|
|
61 | (5) |
|
|
66 | (2) |
|
3.4.3 QQ Plot to Check Normality* |
|
|
68 | (1) |
|
3.4.4 Cumulative Distribution* |
|
|
69 | (1) |
|
3.4.5 Language Brief: by() and aggregate() |
|
|
70 | (2) |
|
|
72 | (2) |
|
|
74 | (1) |
|
|
75 | (2) |
|
4 Relationships Between Continuous Variables |
|
|
77 | (34) |
|
|
77 | (6) |
|
4.1.1 Simulating Customer Data |
|
|
78 | (1) |
|
4.1.2 Simulating Online and In-Store Sales Data |
|
|
79 | (1) |
|
4.1.3 Simulating Satisfaction Survey Responses |
|
|
80 | (2) |
|
4.1.4 Simulating Non-Response Data |
|
|
82 | (1) |
|
4.2 Exploring Associations Between Variables with Scatterplots |
|
|
83 | (7) |
|
4.2.1 Creating a Basic Scatterplot with plot() |
|
|
83 | (3) |
|
4.2.2 Color-Coding Points on a Scatterplot |
|
|
86 | (2) |
|
4.2.3 Adding a Legend to a Plot |
|
|
88 | (1) |
|
4.2.4 Plotting on a Log Scale |
|
|
89 | (1) |
|
4.3 Combining Plots in a Single Graphics Object |
|
|
90 | (2) |
|
|
92 | (3) |
|
|
92 | (1) |
|
4.4.2 scatterplotMatrix() |
|
|
93 | (2) |
|
4.5 Correlation Coefficients |
|
|
95 | (9) |
|
|
97 | (1) |
|
4.5.2 Correlation Matrices |
|
|
98 | (2) |
|
4.5.3 Transforming Variables before Computing Correlations |
|
|
100 | (2) |
|
4.5.4 Typical Marketing Data Transformations |
|
|
102 | (1) |
|
4.5.5 Box-Cox Transformations* |
|
|
102 | (2) |
|
4.6 Exploring Associations in Survey Responses* |
|
|
104 | (3) |
|
|
105 | (1) |
|
|
106 | (1) |
|
|
107 | (1) |
|
|
108 | (3) |
|
5 Comparing Groups: Tables and Visualizations |
|
|
111 | (24) |
|
5.1 Simulating Consumer Segment Data |
|
|
111 | (9) |
|
5.1.1 Segment Data Definition |
|
|
112 | (2) |
|
5.1.2 Language Brief: for() Loops |
|
|
114 | (2) |
|
5.1.3 Language Brief: if() Blocks |
|
|
116 | (2) |
|
5.1.4 Final Segment Data Generation |
|
|
118 | (2) |
|
5.2 Finding Descriptives by Group |
|
|
120 | (12) |
|
5.2.1 Language Brief: Basic Formula Syntax |
|
|
123 | (1) |
|
5.2.2 Descriptives for Two-Way Groups |
|
|
124 | (2) |
|
5.2.3 Visualization by Group: Frequencies and Proportions |
|
|
126 | (3) |
|
5.2.4 Visualization by Group: Continuous Data |
|
|
129 | (3) |
|
|
132 | (1) |
|
|
133 | (2) |
|
6 Comparing Groups: Statistical Tests |
|
|
135 | (24) |
|
6.1 Data for Comparing Groups |
|
|
135 | (1) |
|
6.2 Testing Group Frequencies: chisq.test() |
|
|
136 | (3) |
|
6.3 Testing Observed Proportions: binom.test() |
|
|
139 | (3) |
|
6.3.1 About Confidence Intervals |
|
|
140 | (1) |
|
6.3.2 More About binom.test() and Binomial Distributions |
|
|
141 | (1) |
|
6.4 Testing Group Means: t.test() |
|
|
142 | (2) |
|
6.5 Testing Multiple Group Means: ANOVA |
|
|
144 | (5) |
|
6.5.1 Model Comparison in ANOVA* |
|
|
146 | (1) |
|
6.5.2 Visualizing Group Confidence Intervals |
|
|
147 | (1) |
|
6.5.3 Variable Selection in ANOVA: Stepwise Modeling* |
|
|
148 | (1) |
|
6.6 Bayesian ANOVA: Getting Started* |
|
|
149 | (7) |
|
|
150 | (1) |
|
6.6.2 Basics of Bayesian ANOVA* |
|
|
150 | (2) |
|
6.6.3 Inspecting the Posterior Draws* |
|
|
152 | (3) |
|
6.6.4 Plotting the Bayesian Credible Intervals* |
|
|
155 | (1) |
|
|
156 | (1) |
|
|
157 | (2) |
|
7 Identifying Drivers of Outcomes: Linear Models |
|
|
159 | (36) |
|
|
160 | (2) |
|
7.1.1 Simulating the Amusement Park Data |
|
|
160 | (2) |
|
7.2 Fitting Linear Models with lm() |
|
|
162 | (11) |
|
7.2.1 Preliminary Data Inspection |
|
|
163 | (2) |
|
7.2.2 Recap: Bivariate Association |
|
|
165 | (1) |
|
7.2.3 Linear Model with a Single Predictor |
|
|
165 | (1) |
|
|
166 | (3) |
|
|
169 | (4) |
|
7.3 Fitting Linear Models with Multiple Predictors |
|
|
173 | (6) |
|
|
175 | (1) |
|
7.3.2 Using a Model to Make Predictions |
|
|
176 | (1) |
|
7.3.3 Standardizing the Predictors |
|
|
177 | (2) |
|
7.4 Using Factors as Predictors |
|
|
179 | (3) |
|
|
182 | (3) |
|
7.5.1 Language Brief: Advanced Formula Syntax* |
|
|
183 | (2) |
|
|
185 | (1) |
|
7.7 Recommended Procedure for Linear Model Fitting |
|
|
186 | (1) |
|
7.8 Bayesian Linear Models with MCMCregress()* |
|
|
186 | (2) |
|
|
188 | (2) |
|
|
190 | (5) |
Part III Advanced Marketing Applications |
|
|
8 Reducing Data Complexity |
|
|
195 | (30) |
|
8.1 Consumer Brand Rating Data |
|
|
195 | (5) |
|
|
197 | (1) |
|
8.1.2 Aggregate Mean Ratings by Brand |
|
|
198 | (2) |
|
8.2 Principal Component Analysis and Perceptual Maps |
|
|
200 | (9) |
|
|
200 | (3) |
|
|
203 | (1) |
|
8.2.3 PCA for Brand Ratings |
|
|
204 | (2) |
|
8.2.4 Perceptual Map of the Brands |
|
|
206 | (2) |
|
8.2.5 Cautions with Perceptual Maps |
|
|
208 | (1) |
|
8.3 Exploratory Factor Analysis |
|
|
209 | (9) |
|
|
210 | (1) |
|
8.3.2 Finding an EFA Solution |
|
|
211 | (2) |
|
|
213 | (3) |
|
8.3.4 Using Factor Scores for Brands |
|
|
216 | (2) |
|
8.4 Multidimensional Scaling |
|
|
218 | (3) |
|
|
219 | (2) |
|
|
221 | (1) |
|
8.5.1 Principal Component Analysis |
|
|
221 | (1) |
|
|
221 | (1) |
|
8.5.3 Multidimensional Scaling |
|
|
222 | (1) |
|
|
222 | (3) |
|
8.6.1 Principal Component Analysis |
|
|
222 | (1) |
|
8.6.2 Exploratory Factor Analysis |
|
|
222 | (1) |
|
8.6.3 Multidimensional Scaling |
|
|
223 | (2) |
|
9 Additional Linear Modeling Topics |
|
|
225 | (42) |
|
9.1 Handling Highly Correlated Variables |
|
|
226 | (5) |
|
9.1.1 An Initial Linear Model of Online Spend |
|
|
226 | (3) |
|
9.1.2 Remediating Collinearity |
|
|
229 | (2) |
|
9.2 Linear Models for Binary Outcomes: Logistic Regression |
|
|
231 | (11) |
|
9.2.1 Basics of the Logistic Regression Model |
|
|
231 | (1) |
|
9.2.2 Data for Logistic Regression of Season Passes |
|
|
232 | (1) |
|
|
233 | (1) |
|
9.2.4 Language Brief: Classes and Attributes of Objects* |
|
|
234 | (2) |
|
9.2.5 Finalizing the Data |
|
|
236 | (1) |
|
9.2.6 Fitting a Logistic Regression Model |
|
|
237 | (2) |
|
9.2.7 Reconsidering the Model |
|
|
239 | (3) |
|
9.2.8 Additional Discussion |
|
|
242 | (1) |
|
9.3 Hierarchical Linear Models |
|
|
242 | (10) |
|
|
243 | (1) |
|
9.3.2 Ratings-Based Conjoint Analysis for the Amusement Park |
|
|
244 | (1) |
|
9.3.3 Simulating Ratings-Based Conjoint Data |
|
|
245 | (1) |
|
9.3.4 An Initial Linear Model |
|
|
246 | (2) |
|
9.3.5 Hierarchical Linear Model with lme4 |
|
|
248 | (1) |
|
9.3.6 The Complete Hierarchical Linear Model |
|
|
249 | (2) |
|
9.3.7 Summary of HLM with lme4 |
|
|
251 | (1) |
|
9.4 Bayesian Hierarchical Linear Models* |
|
|
252 | (7) |
|
9.4.1 Initial Linear Model with MCMCregress()* |
|
|
253 | (1) |
|
9.4.2 Hierarchical Linear Model with MCMChregress()* |
|
|
253 | (3) |
|
9.4.3 Inspecting Distribution of Preference* |
|
|
256 | (3) |
|
9.5 A Quick Comparison of Frequentist & Bayesian HLMs* |
|
|
259 | (4) |
|
|
263 | (1) |
|
|
263 | (1) |
|
9.6.2 Logistic Regression |
|
|
263 | (1) |
|
9.6.3 Hierarchical Models |
|
|
263 | (1) |
|
9.6.4 Bayesian Hierarchical Models |
|
|
263 | (1) |
|
|
264 | (3) |
|
|
264 | (1) |
|
9.7.2 Logistic Regression |
|
|
264 | (1) |
|
9.7.3 Hierarchical Linear Models |
|
|
265 | (1) |
|
9.7.4 Bayesian Methods for Hierarchical Linear Models |
|
|
266 | (1) |
|
10 Confirmatory Factor Analysis and Structural Equation Modeling |
|
|
267 | (32) |
|
10.1 The Motivation for Structural Models |
|
|
268 | (2) |
|
10.1.1 Structural Models in This Chapter |
|
|
269 | (1) |
|
10.2 Scale Assessment: CFA |
|
|
270 | (13) |
|
10.2.1 Simulating PIES CFA Data |
|
|
272 | (5) |
|
10.2.2 Estimating the PIES CFA Model |
|
|
277 | (1) |
|
10.2.3 Assessing the PIES CFA Model |
|
|
278 | (5) |
|
10.3 General Models: Structural Equation Models |
|
|
283 | (5) |
|
10.3.1 The Repeat Purchase Model in R |
|
|
284 | (2) |
|
10.3.2 Assessing the Repeat Purchase Model |
|
|
286 | (2) |
|
10.4 The Partial Least Squares (PLS) Alternative |
|
|
288 | (9) |
|
10.4.1 PLS-SEM for Repeat Purchase |
|
|
289 | (3) |
|
10.4.2 Visualizing the Fitted PLS Model* |
|
|
292 | (1) |
|
10.4.3 Assessing the PLS-SEM Model |
|
|
293 | (2) |
|
10.4.4 PLS-SEM with the Larger Sample |
|
|
295 | (2) |
|
|
297 | (1) |
|
|
297 | (2) |
|
11 Segmentation: Clustering and Classification |
|
|
299 | (40) |
|
11.1 Segmentation Philosophy |
|
|
299 | (3) |
|
11.1.1 The Difficulty of Segmentation |
|
|
299 | (1) |
|
11.1.2 Segmentation as Clustering and Classification |
|
|
300 | (2) |
|
|
302 | (1) |
|
|
302 | (20) |
|
11.3.1 The Steps of Clustering |
|
|
303 | (2) |
|
11.3.2 Hierarchical Clustering: hclust() Basics |
|
|
305 | (4) |
|
11.3.3 Hierarchical Clustering Continued: Groups from hclust() |
|
|
309 | (2) |
|
11.3.4 Mean-Based Clustering: kmeans() |
|
|
311 | (3) |
|
11.3.5 Model-Based Clustering: Mclust() |
|
|
314 | (1) |
|
11.3.6 Comparing Models with BIC() |
|
|
315 | (2) |
|
11.3.7 Latent Class Analysis: poLCA() |
|
|
317 | (3) |
|
11.3.8 Comparing Cluster Solutions |
|
|
320 | (2) |
|
11.3.9 Recap of Clustering |
|
|
322 | (1) |
|
|
322 | (11) |
|
11.4.1 Naive Bayes Classification: naiveBayes() |
|
|
323 | (4) |
|
11.4.2 Random Forest Classification: randomForest() |
|
|
327 | (3) |
|
11.4.3 Random Forest Variable Importance |
|
|
330 | (3) |
|
11.5 Prediction: Identifying Potential Customers* |
|
|
333 | (3) |
|
|
336 | (1) |
|
|
337 | (2) |
|
12 Association Rules for Market Basket Analysis |
|
|
339 | (24) |
|
12.1 The Basics of Association Rules |
|
|
340 | (1) |
|
|
340 | (1) |
|
12.2 Retail Transaction Data: Market Baskets |
|
|
341 | (5) |
|
12.2.1 Example Data: Groceries |
|
|
342 | (2) |
|
|
344 | (2) |
|
12.3 Finding and Visualizing Association Rules |
|
|
346 | (10) |
|
12.3.1 Finding and Plotting Subsets of Rules |
|
|
348 | (1) |
|
12.3.2 Using Profit Margin Data with Transactions: An Initial Start |
|
|
349 | (2) |
|
12.3.3 Language Brief: A Function for Margin Using an Object's class* |
|
|
351 | (5) |
|
12.4 Rules in Non-Transactional Data: Exploring Segments Again |
|
|
356 | (4) |
|
12.4.1 Language Brief: Slicing Continuous Data with cut() |
|
|
356 | (1) |
|
12.4.2 Exploring Segment Associations |
|
|
357 | (3) |
|
|
360 | (1) |
|
|
360 | (3) |
|
|
363 | (40) |
|
13.1 Choice-Based Conjoint Analysis Surveys |
|
|
364 | (1) |
|
13.2 Simulating Choice Data* |
|
|
365 | (5) |
|
13.3 Fitting a Choice Model |
|
|
370 | (13) |
|
13.3.1 Inspecting Choice Data |
|
|
371 | (1) |
|
13.3.2 Fitting Choice Models with mlogit() |
|
|
372 | (3) |
|
13.3.3 Reporting Choice Model Findings |
|
|
375 | (5) |
|
13.3.4 Share Predictions for Identical Alternatives |
|
|
380 | (1) |
|
13.3.5 Planning the Sample Size for a Conjoint Study |
|
|
381 | (2) |
|
13.4 Adding Consumer Heterogeneity to Choice Models |
|
|
383 | (5) |
|
13.4.1 Estimating Mixed Logit Models with mlogit() |
|
|
383 | (3) |
|
13.4.2 Share Prediction for Heterogeneous Choice Models |
|
|
386 | (2) |
|
13.5 Hierarchical Bayes Choice Models |
|
|
388 | (9) |
|
13.5.1 Estimating Hierarchical Bayes Choice Models with ChoiceModelR |
|
|
388 | (7) |
|
13.5.2 Share Prediction for Hierarchical Bayes Choice Models |
|
|
395 | (2) |
|
13.6 Design of Choice-Based Conjoint Surveys* |
|
|
397 | (1) |
|
|
398 | (1) |
|
|
399 | (2) |
Conclusion |
|
401 | (2) |
A Appendix: R Versions and Related Software |
|
403 | (8) |
|
|
403 | (1) |
|
|
404 | (1) |
|
A.3 Emacs Speaks Statistics |
|
|
405 | (1) |
|
|
406 | (1) |
|
|
407 | (1) |
|
|
408 | (3) |
|
|
408 | (1) |
|
|
408 | (1) |
|
|
409 | (1) |
|
|
409 | (1) |
|
A.6.5 TIBCO Enterprise Runtime for R |
|
|
409 | (2) |
B Appendix: Scaling Up |
|
411 | (12) |
|
|
411 | (4) |
|
|
411 | (1) |
|
B.1.2 Microsoft Excel: gdata |
|
|
412 | (1) |
|
B.1.3 SAS, SPSS, and Other Statistics Packages: foreign |
|
|
412 | (1) |
|
B.1.4 SQL: RSQLite, sqldf and RODBC |
|
|
413 | (2) |
|
B.2 Handling Large Data Sets |
|
|
415 | (1) |
|
B.3 Speeding Up Computation |
|
|
416 | (2) |
|
B.3.1 Efficient Coding and Data Storage |
|
|
416 | (1) |
|
B.3.2 Enhancing the R Engine |
|
|
417 | (1) |
|
B.4 Time Series Analysis, Repeated Measures, and Longitudinal Analysis |
|
|
418 | (1) |
|
B.5 Automated and Interactive Reporting |
|
|
419 | (4) |
C Appendix: Packages Used |
|
423 | (8) |
|
C.1 Core and Frequentist Statistics |
|
|
424 | (1) |
|
|
424 | (1) |
|
|
425 | (1) |
|
|
426 | (1) |
|
|
426 | (1) |
|
|
427 | (1) |
|
|
428 | (3) |
D Appendix: Online Materials and Data Files |
|
431 | (4) |
|
|
431 | (1) |
|
D.2 Data File URL Cross-Reference |
|
|
432 | (3) |
|
D.2.1 Update on Data Locations |
|
|
432 | (3) |
References |
|
435 | (12) |
Index |
|
447 | |