Preface |
|
xiii | |
Acknowledgments |
|
xxi | |
About the Author |
|
xxiii | |
1 Data Mining: A Gentle Introduction |
|
1 | (14) |
|
|
1 | (1) |
|
1.2 Data Mining: Why It Is Successful in the IT World |
|
|
2 | (2) |
|
1.2.1 Availability of Large Databases: Data Warehousing |
|
|
2 | (1) |
|
1.2.2 Price Drop in Data Storage and Efficient Computer Processing |
|
|
3 | (1) |
|
1.2.3 New Advancements in Analytical Methodology |
|
|
3 | (1) |
|
1.3 Benefits of Data Mining |
|
|
4 | (1) |
|
|
4 | (2) |
|
|
6 | (1) |
|
|
6 | (4) |
|
1.6.1 Identification of Problem and Defining the Data Mining Study Goal |
|
|
6 | (1) |
|
|
6 | (1) |
|
1.6.3 Data Exploration and Descriptive Analysis |
|
|
7 | (1) |
|
1.6.4 Data Mining Solutions: Unsupervised Learning Methods |
|
|
8 | (1) |
|
1.6.5 Data Mining Solutions: Supervised Learning Methods |
|
|
8 | (1) |
|
|
9 | (1) |
|
1.6.7 Interpret and Make Decisions |
|
|
10 | (1) |
|
1.7 Problems in the Data Mining Process |
|
|
10 | (1) |
|
1.8 SAS Software: The Leader in Data Mining |
|
|
10 | (2) |
|
1.8.1 SEMMA: The SAS Data Mining Process |
|
|
11 | (1) |
|
1.8.2 SAS Enterprise Miner for Comprehensive Data Mining Solution |
|
|
11 | (1) |
|
1.9 Introduction of User-Friendly SAS Macros for Statistical Data Mining |
|
|
12 | (1) |
|
1.9.1 Limitations of These SAS Macros |
|
|
13 | (1) |
|
|
13 | (1) |
|
|
13 | (2) |
2 Preparing Data for Data Mining |
|
15 | (20) |
|
|
15 | (1) |
|
2.2 Data Requirements in Data Mining |
|
|
15 | (1) |
|
2.3 Ideal Structures of Data for Data Mining |
|
|
16 | (1) |
|
2.4 Understanding the Measurement Scale of Variables |
|
|
16 | (1) |
|
2.5 Entire Database or Representative Sample |
|
|
17 | (1) |
|
2.6 Sampling for Data Mining |
|
|
17 | (1) |
|
|
18 | (1) |
|
2.7 User-Friendly SAS Applications Used in Data Preparation |
|
|
18 | (15) |
|
2.7.1 Preparing PC Data Files before Importing into SAS Data |
|
|
18 | (2) |
|
2.7.2 Converting PC Data Files to SAS Datasets Using the SAS Import Wizard |
|
|
20 | (1) |
|
2.7.3 EXLSAS2 SAS Macro Application to Convert PC Data Formats to SAS Datasets |
|
|
21 | (1) |
|
2.7.4 Steps Involved in Running the EXLSAS2 Macro |
|
|
22 | (2) |
|
2.7.5 Case Study 1: Importing an Excel File Called "Fraud" to a Permanent SAS Dataset Called "Fraud" |
|
|
24 | (1) |
|
2.7.6 SAS Macro Applications—RANSPLIT2: Random Sampling from the Entire Database |
|
|
25 | (1) |
|
2.7.7 Steps Involved in Running the RANSPLIT2 Macro |
|
|
26 | (4) |
|
2.7.8 Case Study 2: Drawing Training (400), Validation (300), and Test (All Left-Over Observations) Samples from the SAS Data Called "Fraud" |
|
|
30 | (3) |
|
|
33 | (1) |
|
|
33 | (2) |
3 Exploratory Data Analysis |
|
35 | (32) |
|
|
35 | (1) |
|
3.2 Exploring Continuous Variables |
|
|
35 | (7) |
|
3.2.1 Descriptive Statistics |
|
|
35 | (4) |
|
3.2.1.1 Measures of Location or Central Tendency |
|
|
36 | (1) |
|
3.2.1.2 Robust Measures of Location |
|
|
36 | (1) |
|
3.2.1.3 Five-Number Summary Statistics |
|
|
37 | (1) |
|
3.2.1.4 Measures of Dispersion |
|
|
37 | (1) |
|
3.2.1.5 Standard Errors and Confidence Interval Estimates |
|
|
38 | (1) |
|
3.2.1.6 Detecting Deviation from Normally Distributed Data |
|
|
38 | (1) |
|
3.2.2 Graphical Techniques Used in EDA of Continuous Data |
|
|
39 | (3) |
|
3.3 Data Exploration: Categorical Variable |
|
|
42 | (2) |
|
3.3.1 Descriptive Statistical Estimates of Categorical Variables |
|
|
42 | (1) |
|
3.3.2 Graphical Displays for Categorical Data |
|
|
43 | (1) |
|
3.4 SAS Macro Applications Used in Data Exploration |
|
|
44 | (20) |
|
3.4.1 Exploring Categorical Variables Using the SAS Macro FREQ2 |
|
|
44 | (3) |
|
3.4.1.1 Steps Involved in Running the FREQ2 Macro |
|
|
46 | (1) |
|
3.4.2 Case Study 1: Exploring Categorical Variables in a SAS Dataset |
|
|
47 | (2) |
|
3.4.3 EDA Analysis of Continuous Variables Using SAS Macro UNIVAR2 |
|
|
49 | (4) |
|
3.4.3.1 Steps Involved in Running the UNIVAR2 Macro |
|
|
51 | (2) |
|
3.4.4 Case Study 2: Data Exploration of a Continuous Variable Using UNIVAR2 |
|
|
53 | (5) |
|
3.4.5 Case Study 3: Exploring Continuous Data by a Group Variable Using UNIVAR2 |
|
|
58 | (11) |
|
3.4.5.1 Data Descriptions |
|
|
58 | (6) |
|
|
64 | (1) |
|
|
64 | (3) |
4 Unsupervised Learning Methods |
|
67 | (76) |
|
|
67 | (1) |
|
4.2 Applications of Unsupervised Learning Methods |
|
|
68 | (1) |
|
4.3 Principal Component Analysis |
|
|
69 | (2) |
|
|
70 | (1) |
|
4.4 Exploratory Factor Analysis |
|
|
71 | (9) |
|
4.4.1 Exploratory Factor Analysis versus Principal Component Analysis |
|
|
72 | (1) |
|
4.4.2 Exploratory Factor Analysis Terminology |
|
|
73 | (7) |
|
4.4.2.1 Communalities and Uniqueness |
|
|
73 | (1) |
|
|
73 | (1) |
|
4.4.2.3 Cronbach Coefficient Alpha |
|
|
74 | (1) |
|
4.4.2.4 Factor Analysis Methods |
|
|
74 | (1) |
|
4.4.2.5 Sampling Adequacy Check in Factor Analysis |
|
|
75 | (1) |
|
4.4.2.6 Estimating the Number of Factors |
|
|
75 | (1) |
|
|
76 | (1) |
|
|
76 | (1) |
|
|
77 | (1) |
|
4.4.2.10 Confidence Intervals and the Significance of Factor Loading Converge |
|
|
78 | (1) |
|
4.4.2.11 Standardized Factor Score |
|
|
78 | (2) |
|
4.5 Disjoint Cluster Analysis |
|
|
80 | (2) |
|
4.5.1 Types of Cluster Analysis |
|
|
80 | (1) |
|
4.5.2 FASTCLUS: SAS Procedure to Perform Disjoint Cluster Analysis |
|
|
81 | (1) |
|
4.6 Biplot Display of PCA, EFA, and DCA Results |
|
|
82 | (1) |
|
4.7 PCA and EFA Using SAS Macro FACTOR2 |
|
|
82 | (39) |
|
4.7.1 Steps Involved in Running the FACTOR2 Macro |
|
|
83 | (1) |
|
4.7.2 Case Study 1: Principal Component Analysis of 1993 Car Attribute Data |
|
|
84 | (13) |
|
|
84 | (1) |
|
4.7.2.2 Data Descriptions |
|
|
85 | (12) |
|
4.7.3 Case Study 2: Maximum Likelihood FACTOR Analysis with VARIMAX Rotation of 1993 Car Attribute Data |
|
|
97 | (19) |
|
|
97 | (1) |
|
4.7.3.2 Data Descriptions |
|
|
97 | (19) |
|
4.7.4 Case Study 3: Maximum Likelihood FACTOR Analysis with VARIMAX Rotation Using a Multivariate Data in the Form of Correlation Matrix |
|
|
116 | (5) |
|
|
116 | (1) |
|
4.7.4.2 Data Descriptions |
|
|
117 | (4) |
|
4.8 Disjoint Cluster Analysis Using SAS Macro DISJCLS2 |
|
|
121 | (19) |
|
4.8.1 Steps Involved in Running the DISJCLS2 Macro |
|
|
124 | (1) |
|
4.8.2 Case Study 4: Disjoint Cluster Analysis of 1993 Car Attribute Data |
|
|
125 | (20) |
|
|
125 | (1) |
|
4.8.2.2 Data Descriptions |
|
|
126 | (14) |
|
|
140 | (1) |
|
|
140 | (3) |
5 Supervised Learning Methods: Prediction |
|
143 | (162) |
|
|
143 | (1) |
|
5.2 Applications of Supervised Predictive Methods |
|
|
144 | (1) |
|
5.3 Multiple Linear Regression Modeling |
|
|
145 | (13) |
|
5.3.1 Multiple Linear Regressions: Key Concepts and Terminology |
|
|
145 | (3) |
|
5.3.2 Model Selection in Multiple Linear Regression |
|
|
148 | (2) |
|
5.3.2.1 Best Candidate Models Selected Based on AICC and SBC |
|
|
149 | (1) |
|
5.3.2.2 Model Selection Based on the New SAS PROC GLMSELECT |
|
|
149 | (1) |
|
5.3.3 Exploratory Analysis Using Diagnostic Plots |
|
|
150 | (4) |
|
5.3.4 Violations of Regression Model Assumptions |
|
|
154 | (2) |
|
5.3.4.1 Model Specification Error |
|
|
154 | (1) |
|
5.3.4.2 Serial Correlation among the Residual |
|
|
154 | (1) |
|
5.3.4.3 Influential Outliers |
|
|
155 | (1) |
|
5.3.4.4 Multicollinearity |
|
|
155 | (1) |
|
5.3.4.5 Heteroscedasticity in Residual Variance |
|
|
155 | (1) |
|
5.3.4.6 Nonnormality of Residuals |
|
|
156 | (1) |
|
5.3.5 Regression Model Validation |
|
|
156 | (1) |
|
|
156 | (1) |
|
|
157 | (1) |
|
5.4 Binary Logistic Regression Modeling |
|
|
158 | (7) |
|
5.4.1 Terminology and Key Concepts |
|
|
158 | (3) |
|
5.4.2 Model Selection in Logistic Regression |
|
|
161 | (1) |
|
5.4.3 Exploratory Analysis Using Diagnostic Plots |
|
|
162 | (2) |
|
|
163 | (1) |
|
5.4.3.2 Two-Factor Interaction Plots between Continuous Variables |
|
|
164 | (1) |
|
5.4.4 Checking for Violations of Regression Model Assumptions |
|
|
164 | (3) |
|
5.4.4.1 Model Specification Error |
|
|
164 | (1) |
|
5.4.4.2 Influential Outlier |
|
|
164 | (1) |
|
5.4.4.3 Multicollinearity |
|
|
165 | (1) |
|
|
165 | (1) |
|
5.5 Ordinal Logistic Regression |
|
|
165 | (1) |
|
5.6 Survey Logistic Regression |
|
|
166 | (1) |
|
5.7 Multiple Linear Regression Using SAS Macro REGDIAG2 |
|
|
167 | (2) |
|
5.7.1 Steps Involved in Running the REGDIAG2 Macro |
|
|
168 | (1) |
|
5.8 Lift Chart Using SAS Macro LIFT2 |
|
|
169 | (1) |
|
5.8.1 Steps Involved in Running the LIFT2 Macro |
|
|
170 | (1) |
|
5.9 Scoring New Regression Data Using the SAS Macro RSCORE2 |
|
|
170 | (2) |
|
5.9.1 Steps Involved in Running the RSCORE2 Macro |
|
|
171 | (1) |
|
5.10 Logistic Regression Using SAS Macro LOGIST2 |
|
|
172 | (1) |
|
5.11 Scoring New Logistic Regression Data Using the SAS Macro LSCORE2 |
|
|
173 | (1) |
|
5.12 Case Study 1: Modeling Multiple Linear Regressions |
|
|
173 | (33) |
|
|
173 | (33) |
|
5.12.1.1 Step 1: Preliminary Model Selection |
|
|
175 | (4) |
|
5.12.1.2 Step 2: Graphical Exploratory Analysis and Regression Diagnostic Plots |
|
|
179 | (12) |
|
5.12.1.3 Step 3: Fitting the Regression Model and Checking for the Violations of Regression Assumptions |
|
|
191 | (12) |
|
5.12.1.4 Remedial Measure: Robust Regression to Adjust the Regression Parameter Estimates to Extreme Outliers |
|
|
203 | (3) |
|
5.13 Case Study 2: If-Then Analysis and Lift Charts |
|
|
206 | (6) |
|
|
208 | (4) |
|
5.14 Case Study 3: Modeling Multiple Linear Regression with Categorical Variables |
|
|
212 | (20) |
|
|
212 | (1) |
|
|
212 | (20) |
|
5.15 Case Study 4: Modeling Binary Logistic Regression |
|
|
232 | (28) |
|
|
232 | (2) |
|
|
234 | (26) |
|
5.15.2.1 Step 1: Best Candidate Model Selection |
|
|
235 | (2) |
|
5.15.2.2 Step 2: Exploratory Analysis/Diagnostic Plots |
|
|
237 | (2) |
|
5.15.2.3 Step 3: Fitting Binary Logistic Regression |
|
|
239 | (21) |
|
5.16 Case Study 5: Modeling Binary Multiple Logistic Regression |
|
|
260 | (26) |
|
|
260 | (1) |
|
|
261 | (25) |
|
5.17 Case Study 6: Modeling Ordinal Multiple Logistic Regression |
|
|
286 | (15) |
|
|
286 | (1) |
|
|
286 | (15) |
|
|
301 | (1) |
|
|
301 | (4) |
6 Supervised Learning Methods: Classification |
|
305 | (72) |
|
|
305 | (1) |
|
6.2 Discriminant Analysis |
|
|
306 | (1) |
|
6.3 Stepwise Discriminant Analysis |
|
|
306 | (2) |
|
6.4 Canonical Discriminant Analysis |
|
|
308 | (2) |
|
6.4.1 Canonical Discriminant Analysis Assumptions |
|
|
308 | (1) |
|
6.4.2 Key Concepts and Terminology in Canonical Discriminant Analysis |
|
|
309 | (1) |
|
6.5 Discriminant Function Analysis |
|
|
310 | (3) |
|
6.5.1 Key Concepts and Terminology in Discriminant Function Analysis |
|
|
310 | (3) |
|
6.6 Applications of Discriminant Analysis |
|
|
313 | (1) |
|
6.7 Classification Tree Based on CHAID |
|
|
313 | (3) |
|
6.7.1 Key Concepts and Terminology in Classification Tree Methods |
|
|
314 | (2) |
|
6.8 Applications of CHAID |
|
|
316 | (1) |
|
6.9 Discriminant Analysis Using SAS Macro DISCRIM2 |
|
|
316 | (2) |
|
6.9.1 Steps Involved in Running the DISCRIM2 Macro |
|
|
317 | (1) |
|
6.10 Decision Tree Using SAS Macro CHAID2 |
|
|
318 | (2) |
|
6.10.1 Steps Involved in Running the CHAID2 Macro |
|
|
319 | (1) |
|
6.11 Case Study 1: Canonical Discriminant Analysis and Parametric Discriminant Function Analysis |
|
|
320 | (26) |
|
|
320 | (1) |
|
6.11.2 Case Study 1: Parametric Discriminant Analysis |
|
|
321 | (25) |
|
6.11.2.1 Canonical Discriminant Analysis (CDA) |
|
|
328 | (18) |
|
6.12 Case Study 2: Nonparametric Discriminant Function Analysis |
|
|
346 | (17) |
|
|
346 | (1) |
|
|
347 | (16) |
|
6.13 Case Study 3: Classification Tree Using CHAID |
|
|
363 | (12) |
|
|
364 | (1) |
|
|
364 | (11) |
|
|
375 | (1) |
|
|
376 | (1) |
7 Advanced Analytics and Other SAS Data Mining Resources |
|
377 | (6) |
|
|
377 | (1) |
|
7.2 Artificial Neural Network Methods |
|
|
378 | (1) |
|
7.3 Market Basket Analysis |
|
|
379 | (2) |
|
|
380 | (1) |
|
7.3.2 Limitations of Market Basket Analysis |
|
|
380 | (1) |
|
7.4 SAS Software: The Leader in Data Mining |
|
|
381 | (1) |
|
|
382 | (1) |
|
|
382 | (1) |
Appendix I: Instruction for Using the SAS Macros |
|
383 | (4) |
Appendix II: Data Mining SAS Macro Help Files |
|
387 | (54) |
Appendix III: Instruction for Using the SAS Macros with Enterprise Guide Code Window |
|
441 | (2) |
Index |
|
443 | |