
E-book: Statistical Data Mining Using SAS Applications 2nd edition [Taylor & Francis e-book]

(University of Nevada, Reno, USA)
  • Taylor & Francis e-book
  • Price: 216,96 €*
  • * price that grants access for an unlimited number of simultaneous users for an unlimited time
  • Regular price: 309,94 €
  • You save 30%
Statistical Data Mining Using SAS Applications, Second Edition describes statistical data mining concepts and demonstrates the features of user-friendly data mining SAS tools. Integrating the statistical and graphical analysis tools available in SAS systems, the book provides complete statistical data mining solutions without writing SAS program code or using the point-and-click approach. Each chapter emphasizes step-by-step instructions for using SAS macros and interpreting the results. Compiled data mining SAS macro files are available for download on the author's website. By following the step-by-step instructions and downloading the SAS macros, analysts can perform complete data mining analysis quickly and effectively.

New to the Second Edition

General Features

  • Access to SAS macros directly from desktop
  • Compatible with SAS version 9, SAS Enterprise Guide, and SAS Learning Edition
  • Reorganization of all help files to an appendix
  • Ability to create publication quality graphics
  • Macro-call error check

New Features in These SAS-Specific Macro Applications

  • Converting PC data files to SAS data (EXLSAS2 macro)
  • Randomly splitting data (RANSPLIT2)
  • Frequency analysis (FREQ2)
  • Univariate analysis (UNIVAR2)
  • PCA and factor analysis (FACTOR2)
  • Multiple linear regressions (REGDIAG2)
  • Logistic regression (LOGIST2)
  • CHAID analysis (CHAID2)

Requiring no experience with SAS programming, this resource supplies instructions and tools for quickly performing exploratory statistical methods, regression analysis, logistic regression, multivariate methods, and classification analysis. It presents an accessible, SAS macro-oriented approach while offering comprehensive data mining solutions.
Preface
Acknowledgments
About the Author
1 Data Mining: A Gentle Introduction
1.1 Introduction
1.2 Data Mining: Why It Is Successful in the IT World
1.2.1 Availability of Large Databases: Data Warehousing
1.2.2 Price Drop in Data Storage and Efficient Computer Processing
1.2.3 New Advancements in Analytical Methodology
1.3 Benefits of Data Mining
1.4 Data Mining: Users
1.5 Data Mining: Tools
1.6 Data Mining: Steps
1.6.1 Identification of Problem and Defining the Data Mining Study Goal
1.6.2 Data Processing
1.6.3 Data Exploration and Descriptive Analysis
1.6.4 Data Mining Solutions: Unsupervised Learning Methods
1.6.5 Data Mining Solutions: Supervised Learning Methods
1.6.6 Model Validation
1.6.7 Interpret and Make Decisions
1.7 Problems in the Data Mining Process
1.8 SAS Software: The Leader in Data Mining
1.8.1 SEMMA: The SAS Data Mining Process
1.8.2 SAS Enterprise Miner for Comprehensive Data Mining Solution
1.9 Introduction of User-Friendly SAS Macros for Statistical Data Mining
1.9.1 Limitations of These SAS Macros
1.10 Summary
References
2 Preparing Data for Data Mining
2.1 Introduction
2.2 Data Requirements in Data Mining
2.3 Ideal Structures of Data for Data Mining
2.4 Understanding the Measurement Scale of Variables
2.5 Entire Database or Representative Sample
2.6 Sampling for Data Mining
2.6.1 Sample Size
2.7 User-Friendly SAS Applications Used in Data Preparation
2.7.1 Preparing PC Data Files before Importing into SAS Data
2.7.2 Converting PC Data Files to SAS Datasets Using the SAS Import Wizard
2.7.3 EXLSAS2 SAS Macro Application to Convert PC Data Formats to SAS Datasets
2.7.4 Steps Involved in Running the EXLSAS2 Macro
2.7.5 Case Study 1: Importing an Excel File Called "Fraud" to a Permanent SAS Dataset Called "Fraud"
2.7.6 SAS Macro Applications - RANSPLIT2: Random Sampling from the Entire Database
2.7.7 Steps Involved in Running the RANSPLIT2 Macro
2.7.8 Case Study 2: Drawing Training (400), Validation (300), and Test (All Left-Over Observations) Samples from the SAS Data Called "Fraud"
2.8 Summary
References
3 Exploratory Data Analysis
3.1 Introduction
3.2 Exploring Continuous Variables
3.2.1 Descriptive Statistics
3.2.1.1 Measures of Location or Central Tendency
3.2.1.2 Robust Measures of Location
3.2.1.3 Five-Number Summary Statistics
3.2.1.4 Measures of Dispersion
3.2.1.5 Standard Errors and Confidence Interval Estimates
3.2.1.6 Detecting Deviation from Normally Distributed Data
3.2.2 Graphical Techniques Used in EDA of Continuous Data
3.3 Data Exploration: Categorical Variable
3.3.1 Descriptive Statistical Estimates of Categorical Variables
3.3.2 Graphical Displays for Categorical Data
3.4 SAS Macro Applications Used in Data Exploration
3.4.1 Exploring Categorical Variables Using the SAS Macro FREQ2
3.4.1.1 Steps Involved in Running the FREQ2 Macro
3.4.2 Case Study 1: Exploring Categorical Variables in a SAS Dataset
3.4.3 EDA Analysis of Continuous Variables Using SAS Macro UNIVAR2
3.4.3.1 Steps Involved in Running the UNIVAR2 Macro
3.4.4 Case Study 2: Data Exploration of a Continuous Variable Using UNIVAR2
3.4.5 Case Study 3: Exploring Continuous Data by a Group Variable Using UNIVAR2
3.4.5.1 Data Descriptions
3.5 Summary
References
4 Unsupervised Learning Methods
4.1 Introduction
4.2 Applications of Unsupervised Learning Methods
4.3 Principal Component Analysis
4.3.1 PCA Terminology
4.4 Exploratory Factor Analysis
4.4.1 Exploratory Factor Analysis versus Principal Component Analysis
4.4.2 Exploratory Factor Analysis Terminology
4.4.2.1 Communalities and Uniqueness
4.4.2.2 Heywood Case
4.4.2.3 Cronbach Coefficient Alpha
4.4.2.4 Factor Analysis Methods
4.4.2.5 Sampling Adequacy Check in Factor Analysis
4.4.2.6 Estimating the Number of Factors
4.4.2.7 Eigenvalues
4.4.2.8 Factor Loadings
4.4.2.9 Factor Rotation
4.4.2.10 Confidence Intervals and the Significance of Factor Loading Converge
4.4.2.11 Standardized Factor Score
4.5 Disjoint Cluster Analysis
4.5.1 Types of Cluster Analysis
4.5.2 FASTCLUS: SAS Procedure to Perform Disjoint Cluster Analysis
4.6 Biplot Display of PCA, EFA, and DCA Results
4.7 PCA and EFA Using SAS Macro FACTOR2
4.7.1 Steps Involved in Running the FACTOR2 Macro
4.7.2 Case Study 1: Principal Component Analysis of 1993 Car Attribute Data
4.7.2.1 Study Objectives
4.7.2.2 Data Descriptions
4.7.3 Case Study 2: Maximum Likelihood FACTOR Analysis with VARIMAX Rotation of 1993 Car Attribute Data
4.7.3.1 Study Objectives
4.7.3.2 Data Descriptions
4.7.3 Case Study 3: Maximum Likelihood FACTOR Analysis with VARIMAX Rotation Using Multivariate Data in the Form of a Correlation Matrix
4.7.3.1 Study Objectives
4.7.3.2 Data Descriptions
4.8 Disjoint Cluster Analysis Using SAS Macro DISJCLS2
4.8.1 Steps Involved in Running the DISJCLS2 Macro
4.8.2 Case Study 4: Disjoint Cluster Analysis of 1993 Car Attribute Data
4.8.2.1 Study Objectives
4.8.2.2 Data Descriptions
4.9 Summary
References
5 Supervised Learning Methods: Prediction
5.1 Introduction
5.2 Applications of Supervised Predictive Methods
5.3 Multiple Linear Regression Modeling
5.3.1 Multiple Linear Regressions: Key Concepts and Terminology
5.3.2 Model Selection in Multiple Linear Regression
5.3.2.1 Best Candidate Models Selected Based on AICC and SBC
5.3.2.2 Model Selection Based on the New SAS PROC GLMSELECT
5.3.3 Exploratory Analysis Using Diagnostic Plots
5.3.4 Violations of Regression Model Assumptions
5.3.4.1 Model Specification Error
5.3.4.2 Serial Correlation among the Residual
5.3.4.3 Influential Outliers
5.3.4.4 Multicollinearity
5.3.4.5 Heteroscedasticity in Residual Variance
5.3.4.6 Nonnormality of Residuals
5.3.5 Regression Model Validation
5.3.6 Robust Regression
5.3.7 Survey Regression
5.4 Binary Logistic Regression Modeling
5.4.1 Terminology and Key Concepts
5.4.2 Model Selection in Logistic Regression
5.4.3 Exploratory Analysis Using Diagnostic Plots
5.4.3.1 Interpretation
5.4.3.2 Two-Factor Interaction Plots between Continuous Variables
5.4.4 Checking for Violations of Regression Model Assumptions
5.4.4.1 Model Specification Error
5.4.4.2 Influential Outlier
5.4.4.3 Multicollinearity
5.4.4.4 Overdispersion
5.5 Ordinal Logistic Regression
5.6 Survey Logistic Regression
5.7 Multiple Linear Regression Using SAS Macro REGDIAG2
5.7.1 Steps Involved in Running the REGDIAG2 Macro
5.8 Lift Chart Using SAS Macro LIFT2
5.8.1 Steps Involved in Running the LIFT2 Macro
5.9 Scoring New Regression Data Using the SAS Macro RSCORE2
5.9.1 Steps Involved in Running the RSCORE2 Macro
5.10 Logistic Regression Using SAS Macro LOGIST2
5.11 Scoring New Logistic Regression Data Using the SAS Macro RSCORE
5.12 Case Study 1: Modeling Multiple Linear Regressions
5.12.1 Study Objectives
5.12.1.1 Step 1: Preliminary Model Selection
5.12.1.2 Step 2: Graphical Exploratory Analysis and Regression Diagnostic Plots
5.12.1.3 Step 3: Fitting the Regression Model and Checking for the Violations of Regression Assumptions
5.12.1.4 Remedial Measure: Robust Regression to Adjust the Regression Parameter Estimates to Extreme Outliers
5.13 Case Study 2: If-Then Analysis and Lift Charts
5.13.1 Data Descriptions
5.14 Case Study 3: Modeling Multiple Linear Regression with Categorical Variables
5.14.1 Study Objectives
5.14.2 Data Descriptions
5.15 Case Study 4: Modeling Binary Logistic Regression
5.15.1 Study Objectives
5.15.2 Data Descriptions
5.15.2.1 Step 1: Best Candidate Model Selection
5.15.2.2 Step 2: Exploratory Analysis/Diagnostic Plots
5.15.2.3 Step 3: Fitting Binary Logistic Regression
5.16 Case Study 5: Modeling Binary Multiple Logistic Regression
5.16.1 Study Objectives
5.16.2 Data Descriptions
5.17 Case Study 6: Modeling Ordinal Multiple Logistic Regression
5.17.1 Study Objectives
5.17.2 Data Descriptions
5.18 Summary
References
6 Supervised Learning Methods: Classification
6.1 Introduction
6.2 Discriminant Analysis
6.3 Stepwise Discriminant Analysis
6.4 Canonical Discriminant Analysis
6.4.1 Canonical Discriminant Analysis Assumptions
6.4.2 Key Concepts and Terminology in Canonical Discriminant Analysis
6.5 Discriminant Function Analysis
6.5.1 Key Concepts and Terminology in Discriminant Function Analysis
6.6 Applications of Discriminant Analysis
6.7 Classification Tree Based on CHAID
6.7.1 Key Concepts and Terminology in Classification Tree Methods
6.8 Applications of CHAID
6.9 Discriminant Analysis Using SAS Macro DISCRIM2
6.9.1 Steps Involved in Running the DISCRIM2 Macro
6.10 Decision Tree Using SAS Macro CHAID2
6.10.1 Steps Involved in Running the CHAID2 Macro
6.11 Case Study 1: Canonical Discriminant Analysis and Parametric Discriminant Function Analysis
6.11.1 Study Objectives
6.11.2 Case Study 1: Parametric Discriminant Analysis
6.11.2.1 Canonical Discriminant Analysis (CDA)
6.12 Case Study 2: Nonparametric Discriminant Function Analysis
6.12.1 Study Objectives
6.12.2 Data Descriptions
6.13 Case Study 3: Classification Tree Using CHAID
6.13.1 Study Objectives
6.13.2 Data Descriptions
6.14 Summary
References
7 Advanced Analytics and Other SAS Data Mining Resources
7.1 Introduction
7.2 Artificial Neural Network Methods
7.3 Market Basket Analysis
7.3.1 Benefits of MBA
7.3.2 Limitations of Market Basket Analysis
7.4 SAS Software: The Leader in Data Mining
7.5 Summary
References
Appendix I: Instruction for Using the SAS Macros
Appendix II: Data Mining SAS Macro Help Files
Appendix III: Instruction for Using the SAS Macros with Enterprise Guide Code Window
Index
George Fernandez is a professor of applied statistical methods and the director of the Center for Research Design and Analysis at the University of Nevada in Reno.