1 Introduction |
|
1 | (24) |
|
1.1 The Concept of the SPSS Modeler |
|
|
2 | (3) |
|
1.2 Structure and Features of This Book |
|
|
5 | (8) |
|
1.2.1 Prerequisites for Using This Book |
|
|
5 | (1) |
|
1.2.2 Structure of the Book and the Exercise/Solution Concept |
|
|
6 | (2) |
|
1.2.3 Using the Data and Streams Provided with the Book |
|
|
8 | (1) |
|
1.2.4 Datasets Provided with This Book |
|
|
9 | (1) |
|
1.2.5 Template Concept of This Book |
|
|
10 | (3) |
|
1.3 Introducing the Modeling Process |
|
|
13 | (9) |
|
|
16 | (2) |
|
|
18 | (4) |
|
|
22 | (3) |
2 Basic Functions of the SPSS Modeler |
|
25 | (160) |
|
2.1 Defining Streams and Scrolling Through a Dataset |
|
|
25 | (7) |
|
2.2 Switching Between Different Streams |
|
|
32 | (2) |
|
2.3 Defining or Modifying Value Labels |
|
|
34 | (6) |
|
2.4 Adding Comments to a Stream |
|
|
40 | (3) |
|
|
43 | (1) |
|
|
44 | (5) |
|
2.7 Data Handling and Sampling Methods |
|
|
49 | (135) |
|
|
49 | (1) |
|
|
50 | (6) |
|
|
56 | (5) |
|
2.7.4 Extracting/Selecting Records |
|
|
61 | (4) |
|
|
65 | (8) |
|
2.7.6 Data Standardization: Z-Transformation |
|
|
73 | (9) |
|
2.7.7 Partitioning Datasets |
|
|
82 | (6) |
|
|
88 | (23) |
|
|
111 | (13) |
|
|
124 | (8) |
|
|
132 | (15) |
|
|
147 | (37) |
|
|
184 | (1) |
3 Univariate Statistics |
|
185 | (102) |
|
|
185 | (9) |
|
3.1.1 Discrete Versus Continuous Variables |
|
|
185 | (2) |
|
3.1.2 Scales of Measurement |
|
|
187 | (1) |
|
|
188 | (3) |
|
|
191 | (3) |
|
3.2 Simple Data Examination Tasks |
|
|
194 | (92) |
|
|
194 | (1) |
|
3.2.2 Frequency Distribution of Discrete Variables |
|
|
194 | (5) |
|
3.2.3 Frequency Distribution of Continuous Variables |
|
|
199 | (3) |
|
3.2.4 Distribution Analysis with the Data Audit Node |
|
|
202 | (5) |
|
3.2.5 Concept of "SuperNodes" and Transforming a Variable to Normality |
|
|
207 | (17) |
|
3.2.6 Reclassifying Values |
|
|
224 | (12) |
|
3.2.7 Binning Continuous Data |
|
|
236 | (12) |
|
|
248 | (11) |
|
|
259 | (27) |
|
|
286 | (1) |
4 Multivariate Statistics |
|
287 | (60) |
|
|
287 | (3) |
|
|
290 | (6) |
|
|
296 | (6) |
|
|
302 | (8) |
|
|
310 | (4) |
|
4.6 Exclusion of Spurious Correlations |
|
|
314 | (1) |
|
|
315 | (8) |
|
|
323 | (2) |
|
|
325 | (20) |
|
|
345 | (2) |
5 Regression Models |
|
347 | (166) |
|
5.1 Introduction to Regression Models |
|
|
348 | (5) |
|
5.1.1 Motivating Examples |
|
|
348 | (2) |
|
5.1.2 Concept of the Modeling Process and Cross-Validation |
|
|
350 | (3) |
|
5.2 Simple Linear Regression |
|
|
353 | (37) |
|
|
353 | (3) |
|
5.2.2 Building the Stream in SPSS Modeler |
|
|
356 | (4) |
|
5.2.3 Identification and Interpretation of the Model Parameters |
|
|
360 | (2) |
|
5.2.4 Assessment of the Goodness of Fit |
|
|
362 | (3) |
|
5.2.5 Predicting Unknown Values |
|
|
365 | (2) |
|
|
367 | (2) |
|
|
369 | (21) |
|
5.3 Multiple Linear Regression |
|
|
390 | (58) |
|
|
390 | (2) |
|
5.3.2 Building the Model in SPSS Modeler |
|
|
392 | (5) |
|
5.3.3 Final MLR Model and Its Goodness of Fit |
|
|
397 | (7) |
|
5.3.4 Prediction of Unknown Values |
|
|
404 | (1) |
|
5.3.5 Cross-Validation of the Model |
|
|
404 | (2) |
|
5.3.6 Boosting and Bagging (for Regression Models) |
|
|
406 | (9) |
|
|
415 | (3) |
|
|
418 | (30) |
|
5.4 Generalized Linear (Mixed) Model |
|
|
448 | (40) |
|
|
448 | (2) |
|
5.4.2 Building a Model with the GLMM Node |
|
|
450 | (5) |
|
|
455 | (3) |
|
5.4.4 Cross-Validation and Fitting a Quadric Regression Model |
|
|
458 | (10) |
|
|
468 | (1) |
|
|
469 | (19) |
|
5.5 The Auto Numeric Node |
|
|
488 | (23) |
|
5.5.1 Building a Stream with the Auto Numeric Node |
|
|
490 | (7) |
|
5.5.2 The Auto Numeric Model Nugget |
|
|
497 | (3) |
|
|
500 | (1) |
|
|
500 | (11) |
|
|
511 | (2) |
6 Factor Analysis |
|
513 | (74) |
|
|
513 | (2) |
|
6.2 General Theory of Factor Analysis |
|
|
515 | (4) |
|
6.3 Principal Component Analysis |
|
|
519 | (50) |
|
|
519 | (1) |
|
6.3.2 Building a Model in SPSS Modeler |
|
|
520 | (27) |
|
|
547 | (3) |
|
|
550 | (19) |
|
6.4 Principal Factor Analysis |
|
|
569 | (15) |
|
|
569 | (4) |
|
|
573 | (6) |
|
|
579 | (1) |
|
|
579 | (5) |
|
|
584 | (3) |
7 Cluster Analysis |
|
587 | (126) |
|
|
587 | (2) |
|
7.2 General Theory of Cluster Analysis |
|
|
589 | (12) |
|
|
596 | (2) |
|
|
598 | (3) |
|
7.3 TwoStep Hierarchical Agglomerative Clustering |
|
|
601 | (39) |
|
7.3.1 Theory of Hierarchical Clustering |
|
|
601 | (13) |
|
7.3.2 Characteristics of the TwoStep Algorithm |
|
|
614 | (1) |
|
7.3.3 Building a Model in SPSS Modeler |
|
|
615 | (12) |
|
|
627 | (2) |
|
|
629 | (11) |
|
7.4 K-Means Partitioning Clustering |
|
|
640 | (45) |
|
|
640 | (2) |
|
7.4.2 Building a Model in SPSS Modeler |
|
|
642 | (17) |
|
|
659 | (3) |
|
|
662 | (23) |
|
|
685 | (25) |
|
7.5.1 Motivation and Implementation of the Auto Cluster Node |
|
|
685 | (2) |
|
7.5.2 Building a Model in SPSS Modeler |
|
|
687 | (12) |
|
|
699 | (1) |
|
|
700 | (10) |
|
|
710 | (1) |
|
|
711 | (2) |
8 Classification Models |
|
713 | (272) |
|
|
714 | (2) |
|
8.2 General Theory of Classification Models |
|
|
716 | (17) |
|
8.2.1 Process of Training and Using a Classification Model |
|
|
716 | (2) |
|
8.2.2 Classification Algorithms |
|
|
718 | (2) |
|
8.2.3 Classification vs. Clustering |
|
|
720 | (1) |
|
8.2.4 Making a Decision and the Decision Boundary |
|
|
721 | (2) |
|
8.2.5 Performance Measures of Classification Models |
|
|
723 | (2) |
|
|
725 | (2) |
|
|
727 | (3) |
|
|
730 | (3) |
|
|
733 | (43) |
|
|
734 | (2) |
|
8.3.2 Building the Model in SPSS Modeler |
|
|
736 | (7) |
|
8.3.3 Optional: Model Types and Variable Interactions |
|
|
743 | (3) |
|
8.3.4 Final Model and Its Goodness of Fit |
|
|
746 | (4) |
|
8.3.5 Classification of Unknown Values |
|
|
750 | (1) |
|
8.3.6 Cross-Validation of the Model |
|
|
751 | (5) |
|
|
756 | (2) |
|
|
758 | (18) |
|
8.4 Linear Discriminate Classification |
|
|
776 | (32) |
|
|
776 | (3) |
|
8.4.2 Building the Model with SPSS Modeler |
|
|
779 | (6) |
|
8.4.3 The Model Nugget and the Estimated Model Parameters |
|
|
785 | (3) |
|
|
788 | (1) |
|
|
789 | (19) |
|
8.5 Support Vector Machine |
|
|
808 | (35) |
|
|
809 | (1) |
|
8.5.2 Building the Model with SPSS Modeler |
|
|
810 | (10) |
|
|
820 | (1) |
|
|
821 | (1) |
|
|
822 | (21) |
|
|
843 | (35) |
|
|
844 | (2) |
|
8.6.2 Building a Network with SPSS Modeler |
|
|
846 | (10) |
|
|
856 | (4) |
|
|
860 | (2) |
|
|
862 | (16) |
|
|
878 | (39) |
|
|
878 | (4) |
|
8.7.2 Building the Model with SPSS Modeler |
|
|
882 | (9) |
|
|
891 | (2) |
|
8.7.4 Dimensional Reduction with PCA for Data Preprocessing |
|
|
893 | (8) |
|
|
901 | (2) |
|
|
903 | (14) |
|
|
917 | (43) |
|
|
917 | (8) |
|
8.8.2 Building a Decision Tree with the C5.0 Node |
|
|
925 | (4) |
|
|
929 | (3) |
|
8.8.4 Building a Decision Tree with the CHAID Node |
|
|
932 | (6) |
|
|
938 | (1) |
|
|
939 | (21) |
|
8.9 The Auto Classifier Node |
|
|
960 | (23) |
|
8.9.1 Building a Stream with the Auto Classifier Node |
|
|
961 | (10) |
|
8.9.2 The Auto Classifier Model Nugget |
|
|
971 | (2) |
|
|
973 | (1) |
|
|
974 | (9) |
|
|
983 | (2) |
9 Using R with the Modeler |
|
985 | (52) |
|
9.1 Advantages of R with the Modeler |
|
|
985 | (1) |
|
|
986 | (4) |
|
9.3 Test the SPSS Modeler Connection to R |
|
|
990 | (4) |
|
9.4 Calculating New Variables in R |
|
|
994 | (5) |
|
|
999 | (9) |
|
|
1008 | (10) |
|
|
1018 | (17) |
|
|
1035 | (2) |
10 Appendix |
|
1037 | |
|
10.1 Data Sets Used in This Book |
|
|
1037 | (20) |
|
10.1.1 adult_income_data.txt |
|
|
1037 | (1) |
|
|
1037 | (1) |
|
|
1037 | (2) |
|
|
1039 | (1) |
|
10.1.5 car_sales_modified.sav |
|
|
1039 | (1) |
|
10.1.6 chess_endgame_data.txt |
|
|
1039 | (1) |
|
10.1.7 customer_bank_data.csv |
|
|
1040 | (1) |
|
10.1.8 diabetes_data_reduced.sav |
|
|
1040 | (1) |
|
|
1041 | (1) |
|
10.1.10 EEG_Sleep_Signals.csv |
|
|
1042 | (1) |
|
10.1.11 employee_dataset_001 and employee_dataset_002 |
|
|
1042 | (1) |
|
10.1.12 England Payment Datasets |
|
|
1042 | (2) |
|
10.1.13 Features_eeg_signals.csv |
|
|
1044 | (1) |
|
10.1.14 gene_expression_leukemia.csv |
|
|
1044 | (1) |
|
10.1.15 gene_expression_leukemia_short.csv |
|
|
1045 | (1) |
|
10.1.16 gravity_constant_data.csv |
|
|
1045 | (1) |
|
|
1046 | (1) |
|
|
1046 | (1) |
|
|
1047 | (1) |
|
10.1.20 IT user satisfaction.sav |
|
|
1047 | (1) |
|
|
1047 | (2) |
|
|
1049 | (1) |
|
|
1050 | (1) |
|
10.1.24 nutrition_habites.sav |
|
|
1051 | (1) |
|
10.1.25 optdigits_training.txt, optdigits_test.txt |
|
|
1051 | (1) |
|
|
1052 | (1) |
|
|
1052 | (1) |
|
10.1.28 pisa2012_math_q45.sav |
|
|
1052 | (2) |
|
|
1054 | (1) |
|
|
1054 | (1) |
|
|
1054 | (1) |
|
|
1055 | (1) |
|
|
1055 | (1) |
|
|
1056 | (1) |
|
10.1.35 WisconsinBreastCancerData.csv |
|
|
1056 | (1) |
|
10.1.36 z_pm_customer1.sav |
|
|
1057 | (1) |
|
|
1057 | |