List of Figures
List of Tables
Preface
1 Introduction
1.2 What Is a Statistical Model?
1.5 Characteristics of Good Modelers
1.6 The Future of Predictive Analytics
2 Properties of Statistical Distributions
2.1 Fundamental Distributions
2.1.1 Uniform Distribution
2.1.2 Details of the Normal (Gaussian) Distribution
2.1.3 Lognormal Distribution
2.1.5 Chi-Squared Distribution
2.1.6 Non-Central Chi-Squared Distribution
2.1.7 Student's t-Distribution
2.1.8 Multivariate t-Distribution
2.1.10 Binomial Distribution
2.1.11 Poisson Distribution
2.1.12 Exponential Distribution
2.1.13 Geometric Distribution
2.1.14 Hypergeometric Distribution
2.1.15 Negative Binomial Distribution
2.1.16 Inverse Gaussian (IG) Distribution
2.1.17 Normal Inverse Gaussian (NIG) Distribution
2.2 Central Limit Theorem
2.3 Estimate of Mean, Variance, Skewness, and Kurtosis from Sample Data
2.4 Estimate of the Standard Deviation of the Sample Mean
2.5 (Pseudo) Random Number Generators
2.5.1 Mersenne Twister Pseudorandom Number Generator
2.5.2 Box-Muller Transform for Generating a Normal Distribution
2.6 Transformation of a Distribution Function
2.7 Distribution of a Function of Random Variables
2.7.3 (Z₁, Z₂, ..., Zₙ) = (X₁, X₂, ..., Xₙ) · Y
2.8 Moment Generating Function
2.8.1 Moment Generating Function of Binomial Distribution
2.8.2 Moment Generating Function of Normal Distribution
2.8.3 Moment Generating Function of the Γ Distribution
2.8.4 Moment Generating Function of Chi-Square Distribution
2.8.5 Moment Generating Function of the Poisson Distribution
2.9 Cumulant Generating Function
2.10 Characteristic Function
2.10.1 Relationship between Cumulative Function and Characteristic Function
2.10.2 Characteristic Function of Normal Distribution
2.10.3 Characteristic Function of F Distribution
2.11 Chebyshev's Inequality
2.13 Gram-Charlier Series
2.15 Cornish-Fisher Expansion
2.15.1 Lagrange Inversion Theorem
2.15.2 Cornish-Fisher Expansion
2.16.3 Archimedean Copula
3 Important Matrix Relationships
3.1 Pseudo-Inverse of a Matrix
3.2 A Lemma of Matrix Inversion
3.3 Identity for a Matrix Determinant
3.4 Inversion of Partitioned Matrix
3.5 Determinant of Partitioned Matrix
3.6 Matrix Sweep and Partial Correlation
3.7 Singular Value Decomposition (SVD)
3.8 Diagonalization of a Matrix
3.9 Spectral Decomposition of a Positive Semi-Definite Matrix
3.10 Normalization in Vector Space
3.11 Conjugate Decomposition of a Symmetric Definite Matrix
3.12 Cholesky Decomposition
3.13 Cauchy-Schwarz Inequality
3.14 Relationship of Correlation among Three Variables
4 Linear Modeling and Regression
4.1 Properties of Maximum Likelihood Estimators
4.1.1 Likelihood Ratio Test
4.1.3 Lagrange Multiplier Statistic
4.2.1 Ordinary Least Squares (OLS) Regression
4.2.2 Interpretation of the Coefficients of Linear Regression
4.2.3 Regression on Weighted Data
4.2.4 Incrementally Updating a Regression Model with Additional Data
4.2.5 Partitioned Regression
4.2.6 How Does the Regression Change When Adding One More Variable?
4.2.7 Linearly Restricted Least Squares Regression
4.2.8 Significance of the Correlation Coefficient
4.2.9 Partial Correlation
4.3 Fisher's Linear Discriminant Analysis
4.4 Principal Component Regression (PCR)
4.6 Partial Least Squares Regression (PLSR)
4.7 Generalized Linear Model (GLM)
4.8 Logistic Regression: Binary
4.9 Logistic Regression: Multiple Nominal
4.10 Logistic Regression: Proportional Multiple Ordinal
4.11 Fisher Scoring Method for Logistic Regression
4.12 Tobit Model: A Censored Regression Model
4.12.1 Some Properties of the Normal Distribution
4.12.2 Formulation of the Tobit Model
5 Nonlinear Modeling
5.1 Naive Bayesian Classifier
5.2.1 Back Propagation Neural Network
5.3 Segmentation and Tree Models
5.3.3 Sweeping to Find the Best Cutpoint
5.3.4 Impurity Measure of a Population: Entropy and Gini Index
5.3.5 Chi-Square Splitting Rule
5.3.6 Implementation of Decision Trees
5.4.2 Least Squares Regression Boosting Tree
5.4.3 Binary Logistic Regression Boosting Tree
5.5 Support Vector Machine (SVM)
5.5.2 Linearly Separable Problem
5.5.3 Linearly Inseparable Problem
5.5.4 Constructing Higher-Dimensional Space and Kernel
5.5.6 C-Support Vector Classification (C-SVC) for Classification
5.5.7 ε-Support Vector Regression (ε-SVR) for Regression
5.5.8 The Probability Estimate
5.6.1 A Simple Fuzzy Logic System
5.7.1 K Means, Fuzzy C Means
5.7.2 Nearest Neighbor, K Nearest Neighbor (KNN)
5.7.3 Comments on Clustering Methods
6 Time Series Analysis
6.1 Fundamentals of Forecasting
6.1.1 Box-Cox Transformation
6.1.2 Smoothing Algorithms
6.1.3 Convolution of Linear Filters
6.1.4 Linear Difference Equation
6.1.5 The Autocovariance Function and Autocorrelation Function
6.1.6 The Partial Autocorrelation Function
6.3 Survival Data Analysis
6.4 Exponentially Weighted Moving Average (EWMA) and GARCH(1, 1)
6.4.1 Exponentially Weighted Moving Average (EWMA)
6.4.2 ARCH and GARCH Models
7 Data Preparation and Variable Selection
7.1 Data Quality and Exploration
7.2 Variable Scaling and Transformation
7.4 Interpolation in One and Two Dimensions
7.5 Weight of Evidence (WOE) Transformation
7.6 Variable Selection Overview
7.7 Missing Data Imputation
7.8 Stepwise Selection Methods
7.8.1 Forward Selection in Linear Regression
7.8.2 Forward Selection in Logistic Regression
7.9 Mutual Information, KL Distance
7.10 Detection of Multicollinearity
8 Model Goodness Measures
8.1 Training, Testing, Validation
8.2 Continuous Dependent Variable
8.2.1 Example: Linear Regression
8.3 Binary Dependent Variable (Two-Group Classification)
8.3.1 Kolmogorov-Smirnov (KS) Statistic
8.3.3 Concordant and Discordant
8.3.4 R² for Logistic Regression
8.3.6 Hosmer-Lemeshow Goodness-of-Fit Test
8.3.7 Example: Logistic Regression
8.4 Population Stability Index Using Relative Entropy
9 Optimization Methods
9.2 Gradient Descent Method
9.3 Newton-Raphson Method
9.4 Conjugate Gradient Method
9.6 Genetic Algorithms (GA)
9.9 Nonlinear Programming (NLP)
9.9.1 General Nonlinear Programming (GNLP)
9.9.2 Lagrange Dual Problem
9.9.3 Quadratic Programming (QP)
9.9.4 Linear Complementarity Programming (LCP)
9.9.5 Sequential Quadratic Programming (SQP)
9.11 Expectation-Maximization (EM) Algorithm
9.12 Optimal Design of Experiment
10 Miscellaneous Topics
10.1 Multidimensional Scaling
10.3 Odds Normalization and Score Transformation
10.5 Dempster-Shafer Theory of Evidence
10.5.1 Some Properties in Set Theory
10.5.2 Basic Probability Assignment, Belief Function, and Plausibility Function
10.5.3 Dempster-Shafer's Rule of Combination
10.5.4 Applications of Dempster-Shafer Theory of Evidence: Multiple Classifier Function
Appendix A Useful Mathematical Relations
A.1 Information Inequality
A.5 Convex Function and Jensen's Inequality
Appendix B DataMinerXL - Microsoft Excel Add-In for Building Predictive Models
B.3 Data Manipulation Functions
B.4 Basic Statistical Functions
B.5 Modeling Functions for All Models
B.6 Weight of Evidence Transformation Functions
B.7 Linear Regression Functions
B.8 Partial Least Squares Regression Functions
B.9 Logistic Regression Functions
B.10 Time Series Analysis Functions
B.11 Naive Bayes Classifier Functions
B.12 Tree-Based Model Functions
B.13 Clustering and Segmentation Functions
B.14 Neural Network Functions
B.15 Support Vector Machine Functions
B.16 Optimization Functions
B.17 Matrix Operation Functions
B.18 Numerical Integration Functions
B.19 Excel Built-in Statistical Distribution Functions
Bibliography
Index