1 Introduction |
|
1 | |
Part I Methodology |
|
5 | |
|
2 Organisation of the data |
|
|
7 | |
|
2.1 Statistical units and statistical variables |
|
|
7 | |
|
2.2 Data matrices and their transformations |
|
|
9 | |
|
2.3 Complex data structures |
|
|
10 | |
|
|
11 | |
|
|
13 | |
|
3.1 Univariate exploratory analysis |
|
|
13 | |
|
3.1.1 Measures of location |
|
|
13 | |
|
3.1.2 Measures of variability |
|
|
15 | |
|
3.1.3 Measures of heterogeneity |
|
|
16 | |
|
3.1.4 Measures of concentration |
|
|
17 | |
|
3.1.5 Measures of asymmetry |
|
|
19 | |
|
3.1.6 Measures of kurtosis |
|
|
20 | |
|
3.2 Bivariate exploratory analysis of quantitative data |
|
|
22 | |
|
3.3 Multivariate exploratory analysis of quantitative data |
|
|
25 | |
|
3.4 Multivariate exploratory analysis of qualitative data |
|
|
27 | |
|
3.4.1 Independence and association |
|
|
28 | |
|
|
29 | |
|
3.4.3 Dependency measures |
|
|
31 | |
|
3.4.4 Model-based measures |
|
|
32 | |
|
3.5 Reduction of dimensionality |
|
|
34 | |
|
3.5.1 Interpretation of the principal components |
|
|
36 | |
|
|
39 | |
|
|
41 | |
|
|
42 | |
|
|
43 | |
|
4.1.2 Similarity measures |
|
|
44 | |
|
4.1.3 Multidimensional scaling |
|
|
46 | |
|
|
47 | |
|
4.2.1 Hierarchical methods |
|
|
49 | |
|
4.2.2 Evaluation of hierarchical methods |
|
|
53 | |
|
4.2.3 Non-hierarchical methods |
|
|
55 | |
|
|
57 | |
|
4.3.1 Bivariate linear regression |
|
|
57 | |
|
4.3.2 Properties of the residuals |
|
|
60 | |
|
|
62 | |
|
4.3.4 Multiple linear regression |
|
|
63 | |
|
|
67 | |
|
4.4.1 Interpretation of logistic regression |
|
|
68 | |
|
4.4.2 Discriminant analysis |
|
|
70 | |
|
|
71 | |
|
|
73 | |
|
|
74 | |
|
|
76 | |
|
4.6.1 Architecture of a neural network |
|
|
79 | |
|
4.6.2 The multilayer perceptron |
|
|
81 | |
|
|
87 | |
|
4.7 Nearest-neighbour models |
|
|
89 | |
|
|
90 | |
|
|
90 | |
|
4.8.2 Retrieval by content |
|
|
96 | |
|
4.9 Uncertainty measures and inference |
|
|
96 | |
|
|
97 | |
|
|
99 | |
|
4.9.3 Statistical inference |
|
|
103 | |
|
4.10 Non-parametric modelling |
|
|
109 | |
|
4.11 The normal linear model |
|
|
112 | |
|
4.11.1 Main inferential results |
|
|
113 | |
|
4.12 Generalised linear models |
|
|
116 | |
|
4.12.1 The exponential family |
|
|
117 | |
|
4.12.2 Definition of generalised linear models |
|
|
118 | |
|
4.12.3 The logistic regression model |
|
|
125 | |
|
|
126 | |
|
4.13.1 Construction of a log-linear model |
|
|
126 | |
|
4.13.2 Interpretation of a log-linear model |
|
|
128 | |
|
4.13.3 Graphical log-linear models |
|
|
129 | |
|
4.13.4 Log-linear model comparison |
|
|
132 | |
|
|
133 | |
|
4.14.1 Symmetric graphical models |
|
|
135 | |
|
4.14.2 Recursive graphical models |
|
|
139 | |
|
4.14.3 Graphical models and neural networks |
|
|
141 | |
|
4.15 Survival analysis models |
|
|
142 | |
|
|
144 | |
|
|
147 | |
|
5.1 Criteria based on statistical tests |
|
|
148 | |
|
5.1.1 Distance between statistical models |
|
|
148 | |
|
5.1.2 Discrepancy of a statistical model |
|
|
150 | |
|
5.1.3 Kullback–Leibler discrepancy |
|
|
151 | |
|
5.2 Criteria based on scoring functions |
|
|
153 | |
|
|
155 | |
|
5.4 Computational criteria |
|
|
156 | |
|
5.5 Criteria based on loss functions |
|
|
159 | |
|
|
162 | |
Part II Business case studies |
|
163 | |
|
6 Describing website visitors |
|
|
165 | |
|
6.1 Objectives of the analysis |
|
|
165 | |
|
6.2 Description of the data |
|
|
165 | |
|
|
167 | |
|
|
167 | |
|
|
168 | |
|
|
169 | |
|
|
171 | |
|
|
172 | |
|
|
175 | |
|
7.1 Objectives of the analysis |
|
|
175 | |
|
7.2 Description of the data |
|
|
176 | |
|
7.3 Exploratory data analysis |
|
|
178 | |
|
|
181 | |
|
|
181 | |
|
|
184 | |
|
|
186 | |
|
|
191 | |
|
8 Describing customer satisfaction |
|
|
193 | |
|
8.1 Objectives of the analysis |
|
|
193 | |
|
8.2 Description of the data |
|
|
194 | |
|
8.3 Exploratory data analysis |
|
|
194 | |
|
|
197 | |
|
|
201 | |
|
9 Predicting credit risk of small businesses |
|
|
203 | |
|
9.1 Objectives of the analysis |
|
|
203 | |
|
9.2 Description of the data |
|
|
203 | |
|
9.3 Exploratory data analysis |
|
|
205 | |
|
|
206 | |
|
|
209 | |
|
|
210 | |
|
10 Predicting e-learning student performance |
|
|
211 | |
|
10.1 Objectives of the analysis |
|
|
211 | |
|
10.2 Description of the data |
|
|
212 | |
|
10.3 Exploratory data analysis |
|
|
212 | |
|
|
214 | |
|
|
217 | |
|
|
218 | |
|
11 Predicting customer lifetime value |
|
|
219 | |
|
11.1 Objectives of the analysis |
|
|
219 | |
|
11.2 Description of the data |
|
|
220 | |
|
11.3 Exploratory data analysis |
|
|
221 | |
|
|
223 | |
|
|
224 | |
|
|
225 | |
|
12 Operational risk management |
|
|
227 | |
|
12.1 Context and objectives of the analysis |
|
|
227 | |
|
12.2 Exploratory data analysis |
|
|
228 | |
|
|
230 | |
|
|
232 | |
|
|
235 | |
References |
|
237 | |
Index |
|
243 | |