1 Introduction ..... 1
1.1 The Statistical Modeling Cycle ..... 1
1.2 Preliminaries on Probability Theory ..... 3
1.3 Lab: Exploratory Data Analysis ..... 7
1.4 Outline of This Book ..... 9

2 Exponential Dispersion Family ..... 13
2.1 Exponential Family ..... 13
2.1.1 Definition and Properties ..... 13
2.1.2 Single-Parameter Linear EF: Count Variable Examples ..... 18
2.1.3 Vector-Valued Parameter EF: Absolutely Continuous Examples ..... 20
2.1.4 Vector-Valued Parameter EF: Count Variable Example ..... 27
2.2 Exponential Dispersion Family ..... 28
2.2.1 Definition and Properties ..... 28
2.2.2 Exponential Dispersion Family Examples ..... 31
2.2.3 Tweedie's Distributions ..... 34
2.2.4 Steepness of the Cumulant Function ..... 37
2.2.5 Lab: Large Claims Modeling ..... 38
2.3 Information Geometry in Exponential Families ..... 40
2.3.1 Kullback–Leibler Divergence ..... 40
2.3.2 Unit Deviance and Bregman Divergence ..... 42

3 Estimation Theory ..... 49
3.1 Introduction to Decision Theory ..... 49
3.2 Parameter Estimation ..... 51
3.3 Unbiased Estimators ..... 56
3.3.1 Cramér–Rao Information Bound ..... 56
3.3.2 Information Bound in the Exponential Family Case ..... 62
3.4 Asymptotic Behavior of Estimators ..... 67
3.4.1 Consistency ..... 67
3.4.2 Asymptotic Normality ..... 69

4 Predictive Modeling and Forecast Evaluation ..... 75
4.1 Generalization Loss ..... 75
4.1.1 Mean Squared Error of Prediction ..... 76
4.1.2 Unit Deviances and Deviance Generalization Loss ..... 79
4.1.3 A Decision-Theoretic Approach to Forecast Evaluation ..... 88
4.2 Cross-Validation ..... 95
4.2.1 In-Sample and Out-of-Sample Losses ..... 95
4.2.2 Cross-Validation Techniques ..... 98
4.2.3 Akaike's Information Criterion ..... 103
4.3 Bootstrap ..... 106
4.3.1 Non-parametric Bootstrap Simulation ..... 106
4.3.2 Parametric Bootstrap Simulation ..... 109

5 Generalized Linear Models ..... 111
5.1 Generalized Linear Models and Log-Likelihoods ..... 112
5.1.1 Regression Modeling ..... 112
5.1.2 Definition of Generalized Linear Models ..... 113
5.1.3 Link Functions and Feature Engineering ..... 115
5.1.4 Log-Likelihood Function and Maximum Likelihood Estimation ..... 116
5.1.5 Balance Property Under the Canonical Link Choice ..... 122
5.1.6 Asymptotic Normality ..... 123
5.1.7 Maximum Likelihood Estimation and Unit Deviances ..... 124
5.2 Actuarial Applications of Generalized Linear Models ..... 126
5.2.1 Selection of a Generalized Linear Model ..... 126
5.2.2 Feature Engineering ..... 127
5.2.3 Offsets ..... 132
5.2.4 Lab: Poisson GLM for Car Insurance Frequencies ..... 133
5.3 Model Validation ..... 141
5.3.1 Residuals and Dispersion ..... 141
5.3.2 Hypothesis Testing ..... 145
5.3.3 Analysis of Variance ..... 147
5.3.4 Lab: Poisson GLM for Car Insurance Frequencies, Revisited ..... 150
5.3.5 Over-Dispersion in Claim Counts Modeling ..... 155
5.3.6 Zero-Inflated Poisson Model ..... 162
5.3.7 Lab: Gamma GLM for Claim Sizes ..... 167
5.3.8 Lab: Inverse Gaussian GLM for Claim Sizes ..... 173
5.3.9 Log-Normal Model for Claim Sizes: A Short Discussion ..... 176
5.4 Quasi-Likelihoods ..... 180
5.5 Double Generalized Linear Model ..... 182
5.5.1 The Dispersion Submodel ..... 182
5.5.2 Saddlepoint Approximation ..... 183
5.5.3 Residual Maximum Likelihood Estimation ..... 186
5.5.4 Lab: Double GLM Algorithm for Gamma Claim Sizes ..... 187
5.5.5 Tweedie's Compound Poisson GLM ..... 189
5.6 Diagnostic Tools ..... 190
5.6.1 The Hat Matrix ..... 190
5.6.2 Case Deletion and Generalized Cross-Validation ..... 192
5.7 Generalized Linear Models with Categorical Responses ..... 195
5.7.1 Logistic Categorical Generalized Linear Model ..... 195
5.7.2 Maximum Likelihood Estimation in Categorical Models ..... 196
5.8 Further Topics of Regression Modeling ..... 198
5.8.1 Longitudinal Data and Random Effects ..... 198
5.8.2 Regression Models Beyond the GLM Framework ..... 199
5.8.3 Quantile Regression ..... 202

6 Bayesian Methods, Regularization and Expectation-Maximization ..... 207
6.1 Bayesian Parameter Estimation ..... 207
6.2 Regularization ..... 210
6.2.1 Maximum a Posteriori Estimator ..... 210
6.2.2 Ridge vs. LASSO Regularization ..... 212
6.2.3 Ridge Regularization ..... 215
6.2.4 LASSO Regularization ..... 217
6.2.5 Group LASSO Regularization ..... 226
6.3 Expectation-Maximization Algorithm ..... 230
6.3.1 Mixture Distributions ..... 230
6.3.2 Incomplete and Complete Log-Likelihoods ..... 232
6.3.3 Expectation-Maximization Algorithm for Mixtures ..... 233
6.3.4 Lab: Mixture Distribution Applications ..... 240
6.4 Truncated and Censored Data ..... 248
6.4.1 Lower-Truncation and Right-Censoring ..... 248
6.4.2 Parameter Estimation Under Right-Censoring ..... 250
6.4.3 Parameter Estimation Under Lower-Truncation ..... 254
… ..... 264

7 Deep Learning ..... 267
7.1 Deep Learning and Representation Learning ..... 267
7.2 Generic Feed-Forward Neural Networks ..... 269
7.2.1 Construction of Feed-Forward Neural Networks ..... 269
7.2.2 Universality Theorems ..... 274
7.2.3 Gradient Descent Methods ..... 278
7.3 Feed-Forward Neural Network Examples ..... 293
7.3.1 Feature Pre-processing ..... 293
7.3.2 Lab: Poisson FN Network for Car Insurance Frequencies ..... 295
7.4 Special Features in Networks ..... 298
7.4.1 Special Purpose Layers ..... 298
7.4.2 The Balance Property in Neural Networks ..... 305
7.4.3 Boosting Regression Models with Network Features ..... 315
7.4.4 Network Ensemble Learning ..... 319
7.4.5 Identifiability in Feed-Forward Neural Networks ..... 340
7.5 Auto-encoders ..... 342
7.5.1 Standardization of the Data Matrix ..... 343
7.5.2 Introduction to Auto-encoders ..... 343
7.5.3 Principal Components Analysis ..... 344
7.5.4 Lab: Lee–Carter Mortality Model ..... 347
7.5.5 Bottleneck Neural Network ..... 351
7.6 Model-Agnostic Tools ..... 357
7.6.1 Variable Permutation Importance ..... 357
7.6.2 Partial Dependence Plots ..... 359
7.6.3 Interaction Strength ..... 365
7.6.4 Local Model-Agnostic Methods ..... 366
7.6.5 Marginal Attribution by Conditioning on Quantiles ..... 366
7.7 Lab: Analysis of the Fitted Networks ..... 376

8 Recurrent Neural Networks ..... 381
8.1 Motivation for Recurrent Neural Networks ..... 381
8.2 Plain-Vanilla Recurrent Neural Network ..... 383
8.2.1 Recurrent Neural Network Layer ..... 383
8.2.2 Deep Recurrent Neural Network Architectures ..... 385
8.2.3 Designing the Network Output ..... 387
8.2.4 Time-Distributed Layer ..... 388
8.3 Special Recurrent Neural Networks ..... 390
8.3.1 Long Short-Term Memory Network ..... 390
8.3.2 Gated Recurrent Unit Network ..... 392
8.4 Lab: Mortality Forecasting with RN Networks ..... 394
8.4.1 Lee–Carter Model, Revisited ..... 394
8.4.2 Direct LSTM Mortality Forecasting ..... 402

9 Convolutional Neural Networks ..... 407
9.1 Plain-Vanilla Convolutional Neural Network Layer ..... 407
9.1.1 Input Tensors and Channels ..... 408
9.1.2 Generic Convolutional Neural Network Layer ..... 408
9.1.3 Example: Time-Series Analysis and Image Recognition ..... 411
9.2 Special Purpose Tools for Convolutional Neural Networks ..... 413
9.2.1 … ..... 413
9.2.2 … ..... 414
9.2.3 … ..... 414
9.2.4 … ..... 415
9.2.5 … ..... 416
9.3 Convolutional Neural Network Architectures ..... 416
9.3.1 Illustrative Example of a CN Network Architecture ..... 416
9.3.2 Lab: Telematics Data ..... 418
9.3.3 Lab: Mortality Surface Modeling ..... 422

10 Natural Language Processing ..... 425
10.1 Feature Pre-processing and Bag-of-Words ..... 425
10.2 Word Embeddings ..... 429
10.2.1 Word to Vector Algorithms ..... 430
10.2.2 Global Vectors Algorithm ..... 436
10.3 Lab: Predictive Modeling Using Word Embeddings ..... 440
10.4 Lab: Deep Word Representation Learning ..... 445
10.5 Outlook: Creating Attention ..... 448

11 Selected Topics in Deep Learning ..... 453
11.1 Deep Learning Under Model Uncertainty ..... 453
11.1.1 Recap: Tweedie's Family ..... 454
11.1.2 Lab: Claim Size Modeling Under Model Uncertainty ..... 458
11.1.3 Lab: Deep Dispersion Modeling ..... 466
11.1.4 Pseudo Maximum Likelihood Estimator ..... 472
11.2 Deep Quantile Regression ..... 476
11.2.1 Deep Quantile Regression: Single Quantile ..... 477
11.2.2 Deep Quantile Regression: Multiple Quantiles ..... 478
11.2.3 Lab: Deep Quantile Regression ..... 479
11.3 Deep Composite Model Regression ..... 483
11.3.1 Joint Elicitability of Quantiles and Expected Shortfalls ..... 483
11.3.2 Lab: Deep Composite Model Regression ..... 487
11.4 Model Uncertainty: A Bootstrap Approach ..... 492
11.5 LocalGLMnet: An Interpretable Network Architecture ..... 495
11.5.1 Definition of the LocalGLMnet ..... 495
11.5.2 Variable Selection in LocalGLMnets ..... 497
11.5.3 Lab: LocalGLMnet for Claim Frequency Modeling ..... 499
11.5.4 Variable Selection Through Regularization of the LocalGLMnet ..... 507
11.5.5 Lab: LASSO Regularization of LocalGLMnet ..... 509
11.6 Selected Applications ..... 513
11.6.1 Mixture Density Networks ..... 513
11.6.2 Estimation of Conditional Expectations ..... 521
11.6.3 Bayesian Networks: An Outlook ..... 530

12 Appendix A: Technical Results on Networks ..... 537
12.1 Universality Theorems ..... 537
12.2 Consistency and Asymptotic Normality ..... 540
12.3 Functional Limit Theorem ..... 546
12.4 … ..... 549

13 Appendix B: Data and Examples ..... 553
13.1 French Motor Third Party Liability Data ..... 553
13.2 Swedish Motorcycle Data ..... 564
13.3 Wisconsin Local Government Property Insurance Fund ..... 570
13.4 Swiss Accident Insurance Data ..... 573

Bibliography ..... 577
Index ..... 595