Preface | xiii |
I Introduction | 1 | (54)
1 Notations and data | 3 | (6)
1.1 Notations | 3 | (1)
1.2 Dataset | 4 | (5)
2 Introduction | 9 | (4)
2.1 Context | 9 | (1)
2.2 Portfolio construction: the workflow | 10 | (1)
2.3 Machine learning is no magic wand | 11 | (2)
3 Factor investing and asset pricing anomalies | 13 | (22)
3.1 Introduction | 14 | (1)
3.2 Detecting anomalies | 15 | (12)
3.2.1 Challenges | 15 | (1)
3.2.2 Simple portfolio sorts | 15 | (2)
3.2.3 Factors | 17 | (5)
3.2.4 Fama-MacBeth regressions | 22 | (3)
3.2.5 Factor competition | 25 | (1)
3.2.6 Advanced techniques | 26 | (1)
3.3 Factors or characteristics? | 27 | (1)
3.4 Hot topics: momentum, timing and ESG | 28 | (2)
3.4.1 Factor momentum | 28 | (1)
3.4.2 Factor timing | 29 | (1)
3.4.3 The green factors | 30 | (1)
3.5 The links with machine learning | 30 | (4)
3.5.1 A short list of recent references | 31 | (1)
3.5.2 Explicit connections with asset pricing models | 31 | (3)
3.6 Coding exercises | 34 | (1)
4 Data preprocessing | 35 | (20)
4.1 Know your data | 35 | (3)
4.2 Missing data | 38 | (2)
4.3 Outlier detection | 40 | (1)
4.4 Feature engineering | 41 | (1)
4.4.1 Feature selection | 41 | (1)
4.4.2 Scaling the predictors | 41 | (1)
4.5 Labelling | 42 | (5)
4.5.1 Simple labels | 42 | (1)
4.5.2 Categorical labels | 43 | (1)
4.5.3 The triple barrier method | 44 | (1)
4.5.4 Filtering the sample | 45 | (1)
4.5.5 Return horizons | 46 | (1)
4.6 Handling persistence | 47 | (1)
4.7 Extensions | 47 | (3)
4.7.1 Transforming features | 47 | (1)
4.7.2 Macro-economic variables | 48 | (1)
4.7.3 Active learning | 48 | (2)
4.8 Additional code and results | 50 | (3)
4.8.1 Impact of rescaling: graphical representation | 50 | (2)
4.8.2 Impact of rescaling: toy example | 52 | (1)
4.9 Coding exercises | 53 | (2)
II Common supervised algorithms | 55 | (88)
5 Penalized regressions and sparse hedging for minimum variance portfolios | 57 | (12)
5.1 Penalized regressions | 57 | (5)
5.1.1 Simple regressions | 57 | (1)
5.1.2 Forms of penalizations | 58 | (2)
5.1.3 Illustrations | 60 | (2)
5.2 Sparse hedging for minimum variance portfolios | 62 | (5)
5.2.1 Presentation and derivations | 62 | (3)
5.2.2 Example | 65 | (2)
5.3 Predictive regressions | 67 | (1)
5.3.1 Literature review and principle | 67 | (1)
5.3.2 Code and results | 68 | (1)
5.4 Coding exercise | 68 | (1)
6 Tree-based methods | 69 | (22)
6.1 Simple trees | 69 | (7)
6.1.1 Principle | 69 | (2)
6.1.2 Further details on classification | 71 | (1)
6.1.3 Pruning criteria | 72 | (1)
6.1.4 Code and interpretation | 73 | (3)
6.2 Random forests | 76 | (3)
6.2.1 Principle | 76 | (2)
6.2.2 Code and results | 78 | (1)
6.3 Boosted trees: Adaboost | 79 | (3)
6.3.1 Methodology | 79 | (3)
6.3.2 Illustration | 82 | (1)
6.4 Boosted trees: extreme gradient boosting | 82 | (7)
6.4.1 Managing loss | 83 | (1)
6.4.2 Penalization | 83 | (1)
6.4.3 Aggregation | 84 | (1)
6.4.4 Tree structure | 85 | (1)
6.4.5 Extensions | 86 | (1)
6.4.6 Code and results | 86 | (2)
6.4.7 Instance weighting | 88 | (1)
6.5 Discussion | 89 | (1)
6.6 Coding exercises | 90 | (1)
7 Neural networks | 91 | (32)
7.1 The original perceptron | 92 | (1)
7.2 Multilayer perceptron | 93 | (8)
7.2.1 Introduction and notations | 93 | (3)
7.2.2 Universal approximation | 96 | (1)
7.2.3 Learning via back-propagation | 97 | (3)
7.2.4 Further details on classification | 100 | (1)
7.3 How deep we should go and other practical issues | 101 | (3)
7.3.1 Architectural choices | 101 | (1)
7.3.2 Frequency of weight updates and learning duration | 102 | (1)
7.3.3 Penalizations and dropout | 103 | (1)
7.4 Code samples and comments for vanilla MLP | 104 | (8)
7.4.1 Regression example | 104 | (3)
7.4.2 Classification example | 107 | (4)
7.4.3 Custom losses | 111 | (1)
7.5 Recurrent networks | 112 | (5)
7.5.1 Presentation | 112 | (2)
7.5.2 Code and results | 114 | (3)
7.6 Other common architectures | 117 | (4)
7.6.1 Generative adversarial networks | 117 | (1)
7.6.2 Autoencoders | 118 | (1)
7.6.3 A word on convolutional networks | 119 | (2)
7.6.4 Advanced architectures | 121 | (1)
7.7 Coding exercise | 121 | (2)
8 Support vector machines | 123 | (6)
8.1 SVM for classification | 123 | (3)
8.2 SVM for regression | 126 | (1)
8.3 Practice | 127 | (1)
8.4 Coding exercises | 128 | (1)
9 Bayesian methods | 129 | (14)
9.1 The Bayesian framework | 129 | (2)
9.2 Bayesian sampling | 131 | (1)
9.2.1 Gibbs sampling | 131 | (1)
9.2.2 Metropolis-Hastings sampling | 131 | (1)
9.3 Bayesian linear regression | 132 | (3)
9.4 Naive Bayes classifier | 135 | (3)
9.5 Bayesian additive trees | 138 | (5)
9.5.1 General formulation | 138 | (1)
9.5.2 Priors | 138 | (1)
9.5.3 Sampling and predictions | 139 | (2)
9.5.4 Code | 141 | (2)
III From predictions to portfolios | 143 | (54)
10 Validating and tuning | 145 | (20)
10.1 Learning metrics | 145 | (6)
10.1.1 Regression analysis | 145 | (2)
10.1.2 Classification analysis | 147 | (4)
10.2 Validation | 151 | (7)
10.2.1 The variance-bias tradeoff: theory | 151 | (3)
10.2.2 The variance-bias tradeoff: illustration | 154 | (2)
10.2.3 The risk of overfitting: principle | 156 | (1)
10.2.4 The risk of overfitting: some solutions | 157 | (1)
10.3 The search for good hyperparameters | 158 | (5)
10.3.1 Methods | 158 | (2)
10.3.2 Example: grid search | 160 | (2)
10.3.3 Example: Bayesian optimization | 162 | (1)
10.4 Short discussion on validation in backtests | 163 | (2)
11 Ensemble models | 165 | (12)
11.1 Linear ensembles | 166 | (4)
11.1.1 Principles | 166 | (1)
11.1.2 Example | 167 | (3)
11.2 Stacked ensembles | 170 | (2)
11.2.1 Two-stage training | 170 | (1)
11.2.2 Code and results | 170 | (2)
11.3 Extensions | 172 | (4)
11.3.1 Exogenous variables | 172 | (1)
11.3.2 Shrinking inter-model correlations | 173 | (3)
11.4 Exercise | 176 | (1)
12 Portfolio backtesting | 177 | (20)
12.1 Setting the protocol | 177 | (2)
12.2 Turning signals into portfolio weights | 179 | (2)
12.3 Performance metrics | 181 | (4)
12.3.1 Discussion | 181 | (1)
12.3.2 Pure performance and risk indicators | 182 | (1)
12.3.3 Factor-based evaluation | 183 | (1)
12.3.4 Risk-adjusted measures | 184 | (1)
12.3.5 Transaction costs and turnover | 184 | (1)
12.4 Common errors and issues | 185 | (2)
12.4.1 Forward looking data | 185 | (1)
12.4.2 Backtest overfitting | 185 | (2)
12.4.3 Simple safeguards | 187 | (1)
12.5 Implication of non-stationarity: forecasting is hard | 187 | (2)
12.5.1 General comments | 187 | (1)
12.5.2 The no free lunch theorem | 188 | (1)
12.6 First example: a complete backtest | 189 | (4)
12.7 Second example: backtest overfitting | 193 | (3)
12.8 Coding exercises | 196 | (1)
IV Further important topics | 197 | (64)
13 Interpretability | 199 | (16)
13.1 Global interpretations | 200 | (6)
13.1.1 Simple models as surrogates | 200 | (1)
13.1.2 Variable importance (tree-based) | 201 | (2)
13.1.3 Variable importance (agnostic) | 203 | (2)
13.1.4 Partial dependence plot | 205 | (1)
13.2 Local interpretations | 206 | (9)
13.2.1 LIME | 207 | (3)
13.2.2 Shapley values | 210 | (2)
13.2.3 Breakdown | 212 | (3)
14 Two key concepts: causality and non-stationarity | 215 | (18)
14.1 Causality | 216 | (7)
14.1.1 Granger causality | 216 | (1)
14.1.2 Causal additive models | 217 | (3)
14.1.3 Structural time series models | 220 | (3)
14.2 Dealing with changing environments | 223 | (10)
14.2.1 Non-stationarity: yet another illustration | 225 | (2)
14.2.2 Online learning | 227 | (2)
14.2.3 Homogeneous transfer learning | 229 | (4)
15 Unsupervised learning | 233 | (14)
15.1 The problem with correlated predictors | 233 | (2)
15.2 Principal component analysis and autoencoders | 235 | (6)
15.2.1 A bit of algebra | 236 | (1)
15.2.2 PCA | 236 | (3)
15.2.3 Autoencoders | 239 | (1)
15.2.4 Application | 240 | (1)
15.3 Clustering via k-means | 241 | (2)
15.4 Nearest neighbors | 243 | (2)
15.5 Coding exercise | 245 | (2)
16 Reinforcement learning | 247 | (14)
16.1 Theoretical layout | 247 | (5)
16.1.1 General framework | 247 | (2)
16.1.2 Q-learning | 249 | (2)
16.1.3 SARSA | 251 | (1)
16.2 The curse of dimensionality | 252 | (1)
16.3 Policy gradient | 253 | (3)
16.3.1 Principle | 253 | (1)
16.3.2 Extensions | 254 | (2)
16.4 Simple examples | 256 | (3)
16.4.1 Q-learning with simulations | 256 | (1)
16.4.2 Q-learning with market data | 257 | (2)
16.5 Concluding remarks | 259 | (1)
16.6 Exercises | 260 | (1)
V Appendix | 261 | (28)
17 Data description | 263 | (4)
18 Solutions to exercises | 267 | (22)
18.1 | 267 | (2)
18.2 | 269 | (4)
18.3 | 273 | (1)
18.4 | 273 | (3)
18.5 Chapter 8: the autoencoder model | 276 | (2)
18.6 | 278 | (1)
18.7 Chapter 12: ensemble neural network | 279 | (2)
18.8 | 281 | (4)
18.8.1 EW portfolios with the tidyverse | 281 | (1)
18.8.2 Advanced weighting function | 282 | (1)
18.8.3 Functional programming in the backtest | 283 | (2)
18.9 | 285 | (1)
18.10 | 285 | (4)
Bibliography | 289 | (30)
Index | 319 |