Preface |
|
xiii | |
Notation |
|
xv | |
|
Part I Bandits, Probability and Concentration |
|
|
1 | (72) |
|
|
3 | (9) |
|
1.1 The Language of Bandits |
|
|
4 | (3) |
|
|
7 | (3) |
|
|
10 | (1) |
|
1.4 Bibliographic Remarks |
|
|
10 | (2) |
|
2 Foundations of Probability () |
|
|
12 | (24) |
|
2.1 Probability Spaces and Random Elements |
|
|
12 | (7) |
|
2.2 σ-Algebras and Knowledge |
|
|
19 | (1) |
|
2.3 Conditional Probabilities |
|
|
20 | (1) |
|
|
21 | (2) |
|
2.5 Integration and Expectation |
|
|
23 | (2) |
|
2.6 Conditional Expectation |
|
|
25 | (4) |
|
|
29 | (3) |
|
2.8 Bibliographic Remarks |
|
|
32 | (1) |
|
|
33 | (3) |
|
3 Stochastic Processes and Markov Chains () |
|
|
36 | (9) |
|
|
37 | (1) |
|
|
37 | (2) |
|
3.3 Martingales and Stopping Times |
|
|
39 | (2) |
|
|
41 | (1) |
|
3.5 Bibliographic Remarks |
|
|
42 | (1) |
|
|
43 | (2) |
|
|
45 | (15) |
|
|
45 | (1) |
|
4.2 The Learning Objective |
|
|
45 | (1) |
|
4.3 Knowledge and Environment Classes |
|
|
46 | (2) |
|
|
48 | (2) |
|
4.5 Decomposing the Regret |
|
|
50 | (1) |
|
4.6 The Canonical Bandit Model () |
|
|
51 | (2) |
|
4.7 The Canonical Bandit Model for Uncountable Action Sets () |
|
|
53 | (1) |
|
|
54 | (1) |
|
4.9 Bibliographical Remarks |
|
|
55 | (1) |
|
|
56 | (4) |
|
5 Concentration of Measure |
|
|
60 | (13) |
|
|
60 | (1) |
|
5.2 The Inequalities of Markov and Chebyshev |
|
|
61 | (1) |
|
5.3 The Cramer-Chernoff Method and Subgaussian Random Variables |
|
|
62 | (2) |
|
|
64 | (2) |
|
5.5 Bibliographical Remarks |
|
|
66 | (1) |
|
|
66 | (7) |
|
Part II Stochastic Bandits with Finitely Many Arms |
|
|
73 | (50) |
|
6 The Explore-Then-Commit Algorithm |
|
|
75 | (9) |
|
6.1 Algorithm and Regret Analysis |
|
|
75 | (3) |
|
|
78 | (1) |
|
6.3 Bibliographical Remarks |
|
|
78 | (1) |
|
|
79 | (5) |
|
7 The Upper Confidence Bound Algorithm |
|
|
84 | (13) |
|
7.1 The Optimism Principle |
|
|
84 | (7) |
|
|
91 | (1) |
|
7.3 Bibliographical Remarks |
|
|
92 | (1) |
|
|
92 | (5) |
|
8 The Upper Confidence Bound Algorithm: Asymptotic Optimality |
|
|
97 | (6) |
|
8.1 Asymptotically Optimal UCB |
|
|
97 | (3) |
|
|
100 | (1) |
|
8.3 Bibliographic Remarks |
|
|
100 | (1) |
|
|
101 | (2) |
|
9 The Upper Confidence Bound Algorithm: Minimax Optimality |
|
|
103 | (9) |
|
|
103 | (3) |
|
|
106 | (2) |
|
|
108 | (1) |
|
9.4 Bibliographic Remarks |
|
|
108 | (1) |
|
|
109 | (3) |
|
10 The Upper Confidence Bound Algorithm: Bernoulli Noise |
|
|
112 | (11) |
|
10.1 Concentration for Sums of Bernoulli Random Variables |
|
|
112 | (3) |
|
10.2 The KL-UCB Algorithm |
|
|
115 | (3) |
|
|
118 | (1) |
|
10.4 Bibliographic Remarks |
|
|
119 | (1) |
|
|
119 | (4) |
|
Part III Adversarial Bandits with Finitely Many Arms |
|
|
123 | (30) |
|
|
127 | (15) |
|
11.1 Adversarial Bandit Environments |
|
|
127 | (2) |
|
11.2 Importance-Weighted Estimators |
|
|
129 | (2) |
|
|
131 | (1) |
|
|
131 | (4) |
|
|
135 | (2) |
|
11.6 Bibliographic Remarks |
|
|
137 | (1) |
|
|
138 | (4) |
|
|
142 | (11) |
|
12.1 The Exp3-IX Algorithm |
|
|
142 | (2) |
|
|
144 | (4) |
|
|
148 | (1) |
|
12.4 Bibliographic Remarks |
|
|
149 | (1) |
|
|
149 | (4) |
|
Part IV Lower Bounds for Bandits with Finitely Many Arms |
|
|
153 | (38) |
|
13 Lower Bounds: Basic Ideas |
|
|
155 | (5) |
|
13.1 Main Ideas Underlying Minimax Lower Bounds |
|
|
155 | (3) |
|
|
158 | (1) |
|
13.3 Bibliographic Remarks |
|
|
158 | (1) |
|
|
159 | (1) |
|
14 Foundations of Information Theory |
|
|
160 | (10) |
|
14.1 Entropy and Optimal Coding |
|
|
160 | (2) |
|
|
162 | (3) |
|
|
165 | (2) |
|
14.4 Bibliographic Remarks |
|
|
167 | (1) |
|
|
167 | (3) |
|
|
170 | (7) |
|
15.1 Relative Entropy Between Bandits |
|
|
170 | (1) |
|
15.2 Minimax Lower Bounds |
|
|
171 | (2) |
|
|
173 | (1) |
|
15.4 Bibliographic Remarks |
|
|
174 | (1) |
|
|
174 | (3) |
|
16 Instance-Dependent Lower Bounds |
|
|
177 | (8) |
|
|
177 | (3) |
|
|
180 | (1) |
|
|
181 | (1) |
|
16.4 Bibliographic Remarks |
|
|
181 | (1) |
|
|
181 | (4) |
|
17 High-Probability Lower Bounds |
|
|
185 | (6) |
|
|
186 | (2) |
|
|
188 | (2) |
|
|
190 | (1) |
|
17.4 Bibliographic Remarks |
|
|
190 | (1) |
|
|
190 | (1) |
|
Part V Contextual and Linear Bandits |
|
|
191 | (74) |
|
|
193 | (12) |
|
18.1 Contextual Bandits: One Bandit per Context |
|
|
193 | (2) |
|
18.2 Bandits with Expert Advice |
|
|
195 | (2) |
|
|
197 | (1) |
|
|
198 | (2) |
|
|
200 | (1) |
|
18.6 Bibliographic Remarks |
|
|
201 | (1) |
|
|
202 | (3) |
|
19 Stochastic Linear Bandits |
|
|
205 | (14) |
|
19.1 Stochastic Contextual Bandits |
|
|
205 | (2) |
|
19.2 Stochastic Linear Bandits |
|
|
207 | (2) |
|
|
209 | (2) |
|
|
211 | (2) |
|
19.5 Bibliographic Remarks |
|
|
213 | (1) |
|
|
214 | (5) |
|
20 Confidence Bounds for Least Squares Estimators |
|
|
219 | (12) |
|
20.1 Martingales and the Method of Mixtures |
|
|
221 | (4) |
|
|
225 | (1) |
|
20.3 Bibliographic Remarks |
|
|
226 | (1) |
|
|
227 | (4) |
|
21 Optimal Design for Least Squares Estimators |
|
|
231 | (5) |
|
21.1 The Kiefer--Wolfowitz Theorem |
|
|
231 | (2) |
|
|
233 | (2) |
|
21.3 Bibliographic Remarks |
|
|
235 | (1) |
|
|
235 | (1) |
|
22 Stochastic Linear Bandits with Finitely Many Arms |
|
|
236 | (4) |
|
|
237 | (1) |
|
22.2 Bibliographic Remarks |
|
|
238 | (1) |
|
|
238 | (2) |
|
23 Stochastic Linear Bandits with Sparsity |
|
|
240 | (10) |
|
23.1 Sparse Linear Stochastic Bandits |
|
|
240 | (1) |
|
23.2 Elimination on the Hypercube |
|
|
241 | (3) |
|
23.3 Online to Confidence Set Conversion |
|
|
244 | (4) |
|
23.4 Sparse Online Linear Prediction |
|
|
248 | (1) |
|
|
248 | (1) |
|
23.6 Bibliographical Remarks |
|
|
249 | (1) |
|
|
249 | (1) |
|
24 Minimax Lower Bounds for Stochastic Linear Bandits |
|
|
250 | (8) |
|
|
250 | (2) |
|
|
252 | (1) |
|
24.3 Sparse Parameter Vectors |
|
|
253 | (2) |
|
|
255 | (1) |
|
|
256 | (1) |
|
24.6 Bibliographic Remarks |
|
|
257 | (1) |
|
|
257 | (1) |
|
25 Asymptotic Lower Bounds for Stochastic Linear Bandits |
|
|
258 | (7) |
|
25.1 An Asymptotic Lower Bound for Fixed Action Sets |
|
|
258 | (4) |
|
25.2 Clouds Looming for Optimism |
|
|
262 | (1) |
|
|
263 | (1) |
|
25.4 Bibliographic Remarks |
|
|
264 | (1) |
|
|
264 | (1) |
|
Part VI Adversarial Linear Bandits |
|
|
265 | (48) |
|
26 Foundations of Convex Analysis |
|
|
267 | (11) |
|
26.1 Convex Sets and Functions |
|
|
267 | (2) |
|
|
269 | (1) |
|
|
269 | (2) |
|
|
271 | (2) |
|
|
273 | (1) |
|
|
274 | (1) |
|
|
274 | (1) |
|
26.8 Bibliographic Remarks |
|
|
275 | (1) |
|
|
275 | (3) |
|
27 Exp3 for Adversarial Linear Bandits |
|
|
278 | (8) |
|
27.1 Exponential Weights for Linear Bandits |
|
|
278 | (2) |
|
|
280 | (1) |
|
27.3 Continuous Exponential Weights |
|
|
281 | (2) |
|
|
283 | (1) |
|
27.5 Bibliographic Remarks |
|
|
283 | (1) |
|
|
284 | (2) |
|
28 Follow-the-regularised-Leader and Mirror Descent |
|
|
286 | (20) |
|
28.1 Online Linear Optimisation |
|
|
286 | (4) |
|
|
290 | (4) |
|
28.3 Application to Linear Bandits |
|
|
294 | (1) |
|
28.4 Linear Bandits on the Unit Ball |
|
|
295 | (3) |
|
|
298 | (3) |
|
28.6 Bibliographic Remarks |
|
|
301 | (1) |
|
|
301 | (5) |
|
29 The Relation between Adversarial and Stochastic Linear Bandits |
|
|
306 | (7) |
|
|
306 | (1) |
|
29.2 Reducing Stochastic Linear Bandits to Adversarial Linear Bandits |
|
|
307 | (1) |
|
29.3 Stochastic Linear Bandits with Parameter Noise |
|
|
308 | (1) |
|
29.4 Contextual Linear Bandits |
|
|
309 | (1) |
|
|
310 | (1) |
|
29.6 Bibliographic Remarks |
|
|
311 | (1) |
|
|
311 | (2) |
|
|
313 | (108) |
|
|
317 | (14) |
|
30.1 Notation and Assumptions |
|
|
317 | (1) |
|
|
318 | (1) |
|
|
319 | (1) |
|
30.4 Semi-bandit Feedback and Mirror Descent |
|
|
320 | (1) |
|
30.5 Follow-the-Perturbed-Leader |
|
|
321 | (5) |
|
|
326 | (1) |
|
30.7 Bibliographic Remarks |
|
|
327 | (1) |
|
|
328 | (3) |
|
31 Non-stationary Bandits |
|
|
331 | (9) |
|
|
331 | (3) |
|
|
334 | (2) |
|
|
336 | (1) |
|
31.4 Bibliographic Remarks |
|
|
337 | (1) |
|
|
338 | (2) |
|
|
340 | (13) |
|
|
341 | (2) |
|
|
343 | (2) |
|
|
345 | (4) |
|
|
349 | (2) |
|
32.5 Bibliographic Remarks |
|
|
351 | (1) |
|
|
351 | (2) |
|
|
353 | (16) |
|
|
353 | (2) |
|
33.2 Best-Arm Identification with a Fixed Confidence |
|
|
355 | (6) |
|
33.3 Best-Arm Identification with a Budget |
|
|
361 | (1) |
|
|
362 | (2) |
|
33.5 Bibliographical Remarks |
|
|
364 | (1) |
|
|
365 | (4) |
|
34 Foundations of Bayesian Learning |
|
|
369 | (17) |
|
34.1 Statistical Decision Theory and Bayesian Learning |
|
|
369 | (1) |
|
34.2 Bayesian Learning and the Posterior Distribution |
|
|
370 | (4) |
|
34.3 Conjugate Pairs, Conjugate Priors and the Exponential Family |
|
|
374 | (3) |
|
34.4 The Bayesian Bandit Environment |
|
|
377 | (1) |
|
34.5 Posterior Distributions in Bandits |
|
|
378 | (1) |
|
|
379 | (1) |
|
|
380 | (2) |
|
34.8 Bibliographic Remarks |
|
|
382 | (1) |
|
|
382 | (4) |
|
|
386 | (18) |
|
35.1 Bayesian Optimal Regret for k-Armed Stochastic Bandits |
|
|
386 | (1) |
|
|
387 | (2) |
|
|
389 | (4) |
|
|
393 | (6) |
|
35.5 Computing the Gittins Index |
|
|
399 | (1) |
|
|
399 | (2) |
|
35.7 Bibliographical Remarks |
|
|
401 | (1) |
|
|
402 | (2) |
|
|
404 | (17) |
|
36.1 Finite-Armed Bandits |
|
|
404 | (2) |
|
36.2 Frequentist Analysis |
|
|
406 | (3) |
|
|
409 | (2) |
|
36.4 Information Theoretic Analysis |
|
|
411 | (3) |
|
|
414 | (2) |
|
36.6 Bibliographic Remarks |
|
|
416 | (1) |
|
|
417 | (4) |
|
|
421 | (63) |
|
|
423 | (29) |
|
37.1 Finite Adversarial Partial Monitoring Problems |
|
|
424 | (2) |
|
37.2 The Structure of Partial Monitoring |
|
|
426 | (4) |
|
37.3 Classification of Finite Adversarial Partial Monitoring |
|
|
430 | (1) |
|
|
430 | (5) |
|
37.5 Policy and Upper Bounds |
|
|
435 | (4) |
|
37.6 Proof of Theorem 37.16 |
|
|
439 | (1) |
|
37.7 Proof of Theorem 37.17 |
|
|
440 | (4) |
|
37.8 Proof of the Classification Theorem |
|
|
444 | (1) |
|
|
445 | (2) |
|
37.10 Bibliographical Remarks |
|
|
447 | (2) |
|
|
449 | (3) |
|
38 Markov Decision Processes |
|
|
452 | (32) |
|
|
452 | (3) |
|
38.2 Optimal Policies and the Bellman Optimality Equation |
|
|
455 | (3) |
|
38.3 Finding an Optimal Policy |
|
|
458 | (4) |
|
38.4 Learning in Markov Decision Processes |
|
|
462 | (1) |
|
38.5 Upper Confidence Bounds for Reinforcement Learning |
|
|
463 | (2) |
|
38.6 Proof of Upper Bound |
|
|
465 | (3) |
|
38.7 Proof of Lower Bound |
|
|
468 | (3) |
|
|
471 | (2) |
|
38.9 Bibliographical Remarks |
|
|
473 | (2) |
|
|
475 | (9) |
Bibliography |
|
484 | (29) |
Index |
|
513 | |