| Preface | xi | |
| | 1 | (6) |
| 1.1 What You Can Find in Here | 1 | (3) |
| | 4 | (1) |
| | 5 | (2) |
| Part I Fundamentals without Noise | 7 | (196) |
| | 9 | (42) |
| 2.1 You Have a Control Problem | 9 | (2) |
| | 11 | (1) |
| | 12 | (5) |
| 2.4 Stability and Performance | 17 | (12) |
| 2.5 A Glance Ahead: From Control Theory to RL | 29 | (3) |
| 2.6 How Can We Ignore Noise? | 32 | (1) |
| | 32 | (11) |
| | 43 | (6) |
| | 49 | (2) |
| | 51 | (33) |
| 3.1 Value Function for Total Cost | 51 | (1) |
| | 52 | (7) |
| | 59 | (4) |
| 3.4 Inverse Dynamic Programming | 63 | (1) |
| 3.5 Bellman Equation Is a Linear Program | 64 | (1) |
| 3.6 Linear Quadratic Regulator | 65 | (2) |
| 3.7 A Second Glance Ahead | 67 | (1) |
| 3.8 Optimal Control in Continuous Time | 68 | (2) |
| | 70 | (8) |
| | 78 | (5) |
| | 83 | (1) |
| 4 ODE Methods for Algorithm Design | 84 | (75) |
| 4.1 Ordinary Differential Equations | 84 | (3) |
| 4.2 A Brief Return to Reality | 87 | (1) |
| | 88 | (2) |
| | 90 | (7) |
| 4.5 Quasistochastic Approximation | 97 | (16) |
| 4.6 Gradient-Free Optimization | 113 | (5) |
| 4.7 Quasi Policy Gradient Algorithms | 118 | (5) |
| | 123 | (8) |
| 4.9 Convergence Theory for QSA | 131 | (18) |
| | 149 | (5) |
| | 154 | (5) |
| 5 Value Function Approximations | 159 | (44) |
| 5.1 Function Approximation Architectures | 160 | (8) |
| 5.2 Exploration and ODE Approximations | 168 | (3) |
| 5.3 TD-Learning and Linear Regression | 171 | (5) |
| 5.4 Projected Bellman Equations and TD Algorithms | 176 | (10) |
| | 186 | (5) |
| 5.6 Q-Learning in Continuous Time | 191 | (2) |
| | 193 | (3) |
| | 196 | (3) |
| | 199 | (4) |
| Part II Reinforcement Learning and Stochastic Control | 203 | (190) |
| | 205 | (39) |
| 6.1 Markov Models Are State Space Models | 205 | (3) |
| | 208 | (3) |
| 6.3 Spectra and Ergodicity | 211 | (4) |
| 6.4 A Random Glance Ahead | 215 | (1) |
| | 216 | (2) |
| | 218 | (4) |
| 6.7 Simulation: Confidence Bounds and Control Variates | 222 | (8) |
| 6.8 Sensitivity and Actor-Only Methods | 230 | (3) |
| 6.9 Ergodic Theory for General Markov Chains | 233 | (3) |
| | 236 | (7) |
| | 243 | (1) |
| | 244 | (36) |
| 7.1 MDPs: A Quick Introduction | 244 | (4) |
| 7.2 Fluid Models for Approximation | 248 | (3) |
| | 251 | (2) |
| | 253 | (4) |
| | 257 | (4) |
| | 261 | (2) |
| 7.7 Controlling Rover with Partial Information | 263 | (3) |
| | 266 | (5) |
| | 271 | (7) |
| | 278 | (2) |
| 8 Stochastic Approximation | 280 | (38) |
| 8.1 Asymptotic Covariance | 281 | (2) |
| | 283 | (9) |
| | 292 | (5) |
| 8.4 Algorithm Design Example | 297 | (3) |
| 8.5 Zap Stochastic Approximation | 300 | (4) |
| | 304 | (3) |
| | 307 | (7) |
| | 314 | (1) |
| | 315 | (3) |
| 9 Temporal Difference Methods | 318 | (44) |
| | 319 | (4) |
| 9.2 Function Approximation and Smoothing | 323 | (2) |
| | 325 | (2) |
| | 327 | (3) |
| 9.5 Return to the Q-Function | 330 | (7) |
| | 337 | (7) |
| | 344 | (4) |
| | 348 | (5) |
| | 353 | (4) |
| | 357 | (2) |
| | 359 | (3) |
| 10 Setting the Stage, Return of the Actors | 362 | (31) |
| 10.1 The Stage, Projection, and Adjoints | 363 | (4) |
| 10.2 Advantage and Innovation | 367 | (2) |
| | 369 | (2) |
| 10.4 Average Cost and Every Other Criterion | 371 | (5) |
| | 376 | (4) |
| | 380 | (2) |
| 10.7 Advantage and Control Variates | 382 | (2) |
| 10.8 Natural Gradient and Zap | 384 | (1) |
| | 385 | (4) |
| | 389 | (4) |
| Appendices | 393 | (22) |
| A Mathematical Background | 395 | (6) |
| A.1 Notation and Math Background | 395 | (2) |
| A.2 Probability and Markovian Background | 397 | (4) |
| B Markov Decision Processes | 401 | (8) |
| B.1 Total Cost and Every Other Criterion | 401 | (2) |
| B.2 Computational Aspects of MDPs | 403 | (6) |
| C Partial Observations and Belief States | 409 | (6) |
| | 409 | (1) |
| | 410 | (3) |
| C.3 Belief State Dynamics | 413 | (2) |
| References | 415 | (16) |
| Glossary of Symbols and Acronyms | 431 | (2) |
| Index | 433 | |