Preface to the Second Edition    xi
Preface to the First Edition    xv
Acknowledgments    xvii
|
1  The Challenges of Dynamic Programming    1
   1.1  A Dynamic Programming Example: A Shortest Path Problem    2
   1.2  The Three Curses of Dimensionality    3
   1.3  Some Real Applications    6
   1.4  Problem Classes    11
   1.5  The Many Dialects of Dynamic Programming    15
   1.6  What Is New in This Book?    17
   1.7  Pedagogy    19
   1.8  Bibliographic Notes    22
|
2  Some Illustrative Models    25
   2.1  Deterministic Problems    26
   2.2  Stochastic Problems    31
   2.3  Information Acquisition Problems    47
   2.4  A Simple Modeling Framework for Dynamic Programs    50
   2.5  Bibliographic Notes    54
        Problems    54
|
3  Introduction to Markov Decision Processes    57
   3.1  The Optimality Equations    58
   3.2  Finite Horizon Problems    65
   3.3  Infinite Horizon Problems    66
   3.4  Value Iteration    68
   3.5  Policy Iteration    74
   3.6  Hybrid Value-Policy Iteration    75
   3.7  Average Reward Dynamic Programming    76
   3.8  The Linear Programming Method for Dynamic Programs    77
   3.9  Monotone Policies    78
   3.10 Why Does It Work?    84
   3.11 Bibliographic Notes    103
        Problems    103
|
4  Introduction to Approximate Dynamic Programming    111
   4.1  The Three Curses of Dimensionality (Revisited)    112
   4.2  The Basic Idea    114
   4.3  Q-Learning and SARSA    122
   4.4  Real-Time Dynamic Programming    126
   4.5  Approximate Value Iteration    127
   4.6  The Post-Decision State Variable    129
   4.7  Low-Dimensional Representations of Value Functions    144
   4.8  So Just What Is Approximate Dynamic Programming?    146
   4.9  Experimental Issues    149
   4.10 But Does It Work?    155
   4.11 Bibliographic Notes    156
        Problems    158
|
5  Modeling Dynamic Programs    167
   5.1  Notational Style    169
   5.2  Modeling Time    170
   5.3  Modeling Resources    174
   5.4  The States of Our System    178
   5.5  Modeling Decisions    187
   5.6  The Exogenous Information Process    189
   5.7  The Transition Function    198
   5.8  The Objective Function    206
   5.9  A Measure-Theoretic View of Information    211
   5.10 Bibliographic Notes    213
        Problems    214
|
|
6  Policies    221
   6.1  Myopic Policies    224
   6.2  Lookahead Policies    224
   6.3  Policy Function Approximations    232
   6.4  Value Function Approximations    235
   6.5  Hybrid Strategies    239
   6.6  Randomized Policies    242
   6.7  How to Choose a Policy?    244
   6.8  Bibliographic Notes    247
        Problems    247
|
|
7  Policy Search    249
   7.1  Background    250
   7.2  Gradient Search    253
   7.3  Direct Policy Search for Finite Alternatives    256
   7.4  The Knowledge Gradient Algorithm for Discrete Alternatives    262
   7.5  Simulation Optimization    270
   7.6  Why Does It Work?    274
   7.7  Bibliographic Notes    285
        Problems    286
|
8  Approximating Value Functions    289
   8.1  Lookup Tables and Aggregation    290
   8.2  Linear Parametric Models    304
   8.3  Regression Variations    314
   8.4  Nonparametric Models    316
   8.5  Approximations and the Curse of Dimensionality    325
   8.6  Why Does It Work?    328
   8.7  Bibliographic Notes    333
        Problems    334
|
9  Learning Value Function Approximations    337
   9.1  Sampling the Value of a Policy    337
   9.2  Stochastic Approximation Methods    347
   9.3  Recursive Least Squares for Linear Models    349
   9.4  Temporal Difference Learning with a Linear Model    356
   9.5  Bellman's Equation Using a Linear Model    358
   9.6  Analysis of TD(0), LSTD, and LSPE Using a Single State    364
   9.7  Gradient-Based Methods for Approximate Value Iteration    366
   9.8  Least Squares Temporal Differencing with Kernel Regression    371
   9.9  Value Function Approximations Based on Bayesian Learning    373
   9.10 Why Does It Work?    376
   9.11 Bibliographic Notes    379
        Problems    381
|
10  Optimizing While Learning    383
   10.1  Overview of Algorithmic Strategies    385
   10.2  Approximate Value Iteration and Q-Learning Using Lookup Tables    386
   10.3  Statistical Bias in the Max Operator    397
   10.4  Approximate Value Iteration and Q-Learning Using Linear Models    400
   10.5  Approximate Policy Iteration    402
   10.6  The Actor-Critic Paradigm    408
   10.7  Policy Gradient Methods    410
   10.8  The Linear Programming Method Using Basis Functions    411
   10.9  Approximate Policy Iteration Using Kernel Regression    413
   10.10 Finite Horizon Approximations for Steady-State Applications    415
   10.11 Bibliographic Notes    416
         Problems    418
|
11  Adaptive Estimation and Stepsizes    419
   11.1  Learning Algorithms and Stepsizes    420
   11.2  Deterministic Stepsize Recipes    425
   11.3  Stochastic Stepsizes    433
   11.4  Optimal Stepsizes for Nonstationary Time Series    437
   11.5  Optimal Stepsizes for Approximate Value Iteration    447
   11.6  Convergence    449
   11.7  Guidelines for Choosing Stepsize Formulas    451
   11.8  Bibliographic Notes    452
         Problems    453
|
12  Exploration Versus Exploitation    457
   12.1  A Learning Exercise: The Nomadic Trucker    457
   12.2  An Introduction to Learning    460
   12.3  Heuristic Learning Policies    464
   12.4  Gittins Indexes for Online Learning    470
   12.5  The Knowledge Gradient Policy    477
   12.6  Learning with a Physical State    482
   12.7  Bibliographic Notes    492
         Problems    493
|
13  Value Function Approximations for Resource Allocation Problems    497
   13.1  Value Functions versus Gradients    498
   13.2  Linear Approximations    499
   13.3  Piecewise-Linear Approximations    501
   13.4  Solving a Resource Allocation Problem Using Piecewise-Linear Functions    505
   13.5  The SHAPE Algorithm    509
   13.6  Regression Methods    513
   13.7  Cutting Planes    516
   13.8  Why Does It Work?    528
   13.9  Bibliographic Notes    535
         Problems    536
|
14  Dynamic Resource Allocation Problems    541
   14.1  An Asset Acquisition Problem    541
   14.2  The Blood Management Problem    547
   14.3  A Portfolio Optimization Problem    557
   14.4  A General Resource Allocation Problem    560
   14.5  A Fleet Management Problem    573
   14.6  A Driver Management Problem    580
   14.7  Bibliographic Notes    585
         Problems    586
|
15  Implementation Challenges    593
   15.1  Will ADP Work for Your Problem?    593
   15.2  Designing an ADP Algorithm for Complex Problems    594
   15.3  Debugging an ADP Algorithm    596
   15.4      597
   15.5  Modeling Your Problem    602
   15.6  Online versus Offline Models    604
   15.7  If It Works, Patent It!    606

Bibliography    607
Index    623