
Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd Edition [Hardback]

Warren B. Powell (Princeton University, USA)
  • Format: Hardback, 656 pages, height x width x thickness: 236x155x41 mm, weight: 1043 g, Charts: 5 B&W, 0 Color; Photos: 5 B&W, 0 Color; Drawings: 63 B&W, 0 Color; Maps: 2 B&W, 0 Color; Tables: 0 B&W, 0 Color; Graphs: 39 B&W, 0 Color
  • Series: Wiley Series in Probability and Statistics
  • Publication date: 18-Nov-2011
  • Publisher: John Wiley & Sons Inc
  • ISBN-10: 047060445X
  • ISBN-13: 9780470604458
Praise for the First Edition: "Finally, a book devoted to dynamic programming and written using the language of operations research (OR)! This beautiful book fills a gap in the libraries of OR specialists and practitioners." (Computing Reviews)

This new edition focuses on modeling and computation for complex classes of approximate dynamic programming problems.

Understanding approximate dynamic programming (ADP) is vital for developing practical, high-quality solutions to complex industrial problems, particularly when those problems involve making decisions in the presence of uncertainty. Approximate Dynamic Programming, Second Edition uniquely integrates four distinct disciplines (Markov decision processes, mathematical programming, simulation, and statistics) to demonstrate how to successfully approach, model, and solve a wide range of real-life problems using ADP.
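
To fix ideas, the object at the heart of the book is the value function defined by Bellman's optimality equation. In generic notation (state S_t, decision x_t, exogenous information W_{t+1}, contribution C, discount factor gamma), written here as a conventional sketch rather than a quotation of the book's own symbols:

\[
V_t(S_t) \;=\; \max_{x_t}\Bigl( C(S_t, x_t) + \gamma\, \mathbb{E}\bigl[\, V_{t+1}(S_{t+1}) \mid S_t, x_t \,\bigr] \Bigr),
\qquad S_{t+1} = S^M(S_t, x_t, W_{t+1}),
\]

where S^M denotes the transition function. ADP replaces the exact value function V_{t+1} with a statistically estimated approximation whenever the state, outcome, or decision spaces are too large to enumerate.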

The book continues to bridge the gap between computer science, simulation, and operations research, and now adopts the notation and vocabulary of reinforcement learning as well as stochastic search and simulation optimization. The author outlines the essential algorithms that serve as a starting point in the design of practical solutions for real problems. The three curses of dimensionality that impact complex problems are introduced, and detailed coverage of implementation challenges is provided. The Second Edition also features:

  • A new chapter describing four fundamental classes of policies for working with diverse stochastic optimization problems: myopic policies, look-ahead policies, policy function approximations, and policies based on value function approximations
  • A new chapter on policy search that brings together stochastic search and simulation optimization concepts and introduces a new class of optimal learning strategies
  • Updated coverage of the exploration-exploitation problem in ADP, now including a recently developed method for doing active learning in the presence of a physical state, using the concept of the knowledge gradient (a minimal sketch of the knowledge gradient idea follows this list)
  • A new sequence of chapters describing statistical methods for approximating value functions, estimating the value of a fixed policy, and value function approximation while searching for optimal policies

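Since the knowledge gradient is named in the feature list above, the following is a minimal, self-contained sketch of the offline knowledge gradient score for a finite set of alternatives under independent normal beliefs; the prior means, variances, and noise level are illustrative assumptions, not values taken from the book.

import math

def knowledge_gradient(means, variances, noise_var):
    """Offline knowledge gradient score for each alternative, assuming
    independent normal beliefs (means, variances) and known measurement
    noise variance noise_var."""
    pdf = lambda z: math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal pdf
    cdf = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal cdf
    scores = []
    for i, (mu, var) in enumerate(zip(means, variances)):
        # Std. dev. of the change in the posterior mean from one more measurement of i.
        sigma_tilde = var / math.sqrt(var + noise_var)
        best_other = max(m for j, m in enumerate(means) if j != i)
        zeta = -abs(mu - best_other) / sigma_tilde
        scores.append(sigma_tilde * (zeta * cdf(zeta) + pdf(zeta)))
    return scores

# Hypothetical beliefs about three alternatives (numbers invented for illustration).
print(knowledge_gradient(means=[1.0, 1.2, 0.8], variances=[0.5, 0.1, 0.9], noise_var=0.4))

A knowledge gradient policy measures the alternative with the largest score, updates its beliefs, and repeats until the measurement budget is exhausted.
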
The presented coverage of ADP emphasizes models and algorithms, focusing on related applications and computation, while also discussing the theoretical side of the topic, including proofs of convergence and rates of convergence. A related website features an ongoing discussion of the evolving fields of approximate dynamic programming and reinforcement learning, along with additional readings, software, and datasets.
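
To give a flavor of the algorithmic side, below is a minimal sketch of lookup-table Q-learning, one of the introductory algorithms covered in Chapter 4, with epsilon-greedy exploration and a harmonic stepsize; the two-state transition model, rewards, and parameter values are hypothetical, invented purely for illustration.

import random

# Hypothetical two-state, two-action problem (invented for illustration).
# transitions[state][action] = list of (probability, next_state, reward).
transitions = {
    0: {0: [(1.0, 0, 1.0)],                      # "stay": small certain reward
        1: [(0.5, 1, 0.0), (0.5, 0, 0.0)]},      # "move": may reach state 1
    1: {0: [(1.0, 1, 2.0)],                      # state 1 pays more per step
        1: [(1.0, 0, 0.0)]},
}
gamma, epsilon = 0.9, 0.1                        # discount factor, exploration rate

Q = {(s, a): 0.0 for s in transitions for a in transitions[s]}   # lookup-table approximation
counts = {key: 0 for key in Q}

def sample(state, action):
    """Sample (next_state, reward) from the transition model."""
    u, cum = random.random(), 0.0
    for p, s_next, reward in transitions[state][action]:
        cum += p
        if u <= cum:
            return s_next, reward
    return transitions[state][action][-1][1:]    # guard against rounding

state = 0
for n in range(10000):
    actions = list(transitions[state])
    if random.random() < epsilon:                # explore
        action = random.choice(actions)
    else:                                        # exploit the current approximation
        action = max(actions, key=lambda a: Q[(state, a)])
    s_next, reward = sample(state, action)
    target = reward + gamma * max(Q[(s_next, a)] for a in transitions[s_next])
    counts[(state, action)] += 1
    alpha = 1.0 / counts[(state, action)]        # harmonic stepsize 1/n
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    state = s_next

print(Q)                                         # estimated state-action values

In realistic problems the lookup table is replaced by the parametric or nonparametric approximations of Chapter 8, and the stepsize rule by one of the recipes in Chapter 11.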

Requiring only a basic understanding of statistics and probability, Approximate Dynamic Programming, Second Edition is an excellent book for industrial engineering and operations research courses at the upper-undergraduate and graduate levels. It also serves as a valuable reference for researchers and professionals who utilize dynamic programming, stochastic programming, and control theory to solve problems in their everyday work.
Preface to the Second Edition xi
Preface to the First Edition xv
Acknowledgments xvii
1 The Challenges of Dynamic Programming 1(24)
1.1 A Dynamic Programming Example: A Shortest Path Problem 2(1)
1.2 The Three Curses of Dimensionality 3(3)
1.3 Some Real Applications 6(5)
1.4 Problem Classes 11(4)
1.5 The Many Dialects of Dynamic Programming 15(2)
1.6 What Is New in This Book? 17(2)
1.7 Pedagogy 19(3)
1.8 Bibliographic Notes 22(3)
2 Some Illustrative Models 25(32)
2.1 Deterministic Problems 26(5)
2.2 Stochastic Problems 31(16)
2.3 Information Acquisition Problems 47(3)
2.4 A Simple Modeling Framework for Dynamic Programs 50(4)
2.5 Bibliographic Notes 54(3)
Problems 54(3)
3 Introduction to Markov Decision Processes 57(54)
3.1 The Optimality Equations 58(7)
3.2 Finite Horizon Problems 65(1)
3.3 Infinite Horizon Problems 66(2)
3.4 Value Iteration 68(6)
3.5 Policy Iteration 74(1)
3.6 Hybrid Value-Policy Iteration 75(1)
3.7 Average Reward Dynamic Programming 76(1)
3.8 The Linear Programming Method for Dynamic Programs 77(1)
3.9 Monotone Policies 78(6)
3.10 Why Does It Work? 84(19)
3.11 Bibliographic Notes 103(8)
Problems 103(8)
4 Introduction to Approximate Dynamic Programming 111(56)
4.1 The Three Curses of Dimensionality (Revisited) 112(2)
4.2 The Basic Idea 114(8)
4.3 Q-Learning and SARSA 122(4)
4.4 Real-Time Dynamic Programming 126(1)
4.5 Approximate Value Iteration 127(2)
4.6 The Post-Decision State Variable 129(15)
4.7 Low-Dimensional Representations of Value Functions 144(2)
4.8 So Just What Is Approximate Dynamic Programming? 146(3)
4.9 Experimental Issues 149(6)
4.10 But Does It Work? 155(1)
4.11 Bibliographic Notes 156(11)
Problems 158(9)
5 Modeling Dynamic Programs 167(54)
5.1 Notational Style 169(1)
5.2 Modeling Time 170(4)
5.3 Modeling Resources 174(4)
5.4 The States of Our System 178(9)
5.5 Modeling Decisions 187(2)
5.6 The Exogenous Information Process 189(9)
5.7 The Transition Function 198(8)
5.8 The Objective Function 206(5)
5.9 A Measure-Theoretic View of Information 211(2)
5.10 Bibliographic Notes 213(8)
Problems 214(7)
6 Policies 221(28)
6.1 Myopic Policies 224(1)
6.2 Lookahead Policies 224(8)
6.3 Policy Function Approximations 232(3)
6.4 Value Function Approximations 235(4)
6.5 Hybrid Strategies 239(3)
6.6 Randomized Policies 242(2)
6.7 How to Choose a Policy? 244(3)
6.8 Bibliographic Notes 247(2)
Problems 247(2)
7 Policy Search 249(40)
7.1 Background 250(3)
7.2 Gradient Search 253(3)
7.3 Direct Policy Search for Finite Alternatives 256(6)
7.4 The Knowledge Gradient Algorithm for Discrete Alternatives 262(8)
7.5 Simulation Optimization 270(4)
7.6 Why Does It Work? 274(11)
7.7 Bibliographic Notes 285(4)
Problems 286(3)
8 Approximating Value Functions 289(48)
8.1 Lookup Tables and Aggregation 290(14)
8.2 Parametric Models 304(10)
8.3 Regression Variations 314(2)
8.4 Nonparametric Models 316(9)
8.5 Approximations and the Curse of Dimensionality 325(3)
8.6 Why Does It Work? 328(5)
8.7 Bibliographic Notes 333(4)
Problems 334(3)
9 Learning Value Function Approximations 337(46)
9.1 Sampling the Value of a Policy 337(10)
9.2 Stochastic Approximation Methods 347(2)
9.3 Recursive Least Squares for Linear Models 349(7)
9.4 Temporal Difference Learning with a Linear Model 356(2)
9.5 Bellman's Equation Using a Linear Model 358(6)
9.6 Analysis of TD(0), LSTD, and LSPE Using a Single State 364(2)
9.7 Gradient-Based Methods for Approximate Value Iteration 366(5)
9.8 Least Squares Temporal Differencing with Kernel Regression 371(2)
9.9 Value Function Approximations Based on Bayesian Learning 373(3)
9.10 Why Does It Work? 376(3)
9.11 Bibliographic Notes 379(4)
Problems 381(2)
10 Optimizing While Learning 383(36)
10.1 Overview of Algorithmic Strategies 385(1)
10.2 Approximate Value Iteration and Q-Learning Using Lookup Tables 386(11)
10.3 Statistical Bias in the Max Operator 397(3)
10.4 Approximate Value Iteration and Q-Learning Using Linear Models 400(2)
10.5 Approximate Policy Iteration 402(6)
10.6 The Actor-Critic Paradigm 408(2)
10.7 Policy Gradient Methods 410(1)
10.8 The Linear Programming Method Using Basis Functions 411(2)
10.9 Approximate Policy Iteration Using Kernel Regression 413(2)
10.10 Finite Horizon Approximations for Steady-State Applications 415(1)
10.11 Bibliographic Notes 416(3)
Problems 418(1)
11 Adaptive Estimation and Stepsizes 419(38)
11.1 Learning Algorithms and Stepsizes 420(5)
11.2 Deterministic Stepsize Recipes 425(8)
11.3 Stochastic Stepsizes 433(4)
11.4 Optimal Stepsizes for Nonstationary Time Series 437(10)
11.5 Optimal Stepsizes for Approximate Value Iteration 447(2)
11.6 Convergence 449(2)
11.7 Guidelines for Choosing Stepsize Formulas 451(1)
11.8 Bibliographic Notes 452(5)
Problems 453(4)
12 Exploration Versus Exploitation 457(40)
12.1 A Learning Exercise: The Nomadic Trucker 457(3)
12.2 An Introduction to Learning 460(4)
12.3 Heuristic Learning Policies 464(6)
12.4 Gittins Indexes for Online Learning 470(7)
12.5 The Knowledge Gradient Policy 477(5)
12.6 Learning with a Physical State 482(10)
12.7 Bibliographic Notes 492(5)
Problems 493(4)
13 Value Function Approximations for Resource Allocation Problems 497(44)
13.1 Value Functions versus Gradients 498(1)
13.2 Linear Approximations 499(2)
13.3 Piecewise-Linear Approximations 501(4)
13.4 Solving a Resource Allocation Problem Using Piecewise-Linear Functions 505(4)
13.5 The SHAPE Algorithm 509(4)
13.6 Regression Methods 513(3)
13.7 Cutting Planes 516(12)
13.8 Why Does It Work? 528(7)
13.9 Bibliographic Notes 535(6)
Problems 536(5)
14 Dynamic Resource Allocation Problems 541(52)
14.1 An Asset Acquisition Problem 541(6)
14.2 The Blood Management Problem 547(10)
14.3 A Portfolio Optimization Problem 557(3)
14.4 A General Resource Allocation Problem 560(13)
14.5 A Fleet Management Problem 573(7)
14.6 A Driver Management Problem 580(5)
14.7 Bibliographic Notes 585(8)
Problems 586(7)
15 Implementation Challenges 593(14)
15.1 Will ADP Work for Your Problem? 593(1)
15.2 Designing an ADP Algorithm for Complex Problems 594(2)
15.3 Debugging an ADP Algorithm 596(1)
15.4 Practical Issues 597(5)
15.5 Modeling Your Problem 602(2)
15.6 Online versus Offline Models 604(2)
15.7 If It Works, Patent It! 606(1)
Bibliography 607(16)
Index 623
WARREN B. POWELL, PhD, is Professor of Operations Research and Financial Engineering at Princeton University, where he is founder and Director of CASTLE Laboratory, a research unit that works with industrial partners to test new ideas found in operations research. The recipient of the 2004 INFORMS Fellow Award, Dr. Powell has authored more than 160 published articles on stochastic optimization, approximate dynamic programming, and dynamic resource management.