|
|
1 | (2) |
Part I Mathematical background |
|
3 | (186) |
|
2 Real analysis and linear algebra |
|
|
5 | (22) |
|
2.1 Definitions and notation |
|
|
5 | (11) |
|
2.1.1 Numbers, sets and vectors |
|
|
5 | (1) |
|
|
6 | (1) |
|
|
6 | (1) |
|
2.1.4 The supremum and infimum |
|
|
7 | (1) |
|
|
7 | (1) |
|
|
7 | (1) |
|
2.1.7 Sequences and limits |
|
|
8 | (1) |
|
|
9 | (2) |
|
|
11 | (1) |
|
2.1.10 Classes of real valued functions |
|
|
11 | (1) |
|
|
12 | (1) |
|
2.1.12 The binomial coefficient |
|
|
13 | (1) |
|
2.1.13 Stirling's approximation of the factorial |
|
|
13 | (1) |
|
|
14 | (1) |
|
|
14 | (1) |
|
|
14 | (1) |
|
|
15 | (1) |
|
2.2 Equivalence relationships |
|
|
16 | (1) |
|
|
16 | (11) |
|
|
16 | (2) |
|
2.3.2 Eigenvalues and spectral decomposition |
|
|
18 | (3) |
|
2.3.3 Symmetric, Hermitian and positive definite matrices |
|
|
21 | (1) |
|
|
22 | (2) |
|
2.3.5 Stochastic matrices |
|
|
24 | (1) |
|
2.3.6 Nonnegative matrices and graph structure |
|
|
25 | (2) |
|
3 Background - measure theory |
|
|
27 | (24) |
|
|
27 | (3) |
|
3.1.1 Bases of topologies |
|
|
29 | (1) |
|
3.1.2 Metric space topologies |
|
|
29 | (1) |
|
|
30 | (11) |
|
3.2.1 Formal construction of measures |
|
|
31 | (2) |
|
3.2.2 Completion of measures |
|
|
33 | (1) |
|
|
34 | (1) |
|
3.2.4 Extension of measures |
|
|
35 | (1) |
|
|
36 | (1) |
|
|
36 | (1) |
|
|
36 | (1) |
|
3.2.8 Dynkin system theorem |
|
|
37 | (1) |
|
|
37 | (1) |
|
3.2.10 Decomposition of measures |
|
|
38 | (1) |
|
3.2.11 Measurable functions |
|
|
39 | (2) |
|
|
41 | (3) |
|
3.3.1 Convergence of integrals |
|
|
42 | (1) |
|
|
43 | (1) |
|
3.3.3 Radon-Nikodym derivative |
|
|
43 | (1) |
|
|
44 | (7) |
|
|
44 | (1) |
|
|
45 | (4) |
|
3.4.3 The Kolmogorov extension theorem |
|
|
49 | (2) |
|
4 Background - probability theory |
|
|
51 | (46) |
|
4.1 Probability measures - basic properties |
|
|
52 | (7) |
|
4.2 Moment generating functions (MGF) and cumulant generating functions (CGF) |
|
|
59 | (4) |
|
4.2.1 Moments and cumulants |
|
|
61 | (1) |
|
4.2.2 MGF and CGF of independent sums |
|
|
62 | (1) |
|
4.2.3 Relationship of the CGF to the normal distribution |
|
|
62 | (1) |
|
4.2.4 Probability generating functions |
|
|
63 | (1) |
|
4.3 Conditional distributions |
|
|
63 | (3) |
|
|
66 | (2) |
|
|
68 | (1) |
|
4.5 Some important theorems |
|
|
68 | (2) |
|
4.6 Inequalities for tail probabilities |
|
|
70 | (4) |
|
|
71 | (1) |
|
4.6.2 Chernoff bound for the normal distribution |
|
|
71 | (1) |
|
4.6.3 Chernoff bound for the gamma distribution |
|
|
72 | (1) |
|
|
72 | (1) |
|
4.6.5 Some inequalities for bounded random variables |
|
|
73 | (1) |
|
|
74 | (3) |
|
4.7.1 MGF ordering of the gamma and exponential distribution |
|
|
75 | (1) |
|
4.7.2 Improved bounds based on hazard functions |
|
|
76 | (1) |
|
4.8 Theory of stochastic limits |
|
|
77 | (5) |
|
4.8.1 Covergence of random variables |
|
|
77 | (1) |
|
4.8.2 Convergence of measures |
|
|
78 | (1) |
|
4.8.3 Total variation norm |
|
|
79 | (3) |
|
|
82 | (3) |
|
4.9.1 Measurability of measure kernels |
|
|
83 | (1) |
|
4.9.2 Continuity of measure kernels |
|
|
84 | (1) |
|
|
85 | (1) |
|
4.11 The law of large numbers |
|
|
86 | (2) |
|
4.12 Extreme value theory |
|
|
88 | (1) |
|
4.13 Maximum likelihood estimation |
|
|
89 | (2) |
|
4.14 Nonparametric estimates of distributions |
|
|
91 | (1) |
|
4.15 Total variation distance for discrete distributions |
|
|
92 | (5) |
|
5 Background - stochastic processes |
|
|
97 | (28) |
|
|
97 | (3) |
|
|
98 | (1) |
|
|
99 | (1) |
|
|
100 | (11) |
|
5.2.1 Discrete state spaces |
|
|
101 | (1) |
|
5.2.2 Global properties of Markov chains |
|
|
102 | (4) |
|
5.2.3 General state spaces |
|
|
106 | (3) |
|
5.2.4 Geometric ergodicity |
|
|
109 | (2) |
|
5.2.5 Spectral properties of Markov chains |
|
|
111 | (1) |
|
5.3 Continuous-time Markov chains |
|
|
111 | (3) |
|
5.3.1 Birth and death processes |
|
|
114 | (1) |
|
|
114 | (4) |
|
5.4.1 Queueing systems as birth and death processes |
|
|
115 | (1) |
|
|
116 | (1) |
|
5.4.3 General queueing systems and embedded Markov chains |
|
|
116 | (2) |
|
5.5 Adapted counting processes |
|
|
118 | (7) |
|
5.5.1 Asymptotic behavior |
|
|
119 | (4) |
|
5.5.2 Relationship to adapted events |
|
|
123 | (2) |
|
|
125 | (32) |
|
|
126 | (2) |
|
6.1.1 Contractive mappings |
|
|
126 | (2) |
|
6.2 The Banach fixed point theorem |
|
|
128 | (3) |
|
6.2.1 Stopping rules for fixed point algorithms |
|
|
130 | (1) |
|
|
131 | (3) |
|
|
133 | (1) |
|
6.3.2 Basis of a vector space |
|
|
133 | (1) |
|
|
133 | (1) |
|
|
134 | (5) |
|
6.4.1 Banach spaces and completeness |
|
|
136 | (1) |
|
|
137 | (2) |
|
6.5 Norms and norm equivalence |
|
|
139 | (3) |
|
|
140 | (1) |
|
6.5.2 Equivalence properties of norm equivalence classes |
|
|
140 | (2) |
|
6.6 Quotient spaces and seminorms |
|
|
142 | (2) |
|
|
144 | (2) |
|
6.8 Examples of Banach spaces |
|
|
146 | (7) |
|
6.8.1 Finite dimensional spaces |
|
|
146 | (1) |
|
6.8.2 Matrix norms and the submultiplicative property |
|
|
147 | (1) |
|
6.8.3 Weighted norms on function spaces |
|
|
147 | (2) |
|
|
149 | (2) |
|
6.8.5 Operators on span quotient spaces |
|
|
151 | (2) |
|
6.9 Measure kernels as linear operators |
|
|
153 | (4) |
|
6.9.1 The contraction property of stochastic kernels |
|
|
153 | (1) |
|
6.9.2 Stochastic kernels and the span seminorm |
|
|
153 | (4) |
|
|
157 | (16) |
|
7.1 Contraction as a norm equivalence property |
|
|
158 | (2) |
|
7.2 Linear fixed point equations |
|
|
160 | (1) |
|
7.3 The geometric series theorem |
|
|
161 | (2) |
|
7.4 Invariant transformations of fixed point equations |
|
|
163 | (1) |
|
7.5 Fixed point algorithms and the span seminorm |
|
|
164 | (4) |
|
7.5.1 Approximations in the span seminorm |
|
|
166 | (1) |
|
7.5.2 Magnitude of fixed points in the span seminorm |
|
|
167 | (1) |
|
7.6 Stopping rules for fixed point algorithms |
|
|
168 | (1) |
|
7.6.1 Fixed point iteration in the span seminorm |
|
|
169 | (1) |
|
7.7 Perturbations of fixed point equations |
|
|
169 | (4) |
|
8 The distribution of a maximum |
|
|
173 | (16) |
|
|
174 | (1) |
|
8.2 Bounds on M based on MGFs |
|
|
174 | (4) |
|
|
176 | (1) |
|
|
177 | (1) |
|
8.3 Bounds for varying marginal distributions |
|
|
178 | (3) |
|
|
180 | (1) |
|
8.4 Tail probabilities of maxima |
|
|
181 | (4) |
|
8.4.1 Extreme value distributions |
|
|
182 | (1) |
|
8.4.2 Tail probabilities based on Boole's inequality |
|
|
182 | (1) |
|
|
183 | (1) |
|
8.4.4 The gamma(α, λ) case |
|
|
184 | (1) |
|
8.5 Variance mixtures based on random sample sizes |
|
|
185 | (1) |
|
8.6 Bounds for maxima based on the first two moments |
|
|
186 | (5) |
|
|
188 | (1) |
Part II General theory of approximate iterative algorithms |
|
189 | (58) |
|
9 Background - linear convergence |
|
|
191 | (8) |
|
|
191 | (3) |
|
9.2 Construction of envelopes - the nonstochastic case |
|
|
194 | (2) |
|
9.3 Construction of envelopes - the stochastic case |
|
|
196 | (1) |
|
9.4 A version of l'Hopital's rule for series |
|
|
196 | (3) |
|
10 A general theory of approximate iterative algorithms (AIA) |
|
|
199 | (40) |
|
10.1 A general tolerance model |
|
|
201 | (1) |
|
10.2 Example: a preliminary model |
|
|
201 | (1) |
|
10.3 Model elements of an AIA |
|
|
202 | (2) |
|
|
202 | (1) |
|
10.3.2 Lipschitz convolutions |
|
|
203 | (1) |
|
10.4 A classification system for AIAs |
|
|
204 | (4) |
|
10.4.1 Relative error model |
|
|
206 | (2) |
|
10.5 General inequalities |
|
|
208 | (5) |
|
10.5.1 Hilbert space models of AIAs |
|
|
209 | (4) |
|
10.6 Nonexpansive operators |
|
|
213 | (7) |
|
10.6.1 Application of general inequalities to nonexpansive AIAs |
|
|
214 | (2) |
|
10.6.2 Weakly contractive AIAs |
|
|
216 | (1) |
|
|
216 | (2) |
|
10.6.4 Stochastic approximation (Robbins-Monro algorithm) |
|
|
218 | (2) |
|
10.7 Rates of convergence for AIAs |
|
|
220 | (10) |
|
10.7.1 Monotonicity of the Lipschitz kernel |
|
|
221 | (1) |
|
10.7.2 Case I - strongly contractive models with nonvanishing bounds |
|
|
221 | (1) |
|
10.7.3 Case II - rapidly vanishing approximation error |
|
|
222 | (1) |
|
10.7.4 Case III - approximation error decreasing at contraction rate |
|
|
223 | (1) |
|
10.7.5 Case IV - Approximation error greater than contraction rate |
|
|
224 | (1) |
|
10.7.6 Case V - Contraction rates approaching 1 |
|
|
224 | (3) |
|
10.7.7 Adjustments for relative error models |
|
|
227 | (1) |
|
10.7.8 A comparison of Banach space and Hilbert space models |
|
|
228 | (2) |
|
10.8 Stochastic approximation as a weakly contractive algorithm |
|
|
230 | (1) |
|
10.9 Tightness of algorithm tolerance |
|
|
231 | (1) |
|
|
232 | (3) |
|
10.10.1 Numerical example |
|
|
233 | (2) |
|
10.11 Summary of convergence rates for strongly contractive models |
|
|
235 | (4) |
|
11 Selection of approximation schedules for coarse-to-fine AlAs |
|
|
239 | (8) |
|
11.1 Extending the tolerance model |
|
|
239 | (3) |
|
11.1.1 Comparison model for tolerance schedules |
|
|
241 | (1) |
|
11.1.2 Regularity conditions for the computation function |
|
|
242 | (1) |
|
|
242 | (1) |
|
11.3 Examples of cost functions |
|
|
243 | (2) |
|
11.4 A general principle for AIAs |
|
|
245 | (2) |
Part III Application to Markov decision processes |
|
247 | (104) |
|
12 Markov decision processes (MDP) - background |
|
|
249 | (30) |
|
|
250 | (3) |
|
12.2 The optimal control problem |
|
|
253 | (2) |
|
12.2.1 Adaptive control policies |
|
|
253 | (1) |
|
12.2.2 Optimal control policies |
|
|
253 | (2) |
|
12.3 Dynamic programming and linear operators |
|
|
255 | (6) |
|
12.3.1 The dynamic programming operator (DPO) |
|
|
256 | (1) |
|
12.3.2 Finite horizon dynamic programming |
|
|
257 | (1) |
|
12.3.3 Infinite horizon problem |
|
|
258 | (1) |
|
|
259 | (1) |
|
12.3.5 Measurability of the DPO |
|
|
260 | (1) |
|
12.4 Dynamic programming and value iteration |
|
|
261 | (4) |
|
12.4.1 Value iteration and optimality |
|
|
262 | (3) |
|
12.5 Regret and s-optimal solutions |
|
|
265 | (2) |
|
12.6 Banach space structure of dynamic programming |
|
|
267 | (7) |
|
12.6.1 The contraction property |
|
|
269 | (1) |
|
12.6.2 Contraction properties of the DPO |
|
|
269 | (3) |
|
12.6.3 The equivalence of uniform convergence and contraction for the DPO |
|
|
272 | (2) |
|
12.7 Average cost criterion for MDP |
|
|
274 | (5) |
|
13 Markov decision processes - value iteration |
|
|
279 | (16) |
|
13.1 Value iteration on quotient spaces |
|
|
279 | (2) |
|
13.2 Contraction in the span seminorm |
|
|
281 | (2) |
|
13.2.1 Contraction properties of the DPO |
|
|
281 | (2) |
|
13.3 Stopping rules for value iteration |
|
|
283 | (1) |
|
13.4 Value iteration in the span seminorm |
|
|
283 | (1) |
|
13.5 Example: M/D/1/K queueing system |
|
|
284 | (4) |
|
13.6 Efficient calculation of |||QP||| SP |
|
|
288 | (3) |
|
13.7 Example: M/D/1/K system with optimal control of service capacity |
|
|
291 | (1) |
|
|
292 | (1) |
|
13.9 Value iteration for the average cost optimization |
|
|
293 | (2) |
|
14 Model approximation in dynamic programming - general theory |
|
|
295 | (14) |
|
14.1 The general inequality for MDPs |
|
|
295 | (3) |
|
|
298 | (2) |
|
|
300 | (2) |
|
14.4 A comment on the approximation of regret |
|
|
302 | (2) |
|
|
304 | (5) |
|
15 Sampling based approximation methods |
|
|
309 | (18) |
|
|
310 | (10) |
|
15.1.1 Nonuniform sample allocation: Dependence on qmin, and the 'Curse of the Supremum Norm' |
|
|
313 | (1) |
|
15.1.2 Some queueing system examples |
|
|
314 | (1) |
|
15.1.3 Truncated geometric model |
|
|
315 | (1) |
|
15.1.4 M/G/1/K queueing model |
|
|
316 | (2) |
|
15.1.5 Restarting schemes |
|
|
318 | (2) |
|
15.2 Continuous state/action spaces |
|
|
320 | (1) |
|
15.3 Parametric estimation of MDP models |
|
|
321 | (6) |
|
16 Approximate value iteration by truncation |
|
|
327 | (6) |
|
16.1 Truncation algorithm |
|
|
328 | (1) |
|
16.2 Regularity conditions for tolerance-cost model |
|
|
329 | (1) |
|
16.2.1 Suboptimal orderings |
|
|
329 | (1) |
|
|
330 | (3) |
|
17 Grid approximations of MDPs with continuous state/action spaces |
|
|
333 | (8) |
|
17.1 Discretization methods |
|
|
333 | (2) |
|
|
335 | (1) |
|
17.3 Application of approximation schedules |
|
|
336 | (5) |
|
18 Adaptive control of MDPs |
|
|
341 | (10) |
|
18.1 Regret bounds for adaptive policies |
|
|
342 | (1) |
|
18.2 Definition of an adaptive MDP |
|
|
343 | (2) |
|
18.3 Online parameter estimation |
|
|
345 | (2) |
|
18.4 Exploration schedule |
|
|
347 | (4) |
Bibliography |
|
351 | (6) |
Subject index |
|
357 | |