Preface  xi

1 Introduction  1
1.1 Part I: Stochastic models and Bayesian filtering  1
1.2 Part II: POMDPs: models and algorithms  2
1.3 Part III: POMDPs: structural results  4
1.4 Part IV: Stochastic approximation and reinforcement learning  5
1.5 Examples of controlled (active) sensing  6

Part I Stochastic models and Bayesian filtering  9

2 Stochastic state space models  11
2.1 Stochastic state space model  11
2.2 Optimal prediction: Chapman-Kolmogorov equation  15
2.3 Example 1: Linear Gaussian state space model  16
2.4 Example 2: Finite-state hidden Markov model (HMM)  19
2.5 Example 3: Jump Markov linear systems (JMLS)  22
2.6 Modeling moving and maneuvering targets  24
2.7 Geometric ergodicity of HMM predictor: Dobrushin's coefficient  26
2.8 Complements and sources  31
2.A Proof of Theorem 2.7.2 and Lemma 2.7.3  32

3 Optimal filtering  34
3.1 Optimal state estimation  35
3.2 Conditional expectation and Bregman loss functions  36
3.3 Optimal filtering, prediction and smoothing formulas  40
3.4 Kalman filter  44
3.5 Hidden Markov model (HMM) filter  50
3.6 Examples: Markov modulated time series, out of sequence measurements and reciprocal processes  53
3.7 Geometric ergodicity of HMM filter  57
3.8  59
3.9  61
3.10 Example: Jump Markov linear systems (JMLS)  67
3.11 Complements and sources  69
3.A  70
3.B  71
3.C Proof of Theorem 3.7.5  71

4 Algorithms for maximum likelihood parameter estimation  73
4.1 Maximum likelihood estimation criterion  73
4.2 MLE of partially observed models  75
4.3 Expectation Maximization (EM) algorithm  79
4.4 Forward-only filter-based EM algorithms  85
4.5 Method of moments estimator for HMMs  88
4.6 Complements and sources  89

5 Multi-agent sensing: social learning and data incest  93
5.1  93
5.2 Multi-agent social learning  96
5.3 Information cascades and constrained social learning  98
5.4 Data incest in online reputation systems  101
5.5 Fair online reputation system  104
5.6 Belief-based expectation polling  110
5.7 Complements and sources  114
|
5.A  115

Part II Partially observed Markov decision processes: models and applications  119

6 Fully observed Markov decision processes  121
6.1 Finite state finite horizon MDP  121
6.2 Bellman's stochastic dynamic programming algorithm  124
6.3  127
6.4 Infinite horizon discounted cost  128
6.5 Infinite horizon average cost  132
6.6 Average cost constrained Markov decision process  136
6.7 Inverse optimal control and revealed preferences  139
6.8 Complements and sources  142
|
6.A Proof of Theorem 6.2.2  143
6.B Proof of Theorems 6.5.3 and 6.5.4  144

7 Partially observed Markov decision processes (POMDPs)  147
7.1  147
7.2 Belief state formulation and dynamic programming  150
7.3 Machine replacement POMDP: toy example  153
7.4 Finite dimensional controller for finite horizon POMDP  154
7.5 Algorithms for finite horizon POMDPs with finite observation space  157
7.6 Discounted infinite horizon POMDPs  163
7.7 Example: Optimal search for a Markovian moving target  169
7.8 Complements and sources  177

8 POMDPs in controlled sensing and sensor scheduling  179
8.1  179
8.2 State and sensor control for state space models  180
8.3 Example 1: Linear Gaussian control and controlled radars  182
8.4 Example 2: POMDPs in controlled sensing  185
8.5 Example 3: Multi-agent controlled sensing with social learning  191
8.6 Risk-averse MDPs and POMDPs  195
|
8.7 Complements and sources  199
8.A  199

Part III Partially observed Markov decision processes: structural results  201

9 Structural results for Markov decision processes  203
9.1 Submodularity and supermodularity  204
9.2 First-order stochastic dominance  207
9.3 Monotone optimal policies for MDPs  208
9.4 How does the optimal cost depend on the transition matrix?  210
9.5 Algorithms for monotone policies - exploiting sparsity  211
9.6 Example: Transmission scheduling over wireless channel  213
9.7 Complements and sources  216
9.A  216

10 Structural results for optimal filters  219
10.1 Monotone likelihood ratio (MLR) stochastic order  220
10.2 Total positivity and copositivity  223
10.3 Monotone properties of optimal filter  224
10.4 Illustrative example  226
10.5 Discussion and examples of Assumptions (F1)-(F4)  227
10.6 Example: Reduced complexity HMM filtering with stochastic dominance bounds  229
10.7 Complements and sources  238
|
10.A  238

11 Monotonicity of value function for POMDPs  241
11.1 Model and assumptions  242
11.2 Main result: monotone value function  244
11.3 Example 1: Monotone policies for 2-state POMDPs  245
11.4 Example 2: POMDP multi-armed bandits structural results  247
11.5 Complements and sources  251
11.A Proof of Theorem 11.3.1  251

12 Structural results for stopping time POMDPs  255
12.1  255
12.2 Stopping time POMDP and convexity of stopping set  257
12.3 Monotone optimal policy for stopping time POMDP  261
12.4 Characterization of optimal linear decision threshold for stopping time POMDP  267
12.5 Example: Machine replacement POMDP  271
12.6 Multivariate stopping time POMDPs  271
12.7 Radar scheduling with mutual information cost  273
12.8 Complements and sources  278
12.A Lattices and submodularity  279
12.B MLR dominance and submodularity on lines  279
12.C Proof of Theorem 12.3.4  280

13 Stopping time POMDPs for quickest change detection  284
13.1 Example 1: Quickest detection with phase-distributed change time and variance penalty  285
13.2 Example 2: Quickest transient detection  290
13.3 Example 3: Risk-sensitive quickest detection with exponential delay penalty  291
13.4 Examples 4, 5 and 6: Stopping time POMDPs in multi-agent social learning  294
13.5 Example 7: Quickest detection with controlled sampling  304
13.6 Complements and sources  309
13.A Proof of Theorem 13.4.1  310

14 Myopic policy bounds for POMDPs and sensitivity to model parameters  312
14.1 The partially observed Markov decision process  313
14.2 Myopic policies using copositive dominance: insight  314
14.3 Constructing myopic policy bounds for optimal policy using copositive dominance  316
14.4 Optimizing the myopic policy bounds to match the optimal policy  318
14.5 Controlled sensing POMDPs with quadratic costs  320
14.6  321
14.7 Blackwell dominance of observation distributions and optimality of myopic policies  324
14.8 Ordinal sensitivity: how does optimal POMDP cost vary with state and observation dynamics?  329
14.9 Cardinal sensitivity of POMDP  331
14.10 Complements and sources  332
14.A POMDP numerical examples  332
14.B Proof of Theorem 14.8.1  336
14.C Proof of Theorem 14.9.1  337

Part IV Stochastic approximation and reinforcement learning  341

15 Stochastic optimization and gradient estimation  343
15.1 Stochastic gradient algorithm  344
15.2 How long to simulate a Markov chain?  347
15.3 Gradient estimators for Markov processes  348
15.4 Finite difference gradient estimators and SPSA  349
15.5 Score function gradient estimator  350
15.6 Weak derivative gradient estimator  352
15.7 Bias and variance of gradient estimators  355
15.8 Complements and sources  357
15.A Proof of Theorem 15.2.1  358
15.B Proof of Theorem 15.7.1  360

16 Reinforcement learning  364
16.1 Q-learning algorithm  365
16.2 Policy gradient reinforcement learning for MDP  368
16.3 Score function policy gradient algorithm for MDP  371
16.4 Weak derivative gradient estimator for MDP  372
16.5 Numerical comparison of gradient estimators  374
16.6 Policy gradient reinforcement learning for constrained MDP (CMDP)  375
16.7 Policy gradient algorithm for POMDPs  377
16.8 Complements and sources  379

17 Stochastic approximation algorithms: examples  380
17.1 A primer on stochastic approximation algorithms  381
17.2 Example 1: Recursive maximum likelihood parameter estimation of HMMs  386
17.3 Example 2: HMM state estimation via LMS algorithm  389
17.4 Example 3: Discrete stochastic optimization for policy search  396
17.5 Example 4: Mean field population dynamics models for social sensing  407
17.6 Complements and sources  413
17.A Proof of Theorem 17.3.1  417
17.B Proof of Theorem 17.4.1  420
17.C Proof of Theorem 17.4.2  421
17.D Proof of Theorem 17.5.2  422

18 Summary of algorithms for solving POMDPs  425

Appendix A Short primer on stochastic simulation  428
Appendix B Continuous-time HMM filters  442
Appendix C Markov processes  449
Appendix D Some limit theorems  451

References  455
Index  471