E-book: Partially Observed Markov Decision Processes: From Filtering to Controlled Sensing

  • Format: EPUB+DRM
  • Publication date: 21-Mar-2016
  • Publisher: Cambridge University Press
  • Language: English
  • ISBN-13: 9781316594360
  • Price: 102,49 €*
  • * The price is final, i.e. no further discounts apply.
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has supplied this e-book in encrypted form, which means that you must install special software to read it. You also need to create an Adobe ID. More information here. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android).

    To read on a PC or Mac, you need to install Adobe Digital Editions (this is a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is most likely already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

Covering formulation, algorithms, and structural results, and linking theory to real-world applications in controlled sensing (including social learning, adaptive radars and sequential detection), this book focuses on the conceptual foundations of partially observed Markov decision processes (POMDPs). It emphasizes structural results in stochastic dynamic programming, enabling graduate students and researchers in engineering, operations research, and economics to understand the underlying unifying themes without getting weighed down by mathematical technicalities. Bringing together research from across the literature, the book provides an introduction to nonlinear filtering followed by a systematic development of stochastic dynamic programming, lattice programming and reinforcement learning for POMDPs. Questions addressed in the book include: when does a POMDP have a threshold optimal policy? When are myopic policies optimal? How do local and global decision makers interact in adaptive decision making in multi-agent social learning where there is herding and data incest? And how can sophisticated radars and sensors adapt their sensing in real time?

This book covers formulation, algorithms, and structural results of partially observed Markov decision processes, whilst linking theory to real-world applications in controlled sensing. Computations are kept to a minimum, enabling students and researchers in engineering, operations research, and economics to understand the methods and determine the structure of their optimal solution.

Additional information

This book covers formulation, algorithms, and structural results of partially observed Markov decision processes, linking theory to real-world applications in controlled sensing.
Preface
1 Introduction
1.1 Part I: Stochastic models and Bayesian filtering
1.2 Part II: POMDPs: models and algorithms
1.3 Part III: POMDPs: structural results
1.4 Part IV: Stochastic approximation and reinforcement learning
1.5 Examples of controlled (active) sensing
Part I Stochastic models and Bayesian filtering
2 Stochastic state space models
2.1 Stochastic state space model
2.2 Optimal prediction: Chapman-Kolmogorov equation
2.3 Example 1: Linear Gaussian state space model
2.4 Example 2: Finite-state hidden Markov model (HMM)
2.5 Example 3: Jump Markov linear systems (JMLS)
2.6 Modeling moving and maneuvering targets
2.7 Geometric ergodicity of HMM predictor: Dobrushin's coefficient
2.8 Complements and sources
Appendix to Chapter 2
2.A Proof of Theorem 2.7.2 and Lemma 2.7.3
3 Optimal filtering
3.1 Optimal state estimation
3.2 Conditional expectation and Bregman loss functions
3.3 Optimal filtering, prediction and smoothing formulas
3.4 The Kalman filter
3.5 Hidden Markov model (HMM) filter
3.6 Examples: Markov modulated time series, out of sequence measurements and reciprocal processes
3.7 Geometric ergodicity of HMM filter
3.8 Suboptimal filters
3.9 Particle filter
3.10 Example: Jump Markov linear systems (JMLS)
3.11 Complements and sources
Appendices to Chapter 3
3.A Proof of Lemma 3.7.2
3.B Proof of Lemma 3.7.3
3.C Proof of Theorem 3.7.5
4 Algorithms for maximum likelihood parameter estimation
4.1 Maximum likelihood estimation criterion
4.2 MLE of partially observed models
4.3 Expectation Maximization (EM) algorithm
4.4 Forward-only filter-based EM algorithms
4.5 Method of moments estimator for HMMs
4.6 Complements and sources
5 Multi-agent sensing: social learning and data incest
5.1 Social sensing
5.2 Multi-agent social learning
5.3 Information cascades and constrained social learning
5.4 Data incest in online reputation systems
5.5 Fair online reputation system
5.6 Belief-based expectation polling
5.7 Complements and sources
Appendix to Chapter 5
5.A Proofs
Part II Partially observed Markov decision processes: models and applications
6 Fully observed Markov decision processes
6.1 Finite state finite horizon MDP
6.2 Bellman's stochastic dynamic programming algorithm
6.3 Continuous-state MDP
6.4 Infinite horizon discounted cost
6.5 Infinite horizon average cost
6.6 Average cost constrained Markov decision process
6.7 Inverse optimal control and revealed preferences
6.8 Complements and sources
Appendices to Chapter 6
6.A Proof of Theorem 6.2.2
6.B Proof of Theorems 6.5.3 and 6.5.4
7 Partially observed Markov decision processes (POMDPs)
7.1 Finite horizon POMDP
7.2 Belief state formulation and dynamic programming
7.3 Machine replacement POMDP: toy example
7.4 Finite dimensional controller for finite horizon POMDP
7.5 Algorithms for finite horizon POMDPs with finite observation space
7.6 Discounted infinite horizon POMDPs
7.7 Example: Optimal search for a Markovian moving target
7.8 Complements and sources
8 POMDPs in controlled sensing and sensor scheduling
8.1 Introduction
8.2 State and sensor control for state space models
8.3 Example 1: Linear Gaussian control and controlled radars
8.4 Example 2: POMDPs in controlled sensing
8.5 Example 3: Multi-agent controlled sensing with social learning
8.6 Risk-averse MDPs and POMDPs
8.7 Notes and sources
Appendix to Chapter 8
8.A Proof of theorems
Part III Partially observed Markov decision processes: structural results
9 Structural results for Markov decision processes
9.1 Submodularity and supermodularity
9.2 First-order stochastic dominance
9.3 Monotone optimal policies for MDPs
9.4 How does the optimal cost depend on the transition matrix?
9.5 Algorithms for monotone policies - exploiting sparsity
9.6 Example: Transmission scheduling over wireless channel
9.7 Complements and sources
Appendix to Chapter 9
9.A Proofs of theorems
10 Structural results for optimal filters
10.1 Monotone likelihood ratio (MLR) stochastic order
10.2 Total positivity and copositivity
10.3 Monotone properties of optimal filter
10.4 Illustrative example
10.5 Discussion and examples of Assumptions (F1)-(F4)
10.6 Example: Reduced complexity HMM filtering with stochastic dominance bounds
10.7 Complements and sources
Appendix to Chapter 10
10.A Proofs
11 Monotonicity of value function for POMDPs
11.1 Model and assumptions
11.2 Main result: monotone value function
11.3 Example 1: Monotone policies for 2-state POMDPs
11.4 Example 2: POMDP multi-armed bandits structural results
11.5 Complements and sources
Appendix to Chapter 11
11.A Proof of Theorem 11.3.1
12 Structural results for stopping time POMDPs
12.1 Introduction
12.2 Stopping time POMDP and convexity of stopping set
12.3 Monotone optimal policy for stopping time POMDP
12.4 Characterization of optimal linear decision threshold for stopping time POMDP
12.5 Example: Machine replacement POMDP
12.6 Multivariate stopping time POMDPs
12.7 Radar scheduling with mutual information cost
12.8 Complements and sources
Appendices to Chapter 12
12.A Lattices and submodularity
12.B MLR dominance and submodularity on lines
12.C Proof of Theorem 12.3.4
13 Stopping time POMDPs for quickest change detection
13.1 Example 1: Quickest detection with phase-distributed change time and variance penalty
13.2 Example 2: Quickest transient detection
13.3 Example 3: Risk-sensitive quickest detection with exponential delay penalty
13.4 Examples 4, 5 and 6: Stopping time POMDPs in multi-agent social learning
13.5 Example 7: Quickest detection with controlled sampling
13.6 Complements and sources
Appendix to Chapter 13
13.A Proof of Theorem 13.4.1
14 Myopic policy bounds for POMDPs and sensitivity to model parameters
14.1 The partially observed Markov decision process
14.2 Myopic policies using copositive dominance: insight
14.3 Constructing myopic policy bounds for optimal policy using copositive dominance
14.4 Optimizing the myopic policy bounds to match the optimal policy
14.5 Controlled sensing POMDPs with quadratic costs
14.6 Numerical examples
14.7 Blackwell dominance of observation distributions and optimality of myopic policies
14.8 Ordinal sensitivity: how does optimal POMDP cost vary with state and observation dynamics?
14.9 Cardinal sensitivity of POMDP
14.10 Complements and sources
Appendices to Chapter 14
14.A POMDP numerical examples
14.B Proof of Theorem 14.8.1
14.C Proof of Theorem 14.9.1
Part IV Stochastic approximation and reinforcement learning
15 Stochastic optimization and gradient estimation
15.1 Stochastic gradient algorithm
15.2 How long to simulate a Markov chain?
15.3 Gradient estimators for Markov processes
15.4 Finite difference gradient estimators and SPSA
15.5 Score function gradient estimator
15.6 Weak derivative gradient estimator
15.7 Bias and variance of gradient estimators
15.8 Complements and sources
Appendices to Chapter 15
15.A Proof of Theorem 15.2.1
15.B Proof of Theorem 15.7.1
16 Reinforcement learning
16.1 Q-learning algorithm
16.2 Policy gradient reinforcement learning for MDP
16.3 Score function policy gradient algorithm for MDP
16.4 Weak derivative gradient estimator for MDP
16.5 Numerical comparison of gradient estimators
16.6 Policy gradient reinforcement learning for constrained MDP (CMDP)
16.7 Policy gradient algorithm for POMDPs
16.8 Complements and sources
17 Stochastic approximation algorithms: examples
17.1 A primer on stochastic approximation algorithms
17.2 Example 1: Recursive maximum likelihood parameter estimation of HMMs
17.3 Example 2: HMM state estimation via LMS algorithm
17.4 Example 3: Discrete stochastic optimization for policy search
17.5 Example 4: Mean field population dynamics models for social sensing
17.6 Complements and sources
Appendices to Chapter 17
17.A Proof of Theorem 17.3.1
17.B Proof of Theorem 17.4.1
17.C Proof of Theorem 17.4.2
17.D Proof of Theorem 17.5.2
18 Summary of algorithms for solving POMDPs
Appendix A Short primer on stochastic simulation
Appendix B Continuous-time HMM filters
Appendix C Markov processes
Appendix D Some limit theorems
References
Index
Vikram Krishnamurthy is a Professor and Canada Research Chair in Statistical Signal Processing at the University of British Columbia, Vancouver. His research contributions focus on nonlinear filtering, stochastic approximation algorithms and POMDPs. Dr Krishnamurthy is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) and served as a distinguished lecturer for the IEEE Signal Processing Society. In 2013, he received an honorary doctorate from KTH, Royal Institute of Technology, Sweden.