Preface  xi

1 Introduction  1
1.1 Part I: Stochastic models and Bayesian filtering  1
1.2 Part II: POMDPs: models and algorithms  2
1.3 Part III: POMDPs: structural results  4
1.4 Part IV: Stochastic approximation and reinforcement learning  5
1.5 Examples of controlled (active) sensing  6

Part I Stochastic models and Bayesian filtering  9

2 Stochastic state space models  11
2.1 Stochastic state space model  11
2.2 Optimal prediction: Chapman-Kolmogorov equation  15
2.3 Example 1: Linear Gaussian state space model  16
2.4 Example 2: Finite-state hidden Markov model (HMM)  19
2.5 Example 3: Jump Markov linear systems (JMLS)  22
2.6 Modeling moving and maneuvering targets  24
2.7 Geometric ergodicity of HMM predictor: Dobrushin's coefficient  26
2.8 Complements and sources  31
2.A Proof of Theorem 2.7.2 and Lemma 2.7.3  32

3 Optimal filtering  34
3.1 Optimal state estimation  35
3.2 Conditional expectation and Bregman loss functions  36
3.3 Optimal filtering, prediction and smoothing formulas  40
3.4 Kalman filter  44
3.5 Hidden Markov model (HMM) filter  50
3.6 Examples: Markov modulated time series, out of sequence measurements and reciprocal processes  53
3.7 Geometric ergodicity of HMM filter  57
3.8  59
3.9  61
3.10 Example: Jump Markov linear systems (JMLS)  67
3.11 Complements and sources  69
3.A  70
3.B  71
3.C Proof of Theorem 3.7.5  71

4 Algorithms for maximum likelihood parameter estimation  73
4.1 Maximum likelihood estimation criterion  73
4.2 MLE of partially observed models  75
4.3 Expectation Maximization (EM) algorithm  79
4.4 Forward-only filter-based EM algorithms  85
4.5 Method of moments estimator for HMMs  88
4.6 Complements and sources  89

5 Multi-agent sensing: social learning and data incest  93
5.1  93
5.2 Multi-agent social learning  96
5.3 Information cascades and constrained social learning  98
5.4 Data incest in online reputation systems  101
5.5 Fair online reputation system  104
5.6 Belief-based expectation polling  110
5.7 Complements and sources  114
|
5.A  115

Part II Partially observed Markov decision processes: models and applications  119

6 Fully observed Markov decision processes  121
6.1 Finite state finite horizon MDP  121
6.2 Bellman's stochastic dynamic programming algorithm  124
6.3  127
6.4 Infinite horizon discounted cost  128
6.5 Infinite horizon average cost  132
6.6 Average cost constrained Markov decision process  136
6.7 Inverse optimal control and revealed preferences  139
6.8 Complements and sources  142
|
6.A Proof of Theorem 6.2.2  143
6.B Proof of Theorems 6.5.3 and 6.5.4  144

7 Partially observed Markov decision processes (POMDPs)  147
7.1  147
7.2 Belief state formulation and dynamic programming  150
7.3 Machine replacement POMDP: toy example  153
7.4 Finite dimensional controller for finite horizon POMDP  154
7.5 Algorithms for finite horizon POMDPs with finite observation space  157
7.6 Discounted infinite horizon POMDPs  163
7.7 Example: Optimal search for a Markovian moving target  169
7.8 Complements and sources  177

8 POMDPs in controlled sensing and sensor scheduling  179
8.1  179
8.2 State and sensor control for state space models  180
8.3 Example 1: Linear Gaussian control and controlled radars  182
8.4 Example 2: POMDPs in controlled sensing  185
8.5 Example 3: Multi-agent controlled sensing with social learning  191
8.6 Risk-averse MDPs and POMDPs  195
|
8.7 Complements and sources  199
8.A  199

Part III Partially observed Markov decision processes: structural results  201

9 Structural results for Markov decision processes  203
9.1 Submodularity and supermodularity  204
9.2 First-order stochastic dominance  207
9.3 Monotone optimal policies for MDPs  208
9.4 How does the optimal cost depend on the transition matrix?  210
9.5 Algorithms for monotone policies - exploiting sparsity  211
9.6 Example: Transmission scheduling over wireless channel  213
9.7 Complements and sources  216
9.A  216

10 Structural results for optimal filters  219
10.1 Monotone likelihood ratio (MLR) stochastic order  220
10.2 Total positivity and copositivity  223
10.3 Monotone properties of optimal filter  224
10.4 Illustrative example  226
10.5 Discussion and examples of Assumptions (F1)-(F4)  227
10.6 Example: Reduced complexity HMM filtering with stochastic dominance bounds  229
10.7 Complements and sources  238
|
10.A  238

11 Monotonicity of value function for POMDPs  241
11.1 Model and assumptions  242
11.2 Main result: monotone value function  244
11.3 Example 1: Monotone policies for 2-state POMDPs  245
11.4 Example 2: POMDP multi-armed bandits structural results  247
11.5 Complements and sources  251
11.A Proof of Theorem 11.3.1  251

12 Structural results for stopping time POMDPs  255
12.1  255
12.2 Stopping time POMDP and convexity of stopping set  257
12.3 Monotone optimal policy for stopping time POMDP  261
12.4 Characterization of optimal linear decision threshold for stopping time POMDP  267
12.5 Example: Machine replacement POMDP  271
12.6 Multivariate stopping time POMDPs  271
12.7 Radar scheduling with mutual information cost  273
12.8 Complements and sources  278
12.A Lattices and submodularity  279
12.B MLR dominance and submodularity on lines  279
12.C Proof of Theorem 12.3.4  280

13 Stopping time POMDPs for quickest change detection  284
13.1 Example 1: Quickest detection with phase-distributed change time and variance penalty  285
13.2 Example 2: Quickest transient detection  290
13.3 Example 3: Risk-sensitive quickest detection with exponential delay penalty  291
13.4 Examples 4, 5 and 6: Stopping time POMDPs in multi-agent social learning  294
13.5 Example 7: Quickest detection with controlled sampling  304
13.6 Complements and sources  309
13.A Proof of Theorem 13.4.1  310

14 Myopic policy bounds for POMDPs and sensitivity to model parameters  312
14.1 The partially observed Markov decision process  313
14.2 Myopic policies using copositive dominance: insight  314
14.3 Constructing myopic policy bounds for optimal policy using copositive dominance  316
14.4 Optimizing the myopic policy bounds to match the optimal policy  318
14.5 Controlled sensing POMDPs with quadratic costs  320
14.6  321
14.7 Blackwell dominance of observation distributions and optimality of myopic policies  324
14.8 Ordinal sensitivity: how does optimal POMDP cost vary with state and observation dynamics?  329
14.9 Cardinal sensitivity of POMDP  331
14.10 Complements and sources  332
14.A POMDP numerical examples  332
14.B Proof of Theorem 14.8.1  336
14.C Proof of Theorem 14.9.1  337

Part IV Stochastic approximation and reinforcement learning  341

15 Stochastic optimization and gradient estimation  343
15.1 Stochastic gradient algorithm  344
15.2 How long to simulate a Markov chain?  347
15.3 Gradient estimators for Markov processes  348
15.4 Finite difference gradient estimators and SPSA  349
15.5 Score function gradient estimator  350
15.6 Weak derivative gradient estimator  352
15.7 Bias and variance of gradient estimators  355
15.8 Complements and sources  357
15.A Proof of Theorem 15.2.1  358
15.B Proof of Theorem 15.7.1  360

16 Reinforcement learning  364
16.1 Q-learning algorithm  365
16.2 Policy gradient reinforcement learning for MDP  368
16.3 Score function policy gradient algorithm for MDP  371
16.4 Weak derivative gradient estimator for MDP  372
16.5 Numerical comparison of gradient estimators  374
16.6 Policy gradient reinforcement learning for constrained MDP (CMDP)  375
16.7 Policy gradient algorithm for POMDPs  377
16.8 Complements and sources  379

17 Stochastic approximation algorithms: examples  380
17.1 A primer on stochastic approximation algorithms  381
17.2 Example 1: Recursive maximum likelihood parameter estimation of HMMs  386
17.3 Example 2: HMM state estimation via LMS algorithm  389
17.4 Example 3: Discrete stochastic optimization for policy search  396
17.5 Example 4: Mean field population dynamics models for social sensing  407
17.6 Complements and sources  413
17.A Proof of Theorem 17.3.1  417
17.B Proof of Theorem 17.4.1  420
17.C Proof of Theorem 17.4.2  421
17.D Proof of Theorem 17.5.2  422

18 Summary of algorithms for solving POMDPs  425

Appendix A Short primer on stochastic simulation  428
Appendix B Continuous-time HMM filters  442
Appendix C Markov processes  449
Appendix D Some limit theorems  451

References  455
Index  471