Klienditugi: 7440010 (E-R 10-18)

Abi | Registreeri | Logi sisse

E-raamat: Bandit Algorithms

4.57/5 (20 hinnangut Goodreads-ist)

Tor Lattimore (University of Alberta), Csaba Szepesvįri (University of Alberta)

Formaat: PDF+DRM
Ilmumisaeg: 16-Jul-2020
Kirjastus: Cambridge University Press
Keel: eng
ISBN-13: 9781108687492

Teised raamatud teemal:

Formaat - PDF+DRM
Hind: 51,86 €*
* hind on lõplik, st. muud allahindlused enam ei rakendu
Lisa ostukorvi
Lisa soovinimekirja
See e-raamat on mõeldud ainult isiklikuks kasutamiseks. E-raamatuid ei saa tagastada.

Formaat: PDF+DRM
Ilmumisaeg: 16-Jul-2020
Kirjastus: Cambridge University Press
Keel: eng
ISBN-13: 9781108687492

Teised raamatud teemal:

DRM piirangud

Kopeerimine (copy/paste):

ei ole lubatud
Printimine:

ei ole lubatud
Kasutamine:

Digitaalõiguste kaitse (DRM)
Kirjastus on väljastanud selle e-raamatu krüpteeritud kujul, mis tähendab, et selle lugemiseks peate installeerima spetsiaalse tarkvara. Samuti peate looma endale Adobe ID Rohkem infot siin. E-raamatut saab lugeda 1 kasutaja ning alla laadida kuni 6'de seadmesse (kõik autoriseeritud sama Adobe ID-ga).

Vajalik tarkvara
Mobiilsetes seadmetes (telefon või tahvelarvuti) lugemiseks peate installeerima selle tasuta rakenduse: PocketBook Reader (iOS / Android)

PC või Mac seadmes lugemiseks peate installima Adobe Digital Editionsi (Seeon tasuta rakendus spetsiaalselt e-raamatute lugemiseks. Seda ei tohi segamini ajada Adober Reader'iga, mis tõenäoliselt on juba teie arvutisse installeeritud )

Seda e-raamatut ei saa lugeda Amazon Kindle's.

"Decision-making in the face of uncertainty is a significant challenge in machine learning, and the multi-armed bandit model is a commonly used framework to address it. This comprehensive and rigorous introduction to the multi-armed bandit problem examines all the major settings, including stochastic, adversarial, and Bayesian frameworks. A focus on both mathematical intuition and carefully worked proofs makes this an excellent reference for established researchers and a helpful resource for graduate students in computer science, engineering, statistics, applied mathematics and economics. Linear bandits receive special attention as one of the most useful models in applications, while other chapters are dedicated to combinatorial bandits, ranking, non-stationary problems, Thompson sampling and pure exploration. The book ends with a peek into the world beyond bandits with an introduction to partial monitoring and learning in Markov decision processes"--

Arvustused

'This year marks the 68th anniversary of 'multi-armed bandits' introduced by Herbert Robbins in 1952, and the 35th anniversary of his 1985 paper with me that advanced multi-armed bandit theory in new directions via the concept of 'regret' and a sharp asymptotic lower bound for the regret. This vibrant subject has attracted important multidisciplinary developments and applications. Bandit Algorithms gives it a comprehensive and up-to-date treatment, and meets the need for such books in instruction and research in the subject, as in a new course on contextual bandits and recommendation technology that I am developing at Stanford.' Tze L. Lai, Stanford University 'This is a timely book on the theory of multi-armed bandits, covering a very broad range of basic and advanced topics. The rigorous treatment combined with intuition makes it an ideal resource for anyone interested in the mathematical and algorithmic foundations of a fascinating and rapidly growing field of research.' Nicolņ Cesa-Bianchi, University of Milan 'The field of bandit algorithms, in its modern form, and driven by prominent new applications, has been taking off in multiple directions. The book by Lattimore and Szepesvįri is a timely contribution that will become a standard reference on the subject. The book offers a thorough exposition of an enormous amount of material, neatly organized in digestible pieces. It is mathematically rigorous, but also pleasant to read, rich in intuition and historical notes, and without superfluous details. Highly recommended.' John Tsitsiklis, Massachusetts Institute of Technology

Muu info

A comprehensive and rigorous introduction for graduate students and researchers, with applications in sequential decision-making problems.

Preface

xiii

Notation

Part I Bandits, Probability and Concentration

(72)

1 Introduction

(9)

1.1 The Language of Bandits

(3)

1.2 Applications

(3)

1.3 Notes

(1)

1.4 Bibliographic Remarks

(2)

2 Foundations of Probability ()

(24)

2.1 Probability Spaces and Random Elements

(7)

2.2 σ-Algebras and Knowledge

(1)

2.3 Conditional Probabilities

(1)

2.4 Independence

(2)

2.5 Integration and Expectation

(2)

2.6 Conditional Expectation

(4)

2.7 Notes

(3)

2.8 Bibliographic Remarks

(1)

2.9 Exercises

(3)

3 Stochastic Processes and Markov Chains ()

(9)

3.1 Stochastic Processes

(1)

3.2 Markov Chains

(2)

3.3 Martingales and Stopping Times

(2)

3.4 Notes

(1)

3.5 Bibliographic Remarks

(1)

3.6 Exercises

(2)

4 Stochastic Bandits

(15)

4.1 Core Assumptions

(1)

4.2 The Learning Objective

(1)

4.3 Knowledge and Environment Classes

(2)

4.4 The Regret

(2)

4.5 Decomposing the Regret

(1)

4.6 The Canonical Bandit Model ()

(2)

4.7 The Canonical Bandit Model for Uncountable Action Sets ()

(1)

4.8 Notes

(1)

4.9 Bibliographical Remarks

(1)

4.10 Exercises

(4)

5 Concentration of Measure

(13)

5.1 Tail Probabilities

(1)

5.2 The Inequalities of Markov and Chebyshev

(1)

5.3 The Cramer-Chernoff Method and Subgaussian Random Variables

(2)

5.4 Notes

(2)

5.5 Bibliographical Remarks

(1)

5.6 Exercises

(7)

Part II Stochastic Bandits with Finitely Many Arms

(50)

6 The Explore-Then-Commit Algorithm

(9)

6.1 Algorithm and Regret Analysis

(3)

6.2 Notes

(1)

6.3 Bibliographical Remarks

(1)

6.4 Exercises

(5)

7 The Upper Confidence Bound Algorithm

(13)

7.1 The Optimism Principle

(7)

7.2 Notes

(1)

7.3 Bibliographical Remarks

(1)

7.4 Exercises

(5)

8 The Upper Confidence Bound Algorithm: Asymptotic Optimality

(6)

8.1 Asymptotically Optimal UCB

(3)

8.2 Notes

100

(1)

8.3 Bibliographic Remarks

100

(1)

8.4 Exercises

101

(2)

9 The Upper Confidence Bound Algorithm: Minimax Optimality

103

(9)

9.1 The MOSS Algorithm

103

(3)

9.2 Two Problems

106

(2)

9.3 Notes

108

(1)

9.4 Bibliographic Remarks

108

(1)

9.5 Exercises

109

(3)

10 The Upper Confidence Bound Algorithm: Bernoulli Noise

112

(11)

10.1 Concentration for Sums of Bernoulli Random Variables

112

(3)

10.2 The KL-UCB Algorithm

115

(3)

10.3 Notes

118

(1)

10.4 Bibliographic Remarks

119

(1)

10.5 Exercises

119

(4)

Part III Adversarial Bandits with Finitely Many Arms

123

(30)

11 The Exp3 Algorithm

127

(15)

11.1 Adversarial Bandit Environments

127

(2)

11.2 Importance-Weighted Estimators

129

(2)

11.3 The Exp3 Algorithm

131

(1)

11.4 Regret Analysis

131

(4)

11.5 Notes

135

(2)

11.6 Bibliographic Remarks

137

(1)

11.7 Exercises

138

(4)

12 The Exp3-IX Algorithm

142

(11)

12.1 The Exp3-IX Algorithm

142

(2)

12.2 Regret Analysis

144

(4)

12.3 Notes

148

(1)

12.4 Bibliographic Remarks

149

(1)

12.5 Exercises

149

(4)

Part IV Lower Bounds for Bandits with Finitely Many Arms

153

(38)

13 Lower Bounds: Basic Ideas

155

(5)

13.1 Main Ideas Underlying Minimax Lower Bounds

155

(3)

13.2 Notes

158

(1)

13.3 Bibliographic Remarks

158

(1)

13.4 Exercises

159

(1)

14 Foundations of Information Theory

160

(10)

14.1 Entropy and Optimal Coding

160

(2)

14.2 Relative Entropy

162

(3)

14.3 Notes

165

(2)

14.4 Bibliographic Remarks

167

(1)

14.5 Exercises

167

(3)

15 Minimax Lower Bounds

170

(7)

15.1 Relative Entropy Between Bandits

170

(1)

15.2 Minimax Lower Bounds

171

(2)

15.3 Notes

173

(1)

15.4 Bibliographic Remarks

174

(1)

15.5 Exercises

174

(3)

16 Instance-Dependent Lower Bounds

177

(8)

16.1 Asymptotic Bounds

177

(3)

16.2 Finite-Time Bounds

180

(1)

16.3 Notes

181

(1)

16.4 Bibliographic Remarks

181

(1)

16.5 Exercises

181

(4)

17 High-Probability Lower Bounds

185

(6)

17.1 Stochastic Bandits

186

(2)

17.2 Adversarial Bandits

188

(2)

17.3 Notes

190

(1)

17.4 Bibliographic Remarks

190

(1)

17.5 Exercises

190

(1)

Part V Contextual and Linear Bandits

191

(74)

18 Contextual Bandits

193

(12)

18.1 Contextual Bandits: One Bandit per Context

193

(2)

18.2 Bandits with Expert Advice

195

(2)

18.3 Exp4

197

(1)

18.4 Regret Analysis

198

(2)

18.5 Notes

200

(1)

18.6 Bibliographic Remarks

201

(1)

18.7 Exercises

202

(3)

19 Stochastic Linear Bandits

205

(14)

19.1 Stochastic Contextual Bandits

205

(2)

19.2 Stochastic Linear Bandits

207

(2)

19.3 Regret Analysis

209

(2)

19.4 Notes

211

(2)

19.5 Bibliographic Remarks

213

(1)

19.6 Exercises

214

(5)

20 Confidence Bounds for Least Squares Estimators

219

(12)

20.1 Martingales and the Method of Mixtures

221

(4)

20.2 Notes

225

(1)

20.3 Bibliographic Remarks

226

(1)

20.4 Exercises

227

(4)

21 Optimal Design for Least Squares Estimators

231

(5)

21.1 The Kiefer--Wolfowitz Theorem

231

(2)

21.2 Notes

233

(2)

21.3 Bibliographic Remarks

235

(1)

21.4 Exercises

235

(1)

22 Stochastic Linear Bandits with Finitely Many Arms

236

(4)

22.1 Notes

237

(1)

22.2 Bibliographic Remarks

238

(1)

22.3 Exercises

238

(2)

23 Stochastic Linear Bandits with Sparsity

240

(10)

23.1 Sparse Linear Stochastic Bandits

240

(1)

23.2 Elimination on the Hypercube

241

(3)

23.3 Online to Confidence Set Conversion

244

(4)

23.4 Sparse Online Linear Prediction

248

(1)

23.5 Notes

248

(1)

23.6 Bibliographical Remarks

249

(1)

23.7 Exercises

249

(1)

24 Minimax Lower Bounds for Stochastic Linear Bandits

250

(8)

24.1 Hypercube

250

(2)

24.2 Unit Ball

252

(1)

24.3 Sparse Parameter Vectors

253

(2)

24.4 Misspecified Models

255

(1)

24.5 Notes

256

(1)

24.6 Bibliographic Remarks

257

(1)

24.7 Exercises

257

(1)

25 Asymptotic Lower Bounds for Stochastic Linear Bandits

258

(7)

25.1 An Asymptotic Lower Bound for Fixed Action Sets

258

(4)

25.2 Clouds Looming for Optimism

262

(1)

25.3 Notes

263

(1)

25.4 Bibliographic Remarks

264

(1)

25.5 Exercises

264

(1)

Part VI Adversarial Linear Bandits

265

(48)

26 Foundations of Convex Analysis

267

(11)

26.1 Convex Sets and Functions

267

(2)

26.2 Jensen's Inequality

269

(1)

26.3 Bregman Divergence

269

(2)

26.4 Legendre Functions

271

(2)

26.5 Optimisation

273

(1)

26.6 Projections

274

(1)

26.7 Notes

274

(1)

26.8 Bibliographic Remarks

275

(1)

26.9 Exercises

275

(3)

27 Exp3 for Adversarial Linear Bandits

278

(8)

27.1 Exponential Weights for Linear Bandits

278

(2)

27.2 Regret Analysis

280

(1)

27.3 Continuous Exponential Weights

281

(2)

27.4 Notes

283

(1)

27.5 Bibliographic Remarks

283

(1)

27.6 Exercises

284

(2)

28 Follow-the-regularised-Leader and Mirror Descent

286

(20)

28.1 Online Linear Optimisation

286

(4)

28.2 Regret Analysis

290

(4)

28.3 Application to Linear Bandits

294

(1)

28.4 Linear Bandits on the Unit Ball

295

(3)

28.5 Notes

298

(3)

28.6 Bibliographic Remarks

301

(1)

28.7 Exercises

301

(5)

29 The Relation between Adversarial and Stochastic Linear Bandits

306

(7)

29.1 Unified View

306

(1)

29.2 Reducing Stochastic Linear Bandits to Adversarial Linear Bandits

307

(1)

29.3 Stochastic Linear Bandits with Parameter Noise

308

(1)

29.4 Contextual Linear Bandits

309

(1)

29.5 Notes

310

(1)

29.6 Bibliographic Remarks

311

(1)

29.7 Exercises

311

(2)

Part VII Other Topics

313

(108)

30 Combinatorial Bandits

317

(14)

30.1 Notation and Assumptions

317

(1)

30.2 Applications

318

(1)

30.3 Bandit Feedback

319

(1)

30.4 Semi-bandit Feedback and Mirror Descent

320

(1)

30.5 Follow-the-Perturbed-Leader

321

(5)

30.6 Notes

326

(1)

30.7 Bibliographic Remarks

327

(1)

30.8 Exercises

328

(3)

31 Non-stationary Bandits

331

(9)

31.1 Adversarial Bandits

331

(3)

31.2 Stochastic Bandits

334

(2)

31.3 Notes

336

(1)

31.4 Bibliographic Remarks

337

(1)

31.5 Exercises

338

(2)

32 Ranking

340

(13)

32.1 Click Models

341

(2)

32.2 Policy

343

(2)

32.3 Regret Analysis

345

(4)

32.4 Notes

349

(2)

32.5 Bibliographic Remarks

351

(1)

32.6 Exercises

351

(2)

33 Pure Exploration

353

(16)

33.1 Simple Regret

353

(2)

33.2 Best-Arm Identification with a Fixed Confidence

355

(6)

33.3 Best-Arm Identification with a Budget

361

(1)

33.4 Notes

362

(2)

33.5 Bibliographical Remarks

364

(1)

33.6 Exercises

365

(4)

34 Foundations of Bayesian Learning

369

(17)

34.1 Statistical Decision Theory and Bayesian Learning

369

(1)

34.2 Bayesian Learning and the Posterior Distribution

370

(4)

34.3 Conjugate Pairs, Conjugate Priors and the Exponential Family

374

(3)

34.4 The Bayesian Bandit Environment

377

(1)

34.5 Posterior Distributions in Bandits

378

(1)

34.6 Bayesian Regret

379

(1)

34.7 Notes

380

(2)

34.8 Bibliographic Remarks

382

(1)

34.9 Exercises

382

(4)

35 Bayesian Bandits

386

(18)

35.1 Bayesian Optimal Regret for k-Armed Stochastic Bandits

386

(1)

35.2 Optimal Stopping

387

(2)

35.3 One-armed bandits

389

(4)

35.4 Gittins Index

393

(6)

35.5 Computing the Gittins Index

399

(1)

35.6 Notes

399

(2)

35.7 Bibliographical Remarks

401

(1)

35.8 Exercises

402

(2)

36 Thompson Sampling

404

(17)

36.1 Finite-Armed Bandits

404

(2)

36.2 Frequentist Analysis

406

(3)

36.3 Linear Bandits

409

(2)

36.4 Information Theoretic Analysis

411

(3)

36.5 Notes

414

(2)

36.6 Bibliographic Remarks

416

(1)

36.7 Exercises

417

(4)

Part VIII Beyond Bandits

421

(63)

37 Partial Monitoring

423

(29)

37.1 Finite Adversarial Partial Monitoring Problems

424

(2)

37.2 The Structure of Partial Monitoring

426

(4)

37.3 Classification of Finite Adversarial Partial Monitoring

430

(1)

37.4 Lower Bounds

430

(5)

37.5 Policy and Upper Bounds

435

(4)

37.6 Proof of Theorem 37.16

439

(1)

37.7 Proof of Theorem 37.17

440

(4)

37.8 Proof of the Classification Theorem

444

(1)

37.9 Notes

445

(2)

37.10 Bibliographical Remarks

447

(2)

37.11 Exercises

449

(3)

38 Markov Decision Processes

452

(32)

38.1 Problem Set-Up

452

(3)

38.2 Optimal Policies and the Bellman Optimality Equation

455

(3)

38.3 Finding an Optimal Policy

458

(4)

38.4 Learning in Markov Decision Processes

462

(1)

38.5 Upper Confidence Bounds for Reinforcement Learning

463

(2)

38.6 Proof of Upper Bound

465

(3)

38.7 Proof of Lower Bound

468

(3)

38.8 Notes

471

(2)

38.9 Bibliographical Remarks

473

(2)

38.10 Exercises

475

(9)

Bibliography

484

(29)

Index

513

Tor Lattimore is a research scientist at DeepMind. His research is focused on decision making in the face of uncertainty, including bandit algorithms and reinforcement learning. Before joining DeepMind he was an assistant professor at Indiana University and a postdoctoral fellow at the University of Alberta. Csaba Szepesvįri is a Professor in the Department of Computing Science at the University of Alberta and a Principal Investigator of the Alberta Machine Intelligence Institute. He also leads the 'Foundations' team at DeepMind. He has co-authored a book on nonlinear approximate adaptive controllers and authored a book on reinforcement learning, in addition to publishing over 200 journal and conference papers. He is an action editor of the Journal of Machine Learning Research.

Lisainfo e-raamatute kohta

Püsilink: https://www.kriso.ee/db/97811086874922e.html

Märksõnad:

E-raamat: Bandit Algorithms

DRM piirangud

Kopeerimine (copy/paste):

Printimine:

Kasutamine:

Arvustused

Muu info

Konto & seaded

Otsing

Otsingu andmebaas

Filtreeri tulemusi

Teemad E-raamatute teemad

Vali ostukorv