Klienditugi: 7440010 (E-R 10-18)

Abi | Registreeri | Logi sisse

Foundations of Deep Reinforcement Learning: Theory and Practice in Python [Pehme köide]

4.50/5 (12 hinnangut Goodreads-ist)

Laura Graesser, Wah Loon Keng

Formaat: Paperback / softback, 416 pages, kõrgus x laius x paksus: 234x176x18 mm, kaal: 600 g
Sari: Addison-Wesley Data & Analytics Series
Ilmumisaeg: 11-Feb-2020
Kirjastus: Addison Wesley
ISBN-10: 0135172381
ISBN-13: 9780135172384

Teised raamatud teemal:

Artificial intelligence - (Hetkel poes: 4 nimetust)
Databases - (Hetkel poes: 1 nimetust)

Pehme köide
Hind: 54,09 €
Raamatu kohalejõudmiseks kirjastusest kulub orienteeruvalt 2-4 nädalat
Kogus:
- - 1
  - 2
  - 3
  - 4
  - 5
  - 6
  - 7
  - 8
  - 9
  - 10
Lisa ostukorvi
Tasuta tarne
Tellimisaeg 2-4 nädalat
Lisa soovinimekirja

Formaat: Paperback / softback, 416 pages, kõrgus x laius x paksus: 234x176x18 mm, kaal: 600 g
Sari: Addison-Wesley Data & Analytics Series
Ilmumisaeg: 11-Feb-2020
Kirjastus: Addison Wesley
ISBN-10: 0135172381
ISBN-13: 9780135172384

Teised raamatud teemal:

Artificial intelligence - (Hetkel poes: 4 nimetust)
Databases - (Hetkel poes: 1 nimetust)

Püsilink: https://www.kriso.ee/db/9780135172384.html

Märksõnad:

The Contemporary Introduction to Deep Reinforcement Learning that Combines Theory and Practice

Deep reinforcement learning (deep RL) combines deep learning and reinforcement learning, in which artificial agents learn to solve sequential decision-making problems. In the past decade deep RL has achieved remarkable results on a range of problems, from single and multiplayer gamessuch as Go, Atari games, and DotA 2to robotics.

Foundations of Deep Reinforcement Learning is an introduction to deep RL that uniquely combines both theory and implementation. It starts with intuition, then carefully explains the theory of deep RL algorithms, discusses implementations in its companion software library SLM Lab, and finishes with the practical details of getting deep RL to work. This guide is ideal for both computer science students and software engineers who are familiar with basic machine learning concepts and have a working understanding of Python.

Understand each key aspect of a deep RL problem Explore policy- and value-based algorithms, including REINFORCE, SARSA, DQN, Double DQN, and Prioritized Experience Replay (PER) Delve into combined algorithms, including Actor-Critic and Proximal Policy Optimization (PPO) Understand how algorithms can be parallelized synchronously and asynchronously Run algorithms in SLM Lab and learn the practical implementation details for getting deep RL to work Explore algorithm benchmark results with tuned hyperparameters Understand how deep RL environments are designed

Register your book for convenient access to downloads, updates, and/or corrections as they become available. See inside book for details.

Arvustused

This book provides an accessible introduction to deep reinforcement learning covering the mathematical concepts behind popular algorithms as well as their practical implementation. I think the book will be a valuable resource for anyone looking to apply deep reinforcement learning in practice. Volodymyr Mnih, lead developer of DQN

An excellent book to quickly develop expertise in the theory, language, and practical implementation of deep reinforcement learning algorithms. A limpid exposition which uses familiar notation; all the most recent techniques explained with concise, readable code, and not a page wasted in irrelevant detours: it is the perfect way to develop a solid foundation on the topic. Vincent Vanhoucke, principal scientist, Google

As someone who spends their days trying to make deep reinforcement learning methods more useful for the general public, I can say that Laura and Kengs book is a welcome addition to the literature. It provides both a readable introduction to the fundamental concepts in reinforcement learning as well as intuitive explanations and code for many of the major algorithms in the field. I imagine this will become an invaluable resource for individuals interested in learning about deep reinforcement learning for years to come.

Arthur Juliani, senior machine learning engineer, Unity Technologies

Until now, the only way to get to grips with deep reinforcement learning was to slowly accumulate knowledge from dozens of different sources. Finally, we have a book bringing everything together in one place. Matthew Rahtz, ML researcher, ETH Zürich

Foreword

xix

Preface

xxi

Acknowledgments

xxv

About the Authors

xxvii

1 Introduction to Reinforcement Learning

(22)

1.1 Reinforcement Learning

(5)

1.2 Reinforcement Learning as MDP

(3)

1.3 Learnable Functions in Reinforcement Learning

(2)

1.4 Deep Reinforcement Learning Algorithms

(6)

1.4.1 Policy-Based Algorithms

(1)

1.4.2 Value-Based Algorithms

(1)

1.4.3 Model-Based Algorithms

(2)

1.4.4 Combined Methods

(1)

1.4.5 Algorithms Covered in This Book

(1)

1.4.6 On-Policy and Off-Policy Algorithms

(1)

1.4.7 Summary

(1)

1.5 Deep Learning for Reinforcement Learning

(2)

1.6 Reinforcement Learning and Supervised Learning

(2)

1.6.1 Lack of an Oracle

(1)

1.6.2 Sparsity of Feedback

(1)

1.6.3 Data Generation

(1)

1.7 Summary

(2)

I Policy-Based and Value-Based Algorithms

(110)

2 Reinforce

(28)

2.1 Policy

(1)

2.2 The Objective Function

(1)

2.3 The Policy Gradient

(3)

2.3.1 Policy Gradient Derivation

(2)

2.4 Monte Carlo Sampling

(1)

2.5 Reinforce Algorithm

(2)

2.5.1 Improving Reinforce

(1)

2.6 Implementing Reinforce

(11)

2.6.1 A Minimal Reinforce Implementation

(3)

2.6.2 Constructing Policies with PyTorch

(2)

2.6.3 Sampling Actions

(1)

2.6.4 Calculating Policy Loss

(1)

2.6.5 Reinforce Training Loop

(1)

2.6.6 On-Policy Replay Memory

(3)

2.7 Training a Reinforce Agent

(3)

2.8 Experimental Results

(4)

2.8.1 Experiment The Effect of Discount Factor γ

(2)

2.8.2 Experiment: The Effect of Baseline

(2)

2.9 Summary

(1)

2.10 Further Reading

(1)

2.11 History

(2)

3 Sarsa

(28)

3.1 The Q- and V-Functions

(2)

3.2 Temporal Difference Learning

(9)

3.2.1 Intuition for Temporal Difference Learning

(6)

3.3 Action Selection in Sarsa

(2)

3.3.1 Exploration and Exploitation

(1)

3.4 Sarsa Algorithm

(2)

3.4.1 On-Policy Algorithms

(1)

3.5 Implementing Sarsa

(5)

3.5.1 Action Function: ε-Greedy

(1)

3.5.2 Calculating the Q-Loss

(1)

3.5.3 Sarsa Training Loop

(1)

3.5.4 On-Policy Batched Replay Memory

(2)

3.6 Training a Sarsa Agent

(2)

3.7 Experimental Results

(2)

3.7.1 Experiment: The Effect of Learning Rate

(1)

3.8 Summary

(1)

3.9 Further Reading

(1)

3.10 History

(2)

4 Deep Q-Networks (DQN)

(22)

4.1 Learning the Q-Function in DQN

(1)

4.2 Action Selection in DQN

(5)

4.2.1 The Boltzmann Policy

(2)

4.3 Experience Replay

(1)

4.4 DQN Algorithm

(2)

4.5 Implementing DQN

(5)

4.5.1 Calculating the Q-Loss

(1)

4.5.2 DQN Training Loop

(1)

4.5.3 Replay Memory

(3)

4.6 Training a DQN Agent

(3)

4.7 Experimental Results

(2)

4.7.1 Experiment: The Effect of Network Architecture

(2)

4.8 Summary

101

(1)

4.9 Further Reading

102

(1)

4.10 History

102

(1)

5 Improving DQN

103

(30)

5.1 Target Networks

104

(2)

5.2 Double DQN

106

(3)

5.3 Prioritized Experience Replay (PER)

109

(3)

5.3.1 Importance Sampling

111

(1)

5.4 Modified DQN Implementation

112

(11)

5.4.1 Network Initialization

113

(1)

5.4.2 Calculating the Q-Loss

113

(2)

5.4.3 Updating the Target Network

115

(1)

5.4.4 DQN with Target Networks

116

(1)

5.4.5 Double DQN

116

(1)

5.4.6 Prioritized Experienced Replay

117

(6)

5.5 Training a DQN Agent to Play Atari Games

123

(5)

5.6 Experimental Results

128

(4)

5.6.1 Experiment: The Effect of Double DQN and PER

128

(4)

5.7 Summary

132

(1)

5.8 Further Reading

132

(1)

II Combined Methods

133

(74)

6 Advantage Actor-Critic (A2C)

135

(30)

6.1 The Actor

136

(1)

6.2 The Critic

136

(5)

6.2.1 The Advantage Function

136

(4)

6.2.2 Learning the Advantage Function

140

(1)

6.3 A2C Algorithm

141

(2)

6.4 Implementing A2C

143

(5)

6.4.1 Advantage Estimation

144

(3)

6.4.2 Calculating Value Loss and Policy Loss

147

(1)

6.4.3 Actor-Critic Training Loop

147

(1)

6.5 Network Architecture

148

(2)

6.6 Training an A2C Agent

150

(7)

6.6.1 A2C with n-Step Returns on Pong

150

(3)

6.6.2 A2C with GAE on Pong

153

(2)

6.6.3 A2C with n-Step Returns on Bipeda lWalker

155

(2)

6.7 Experimental Results

157

(4)

6.7.1 Experiment: The Effect of n-Step Returns

158

(1)

6.7.2 Experiment: The Effect of λ of GAE

159

(2)

6.8 Summary

161

(1)

6.9 Further Reading

162

(1)

6.10 History

162

(3)

7 Proximal Policy Optimization (PPO)

165

(30)

7.1 Surrogate Objective

165

(9)

7.1.1 Performance Collapse

166

(2)

7.1.2 Modifying the Objective

168

(6)

7.2 Proximal Policy Optimization (PPO)

174

(3)

7.3 PPO Algorithm

177

(2)

7.4 Implementing PPO

179

(3)

7.4.1 Calculating the PPO Policy Loss

179

(1)

7.4.2 PPO Training Loop

180

(2)

7.5 Training a PPO Agent

182

(6)

7.5.1 PPO on Pong

182

(3)

7.5.2 PPO on BipedalWalker

185

(3)

7.6 Experimental Results

188

(4)

7.6.1 Experiment: The Effect of λ of GAE

188

(2)

7.6.2 Experiment: The Effect of Clipping Variable ε

190

(2)

7.7 Summary

192

(1)

7.8 Further Reading

192

(3)

8 Parallelization Methods

195

(10)

8.1 Synchronous Parallelization

196

(1)

8.2 Asynchronous Parallelization

197

(3)

8.2.1 Hogwild!

198

(2)

8.3 Training an A3C Agent

200

(3)

8.4 Summary

203

(1)

8.5 Further Reading

204

(1)

9 Algorithm Summary

205

(2)

III Practical Details

207

(80)

10 Getting Deep RL to Work

209

(30)

10.1 Software Engineering Practices

209

(9)

10.1.1 Unit Tests

210

(5)

10.1.2 Code Quality

215

(1)

10.1.3 Git Workflow

216

(2)

10.2 Debugging Tips

218

(10)

10.2.1 Signs of Life

219

(1)

10.2.2 Policy Gradient Diagnoses

219

(1)

10.2.3 Data Diagnoses

220

(2)

10.2.4 Preprocessor

222

(1)

10.2.5 Memory

222

(1)

10.2.6 Algorithmic Functions

222

(1)

10.2.7 Neural Networks

222

(3)

10.2.8 Algorithm Simplification

225

(1)

10.2.9 Problem Simplification

226

(1)

10.2.10 Hyperparameters

226

(1)

10.2.11 Lab Workflow

226

(2)

10.3 Atari Tricks

228

(3)

10.4 Deep RL Almanac

231

(7)

10.4.1 Hyperparameter Tables

231

(3)

10.4.2 Algorithm Performance Comparison

234

(4)

10.5 Summary

238

(1)

11 SUM Lab

239

(12)

11.1 Algorithms Implemented in SLM Lab

239

(2)

11.2 Spec File

241

(5)

11.2.1 Search Spec Syntax

243

(3)

11.3 Running SLM Lab

246

(1)

11.3.1 SLM Lab Commands

246

(1)

11.4 Analyzing Experiment Results

247

(2)

11.4.1 Overview of the Experiment Data

247

(2)

11.5 Summary

249

(2)

12 Network Architectures

251

(22)

12.1 Types of Neural Networks

251

(5)

12.1.1 Multilayer Perceptrons (MLPs)

252

(1)

12.1.2 Convolutional Neural Networks (CNNs)

253

(2)

12.1.3 Recurrent Neural Networks (RNNs)

255

(1)

12.2 Guidelines for Choosing a Network Family

256

(6)

12.2.1 MDPs vs. POMDPs

256

(3)

12.2.2 Choosing Networks for Environments

259

(3)

12.3 The Net API

262

(9)

12.3.1 Input and Output Layer Shape Inference

264

(2)

12.3.2 Automatic Network Construction

266

(3)

12.3.3 Training Step

269

(1)

12.3.4 Exposure of Underlying Methods

270

(1)

12.4 Summary

271

(1)

12.5 Further Reading

271

(2)

13 Hardware

273

(14)

13.1 Computer

273

(5)

13.2 Data Types

278

(2)

13.3 Optimizing Data Types in RL

280

(5)

13.4 Choosing Hardware

285

(1)

13.5 Summary

285

(2)

IV Environment Design

287

(56)

14 States

289

(26)

14.1 Examples of States

289

(7)

14.2 State Completeness

296

(1)

14.3 State Complexity

297

(4)

14.4 State Information Loss

301

(5)

14.4.1 Image Grayscaling

301

(1)

14.4.2 Discretization

302

(1)

14.4.3 Hash Conflict

303

(1)

14.4.4 Metainformation Loss

303

(3)

14.5 Preprocessing

306

(7)

14.5.1 Standardization

307

(1)

14.5.2 Image Preprocessing

308

(2)

14.5.3 Temporal Preprocessing

310

(3)

14.6 Summary

313

(2)

15 Actions

315

(12)

15.1 Examples of Actions

315

(3)

15.2 Action Completeness

318

(1)

15.3 Action Complexity

319

(4)

15.4 Summary

323

(1)

15.5 Further Reading: Action Design in Everyday Things

324

(3)

16 Rewards

327

(6)

16.1 The Role of Rewards

327

(1)

16.2 Reward Design Guidelines

328

(4)

16.3 Summary

332

(1)

17 Transition Function

333

(10)

17.1 Feasibility Checks

333

(2)

17.2 Reality Check

335

(2)

17.3 Summary

337

(1)

Epilogue

338

(5)

A Deep Reinforcement Learning Timeline

343

(2)

B Example Environments

345

(8)

B.1 Discrete Environments

346

(4)

B.1.1 CartPole-v0

346

(1)

B.1.2 MountainCar-v0

347

(1)

B.1.3 LunarLander-v2

347

(1)

B.1.4 PongNoFrameskip-v4

348

(1)

B.1.5 BreakoutNoFrameskip-v4

349

(1)

B.2 Continuous Environments

350

(3)

B.2.1 Pendulum-v0

350

(1)

B.2.2 BipedalWalker-v2

350

(3)

References

353

(10)

Index

363

Laura Graesser is a research software engineer working in robotics at Google. She holds a masters degree in computer science from New York University, where she specialized in machine learning.

Wah Loon Keng is an AI engineer at Machine Zone, where he applies deep reinforcement learning to industrial problems. He has a background in both theoretical physics and computer science.

Foundations of Deep Reinforcement Learning: Theory and Practice in Python [Pehme köide]

Arvustused

Konto & seaded

Otsing

Otsingu andmebaas

Filtreeri tulemusi

Teemad Ingliskeelsed raamatud

Vali ostukorv