
Foundations of Deep Reinforcement Learning: Theory and Practice in Python [Paperback]

  • Format: Paperback / softback, 416 pages, height x width x thickness: 234x176x18 mm, weight: 600 g
  • Series: Addison-Wesley Data & Analytics Series
  • Publication date: 11-Feb-2020
  • Publisher: Addison Wesley
  • ISBN-10: 0135172381
  • ISBN-13: 9780135172384
The Contemporary Introduction to Deep Reinforcement Learning that Combines Theory and Practice

Deep reinforcement learning (deep RL) combines deep learning with reinforcement learning, in which artificial agents learn to solve sequential decision-making problems. In the past decade, deep RL has achieved remarkable results on a range of problems, from single- and multiplayer games (such as Go, Atari games, and Dota 2) to robotics.

Foundations of Deep Reinforcement Learning is an introduction to deep RL that uniquely combines both theory and implementation. It starts with intuition, then carefully explains the theory of deep RL algorithms, discusses implementations in its companion software library SLM Lab, and finishes with the practical details of getting deep RL to work. This guide is ideal for both computer science students and software engineers who are familiar with basic machine learning concepts and have a working understanding of Python.

  • Understand each key aspect of a deep RL problem
  • Explore policy- and value-based algorithms, including REINFORCE, SARSA, DQN, Double DQN, and Prioritized Experience Replay (PER)
  • Delve into combined algorithms, including Actor-Critic and Proximal Policy Optimization (PPO)
  • Understand how algorithms can be parallelized synchronously and asynchronously
  • Run algorithms in SLM Lab and learn the practical implementation details for getting deep RL to work
  • Explore algorithm benchmark results with tuned hyperparameters
  • Understand how deep RL environments are designed
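
To give a flavor of the policy-gradient material covered in Chapter 2, the sketch below shows a minimal REINFORCE agent on CartPole. It is an illustrative example written against gymnasium and PyTorch, not the book's SLM Lab implementation; the network size, learning rate, and episode count are arbitrary assumptions.

# A minimal REINFORCE sketch for CartPole, in the spirit of Chapter 2.
# Illustrative only: written with gymnasium and PyTorch, not SLM Lab;
# hyperparameters are assumptions, not the book's tuned values.
import gymnasium as gym
import torch
import torch.nn as nn
from torch.distributions import Categorical

gamma = 0.99  # discount factor

class Pi(nn.Module):
    """Small MLP policy mapping states to action logits."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

    def forward(self, x):
        return self.model(x)

env = gym.make("CartPole-v1")
pi = Pi(env.observation_space.shape[0], env.action_space.n)
optimizer = torch.optim.Adam(pi.parameters(), lr=0.01)

for episode in range(300):
    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        # Sample an action from the current policy and record its log-probability
        dist = Categorical(logits=pi(torch.as_tensor(obs, dtype=torch.float32)))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Monte Carlo returns: R_t = r_t + gamma * R_{t+1}
    returns, R = [], 0.0
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    returns = torch.tensor(list(reversed(returns)), dtype=torch.float32)

    # REINFORCE loss: -sum_t log pi(a_t | s_t) * R_t
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if episode % 50 == 0:
        print(f"episode {episode}, total reward {sum(rewards):.0f}")

In the book, this algorithm is built up step by step, improved with a baseline, and run through SLM Lab with tuned hyperparameters.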

Register your book for convenient access to downloads, updates, and/or corrections as they become available. See inside book for details.

Reviews

This book provides an accessible introduction to deep reinforcement learning covering the mathematical concepts behind popular algorithms as well as their practical implementation. I think the book will be a valuable resource for anyone looking to apply deep reinforcement learning in practice.

Volodymyr Mnih, lead developer of DQN

An excellent book to quickly develop expertise in the theory, language, and practical implementation of deep reinforcement learning algorithms. A limpid exposition which uses familiar notation; all the most recent techniques explained with concise, readable code, and not a page wasted in irrelevant detours: it is the perfect way to develop a solid foundation on the topic.

Vincent Vanhoucke, principal scientist, Google

As someone who spends their days trying to make deep reinforcement learning methods more useful for the general public, I can say that Laura and Keng's book is a welcome addition to the literature. It provides both a readable introduction to the fundamental concepts in reinforcement learning as well as intuitive explanations and code for many of the major algorithms in the field. I imagine this will become an invaluable resource for individuals interested in learning about deep reinforcement learning for years to come.

Arthur Juliani, senior machine learning engineer, Unity Technologies

Until now, the only way to get to grips with deep reinforcement learning was to slowly accumulate knowledge from dozens of different sources. Finally, we have a book bringing everything together in one place.

Matthew Rahtz, ML researcher, ETH Zürich

Table of Contents

Foreword
Preface
Acknowledgments
About the Authors
1 Introduction to Reinforcement Learning
1.1 Reinforcement Learning
1.2 Reinforcement Learning as MDP
1.3 Learnable Functions in Reinforcement Learning
1.4 Deep Reinforcement Learning Algorithms
1.4.1 Policy-Based Algorithms
1.4.2 Value-Based Algorithms
1.4.3 Model-Based Algorithms
1.4.4 Combined Methods
1.4.5 Algorithms Covered in This Book
1.4.6 On-Policy and Off-Policy Algorithms
1.4.7 Summary
1.5 Deep Learning for Reinforcement Learning
1.6 Reinforcement Learning and Supervised Learning
1.6.1 Lack of an Oracle
1.6.2 Sparsity of Feedback
1.6.3 Data Generation
1.7 Summary
I Policy-Based and Value-Based Algorithms
2 REINFORCE
2.1 Policy
2.2 The Objective Function
2.3 The Policy Gradient
2.3.1 Policy Gradient Derivation
2.4 Monte Carlo Sampling
2.5 REINFORCE Algorithm
2.5.1 Improving REINFORCE
2.6 Implementing REINFORCE
2.6.1 A Minimal REINFORCE Implementation
2.6.2 Constructing Policies with PyTorch
2.6.3 Sampling Actions
2.6.4 Calculating Policy Loss
2.6.5 REINFORCE Training Loop
2.6.6 On-Policy Replay Memory
2.7 Training a REINFORCE Agent
2.8 Experimental Results
2.8.1 Experiment: The Effect of Discount Factor γ
2.8.2 Experiment: The Effect of Baseline
2.9 Summary
2.10 Further Reading
2.11 History
3 SARSA
3.1 The Q- and V-Functions
3.2 Temporal Difference Learning
3.2.1 Intuition for Temporal Difference Learning
3.3 Action Selection in SARSA
3.3.1 Exploration and Exploitation
3.4 SARSA Algorithm
3.4.1 On-Policy Algorithms
3.5 Implementing SARSA
3.5.1 Action Function: ε-Greedy
3.5.2 Calculating the Q-Loss
3.5.3 SARSA Training Loop
3.5.4 On-Policy Batched Replay Memory
3.6 Training a SARSA Agent
3.7 Experimental Results
3.7.1 Experiment: The Effect of Learning Rate
3.8 Summary
3.9 Further Reading
3.10 History
4 Deep Q-Networks (DQN)
4.1 Learning the Q-Function in DQN
4.2 Action Selection in DQN
4.2.1 The Boltzmann Policy
4.3 Experience Replay
4.4 DQN Algorithm
4.5 Implementing DQN
4.5.1 Calculating the Q-Loss
4.5.2 DQN Training Loop
4.5.3 Replay Memory
4.6 Training a DQN Agent
4.7 Experimental Results
4.7.1 Experiment: The Effect of Network Architecture
4.8 Summary
4.9 Further Reading
4.10 History
5 Improving DQN
5.1 Target Networks
5.2 Double DQN
5.3 Prioritized Experience Replay (PER)
5.3.1 Importance Sampling
5.4 Modified DQN Implementation
5.4.1 Network Initialization
5.4.2 Calculating the Q-Loss
5.4.3 Updating the Target Network
5.4.4 DQN with Target Networks
5.4.5 Double DQN
5.4.6 Prioritized Experience Replay
5.5 Training a DQN Agent to Play Atari Games
5.6 Experimental Results
5.6.1 Experiment: The Effect of Double DQN and PER
5.7 Summary
5.8 Further Reading
II Combined Methods
6 Advantage Actor-Critic (A2C)
6.1 The Actor
6.2 The Critic
6.2.1 The Advantage Function
6.2.2 Learning the Advantage Function
6.3 A2C Algorithm
6.4 Implementing A2C
6.4.1 Advantage Estimation
6.4.2 Calculating Value Loss and Policy Loss
6.4.3 Actor-Critic Training Loop
6.5 Network Architecture
6.6 Training an A2C Agent
6.6.1 A2C with n-Step Returns on Pong
6.6.2 A2C with GAE on Pong
6.6.3 A2C with n-Step Returns on BipedalWalker
6.7 Experimental Results
6.7.1 Experiment: The Effect of n-Step Returns
6.7.2 Experiment: The Effect of λ of GAE
6.8 Summary
6.9 Further Reading
6.10 History
7 Proximal Policy Optimization (PPO)
7.1 Surrogate Objective
7.1.1 Performance Collapse
7.1.2 Modifying the Objective
7.2 Proximal Policy Optimization (PPO)
7.3 PPO Algorithm
7.4 Implementing PPO
7.4.1 Calculating the PPO Policy Loss
7.4.2 PPO Training Loop
7.5 Training a PPO Agent
7.5.1 PPO on Pong
7.5.2 PPO on BipedalWalker
7.6 Experimental Results
7.6.1 Experiment: The Effect of λ of GAE
7.6.2 Experiment: The Effect of Clipping Variable ε
7.7 Summary
7.8 Further Reading
8 Parallelization Methods
8.1 Synchronous Parallelization
8.2 Asynchronous Parallelization
8.2.1 Hogwild!
8.3 Training an A3C Agent
8.4 Summary
8.5 Further Reading
9 Algorithm Summary
III Practical Details
10 Getting Deep RL to Work
10.1 Software Engineering Practices
10.1.1 Unit Tests
10.1.2 Code Quality
10.1.3 Git Workflow
10.2 Debugging Tips
10.2.1 Signs of Life
10.2.2 Policy Gradient Diagnoses
10.2.3 Data Diagnoses
10.2.4 Preprocessor
10.2.5 Memory
10.2.6 Algorithmic Functions
10.2.7 Neural Networks
10.2.8 Algorithm Simplification
10.2.9 Problem Simplification
10.2.10 Hyperparameters
10.2.11 Lab Workflow
10.3 Atari Tricks
10.4 Deep RL Almanac
10.4.1 Hyperparameter Tables
10.4.2 Algorithm Performance Comparison
10.5 Summary
11 SLM Lab
11.1 Algorithms Implemented in SLM Lab
11.2 Spec File
11.2.1 Search Spec Syntax
11.3 Running SLM Lab
11.3.1 SLM Lab Commands
11.4 Analyzing Experiment Results
11.4.1 Overview of the Experiment Data
11.5 Summary
12 Network Architectures
12.1 Types of Neural Networks
12.1.1 Multilayer Perceptrons (MLPs)
12.1.2 Convolutional Neural Networks (CNNs)
12.1.3 Recurrent Neural Networks (RNNs)
12.2 Guidelines for Choosing a Network Family
12.2.1 MDPs vs. POMDPs
12.2.2 Choosing Networks for Environments
12.3 The Net API
12.3.1 Input and Output Layer Shape Inference
12.3.2 Automatic Network Construction
12.3.3 Training Step
12.3.4 Exposure of Underlying Methods
12.4 Summary
12.5 Further Reading
13 Hardware
13.1 Computer
13.2 Data Types
13.3 Optimizing Data Types in RL
13.4 Choosing Hardware
13.5 Summary
IV Environment Design
14 States
14.1 Examples of States
14.2 State Completeness
14.3 State Complexity
14.4 State Information Loss
14.4.1 Image Grayscaling
14.4.2 Discretization
14.4.3 Hash Conflict
14.4.4 Metainformation Loss
14.5 Preprocessing
14.5.1 Standardization
14.5.2 Image Preprocessing
14.5.3 Temporal Preprocessing
14.6 Summary
15 Actions
15.1 Examples of Actions
15.2 Action Completeness
15.3 Action Complexity
15.4 Summary
15.5 Further Reading: Action Design in Everyday Things
16 Rewards
16.1 The Role of Rewards
16.2 Reward Design Guidelines
16.3 Summary
17 Transition Function
17.1 Feasibility Checks
17.2 Reality Check
17.3 Summary
Epilogue
A Deep Reinforcement Learning Timeline
B Example Environments
B.1 Discrete Environments
B.1.1 CartPole-v0
B.1.2 MountainCar-v0
B.1.3 LunarLander-v2
B.1.4 PongNoFrameskip-v4
B.1.5 BreakoutNoFrameskip-v4
B.2 Continuous Environments
B.2.1 Pendulum-v0
B.2.2 BipedalWalker-v2
References
Index
Laura Graesser is a research software engineer working in robotics at Google. She holds a master's degree in computer science from New York University, where she specialized in machine learning.

Wah Loon Keng is an AI engineer at Machine Zone, where he applies deep reinforcement learning to industrial problems. He has a background in both theoretical physics and computer science.