Preface  xiii
Acknowledgments  xv
About This Book  xvi
About the Authors  xix
About the Cover Illustration  xx
|
Part 1  1
|
1  What is reinforcement learning?  3
   1.1  The "deep" in deep reinforcement learning  4
   1.2  Reinforcement learning  6
   1.3  Dynamic programming versus Monte Carlo  9
   1.4  The reinforcement learning framework  10
   1.5  What can I do with reinforcement learning?  14
   1.6  Why deep reinforcement learning?  16
   1.7  Our didactic tool: String diagrams  18
|
2  Modeling reinforcement learning problems: Markov decision processes  23
   2.1  String diagrams and our teaching methods  23
   2.2  Solving the multi-arm bandit  28
        Exploration and exploitation  29
   2.3  Applying bandits to optimize ad placements  37
   2.4  Building networks with PyTorch  40
        Automatic differentiation  40
   2.5  Solving contextual bandits  42
   2.7  Predicting future rewards: Value and policy functions  49
|
3  Predicting the best states and actions: Deep Q networks  54
   3.2  Navigating with Q-learning  56
        Introducing the Gridworld game engine  63
        A neural network as the Q function  65
   3.3  Preventing catastrophic forgetting: Experience replay  75
   3.4  Improving stability with a target network  80
|
4  Learning to pick the best policy: Policy gradient methods  90
   4.1  Policy function using neural networks  91
        Neural network as the policy function  91
        Stochastic policy gradient  92
   4.2  Reinforcing good actions: The policy gradient algorithm  95
   4.3  Working with OpenAI Gym  100
   4.4  The REINFORCE algorithm  103
        Creating the policy network  104
        Having the agent interact with the environment  104
|
5  Tackling more complex problems with actor-critic methods  111
   5.1  Combining the value and policy function  113
   5.3  Advantage actor-critic  123
|
Part 2  139
|
6  Alternative optimization methods: Evolutionary algorithms  141
   6.1  A different approach to reinforcement learning  142
   6.2  Reinforcement learning with evolution strategies  143
   6.3  A genetic algorithm for CartPole  151
   6.4  Pros and cons of evolutionary algorithms  158
        Evolutionary algorithms explore more  158
        Evolutionary algorithms are incredibly sample intensive  158
   6.5  Evolutionary algorithms as a scalable alternative  159
        Scaling evolutionary algorithms  160
        Parallel vs. serial processing  161
        Communicating between nodes  163
        Scaling gradient-based approaches  165
|
7  Distributional DQN: Getting the full story  167
   7.1  What's wrong with Q-learning?  168
   7.2  Probability and statistics revisited  173
        The distributional Bellman equation  180
   7.4  Distributional Q-learning  181
        Representing a probability distribution in Python  182
        Implementing the Dist-DQN  191
   7.5  Comparing probability distributions  193
   7.6  Dist-DQN on simulated data  198
   7.7  Using distributional Q-learning to play Freeway  203
|
8  Curiosity-driven exploration  210
   8.1  Tackling sparse rewards with predictive coding  212
   8.2  Inverse dynamics prediction  215
   8.3  Setting up Super Mario Bros  218
   8.4  Preprocessing and the Q-network  221
   8.5  Setting up the Q-network and policy function  223
   8.6  Intrinsic curiosity module  226
   8.7  Alternative intrinsic reward mechanisms  239
|
9  Multi-agent reinforcement learning  243
   9.1  From one to many agents  244
   9.2  Neighborhood Q-learning  248
   9.4  Mean field Q-learning and the 2D Ising model  261
   9.5  Mixed cooperative-competitive games  271
|
10  Interpretable reinforcement learning: Attention and relational models  283
    10.1  Machine learning interpretability with attention and relational biases  284
          Invariance and equivariance  286
    10.2  Relational reasoning with attention  287
    10.3  Implementing self-attention for MNIST  298
          Tensor contractions and Einstein notation  303
          Training the relational module  306
    10.4  Multi-head attention and relational DQN  310
    10.6  Training and attention visualization  319
          Visualizing attention weights  323
|
11  In conclusion: A review and roadmap  329
    11.2  The uncharted topics in deep reinforcement learning  331
          Prioritized experience replay  331
          Proximal policy optimization (PPO)  332
          Hierarchical reinforcement learning and the options framework  333
          Monte Carlo tree search (MCTS)  334
Appendix  Mathematics, deep learning, PyTorch  336
Reference list  348
Index  351