
E-book: Deep Reinforcement Learning in Action

  • Length: 384 pages
  • Publication date: 16-Mar-2020
  • Publisher: Manning Publications
  • Language: English
  • ISBN-13: 9781638350507
  • Format: EPUB+DRM
  • Price: 43.42 €*
  • * The price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste): not allowed

  • Printing: not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means you must install special software to read it. You will also need to create an Adobe ID. More information here. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), install this free app: PocketBook Reader (iOS / Android).

    To read on a PC or Mac, install Adobe Digital Editions (a free application designed specifically for reading e-books; not to be confused with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

Humans learn best from feedback: we are encouraged to take actions that lead to positive results and deterred from those with negative consequences. This reinforcement process can be applied to computer programs, allowing them to solve more complex problems than classical programming can.



Deep Reinforcement Learning in Action teaches you the fundamental concepts and terminology of deep reinforcement learning, along with the practical skills and techniques you'll need to implement it in your own projects.





Key features

Structuring problems as Markov Decision Processes (a toy sketch follows this list)

Popular algorithms such as Deep Q-Networks, Policy Gradient methods, and Evolutionary Algorithms, and the intuitions that drive them

Applying reinforcement learning algorithms to real-world problems
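
For a concrete picture of the first two features, here is a minimal sketch, not taken from the book, of a toy problem framed as a Markov Decision Process and solved with a tabular epsilon-greedy Q-learning update. The two-state MDP, its transition function, and the hyperparameters are invented purely for illustration.

    # Minimal, illustrative sketch (not from the book): a made-up two-state MDP
    # solved with tabular epsilon-greedy Q-learning.
    import random

    states = [0, 1]    # S: the set of states
    actions = [0, 1]   # A: the set of actions

    def transition(s, a):
        # Hypothetical deterministic dynamics: action 1 flips the state,
        # and landing in state 1 yields a reward of 1.
        next_s = (s + a) % 2
        reward = 1.0 if next_s == 1 else 0.0
        return next_s, reward

    Q = {(s, a): 0.0 for s in states for a in actions}  # tabular Q-function
    alpha, gamma, epsilon = 0.1, 0.9, 0.1               # learning rate, discount, exploration rate

    s = 0
    for _ in range(1000):
        # Epsilon-greedy: usually exploit the current Q estimates, occasionally explore.
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda act: Q[(s, act)])
        next_s, r = transition(s, a)
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(next_s, a').
        best_next = max(Q[(next_s, act)] for act in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = next_s

    print(Q)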



Audience

You'll need intermediate Python skills and a basic understanding of deep learning.



About the technology

Deep reinforcement learning is a form of machine learning in which AI agents learn optimal behavior from their own raw sensory input. The system perceives the environment, interprets the results of its past decisions, and uses this information to optimize its behavior for maximum long-term return. Deep reinforcement learning famously contributed to the success of AlphaGo, but that's not all it can do!
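
As a rough illustration of that perceive-act-optimize loop, the sketch below runs one episode of the CartPole environment with a random placeholder policy. It assumes the classic OpenAI Gym interface (the library the book itself uses), where env.reset() returns an observation and env.step() returns (observation, reward, done, info); nothing is learned here, it only shows the interaction cycle described above.

    # Agent-environment loop sketch, assuming the classic OpenAI Gym API.
    # The random action stands in for a learned policy.
    import gym

    env = gym.make("CartPole-v1")
    obs = env.reset()                    # perceive the initial state
    done = False
    episode_return = 0.0
    while not done:
        action = env.action_space.sample()           # placeholder policy: act at random
        obs, reward, done, info = env.step(action)   # act, then observe the consequence
        episode_return += reward                     # the quantity an agent learns to maximize
    print("episode return:", episode_return)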



About the authors

Alexander Zai is a Machine Learning Engineer at Amazon AI, working on MXNet, which powers a suite of AWS machine learning products. Brandon Brown is a Machine Learning and Data Analysis blogger at outlace.com, committed to providing clear teaching on difficult topics for newcomers.

Table of contents

Preface xiii
Acknowledgments xv
About This Book xvi
About The Authors xix
About The Cover Illustration xx
PART 1 FOUNDATIONS 1(138)
1 What is reinforcement learning? 3(20)
1.1 The "deep" in deep reinforcement learning 4(2)
1.2 Reinforcement learning 6(3)
1.3 Dynamic programming versus Monte Carlo 9(1)
1.4 The reinforcement learning framework 10(4)
1.5 What can I do with reinforcement learning? 14(2)
1.6 Why deep reinforcement learning? 16(2)
1.7 Our didactic tool: String diagrams 18(2)
1.8 What's next? 20(3)
2 Modeling reinforcement learning problems: Markov decision processes 23(31)
2.1 String diagrams and our teaching methods 23(5)
2.2 Solving the multi-arm bandit 28(9)
Exploration and exploitation 29(1)
Epsilon-greedy strategy 30(5)
Softmax selection policy 35(2)
2.3 Applying bandits to optimize ad placements 37(3)
Contextual bandits 38(1)
States, actions, rewards 39(1)
2.4 Building networks with PyTorch 40(2)
Automatic differentiation 40(1)
Building Models 41(1)
2.5 Solving contextual bandits 42(5)
2.6 The Markov property 47(2)
2.7 Predicting future rewards: Value and policy functions 49(5)
Policy functions 50(1)
Optimal policy 51(1)
Value functions 51(3)
3 Predicting the best states and actions: Deep Q networks 54(36)
3.1 The Q function 55(1)
3.2 Navigating with Q-learning 56(19)
What is Q-learning? 56(1)
Tackling Gridworld 57(2)
Hyperparameters 59(1)
Discount factor 60(1)
Building the network 61(2)
Introducing the Gridworld game engine 63(2)
A neural network as the Q function 65(10)
3.3 Preventing catastrophic forgetting: Experience replay 75(5)
Catastrophic forgetting 75(1)
Experience replay 76(4)
3.4 Improving stability with a target network 80(6)
Learning instability 81(5)
3.5 Review 86(4)
4 Learning to pick the best policy: Policy gradient methods 90(21)
4.1 Policy function using neural networks 91(4)
Neural network as the policy function 91(1)
Stochastic policy gradient 92(2)
Exploration 94(1)
4.2 Reinforcing good actions: The policy gradient algorithm 95(5)
Defining an objective 95(2)
Action reinforcement 97(1)
Log probability 98(1)
Credit assignment 99(1)
4.3 Working with OpenAI Gym 100(3)
Cart Pole 102(1)
The OpenAI Gym API 103(1)
4.4 The REINFORCE algorithm 103(8)
Creating the policy network 104(1)
Having the agent interact with the environment 104(1)
Training the model 105(2)
The full training loop 107(1)
Chapter conclusion 108(3)
5 Tackling more complex problems with actor-critic methods 111(28)
5.1 Combining the value and policy function 113(5)
5.2 Distributed training 118(5)
5.3 Advantage actor-critic 123(9)
5.4 N-step actor-critic 132(7)
PART 2 ABOVE AND BEYOND 139(197)
6 Alternative optimization methods: Evolutionary algorithms 141(26)
6.1 A different approach to reinforcement learning 142(1)
6.2 Reinforcement learning with evolution strategies 143(8)
Evolution in theory 143(4)
Evolution in practice 147(4)
6.3 A genetic algorithm for CartPole 151(7)
6.4 Pros and cons of evolutionary algorithms 158(1)
Evolutionary algorithms explore more 158(1)
Evolutionary algorithms are incredibly sample intensive 158(1)
Simulators 159(1)
6.5 Evolutionary algorithms as a scalable alternative 159(8)
Scaling evolutionary algorithms 160(1)
Parallel vs. serial processing 161(1)
Scaling efficiency 162(1)
Communicating between nodes 163(2)
Scaling linearly 165(1)
Scaling gradient-based approaches 165(2)
7 Distributional DQN: Getting the full story 167(43)
7.1 What's wrong with Q-learning? 168(5)
7.2 Probability and statistics revisited 173(7)
Priors and posteriors 175(1)
Expectation and variance 176(4)
7.3 The Bellman equation 180(1)
The distributional Bellman equation 180(1)
7.4 Distributional Q-learning 181(12)
Representing a probability distribution in Python 182(9)
Implementing the Dist-DQN 191(2)
7.5 Comparing probability distributions 193(5)
7.6 Dist-DQN on simulated data 198(5)
7.7 Using distributional Q-learning to play Freeway 203(7)
8 Curiosity-driven exploration 210(33)
8.1 Tackling sparse rewards with predictive coding 212(3)
8.2 Inverse dynamics prediction 215(3)
8.3 Setting up Super Mario Bros 218(3)
8.4 Preprocessing and the Q-network 221(2)
8.5 Setting up the Q-network and policy function 223(3)
8.6 Intrinsic curiosity module 226(13)
8.7 Alternative intrinsic reward mechanisms 239(4)
9 Multi-agent reinforcement learning 243(40)
9.1 From one to many agents 244(4)
9.2 Neighborhood Q-learning 248(4)
9.3 The 1D Ising model 252(9)
9.4 Mean field Q-learning and the 2D Ising model 261(10)
9.5 Mixed cooperative-competitive games 271(12)
10 Interpretable reinforcement learning: Attention and relational models 283(46)
10.1 Machine learning interpretability with attention and relational biases 284(3)
Invariance and equivariance 286(1)
10.2 Relational reasoning with attention 287(11)
Attention models 288(2)
Relational reasoning 290(5)
Self-attention models 295(3)
10.3 Implementing self-attention for MNIST 298(12)
Transformed MNIST 298(1)
The relational module 299(4)
Tensor contractions and Einstein notation 303(3)
Training the relational module 306(4)
10.4 Multi-head attention and relational DQN 310(7)
10.5 Double Q-learning 317(2)
10.6 Training and attention visualization 319(10)
Maximum entropy learning 323(1)
Curriculum learning 323(1)
Visualizing attention weights 323(6)
11 In conclusion: A review and roadmap 329(1)
11.1 What did we learn? 329(2)
11.2 The uncharted topics in deep reinforcement learning 331(4)
Prioritized experience replay 331(1)
Proximal policy optimization (PPO) 332(1)
Hierarchical reinforcement learning and the options framework 333(1)
Model-based planning 333(1)
Monte Carlo tree search (MCTS) 334(1)
11.3 The end 335(1)
Appendix Mathematics, deep learning, PyTorch 336(12)
Reference list 348(3)
Index 351