1 Reinforcement Learning as a Subfield of Machine Learning | 1 | (14) |
1.1 Machine Learning as Automated Processing of Feedback from the Environment | 2 | (1) |
| 3 | (5) |
1.3 Reinforcement Learning with Java | 8 | (7) |
| 13 | (2) |
2 Basic Concepts of Reinforcement Learning | 15 | (8) |
| 16 | (2) |
2.2 The Policy of the Agent | 18 | (2) |
2.3 Evaluation of States and Actions (Q-Function, Bellman Equation) | 20 | (3) |
| 22 | (1) |
3 Optimal Decision-Making in a Known Environment | 23 | (24) |
| 25 | (11) |
3.1.1 Target-Oriented Condition Assessment ("Backward Induction") | 25 | (9) |
3.1.2 Policy-Based State Valuation (Reward Prediction) | 34 | (2) |
3.2 Iterative Policy Search | 36 | (7) |
3.2.1 Direct Policy Improvement | 37 | (1) |
3.2.2 Mutual Improvement of Policy and Value Function | 38 | (5) |
3.3 Optimal Policy in a Board Game Scenario | 43 | (3) |
| 46 | (1) |
| 46 | (1) |
4 Decision-Making and Learning in an Unknown Environment | 47 | (76) |
4.1 Exploration vs. Exploitation | 49 | (2) |
4.2 Retroactive Processing of Experience ("Model-Free Reinforcement Learning") | 51 | (45) |
4.2.1 Goal-Oriented Learning ("Value-Based") | 51 | (15) |
| 66 | (18) |
4.2.3 Combined Methods (Actor-Critic) | 84 | (12) |
4.3 Exploration with Predictive Simulations ("Model-Based Reinforcement Learning") | 96 | (24) |
| 97 | (4) |
4.3.2 Monte Carlo Rollout | 101 | (6) |
4.3.3 Artificial Curiosity | 107 | (4) |
4.3.4 Monte Carlo Tree Search (MCTS) | 111 | (7) |
4.3.5 Remarks on the Concept of Intelligence | 118 | (2) |
4.4 Systematics of the Learning Methods | 120 | (3) |
| 121 | (2) |
5 Artificial Neural Networks as Estimators for State Values and the Action Selection | 123 | (52) |
5.1 Artificial Neural Networks | 125 | (27) |
5.1.1 Pattern Recognition with the Perceptron | 128 | (3) |
5.1.2 The Adaptability of Artificial Neural Networks | 131 | (15) |
5.1.3 Backpropagation Learning | 146 | (3) |
5.1.4 Regression with Multilayer Perceptrons | 149 | (3) |
5.2 State Evaluation with Generalizing Approximations | 152 | (11) |
5.3 Neural Estimators for Action Selection | 163 | (12) |
5.3.1 Policy Gradient with Neural Networks | 163 | (2) |
5.3.2 Proximal Policy Optimization | 165 | (4) |
5.3.3 Evolutionary Strategy with a Neural Policy | 169 | (4) |
| 173 | (2) |
6 Guiding Ideas in Artificial Intelligence over Time | 175 | (9) |
6.1 Changing Guiding Ideas | 176 | (5) |
6.2 On the Relationship Between Humans and Artificial Intelligence | 181 | (3) |
Bibliography | 184 | |