
Model-Based Reinforcement Learning: From Data to Continuous Actions with a Python-based Toolbox [Hardcover]

Milad Farsi (University of Tabriz, Iran; University of Waterloo, Canada), Jun Liu (University of Waterloo, Canada)
"Whilst reinforcement learning has gained tremendous success and popularity in recent years, most research papers and books focus on either the theory (optimal control and dynamic programming) or the algorithms (mostly simulation-based). From a control systems perspective, this book will provide a model-based framework that bridges these two aspects to provide a holistic treatment of the topic of model-based online learning control. The aim is to develop a model-based framework for data-driven control that encompasses the topics of systems identification from data, model-based reinforcement learning and optimal control, and their applications. This will be done through reviewing the classical results in system identification from a new perspective to develop more efficient reinforcement learning techniques. Hence, the focus of this book will be on presenting an end to end framework from design to application of a more tractable model-based reinforcement learning technique. The tutorial aspects of the book are enhanced by the provision of a Python-based toolbox, accessible online"--

Model-Based Reinforcement Learning

Explore a comprehensive and practical approach to reinforcement learning

Reinforcement learning is an essential paradigm of machine learning in which an intelligent agent learns to take actions that drive a system toward optimal behavior. While this paradigm has gained tremendous success and popularity in recent years, previous scholarship has focused either on theory (optimal control and dynamic programming) or on algorithms (most of which are simulation-based).

Model-Based Reinforcement Learning provides a model-based framework that bridges these two aspects, giving a holistic treatment of model-based online learning control. The authors develop a model-based framework for data-driven control that encompasses system identification from data, model-based reinforcement learning, and optimal control, together with their applications. By revisiting classical results in system identification from a new perspective, they derive more efficient reinforcement learning techniques. At its heart, the book presents an end-to-end framework, from design to application, for a more tractable model-based reinforcement learning technique.
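To make that pipeline concrete, here is a minimal, illustrative sketch (not taken from the book or its toolbox) of the first stage: identifying a discrete-time linear model x[k+1] = A x[k] + B u[k] from logged state/input data by ordinary least squares. The system matrices and data below are hypothetical and serve only to show the mechanics.

```python
# Illustrative sketch (not the book's toolbox): least-squares
# identification of a discrete-time linear model from data.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth system, used only to generate data.
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
B_true = np.array([[0.0], [0.5]])

# Collect a trajectory under a persistently exciting (random) input.
T = 200
x = np.zeros((T + 1, 2))
u = rng.normal(size=(T, 1))
for k in range(T):
    x[k + 1] = A_true @ x[k] + B_true @ u[k]

# Stack regressors z[k] = [x[k]; u[k]] and solve min ||X_next - Z W||.
Z = np.hstack([x[:-1], u])           # shape (T, 3)
W, *_ = np.linalg.lstsq(Z, x[1:], rcond=None)
A_hat, B_hat = W[:2].T, W[2:].T      # recovered (A, B)

print(np.round(A_hat, 3))            # close to A_true
print(np.round(B_hat, 3))            # close to B_true
```

The recovered model (A_hat, B_hat) is what a model-based controller, such as the LQR designs treated in Chapters 2 and 3, would then act on.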

Readers of Model-Based Reinforcement Learning will also find:

  • A useful textbook to use in graduate courses on data-driven and learning-based control that emphasizes modeling and control of dynamical systems from data
  • Detailed comparisons of the impact of different techniques, such as the basic linear quadratic controller, learning-based model predictive control, model-free reinforcement learning, and structured online learning (a minimal sketch of the LQR baseline follows this list)
  • Applications and case studies, one on ground vehicles with nonholonomic dynamics and another on quadrotor helicopters
  • An online, Python-based toolbox that accompanies the contents covered in the book, as well as the necessary code and data
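As a point of reference for the comparisons above, the following is a minimal, hedged sketch of the basic linear quadratic regulator baseline applied to an identified linear model, using only standard NumPy and SciPy calls. The matrices A_hat and B_hat are assumed to come from a prior identification step and are hypothetical; this is not the book's toolbox code.

```python
# Illustrative LQR baseline on an identified model (assumed matrices).
import numpy as np
from scipy.linalg import solve_discrete_are

A_hat = np.array([[0.9, 0.1], [0.0, 0.8]])   # identified dynamics (assumed)
B_hat = np.array([[0.0], [0.5]])
Q = np.eye(2)                                 # state cost weight
R = np.eye(1)                                 # input cost weight

# Solve the discrete algebraic Riccati equation and form the gain K,
# so that u[k] = -K x[k] minimizes the sum of x'Qx + u'Ru for the model.
P = solve_discrete_are(A_hat, B_hat, Q, R)
K = np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)

# The closed-loop matrix should be stable (spectral radius < 1).
print(np.abs(np.linalg.eigvals(A_hat - B_hat @ K)).max())
```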

Model-Based Reinforcement Learning is a useful reference for senior undergraduate students, graduate students, research assistants, professors, process control engineers, and roboticists.

About the Authors xi
Preface xiii
Acronyms xv
Introduction xvii
1 Nonlinear Systems Analysis 1(10)
1.1 Notation 1(1)
1.2 Nonlinear Dynamical Systems 2(1)
1.2.1 Remarks on Existence, Uniqueness, and Continuation of Solutions 2(1)
1.3 Lyapunov Analysis of Stability 3(4)
1.4 Stability Analysis of Discrete Time Dynamical Systems 7(3)
1.5 Summary 10(1)
Bibliography 10(1)
2 Optimal Control 11(22)
2.1 Problem Formulation 11(1)
2.2 Dynamic Programming 12(6)
2.2.1 Principle of Optimality 12(2)
2.2.2 Hamilton-Jacobi-Bellman Equation 14(1)
2.2.3 A Sufficient Condition for Optimality 15(1)
2.2.4 Infinite-Horizon Problems 16(2)
2.3 Linear Quadratic Regulator 18(12)
2.3.1 Differential Riccati Equation 18(5)
2.3.2 Algebraic Riccati Equation 23(3)
2.3.3 Convergence of Solutions to the Differential Riccati Equation 26(2)
2.3.4 Forward Propagation of the Differential Riccati Equation for Linear Quadratic Regulator 28(2)
2.4 Summary 30(3)
Bibliography 30(3)
3 Reinforcement Learning 33(18)
3.1 Control-Affine Systems with Quadratic Costs 33(2)
3.2 Exact Policy Iteration 35(6)
3.2.1 Linear Quadratic Regulator 39(2)
3.3 Policy Iteration with Unknown Dynamics and Function Approximations 41(6)
3.3.1 Linear Quadratic Regulator with Unknown Dynamics 46(1)
3.4 Summary 47(4)
Bibliography 48(3)
4 Learning of Dynamic Models 51(26)
4.1 Introduction 51(1)
4.1.1 Autonomous Systems 51(1)
4.1.2 Control Systems 51(1)
4.2 Model Selection 52(2)
4.2.1 Gray-Box vs. Black-Box 52(1)
4.2.2 Parametric vs. Nonparametric 52(2)
4.3 Parametric Model 54(2)
4.3.1 Model in Terms of Bases 54(1)
4.3.2 Data Collection 55(1)
4.3.3 Learning of Control Systems 55(1)
4.4 Parametric Learning Algorithms 56(4)
4.4.1 Least Squares 56(1)
4.4.2 Recursive Least Squares 57(2)
4.4.3 Gradient Descent 59(1)
4.4.4 Sparse Regression 60(1)
4.5 Persistence of Excitation 60(1)
4.6 Python Toolbox 61(3)
4.6.1 Configurations 62(1)
4.6.2 Model Update 62(1)
4.6.3 Model Validation 63(1)
4.7 Comparison Results 64(9)
4.7.1 Convergence of Parameters 65(2)
4.7.2 Error Analysis 67(2)
4.7.3 Runtime Results 69(4)
4.8 Summary 73(4)
Bibliography 75(2)
5 Structured Online Learning-Based Control of Continuous-Time Nonlinear Systems 77(26)
5.1 Introduction 77(1)
5.2 A Structured Approximate Optimal Control Framework 77(4)
5.3 Local Stability and Optimality Analysis 81(2)
5.3.1 Linear Quadratic Regulator 81(1)
5.3.2 SOL Control 82(1)
5.4 SOL Algorithm 83(4)
5.4.1 ODE Solver and Control Update 84(1)
5.4.2 Identified Model Update 85(1)
5.4.3 Database Update 85(1)
5.4.4 Limitations and Implementation Considerations 86(1)
5.4.5 Asymptotic Convergence with Approximate Dynamics 87(1)
5.5 Simulation Results 87(12)
5.5.1 Systems Identifiable in Terms of a Given Set of Bases 88(3)
5.5.2 Systems to Be Approximated by a Given Set of Bases 91(7)
5.5.3 Comparison Results 98(1)
5.6 Summary 99(4)
Bibliography 99(4)
6 A Structured Online Learning Approach to Nonlinear Tracking with Unknown Dynamics 103(18)
6.1 Introduction 103(1)
6.2 A Structured Online Learning for Tracking Control 104(7)
6.2.1 Stability and Optimality in the Linear Case 108(3)
6.3 Learning-based Tracking Control Using SOL 111(1)
6.4 Simulation Results 112(3)
6.4.1 Tracking Control of the Pendulum 113(1)
6.4.2 Synchronization of Chaotic Lorenz System 114(1)
6.5 Summary 115(6)
Bibliography 118(3)
7 Piecewise Learning and Control with Stability Guarantees 121(26)
7.1 Introduction 121(1)
7.2 Problem Formulation 122(1)
7.3 The Piecewise Learning and Control Framework 122(3)
7.3.1 System Identification 123(1)
7.3.2 Database 124(1)
7.3.3 Feedback Control 125(1)
7.4 Analysis of Uncertainty Bounds 125(4)
7.4.1 Quadratic Programs for Bounding Errors 126(3)
7.5 Stability Verification for Piecewise-Affine Learning and Control 129(5)
7.5.1 Piecewise Affine Models 129(1)
7.5.2 MIQP-based Stability Verification of PWA Systems 130(3)
7.5.3 Convergence of ACCPM 133(1)
7.6 Numerical Results 134(8)
7.6.1 Pendulum System 134(4)
7.6.2 Dynamic Vehicle System with Skidding 138(2)
7.6.3 Comparison of Runtime Results 140(2)
7.7 Summary 142(5)
Bibliography 143(4)
8 An Application to Solar Photovoltaic Systems 147(40)
8.1 Introduction 147(3)
8.2 Problem Statement 150(4)
8.2.1 PV Array Model 151(1)
8.2.2 DC-DC Boost Converter 152(2)
8.3 Optimal Control of PV Array 154(11)
8.3.1 Maximum Power Point Tracking Control 156(6)
8.3.2 Reference Voltage Tracking Control 162(2)
8.3.3 Piecewise Learning Control 164(1)
8.4 Application Considerations 165(5)
8.4.1 Partial Derivative Approximation Procedure 165(2)
8.4.2 Partial Shading Effect 167(3)
8.5 Simulation Results 170(12)
8.5.1 Model and Control Verification 173(1)
8.5.2 Comparative Results 174(2)
8.5.3 Model-Free Approach Results 176(2)
8.5.4 Piecewise Learning Results 178(1)
8.5.5 Partial Shading Results 179(3)
8.6 Summary 182(5)
Bibliography 182(5)
9 An Application to Low-level Control of Quadrotors 187(18)
9.1 Introduction 187(2)
9.2 Quadrotor Model 189(1)
9.3 Structured Online Learning with RLS Identifier on Quadrotor 190(7)
9.3.1 Learning Procedure 191(4)
9.3.2 Asymptotic Convergence with Uncertain Dynamics 195(1)
9.3.3 Computational Properties 195(2)
9.4 Numerical Results 197(4)
9.5 Summary 201(4)
Bibliography 201(4)
10 Python Toolbox 205(10)
10.1 Overview 205(1)
10.2 User Inputs 205(2)
10.2.1 Process 206(1)
10.2.2 Objective 207(1)
10.3 SOL 207(4)
10.3.1 Model Update 208(1)
10.3.2 Database 208(2)
10.3.3 Library 210(1)
10.3.4 Control 210(1)
10.4 Display and Outputs 211(3)
10.4.1 Graphs and Printouts 213(1)
10.4.2 3D Simulation 213(1)
10.5 Summary 214(1)
Bibliography 214(1)
A Appendix 215(8)
A.1 Supplementary Analysis of Remark 5.4 215(7)
A.2 Supplementary Analysis of Remark 5.5 222(1)
Index 223
Milad Farsi received his B.S. degree in Electrical Engineering (Electronics) from the University of Tabriz in 2010 and his M.S. degree in Electrical Engineering (Control Systems) from the Sahand University of Technology in 2013. He gained industrial experience as a Control Systems Engineer between 2012 and 2016. He received his Ph.D. in Applied Mathematics from the University of Waterloo, Canada, in 2022, and is currently a Postdoctoral Fellow at the same institution. His research interests include control systems, reinforcement learning, and their applications in robotics and power electronics.

Jun Liu received his Ph.D. degree in Applied Mathematics from the University of Waterloo, Canada, in 2010. He is currently an Associate Professor of Applied Mathematics and a Canada Research Chair in Hybrid Systems and Control at the University of Waterloo, where he directs the Hybrid Systems Laboratory. From 2012 to 2015, he was a Lecturer in Control and Systems Engineering at the University of Sheffield. In 2011 and 2012, he was a Postdoctoral Scholar in Control and Dynamical Systems at the California Institute of Technology. His main research interests are in the theory and applications of hybrid systems and control, including rigorous computational methods for control design, with applications in cyber-physical systems and robotics.