|
1 Principle of Adaptive Dynamic Programming |
|
|
1 | (18) |
|
|
1 | (2) |
|
1.1.1 Discrete-Time Systems |
|
|
1 | (1) |
|
1.1.2 Continuous-Time Systems |
|
|
2 | (1) |
|
1.2 Original Forms of Adaptive Dynamic Programming |
|
|
3 | (6) |
|
1.2.1 Principle of Adaptive Dynamic Programming |
|
|
4 | (5) |
|
1.3 Iterative Forms of Adaptive Dynamic Programming |
|
|
9 | (2) |
|
|
9 | (1) |
|
|
10 | (1) |
|
|
11 | (3) |
|
|
14 | (5) |
|
2 An Iterative ε-Optimal Control Scheme for a Class of Discrete-Time Nonlinear Systems with Unfixed Initial State |
|
|
19 | (28) |
|
|
19 | (1) |
|
|
20 | (1) |
|
2.3 Properties of the Iterative Adaptive Dynamic Programming Algorithm |
|
|
21 | (7) |
|
2.3.1 Derivation of the Iterative ADP Algorithm |
|
|
21 | (2) |
|
2.3.2 Properties of the Iterative ADP Algorithm |
|
|
23 | (5) |
|
2.4 The ε-Optimal Control Algorithm |
|
|
28 | (9) |
|
2.4.1 The Derivation of the ε-Optimal Control Algorithm |
|
|
28 | (4) |
|
2.4.2 Properties of the ε-Optimal Control Algorithm |
|
|
32 | (2) |
|
2.4.3 The ε-Optimal Control Algorithm for Unfixed Initial State |
|
|
34 | (3) |
|
2.4.4 The Expressions of the ε-Optimal Control Algorithm |
|
|
37 | (1) |
|
2.5 Neural Network Implementation for the ε-Optimal Control Scheme |
|
|
37 | (3) |
|
|
38 | (1) |
|
|
39 | (1) |
|
|
40 | (2) |
|
|
42 | (1) |
|
|
43 | (4) |
|
3 Discrete-Time Optimal Control of Nonlinear Systems via Value Iteration-Based Q-Learning |
|
|
47 | (38) |
|
|
47 | (2) |
|
3.2 Preliminaries and Assumptions |
|
|
49 | (3) |
|
3.2.1 Problem Formulations |
|
|
49 | (1) |
|
3.2.2 Derivation of the Discrete-Time Q-Learning Algorithm |
|
|
50 | (2) |
|
3.3 Properties of the Discrete-Time Q-Learning Algorithm |
|
|
52 | (12) |
|
|
52 | (7) |
|
|
59 | (5) |
|
3.4 Neural Network Implementation for the Discrete-Time Q-Learning Algorithm |
|
|
64 | (6) |
|
|
65 | (2) |
|
|
67 | (2) |
|
|
69 | (1) |
|
|
70 | (11) |
|
|
70 | (6) |
|
|
76 | (5) |
|
|
81 | (1) |
|
|
82 | (3) |
|
4 A Novel Policy Iteration-Based Deterministic Q-Learning for Discrete-Time Nonlinear Systems |
|
|
85 | (26) |
|
|
85 | (1) |
|
|
86 | (1) |
|
4.3 Policy Iteration-Based Deterministic Q-Learning Algorithm for Discrete-Time Nonlinear Systems |
|
|
87 | (6) |
|
4.3.1 Derivation of the Policy Iteration-Based Deterministic Q-Learning Algorithm |
|
|
87 | (2) |
|
4.3.2 Properties of the Policy Iteration-Based Deterministic Q-Learning Algorithm |
|
|
89 | (4) |
|
4.4 Neural Network Implementation for the Policy Iteration-Based Deterministic Q-Learning Algorithm |
|
|
93 | (4) |
|
|
93 | (2) |
|
|
95 | (1) |
|
4.4.3 Summary of the Policy Iteration-Based Deterministic Q-Learning Algorithm |
|
|
96 | (1) |
|
|
97 | (10) |
|
|
97 | (3) |
|
|
100 | (7) |
|
|
107 | (1) |
|
|
107 | (4) |
|
5 Nonlinear Neuro-Optimal Tracking Control via Stable Iterative Q-Learning Algorithm |
|
|
111 | (22) |
|
|
111 | (1) |
|
|
112 | (2) |
|
5.3 Policy Iteration Q-Learning Algorithm for Optimal Tracking Control |
|
|
114 | (1) |
|
5.4 Properties of the Policy Iteration Q-Learning Algorithm |
|
|
114 | (5) |
|
5.5 Neural Network Implementation for the Policy Iteration Q-Learning Algorithm |
|
|
119 | (2) |
|
|
120 | (1) |
|
|
120 | (1) |
|
|
121 | (8) |
|
|
122 | (3) |
|
|
125 | (4) |
|
|
129 | (1) |
|
|
129 | (4) |
|
6 Model-Free Multiobjective Adaptive Dynamic Programming for Discrete-Time Nonlinear Systems with General Performance Index Functions |
|
|
133 | (26) |
|
|
133 | (1) |
|
|
134 | (1) |
|
6.3 Multiobjective Adaptive Dynamic Programming Method |
|
|
135 | (10) |
|
6.4 Model-Free Incremental Q-Learning Method |
|
|
145 | (2) |
|
6.4.1 Derivation of the Incremental Q-Learning Method |
|
|
145 | (2) |
|
6.5 Neural Network Implementation for the Incremental Q-Learning Method |
|
|
147 | (3) |
|
|
148 | (1) |
|
|
149 | (1) |
|
6.5.3 The Procedure of the Model-Free Incremental Q-Learning Method |
|
|
150 | (1) |
|
|
150 | (3) |
|
|
153 | (4) |
|
|
153 | (2) |
|
|
155 | (2) |
|
|
157 | (1) |
|
|
157 | (2) |
|
7 Multiobjective Optimal Control for a Class of Unknown Nonlinear Systems Based on Finite-Approximation-Error ADP Algorithm |
|
|
159 | (26) |
|
|
159 | (1) |
|
|
160 | (2) |
|
7.3 Optimal Solution Based on Finite-Approximation-Error ADP |
|
|
162 | (11) |
|
7.3.1 Data-Based Identifier of Unknown System Dynamics |
|
|
162 | (4) |
|
7.3.2 Derivation of the ADP Algorithm with Finite Approximation Errors |
|
|
166 | (2) |
|
7.3.3 Convergence Analysis of the Iterative ADP Algorithm |
|
|
168 | (5) |
|
7.4 Implementation of the Iterative ADP Algorithm |
|
|
173 | (2) |
|
|
174 | (1) |
|
|
174 | (1) |
|
7.4.3 The Procedure of the ADP Algorithm |
|
|
175 | (1) |
|
|
175 | (7) |
|
|
176 | (3) |
|
|
179 | (3) |
|
|
182 | (1) |
|
|
182 | (3) |
|
8 A New Approach for a Class of Continuous-Time Chaotic Systems Optimal Control by Online ADP Algorithm |
|
|
185 | (16) |
|
|
185 | (1) |
|
|
185 | (2) |
|
8.3 Optimal Control Based on Online ADP Algorithm |
|
|
187 | (8) |
|
8.3.1 Design Method of the Critic Network and the Action Network |
|
|
188 | (3) |
|
|
191 | (4) |
|
8.3.3 Online ADP Algorithm Implementation |
|
|
195 | (1) |
|
|
195 | (4) |
|
|
196 | (1) |
|
|
197 | (2) |
|
|
199 | (1) |
|
|
200 | (1) |
|
9 Off-Policy IRL Optimal Tracking Control for Continuous-Time Chaotic Systems |
|
|
201 | (14) |
|
|
201 | (1) |
|
9.2 System Description and Problem Statement |
|
|
201 | (2) |
|
9.3 Off-Policy IRL ADP Algorithm |
|
|
203 | (6) |
|
9.3.1 Convergence Analysis of IRL ADP Algorithm |
|
|
204 | (2) |
|
9.3.2 Off-Policy IRL Method |
|
|
206 | (2) |
|
9.3.3 Methods for Updating Weights |
|
|
208 | (1) |
|
|
209 | (4) |
|
|
209 | (2) |
|
|
211 | (2) |
|
|
213 | (1) |
|
|
213 | (2) |
|
10 ADP-Based Optimal Sensor Scheduling for Target Tracking in Energy Harvesting Wireless Sensor Networks |
|
|
215 | (1) |
|
|
215 | (1) |
|
|
216 | (3) |
|
10.2.1 NN Model Description of Solar Energy Harvesting |
|
|
216 | (1) |
|
10.2.2 Sensor Energy Consumption |
|
|
217 | (1) |
|
|
218 | (1) |
|
10.3 ADP-Based Sensor Scheduling for Maximum WSNs Residual Energy and Minimum Measuring Accuracy |
|
|
219 | (5) |
|
10.3.1 Optimization Problem of the Sensor Scheduling |
|
|
219 | (1) |
|
10.3.2 ADP-Based Sensor Scheduling with Convergence Analysis |
|
|
220 | (3) |
|
|
223 | (1) |
|
10.3.4 Implementation Process |
|
|
224 | (1) |
|
|
224 | (2) |
|
|
226 | (1) |
|
|
227 | |
Erratum to: Self-Learning Optimal Control of Nonlinear Systems |
|
1 | (228) |
Index |
|
229 | |