Preface  xix
|
Chapter 1 Deep Neural Networks  1 (44)
1.1 Three Types of Neural Networks  1 (16)
1.1.1 Multilayer Feedforward Neural Networks  1 (1)
1.1.1.1 Architecture of Feedforward Neural Networks  1 (2)
1.1.1.2 Loss Function and Training Algorithms  3 (2)
1.1.2 Convolutional Neural Network  5 (1)
…  5 (2)
1.1.2.2 Nonlinearity (ReLU)  7 (1)
…  7 (1)
1.1.2.4 Fully Connected Layers  7 (1)
1.1.3 Recurrent Neural Networks  7 (1)
…  8 (4)
1.1.3.2 Gated Recurrent Units  12 (1)
1.1.3.3 Long Short-Term Memory (LSTM)  13 (1)
1.1.3.4 Applications of RNN to Modeling and Forecasting of Dynamic Systems  14 (1)
1.1.3.5 Recurrent State Space Models with Autonomous Adjusted Intervention Variable  15 (2)
1.2 Dynamic Approach to Deep Learning  17 (6)
1.2.1 Differential Equations for Neural Networks  17 (1)
1.2.2 Ordinary Differential Equations for ResNets  17 (1)
1.2.3 Ordinary Differential Equations for Reversible Neural Networks  18 (1)
1.2.3.1 Stability of Dynamic Systems  18 (1)
1.2.3.2 Second Method of Lyapunov  19 (2)
1.2.3.3 Lyapunov Exponent  21 (1)
1.2.3.4 Reversible ResNet  21 (1)
1.2.3.5 Residual Generative Adversarial Networks  22 (1)
1.2.3.6 Normalizing Flows  23 (1)
1.3 Optimal Control for Deep Learning  23 (7)
1.3.1 Mathematical Formulation of Optimal Control  23 (1)
1.3.2 Pontryagin's Maximum Principle  24 (1)
1.3.3 Optimal Control Approach to Parameter Estimation  25 (1)
1.3.4 Learning Nonlinear State Space Models  26 (1)
1.3.4.1 Joint Estimation of Parameters and Controls  26 (2)
1.3.4.2 Multiple Samples and Parameter Estimation  28 (1)
1.3.4.3 Optimal Control Problem  29 (1)
…  30 (1)
Appendix 1A Brief Introduction of Tensor Calculus  30 (7)
…  30 (5)
…  35 (2)
Appendix 1B Calculate Gradient of Cross-Entropy Loss Function  37 (1)
Appendix 1C Optimal Control and Pontryagin's Maximum Principle  38 (7)
…  38 (1)
1C2 Pontryagin's Maximum Principle  38 (1)
1C3 Calculus of Variations  39 (2)
1C4 Proof of Pontryagin's Maximum Principle  41 (2)
…  43 (2)
|
Chapter 2 Gaussian Processes and Learning Dynamics for Wide Neural Networks  45 (18)
…  45 (1)
2.2 Linear Models for Learning in Neural Networks  45 (3)
2.2.1 Notation and Mathematical Formulation of Dynamics of Parameter Estimation Process  45 (2)
2.2.2 Linearized Neural Networks  47 (1)
…  48 (4)
…  48 (1)
2.3.2 Gaussian Process Models  49 (2)
2.3.3 Gaussian Processes for Regression  51 (1)
2.3.3.1 Prediction with Noise-Free Observations  51 (1)
2.3.3.2 Prediction with Noisy Observations  51 (1)
2.4 Wide Neural Network as a Gaussian Process  52 (3)
2.4.1 Gaussian Process for Single-Layer Neural Networks  52 (1)
2.4.2 Gaussian Process for Multilayer Neural Networks  53 (2)
Appendix 2A Recursive Formula for NTK Calculation  55 (6)
Appendix 2B Analytic Formula for Parameter Estimation in the Linearized Neural Networks  61 (2)
…  61 (2)
|
Chapter 3 Deep Generative Models  63 (46)
3.1 Variational Inference  63 (13)
…  63 (1)
3.1.2 Variational Inference as Optimization  63 (1)
3.1.3 Variational Bound and Variational Objective  64 (1)
3.1.4 Mean-Field Variational Inference  65 (1)
3.1.4.1 A General Framework  65 (1)
3.1.4.2 Bayesian Mixture of Gaussians  65 (4)
3.1.4.3 Mean-Field Variational Inference with Exponential Family  69 (3)
3.1.5 Stochastic Variational Inference  72 (1)
3.1.5.1 Natural Gradient Descent  72 (2)
3.1.5.2 Revisit Variational Distribution for Exponential Family  74 (2)
3.2 Variational Autoencoder  76 (8)
…  76 (1)
3.2.2 Deep Latent Variable Models and Intractability of Likelihood Function  76 (1)
3.2.3 Approximate Techniques and Recognition Model  77 (1)
…  78 (1)
3.2.5 Optimization of the ELBO and Stochastic Gradient Method  79 (1)
3.2.6 Reparameterization Trick  79 (1)
3.2.7 Gradient of Expectation and Gradient of ELBO  80 (1)
3.2.8 Bernoulli Generative Model  80 (1)
3.2.9 Factorized Gaussian Encoder  81 (1)
3.2.10 Full Gaussian Encoder  82 (1)
3.2.11 Algorithms for Computing ELBO  82 (1)
3.2.12 Improve the Lower Bound  83 (1)
3.2.12.1 Importance Weighted Autoencoder  83 (1)
3.2.12.2 Connection between ELBO and KL Distance  83 (1)
3.3 Other Types of Variational Autoencoder  84 (13)
3.3.1 Convolutional Variational Autoencoder  84 (1)
…  84 (1)
…  85 (1)
…  85 (1)
3.3.2 Graph Convolutional Variational Autoencoder  85 (1)
3.3.2.1 Notation and Basic Concepts for Graph Autoencoder  86 (1)
3.3.2.2 Spectral-Based Convolutional Graph Neural Networks  86 (5)
3.3.2.3 Graph Convolutional Encoder  91 (1)
3.3.2.4 Graph Convolutional Decoder  92 (1)
…  92 (1)
3.3.2.6 A Typical Approach to Variational Graph Autoencoders  92 (2)
3.3.2.7 Directed Graph Variational Autoencoder  94 (2)
3.3.2.8 Graph VAE for Clustering  96 (1)
…  97 (1)
…  97 (1)
Appendix 3B Derivation of Algorithms for Variational Graph Autoencoders  97 (5)
3B1 Evidence Lower Bound  97 (1)
3B2 The Reparameterization Trick  98 (1)
3B3 Stochastic Gradient Variational Bayes (SGVB) Estimator  99 (1)
3B4 Neural Network Implementation  100 (2)
Appendix 3C Matrix Normal Distribution  102 (7)
3C1 Notations and Definitions  102 (2)
3C2 Properties of Matrix Normal Distribution  104 (2)
…  106 (3)
|
Chapter 4 Generative Adversarial Networks  109 (42)
…  109 (1)
4.2 Generative Adversarial Networks  109 (8)
4.2.1 Framework and Architecture of GAN  109 (1)
…  110 (1)
…  111 (1)
…  112 (1)
…  113 (1)
4.2.5.1 Different Distances  113 (1)
4.2.5.2 The Kantorovich-Rubinstein Duality  114 (2)
…  116 (1)
…  117 (21)
…  117 (1)
…  117 (1)
…  118 (1)
4.3.2 Adversarial Autoencoder and Bidirectional GAN  119 (1)
4.3.2.1 Adversarial Autoencoder (AAE)  119 (1)
4.3.2.2 Bidirectional GAN  119 (1)
4.3.2.3 Anomaly Detection by BiGAN  120 (1)
4.3.3 Graph Representation in GAN  121 (1)
4.3.3.1 Adversarially Regularized Graph Autoencoder  121 (5)
4.3.3.2 Cycle-Consistent Adversarial Networks  126 (1)
4.3.3.3 Conditional Variational Autoencoder and Conditional Generative Adversarial Networks  127 (4)
4.3.3.4 Integrated Conditional Graph Variational Adversarial Networks  131 (3)
4.3.4 Deep Convolutional Generative Adversarial Network  134 (1)
4.3.4.1 Architecture of DCGAN  134 (1)
4.3.4.2 Generator Network  135 (1)
4.3.4.3 Discriminator Network  136 (1)
…  136 (2)
4.4 Generative Implicit Networks for Causal Inference with Measured and Unmeasured Confounders  138 (13)
4.4.1 Generative Implicit Models  138 (1)
…  139 (1)
…  139 (1)
4.4.2.2 Loss Function for the Generative Implicit Models  140 (1)
4.4.3 Divergence Minimization  141 (4)
4.4.4 Lower Bound of the f-Divergence  145 (1)
4.4.4.1 Tighten Lower Bound of the f-Divergence  145 (1)
4.4.5 Representation for the Variational Function  146 (1)
4.4.6 Single-Step Gradient Method for Variational Divergence Minimization (VDM)  147 (1)
4.4.7 Random Vector Functional Link Network for Pearson χ² Divergence  147 (2)
…  149 (1)
…  149 (2)
|
Chapter 5 Deep Learning for Causal Inference  151 (58)
5.1 Functional Additive Models for Causal Inference  151 (11)
5.1.1 Correlation, Causation, and Do-Calculus  151 (1)
5.1.2 The Rules of Do-Calculus  152 (3)
5.1.3 Structural Equation Models and Additive Noise Models for Two Variables or Two Sets of Variables  155 (2)
5.1.4 VAE and ANMs for Causal Analysis  157 (1)
5.1.4.1 Evidence Lower Bound (ELBO) for ANM  157 (1)
5.1.4.2 Computation of the ELBO  158 (2)
5.1.5 Classifier Two-Sample Test for Causation  160 (1)
5.1.5.1 Procedures of the VCTEST (Figure 5.5)  161 (1)
5.2 Learning Structural Causal Models with Graph Neural Networks  162 (13)
5.2.1 A General Framework for Formulation of Causal Inference into Continuous Optimization  162 (1)
5.2.1.1 Score Function and New Acyclic Constraint  162 (2)
5.2.2 Parameter Estimation and Optimization  164 (1)
5.2.2.1 Transform the Equality-Constrained Optimization Problem into an Unconstrained Optimization Problem  164 (2)
5.2.2.2 Compact Representation for the Hessian Approximation Ek and Limited-Memory BFGS  166 (1)
5.2.3 VAE for Learning Structural Models and DAG among Observed Variables  167 (1)
5.2.3.1 Linear Structural Equation Model and Graph Neural Network Model  167 (1)
5.2.3.2 ELBO for Learning the Generative Model  167 (1)
5.2.3.3 Computation of ELBO  168 (2)
5.2.3.4 Optimization Formulation for Learning DAG  170 (2)
5.2.4 Loss Function and Acyclicity Constraint  172 (1)
5.2.4.1 OLS Loss Function  172 (1)
5.2.4.2 A New Characterization of Acyclicity  173 (2)
5.3 Latent Causal Structure  175 (4)
5.3.1 Latent Space and Latent Representation  175 (1)
5.3.2 Mapping Observed Variables to the Latent Space  175 (1)
…  176 (1)
5.3.2.2 Encoder and Decoder for Latent Causal Graph  176 (1)
5.3.3 ELBO for the Log-Likelihood log pθ(Y|X)  177 (1)
5.3.4 Computation of ELBO  178 (1)
…  178 (1)
…  178 (1)
5.3.4.3 Learning Latent Causal Graph  179 (1)
5.3.5 Optimization for Learning the Latent DAG  179 (1)
5.4 Causal Mediation Analysis  179 (4)
5.4.1 Basics of Mediation Analysis  180 (1)
5.4.1.1 Univariate Mediation Model  180 (1)
5.4.1.2 Multivariate Mediation Analysis  180 (1)
5.4.1.3 Cascade Unobserved Mediator Model  181 (1)
5.4.1.4 Unobserved Multivariate Mediation Model  181 (1)
5.4.2 VAE for Cascade Unobserved Mediator Model  181 (1)
5.4.2.1 ELBO for Cascade Mediator Model  181 (1)
5.4.2.2 Encoder and Decoder  182 (1)
…  183 (1)
…  183 (4)
5.5.1 Deep Latent Variable Models for Causal Inference under Unobserved Confounders  183 (1)
5.5.2 Treatment Effect Formulation for Causal Inference with Unobserved Confounder  184 (1)
…  184 (1)
…  185 (1)
…  185 (2)
5.6 Instrumental Variable Models  187 (7)
5.6.1 Simple Linear IV Regression and Mendelian Randomization  187 (2)
5.6.1.1 Two-Stage Least Squares Method  189 (1)
5.6.1.2 Assumptions of IV  190 (1)
5.6.2 IV and Deep Latent Variable Models  190 (1)
…  190 (2)
…  192 (1)
…  192 (1)
…  193 (1)
Appendix 5A Derive Evidence Lower Bound (ELBO) for ANM  194 (1)
Appendix 5B Approximation of Evidence Lower Bound (ELBO) for ANM  195 (1)
Appendix 5C Computation of KL Distance  195 (1)
Appendix 5D BFGS and Limited-Memory BFGS Updating Algorithm  196 (5)
Appendix 5E Nonsmooth Optimization Analysis  201 (1)
Appendix 5F Computation of ELBO for Learning SEMs  202 (7)
5F1 Evidence Lower Bound  202 (1)
5F2 The Reparameterization Trick  203 (1)
5F3 Stochastic Gradient Variational Bayes (SGVB) Estimator  203 (1)
5F4 Neural Network Implementation  204 (3)
…  207 (2)
|
Chapter 6 Causal Inference in Time Series  209 (38)
…  209 (1)
6.2 Four Concepts of Causality for Multiple Time Series  209 (2)
…  209 (1)
…  210 (1)
6.2.3 Intervention Causality  210 (1)
6.2.4 Structural Causality  211 (1)
6.3 Statistical Methods for Granger Causality Inference in Time Series  211 (25)
6.3.1 Bivariate Granger Causality Test  211 (1)
6.3.1.1 Bivariate Linear Granger Causality Test  211 (1)
6.3.1.2 Bivariate Nonlinear Causality Test  212 (2)
6.3.2 Multivariate Granger Causality Test  214 (1)
6.3.2.1 Multivariate Linear Granger Causality Test  214 (2)
6.3.3 Nonstationary Time Series Granger Causal Analysis  216 (1)
…  216 (10)
6.3.3.2 Multivariate Nonlinear Causality Test for Nonstationary Time Series  226 (4)
6.3.4 Granger Causal Networks  230 (1)
…  230 (1)
6.3.4.2 Architecture of Granger Causal Networks  230 (1)
6.3.4.3 Component-Wise Multilayer Perceptron (cMLP) for Inferring Granger Causal Networks  231 (1)
6.3.4.4 Component-Wise Recurrent Neural Networks (cRNNs) for Inferring Granger Causal Networks  232 (1)
6.3.4.5 Statistical Recurrent Units for Inferring Granger Causal Networks  233 (3)
6.4 Nonlinear Structural Equation Models for Causal Inference on Multivariate Time Series  236 (2)
…  238 (1)
Appendix 6A Test Statistic Tnng Asymptotically Follows a Normal Distribution  238 (2)
Appendix 6B HSIC-Based Tests for Independence between Two Stationary Multivariate Time Series  240 (7)
6B1 Reproducing Kernel Hilbert Space  240 (3)
…  243 (1)
6B3 Cross-Covariance Operator  244 (1)
6B4 The Hilbert-Schmidt Independence Criterion  245 (1)
…  246 (1)
|
Chapter 7 Deep Learning for Counterfactual Inference and Treatment Effect Estimation  247 (46)
…  247 (9)
7.1.1 Potential Outcome Framework and Counterfactual Causal Inference  247 (1)
7.1.2 Assumptions and Average Treatment Effect  248 (3)
7.1.3 Traditional Methods without Unobserved Confounders  251 (1)
7.1.3.1 Regression Adjustment  251 (1)
7.1.3.2 Propensity Score Methods  251 (1)
7.1.3.3 Doubly Robust Estimation (DRE) and G-Methods  252 (3)
7.1.3.4 Targeted Maximum Likelihood Estimator (TMLE)  255 (1)
7.2 Combine Deep Learning with Classical Treatment Effect Estimation Methods  256 (2)
7.2.1 Adaptive Learning for Treatment Effect Estimation  256 (1)
7.2.1.1 Problem Formulation  256 (1)
7.2.2 Architecture of Neural Networks  256 (1)
7.2.3 Targeted Regularization  257 (1)
7.3 Counterfactual Variational Autoencoder  258 (3)
…  258 (1)
7.3.2 Variational Autoencoders  259 (1)
…  259 (1)
…  259 (1)
7.3.3 Architecture of CFVAE  259 (1)
…  260 (1)
…  260 (1)
…  260 (1)
7.3.4.3 Computation of the KL Distance  260 (1)
7.3.4.4 Calculation of ELBO  261 (1)
7.4 Variational Autoencoder for Survival Analysis  261 (8)
…  261 (1)
7.4.2 Notations and Problem Formulation  262 (1)
7.4.3 Classical Survival Analysis Theory  262 (1)
7.4.4 Potential Outcome (Survival Time) and Censoring Time Distributions  263 (1)
7.4.5 VAE Causal Survival Analysis  264 (1)
7.4.5.1 Deep Latent Model  264 (1)
…  264 (1)
…  265 (1)
…  265 (1)
7.4.5.5 Computation of the KL Distance  265 (1)
7.4.5.6 Calculation of ELBO  266 (1)
…  266 (1)
7.4.6 VAE-Cox Model for Survival Analysis  267 (1)
…  267 (1)
7.4.6.2 Likelihood Estimation for the Cox Model  267 (1)
7.4.6.3 A Censored-Data Likelihood  268 (1)
7.4.6.4 Objective Function for VAE-Cox Model  269 (1)
7.5 Time Series Causal Survival Analysis  269 (3)
…  269 (1)
7.5.2 Multi-State Survival Models  269 (1)
7.5.2.1 Notations and Basic Concepts  269 (1)
7.5.3 Multi-State Survival Models  270 (1)
7.5.3.1 Transition Probabilities, the Kolmogorov Forward Equations and Likelihood Function  270 (1)
7.5.3.2 Likelihood Function with Interval Censoring  271 (1)
7.5.3.3 Neural Ordinary Differential Equations (NODE) for Multi-State Survival Models  271 (1)
7.6 Neural Ordinary Differential Equation Approach to Treatment Effect Estimation and Intervention Analysis  272 (6)
…  272 (1)
7.6.2 Latent NODE for Irregularly-Sampled Time Series  273 (1)
7.6.3 Augmented Counterfactual ODE for Effect Estimation of Time Series Interventions with Confounders  274 (1)
7.6.3.1 Potential Outcome Framework for Estimation of Effect of Time Series Interventions  275 (1)
7.6.3.2 Augmented Counterfactual Ordinary Differential Equations  275 (3)
7.7 Generative Adversarial Networks for Counterfactual and Treatment Effect Estimation  278 (9)
7.7.1 A General GAN Model for Estimation of ITE with Discrete Outcome and Any Type of Treatment  279 (1)
7.7.1.1 Potential Outcome Framework  279 (1)
7.7.1.2 Conditional GAN as a General Framework for Estimation of ITE  280 (2)
7.7.2 Adversarial Variational Autoencoder-Generative Adversarial Network (AVAE-GAN) for Estimation in the Presence of Unmeasured Confounders  282 (1)
7.7.2.1 Architecture of AVAE-GAN  283 (1)
7.7.2.2 VAE with Disentangled Latent Factors  283 (4)
…  287 (1)
Appendix 7A Derive Evidence Lower Bound  287 (1)
Appendix 7B Derivation of Kolmogorov Forward Equations  287 (1)
Appendix 7C Inverse Relationship of the Kolmogorov Backward Equation  288 (1)
Appendix 7D Introduction to Pontryagin's Maximum Principle  289 (1)
Appendix 7E Algorithm for ITE Block Optimization  290 (1)
Appendix 7F Algorithms for Implementing Stochastic Gradient Descent  291 (2)
…  291 (2)
|
Chapter 8 Reinforcement Learning and Causal Inference  293 (56)
…  293 (1)
8.2 Basic Reinforcement Learning Theory  293 (15)
8.2.1 Formalization of the Problem  293 (1)
8.2.1.1 Markov Decision Process and Notation  293 (1)
8.2.1.2 State-Value Function and Policy  294 (3)
8.2.1.3 Optimal Value Functions and Policies  297 (1)
8.2.1.4 Bellman Optimality Equation  298 (2)
8.2.2 Dynamic Programming  300 (1)
8.2.2.1 Policy Evaluation  300 (3)
8.2.2.2 Value Function and Policy Improvement  303 (2)
…  305 (1)
8.2.2.4 Monte Carlo Policy Evaluation  306 (1)
8.2.2.5 Temporal-Difference Learning  307 (1)
8.2.2.6 Comparisons: Dynamic Programming, Monte Carlo Methods, and Temporal Difference Methods  308 (1)
8.3 Approximate Function and Approximate Dynamic Programming  308 (6)
…  308 (1)
8.3.2 Linear Function Approximation  309 (1)
8.3.3 Neural Network Approximation  310 (2)
8.3.4 Value-Based Methods  312 (1)
…  312 (1)
…  313 (1)
8.4 Policy Gradient Methods  314 (10)
…  314 (1)
8.4.2 Policy Approximation  314 (3)
8.4.3 REINFORCE: Monte Carlo Policy Gradient  317 (1)
8.4.4 REINFORCE with Baseline  317 (1)
8.4.5 Actor-Critic Methods  318 (1)
8.4.6 n-Step Temporal Difference (TD)  319 (1)
8.4.6.1 n-Step Prediction  319 (1)
…  320 (2)
8.4.8 Sarsa and Sarsa(λ)  322 (1)
…  323 (1)
8.4.10 Actor-Critic and Eligibility Trace  324 (1)
8.5 Causal Inference and Reinforcement Learning  324 (10)
8.5.1 Deconfounding Reinforcement Learning  325 (1)
8.5.1.1 Adjust for Measured Confounders  325 (1)
8.5.1.2 Proxy Variable Approximation to Unobserved Confounding  326 (1)
8.5.1.3 Deep Latent Model for Identifying the Proxy Variables of Confounders  326 (1)
8.5.1.4 Reward and Causal Effect Estimation  327 (1)
8.5.1.5 Variational Autoencoder for Reinforcement Learning  327 (1)
…  328 (1)
…  329 (1)
8.5.1.8 Deconfounding Causal Effect Estimation and Actor-Critic Methods  330 (1)
8.5.2 Counterfactuals and Reinforcement Learning  330 (1)
8.5.2.1 Structural Causal Model for Counterfactual Inference  330 (1)
8.5.2.2 Bidirectional Conditional GAN (BiCoGAN) for Estimation of Causal Mechanism  331 (2)
8.5.2.3 Dueling Double-Deep Q-Networks and Augmented Counterfactual Data for Reinforcement Learning  333 (1)
8.6 Reinforcement Learning for Inferring Causal Networks  334 (11)
…  334 (1)
8.6.2 Mathematical Formulation of Inferring Causal Networks Using Bidirectional Conditional GAN  334 (2)
8.6.3 Framework of Reinforcement Learning for Combinatorial Optimization  336 (1)
8.6.4 Graph Encoder and Decoder  337 (1)
8.6.4.1 Mathematical Formulation of Graph Embedding  337 (1)
…  337 (1)
8.6.4.3 Shallow Embedding Approaches  338 (2)
8.6.4.4 Attention and Transformer for Combinatorial Optimization and Construction of Directed Acyclic Graph  340 (5)
…  345 (1)
Appendix 8A Bidirectional RNN for Encoding  345 (1)
Appendix 8B Calculation of KL Divergence  345 (4)
…  347 (2)
References  349 (14)
Index  363