Foreword  xi
Preface  xiii
|
|
|
1 Introduction and Problem Formulation  3
  1.1 Machine Learning under Covariate Shift  3
  1.2 Quick Tour of Covariate Shift Adaptation  5
  1.3 Problem Formulation  7
    1.3.1 Function Learning from Examples  7
    1.3.2 Loss Functions  8
    1.3.3 Generalization Error  9
    1.3.4 Covariate Shift  9
    1.3.5 Models for Function Learning  10
    1.3.6 Specification of Models  13
  1.4 Structure of This Book  14
    1.4.1 Part II: Learning under Covariate Shift  14
    1.4.2 Part III: Learning Causing Covariate Shift  17
|
II Learning under Covariate Shift

2 Function Approximation  21
|
  2.1 Importance-Weighting Techniques for Covariate Shift Adaptation  22
    2.1.1 Importance-Weighted ERM  22
    2.1.2 Adaptive Importance-Weighted ERM  23
    2.1.3 Regularized Importance-Weighted ERM  23
  2.2 Examples of Importance-Weighted Regression Methods  25
    2.2.1 Squared Loss: Least-Squares Regression  26
    2.2.2 Absolute Loss: Least-Absolute Regression  30
    2.2.3 Huber Loss: Huber Regression  31
    2.2.4 Deadzone-Linear Loss: Support Vector Regression  33
  2.3 Examples of Importance-Weighted Classification Methods  35
    2.3.1 Squared Loss: Fisher Discriminant Analysis  36
    2.3.2 Logistic Loss: Logistic Regression Classifier  38
    2.3.3 Hinge Loss: Support Vector Machine  39
    2.3.4 Exponential Loss: Boosting  40
  2.4 Numerical Examples  40
    2.4.1 Regression  40
    2.4.2 Classification  41
  2.5 Summary and Discussion  45
|
|
3 Model Selection  47
  3.1 Importance-Weighted Akaike Information Criterion  47
  3.2 Importance-Weighted Subspace Information Criterion  50
    3.2.1 Input Dependence vs. Input Independence in Generalization Error Analysis  51
    3.2.2 Approximately Correct Models  53
    3.2.3 Input-Dependent Analysis of Generalization Error  54
  3.3 Importance-Weighted Cross-Validation  64
  3.4 Numerical Examples  66
    3.4.1 Regression  66
    3.4.2 Classification  69
  3.5 Summary and Discussion  70
|
|
4 Importance Estimation  73
  4.1 Kernel Density Estimation  73
  4.2 Kernel Mean Matching  75
  4.3 Logistic Regression  76
  4.4 Kullback-Leibler Importance Estimation Procedure  78
    4.4.1 Algorithm  78
    4.4.2 Model Selection by Cross-Validation  81
    4.4.3 Basis Function Design  82
  4.5 Least-Squares Importance Fitting  83
    4.5.1 Algorithm  83
    4.5.2 Basis Function Design and Model Selection  84
    4.5.3 Regularization Path Tracking  85
  4.6 Unconstrained Least-Squares Importance Fitting  87
    4.6.1 Algorithm  87
    4.6.2 Analytic Computation of Leave-One-Out Cross-Validation  88
  4.7 Numerical Examples  88
    4.7.1 Setting  90
    4.7.2 Importance Estimation by KLIEP  90
    4.7.3 Covariate Shift Adaptation by IWLS and IWCV  92
  4.8 Experimental Comparison  94
  4.9 Summary and Discussion  101
|
5 Direct Density-Ratio Estimation with Dimensionality Reduction  103
  5.1 Density Difference in Hetero-Distributional Subspace  103
  5.2 Characterization of Hetero-Distributional Subspace  104
  5.3 Identifying Hetero-Distributional Subspace  106
|
|
    5.3.1 Basic Formulation  106
|
    5.3.2 Fisher Discriminant Analysis  108
    5.3.3 Local Fisher Discriminant Analysis  109
  5.4 Using LFDA for Finding Hetero-Distributional Subspace  112
  5.5 Density-Ratio Estimation in the Hetero-Distributional Subspace  113
|
|
  5.6 Numerical Examples  113
|
    5.6.1 Illustrative Example  113
    5.6.2 Performance Comparison Using Artificial Data Sets  117
|
|
  5.7 Summary and Discussion  121
|
6 Relation to Sample Selection Bias  125
  6.1 Heckman's Sample Selection Model  125
  6.2 Distributional Change and Sample Selection Bias  129
  6.3 The Two-Step Algorithm  131
  6.4 Relation to Covariate Shift Approach  134
|
7 Applications of Covariate Shift Adaptation  137
  7.1 Brain-Computer Interface  137
    7.1.1 Background  137
    7.1.2 Experimental Setup  138
    7.1.3 Experimental Results  140
  7.2 Speaker Identification  142
    7.2.1 Background  142
    7.2.2 Experimental Setup  142
    7.2.3 Experimental Results  144
  7.3 Natural Language Processing  149
|
|
    7.3.1 Background  149
|
    7.3.2 Experimental Results  151
  7.4 Perceived Age Prediction from Face Images  152
    7.4.1 Background  152
|
|
    7.4.2 Formulation  153
|
    7.4.3 Incorporating Characteristics of Human Age Perception  153
    7.4.4 Experimental Results  155
  7.5 Human Activity Recognition from Accelerometric Data  157
    7.5.1 Background  157
    7.5.2 Importance-Weighted Least-Squares Probabilistic Classifier  157
    7.5.3 Experimental Results  160
  7.6 Sample Reuse in Reinforcement Learning  165
    7.6.1 Markov Decision Problems  165
|
|
    7.6.2 Policy Iteration  166
|
    7.6.3 Value Function Approximation  167
    7.6.4 Sample Reuse by Covariate Shift Adaptation  168
    7.6.5 On-Policy vs. Off-Policy  169
    7.6.6 Importance Weighting in Value Function Approximation  170
    7.6.7 Automatic Selection of the Flattening Parameter  174
    7.6.8 Sample Reuse Policy Iteration  175
    7.6.9 Robot Control Experiments  176
|
III Learning Causing Covariate Shift

8 Active Learning  183
  8.1 Preliminaries  183
    8.1.1 Setup  183
|
    8.1.2 Decomposition of Generalization Error  185
    8.1.3 Basic Strategy of Active Learning  188
  8.2 Population-Based Active Learning Methods  188
    8.2.1 Classical Method of Active Learning for Correct Models  189
    8.2.2 Limitations of Classical Approach and Countermeasures  190
    8.2.3 Input-Independent Variance-Only Method  191
    8.2.4 Input-Dependent Variance-Only Method  193
    8.2.5 Input-Independent Bias-and-Variance Approach  195
  8.3 Numerical Examples of Population-Based Active Learning Methods  198
|
|
    8.3.1 Setup  198
|
    8.3.2 Accuracy of Generalization Error Estimation  200
    8.3.3 Obtained Generalization Error  202
  8.4 Pool-Based Active Learning Methods  204
    8.4.1 Classical Active Learning Method for Correct Models and Its Limitations  204
    8.4.2 Input-Independent Variance-Only Method  205
    8.4.3 Input-Dependent Variance-Only Method  206
    8.4.4 Input-Independent Bias-and-Variance Approach  207
  8.5 Numerical Examples of Pool-Based Active Learning Methods  209
  8.6 Summary and Discussion  212

9 Active Learning with Model Selection  215
  9.1 Direct Approach and the Active Learning/Model Selection Dilemma  215
|
|
  9.2 Sequential Approach  216
  9.3 Batch Approach  218
|
  9.4 Ensemble Active Learning  219
|
|
  9.5 Numerical Examples  220
    9.5.1 Setting  220
|
    9.5.2 Analysis of Batch Approach  221
    9.5.3 Analysis of Sequential Approach  222
    9.5.4 Comparison of Obtained Generalization Error  222
  9.6 Summary and Discussion  223

10 Applications of Active Learning  225
  10.1 Design of Efficient Exploration Strategies in Reinforcement Learning  225
    10.1.1 Efficient Exploration with Active Learning  225
    10.1.2 Reinforcement Learning Revisited  226
    10.1.3 Decomposition of Generalization Error  228
    10.1.4 Estimating Generalization Error for Active Learning  229
    10.1.5 Designing Sampling Policies  230
    10.1.6 Active Learning in Policy Iteration  231
    10.1.7 Robot Control Experiments  232
  10.2 Wafer Alignment in Semiconductor Exposure Apparatus  234
|
|
|
11 Conclusions and Future Prospects  241
  11.1 Conclusions  241
  11.2 Future Prospects  242

Appendix: List of Symbols and Abbreviations  243
Bibliography  247
Index  259