Series Foreword    xi
Preface    xiii
|
1 Introduction and Overview    1
1.1 Classification Problems and Machine Learning    2
1.2 Boosting    4
1.3 Resistance to Overfitting and the Margins Theory    14
1.4 Foundations and Algorithms    17
Summary    19
Bibliographic Notes    19
Exercises    20

I Core Analysis    21
|
2 Foundations of Machine Learning    23
2.1 A Direct Approach to Machine Learning    24
2.2 General Methods of Analysis    30
2.3 A Foundation for the Study of Boosting Algorithms    43
Summary    49
Bibliographic Notes    49
Exercises    50
|
3 Using AdaBoost to Minimize Training Error    53
3.1 A Bound on AdaBoost's Training Error    54
3.2 A Sufficient Condition for Weak Learnability    56
3.3 Relation to Chernoff Bounds    60
3.4 Using and Designing Base Learning Algorithms    62
Summary    70
Bibliographic Notes    71
Exercises    71
|
4 Direct Bounds on the Generalization Error    75
4.1 Using VC Theory to Bound the Generalization Error    75
4.2 Compression-Based Bounds    83
4.3 The Equivalence of Strong and Weak Learnability    86
Summary    88
Bibliographic Notes    89
Exercises    89
|
5 The Margins Explanation for Boosting's Effectiveness    93
5.1 Margin as a Measure of Confidence    94
5.2 A Margins-Based Analysis of the Generalization Error    97
5.3 Analysis Based on Rademacher Complexity    106
5.4 The Effect of Boosting on Margin Distributions    111
5.5 Bias, Variance, and Stability    117
5.6 Relation to Support-Vector Machines    122
5.7 Practical Applications of Margins    128
Summary    132
Bibliographic Notes    132
Exercises    134

II Fundamental Perspectives    139
|
6 Game Theory, Online Learning, and Boosting    141
6.1 Game Theory    142
6.2 Learning in Repeated Game Playing    145
6.3 Online Prediction    153
6.4 Boosting    157
6.5 Application to a "Mind-Reading" Game    163
Summary    169
Bibliographic Notes    169
Exercises    170
|
7 Loss Minimization and Generalizations of Boosting    175
7.1 AdaBoost's Loss Function    177
7.2 Coordinate Descent    179
7.3 Loss Minimization Cannot Explain Generalization    184
7.4 Functional Gradient Descent    188
7.5 Logistic Regression and Conditional Probabilities    194
7.6 Regularization    202
7.7 Applications to Data-Limited Learning    211
Summary    219
Bibliographic Notes    219
Exercises    220
|
8 Boosting, Convex Optimization, and Information Geometry    227
8.1 Iterative Projection Algorithms    228
8.2 Proving the Convergence of AdaBoost    243
8.3 Unification with Logistic Regression    252
8.4 Application to Species Distribution Modeling    255
Summary    260
Bibliographic Notes    262
Exercises    263

III Algorithmic Extensions    269
|
9 Using Confidence-Rated Weak Predictions    271
9.1 The Framework    273
9.2 General Methods for Algorithm Design    275
9.3 Learning Rule-Sets    287
9.4 Alternating Decision Trees    290
Summary    296
Bibliographic Notes    297
Exercises    297
|
10 Multiclass Classification Problems    303
10.1 A Direct Extension to the Multiclass Case    305
10.2 The One-against-All Reduction and Multi-label Classification    310
10.3 Application to Semantic Classification    316
10.4 General Reductions Using Output Codes    320
Summary    333
Bibliographic Notes    333
Exercises    334
|
11 Learning to Rank    341
11.1 A Formal Framework for Ranking Problems    342
11.2 A Boosting Algorithm for the Ranking Task    345
11.3 Methods for Improving Efficiency    351
11.4 Multiclass, Multi-label Classification    361
11.5 Applications    364
Summary    367
Bibliographic Notes    369
Exercises    369

IV Advanced Theory    375
|
12 Attaining the Best Possible Accuracy    377
12.1 Optimality in Classification and Risk Minimization    378
12.2 Approaching the Optimal Risk    382
12.3 How Minimizing Risk Can Lead to Poor Accuracy    398
Summary    406
Bibliographic Notes    406
Exercises    407
|
13 Optimally Efficient Boosting    415
13.1 The Boost-by-Majority Algorithm    416
13.2 Optimal Generalization Error    432
13.3 Relation to AdaBoost    448
Summary    453
Bibliographic Notes    453
Exercises    453
|
14 Boosting in Continuous Time    459
14.1 Adaptiveness in the Limit of Continuous Time    460
14.2 BrownBoost    468
14.3 AdaBoost as a Special Case of BrownBoost    476
14.4 Experiments with Noisy Data    483
Summary    485
Bibliographic Notes    486
Exercises    486

Appendix: Some Notation, Definitions, and Mathematical Background    491
A.1 General Notation    491
A.2 Norms    492
A.3 Maxima, Minima, Suprema, and Infima    493
A.4 Limits    493
A.5 Continuity, Closed Sets, and Compactness    494
A.6 Derivatives, Gradients, and Taylor's Theorem    495
A.7 Convexity    496
A.8 The Method of Lagrange Multipliers    497
A.9 Some Distributions and the Central Limit Theorem    498

Bibliography    501
Index of Algorithms, Figures, and Tables    511
Subject and Author Index    513