Series Foreword   xv
Preface   xvii
|
|
1  Introduction   1
   1.1  The Learning Problem and (Statistical) Inference   1
        1.1.1  Supervised Learning   3
        1.1.2  Unsupervised Learning   6
        1.1.3  Reinforcement Learning   7
   1.2  Learning Kernel Classifiers   8
   1.3  The Purposes of Learning Theory   11
I  LEARNING ALGORITHMS

2  Kernel Classifiers from a Machine Learning Perspective   17
   2.1  The Basic Setting   17
   2.2  Learning by Risk Minimization   24
        2.2.1  The (Primal) Perceptron Algorithm   26
        2.2.2  Regularized Risk Functionals   27
   2.3  Kernels and Linear Classifiers   30
        2.3.1  The Kernel Technique   33
        2.3.2  Kernel Families   36
        2.3.3  The Representer Theorem   47
   2.4  Support Vector Classification Learning   49
        2.4.1  Maximizing the Margin   49
        2.4.2  Soft Margins---Learning with Training Error   53
        2.4.3  Geometrical Viewpoints on Margin Maximization   56
        2.4.4  The ν-Trick and Other Variants   58
   2.5  Adaptive Margin Machines   61
        2.5.1  Assessment of Learning Algorithms   61
        2.5.2  Leave-One-Out Machines   63
        2.5.3  Pitfalls of Minimizing a Leave-One-Out Bound   64
        2.5.4  Adaptive Margin Machines   66
   2.6  Bibliographical Remarks   68
|
3  Kernel Classifiers from a Bayesian Perspective   73
   3.1  The Bayesian Framework   73
        3.1.1  The Power of Conditioning on Data   79
   3.2  Gaussian Processes   81
        3.2.1  Bayesian Linear Regression   82
        3.2.2  From Regression to Classification   87
   3.3  The Relevance Vector Machine   92
   3.4  Bayes Point Machines   97
        3.4.1  Estimating the Bayes Point   100
   3.5  Fisher Discriminants   103
   3.6  Bibliographical Remarks   110
II  LEARNING THEORY

4  Mathematical Models of Learning   115
   4.1  Generative vs. Discriminative Models   116
   4.2  PAC and VC Frameworks   121
        4.2.1  Classical PAC and VC Analysis   123
        4.2.2  Growth Function and VC Dimension   127
        4.2.3  Structural Risk Minimization   131
   4.3  The Luckiness Framework   134
   4.4  PAC and VC Frameworks for Real-Valued Classifiers   140
        4.4.1  VC Dimensions for Real-Valued Function Classes   146
        4.4.2  The PAC Margin Bound   150
        4.4.3  Robust Margin Bounds   151
   4.5  Bibliographical Remarks   158
|
5  Bounds for Specific Algorithms   163
   5.1  The PAC-Bayesian Framework   164
        5.1.1  PAC-Bayesian Bounds for Bayesian Algorithms   164
        5.1.2  A PAC-Bayesian Margin Bound   172
   5.2  Compression Schemes   175
        5.2.1  Compression Schemes and Generalization Error   176
        5.2.2  On-line Learning and Compression Schemes   182
   5.3  Algorithmic Stability Bounds   185
        5.3.1  Algorithmic Stability for Regression   185
        5.3.2  Algorithmic Stability for Classification   190
   5.4  Bibliographical Remarks   193
III  APPENDICES

A  Theoretical Background and Basic Inequalities   199
   A.1  Notation   199
   A.2  Probability Theory   200
        A.2.1  Some Results for Random Variables   203
        A.2.2  Families of Probability Measures   207
   A.3  Functional Analysis and Linear Algebra   215
        A.3.1  Covering, Packing and Entropy Numbers   220
        A.3.2  Matrix Algebra   222
   A.4  Ill-Posed Problems   239
   A.5  Basic Inequalities   240
        A.5.1  General (In)equalities   240
        A.5.2  Large Deviation Bounds   243
|
B  Proofs and Derivations---Part I   253
   B.1  Functions of Kernels   253
   B.2  Efficient Computation of String Kernels   254
        B.2.1  Efficient Computation of the Substring Kernel   255
        B.2.2  Efficient Computation of the Subsequence Kernel   255
   B.3  Representer Theorem   257
   B.4  Convergence of the Perceptron   258
   B.5  Convex Optimization Problems of Support Vector Machines   259
        B.5.1  Hard Margin SVM   260
        B.5.2  Linear Soft Margin Loss SVM   260
        B.5.3  Quadratic Soft Margin Loss SVM   261
        B.5.4  ν-Linear Margin Loss SVM   262
   B.6  Leave-One-Out Bound for Kernel Classifiers   263
   B.7  Laplace Approximation for Gaussian Processes   265
        B.7.1  Maximization of f_{T_{m+1}|X=x,X^m=x}   266
        B.7.2  Computation of Σ   268
        B.7.3  Stabilized Gaussian Process Classification   269
   B.8  Relevance Vector Machines   271
        B.8.1  Derivative of the Evidence w.r.t. θ   271
        B.8.2  Derivative of the Evidence w.r.t. σ   273
        B.8.3  Update Algorithms for Maximizing the Evidence   274
        B.8.4  Computing the Log-Evidence   275
        B.8.5  Maximization of f_{W|Z^m=z}   276
   B.9  A Derivation of the Operation ⊕_μ   277
   B.10  Fisher Linear Discriminant   278
|
C  Proofs and Derivations---Part II   281
   C.1  VC and PAC Generalization Error Bounds   281
        C.1.1  Basic Lemmas   281
        C.1.2  Proof of Theorem 4.7   284
   C.2  Bound on the Growth Function   287
   C.3  Luckiness Bound   289
   C.4  Empirical VC Dimension Luckiness   292
   C.5  Bound on the Fat Shattering Dimension   296
   C.6  Margin Distribution Bound   298
   C.7  The Quantifier Reversal Lemma   300
   C.8  A PAC-Bayesian Margin Bound   302
        C.8.1  Balls in Version Space   303
        C.8.2  Volume Ratio Theorem   306
        C.8.3  A Volume Ratio Bound   308
        C.8.4  Bollmann's Lemma   311
   C.9  Algorithmic Stability Bounds   314
        C.9.1  Uniform Stability of Functions Minimizing a Regularized Risk   315
        C.9.2  Algorithmic Stability Bounds   316
|
|
D  Pseudocodes   321
   D.1  Perceptron Algorithm   321
   D.2  Support Vector and Adaptive Margin Machines   323
        D.2.1  Standard Support Vector Machines   323
        D.2.2  ν-Support Vector Machines   324
        D.2.3  Adaptive Margin Machines   324
   D.3  Gaussian Processes   325
   D.4  Relevance Vector Machines   325
   D.5  Fisher Discriminants   329
   D.6  Bayes Point Machines   330
List of Symbols   331
References   339
Index   357