Series Foreword   xiii
Preface   xv
1 A Tutorial Introduction   1
1.1 Data Representation and Similarity   1
1.2 A Simple Pattern Recognition Algorithm   4
1.3 Some Insights From Statistical Learning Theory   6
1.4 Hyperplane Classifiers   11
1.5 Support Vector Classification   15
1.6 Support Vector Regression   17
1.7 Kernel Principal Component Analysis   19
1.8 Empirical Results and Implementations   21
I CONCEPTS AND TOOLS   23

2 Kernels   25
2.1 Product Features   26
2.2 The Representation of Similarities in Linear Spaces   29
2.3 Examples and Properties of Kernels   45
2.4 The Representation of Dissimilarities in Linear Spaces   48
2.5 Summary   55
2.6 Problems   55
3 Risk and Loss Functions   61
3.1 Loss Functions   62
3.2 Test Error and Expected Risk   65
3.3 A Statistical Perspective   68
3.4 Robust Estimators   75
3.5 Summary   83
3.6 Problems   84
4 Regularization   87
4.1 The Regularized Risk Functional   88
4.2 The Representer Theorem   89
4.3 Regularization Operators   92
4.4 Translation Invariant Kernels   96
4.5 Translation Invariant Kernels in Higher Dimensions   105
4.6 Dot Product Kernels   110
4.7 Multi-Output Regularization   113
4.8 Semiparametric Regularization   115
4.9 Coefficient Based Regularization   118
4.10 Summary   121
4.11 Problems   122
5 Elements of Statistical Learning Theory   125
5.1 Introduction   125
5.2 The Law of Large Numbers   128
5.3 When Does Learning Work: the Question of Consistency   131
5.4 Uniform Convergence and Consistency   131
5.5 How to Derive a VC Bound   134
5.6 A Model Selection Example   144
5.7 Summary   146
5.8 Problems   146
6 Optimization   149
6.1 Convex Optimization   150
6.2 Unconstrained Problems   154
6.3 Constrained Problems   165
6.4 Interior Point Methods   175
6.5 Maximum Search Problems   179
6.6 Summary   183
6.7 Problems   184
II SUPPORT VECTOR MACHINES   187

7 Pattern Recognition   189
7.1 Separating Hyperplanes   189
7.2 The Role of the Margin   192
7.3 Optimal Margin Hyperplanes   196
7.4 Nonlinear Support Vector Classifiers   200
7.5 Soft Margin Hyperplanes   204
7.6 Multi-Class Classification   211
7.7 Variations on a Theme   214
7.8 Experiments   215
7.9 Summary   222
7.10 Problems   222
8 Single-Class Problems: Quantile Estimation and Novelty Detection   227
8.1 Introduction   228
8.2 A Distribution's Support and Quantiles   229
8.3 Algorithms   230
8.4 Optimization   234
8.5 Theory   236
8.6 Discussion   241
8.7 Experiments   243
8.8 Summary   247
8.9 Problems   248
9 Regression Estimation   251
9.1 Linear Regression with Insensitive Loss Function   251
9.2 Dual Problems   254
9.3 ν-SV Regression   260
9.4 Convex Combinations and ℓ1-Norms   266
9.5 Parametric Insensitivity Models   269
9.6 Applications   272
9.7 Summary   273
9.8 Problems   274
10 Implementation   279
10.1 Tricks of the Trade   281
10.2 Sparse Greedy Matrix Approximation   288
10.3 Interior Point Algorithms   295
10.4 Subset Selection Methods   300
10.5 Sequential Minimal Optimization   305
10.6 Iterative Methods   312
10.7 Summary   327
10.8 Problems   329
11 Incorporating Invariances   333
11.1 Prior Knowledge   333
11.2 Transformation Invariance   335
11.3 The Virtual SV Method   337
11.4 Constructing Invariance Kernels   343
11.5 The Jittered SV Method   354
11.6 Summary   356
11.7 Problems   357
12 Learning Theory Revisited   359
12.1 Concentration of Measure Inequalities   360
12.2 Leave-One-Out Estimates   366
12.3 PAC-Bayesian Bounds   381
12.4 Operator-Theoretic Methods in Learning Theory   391
12.5 Summary   403
12.6 Problems   404
III KERNEL METHODS   405

13 Designing Kernels   407
13.1 Tricks for Constructing Kernels   408
13.2 String Kernels   412
13.3 Locality-Improved Kernels   414
13.4 Natural Kernels   418
13.5 Summary   423
13.6 Problems   423
14 Kernel Feature Extraction   427
14.1 Introduction   427
14.2 Kernel PCA   429
14.3 Kernel PCA Experiments   437
14.4 A Framework for Feature Extraction   442
14.5 Algorithms for Sparse KFA   447
14.6 KFA Experiments   450
14.7 Summary   451
14.8 Problems   452
15 Kernel Fisher Discriminant   457
15.1 Introduction   457
15.2 Fisher's Discriminant in Feature Space   458
15.3 Efficient Training of Kernel Fisher Discriminants   460
15.4 Probabilistic Outputs   464
15.5 Experiments   466
15.6 Summary   467
15.7 Problems   468
16 Bayesian Kernel Methods   469
16.1 Bayesics   470
16.2 Inference Methods   475
16.3 Gaussian Processes   480
16.4 Implementation of Gaussian Processes   488
16.5 Laplacian Processes   499
16.6 Relevance Vector Machines   506
16.7 Summary   511
16.8 Problems   513
17 Regularized Principal Manifolds   517
17.1 A Coding Framework   518
17.2 A Regularized Quantization Functional   522
17.3 An Algorithm for Minimizing Rreg[f]   526
17.4 Connections to Other Algorithms   529
17.5 Uniform Convergence Bounds   533
17.6 Experiments   537
17.7 Summary   539
17.8 Problems   540
18 Pre-Images and Reduced Set Methods   543
18.1 The Pre-Image Problem   544
18.2 Finding Approximate Pre-Images   547
18.3 Reduced Set Methods   552
18.4 Reduced Set Selection Methods   554
18.5 Reduced Set Construction Methods   561
18.6 Sequential Evaluation of Reduced Set Expansions   564
18.7 Summary   566
18.8 Problems   567
A Addenda   569
A.1 Data Sets   569
A.2 Proofs   572

B Mathematical Prerequisites   575
B.1 Probability   575
B.2 Linear Algebra   580
B.3 Functional Analysis   586

References   591
Index   617
Notation and Symbols   625