PREFACE

PART I

1 WHAT IS KNOWLEDGE DISCOVERY?
1.1 Machine Learning
1.2 Structure of the Universe X
1.3 Inductive Learning
1.4 Model Representations
Exercises
Bibliographic Notes
|
2 KNOWLEDGE DISCOVERY ENVIRONMENTS
2.1 Computational Aspects of Knowledge Discovery
2.1.1 Data Access
2.1.2 Visualization
2.1.3 Data Manipulation
2.1.4 Model Building and Evaluation
2.1.5 Model Deployment
2.2 Other Tool Sets
Exercises
Bibliographic Notes
|
3 DESCRIBING DATA MATHEMATICALLY
3.1 From Data Sets to Vector Spaces
3.1.1 Vectors
3.1.2 Vector Spaces
3.2 The Dot Product as a Similarity Score
3.3 Lines, Planes, and Hyperplanes
Exercises
Bibliographic Notes
|
4 LINEAR DECISION SURFACES AND FUNCTIONS
4.1 From Data Sets to Decision Functions
4.1.1 Linear Decision Surfaces Through the Origin
4.1.2 Decision Surfaces with an Offset Term
4.2 Simple Learning Algorithm
4.3 Discussion
Exercises
Bibliographic Notes
|
|
5 PERCEPTRON LEARNING
5.1 Perceptron Architecture and Training
5.2 Duality
5.3 Discussion
Exercises
Bibliographic Notes
|
6 MAXIMUM-MARGIN CLASSIFIERS
6.1 Optimization Problems
6.2 Maximum Margins
6.3 Optimizing the Margin
6.4 Quadratic Programming
6.5 Discussion
Exercises
Bibliographic Notes

PART II
|
7 SUPPORT VECTOR MACHINES
7.1 The Lagrangian Dual
7.2 Dual Maximum-Margin Optimization
7.2.1 The Dual Decision Function
7.3 Linear Support Vector Machines
7.4 Nonlinear Support Vector Machines
7.4.1 The Kernel Trick
7.4.2 Feature Search
7.4.3 A Closer Look at Kernels
7.5 Soft-Margin Classifiers
7.5.1 The Dual Setting for Soft-Margin Classifiers
7.6 Tool Support
7.6.1 WEKA
7.6.2 R
7.7 Discussion
Exercises
Bibliographic Notes
|
|
8 IMPLEMENTATION
8.1 Gradient Ascent
8.1.1 The Kernel-Adatron Algorithm
8.2 Quadratic Programming
8.2.1 Chunking
8.3 Sequential Minimal Optimization
8.4 Discussion
Exercises
Bibliographic Notes
|
9 EVALUATING WHAT HAS BEEN LEARNED
9.1 Performance Metrics
9.1.1 The Confusion Matrix
9.2 Model Evaluation
9.2.1 The Hold-Out Method
9.2.2 The Leave-One-Out Method
9.2.3 N-Fold Cross-Validation
9.3 Error Confidence Intervals
9.3.1 Comparison of Models
9.4 Model Evaluation in Practice
9.4.1 WEKA
9.4.2 R
Exercises
Bibliographic Notes
|
10 ELEMENTS OF STATISTICAL LEARNING THEORY
10.1 The VC-Dimension and Model Complexity
10.2 A Theoretical Setting for Machine Learning
10.3 Empirical Risk Minimization
10.4 VC-Confidence
10.5 Structural Risk Minimization
10.6 Discussion
Exercises
Bibliographic Notes

PART III
|
11 MULTICLASS CLASSIFICATION
11.1 One-Versus-the-Rest Classification
11.2 Pairwise Classification
11.3 Discussion
Exercises
Bibliographic Notes
|
12 REGRESSION WITH SUPPORT VECTOR MACHINES
12.1 Regression as Machine Learning
12.2 Simple and Multiple Linear Regression
12.3 Regression with Maximum-Margin Machines
12.4 Regression with Support Vector Machines
12.5 Model Evaluation
12.6 Tool Support
12.6.1 WEKA
12.6.2 R
Exercises
Bibliographic Notes
|
|
13 NOVELTY DETECTION
13.1 Maximum-Margin Machines
13.2 The Dual Setting
13.3 Novelty Detection in R
Exercises
Bibliographic Notes

APPENDIX A NOTATION

APPENDIX B TUTORIAL INTRODUCTION TO R
B.1 Programming Constructs
B.2 Data Constructs
B.3 Basic Data Analysis
Bibliographic Notes

REFERENCES

INDEX