Series Foreword |
|
xvii | |
|
|
xix | |
|
|
xxix | |
|
|
Acknowledgments |
|
xxxiii | |
Notes for the Second Edition |
|
xxxv | |
Notations |
|
xxxix | |
|
|
1 | (20) |
|
What Is Machine Learning? |
|
|
1 | (3) |
|
Examples of Machine Learning Applications |
|
|
4 | (10) |
|
|
4 | (1) |
|
|
5 | (4) |
|
|
9 | (2) |
|
|
11 | (2) |
|
|
13 | (1) |
|
|
14 | (2) |
|
|
16 | (2) |
|
|
18 | (1) |
|
|
19 | (2) |
|
|
21 | (26) |
|
Learning a Class from Examples |
|
|
21 | (6) |
|
Vapnik-Chervonenkis (VC) Dimension |
|
|
27 | (2) |
|
Probably Approximately Correct (PAC) Learning |
|
|
29 | (1) |
|
|
30 | (2) |
|
Learning Multiple Classes |
|
|
32 | (2) |
|
|
34 | (3) |
|
Model Selection and Generalization |
|
|
37 | (4) |
|
Dimensions of a Supervised Machine Learning Algorithm |
|
|
41 | (1) |
|
|
42 | (1) |
|
|
43 | (1) |
|
|
44 | (3) |
|
|
47 | (14) |
|
|
47 | (2) |
|
|
49 | (2) |
|
|
51 | (2) |
|
|
53 | (1) |
|
|
54 | (1) |
|
|
55 | (3) |
|
|
58 | (1) |
|
|
58 | (1) |
|
|
59 | (2) |
|
|
61 | (26) |
|
|
61 | (1) |
|
Maximum Likelihood Estimation |
|
|
62 | (3) |
|
|
63 | (1) |
|
|
64 | (1) |
|
Gaussian (Normal) Density |
|
|
64 | (1) |
|
Evaluating an Estimator: Bias and Variance |
|
|
65 | (1) |
|
|
66 | (3) |
|
Parametric Classification |
|
|
69 | (4) |
|
|
73 | (3) |
|
Tuning Model Complexity: Bias/Variance Dilemma |
|
|
76 | (4) |
|
Model Selection Procedures |
|
|
80 | (4) |
|
|
84 | (1) |
|
|
84 | (1) |
|
|
85 | (2) |
|
|
87 | (22) |
|
|
87 | (1) |
|
|
88 | (1) |
|
Estimation of Missing Values |
|
|
89 | (1) |
|
Multivariate Normal Distribution |
|
|
90 | (4) |
|
Multivariate Classification |
|
|
94 | (5) |
|
|
99 | (3) |
|
|
102 | (1) |
|
|
103 | (2) |
|
|
105 | (1) |
|
|
106 | (1) |
|
|
107 | (2) |
|
|
109 | (34) |
|
|
109 | (1) |
|
|
110 | (3) |
|
Principal Components Analysis |
|
|
113 | (7) |
|
|
120 | (5) |
|
|
125 | (3) |
|
Linear Discriminant Analysis |
|
|
128 | (5) |
|
|
133 | (2) |
|
|
135 | (3) |
|
|
138 | (1) |
|
|
139 | (1) |
|
|
140 | (3) |
|
|
143 | (20) |
|
|
143 | (1) |
|
|
144 | (1) |
|
|
145 | (4) |
|
Expectation-Maximization Algorithm |
|
|
149 | (5) |
|
Mixtures of Latent Variable Models |
|
|
154 | (1) |
|
Supervised Learning after Clustering |
|
|
155 | (2) |
|
|
157 | (1) |
|
Choosing the Number of Clusters |
|
|
158 | (2) |
|
|
160 | (1) |
|
|
160 | (1) |
|
|
161 | (2) |
|
|
163 | (22) |
|
|
163 | (2) |
|
Nonparametric Density Estimation |
|
|
165 | (5) |
|
|
165 | (2) |
|
|
167 | (1) |
|
k-Nearest Neighbor Estimator |
|
|
168 | (2) |
|
Generalization to Multivariate Data |
|
|
170 | (1) |
|
Nonparametric Classification |
|
|
171 | (1) |
|
Condensed Nearest Neighbor |
|
|
172 | (2) |
|
Nonparametric Regression: Smoothing Models |
|
|
174 | (4) |
|
|
175 | (1) |
|
|
176 | (1) |
|
|
177 | (1) |
|
How to Choose the Smoothing Parameter |
|
|
178 | (2) |
|
|
180 | (1) |
|
|
181 | (1) |
|
|
182 | (3) |
|
|
185 | (24) |
|
|
185 | (2) |
|
|
187 | (7) |
|
|
188 | (4) |
|
|
192 | (2) |
|
|
194 | (3) |
|
Rule Extraction from Trees |
|
|
197 | (1) |
|
|
198 | (4) |
|
|
202 | (2) |
|
|
204 | (3) |
|
|
207 | (1) |
|
|
207 | (2) |
|
|
209 | (24) |
|
|
209 | (2) |
|
Generalizing the Linear Model |
|
|
211 | (1) |
|
Geometry of the Linear Discriminant |
|
|
212 | (4) |
|
|
212 | (2) |
|
|
214 | (2) |
|
|
216 | (1) |
|
Parametric Discrimination Revisited |
|
|
217 | (1) |
|
|
218 | (2) |
|
|
220 | (8) |
|
|
220 | (4) |
|
|
224 | (4) |
|
Discrimination by Regression |
|
|
228 | (2) |
|
|
230 | (1) |
|
|
230 | (1) |
|
|
231 | (2) |
|
|
233 | (46) |
|
|
233 | (4) |
|
|
234 | (1) |
|
Neural Networks as a Paradigm for Parallel Processing |
|
|
235 | (2) |
|
|
237 | (3) |
|
|
240 | (3) |
|
Learning Boolean Functions |
|
|
243 | (2) |
|
|
245 | (3) |
|
MLP as a Universal Approximator |
|
|
248 | (1) |
|
Backpropagation Algorithm |
|
|
249 | (7) |
|
|
250 | (2) |
|
|
252 | (2) |
|
Multiclass Discrimination |
|
|
254 | (2) |
|
|
256 | (1) |
|
|
256 | (7) |
|
|
256 | (1) |
|
|
257 | (1) |
|
|
258 | (3) |
|
|
261 | (2) |
|
|
263 | (3) |
|
Bayesian View of Learning |
|
|
266 | (1) |
|
|
267 | (3) |
|
|
270 | (1) |
|
Time Delay Neural Networks |
|
|
270 | (4) |
|
|
271 | (1) |
|
|
272 | (2) |
|
|
274 | (1) |
|
|
275 | (4) |
|
|
279 | (30) |
|
|
279 | (1) |
|
|
280 | (8) |
|
|
280 | (5) |
|
Adaptive Resonance Theory |
|
|
285 | (1) |
|
|
286 | (2) |
|
|
288 | (6) |
|
Incorporating Rule-Based Knowledge |
|
|
294 | (1) |
|
Normalized Basis Functions |
|
|
295 | (2) |
|
Competitive Basis Functions |
|
|
297 | (3) |
|
Learning Vector Quantization |
|
|
300 | (1) |
|
|
300 | (3) |
|
|
303 | (2) |
|
|
304 | (1) |
|
Hierarchical Mixture of Experts |
|
|
304 | (1) |
|
|
305 | (1) |
|
|
306 | (1) |
|
|
307 | (2) |
|
|
309 | (32) |
|
|
309 | (2) |
|
Optimal Separating Hyperplane |
|
|
311 | (4) |
|
The Nonseparable Case: Soft Margin Hyperplane |
|
|
315 | (3) |
|
|
318 | (1) |
|
|
319 | (2) |
|
|
321 | (3) |
|
|
324 | (1) |
|
|
325 | (2) |
|
Multiclass Kernel Machines |
|
|
327 | (1) |
|
Kernel Machines for Regression |
|
|
328 | (5) |
|
One-Class Kernel Machines |
|
|
333 | (2) |
|
Kernel Dimensionality Reduction |
|
|
335 | (2) |
|
|
337 | (1) |
|
|
338 | (1) |
|
|
339 | (2) |
|
|
341 | (22) |
|
|
341 | (2) |
|
Estimating the Parameter of a Distribution |
|
|
343 | (5) |
|
|
343 | (2) |
|
|
345 | (3) |
|
Bayesian Estimation of the Parameters of a Function |
|
|
348 | (8) |
|
|
348 | (4) |
|
The Use of Basis/Kernel Functions |
|
|
352 | (1) |
|
|
353 | (3) |
|
|
356 | (3) |
|
|
359 | (1) |
|
|
360 | (1) |
|
|
361 | (2) |
|
|
363 | (24) |
|
|
363 | (1) |
|
Discrete Markov Processes |
|
|
364 | (3) |
|
|
367 | (2) |
|
Three Basic Problems of HMMs |
|
|
369 | (1) |
|
|
369 | (4) |
|
Finding the State Sequence |
|
|
373 | (2) |
|
Learning Model Parameters |
|
|
375 | (3) |
|
|
378 | (1) |
|
|
379 | (1) |
|
|
380 | (2) |
|
|
382 | (1) |
|
|
383 | (1) |
|
|
384 | (3) |
|
|
387 | (32) |
|
|
387 | (2) |
|
Canonical Cases for Conditional Independence |
|
|
389 | (7) |
|
|
396 | (6) |
|
|
396 | (2) |
|
|
398 | (3) |
|
|
401 | (1) |
|
|
402 | (1) |
|
|
402 | (8) |
|
|
403 | (2) |
|
|
405 | (2) |
|
|
407 | (2) |
|
|
409 | (1) |
|
Undirected Graphs: Markov Random Fields |
|
|
410 | (3) |
|
Learning the Structure of a Graphical Model |
|
|
413 | (1) |
|
|
414 | (5) |
|
|
414 | (3) |
|
|
417 | (1) |
|
|
417 | (2) |
|
Combining Multiple Learners |
|
|
419 | (28) |
|
|
419 | (1) |
|
Generating Diverse Learners |
|
|
420 | (3) |
|
Model Combination Schemes |
|
|
423 | (1) |
|
|
424 | (3) |
|
Error-Correcting Output Codes |
|
|
427 | (3) |
|
|
430 | (1) |
|
|
431 | (3) |
|
Mixture of Experts Revisited |
|
|
434 | (1) |
|
|
435 | (2) |
|
|
437 | (1) |
|
|
438 | (2) |
|
|
440 | (2) |
|
|
442 | (1) |
|
|
443 | (4) |
|
|
447 | (28) |
|
|
447 | (2) |
|
Single State Case: K-Armed Bandit |
|
|
449 | (1) |
|
Elements of Reinforcement Learning |
|
|
450 | (3) |
|
|
453 | (1) |
|
|
453 | (1) |
|
|
454 | (1) |
|
Temporal Difference Learning |
|
|
454 | (7) |
|
|
455 | (1) |
|
Deterministic Rewards and Actions |
|
|
456 | (1) |
|
Nondeterministic Rewards and Actions |
|
|
457 | (2) |
|
|
459 | (2) |
|
|
461 | (3) |
|
Partially Observable States |
|
|
464 | (6) |
|
|
464 | (1) |
|
Example: The Tiger Problem |
|
|
465 | (5) |
|
|
470 | (2) |
|
|
472 | (1) |
|
|
473 | (2) |
|
Design and Analysis of Machine Learning Experiments |
|
|
475 | (42) |
|
|
475 | (3) |
|
Factors, Response, and Strategy of Experimentation |
|
|
478 | (3) |
|
|
481 | (1) |
|
Randomization, Replication, and Blocking |
|
|
482 | (1) |
|
Guidelines for Machine Learning Experiments |
|
|
483 | (3) |
|
Cross-Validation and Resampling Methods |
|
|
486 | (3) |
|
|
487 | (1) |
|
|
488 | (1) |
|
|
489 | (1) |
|
Measuring Classifier Performance |
|
|
489 | (4) |
|
|
493 | (3) |
|
|
496 | (2) |
|
Assessing a Classification Algorithm's Performance |
|
|
498 | (3) |
|
|
499 | (1) |
|
|
500 | (1) |
|
|
500 | (1) |
|
Comparing Two Classification Algorithms |
|
|
501 | (3) |
|
|
501 | (1) |
|
K-Fold Cross-Validated Paired t Test |
|
|
501 | (1) |
|
|
502 | (1) |
|
|
503 | (1) |
|
Comparing Multiple Algorithms: Analysis of Variance |
|
|
504 | (4) |
|
Comparison over Multiple Datasets |
|
|
508 | (4) |
|
|
509 | (2) |
|
|
511 | (1) |
|
|
512 | (1) |
|
|
513 | (1) |
|
|
514 | (3) |
|
|
517 | (12) |
|
|
517 | (2) |
|
|
518 | (1) |
|
|
518 | (1) |
|
|
519 | (4) |
|
Probability Distribution and Density Functions |
|
|
519 | (1) |
|
Joint Distribution and Density Functions |
|
|
520 | (1) |
|
Conditional Distributions |
|
|
520 | (1) |
|
|
521 | (1) |
|
|
521 | (1) |
|
|
522 | (1) |
|
Weak Law of Large Numbers |
|
|
523 | (1) |
|
|
523 | (4) |
|
|
523 | (1) |
|
|
524 | (1) |
|
|
524 | (1) |
|
|
524 | (1) |
|
Normal (Gaussian) Distribution |
|
|
525 | (1) |
|
|
526 | (1) |
|
|
527 | (1) |
|
|
527 | (1) |
|
|
527 | (2) |
Index |
|
529 | |