|
|
| ix
| xi
Preface | xiii
Notation | xvi
|
Part I Introduction and Overview |
|
|
|
|
1 Introduction | 3
1.1 An Example: Autonomous Driving | 4
1.2 Pattern Recognition and Machine Learning | 6
| 12
|
2 Mathematical Background | 15
2.1 | 15
2.2 | 25
2.3 Optimization and Matrix Calculus | 34
2.4 Complexity of Algorithms | 39
2.5 Miscellaneous Notes and Additional Resources | 40
| 41
|
3 Overview of a Pattern Recognition System | 44
3.1 | 44
3.2 A Simple Nearest Neighbor Classifier | 45
3.3 | 49
3.4 Making Assumptions and Simplifications | 52
3.5 | 59
3.6 Miscellaneous Notes and Additional Resources | 59
| 61
|
|
4 | 63
4.1 Accuracy and Error in the Simple Case | 63
4.2 Minimizing the Cost/Loss | 70
4.3 Evaluation in Imbalanced Problems | 73
4.4 Can We Reach 100% Accuracy? | 79
4.5 Confidence in the Evaluation Results | 85
4.6 Miscellaneous Notes and Additional Resources | 92
| 93
|
Part II Domain-Independent Feature Extraction |
|
|
|
5 Principal Component Analysis | 101
5.1 | 101
5.2 PCA to Zero-Dimensional Subspace | 104
5.3 PCA to One-Dimensional Subspace | 106
5.4 PCA for More Dimensions | 110
5.5 The Complete PCA Algorithm | 110
5.6 | 111
5.7 When to Use or Not to Use PCA? | 115
5.8 The Whitening Transform | 118
5.9 Eigen-Decomposition vs. SVD | 118
5.10 Miscellaneous Notes and Additional Resources | 119
| 119
|
6 Fisher's Linear Discriminant | 123
6.1 FLD for Binary Classification | 125
6.2 | 132
6.3 Miscellaneous Notes and Additional Resources | 135
| 136
|
Part III Classifiers and Tools |
|
|
|
7 Support Vector Machines | 143
7.1 | 143
7.2 Visualizing and Calculating the Margin | 147
7.3 Maximizing the Margin | 150
7.4 The Optimization and the Solution | 152
7.5 Extensions for Linearly Inseparable and Multiclass Problems | 157
7.6 | 161
7.7 Miscellaneous Notes and Additional Resources | 167
| 167
|
|
8 | 173
8.1 The Probabilistic Way of Thinking | 173
8.2 | 175
8.3 Parametric Estimation | 178
8.4 Nonparametric Estimation | 184
8.5 | 191
8.6 Miscellaneous Notes and Additional Resources | 192
| 192
|
9 Distance Metrics and Data Transformations | 196
9.1 Distance Metrics and Similarity Measures | 196
9.2 Data Transformation and Normalization | 207
9.3 Miscellaneous Notes and Additional Resources | 213
| 213
|
10 Information Theory and Decision Trees | 219
10.1 Prefix Code and Huffman Tree | 219
10.2 Basics of Information Theory | 221
10.3 Information Theory for Continuous Distributions | 226
10.4 Information Theory in ML and PR | 231
10.5 | 234
10.6 Miscellaneous Notes and Additional Resources | 239
| 239
|
Part IV Handling Diverse Data Formats |
|
|
|
11 Sparse and Misaligned Data | 245
11.1 Sparse Machine Learning | 245
11.2 Dynamic Time Warping | 254
11.3 Miscellaneous Notes and Additional Resources | 262
| 262
|
|
12 Hidden Markov Model | 266
12.1 Sequential Data and the Markov Property | 266
12.2 Three Basic Problems in HMM Learning | 274
12.3 α, β and the Evaluation Problem | 275
12.4 γ, δ, ψ and the Decoding Problem | 280
12.5 ζ and Learning HMM Parameters | 283
12.6 Miscellaneous Notes and Additional Resources | 286
|
|
| 287

Part V
|
|
|
13 The Normal Distribution | 293
13.1 | 293
13.2 Notation and Parameterization | 296
13.3 Linear Operation and Summation | 297
13.4 Geometry and the Mahalanobis Distance | 299
13.5 | 300
13.6 Product of Gaussians | 302
13.7 Application I: Parameter Estimation | 303
13.8 Application II: Kalman Filter | 305
13.9 Useful Math in This Chapter | 307
| 312
|
14 The Basic Idea behind Expectation-Maximization | 316
14.1 GMM: A Worked Example | 316
14.2 An Informal Description of the EM Algorithm | 321
14.3 The Expectation-Maximization Algorithm | 321
14.4 | 328
14.5 Miscellaneous Notes and Additional Resources | 330
| 331
|
15 Convolutional Neural Networks | 333
15.1 | 334
15.2 | 336
15.3 Layer Input, Output, and Notation | 341
15.4 | 342
15.5 The Convolution Layer | 344
15.6 | 356
15.7 A Case Study: The VGG16 Net | 359
15.8 Hands-On CNN Experiences | 361
15.9 Miscellaneous Notes and Additional Resources | 362
| 362

Bibliography | 365
Index | 379