|
|
1 | (12) |
|
1.1 Two Fundamental Questions |
|
|
1 | (3) |
|
1.1.1 Why Should One Read the Book? |
|
|
1 | (1) |
|
1.1.2 What Is the Book About? |
|
|
2 | (2) |
|
1.2 The Structure of the Book |
|
|
4 | (4) |
|
1.2.1 Part I: From Perception to Computation |
|
|
4 | (1) |
|
1.2.2 Part II: Machine Learning |
|
|
5 | (1) |
|
1.2.3 Part III: Applications |
|
|
6 | (1) |
|
|
7 | (1) |
|
1.3 How to Read This Book |
|
|
8 | (1) |
|
1.3.1 Background and Learning Objectives |
|
|
8 | (1) |
|
|
8 | (1) |
|
|
9 | (1) |
|
|
9 | (1) |
|
|
9 | (4) |
|
Part I: From Perception to Computation |
|
|
|
2 Audio Acquisition, Representation and Storage |
|
|
13 | (44) |
|
|
13 | (2) |
|
2.2 Sound Physics, Production and Perception |
|
|
15 | (7) |
|
2.2.1 Acoustic Wave Physics |
|
|
15 | (3) |
|
|
18 | (2) |
|
|
20 | (2) |
|
|
22 | (10) |
|
2.3.1 Sampling and Aliasing |
|
|
23 | (2) |
|
2.3.2 The Sampling Theorem** |
|
|
25 | (3) |
|
2.3.3 Linear Quantization |
|
|
28 | (2) |
|
2.3.4 Nonuniform Scalar Quantization |
|
|
30 | (2) |
|
2.4 Audio Encoding and Storage Formats |
|
|
32 | (6) |
|
2.4.1 Linear PCM and Compact Discs |
|
|
33 | (1) |
|
2.4.2 MPEG Digital Audio Coding |
|
|
34 | (1) |
|
2.4.3 AAC Digital Audio Coding |
|
|
35 | (1) |
|
|
36 | (2) |
|
2.5 Time-Domain Audio Processing |
|
|
38 | (9) |
|
2.5.1 Linear and Time-Invariant Systems |
|
|
39 | (1) |
|
2.5.2 Short-Term Analysis |
|
|
40 | (3) |
|
2.5.3 Time-Domain Measures |
|
|
43 | (4) |
|
2.6 Linear Predictive Coding |
|
|
47 | (5) |
|
2.6.1 Parameter Estimation |
|
|
50 | (2) |
|
|
52 | (5) |
|
|
52 | (1) |
|
|
53 | (4) |
|
3 Image and Video Acquisition, Representation and Storage |
|
|
57 | (42) |
|
|
57 | (1) |
|
|
58 | (2) |
|
3.2.1 Structure of the Human Eye |
|
|
58 | (2) |
|
3.3 Image Acquisition Devices |
|
|
60 | (3) |
|
|
60 | (3) |
|
|
63 | (13) |
|
3.4.1 Human Color Perception |
|
|
63 | (1) |
|
|
64 | (12) |
|
|
76 | (5) |
|
3.5.1 Image File Format Standards |
|
|
76 | (1) |
|
|
77 | (4) |
|
|
81 | (7) |
|
3.6.1 Global Image Descriptors |
|
|
81 | (4) |
|
|
85 | (3) |
|
|
88 | (1) |
|
|
89 | (4) |
|
3.8.1 Further MPEG Standards |
|
|
90 | (3) |
|
|
93 | (6) |
|
|
93 | (2) |
|
|
95 | (4) |
|
|
|
|
99 | (8) |
|
|
99 | (1) |
|
4.2 Taxonomy of Machine Learning |
|
|
100 | (1) |
|
|
100 | (1) |
|
4.2.2 Learning from Instruction |
|
|
101 | (1) |
|
4.2.3 Learning by Analogy |
|
|
101 | (1) |
|
4.3 Learning from Examples |
|
|
101 | (4) |
|
4.3.1 Supervised Learning |
|
|
102 | (1) |
|
4.3.2 Reinforcement Learning |
|
|
103 | (1) |
|
4.3.3 Unsupervised Learning |
|
|
103 | (1) |
|
4.3.4 Semi-supervised Learning |
|
|
104 | (1) |
|
|
105 | (2) |
|
|
105 | (2) |
|
5 Bayesian Theory of Decision |
|
|
107 | (24) |
|
|
107 | (1) |
|
|
108 | (2) |
|
|
110 | (2) |
|
|
112 | (3) |
|
5.4.1 Binary Classification |
|
|
114 | (1) |
|
5.5 Zero-One Loss Function |
|
|
115 | (1) |
|
5.6 Discriminant Functions |
|
|
116 | (2) |
|
5.6.1 Binary Classification Case |
|
|
117 | (1) |
|
|
118 | (4) |
|
5.7.1 Univariate Gaussian Density |
|
|
118 | (1) |
|
5.7.2 Multivariate Gaussian Density |
|
|
119 | (1) |
|
5.7.3 Whitening Transformation |
|
|
120 | (2) |
|
5.8 Discriminant Functions for Gaussian Likelihood |
|
|
122 | (3) |
|
5.8.1 Features Are Statistically Independent |
|
|
122 | (1) |
|
5.8.2 Covariance Matrix Is the Same for All Classes |
|
|
123 | (2) |
|
5.8.3 Covariance Matrix Is Not the Same for All Classes |
|
|
125 | (1) |
|
5.9 Receiver Operating Curves |
|
|
125 | (2) |
|
|
127 | (4) |
|
|
128 | (1) |
|
|
129 | (2) |
|
|
131 | (38) |
|
|
131 | (2) |
|
6.2 Expectation and Maximization Algorithm* |
|
|
133 | (3) |
|
|
134 | (2) |
|
6.3 Basic Notions and Terminology |
|
|
136 | (5) |
|
6.3.1 Codebooks and Codevectors |
|
|
136 | (1) |
|
6.3.2 Quantization Error Minimization |
|
|
137 | (1) |
|
6.3.3 Entropy Maximization |
|
|
138 | (1) |
|
6.3.4 Vector Quantization |
|
|
139 | (2) |
|
|
141 | (5) |
|
|
142 | (1) |
|
|
143 | (3) |
|
6.4.3 K-Means Software Packages |
|
|
146 | (1) |
|
|
146 | (3) |
|
6.5.1 SOM Software Packages |
|
|
148 | (1) |
|
|
148 | (1) |
|
6.6 Neural Gas and Topology Representing Network |
|
|
149 | (2) |
|
|
149 | (1) |
|
6.6.2 Topology Representing Network |
|
|
150 | (1) |
|
6.6.3 Neural Gas and TRN Software Package |
|
|
151 | (1) |
|
6.6.4 Neural Gas and TRN Drawbacks |
|
|
151 | (1) |
|
6.7 Generative Topographic Mapping* |
|
|
151 | (4) |
|
|
152 | (1) |
|
6.7.2 Optimization by EM Algorithm* |
|
|
153 | (1) |
|
|
154 | (1) |
|
6.7.4 GTM Software Package |
|
|
155 | (1) |
|
6.8 Fuzzy Clustering Algorithms |
|
|
155 | (2) |
|
|
156 | (1) |
|
6.9 Hierarchical Clustering |
|
|
157 | (2) |
|
6.10 Mixtures of Gaussians |
|
|
159 | (4) |
|
|
160 | (1) |
|
|
161 | (2) |
|
|
163 | (6) |
|
|
164 | (1) |
|
|
165 | (4) |
|
7 Foundations of Statistical Learning and Model Selection |
|
|
169 | (22) |
|
|
169 | (1) |
|
7.2 Bias-Variance Dilemma |
|
|
170 | (3) |
|
7.2.1 Bias-Variance Dilemma for Regression |
|
|
170 | (1) |
|
7.2.2 Bias-Variance Decomposition for Classification* |
|
|
171 | (2) |
|
|
173 | (3) |
|
7.4 VC Dimension and Structural Risk Minimization |
|
|
176 | (3) |
|
7.5 Statistical Learning Theory* |
|
|
179 | (3) |
|
7.5.1 Vapnik-Chervonenkis Theory |
|
|
180 | (2) |
|
|
182 | (2) |
|
7.6.1 Akaike Information Criterion |
|
|
182 | (1) |
|
7.6.2 Bayesian Information Criterion |
|
|
183 | (1) |
|
7.7 Minimum Description Length Approach |
|
|
184 | (2) |
|
|
186 | (2) |
|
7.8.1 Generalized Cross-Validation |
|
|
186 | (2) |
|
|
188 | (3) |
|
|
188 | (1) |
|
|
189 | (2) |
|
8 Supervised Neural Networks and Ensemble Methods |
|
|
191 | (38) |
|
|
191 | (1) |
|
8.2 Artificial Neural Networks and Neural Computation |
|
|
192 | (1) |
|
|
193 | (3) |
|
8.4 Connections and Network Architectures |
|
|
196 | (2) |
|
8.5 Single-Layer Networks |
|
|
198 | (5) |
|
8.5.1 Linear Discriminant Functions and Single-Layer Networks |
|
|
199 | (1) |
|
8.5.2 Linear Discriminants and the Logistic Sigmoid |
|
|
200 | (1) |
|
8.5.3 Generalized Linear Discriminants and the Perceptron |
|
|
201 | (2) |
|
|
203 | (2) |
|
8.6.1 The Multilayer Perceptron |
|
|
204 | (1) |
|
8.7 Multilayer Networks Training |
|
|
205 | (7) |
|
8.7.1 Error Back-Propagation for Feed-Forward Networks* |
|
|
206 | (2) |
|
8.7.2 Parameter Update: The Error Surface |
|
|
208 | (2) |
|
8.7.3 Parameter Update: The Gradient Descent* |
|
|
210 | (2) |
|
|
212 | (1) |
|
8.8 Learning Vector Quantization |
|
|
212 | (3) |
|
8.8.1 The LVQ_PAK Software Package |
|
|
214 | (1) |
|
8.9 Nearest Neighbour Classification |
|
|
215 | (2) |
|
8.9.1 Probabilistic Interpretation |
|
|
217 | (1) |
|
|
217 | (7) |
|
8.10.1 Classifier Diversity and Ensemble Performance* |
|
|
218 | (2) |
|
8.10.2 Creating Ensemble of Diverse Classifiers |
|
|
220 | (4) |
|
|
224 | (5) |
|
|
224 | (1) |
|
|
225 | (4) |
|
|
229 | (66) |
|
|
229 | (2) |
|
9.2 Lagrange Method and Kuhn-Tucker Theorem |
|
|
231 | (4) |
|
9.2.1 Lagrange Multipliers Method |
|
|
231 | (2) |
|
9.2.2 Kuhn-Tucker Theorem |
|
|
233 | (2) |
|
9.3 Support Vector Machines for Classification |
|
|
235 | (12) |
|
9.3.1 Optimal Hyperplane Algorithm |
|
|
236 | (2) |
|
9.3.2 Support Vector Machine Construction |
|
|
238 | (3) |
|
9.3.3 Algorithmic Approaches to Solve Quadratic Programming |
|
|
241 | (1) |
|
9.3.4 Sequential Minimal Optimization |
|
|
242 | (2) |
|
9.3.5 Other Optimization Algorithms |
|
|
244 | (1) |
|
9.3.6 SVM and Regularization Methods* |
|
|
244 | (3) |
|
9.4 Multiclass Support Vector Machines |
|
|
247 | (1) |
|
9.4.1 One-Versus-Rest Method |
|
|
247 | (1) |
|
9.4.2 One-Versus-One Method |
|
|
247 | (1) |
|
|
248 | (1) |
|
9.5 Support Vector Machines for Regression |
|
|
248 | (8) |
|
9.5.1 Regression with Quadratic ε-Insensitive Loss |
|
|
249 | (3) |
|
9.5.2 Kernel Ridge Regression |
|
|
252 | (2) |
|
9.5.3 Regression with Linear ε-Insensitive Loss |
|
|
254 | (2) |
|
9.5.4 Other Approaches to Support Vector Regression |
|
|
256 | (1) |
|
|
256 | (2) |
|
9.6.1 Regression with Gaussian Processes |
|
|
257 | (1) |
|
9.7 Kernel Fisher Discriminant |
|
|
258 | (4) |
|
9.7.1 Fisher's Linear Discriminant |
|
|
258 | (2) |
|
9.7.2 Fisher Discriminant in Feature Space |
|
|
260 | (2) |
|
|
262 | (2) |
|
9.8.1 Centering in Feature Space |
|
|
262 | (2) |
|
|
264 | (5) |
|
9.9.1 One-Class SVM Optimization |
|
|
267 | (2) |
|
9.10 Kernel Clustering Methods |
|
|
269 | (9) |
|
|
270 | (2) |
|
|
272 | (1) |
|
|
272 | (1) |
|
9.10.4 One-Class SVM Extensions |
|
|
273 | (1) |
|
9.10.5 Kernel Fuzzy Clustering Methods |
|
|
274 | (4) |
|
|
278 | (9) |
|
9.11.1 Shi and Malik Algorithm |
|
|
280 | (1) |
|
9.11.2 Ng-Jordan-Weiss' Algorithm |
|
|
281 | (1) |
|
|
282 | (1) |
|
9.11.4 Connection Between Spectral and Kernel Clustering Methods |
|
|
283 | (4) |
|
|
287 | (1) |
|
|
287 | (8) |
|
|
288 | (1) |
|
|
289 | (6) |
|
10 Markovian Models for Sequential Data |
|
|
295 | (46) |
|
|
295 | (1) |
|
10.2 Hidden Markov Models |
|
|
296 | (4) |
|
10.2.1 Emission Probability Functions |
|
|
300 | (1) |
|
|
300 | (1) |
|
10.4 The Likelihood Problem and the Trellis** |
|
|
301 | (3) |
|
10.5 The Decoding Problem** |
|
|
304 | (4) |
|
10.6 The Learning Problem** |
|
|
308 | (7) |
|
10.6.1 Parameter Initialization |
|
|
309 | (1) |
|
10.6.2 Estimation of the Initial State Probabilities |
|
|
310 | (1) |
|
10.6.3 Estimation of the Transition Probabilities |
|
|
311 | (1) |
|
10.6.4 Emission Probability Function Parameters Estimation |
|
|
312 | (3) |
|
|
315 | (2) |
|
10.8 Linear-Chain Conditional Random Fields |
|
|
317 | (6) |
|
10.8.1 From HMMs to Linear-Chain CRFs |
|
|
319 | (2) |
|
|
321 | (1) |
|
10.8.3 The Three Problems |
|
|
322 | (1) |
|
10.9 The Inference Problem for Linear Chain CRFs |
|
|
323 | (1) |
|
10.10 The Training Problem for Linear Chain CRFs |
|
|
323 | (2) |
|
10.11 N-gram Models and Statistical Language Modeling |
|
|
325 | (5) |
|
|
325 | (1) |
|
|
326 | (1) |
|
10.11.3 N-grams Parameter Estimation |
|
|
327 | (1) |
|
10.11.4 The Sparseness Problem and the Language Case |
|
|
328 | (2) |
|
10.12 Discounting and Smoothing Methods for N-gram Models** |
|
|
330 | (6) |
|
10.12.1 The Leaving-One-Out Method |
|
|
331 | (2) |
|
10.12.2 The Good-Turing Estimates |
|
|
333 | (1) |
|
10.12.3 Katz's Discounting Model |
|
|
334 | (2) |
|
10.13 Building a Language Model with N-grams |
|
|
336 | (5) |
|
|
337 | (1) |
|
|
338 | (3) |
|
11 Feature Extraction Methods and Manifold Learning Methods |
|
|
341 | (48) |
|
|
341 | (2) |
|
11.2 The Curse of Dimensionality* |
|
|
343 | (1) |
|
|
344 | (13) |
|
|
345 | (2) |
|
|
347 | (8) |
|
|
355 | (2) |
|
11.4 Principal Component Analysis |
|
|
357 | (5) |
|
11.4.1 PCA as ID Estimator |
|
|
359 | (2) |
|
11.4.2 Nonlinear Principal Component Analysis |
|
|
361 | (1) |
|
11.5 Independent Component Analysis |
|
|
362 | (8) |
|
11.5.1 Statistical Independence |
|
|
363 | (1) |
|
|
364 | (3) |
|
11.5.3 ICA by Mutual Information Minimization |
|
|
367 | (2) |
|
|
369 | (1) |
|
11.6 Multidimensional Scaling Methods |
|
|
370 | (2) |
|
|
371 | (1) |
|
|
372 | (7) |
|
11.7.1 The Manifold Learning Problem |
|
|
372 | (2) |
|
|
374 | (1) |
|
11.7.3 Locally Linear Embedding |
|
|
375 | (3) |
|
11.7.4 Laplacian Eigenmaps |
|
|
378 | (1) |
|
|
379 | (10) |
|
|
379 | (2) |
|
|
381 | (8) |
|
|
|
12 Speech and Handwriting Recognition |
|
|
389 | (32) |
|
|
389 | (1) |
|
12.2 The General Approach |
|
|
390 | (2) |
|
|
392 | (5) |
|
12.3.1 The Handwriting Front End |
|
|
393 | (1) |
|
12.3.2 The Speech Front End |
|
|
394 | (3) |
|
|
397 | (3) |
|
12.4.1 Lexicon and Training Set |
|
|
397 | (1) |
|
12.4.2 Hidden Markov Models Training |
|
|
398 | (2) |
|
12.5 Recognition and Performance Measures |
|
|
400 | (3) |
|
|
400 | (1) |
|
12.5.2 Performance Measurement |
|
|
401 | (2) |
|
12.6 Recognition Experiments |
|
|
403 | (6) |
|
|
404 | (1) |
|
12.6.2 N-gram Model Performance |
|
|
405 | (2) |
|
12.6.3 Cambridge Database Results |
|
|
407 | (1) |
|
12.6.4 IAM Database Results |
|
|
408 | (1) |
|
12.7 Speech Recognition Results |
|
|
409 | (2) |
|
|
411 | (10) |
|
12.8.1 Applications of Handwriting Recognition |
|
|
411 | (2) |
|
12.8.2 Applications of Speech Recognition |
|
|
413 | (2) |
|
|
415 | (6) |
|
13 Automatic Face Recognition |
|
|
421 | (28) |
|
|
421 | (2) |
|
13.2 Face Recognition: General Approach |
|
|
423 | (1) |
|
13.3 Face Detection and Localization |
|
|
424 | (4) |
|
13.3.1 Face Segmentation and Normalization with TorchVision |
|
|
426 | (2) |
|
13.4 Lighting Normalization |
|
|
428 | (2) |
|
13.4.1 Center/Surround Retinex |
|
|
428 | (1) |
|
13.4.2 Gross and Brajovic's Algorithm |
|
|
429 | (1) |
|
13.4.3 Normalization with TorchVision |
|
|
429 | (1) |
|
|
430 | (7) |
|
13.5.1 Holistic Approaches |
|
|
430 | (4) |
|
|
434 | (1) |
|
13.5.3 Feature Extraction with TorchVision |
|
|
434 | (3) |
|
|
437 | (2) |
|
13.7 Performance Assessment |
|
|
439 | (3) |
|
13.7.1 The FERET Database |
|
|
440 | (1) |
|
|
441 | (1) |
|
|
442 | (7) |
|
13.8.1 Data and Experimental Protocol |
|
|
443 | (1) |
|
13.8.2 Euclidean Distance-Based Classifier |
|
|
443 | (2) |
|
13.8.3 SVM-Based Classification |
|
|
445 | (1) |
|
|
445 | (4) |
|
14 Video Segmentation and Keyframe Extraction |
|
|
449 | (18) |
|
|
449 | (2) |
|
14.2 Applications of Video Segmentation |
|
|
451 | (1) |
|
14.3 Shot Boundary Detection |
|
|
452 | (6) |
|
14.3.1 Pixel-Based Approaches |
|
|
453 | (2) |
|
14.3.2 Block-Based Approaches |
|
|
455 | (1) |
|
14.3.3 Histogram-Based Approaches |
|
|
455 | (1) |
|
14.3.4 Clustering-Based Approaches |
|
|
456 | (1) |
|
14.3.5 Performance Measures |
|
|
457 | (1) |
|
14.4 Shot Boundary Detection with TorchVision |
|
|
458 | (2) |
|
|
460 | (2) |
|
14.6 Keyframe Extraction with TorchVision and Torch |
|
|
462 | (5) |
|
|
463 | (4) |
|
15 Real-Time Hand Pose Recognition |
|
|
467 | (18) |
|
|
467 | (1) |
|
15.2 Hand Pose Recognition Methods |
|
|
468 | (3) |
|
15.3 Hand Pose Recognition by a Data Glove |
|
|
471 | (4) |
|
15.4 Hand Pose Color-Based Recognition |
|
|
475 | (10) |
|
15.4.1 Segmentation Module |
|
|
476 | (2) |
|
15.4.2 Feature Extraction |
|
|
478 | (1) |
|
|
479 | (1) |
|
15.4.4 Experimental Results |
|
|
480 | (3) |
|
|
483 | (2) |
|
16 Automatic Personality Perception |
|
|
485 | (16) |
|
|
485 | (1) |
|
|
486 | (2) |
|
16.2.1 Nonverbal Behaviour |
|
|
487 | (1) |
|
|
488 | (1) |
|
16.3 Personality and Its Measurement |
|
|
488 | (2) |
|
16.4 Speech-Based Automatic Personality Perception |
|
|
490 | (4) |
|
16.4.1 The SSPNet Speaker Personality Corpus |
|
|
491 | (1) |
|
|
492 | (1) |
|
16.4.3 Extraction of Short-Term Features |
|
|
492 | (1) |
|
16.4.4 Extraction of Statisticals |
|
|
493 | (1) |
|
|
493 | (1) |
|
16.5 Experiments and Results |
|
|
494 | (2) |
|
|
496 | (5) |
|
|
497 | (4) |
|
|
Appendix A Statistics |
|
501 | (12) |
Appendix B Signal Processing |
|
513 | (12) |
Appendix C Matrix Algebra |
|
525 | (6) |
Appendix D Mathematical Foundations of Kernel Methods |
|
531 | (20) |
Index |
|
551 | |