E-book: Machine Learning for Audio, Image and Video Analysis: Theory and Applications

  • Format: PDF+DRM
  • Price: €67.91*
  • * the price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID (more information here). The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books; not to be confused with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

This second edition focuses on audio, image and video data, the three main types of input that machines deal with when interacting with the real world. A set of appendices provides the reader with self-contained introductions to the mathematical background necessary to read the book.
The book is divided into three main parts. The first, From Perception to Computation, introduces methodologies for representing data in forms suitable for computer processing, especially audio and images. The second part, Machine Learning, provides an extensive overview of statistical techniques aimed at three main problems: classification (automatically assigning a data sample to one of the classes belonging to a predefined set), clustering (automatically grouping data samples according to the similarity of their properties) and sequence analysis (automatically mapping a sequence of observations into a sequence of human-understandable symbols). The third part, Applications, shows how the abstract problems defined in the second part underlie technologies capable of performing complex tasks such as the recognition of hand gestures or the transcription of handwritten data.

Machine Learning for Audio, Image and Video Analysis is suitable for students to acquire a solid background in machine learning as well as for practitioners to deepen their knowledge of the state-of-the-art. All application chapters are based on publicly available data and free software packages, thus allowing readers to replicate the experiments.
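The first two problem families described in the blurb, classification and clustering, can be illustrated with a minimal sketch in plain Python. This is not code from the book; the toy 2-D data, the nearest-centroid rule and the single k-means pass are illustrative assumptions.

```python
def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

# Classification: assign a sample to the class (from a predefined set)
# whose centroid, estimated from labelled examples, is nearest.
classes = {"low": [(0, 0), (1, 0), (0, 1)], "high": [(9, 9), (8, 9), (9, 8)]}
cents = {label: centroid(pts) for label, pts in classes.items()}

def classify(sample):
    return min(cents, key=lambda label: dist2(sample, cents[label]))

# Clustering: group unlabelled samples by similarity (one k-means pass:
# assign each point to its nearest seed, then recompute the centroids).
data = [(0, 1), (1, 1), (8, 8), (9, 7)]
seeds = [(0, 0), (9, 9)]
groups = [[] for _ in seeds]
for p in data:
    nearest = min(range(len(seeds)), key=lambda i: dist2(p, seeds[i]))
    groups[nearest].append(p)
new_seeds = [centroid(g) for g in groups]

print(classify((1, 2)))   # -> low
print(new_seeds)          # -> [(0.5, 1.0), (8.5, 7.5)]
```

Sequence analysis, the third family, would additionally model the order of observations (e.g. with the Markovian models of Chapter 10) rather than treating each sample independently.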

Reviews

This nice book of over 560 pages is really useful for students, researchers, practitioners, and anybody who is interested in machine learning and related subjects. (Michael M. Dediu, Mathematical Reviews, May, 2017)

1 Introduction
1(12)
1.1 Two Fundamental Questions
1(3)
1.1.1 Why Should One Read the Book?
1(1)
1.1.2 What Is the Book About?
2(2)
1.2 The Structure of the Book
4(4)
1.2.1 Part I: From Perception to Computation
4(1)
1.2.2 Part II: Machine Learning
5(1)
1.2.3 Part III: Applications
6(1)
1.2.4 Appendices
7(1)
1.3 How to Read This Book
8(1)
1.3.1 Background and Learning Objectives
8(1)
1.3.2 Difficulty Level
8(1)
1.3.3 Problems
9(1)
1.3.4 Software
9(1)
1.4 Reading Tracks
9(4)
Part I From Perception to Computation
2 Audio Acquisition, Representation and Storage
13(44)
2.1 Introduction
13(2)
2.2 Sound Physics, Production and Perception
15(7)
2.2.1 Acoustic Waves Physics
15(3)
2.2.2 Speech Production
18(2)
2.2.3 Sound Perception
20(2)
2.3 Audio Acquisition
22(10)
2.3.1 Sampling and Aliasing
23(2)
2.3.2 The Sampling Theorem**
25(3)
2.3.3 Linear Quantization
28(2)
2.3.4 Nonuniform Scalar Quantization
30(2)
2.4 Audio Encoding and Storage Formats
32(6)
2.4.1 Linear PCM and Compact Discs
33(1)
2.4.2 MPEG Digital Audio Coding
34(1)
2.4.3 AAC Digital Audio Coding
35(1)
2.4.4 Perceptual Coding
36(2)
2.5 Time-Domain Audio Processing
38(9)
2.5.1 Linear and Time-Invariant Systems
39(1)
2.5.2 Short-Term Analysis
40(3)
2.5.3 Time-Domain Measures
43(4)
2.6 Linear Predictive Coding
47(5)
2.6.1 Parameter Estimation
50(2)
2.7 Conclusions
52(5)
Problems
52(1)
References
53(4)
3 Image and Video Acquisition, Representation and Storage
57(42)
3.1 Introduction
57(1)
3.2 Human Eye Physiology
58(2)
3.2.1 Structure of the Human Eye
58(2)
3.3 Image Acquisition Devices
60(3)
3.3.1 Digital Camera
60(3)
3.4 Color Representation
63(13)
3.4.1 Human Color Perception
63(1)
3.4.2 Color Models
64(12)
3.5 Image Formats
76(5)
3.5.1 Image File Format Standards
76(1)
3.5.2 JPEG Standard
77(4)
3.6 Image Descriptors
81(7)
3.6.1 Global Image Descriptors
81(4)
3.6.2 SIFT Descriptors
85(3)
3.7 Video Principles
88(1)
3.8 MPEG Standard
89(4)
3.8.1 Further MPEG Standards
90(3)
3.9 Conclusions
93(6)
Problems
93(2)
References
95(4)
Part II Machine Learning
4 Machine Learning
99(8)
4.1 Introduction
99(1)
4.2 Taxonomy of Machine Learning
100(1)
4.2.1 Rote Learning
100(1)
4.2.2 Learning from Instruction
101(1)
4.2.3 Learning by Analogy
101(1)
4.3 Learning from Examples
101(4)
4.3.1 Supervised Learning
102(1)
4.3.2 Reinforcement Learning
103(1)
4.3.3 Unsupervised Learning
103(1)
4.3.4 Semi-supervised Learning
104(1)
4.4 Conclusions
105(2)
References
105(2)
5 Bayesian Theory of Decision
107(24)
5.1 Introduction
107(1)
5.2 Bayes Decision Rule
108(2)
5.3 Bayes Classifier*
110(2)
5.4 Loss Function
112(3)
5.4.1 Binary Classification
114(1)
5.5 Zero-One Loss Function
115(1)
5.6 Discriminant Functions
116(2)
5.6.1 Binary Classification Case
117(1)
5.7 Gaussian Density
118(4)
5.7.1 Univariate Gaussian Density
118(1)
5.7.2 Multivariate Gaussian Density
119(1)
5.7.3 Whitening Transformation
120(2)
5.8 Discriminant Functions for Gaussian Likelihood
122(3)
5.8.1 Features Are Statistically Independent
122(1)
5.8.2 Covariance Matrix is the Same for all Classes
123(2)
5.8.3 Covariance Matrix is Not the Same for all Classes
125(1)
5.9 Receiver Operating Curves
125(2)
5.10 Conclusions
127(4)
Problems
128(1)
References
129(2)
6 Clustering Methods
131(38)
6.1 Introduction
131(2)
6.2 Expectation and Maximization Algorithm*
133(3)
6.2.1 Basic EM*
134(2)
6.3 Basic Notions and Terminology
136(5)
6.3.1 Codebooks and Codevectors
136(1)
6.3.2 Quantization Error Minimization
137(1)
6.3.3 Entropy Maximization
138(1)
6.3.4 Vector Quantization
139(2)
6.4 K-Means
141(5)
6.4.1 Batch K-Means
142(1)
6.4.2 Online K-Means
143(3)
6.4.3 K-Means Software Packages
146(1)
6.5 Self-Organizing Maps
146(3)
6.5.1 SOM Software Packages
148(1)
6.5.2 SOM Drawbacks
148(1)
6.6 Neural Gas and Topology Representing Network
149(2)
6.6.1 Neural Gas
149(1)
6.6.2 Topology Representing Network
150(1)
6.6.3 Neural Gas and TRN Software Package
151(1)
6.6.4 Neural Gas and TRN Drawbacks
151(1)
6.7 General Topographic Mapping*
151(4)
6.7.1 Latent Variables*
152(1)
6.7.2 Optimization by EM Algorithm*
153(1)
6.7.3 GTM Versus SOM*
154(1)
6.7.4 GTM Software Package
155(1)
6.8 Fuzzy Clustering Algorithms
155(2)
6.8.1 FCM
156(1)
6.9 Hierarchical Clustering
157(2)
6.10 Mixtures of Gaussians
159(4)
6.10.1 The E-Step
160(1)
6.10.2 The M-Step
161(2)
6.11 Conclusion
163(6)
Problems
164(1)
References
165(4)
7 Foundations of Statistical Learning and Model Selection
169(22)
7.1 Introduction
169(1)
7.2 Bias-Variance Dilemma
170(3)
7.2.1 Bias-Variance Dilemma for Regression
170(1)
7.2.2 Bias-Variance Decomposition for Classification*
171(2)
7.3 Model Complexity
173(3)
7.4 VC Dimension and Structural Risk Minimization
176(3)
7.5 Statistical Learning Theory*
179(3)
7.5.1 Vapnik-Chervonenkis Theory
180(2)
7.6 AIC and BIC Criteria
182(2)
7.6.1 Akaike Information Criterion
182(1)
7.6.2 Bayesian Information Criterion
183(1)
7.7 Minimum Description Length Approach
184(2)
7.8 Crossvalidation
186(2)
7.8.1 Generalized Crossvalidation
186(2)
7.9 Conclusion
188(3)
Problems
188(1)
References
189(2)
8 Supervised Neural Networks and Ensemble Methods
191(38)
8.1 Introduction
191(1)
8.2 Artificial Neural Networks and Neural Computation
192(1)
8.3 Artificial Neurons
193(3)
8.4 Connections and Network Architectures
196(2)
8.5 Single-Layer Networks
198(5)
8.5.1 Linear Discriminant Functions and Single-Layer Networks
199(1)
8.5.2 Linear Discriminants and the Logistic Sigmoid
200(1)
8.5.3 Generalized Linear Discriminants and the Perceptron
201(2)
8.6 Multilayer Networks
203(2)
8.6.1 The Multilayer Perceptron
204(1)
8.7 Multilayer Networks Training
205(7)
8.7.1 Error Back-Propagation for Feed-Forwards Networks*
206(2)
8.7.2 Parameter Update: The Error Surface
208(2)
8.7.3 Parameters Update: The Gradient Descent*
210(2)
8.7.4 The Torch Package
212(1)
8.8 Learning Vector Quantization
212(3)
8.8.1 The LVQ_PAK Software Package
214(1)
8.9 Nearest Neighbour Classification
215(2)
8.9.1 Probabilistic Interpretation
217(1)
8.10 Ensemble Methods
217(7)
8.10.1 Classifier Diversity and Ensemble Performance*
218(2)
8.10.2 Creating Ensemble of Diverse Classifiers
220(4)
8.11 Conclusions
224(5)
Problems
224(1)
References
225(4)
9 Kernel Methods
229(66)
9.1 Introduction
229(2)
9.2 Lagrange Method and Kuhn Tucker Theorem
231(4)
9.2.1 Lagrange Multipliers Method
231(2)
9.2.2 Kuhn Tucker Theorem
233(2)
9.3 Support Vector Machines for Classification
235(12)
9.3.1 Optimal Hyperplane Algorithm
236(2)
9.3.2 Support Vector Machine Construction
238(3)
9.3.3 Algorithmic Approaches to Solve Quadratic Programming
241(1)
9.3.4 Sequential Minimal Optimization
242(2)
9.3.5 Other Optimization Algorithms
244(1)
9.3.6 SVM and Regularization Methods*
244(3)
9.4 Multiclass Support Vector Machines
247(1)
9.4.1 One-Versus-Rest Method
247(1)
9.4.2 One-Versus-One Method
247(1)
9.4.3 Other Methods
248(1)
9.5 Support Vector Machines for Regression
248(8)
9.5.1 Regression with Quadratic ε-Insensitive Loss
249(3)
9.5.2 Kernel Ridge Regression
252(2)
9.5.3 Regression with Linear ε-Insensitive Loss
254(2)
9.5.4 Other Approaches to Support Vector Regression
256(1)
9.6 Gaussian Processes
256(2)
9.6.1 Regression with Gaussian Processes
257(1)
9.7 Kernel Fisher Discriminant
258(4)
9.7.1 Fisher's Linear Discriminant
258(2)
9.7.2 Fisher Discriminant in Feature Space
260(2)
9.8 Kernel PCA
262(2)
9.8.1 Centering in Feature Space
262(2)
9.9 One-Class SVM
264(5)
9.9.1 One-Class SVM Optimization
267(2)
9.10 Kernel Clustering Methods
269(9)
9.10.1 Kernel K-Means
270(2)
9.10.2 Kernel SOM
272(1)
9.10.3 Kernel Neural Gas
272(1)
9.10.4 One-Class SVM Extensions
273(1)
9.10.5 Kernel Fuzzy Clustering Methods
274(4)
9.11 Spectral Clustering
278(9)
9.11.1 Shi and Malik Algorithm
280(1)
9.11.2 Ng-Jordan-Weiss' Algorithm
281(1)
9.11.3 Other Methods
282(1)
9.11.4 Connection Between Spectral and Kernel Clustering Methods
283(4)
9.12 Software Packages
287(1)
9.13 Conclusion
287(8)
Problems
288(1)
References
289(6)
10 Markovian Models for Sequential Data
295(46)
10.1 Introduction
295(1)
10.2 Hidden Markov Models
296(4)
10.2.1 Emission Probability Functions
300(1)
10.3 The Three Problems
300(1)
10.4 The Likelihood Problem and the Trellis**
301(3)
10.5 The Decoding Problem**
304(4)
10.6 The Learning Problem**
308(7)
10.6.1 Parameter Initialization
309(1)
10.6.2 Estimation of the Initial State Probabilities
310(1)
10.6.3 Estimation of the Transition Probabilities
311(1)
10.6.4 Emission Probability Function Parameters Estimation
312(3)
10.7 HMM Variants
315(2)
10.8 Linear-Chain Conditional Random Fields
317(6)
10.8.1 From HMMs to Linear-Chain CRFs
319(2)
10.8.2 General CRFs
321(1)
10.8.3 The Three Problems
322(1)
10.9 The Inference Problem for Linear Chain CRFs
323(1)
10.10 The Training Problem for Linear Chain CRFs
323(2)
10.11 N-gram Models and Statistical Language Modeling
325(5)
10.11.1 N-gram Models
325(1)
10.11.2 The Perplexity
326(1)
10.11.3 N-grams Parameter Estimation
327(1)
10.11.4 The Sparseness Problem and the Language Case
328(2)
10.12 Discounting and Smoothing Methods for N-gram Models**
330(6)
10.12.1 The Leaving-One-Out Method
331(2)
10.12.2 The Turing Good Estimates
333(1)
10.12.3 Katz's Discounting Model
334(2)
10.13 Building a Language Model with N-grams
336(5)
Problems
337(1)
References
338(3)
11 Feature Extraction Methods and Manifold Learning Methods
341(48)
11.1 Introduction
341(2)
11.2 The Curse of Dimensionality*
343(1)
11.3 Data Dimensionality
344(13)
11.3.1 Local Methods
345(2)
11.3.2 Global Methods
347(8)
11.3.3 Mixed Methods
355(2)
11.4 Principal Component Analysis
357(5)
11.4.1 PCA as ID Estimator
359(2)
11.4.2 Nonlinear Principal Component Analysis
361(1)
11.5 Independent Component Analysis
362(8)
11.5.1 Statistical Independence
363(1)
11.5.2 ICA Estimation
364(3)
11.5.3 ICA by Mutual Information Minimization
367(2)
11.5.4 FastICA Algorithm
369(1)
11.6 Multidimensional Scaling Methods
370(2)
11.6.1 Sammon's Mapping
371(1)
11.7 Manifold Learning
372(7)
11.7.1 The Manifold Learning Problem
372(2)
11.7.2 Isomap
374(1)
11.7.3 Locally Linear Embedding
375(3)
11.7.4 Laplacian Eigenmaps
378(1)
11.8 Conclusion
379(10)
Problems
379(2)
References
381(8)
Part III Applications
12 Speech and Handwriting Recognition
389(32)
12.1 Introduction
389(1)
12.2 The General Approach
390(2)
12.3 The Front End
392(5)
12.3.1 The Handwriting Front End
393(1)
12.3.2 The Speech Front End
394(3)
12.4 HMM Training
397(3)
12.4.1 Lexicon and Training Set
397(1)
12.4.2 Hidden Markov Models Training
398(2)
12.5 Recognition and Performance Measures
400(3)
12.5.1 Recognition
400(1)
12.5.2 Performance Measurement
401(2)
12.6 Recognition Experiments
403(6)
12.6.1 Lexicon Selection
404(1)
12.6.2 N-gram Model Performance
405(2)
12.6.3 Cambridge Database Results
407(1)
12.6.4 IAM Database Results
408(1)
12.7 Speech Recognition Results
409(2)
12.8 Applications
411(10)
12.8.1 Applications of Handwriting Recognition
411(2)
12.8.2 Applications of Speech Recognition
413(2)
References
415(6)
13 Automatic Face Recognition
421(28)
13.1 Introduction
421(2)
13.2 Face Recognition: General Approach
423(1)
13.3 Face Detection and Localization
424(4)
13.3.1 Face Segmentation and Normalization with TorchVision
426(2)
13.4 Lighting Normalization
428(2)
13.4.1 Center/Surround Retinex
428(1)
13.4.2 Gross and Brajovic's Algorithm
429(1)
13.4.3 Normalization with TorchVision
429(1)
13.5 Feature Extraction
430(7)
13.5.1 Holistic Approaches
430(4)
13.5.2 Local Approaches
434(1)
13.5.3 Feature Extraction with TorchVision
434(3)
13.6 Classification
437(2)
13.7 Performance Assessment
439(3)
13.7.1 The FERET Database
440(1)
13.7.2 The FRVT Database
441(1)
13.8 Experiments
442(7)
13.8.1 Data and Experimental Protocol
443(1)
13.8.2 Euclidean Distance-Based Classifier
443(2)
13.8.3 SVM-Based Classification
445(1)
References
445(4)
14 Video Segmentation and Keyframe Extraction
449(18)
14.1 Introduction
449(2)
14.2 Applications of Video Segmentation
451(1)
14.3 Shot Boundary Detection
452(6)
14.3.1 Pixel-Based Approaches
453(2)
14.3.2 Block-Based Approaches
455(1)
14.3.3 Histogram-Based Approaches
455(1)
14.3.4 Clustering-Based Approaches
456(1)
14.3.5 Performance Measures
457(1)
14.4 Shot Boundary Detection with Torchvision
458(2)
14.5 Keyframe Extraction
460(2)
14.6 Keyframe Extraction with Torchvision and Torch
462(5)
References
463(4)
15 Real-Time Hand Pose Recognition
467(18)
15.1 Introduction
467(1)
15.2 Hand Pose Recognition Methods
468(3)
15.3 Hand Pose Recognition by a Data Glove
471(4)
15.4 Hand Pose Color-Based Recognition
475(10)
15.4.1 Segmentation Module
476(2)
15.4.2 Feature Extraction
478(1)
15.4.3 The Classifier
479(1)
15.4.4 Experimental Results
480(3)
References
483(2)
16 Automatic Personality Perception
485(16)
16.1 Introduction
485(1)
16.2 Previous Work
486(2)
16.2.1 Nonverbal Behaviour
487(1)
16.2.2 Social Media
488(1)
16.3 Personality and Its Measurement
488(2)
16.4 Speech-Based Automatic Personality Perception
490(4)
16.4.1 The SSPNet Speaker Personality Corpus
491(1)
16.4.2 The Approach
492(1)
16.4.3 Extraction of Short-Term Features
492(1)
16.4.4 Extraction of Statisticals
493(1)
16.4.5 Prediction
493(1)
16.5 Experiments and Results
494(2)
16.6 Conclusions
496(5)
References
497(4)
Part IV Appendices
Appendix A Statistics 501(12)
Appendix B Signal Processing 513(12)
Appendix C Matrix Algebra 525(6)
Appendix D Mathematical Foundations of Kernel Methods 531(20)
Index 551