E-book: Machine Learning for Audio, Image and Video Analysis: Theory and Applications

  • Format: PDF+DRM
  • Price: €67.91*
  • * the price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID (more information here). The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books; not to be confused with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

This second edition focuses on audio, image and video data, the three main types of input that machines deal with when interacting with the real world. A set of appendices provides the reader with self-contained introductions to the mathematical background necessary to read the book.
The book is divided into three main parts. The first, From Perception to Computation, introduces methodologies for representing data in forms suitable for computer processing, especially audio and images. The second part, Machine Learning, provides an extensive overview of statistical techniques aimed at three main problems: classification (automatically assigning a data sample to one of the classes belonging to a predefined set), clustering (automatically grouping data samples according to the similarity of their properties) and sequence analysis (automatically mapping a sequence of observations into a sequence of human-understandable symbols). The third part, Applications, shows how the abstract problems defined in the second part underlie technologies capable of performing complex tasks such as the recognition of hand gestures or the transcription of handwritten data.

Machine Learning for Audio, Image and Video Analysis is suitable for students to acquire a solid background in machine learning as well as for practitioners to deepen their knowledge of the state-of-the-art. All application chapters are based on publicly available data and free software packages, thus allowing readers to replicate the experiments.
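The first two problem families described in the blurb, classification and clustering, can be illustrated with a minimal sketch in plain Python. This is not code from the book; the toy 2-D data, the nearest-centroid rule and the single k-means pass are illustrative assumptions.

```python
def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

# Classification: assign a sample to the class (from a predefined set)
# whose centroid, estimated from labelled examples, is nearest.
classes = {"low": [(0, 0), (1, 0), (0, 1)], "high": [(9, 9), (8, 9), (9, 8)]}
cents = {label: centroid(pts) for label, pts in classes.items()}

def classify(sample):
    return min(cents, key=lambda label: dist2(sample, cents[label]))

# Clustering: group unlabelled samples by similarity (one k-means pass:
# assign each point to its nearest seed, then recompute the centroids).
data = [(0, 1), (1, 1), (8, 8), (9, 7)]
seeds = [(0, 0), (9, 9)]
groups = [[] for _ in seeds]
for p in data:
    nearest = min(range(len(seeds)), key=lambda i: dist2(p, seeds[i]))
    groups[nearest].append(p)
new_seeds = [centroid(g) for g in groups]

print(classify((1, 2)))   # -> low
print(new_seeds)          # -> [(0.5, 1.0), (8.5, 7.5)]
```

Sequence analysis, the third family, would additionally model the order of observations (e.g. with the Markovian models of Chapter 10) rather than treating each sample independently.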

Reviews

This nice book of over 560 pages is really useful for students, researchers, practitioners, and anybody who is interested in machine learning and related subjects. (Michael M. Dediu, Mathematical Reviews, May, 2017)

1 Introduction
1(12)
1.1 Two Fundamental Questions
1(3)
1.1.1 Why Should One Read the Book?
1(1)
1.1.2 What Is the Book About?
2(2)
1.2 The Structure of the Book
4(4)
1.2.1 Part I: From Perception to Computation
4(1)
1.2.2 Part II: Machine Learning
5(1)
1.2.3 Part III: Applications
6(1)
1.2.4 Appendices
7(1)
1.3 How to Read This Book
8(1)
1.3.1 Background and Learning Objectives
8(1)
1.3.2 Difficulty Level
8(1)
1.3.3 Problems
9(1)
1.3.4 Software
9(1)
1.4 Reading Tracks
9(4)
Part I From Perception to Computation
2 Audio Acquisition, Representation and Storage
13(44)
2.1 Introduction
13(2)
2.2 Sound Physics, Production and Perception
15(7)
2.2.1 Acoustic Waves Physics
15(3)
2.2.2 Speech Production
18(2)
2.2.3 Sound Perception
20(2)
2.3 Audio Acquisition
22(10)
2.3.1 Sampling and Aliasing
23(2)
2.3.2 The Sampling Theorem**
25(3)
2.3.3 Linear Quantization
28(2)
2.3.4 Nonuniform Scalar Quantization
30(2)
2.4 Audio Encoding and Storage Formats
32(6)
2.4.1 Linear PCM and Compact Discs
33(1)
2.4.2 MPEG Digital Audio Coding
34(1)
2.4.3 AAC Digital Audio Coding
35(1)
2.4.4 Perceptual Coding
36(2)
2.5 Time-Domain Audio Processing
38(9)
2.5.1 Linear and Time-Invariant Systems
39(1)
2.5.2 Short-Term Analysis
40(3)
2.5.3 Time-Domain Measures
43(4)
2.6 Linear Predictive Coding
47(5)
2.6.1 Parameter Estimation
50(2)
2.7 Conclusions
52(5)
Problems
52(1)
References
53(4)
3 Image and Video Acquisition, Representation and Storage
57(42)
3.1 Introduction
57(1)
3.2 Human Eye Physiology
58(2)
3.2.1 Structure of the Human Eye
58(2)
3.3 Image Acquisition Devices
60(3)
3.3.1 Digital Camera
60(3)
3.4 Color Representation
63(13)
3.4.1 Human Color Perception
63(1)
3.4.2 Color Models
64(12)
3.5 Image Formats
76(5)
3.5.1 Image File Format Standards
76(1)
3.5.2 JPEG Standard
77(4)
3.6 Image Descriptors
81(7)
3.6.1 Global Image Descriptors
81(4)
3.6.2 SIFT Descriptors
85(3)
3.7 Video Principles
88(1)
3.8 MPEG Standard
89(4)
3.8.1 Further MPEG Standards
90(3)
3.9 Conclusions
93(6)
Problems
93(2)
References
95(4)
Part II Machine Learning
4 Machine Learning
99(8)
4.1 Introduction
99(1)
4.2 Taxonomy of Machine Learning
100(1)
4.2.1 Rote Learning
100(1)
4.2.2 Learning from Instruction
101(1)
4.2.3 Learning by Analogy
101(1)
4.3 Learning from Examples
101(4)
4.3.1 Supervised Learning
102(1)
4.3.2 Reinforcement Learning
103(1)
4.3.3 Unsupervised Learning
103(1)
4.3.4 Semi-supervised Learning
104(1)
4.4 Conclusions
105(2)
References
105(2)
5 Bayesian Theory of Decision
107(24)
5.1 Introduction
107(1)
5.2 Bayes Decision Rule
108(2)
5.3 Bayes Classifier*
110(2)
5.4 Loss Function
112(3)
5.4.1 Binary Classification
114(1)
5.5 Zero-One Loss Function
115(1)
5.6 Discriminant Functions
116(2)
5.6.1 Binary Classification Case
117(1)
5.7 Gaussian Density
118(4)
5.7.1 Univariate Gaussian Density
118(1)
5.7.2 Multivariate Gaussian Density
119(1)
5.7.3 Whitening Transformation
120(2)
5.8 Discriminant Functions for Gaussian Likelihood
122(3)
5.8.1 Features Are Statistically Independent
122(1)
5.8.2 Covariance Matrix is the Same for all Classes
123(2)
5.8.3 Covariance Matrix is Not the Same for all Classes
125(1)
5.9 Receiver Operating Curves
125(2)
5.10 Conclusions
127(4)
Problems
128(1)
References
129(2)
6 Clustering Methods
131(38)
6.1 Introduction
131(2)
6.2 Expectation and Maximization Algorithm*
133(3)
6.2.1 Basic EM*
134(2)
6.3 Basic Notions and Terminology
136(5)
6.3.1 Codebooks and Codevectors
136(1)
6.3.2 Quantization Error Minimization
137(1)
6.3.3 Entropy Maximization
138(1)
6.3.4 Vector Quantization
139(2)
6.4 K-Means
141(5)
6.4.1 Batch K-Means
142(1)
6.4.2 Online K-Means
143(3)
6.4.3 K-Means Software Packages
146(1)
6.5 Self-Organizing Maps
146(3)
6.5.1 SOM Software Packages
148(1)
6.5.2 SOM Drawbacks
148(1)
6.6 Neural Gas and Topology Representing Network
149(2)
6.6.1 Neural Gas
149(1)
6.6.2 Topology Representing Network
150(1)
6.6.3 Neural Gas and TRN Software Package
151(1)
6.6.4 Neural Gas and TRN Drawbacks
151(1)
6.7 General Topographic Mapping*
151(4)
6.7.1 Latent Variables*
152(1)
6.7.2 Optimization by EM Algorithm*
153(1)
6.7.3 GTM Versus SOM*
154(1)
6.7.4 GTM Software Package
155(1)
6.8 Fuzzy Clustering Algorithms
155(2)
6.8.1 FCM
156(1)
6.9 Hierarchical Clustering
157(2)
6.10 Mixtures of Gaussians
159(4)
6.10.1 The E-Step
160(1)
6.10.2 The M-Step
161(2)
6.11 Conclusion
163(6)
Problems
164(1)
References
165(4)
7 Foundations of Statistical Learning and Model Selection
169(22)
7.1 Introduction
169(1)
7.2 Bias-Variance Dilemma
170(3)
7.2.1 Bias-Variance Dilemma for Regression
170(1)
7.2.2 Bias-Variance Decomposition for Classification*
171(2)
7.3 Model Complexity
173(3)
7.4 VC Dimension and Structural Risk Minimization
176(3)
7.5 Statistical Learning Theory*
179(3)
7.5.1 Vapnik-Chervonenkis Theory
180(2)
7.6 AIC and BIC Criteria
182(2)
7.6.1 Akaike Information Criterion
182(1)
7.6.2 Bayesian Information Criterion
183(1)
7.7 Minimum Description Length Approach
184(2)
7.8 Crossvalidation
186(2)
7.8.1 Generalized Crossvalidation
186(2)
7.9 Conclusion
188(3)
Problems
188(1)
References
189(2)
8 Supervised Neural Networks and Ensemble Methods
191(38)
8.1 Introduction
191(1)
8.2 Artificial Neural Networks and Neural Computation
192(1)
8.3 Artificial Neurons
193(3)
8.4 Connections and Network Architectures
196(2)
8.5 Single-Layer Networks
198(5)
8.5.1 Linear Discriminant Functions and Single-Layer Networks
199(1)
8.5.2 Linear Discriminants and the Logistic Sigmoid
200(1)
8.5.3 Generalized Linear Discriminants and the Perceptron
201(2)
8.6 Multilayer Networks
203(2)
8.6.1 The Multilayer Perceptron
204(1)
8.7 Multilayer Networks Training
205(7)
8.7.1 Error Back-Propagation for Feed-Forwards Networks*
206(2)
8.7.2 Parameter Update: The Error Surface
208(2)
8.7.3 Parameters Update: The Gradient Descent*
210(2)
8.7.4 The Torch Package
212(1)
8.8 Learning Vector Quantization
212(3)
8.8.1 The LVQ_PAK Software Package
214(1)
8.9 Nearest Neighbour Classification
215(2)
8.9.1 Probabilistic Interpretation
217(1)
8.10 Ensemble Methods
217(7)
8.10.1 Classifier Diversity and Ensemble Performance*
218(2)
8.10.2 Creating Ensemble of Diverse Classifiers
220(4)
8.11 Conclusions
224(5)
Problems
224(1)
References
225(4)
9 Kernel Methods
229(66)
9.1 Introduction
229(2)
9.2 Lagrange Method and Kuhn Tucker Theorem
231(4)
9.2.1 Lagrange Multipliers Method
231(2)
9.2.2 Kuhn Tucker Theorem
233(2)
9.3 Support Vector Machines for Classification
235(12)
9.3.1 Optimal Hyperplane Algorithm
236(2)
9.3.2 Support Vector Machine Construction
238(3)
9.3.3 Algorithmic Approaches to Solve Quadratic Programming
241(1)
9.3.4 Sequential Minimal Optimization
242(2)
9.3.5 Other Optimization Algorithms
244(1)
9.3.6 SVM and Regularization Methods*
244(3)
9.4 Multiclass Support Vector Machines
247(1)
9.4.1 One-Versus-Rest Method
247(1)
9.4.2 One-Versus-One Method
247(1)
9.4.3 Other Methods
248(1)
9.5 Support Vector Machines for Regression
248(8)
9.5.1 Regression with Quadratic ε-Insensitive Loss
249(3)
9.5.2 Kernel Ridge Regression
252(2)
9.5.3 Regression with Linear ε-Insensitive Loss
254(2)
9.5.4 Other Approaches to Support Vector Regression
256(1)
9.6 Gaussian Processes
256(2)
9.6.1 Regression with Gaussian Processes
257(1)
9.7 Kernel Fisher Discriminant
258(4)
9.7.1 Fisher's Linear Discriminant
258(2)
9.7.2 Fisher Discriminant in Feature Space
260(2)
9.8 Kernel PCA
262(2)
9.8.1 Centering in Feature Space
262(2)
9.9 One-Class SVM
264(5)
9.9.1 One-Class SVM Optimization
267(2)
9.10 Kernel Clustering Methods
269(9)
9.10.1 Kernel K-Means
270(2)
9.10.2 Kernel SOM
272(1)
9.10.3 Kernel Neural Gas
272(1)
9.10.4 One-Class SVM Extensions
273(1)
9.10.5 Kernel Fuzzy Clustering Methods
274(4)
9.11 Spectral Clustering
278(9)
9.11.1 Shi and Malik Algorithm
280(1)
9.11.2 Ng-Jordan-Weiss' Algorithm
281(1)
9.11.3 Other Methods
282(1)
9.11.4 Connection Between Spectral and Kernel Clustering Methods
283(4)
9.12 Software Packages
287(1)
9.13 Conclusion
287(8)
Problems
288(1)
References
289(6)
10 Markovian Models for Sequential Data
295(46)
10.1 Introduction
295(1)
10.2 Hidden Markov Models
296(4)
10.2.1 Emission Probability Functions
300(1)
10.3 The Three Problems
300(1)
10.4 The Likelihood Problem and the Trellis**
301(3)
10.5 The Decoding Problem**
304(4)
10.6 The Learning Problem**
308(7)
10.6.1 Parameter Initialization
309(1)
10.6.2 Estimation of the Initial State Probabilities
310(1)
10.6.3 Estimation of the Transition Probabilities
311(1)
10.6.4 Emission Probability Function Parameters Estimation
312(3)
10.7 HMM Variants
315(2)
10.8 Linear-Chain Conditional Random Fields
317(6)
10.8.1 From HMMs to Linear-Chain CRFs
319(2)
10.8.2 General CRFs
321(1)
10.8.3 The Three Problems
322(1)
10.9 The Inference Problem for Linear Chain CRFs
323(1)
10.10 The Training Problem for Linear Chain CRFs
323(2)
10.11 N-gram Models and Statistical Language Modeling
325(5)
10.11.1 N-gram Models
325(1)
10.11.2 The Perplexity
326(1)
10.11.3 N-grams Parameter Estimation
327(1)
10.11.4 The Sparseness Problem and the Language Case
328(2)
10.12 Discounting and Smoothing Methods for N-gram Models**
330(6)
10.12.1 The Leaving-One-Out Method
331(2)
10.12.2 The Turing Good Estimates
333(1)
10.12.3 Katz's Discounting Model
334(2)
10.13 Building a Language Model with N-grams
336(5)
Problems
337(1)
References
338(3)
11 Feature Extraction Methods and Manifold Learning Methods
341(48)
11.1 Introduction
341(2)
11.2 The Curse of Dimensionality*
343(1)
11.3 Data Dimensionality
344(13)
11.3.1 Local Methods
345(2)
11.3.2 Global Methods
347(8)
11.3.3 Mixed Methods
355(2)
11.4 Principal Component Analysis
357(5)
11.4.1 PCA as ID Estimator
359(2)
11.4.2 Nonlinear Principal Component Analysis
361(1)
11.5 Independent Component Analysis
362(8)
11.5.1 Statistical Independence
363(1)
11.5.2 ICA Estimation
364(3)
11.5.3 ICA by Mutual Information Minimization
367(2)
11.5.4 FastICA Algorithm
369(1)
11.6 Multidimensional Scaling Methods
370(2)
11.6.1 Sammon's Mapping
371(1)
11.7 Manifold Learning
372(7)
11.7.1 The Manifold Learning Problem
372(2)
11.7.2 Isomap
374(1)
11.7.3 Locally Linear Embedding
375(3)
11.7.4 Laplacian Eigenmaps
378(1)
11.8 Conclusion
379(10)
Problems
379(2)
References
381(8)
Part III Applications
12 Speech and Handwriting Recognition
389(32)
12.1 Introduction
389(1)
12.2 The General Approach
390(2)
12.3 The Front End
392(5)
12.3.1 The Handwriting Front End
393(1)
12.3.2 The Speech Front End
394(3)
12.4 HMM Training
397(3)
12.4.1 Lexicon and Training Set
397(1)
12.4.2 Hidden Markov Models Training
398(2)
12.5 Recognition and Performance Measures
400(3)
12.5.1 Recognition
400(1)
12.5.2 Performance Measurement
401(2)
12.6 Recognition Experiments
403(6)
12.6.1 Lexicon Selection
404(1)
12.6.2 N-gram Model Performance
405(2)
12.6.3 Cambridge Database Results
407(1)
12.6.4 IAM Database Results
408(1)
12.7 Speech Recognition Results
409(2)
12.8 Applications
411(10)
12.8.1 Applications of Handwriting Recognition
411(2)
12.8.2 Applications of Speech Recognition
413(2)
References
415(6)
13 Automatic Face Recognition
421(28)
13.1 Introduction
421(2)
13.2 Face Recognition: General Approach
423(1)
13.3 Face Detection and Localization
424(4)
13.3.1 Face Segmentation and Normalization with TorchVision
426(2)
13.4 Lighting Normalization
428(2)
13.4.1 Center/Surround Retinex
428(1)
13.4.2 Gross and Brajovic's Algorithm
429(1)
13.4.3 Normalization with TorchVision
429(1)
13.5 Feature Extraction
430(7)
13.5.1 Holistic Approaches
430(4)
13.5.2 Local Approaches
434(1)
13.5.3 Feature Extraction with TorchVision
434(3)
13.6 Classification
437(2)
13.7 Performance Assessment
439(3)
13.7.1 The FERET Database
440(1)
13.7.2 The FRVT Database
441(1)
13.8 Experiments
442(7)
13.8.1 Data and Experimental Protocol
443(1)
13.8.2 Euclidean Distance-Based Classifier
443(2)
13.8.3 SVM-Based Classification
445(1)
References
445(4)
14 Video Segmentation and Keyframe Extraction
449(18)
14.1 Introduction
449(2)
14.2 Applications of Video Segmentation
451(1)
14.3 Shot Boundary Detection
452(6)
14.3.1 Pixel-Based Approaches
453(2)
14.3.2 Block-Based Approaches
455(1)
14.3.3 Histogram-Based Approaches
455(1)
14.3.4 Clustering-Based Approaches
456(1)
14.3.5 Performance Measures
457(1)
14.4 Shot Boundary Detection with Torchvision
458(2)
14.5 Keyframe Extraction
460(2)
14.6 Keyframe Extraction with Torchvision and Torch
462(5)
References
463(4)
15 Real-Time Hand Pose Recognition
467(18)
15.1 Introduction
467(1)
15.2 Hand Pose Recognition Methods
468(3)
15.3 Hand Pose Recognition by a Data Glove
471(4)
15.4 Hand Pose Color-Based Recognition
475(10)
15.4.1 Segmentation Module
476(2)
15.4.2 Feature Extraction
478(1)
15.4.3 The Classifier
479(1)
15.4.4 Experimental Results
480(3)
References
483(2)
16 Automatic Personality Perception
485(16)
16.1 Introduction
485(1)
16.2 Previous Work
486(2)
16.2.1 Nonverbal Behaviour
487(1)
16.2.2 Social Media
488(1)
16.3 Personality and Its Measurement
488(2)
16.4 Speech-Based Automatic Personality Perception
490(4)
16.4.1 The SSPNet Speaker Personality Corpus
491(1)
16.4.2 The Approach
492(1)
16.4.3 Extraction of Short-Term Features
492(1)
16.4.4 Extraction of Statisticals
493(1)
16.4.5 Prediction
493(1)
16.5 Experiments and Results
494(2)
16.6 Conclusions
496(5)
References
497(4)
Part IV Appendices
Appendix A Statistics 501(12)
Appendix B Signal Processing 513(12)
Appendix C Matrix Algebra 525(6)
Appendix D Mathematical Foundations of Kernel Methods 531(20)
Index 551