An Introduction to Audio Content Analysis: Music Information Retrieval Tasks and Applications, 2nd edition [Hardback]

3.00/5 (2 ratings from Goodreads)

Alexander Lerch

Format: Hardback, 464 pages, weight: 1211 g
Publication date: 08-Nov-2022
Publisher: Wiley-IEEE Press
ISBN-10: 1119890942
ISBN-13: 9781119890942

Price: 140,50 €
Delivery from the publisher takes approximately 2-4 weeks

Permalink: https://www.kriso.ee/db/9781119890942.html

An Introduction to Audio Content Analysis

Enables readers to understand the algorithmic analysis of musical audio signals with AI-driven approaches

An Introduction to Audio Content Analysis serves as a comprehensive guide to audio content analysis, explaining how signal processing and machine learning approaches can be used to extract musical content from audio. It gives readers the algorithmic understanding needed to teach a computer to interpret music signals, and thus to design tools for interacting with music. The work ties together topics from audio signal processing and machine learning, showing how to use audio content analysis to pick up musical characteristics automatically. It presents a multitude of audio content analysis tasks related to the extraction of tonal, temporal, timbral, and intensity-related characteristics of the music signal. Each task is introduced from both a musical and a technical perspective, detailing the algorithmic approach and providing practical guidance on implementation details and evaluation.

To aid reader comprehension, each task description begins with a short introduction to the most important musical and perceptual characteristics of the covered topic, continues with a detailed algorithmic model and its evaluation, and concludes with questions and exercises. For the interested reader, updated supplemental materials are provided via an accompanying website.

Sample topics covered in An Introduction to Audio Content Analysis, written by a well-known expert in the music industry, include:

Digital audio signals and their representation, common time-frequency transforms, audio features
Pitch and fundamental frequency detection, key detection, and chord recognition
Representation of dynamics in music and intensity-related features
Onset and tempo detection, beat histograms, detection of structure in music, and sequence alignment
Audio fingerprinting, musical genre, mood, and instrument classification

An invaluable guide for newcomers to audio signal processing and industry experts alike, An Introduction to Audio Content Analysis covers a wide range of introductory topics in music information retrieval and machine listening, allowing students and researchers to quickly gain core holistic knowledge in audio analysis and to dig deeper into specific aspects of the field with the help of a large number of references.

Author Biography
Preface
Acronyms
List of Symbols
Source Code Repositories

1 Introduction
1.1 A Short History of Audio Content Analysis
1.2 Applications and Use Cases
1.2.1 Music Browsing and Music Discovery
1.2.2 Music Consumption
1.2.3 Music Production
1.2.4 Music Education
1.2.5 Generative Music
References

Part I Fundamentals of Audio Content Analysis

2 Analysis of Audio Signals
2.1 Audio Content
2.2 Audio Content Analysis Process
2.3 Exercises
2.3.1 Questions
References

3 Input Representation
3.1 Audio Signals
3.1.1 Periodic Signals
3.1.2 Random Signals
3.1.3 Statistical Signal Description
3.1.3.1 Arithmetic Mean
3.1.3.2 Geometric Mean
3.1.3.3 Harmonic Mean
3.1.3.4 Variance and Standard Deviation
3.1.3.5 Quantiles and Quantile Ranges
3.1.4 Digital Audio Signals
3.2 Audio Preprocessing
3.2.1 Down-Mixing
3.2.2 DC Removal
3.2.3 Normalization
3.2.4 Sample Rate Conversion
3.2.5 Block-Based Processing
3.2.6 Other Preprocessing Options
3.3 Time-Frequency Representations
3.3.1 Fourier Transform
3.3.2 Constant Q Transform
3.3.3 Log-Mel Spectrogram
3.3.4 Filterbanks
3.4 Other Input Representations
3.5 Instantaneous Features
3.5.1 Spectral Centroid
3.5.2 Spectral Spread
3.5.3 Spectral Skewness and Spectral Kurtosis
3.5.4 Spectral Rolloff
3.5.5 Spectral Decrease
3.5.6 Spectral Slope
3.5.7 Mel Frequency Cepstral Coefficients
3.5.8 Spectral Flux
3.5.9 Spectral Crest Factor
3.5.10 Spectral Flatness
3.5.11 Tonal Power Ratio
3.5.12 Maximum of Autocorrelation Function
3.5.13 Zero Crossing Rate
3.6 Learned Features
3.7 Feature Postprocessing
3.7.1 Derived Features
3.7.2 Feature Aggregation
3.7.3 Normalization and Mapping
3.7.4 Feature Dimensionality Reduction
3.7.4.1 Feature Subset Selection
3.7.4.2 Feature Space Transformation
3.8 Exercises
3.8.1 Questions
3.8.2 Assignments
References

4 Inference
4.1 Classification
4.2 Regression
4.3 Clustering
4.4 Distance and Similarity
4.5 Underfitting and Overfitting
4.6 Exercises
4.6.1 Questions
4.6.2 Assignments
References

5 Data
5.1 Data Split
5.1.1 N-Fold Cross Validation
5.2 Training Data Augmentation
5.3 Utilization of Data From Related Tasks
5.4 Reducing Accuracy Requirements for Data Annotation
5.5 Semi-, Self-, and Unsupervised Learning
5.6 Exercises
5.6.1 Questions
5.6.2 Assignments
References

6 Evaluation
6.1 Metrics
6.1.1 Classification
6.1.2 Regression
6.1.3 Clustering
6.2 Exercises
6.2.1 Questions
References

Part II Music Transcription

7 Tonal Analysis
7.1 Human Perception of Pitch
7.1.1 Pitch Scales
7.1.2 Chroma Perception
7.2 Representation of Pitch in Music
7.2.1 Pitch Classes and Names
7.2.2 Intervals
7.2.3 The Frequency of Musical Pitch
7.2.3.1 Temperament
7.2.3.2 Intonation
7.3 Fundamental Frequency Detection
7.3.1 Detection Accuracy
7.3.1.1 Time Domain
7.3.1.2 Frequency Domain
7.3.1.3 Potential Solutions
7.3.2 Preprocessing
7.3.3 Monophonic Input Signals
7.3.3.1 Zero Crossing Rate
7.3.3.2 Autocorrelation Function
7.3.3.3 Average Magnitude Difference Function
7.3.3.4 Harmonic Product Spectrum and Harmonic Sum Spectrum
7.3.3.5 Autocorrelation Function of the Magnitude Spectrum
7.3.3.6 Cepstral Pitch Detection
7.3.3.7 Maximum Likelihood and Template Matching
7.3.3.8 Auditory-Motivated Pitch Tracking
7.3.4 Polyphonic Input Signals
7.3.4.1 Iterative Subtraction
7.3.4.2 Nonnegative Matrix Factorization
7.3.4.3 Other Approaches
7.3.5 Evaluation
7.3.5.1 Metrics
7.3.5.2 Datasets
7.3.5.3 Results
7.4 Tuning Frequency Estimation
7.4.1 Approaches to Tuning Frequency Estimation
7.4.2 Evaluation
7.5 Key Detection
7.5.1 Pitch Chroma
7.5.1.1 Pitch Chroma Properties
7.5.1.2 Features Derived from the Pitch Chroma
7.5.2 Approaches to Key Detection
7.5.2.1 Key Profiles
7.5.2.2 Similarity Measure between Template and Extracted Vector
7.5.3 Evaluation
7.5.3.1 Metrics
7.5.3.2 Datasets
7.5.3.3 Results
7.6 Chord Recognition
7.6.1 Approaches to Chord Recognition
7.6.2 Viterbi Algorithm
7.6.3 Evaluation
7.6.3.1 Metrics
7.6.3.2 Datasets
7.6.3.3 Results
7.7 Exercises
7.7.1 Questions
7.7.2 Assignments
References

8 Intensity
8.1 Human Perception of Intensity and Loudness
8.2 Representation of Dynamics in Music
8.3 Features
8.3.1 Root Mean Square
8.3.2 Weighted Root Mean Square
8.3.3 Peak Envelope
8.3.4 Psycho-Acoustic Loudness Features
8.4 Exercises
8.4.1 Questions
8.4.2 Assignments
References

9 Temporal Analysis
9.1 Human Perception of Temporal Events
9.1.1 Onsets
9.1.2 Tempo and Meter
9.1.3 Rhythm
9.1.4 Timing
9.2 Representation of Temporal Events in Music
9.2.1 Tempo and Time Signature
9.2.2 Note Value
9.3 Onset Detection
9.3.1 Novelty Function
9.3.2 Peak Picking
9.3.3 Evaluation
9.3.3.1 Metrics
9.3.3.2 Datasets
9.3.3.3 Results
9.4 Beat Histogram
9.4.1 Beat Histogram Features
9.5 Detection of Tempo and Beat Phase
9.5.1 Evaluation
9.5.1.1 Metrics
9.5.1.2 Datasets
9.5.1.3 Results
9.6 Detection of Meter and Downbeat
9.7 Structure Detection
9.7.1 Self-Similarity Matrix
9.7.2 Approaches to Structure Detection
9.7.2.1 Novelty Analysis
9.7.2.2 Homogeneity Analysis
9.7.2.3 Repetition Analysis
9.7.3 Evaluation
9.7.3.1 Metrics
9.7.3.2 Datasets
9.7.3.3 Results
9.8 Automatic Drum Transcription
9.8.1 Transcription of Drum Onsets
9.8.2 Evaluation
9.9 Exercises
9.9.1 Questions
9.9.2 Assignments
References

10 Alignment
10.1 Dynamic Time Warping
10.1.1 Example
10.1.2 Common Variants
10.1.3 Optimizations
10.2 Audio-to-Audio Alignment
10.3 Audio-to-Score Alignment
10.3.1 Real-Time Systems
10.3.2 Non-Real-Time Systems
10.4 Evaluation
10.4.1 Metrics
10.4.2 Data
10.5 Exercises
10.5.1 Questions
10.5.2 Assignments
References

Part III Music Identification, Classification, and Assessment

11 Audio Fingerprinting
11.1 Fingerprint Extraction
11.2 Fingerprint Matching
11.3 Fingerprinting System: Example
11.4 Evaluation
References

12 Music Similarity Detection and Music Genre Classification
12.1 Music Similarity Detection
12.1.1 Approaches to Music Similarity Computation
12.1.2 Evaluation
12.2 Musical Genre Classification
12.2.1 Approaches to Musical Genre Classification
12.2.2 Genre Classification: Example
12.2.3 Evaluation
12.2.3.1 Metrics
12.2.3.2 Data
12.2.3.3 Results
12.2.4 Exercises
12.2.5 Questions
12.2.6 Assignments
References

13 Mood Recognition
13.1 Approaches to Mood Recognition
13.2 Evaluation
References

14 Musical Instrument Recognition
14.1 Evaluation
References

15 Music Performance Assessment
15.1 Music Performance
15.2 Music Performance Analysis
15.3 Approaches to Music Performance Assessment
References

Part IV Appendices

Appendix A Fundamentals
A.1 Sampling and Quantization
A.1.1 Sampling
A.1.2 Quantization
A.2 Convolution
A.2.1 Identity
A.2.2 Commutativity
A.2.3 Associativity
A.2.4 Distributivity
A.2.5 Circularity
A.2.6 Simple Filter Examples
A.2.6.1 Moving Average Filter
A.2.6.2 Single-Pole Low-Pass Filter
A.2.7 Zero-Phase Filtering with IIRs
A.3 Correlation Function
A.3.1 Normalization
A.3.2 Autocorrelation Function
A.3.3 Applications
A.3.4 Calculation in the Frequency Domain
A.3.4.1 Frequency Domain Compression
References

Appendix B Fourier Transform
B.1 Properties of the Fourier Transformation
B.1.1 Inverse Fourier Transform
B.1.2 Superposition
B.1.3 Convolution and Multiplication
B.1.4 Parseval's Theorem
B.1.5 Time and Frequency Shift
B.1.6 Symmetry
B.1.7 Time and Frequency Scaling
B.1.8 Derivatives
B.2 Spectrum of Example Time Domain Signals
B.2.1 Delta Function
B.2.2 Constant
B.2.3 Cosine
B.2.4 Rectangular Window
B.2.5 Delta Pulse
B.3 Transformation of Sampled Time Signals
B.4 Short Time Fourier Transform of Continuous Signals
B.4.1 Window Functions
B.4.1.1 Rectangular Window
B.4.1.2 Bartlett Window
B.4.1.3 Generalized Superposed Cosines
B.4.1.4 Generalized Power of Cosine
B.5 Discrete Fourier Transform
B.5.1 Window Functions
B.5.1.1 Discrete Window Properties
B.5.2 Fast Fourier Transform
B.6 Frequency Reassignment: Instantaneous Frequency
References

Appendix C Principal Component Analysis
C.1 Computation of the Transformation Matrix
C.2 Interpretation of the Transformation Matrix

Appendix D Linear Regression

Appendix E Software for Audio Analysis
E.1 Frameworks and Libraries
E.1.1 librosa
E.1.2 Essentia
E.1.3 openSMILE
E.1.4 Marsyas
E.1.5 jMIR
E.1.6 MIRtoolbox
E.1.7 Yaafe
E.1.8 madmom
E.1.9 Software for Education
E.1.10 Other Software
E.2 Data Annotation and Visualization
References

Appendix F Datasets
References

Index

Alexander Lerch, PhD, is an Associate Professor at the Center for Music Technology, Georgia Institute of Technology. His research focuses on signal processing and machine learning applied to music, an interdisciplinary field commonly referred to as music information retrieval. He has authored more than 50 peer-reviewed publications, and his website, www.AudioContentAnalysis.org, is a popular resource on audio content analysis, providing video lectures, code examples, and other materials.
