Introduction to Audio Content Analysis: Music Information Retrieval Tasks and Applications 2nd edition [Hardback]

  • Format: Hardback, 464 pages, weight: 1211 g
  • Publication date: 08-Nov-2022
  • Publisher: Wiley-IEEE Press
  • ISBN-10: 1119890942
  • ISBN-13: 9781119890942
An Introduction to Audio Content Analysis

Enables readers to understand the algorithmic analysis of musical audio signals with AI-driven approaches

An Introduction to Audio Content Analysis serves as a comprehensive guide to audio content analysis, explaining how signal processing and machine learning approaches can be used to extract musical content from audio. It gives readers the algorithmic understanding needed to teach a computer to interpret music signals and thus allows for the design of tools for interacting with music. The work ties together topics from audio signal processing and machine learning, showing how to use audio content analysis to pick up musical characteristics automatically. A multitude of audio content analysis tasks related to the extraction of tonal, temporal, timbral, and intensity-related characteristics of the music signal are presented. Each task is introduced from both a musical and a technical perspective, detailing the algorithmic approach as well as providing practical guidance on implementation details and evaluation.

To aid in reader comprehension, each task description begins with a short introduction to the most important musical and perceptual characteristics of the covered topic, followed by a detailed algorithmic model and its evaluation, and concluded with questions and exercises. For the interested reader, updated supplemental materials are provided via an accompanying website.

Written by a well-known expert in the music industry, An Introduction to Audio Content Analysis covers sample topics including:

  • Digital audio signals and their representation, common time-frequency transforms, audio features
  • Pitch and fundamental frequency detection, key detection, and chord recognition
  • Representation of dynamics in music and intensity-related features
  • Onset and tempo detection, beat histograms, detection of structure in music, and sequence alignment
  • Audio fingerprinting, musical genre, mood, and instrument classification

An invaluable guide for newcomers to audio signal processing and industry experts alike, An Introduction to Audio Content Analysis covers a wide range of introductory topics pertaining to music information retrieval and machine listening, allowing students and researchers to quickly gain core holistic knowledge in audio analysis and dig deeper into specific aspects of the field with the help of a large number of references.

Author Biography
Preface
Acronyms
List of Symbols
Source Code Repositories
1 Introduction
1.1 A Short History of Audio Content Analysis
1.2 Applications and Use Cases
1.2.1 Music Browsing and Music Discovery
1.2.2 Music Consumption
1.2.3 Music Production
1.2.4 Music Education
1.2.5 Generative Music
References
Part I Fundamentals of Audio Content Analysis
2 Analysis of Audio Signals
2.1 Audio Content
2.2 Audio Content Analysis Process
2.3 Exercises
2.3.1 Questions
References
3 Input Representation
3.1 Audio Signals
3.1.1 Periodic Signals
3.1.2 Random Signals
3.1.3 Statistical Signal Description
3.1.3.1 Arithmetic Mean
3.1.3.2 Geometric Mean
3.1.3.3 Harmonic Mean
3.1.3.4 Variance and Standard Deviation
3.1.3.5 Quantiles and Quantile Ranges
3.1.4 Digital Audio Signals
3.2 Audio Preprocessing
3.2.1 Down-Mixing
3.2.2 DC Removal
3.2.3 Normalization
3.2.4 Sample Rate Conversion
3.2.5 Block-Based Processing
3.2.6 Other Preprocessing Options
3.3 Time-Frequency Representations
3.3.1 Fourier Transform
3.3.2 Constant Q Transform
3.3.3 Log-Mel Spectrogram
3.3.4 Filterbanks
3.4 Other Input Representations
3.5 Instantaneous Features
3.5.1 Spectral Centroid
3.5.2 Spectral Spread
3.5.3 Spectral Skewness and Spectral Kurtosis
3.5.4 Spectral Rolloff
3.5.5 Spectral Decrease
3.5.6 Spectral Slope
3.5.7 Mel Frequency Cepstral Coefficients
3.5.8 Spectral Flux
3.5.9 Spectral Crest Factor
3.5.10 Spectral Flatness
3.5.11 Tonal Power Ratio
3.5.12 Maximum of Autocorrelation Function
3.5.13 Zero Crossing Rate
3.6 Learned Features
3.7 Feature Postprocessing
3.7.1 Derived Features
3.7.2 Feature Aggregation
3.7.3 Normalization and Mapping
3.7.4 Feature Dimensionality Reduction
3.7.4.1 Feature Subset Selection
3.7.4.2 Feature Space Transformation
3.8 Exercises
3.8.1 Questions
3.8.2 Assignments
References
4 Inference
4.1 Classification
4.2 Regression
4.3 Clustering
4.4 Distance and Similarity
4.5 Underfitting and Overfitting
4.6 Exercises
4.6.1 Questions
4.6.2 Assignments
References
5 Data
5.1 Data Split
5.1.1 N-Fold Cross Validation
5.2 Training Data Augmentation
5.3 Utilization of Data From Related Tasks
5.4 Reducing Accuracy Requirements for Data Annotation
5.5 Semi-, Self-, and Unsupervised Learning
5.6 Exercises
5.6.1 Questions
5.6.2 Assignments
References
6 Evaluation
6.1 Metrics
6.1.1 Classification
6.1.2 Regression
6.1.3 Clustering
6.2 Exercises
6.2.1 Questions
References
Part II Music Transcription
7 Tonal Analysis
7.1 Human Perception of Pitch
7.1.1 Pitch Scales
7.1.2 Chroma Perception
7.2 Representation of Pitch in Music
7.2.1 Pitch Classes and Names
7.2.2 Intervals
7.2.3 The Frequency of Musical Pitch
7.2.3.1 Temperament
7.2.3.2 Intonation
7.3 Fundamental Frequency Detection
7.3.1 Detection Accuracy
7.3.1.1 Time Domain
7.3.1.2 Frequency Domain
7.3.1.3 Potential Solutions
7.3.2 Preprocessing
7.3.3 Monophonic Input Signals
7.3.3.1 Zero Crossing Rate
7.3.3.2 Autocorrelation Function
7.3.3.3 Average Magnitude Difference Function
7.3.3.4 Harmonic Product Spectrum and Harmonic Sum Spectrum
7.3.3.5 Autocorrelation Function of the Magnitude Spectrum
7.3.3.6 Cepstral Pitch Detection
7.3.3.7 Maximum Likelihood and Template Matching
7.3.3.8 Auditory-Motivated Pitch Tracking
7.3.4 Polyphonic Input Signals
7.3.4.1 Iterative Subtraction
7.3.4.2 Nonnegative Matrix Factorization
7.3.4.3 Other Approaches
7.3.5 Evaluation
7.3.5.1 Metrics
7.3.5.2 Datasets
7.3.5.3 Results
7.4 Tuning Frequency Estimation
7.4.1 Approaches to Tuning Frequency Estimation
7.4.2 Evaluation
7.5 Key Detection
7.5.1 Pitch Chroma
7.5.1.1 Pitch Chroma Properties
7.5.1.2 Features Derived from the Pitch Chroma
7.5.2 Approaches to Key Detection
7.5.2.1 Key Profiles
7.5.2.2 Similarity Measure between Template and Extracted Vector
7.5.3 Evaluation
7.5.3.1 Metrics
7.5.3.2 Datasets
7.5.3.3 Results
7.6 Chord Recognition
7.6.1 Approaches to Chord Recognition
7.6.2 Viterbi Algorithm
7.6.3 Evaluation
7.6.3.1 Metrics
7.6.3.2 Datasets
7.6.3.3 Results
7.7 Exercises
7.7.1 Questions
7.7.2 Assignments
References
8 Intensity
8.1 Human Perception of Intensity and Loudness
8.2 Representation of Dynamics in Music
8.3 Features
8.3.1 Root Mean Square
8.3.2 Weighted Root Mean Square
8.3.3 Peak Envelope
8.3.4 Psycho-Acoustic Loudness Features
8.4 Exercises
8.4.1 Questions
8.4.2 Assignments
References
9 Temporal Analysis
9.1 Human Perception of Temporal Events
9.1.1 Onsets
9.1.2 Tempo and Meter
9.1.3 Rhythm
9.1.4 Timing
9.2 Representation of Temporal Events in Music
9.2.1 Tempo and Time Signature
9.2.2 Note Value
9.3 Onset Detection
9.3.1 Novelty Function
9.3.2 Peak Picking
9.3.3 Evaluation
9.3.3.1 Metrics
9.3.3.2 Datasets
9.3.3.3 Results
9.4 Beat Histogram
9.4.1 Beat Histogram Features
9.5 Detection of Tempo and Beat Phase
9.5.1 Evaluation
9.5.1.1 Metrics
9.5.1.2 Datasets
9.5.1.3 Results
9.6 Detection of Meter and Downbeat
9.7 Structure Detection
9.7.1 Self-Similarity Matrix
9.7.2 Approaches to Structure Detection
9.7.2.1 Novelty Analysis
9.7.2.2 Homogeneity Analysis
9.7.2.3 Repetition Analysis
9.7.3 Evaluation
9.7.3.1 Metrics
9.7.3.2 Datasets
9.7.3.3 Results
9.8 Automatic Drum Transcription
9.8.1 Transcription of Drum Onsets
9.8.2 Evaluation
9.9 Exercises
9.9.1 Questions
9.9.2 Assignments
References
10 Alignment
10.1 Dynamic Time Warping
10.1.1 Example
10.1.2 Common Variants
10.1.3 Optimizations
10.2 Audio-to-Audio Alignment
10.3 Audio-to-Score Alignment
10.3.1 Real-Time Systems
10.3.2 Non-Real-Time Systems
10.4 Evaluation
10.4.1 Metrics
10.4.2 Data
10.5 Exercises
10.5.1 Questions
10.5.2 Assignments
References
Part III Music Identification, Classification, and Assessment
11 Audio Fingerprinting
11.1 Fingerprint Extraction
11.2 Fingerprint Matching
11.3 Fingerprinting System: Example
11.4 Evaluation
References
12 Music Similarity Detection and Music Genre Classification
12.1 Music Similarity Detection
12.1.1 Approaches to Music Similarity Computation
12.1.2 Evaluation
12.2 Musical Genre Classification
12.2.1 Approaches to Musical Genre Classification
12.2.2 Genre Classification: Example
12.2.3 Evaluation
12.2.3.1 Metrics
12.2.3.2 Data
12.2.3.3 Results
12.2.4 Exercises
12.2.5 Questions
12.2.6 Assignments
References
13 Mood Recognition
13.1 Approaches to Mood Recognition
13.2 Evaluation
References
14 Musical Instrument Recognition
14.1 Evaluation
References
15 Music Performance Assessment
15.1 Music Performance
15.2 Music Performance Analysis
15.3 Approaches to Music Performance Assessment
References
Part IV Appendices
Appendix A Fundamentals
A.1 Sampling and Quantization
A.1.1 Sampling
A.1.2 Quantization
A.2 Convolution
A.2.1 Identity
A.2.2 Commutativity
A.2.3 Associativity
A.2.4 Distributivity
A.2.5 Circularity
A.2.6 Simple Filter Examples
A.2.6.1 Moving Average Filter
A.2.6.2 Single-Pole Low-Pass Filter
A.2.7 Zero-Phase Filtering with IIRs
A.3 Correlation Function
A.3.1 Normalization
A.3.2 Autocorrelation Function
A.3.3 Applications
A.3.4 Calculation in the Frequency Domain
A.3.4.1 Frequency Domain Compression
References
Appendix B Fourier Transform
B.1 Properties of the Fourier Transformation
B.1.1 Inverse Fourier Transform
B.1.2 Superposition
B.1.3 Convolution and Multiplication
B.1.4 Parseval's Theorem
B.1.5 Time and Frequency Shift
B.1.6 Symmetry
B.1.7 Time and Frequency Scaling
B.1.8 Derivatives
B.2 Spectrum of Example Time Domain Signals
B.2.1 Delta Function
B.2.2 Constant
B.2.3 Cosine
B.2.4 Rectangular Window
B.2.5 Delta Pulse
B.3 Transformation of Sampled Time Signals
B.4 Short Time Fourier Transform of Continuous Signals
B.4.1 Window Functions
B.4.1.1 Rectangular Window
B.4.1.2 Bartlett Window
B.4.1.3 Generalized Superposed Cosines
B.4.1.4 Generalized Power of Cosine
B.5 Discrete Fourier Transform
B.5.1 Window Functions
B.5.1.1 Discrete Window Properties
B.5.2 Fast Fourier Transform
B.6 Frequency Reassignment: Instantaneous Frequency
References
Appendix C Principal Component Analysis
C.1 Computation of the Transformation Matrix
C.2 Interpretation of the Transformation Matrix
Appendix D Linear Regression
Appendix E Software for Audio Analysis
E.1 Frameworks and Libraries
E.1.1 librosa
E.1.2 Essentia
E.1.3 openSMILE
E.1.4 Marsyas
E.1.5 jMIR
E.1.6 MIRtoolbox
E.1.7 Yaafe
E.1.8 madmom
E.1.9 Software for Education
E.1.10 Other Software
E.2 Data Annotation and Visualization
References
Appendix F Datasets
References
Index
Alexander Lerch, PhD, is an Associate Professor at the Center for Music Technology, Georgia Institute of Technology. His research focuses on signal processing and machine learning applied to music, an interdisciplinary field commonly referred to as music information retrieval. He has authored more than 50 peer-reviewed publications and his website, www.AudioContentAnalysis.org, is a popular resource on Audio Content Analysis, providing video lectures, code examples, and other materials.