Author Biography | xvii | |
Preface | xix | |
Acronyms | xxi | |
| xxv | |
Source Code Repositories | xxix | |
| 1 | (8) |
1.1 A Short History of Audio Content Analysis | 3 | (1) |
1.2 Applications and Use Cases | 4 | (5) |
1.2.1 Music Browsing and Music Discovery | 4 | (1) |
| 4 | (1) |
| 5 | (1) |
| 5 | (1) |
| 6 | (1) |
| 6 | (3) |
Part I Fundamentals of Audio Content Analysis | 9 | (118) |
2 Analysis of Audio Signals | 11 | (6) |
| 11 | (2) |
2.2 Audio Content Analysis Process | 13 | (2) |
| 15 | (2) |
| 15 | (1) |
| 15 | (2) |
| 17 | (74) |
| 17 | (9) |
| 18 | (2) |
| 20 | (1) |
3.1.3 Statistical Signal Description | 20 | (2) |
| 22 | (1) |
| 22 | (1) |
| 23 | (1) |
3.1.3.4 Variance and Standard Deviation | 23 | (1) |
3.1.3.5 Quantiles and Quantile Ranges | 24 | (1) |
3.1.4 Digital Audio Signals | 25 | (1) |
| 26 | (6) |
| 26 | (1) |
| 27 | (1) |
| 28 | (1) |
3.2.4 Sample Rate Conversion | 29 | (1) |
3.2.5 Block-Based Processing | 29 | (3) |
3.2.6 Other Preprocessing Options | 32 | (1) |
3.3 Time-Frequency Representations | 32 | (7) |
| 32 | (3) |
3.3.2 Constant Q Transform | 35 | (1) |
3.3.3 Log-Mel Spectrogram | 36 | (1) |
| 37 | (2) |
3.4 Other Input Representations | 39 | (1) |
3.5 Instantaneous Features | 39 | (26) |
| 43 | (2) |
| 45 | (1) |
3.5.3 Spectral Skewness and Spectral Kurtosis | 46 | (2) |
| 48 | (2) |
| 50 | (1) |
| 50 | (1) |
3.5.7 Mel Frequency Cepstral Coefficients | 51 | (4) |
| 55 | (2) |
3.5.9 Spectral Crest Factor | 57 | (1) |
| 58 | (2) |
| 60 | (2) |
3.5.12 Maximum of Autocorrelation Function | 62 | (2) |
3.5.13 Zero Crossing Rate | 64 | (1) |
| 65 | (2) |
3.7 Feature Postprocessing | 67 | (9) |
| 67 | (1) |
3.7.2 Feature Aggregation | 68 | (1) |
3.7.3 Normalization and Mapping | 69 | (3) |
3.7.4 Feature Dimensionality Reduction | 72 | (1) |
3.7.4.1 Feature Subset Selection | 73 | (2) |
3.7.4.2 Feature Space Transformation | 75 | (1) |
| 76 | (15) |
| 76 | (2) |
| 78 | (3) |
| 81 | (10) |
| 91 | (16) |
| 92 | (5) |
| 97 | (2) |
| 99 | (2) |
4.4 Distance and Similarity | 101 | (1) |
4.5 Underfitting and Overfitting | 102 | (1) |
| 103 | (4) |
| 103 | (1) |
| 103 | (1) |
| 104 | (3) |
| 107 | (12) |
| 109 | (2) |
5.1.1 N-Fold Cross Validation | 110 | (1) |
5.2 Training Data Augmentation | 111 | (2) |
5.3 Utilization of Data From Related Tasks | 113 | (1) |
5.4 Reducing Accuracy Requirements for Data Annotation | 114 | (1) |
5.5 Semi-, Self-, and Unsupervised Learning | 115 | (1) |
| 116 | (3) |
| 116 | (1) |
| 116 | (1) |
| 116 | (3) |
| 119 | (8) |
| 121 | (5) |
| 121 | (3) |
| 124 | (1) |
| 125 | (1) |
| 126 | (1) |
| 126 | (1) |
| 126 | (1) |
Part II Music Transcription | 127 | (176) |
| 129 | (88) |
7.1 Human Perception of Pitch | 129 | (4) |
| 129 | (3) |
| 132 | (1) |
7.2 Representation of Pitch in Music | 133 | (5) |
7.2.1 Pitch Classes and Names | 133 | (1) |
| 134 | (1) |
7.2.3 The Frequency of Musical Pitch | 135 | (1) |
| 136 | (1) |
| 137 | (1) |
7.3 Fundamental Frequency Detection | 138 | (28) |
| 139 | (1) |
| 139 | (1) |
| 140 | (1) |
7.3.1.3 Potential Solutions | 141 | (2) |
| 143 | (1) |
7.3.3 Monophonic Input Signals | 143 | (1) |
7.3.3.1 Zero Crossing Rate | 144 | (1) |
7.3.3.2 Autocorrelation Function | 145 | (2) |
7.3.3.3 Average Magnitude Difference Function | 147 | (2) |
7.3.3.4 Harmonic Product Spectrum and Harmonic Sum Spectrum | 149 | (3) |
7.3.3.5 Autocorrelation Function of the Magnitude Spectrum | 152 | (1) |
7.3.3.6 Cepstral Pitch Detection | 152 | (2) |
7.3.3.7 Maximum Likelihood and Template Matching | 154 | (1) |
7.3.3.8 Auditory-Motivated Pitch Tracking | 155 | (1) |
7.3.4 Polyphonic Input Signals | 156 | (1) |
7.3.4.1 Iterative Subtraction | 157 | (2) |
7.3.4.2 Nonnegative Matrix Factorization | 159 | (4) |
| 163 | (1) |
| 164 | (1) |
| 164 | (2) |
| 166 | (1) |
| 166 | (1) |
7.4 Tuning Frequency Estimation | 166 | (4) |
7.4.1 Approaches to Tuning Frequency Estimation | 168 | (2) |
| 170 | (1) |
| 170 | (15) |
| 173 | (5) |
7.5.1.1 Pitch Chroma Properties | 178 | (1) |
7.5.1.2 Features Derived from the Pitch Chroma | 179 | (1) |
7.5.2 Approaches to Key Detection | 180 | (1) |
| 181 | (1) |
7.5.2.2 Similarity Measure between Template and Extracted Vector | 182 | (1) |
| 183 | (1) |
| 184 | (1) |
| 184 | (1) |
| 184 | (1) |
| 185 | (9) |
7.6.1 Approaches to Chord Recognition | 185 | (3) |
| 188 | (4) |
| 192 | (1) |
| 193 | (1) |
| 193 | (1) |
| 193 | (1) |
| 194 | (23) |
| 194 | (2) |
| 196 | (5) |
| 201 | (16) |
| 217 | (12) |
8.1 Human Perception of Intensity and Loudness | 217 | (2) |
8.2 Representation of Dynamics in Music | 219 | (1) |
| 220 | (5) |
| 220 | (1) |
8.3.2 Weighted Root Mean Square | 221 | (2) |
| 223 | (2) |
8.3.4 Psycho-Acoustic Loudness Features | 225 | (1) |
| 225 | (4) |
| 225 | (1) |
| 226 | (1) |
| 227 | (2) |
| 229 | (52) |
9.1 Human Perception of Temporal Events | 229 | (5) |
| 229 | (3) |
| 232 | (1) |
| 233 | (1) |
| 234 | (1) |
9.2 Representation of Temporal Events in Music | 234 | (2) |
9.2.1 Tempo and Time Signature | 235 | (1) |
| 235 | (1) |
| 236 | (7) |
| 236 | (3) |
| 239 | (2) |
| 241 | (1) |
| 241 | (2) |
| 243 | (1) |
| 243 | (1) |
| 243 | (2) |
9.4.1 Beat Histogram Features | 245 | (1) |
9.5 Detection of Tempo and Beat Phase | 245 | (5) |
| 249 | (1) |
| 249 | (1) |
| 250 | (1) |
| 250 | (1) |
9.6 Detection of Meter and Downbeat | 250 | (2) |
| 252 | (8) |
9.7.1 Self-Similarity Matrix | 253 | (3) |
9.7.2 Approaches to Structure Detection | 256 | (1) |
| 256 | (1) |
9.7.2.2 Homogeneity Analysis | 256 | (1) |
9.7.2.3 Repetition Analysis | 256 | (2) |
| 258 | (1) |
| 259 | (1) |
| 259 | (1) |
| 260 | (1) |
9.8 Automatic Drum Transcription | 260 | (2) |
9.8.1 Transcription of Drum Onsets | 261 | (1) |
| 262 | (1) |
| 262 | (19) |
| 262 | (1) |
| 263 | (3) |
| 266 | (15) |
| 281 | (22) |
10.1 Dynamic Time Warping | 281 | (8) |
| 286 | (1) |
| 287 | (1) |
| 288 | (1) |
10.2 Audio-to-Audio Alignment | 289 | (2) |
10.3 Audio-to-Score Alignment | 291 | (3) |
| 292 | (1) |
10.3.2 Non-Real-Time Systems | 293 | (1) |
| 294 | (2) |
| 294 | (1) |
| 295 | (1) |
| 296 | (7) |
| 296 | (1) |
| 296 | (2) |
| 298 | (5) |
Part III Music Identification, Classification, and Assessment | 303 | (62) |
| 305 | (12) |
11.1 Fingerprint Extraction | 307 | (1) |
11.2 Fingerprint Matching | 308 | (1) |
11.3 Fingerprinting System: Example | 309 | (3) |
| 312 | (5) |
| 312 | (5) |
12 Music Similarity Detection and Music Genre Classification | 317 | (20) |
12.1 Music Similarity Detection | 317 | (2) |
12.1.1 Approaches to Music Similarity Computation | 318 | (1) |
| 319 | (1) |
12.2 Musical Genre Classification | 319 | (18) |
12.2.1 Approaches to Musical Genre Classification | 321 | (3) |
12.2.2 Genre Classification: Example | 324 | (1) |
| 325 | (1) |
| 326 | (1) |
| 326 | (1) |
| 326 | (1) |
| 326 | (1) |
| 326 | (1) |
| 327 | (1) |
| 328 | (9) |
| 337 | (10) |
13.1 Approaches to Mood Recognition | 338 | (3) |
| 341 | (6) |
| 342 | (5) |
14 Musical Instrument Recognition | 347 | (8) |
| 349 | (6) |
| 350 | (5) |
15 Music Performance Assessment | 355 | (10) |
| 355 | (2) |
15.2 Music Performance Analysis | 357 | (1) |
15.3 Approaches to Music Performance Assessment | 358 | (7) |
| 360 | (5) |
| 365 | (54) |
| 367 | (5) |
A.1 Sampling and Quantization | 367 | (1) |
| 367 | (2) |
| 369 | (3) |
| 372 | (6) |
| 372 | (1) |
| 373 | (1) |
| 373 | (1) |
| 374 | (1) |
| 374 | (1) |
A.2.6 Simple Filter Examples | 375 | (1) |
A.2.6.1 Moving Average Filter | 375 | (1) |
A.2.6.2 Single-Pole Low-Pass Filter | 376 | (1) |
A.2.7 Zero-Phase Filtering with IIRs | 377 | (1) |
| 378 | (7) |
| 379 | (1) |
A.3.2 Autocorrelation Function | 380 | (1) |
| 380 | (1) |
A.3.4 Calculation in the Frequency Domain | 381 | (1) |
A.3.4.1 Frequency Domain Compression | 382 | (1) |
| 382 | (3) |
Appendix B Fourier Transform | 385 | (12) |
B.1 Properties of the Fourier Transformation | 386 | (1) |
B.1.1 Inverse Fourier Transform | 386 | (1) |
| 386 | (1) |
B.1.3 Convolution and Multiplication | 386 | (1) |
| 387 | (1) |
B.1.5 Time and Frequency Shift | 388 | (1) |
| 388 | (2) |
B.1.7 Time and Frequency Scaling | 390 | (1) |
| 390 | (1) |
B.2 Spectrum of Example Time Domain Signals | 390 | (1) |
| 390 | (1) |
| 391 | (1) |
| 391 | (1) |
| 391 | (1) |
| 392 | (1) |
B.3 Transformation of Sampled Time Signals | 392 | (1) |
B.4 Short Time Fourier Transform of Continuous Signals | 393 | (2) |
| 395 | (1) |
B.4.1.1 Rectangular Window | 395 | (1) |
| 396 | (1) |
B.4.1.3 Generalized Superposed Cosines | 396 | (1) |
B.4.1.4 Generalized Power of Cosine | 396 | (1) |
B.5 Discrete Fourier Transform | 397 | (8) |
| 398 | (1) |
B.5.1.1 Discrete Window Properties | 398 | (1) |
B.5.2 Fast Fourier Transform | 399 | (1) |
B.6 Frequency Reassignment: Instantaneous Frequency | 399 | (3) |
| 402 | (3) |
Appendix C Principal Component Analysis | 405 | (4) |
C.1 Computation of the Transformation Matrix | 406 | (1) |
C.2 Interpretation of the Transformation Matrix | 407 | (2) |
Appendix D Linear Regression | 409 | (2) |
Appendix E Software for Audio Analysis | 411 | (6) |
E.1 Frameworks and Libraries | 412 | (1) |
| 412 | (1) |
| 412 | (1) |
| 412 | (1) |
| 413 | (1) |
| 413 | (1) |
| 413 | (1) |
| 413 | (1) |
| 413 | (1) |
E.1.9 Software for Education | 414 | (1) |
| 414 | (1) |
E.2 Data Annotation and Visualization | 414 | (1) |
| 415 | (2) |
| 417 | (2) |
References | 419 | (6) |
Index | 425 | |