|
|
xiii | |
Preface |
|
xv | |
About the Companion Website |
|
xix | |
|
Part I Analysis and Synthesis of Spatial Sound |
|
|
1 | (68) |
|
1 Time-Frequency Processing: Methods and Tools |
|
|
3 | (22) |
|
|
|
|
3 | (1) |
|
1.2 Time-Frequency Processing |
|
|
4 | (12) |
|
|
4 | (1) |
|
1.2.2 Uniform Filter Banks |
|
|
5 | (1) |
|
1.2.3 Prototype Filters and Modulation |
|
|
6 | (2) |
|
1.2.4 A Robust Complex-Modulated Filter Bank, and Comparison with STFT |
|
|
8 | (4) |
|
1.2.5 Overlap-Add and Windowing |
|
|
12 | (1) |
|
1.2.6 Example Implementation of a Robust Filter Bank in Matlab |
|
|
13 | (2) |
|
|
15 | (1) |
|
1.3 Processing of Spatial Audio |
|
|
16 | (9) |
|
1.3.1 Stochastic Estimates |
|
|
17 | (1) |
|
|
18 | (1) |
|
1.3.3 Optimal and Generalized Solution for Spatial Sound Processing Using Covariance Matrices |
|
|
19 | (4) |
|
|
23 | (2) |
|
2 Spatial Decomposition by Spherical Array Processing |
|
|
25 | (24) |
|
|
|
|
25 | (1) |
|
2.2 Sound Field Measurement by a Spherical Array |
|
|
26 | (1) |
|
2.3 Array Processing and Plane-Wave Decomposition |
|
|
26 | (3) |
|
2.4 Sensitivity to Noise and Standard Regularization Methods |
|
|
29 | (3) |
|
2.5 Optimal Noise-Robust Design |
|
|
32 | (5) |
|
2.5.1 PWD Estimation Error Measure |
|
|
32 | (2) |
|
2.5.2 PWD Error Minimization |
|
|
34 | (1) |
|
2.5.3 R-PWD Simulation Study |
|
|
35 | (2) |
|
2.6 Spatial Aliasing and High Frequency Performance Limit |
|
|
37 | (2) |
|
2.7 High Frequency Bandwidth Extension by Aliasing Cancellation |
|
|
39 | (3) |
|
2.7.1 Spatial Aliasing Error |
|
|
39 | (1) |
|
2.7.2 AC-PWD Simulation Study |
|
|
40 | (2) |
|
2.8 High Performance Broadband PWD Example |
|
|
42 | (3) |
|
2.8.1 Broadband Measurement Model |
|
|
42 | (1) |
|
2.8.2 Minimizing Broadband PWD Error |
|
|
42 | (2) |
|
2.8.3 BB-PWD Simulation Study |
|
|
44 | (1) |
|
|
45 | (1) |
|
|
46 | (3) |
|
|
46 | (3) |
|
3 Sound Field Analysis Using Sparse Recovery |
|
|
49 | (20) |
|
|
|
|
|
49 | (1) |
|
3.2 The Plane-Wave Decomposition Problem |
|
|
50 | (1) |
|
3.2.1 Sparse Plane-Wave Decomposition |
|
|
51 | (1) |
|
3.2.2 The Iteratively Reweighted Least-Squares Algorithm |
|
|
52 | (1) |
|
3.3 Bayesian Approach to Plane-Wave Decomposition |
|
|
53 | (2) |
|
3.4 Calculating the IRLS Noise-Power Regularization Parameter |
|
|
55 | (3) |
|
3.4.1 Estimation of the Relative Noise Power |
|
|
56 | (2) |
|
3.5 Numerical Simulations |
|
|
58 | (1) |
|
3.6 Experiment: Echoic Sound Scene Analysis |
|
|
59 | (6) |
|
|
65 | (4) |
|
|
65 | (1) |
|
|
66 | (3) |
|
Part II Reproduction of Spatial Sound |
|
|
69 | (183) |
|
4 Overview of Time-Frequency Domain Parametric Spatial Audio Techniques |
|
|
71 | (18) |
|
|
|
|
|
71 | (2) |
|
4.2 Parametric Processing Overview |
|
|
73 | (16) |
|
4.2.1 Analysis Principles |
|
|
74 | (1) |
|
4.2.2 Synthesis Principles |
|
|
75 | (1) |
|
4.2.3 Spatial Audio Coding and Up-Mixing |
|
|
76 | (2) |
|
4.2.4 Spatial Sound Recording and Reproduction |
|
|
78 | (3) |
|
4.2.5 Auralization of Measured Room Acoustics and Spatial Rendering of Room Impulse Responses |
|
|
81 | (1) |
|
|
82 | (7) |
|
5 First-Order Directional Audio Coding (DirAC) |
|
|
89 | (52) |
|
|
|
|
|
|
5.1 Representing Spatial Sound with First-Order B-Format Signals |
|
|
89 | (3) |
|
5.2 Some Notes on the Evolution of the Technique |
|
|
92 | (2) |
|
5.3 DirAC with Ideal B-Format Signals |
|
|
94 | (3) |
|
5.4 Analysis of Directional Parameters with Real Microphone Setups |
|
|
97 | (8) |
|
5.4.1 DOA Analysis with Open 2D Microphone Arrays |
|
|
97 | (2) |
|
5.4.2 DOA Analysis with 2D Arrays with a Rigid Baffle |
|
|
99 | (2) |
|
5.4.3 DOA Analysis in Underdetermined Cases |
|
|
101 | (1) |
|
5.4.4 DOA Analysis: Further Methods |
|
|
102 | (1) |
|
5.4.5 Effect of Spatial Aliasing and Microphone Noise on the Analysis of Diffuseness |
|
|
103 | (2) |
|
5.5 First-Order DirAC with Monophonic Audio Transmission |
|
|
105 | (1) |
|
5.6 First-Order DirAC with Multichannel Audio Transmission |
|
|
106 | (11) |
|
5.6.1 Stream-Based Virtual Microphone Rendering |
|
|
106 | (3) |
|
5.6.2 Evaluation of Virtual Microphone DirAC |
|
|
109 | (2) |
|
5.6.3 Discussion of Virtual Microphone DirAC |
|
|
111 | (1) |
|
5.6.4 Optimized DirAC Synthesis |
|
|
111 | (3) |
|
5.6.5 DirAC-Based Reproduction of Spaced-Array Recordings |
|
|
114 | (3) |
|
5.7 DirAC Synthesis for Headphones and for Hearing Aids |
|
|
117 | (2) |
|
5.7.1 Reproduction of B-Format Signals |
|
|
117 | (1) |
|
5.7.2 DirAC in Hearing Aids |
|
|
118 | (1) |
|
5.8 Optimizing the Time-Frequency Resolution of DirAC for Critical Signals |
|
|
119 | (1) |
|
5.9 Example Implementation |
|
|
120 | (16) |
|
5.9.1 Executing DirAC and Plotting Parameter History |
|
|
122 | (3) |
|
5.9.2 DirAC Initialization |
|
|
125 | (6) |
|
|
131 | (5) |
|
5.9.4 A Simplistic Binaural Synthesis of Loudspeaker Listening |
|
|
136 | (1) |
|
|
137 | (4) |
|
|
138 | (3) |
|
6 Higher-Order Directional Audio Coding |
|
|
141 | (20) |
|
|
|
|
141 | (3) |
|
|
144 | (1) |
|
6.3 Energetic Analysis and Estimation of Parameters |
|
|
145 | (6) |
|
6.3.1 Analysis of Intensity and Diffuseness in the Spherical Harmonic Domain |
|
|
146 | (1) |
|
6.3.2 Higher-Order Energetic Analysis |
|
|
147 | (2) |
|
|
149 | (2) |
|
6.4 Synthesis of Target Setup Signals |
|
|
151 | (6) |
|
6.4.1 Loudspeaker Rendering |
|
|
152 | (3) |
|
|
155 | (2) |
|
6.5 Subjective Evaluation |
|
|
157 | (1) |
|
|
157 | (4) |
|
|
158 | (3) |
|
7 Multi-Channel Sound Acquisition Using a Multi-Wave Sound Field Model |
|
|
161 | (40) |
|
|
|
|
161 | (2) |
|
7.2 Parametric Sound Acquisition and Processing |
|
|
163 | (1) |
|
7.2.1 Problem Formulation |
|
|
163 | (3) |
|
7.2.2 Principal Estimation of the Target Signal |
|
|
166 | (1) |
|
7.3 Multi-Wave Sound Field and Signal Model |
|
|
167 | (3) |
|
|
168 | (1) |
|
7.3.2 Diffuse Sound Model |
|
|
169 | (1) |
|
|
169 | (1) |
|
7.4 Direct and Diffuse Signal Estimation |
|
|
170 | (9) |
|
7.4.1 Estimation of the Direct Signal Ys(k,n) |
|
|
170 | (6) |
|
7.4.2 Estimation of the Diffuse Signal Yd(k,n) |
|
|
176 | (3) |
|
|
179 | (7) |
|
7.5.1 Estimation of the Number of Sources |
|
|
179 | (2) |
|
7.5.2 Direction of Arrival Estimation |
|
|
181 | (1) |
|
7.5.3 Microphone Input PSD Matrix |
|
|
181 | (1) |
|
7.5.4 Noise PSD Estimation |
|
|
182 | (1) |
|
7.5.5 Diffuse Sound PSD Estimation |
|
|
182 | (3) |
|
7.5.6 Signal PSD Estimation in Multi-Wave Scenarios |
|
|
185 | (1) |
|
7.6 Application to Spatial Sound Reproduction |
|
|
186 | (8) |
|
|
186 | (1) |
|
7.6.2 Spatial Sound Reproduction Based on Informed Spatial Filtering |
|
|
187 | (7) |
|
|
194 | (7) |
|
|
195 | (6) |
|
8 Adaptive Mixing of Excessively Directive and Robust Beamformers for Reproduction of Spatial Sound |
|
|
201 | (14) |
|
|
|
|
201 | (1) |
|
8.2 Notation and Signal Model |
|
|
202 | (1) |
|
8.3 Overview of the Method |
|
|
203 | (1) |
|
8.4 Loudspeaker-Based Spatial Sound Reproduction |
|
|
204 | (5) |
|
8.4.1 Estimation of the Target Covariance Matrix Cy |
|
|
204 | (2) |
|
8.4.2 Estimation of the Synthesis Beamforming Signals Ws |
|
|
206 | (1) |
|
8.4.3 Processing the Synthesis Signals (Wsx) to Obtain the Target Covariance Matrix Cy |
|
|
206 | (1) |
|
8.4.4 Spatial Energy Distribution |
|
|
207 | (1) |
|
|
208 | (1) |
|
8.5 Binaural-Based Spatial Sound Reproduction |
|
|
209 | (3) |
|
8.5.1 Estimation of the Analysis and Synthesis Beamforming Weight Matrices |
|
|
210 | (1) |
|
8.5.2 Diffuse-Field Equalization of HRTFs |
|
|
210 | (1) |
|
8.5.3 Adaptive Mixing and Decorrelation |
|
|
211 | (1) |
|
8.5.4 Subjective Evaluation |
|
|
211 | (1) |
|
|
212 | (3) |
|
|
212 | (3) |
|
9 Source Separation and Reconstruction of Spatial Audio Using Spectrogram Factorization |
|
|
215 | (37) |
|
|
|
|
215 | (2) |
|
9.2 Spectrogram Factorization |
|
|
217 | (9) |
|
|
217 | (1) |
|
9.2.2 Magnitude Spectrogram Models |
|
|
218 | (3) |
|
9.2.3 Complex-Valued Spectrogram Models |
|
|
221 | (4) |
|
9.2.4 Source Separation by Time-Frequency Filtering |
|
|
225 | (1) |
|
9.3 Array Signal Processing and Spectrogram Factorization |
|
|
226 | (5) |
|
9.3.1 Spaced Microphone Arrays |
|
|
226 | (1) |
|
9.3.2 Model for Spatial Covariance Based on Direction of Arrival |
|
|
227 | (2) |
|
9.3.3 Complex-Valued NMF with the Spatial Covariance Model |
|
|
229 | (2) |
|
9.4 Applications of Spectrogram Factorization in Spatial Audio |
|
|
231 | (12) |
|
9.4.1 Parameterization of Surround Sound: Upmixing by Time-Frequency Filtering |
|
|
231 | (2) |
|
9.4.2 Source Separation Using a Compact Microphone Array |
|
|
233 | (5) |
|
9.4.3 Reconstruction of Binaural Sound Through Source Separation |
|
|
238 | (5) |
|
|
243 | (1) |
|
|
243 | (9) |
|
|
247 | (5) |
|
Part III Signal-Dependent Spatial Filtering |
|
|
252 | (75) |
|
10 Time-Frequency Domain Spatial Audio Enhancement |
|
|
253 | (12) |
|
|
|
|
253 | (1) |
|
10.2 Signal-Independent Enhancement |
|
|
254 | (1) |
|
10.3 Signal-Dependent Enhancement |
|
|
255 | (10) |
|
10.3.1 Adaptive Beamformers |
|
|
255 | (2) |
|
|
257 | (1) |
|
|
257 | (2) |
|
10.3.4 Estimating Post-Filters with Machine Learning |
|
|
259 | (1) |
|
10.3.5 Post-Filter Design Based on Spatial Parameters |
|
|
259 | (2) |
|
|
261 | (4) |
|
11 Cross-Spectrum-Based Post-Filter Utilizing Noisy and Robust Beamformers |
|
|
265 | (26) |
|
|
|
|
265 | (2) |
|
11.2 Notation and Signal Model |
|
|
267 | (2) |
|
11.2.1 Virtual Microphone Design Utilizing Pressure Microphones |
|
|
268 | (1) |
|
11.3 Estimation of the Cross-Spectrum-Based Post-Filter |
|
|
269 | (10) |
|
11.3.1 Post-Filter Estimation Utilizing Two Static Beamformers |
|
|
270 | (2) |
|
11.3.2 Post-Filter Estimation Utilizing a Static and an Adaptive Beamformer |
|
|
272 | (5) |
|
11.3.3 Smoothing Techniques |
|
|
277 | (2) |
|
11.4 Implementation Examples |
|
|
279 | (4) |
|
|
279 | (2) |
|
11.4.2 Prototype Microphone Arrays |
|
|
281 | (2) |
|
11.5 Conclusions and Further Remarks |
|
|
283 | (1) |
|
|
284 | (7) |
|
|
287 | (4) |
|
12 Microphone-Array-Based Speech Enhancement Using Neural Networks |
|
|
291 | (36) |
|
|
|
291 | (2) |
|
12.2 Time-Frequency Masks for Speech Enhancement Using Supervised Learning |
|
|
293 | (5) |
|
12.2.1 Beamforming with Post-Filtering |
|
|
293 | (1) |
|
12.2.2 Overview of Mask Prediction |
|
|
294 | (1) |
|
12.2.3 Features for Mask Learning |
|
|
295 | (2) |
|
12.2.4 Target Mask Design |
|
|
297 | (1) |
|
12.3 Artificial Neural Networks |
|
|
298 | (7) |
|
12.3.1 Learning the Weights |
|
|
299 | (2) |
|
|
301 | (4) |
|
12.3.3 Deep Neural Networks |
|
|
305 | (1) |
|
12.4 Mask Learning: A Simulated Example |
|
|
305 | (5) |
|
12.4.1 Feature Extraction |
|
|
306 | (1) |
|
12.4.2 Target Mask Design |
|
|
306 | (1) |
|
12.4.3 Neural Network Training |
|
|
307 | (1) |
|
|
308 | (2) |
|
12.5 Mask Learning: A Real-World Example |
|
|
310 | (8) |
|
12.5.1 Brief Description of the Third CHiME Challenge Data |
|
|
310 | (2) |
|
12.5.2 Data Processing and Beamforming |
|
|
312 | (1) |
|
12.5.3 Description of Network Structure, Features, and Targets |
|
|
312 | (2) |
|
12.5.4 Mask Prediction Results and Discussion |
|
|
314 | (2) |
|
12.5.5 Speech Enhancement Results |
|
|
316 | (2) |
|
|
318 | (1) |
|
|
318 | (9) |
|
12.7.1 Matlab Code for Neural-Network-Based Sawtooth Denoising Example |
|
|
318 | (3) |
|
12.7.2 Matlab Code for Phase Feature Extraction |
|
|
321 | (3) |
|
|
324 | (3) |
|
|
327 | (60) |
|
13 Upmixing and Beamforming in Professional Audio |
|
|
329 | (18) |
|
|
|
329 | (1) |
|
13.2 Stereo-to-Multichannel Upmix Processor |
|
|
329 | (7) |
|
13.2.1 Product Description |
|
|
329 | (2) |
|
13.2.2 Considerations for Professional Audio and Broadcast |
|
|
331 | (1) |
|
|
332 | (4) |
|
13.3 Digitally Enhanced Shotgun Microphone |
|
|
336 | (5) |
|
13.3.1 Product Description |
|
|
336 | (1) |
|
|
336 | (1) |
|
|
336 | (3) |
|
13.3.4 Evaluations and Measurements |
|
|
339 | (2) |
|
13.4 Surround Microphone System Based on Two Microphone Elements |
|
|
341 | (4) |
|
13.4.1 Product Description |
|
|
341 | (3) |
|
|
344 | (1) |
|
|
345 | (2) |
|
|
345 | (2) |
|
14 Spatial Sound Scene Synthesis and Manipulation for Virtual Reality and Audio Effects |
|
|
347 | (16) |
|
|
|
|
|
|
347 | (1) |
|
14.2 Parametric Sound Scene Synthesis for Virtual Reality |
|
|
348 | (7) |
|
|
348 | (2) |
|
14.2.2 Synthesis of Virtual Sources |
|
|
350 | (2) |
|
14.2.3 Synthesis of Room Reverberation |
|
|
352 | (1) |
|
14.2.4 Augmentation of Virtual Reality with Real Spatial Recordings |
|
|
352 | (1) |
|
14.2.5 Higher-Order Processing |
|
|
353 | (1) |
|
14.2.6 Loudspeaker-Signal Bus |
|
|
354 | (1) |
|
14.3 Spatial Manipulation of Sound Scenes |
|
|
355 | (5) |
|
14.3.1 Parametric Directional Transformations |
|
|
356 | (1) |
|
14.3.2 Sweet-Spot Translation and Zooming |
|
|
356 | (1) |
|
|
356 | (1) |
|
14.3.4 Spatial Modulation |
|
|
357 | (1) |
|
14.3.5 Diffuse Field Level Control |
|
|
358 | (1) |
|
14.3.6 Ambience Extraction |
|
|
359 | (1) |
|
14.3.7 Spatialization of Monophonic Signals |
|
|
360 | (1) |
|
|
360 | (3) |
|
|
361 | (2) |
|
15 Parametric Spatial Audio Techniques in Teleconferencing and Remote Presence |
|
|
363 | (24) |
|
|
|
|
|
15.1 Introduction and Motivation |
|
|
363 | (2) |
|
|
365 | (1) |
|
15.3 Immersive Audio Communication System (ImmACS) |
|
|
366 | (10) |
|
|
366 | (7) |
|
|
373 | (3) |
|
15.4 Capture and Reproduction of Crowded Acoustic Environments |
|
|
376 | (8) |
|
15.4.1 Sound Source Positioning Based on VBAP |
|
|
376 | (1) |
|
15.4.2 Non-Parametric Approach |
|
|
377 | (2) |
|
15.4.3 Parametric Approach |
|
|
379 | (3) |
|
15.4.4 Example Application |
|
|
382 | (2) |
|
|
384 | (3) |
|
|
384 | (3) |
Index |
|
387 | |