Parametric Time-Frequency Domain Spatial Audio [Hardback]

  • Format: Hardback, 416 pages, height x width x thickness: 244x173x25 mm, weight: 771 g
  • Series: IEEE Press
  • Publication date: 15-Dec-2017
  • Publisher: Wiley-IEEE Press
  • ISBN-10: 1119252598
  • ISBN-13: 9781119252597

A comprehensive guide that addresses the theory and practice of spatial audio

This book provides readers with the principles and best practices in spatial audio signal processing. It describes how sound fields and their perceptual attributes are captured and analyzed within the time-frequency domain, how essential representation parameters are coded, and how such signals are efficiently reproduced for practical applications. The book is split into four parts, starting with an overview of the fundamentals. It then explains the reproduction of spatial sound before offering an examination of signal-dependent spatial filtering. The book finishes with coverage of both current and future applications and the directions in which spatial audio research is heading.

Parametric Time-Frequency Domain Spatial Audio focuses on applications in entertainment audio, including music, home cinema, and gaming. It covers the capture and reproduction of spatial sound as well as its generation, transduction, representation, transmission, and perception. The book teaches readers the tools needed for such processing, provides an overview of existing research, and presents recent projects and commercial applications built on these systems.

  • Provides an in-depth presentation of the principles, past developments, state-of-the-art methods, and future research directions of spatial audio technologies
  • Includes contributions from leading researchers in the field
  • Offers MATLAB code with selected chapters

An advanced book aimed at readers who are capable of digesting mathematical expressions about digital signal processing and sound field analysis, Parametric Time-Frequency Domain Spatial Audio is best suited for researchers in academia and in the audio industry.

List of Contributors
xiii
Preface xv
About the Companion Website xix
Part I Analysis and Synthesis of Spatial Sound
1(68)
1 Time-Frequency Processing: Methods and Tools
3(22)
Juha Vilkamo
Tom Bäckström
1.1 Introduction
3(1)
1.2 Time-Frequency Processing
4(12)
1.2.1 Basic Structure
4(1)
1.2.2 Uniform Filter Banks
5(1)
1.2.3 Prototype Filters and Modulation
6(2)
1.2.4 A Robust Complex-Modulated Filter Bank, and Comparison with STFT
8(4)
1.2.5 Overlap-Add and Windowing
12(1)
1.2.6 Example Implementation of a Robust Filter Bank in Matlab
13(2)
1.2.7 Cascaded Filters
15(1)
1.3 Processing of Spatial Audio
16(9)
1.3.1 Stochastic Estimates
17(1)
1.3.2 Decorrelation
18(1)
1.3.3 Optimal and Generalized Solution for Spatial Sound Processing Using Covariance Matrices
19(4)
References
23(2)
2 Spatial Decomposition by Spherical Array Processing
25(24)
David Lou Alon
Boaz Rafaely
2.1 Introduction
25(1)
2.2 Sound Field Measurement by a Spherical Array
26(1)
2.3 Array Processing and Plane-Wave Decomposition
26(3)
2.4 Sensitivity to Noise and Standard Regularization Methods
29(3)
2.5 Optimal Noise-Robust Design
32(5)
2.5.1 PWD Estimation Error Measure
32(2)
2.5.2 PWD Error Minimization
34(1)
2.5.3 R-PWD Simulation Study
35(2)
2.6 Spatial Aliasing and High Frequency Performance Limit
37(2)
2.7 High Frequency Bandwidth Extension by Aliasing Cancellation
39(3)
2.7.1 Spatial Aliasing Error
39(1)
2.7.2 AC-PWD Simulation Study
40(2)
2.8 High Performance Broadband PWD Example
42(3)
2.8.1 Broadband Measurement Model
42(1)
2.8.2 Minimizing Broadband PWD Error
42(2)
2.8.3 BB-PWD Simulation Study
44(1)
2.9 Summary
45(1)
2.10 Acknowledgment
46(3)
References
46(3)
3 Sound Field Analysis Using Sparse Recovery
49(20)
Craig T. Jin
Nicolas Epain
Tahereh Noohi
3.1 Introduction
49(1)
3.2 The Plane-Wave Decomposition Problem
50(1)
3.2.1 Sparse Plane-Wave Decomposition
51(1)
3.2.2 The Iteratively Reweighted Least-Squares Algorithm
52(1)
3.3 Bayesian Approach to Plane-Wave Decomposition
53(2)
3.4 Calculating the IRLS Noise-Power Regularization Parameter
55(3)
3.4.1 Estimation of the Relative Noise Power
56(2)
3.5 Numerical Simulations
58(1)
3.6 Experiment: Echoic Sound Scene Analysis
59(6)
3.7 Conclusions
65(4)
Appendix
65(1)
References
66(3)
Part II Reproduction of Spatial Sound
69(183)
4 Overview of Time-Frequency Domain Parametric Spatial Audio Techniques
71(18)
Archontis Politis
Symeon Delikaris-Manias
Ville Pulkki
4.1 Introduction
71(2)
4.2 Parametric Processing Overview
73(16)
4.2.1 Analysis Principles
74(1)
4.2.2 Synthesis Principles
75(1)
4.2.3 Spatial Audio Coding and Up-Mixing
76(2)
4.2.4 Spatial Sound Recording and Reproduction
78(3)
4.2.5 Auralization of Measured Room Acoustics and Spatial Rendering of Room Impulse Responses
81(1)
References
82(7)
5 First-Order Directional Audio Coding (DirAC)
89(52)
Ville Pulkki
Archontis Politis
Mikko-Ville Laitinen
Juha Vilkamo
Jukka Ahonen
5.1 Representing Spatial Sound with First-Order B-Format Signals
89(3)
5.2 Some Notes on the Evolution of the Technique
92(2)
5.3 DirAC with Ideal B-Format Signals
94(3)
5.4 Analysis of Directional Parameters with Real Microphone Setups
97(8)
5.4.1 DOA Analysis with Open 2D Microphone Arrays
97(2)
5.4.2 DOA Analysis with 2D Arrays with a Rigid Baffle
99(2)
5.4.3 DOA Analysis in Underdetermined Cases
101(1)
5.4.4 DOA Analysis: Further Methods
102(1)
5.4.5 Effect of Spatial Aliasing and Microphone Noise on the Analysis of Diffuseness
103(2)
5.5 First-Order DirAC with Monophonic Audio Transmission
105(1)
5.6 First-Order DirAC with Multichannel Audio Transmission
106(11)
5.6.1 Stream-Based Virtual Microphone Rendering
106(3)
5.6.2 Evaluation of Virtual Microphone DirAC
109(2)
5.6.3 Discussion of Virtual Microphone DirAC
111(1)
5.6.4 Optimized DirAC Synthesis
111(3)
5.6.5 DirAC-Based Reproduction of Spaced-Array Recordings
114(3)
5.7 DirAC Synthesis for Headphones and for Hearing Aids
117(2)
5.7.1 Reproduction of B-Format Signals
117(1)
5.7.2 DirAC in Hearing Aids
118(1)
5.8 Optimizing the Time-Frequency Resolution of DirAC for Critical Signals
119(1)
5.9 Example Implementation
120(16)
5.9.1 Executing DirAC and Plotting Parameter History
122(3)
5.9.2 DirAC Initialization
125(6)
5.9.3 DirAC Runtime
131(5)
5.9.4 A Simplistic Binaural Synthesis of Loudspeaker Listening
136(1)
5.10 Summary
137(4)
References
138(3)
6 Higher-Order Directional Audio Coding
141(20)
Archontis Politis
Ville Pulkki
6.1 Introduction
141(3)
6.2 Sound Field Model
144(1)
6.3 Energetic Analysis and Estimation of Parameters
145(6)
6.3.1 Analysis of Intensity and Diffuseness in the Spherical Harmonic Domain
146(1)
6.3.2 Higher-Order Energetic Analysis
147(2)
6.3.3 Sector Profiles
149(2)
6.4 Synthesis of Target Setup Signals
151(6)
6.4.1 Loudspeaker Rendering
152(3)
6.4.2 Binaural Rendering
155(2)
6.5 Subjective Evaluation
157(1)
6.6 Conclusions
157(4)
References
158(3)
7 Multi-Channel Sound Acquisition Using a Multi-Wave Sound Field Model
161(40)
Oliver Thiergart
Emanuel Habets
7.1 Introduction
161(2)
7.2 Parametric Sound Acquisition and Processing
163(1)
7.2.1 Problem Formulation
163(3)
7.2.2 Principal Estimation of the Target Signal
166(1)
7.3 Multi-Wave Sound Field and Signal Model
167(3)
7.3.1 Direct Sound Model
168(1)
7.3.2 Diffuse Sound Model
169(1)
7.3.3 Noise Model
169(1)
7.4 Direct and Diffuse Signal Estimation
170(9)
7.4.1 Estimation of the Direct Signal Ys(k,n)
170(6)
7.4.2 Estimation of the Diffuse Signal Yd(k,n)
176(3)
7.5 Parameter Estimation
179(7)
7.5.1 Estimation of the Number of Sources
179(2)
7.5.2 Direction of Arrival Estimation
181(1)
7.5.3 Microphone Input PSD Matrix
181(1)
7.5.4 Noise PSD Estimation
182(1)
7.5.5 Diffuse Sound PSD Estimation
182(3)
7.5.6 Signal PSD Estimation in Multi-Wave Scenarios
185(1)
7.6 Application to Spatial Sound Reproduction
186(8)
7.6.1 State of the Art
186(1)
7.6.2 Spatial Sound Reproduction Based on Informed Spatial Filtering
187(7)
7.7 Summary
194(7)
References
195(6)
8 Adaptive Mixing of Excessively Directive and Robust Beamformers for Reproduction of Spatial Sound
201(14)
Symeon Delikaris-Manias
Juha Vilkamo
8.1 Introduction
201(1)
8.2 Notation and Signal Model
202(1)
8.3 Overview of the Method
203(1)
8.4 Loudspeaker-Based Spatial Sound Reproduction
204(5)
8.4.1 Estimation of the Target Covariance Matrix Cy
204(2)
8.4.2 Estimation of the Synthesis Beamforming Signals Ws
206(1)
8.4.3 Processing the Synthesis Signals (Wsx) to Obtain the Target Covariance Matrix Cy
206(1)
8.4.4 Spatial Energy Distribution
207(1)
8.4.5 Listening Tests
208(1)
8.5 Binaural-Based Spatial Sound Reproduction
209(3)
8.5.1 Estimation of the Analysis and Synthesis Beamforming Weight Matrices
210(1)
8.5.2 Diffuse-Field Equalization of HRTFs
210(1)
8.5.3 Adaptive Mixing and Decorrelation
211(1)
8.5.4 Subjective Evaluation
211(1)
8.6 Conclusions
212(3)
References
212(3)
9 Source Separation and Reconstruction of Spatial Audio Using Spectrogram Factorization
215(37)
Joonas Nikunen
Tuomas Virtanen
9.1 Introduction
215(2)
9.2 Spectrogram Factorization
217(9)
9.2.1 Mixtures of Sounds
217(1)
9.2.2 Magnitude Spectrogram Models
218(3)
9.2.3 Complex-Valued Spectrogram Models
221(4)
9.2.4 Source Separation by Time-Frequency Filtering
225(1)
9.3 Array Signal Processing and Spectrogram Factorization
226(5)
9.3.1 Spaced Microphone Arrays
226(1)
9.3.2 Model for Spatial Covariance Based on Direction of Arrival
227(2)
9.3.3 Complex-Valued NMF with the Spatial Covariance Model
229(2)
9.4 Applications of Spectrogram Factorization in Spatial Audio
231(12)
9.4.1 Parameterization of Surround Sound: Upmixing by Time-Frequency Filtering
231(2)
9.4.2 Source Separation Using a Compact Microphone Array
233(5)
9.4.3 Reconstruction of Binaural Sound Through Source Separation
238(5)
9.5 Discussion
243(1)
9.6 Matlab Example
243(9)
References
247(5)
Part III Signal-Dependent Spatial Filtering
252(75)
10 Time-Frequency Domain Spatial Audio Enhancement
253(12)
Symeon Delikaris-Manias
Pasi Pertilä
10.1 Introduction
253(1)
10.2 Signal-Independent Enhancement
254(1)
10.3 Signal-Dependent Enhancement
255(10)
10.3.1 Adaptive Beamformers
255(2)
10.3.2 Post-Filters
257(1)
10.3.3 Post-Filter Types
257(2)
10.3.4 Estimating Post-Filters with Machine Learning
259(1)
10.3.5 Post-Filter Design Based on Spatial Parameters
259(2)
References
261(4)
11 Cross-Spectrum-Based Post-Filter Utilizing Noisy and Robust Beamformers
265(26)
Symeon Delikaris-Manias
Ville Pulkki
11.1 Introduction
265(2)
11.2 Notation and Signal Model
267(2)
11.2.1 Virtual Microphone Design Utilizing Pressure Microphones
268(1)
11.3 Estimation of the Cross-Spectrum-Based Post-Filter
269(10)
11.3.1 Post-Filter Estimation Utilizing Two Static Beamformers
270(2)
11.3.2 Post-Filter Estimation Utilizing a Static and an Adaptive Beamformer
272(5)
11.3.3 Smoothing Techniques
277(2)
11.4 Implementation Examples
279(4)
11.4.1 Ideal Conditions
279(2)
11.4.2 Prototype Microphone Arrays
281(2)
11.5 Conclusions and Further Remarks
283(1)
11.6 Source Code
284(7)
References
287(4)
12 Microphone-Array-Based Speech Enhancement Using Neural Networks
291(36)
Pasi Pertilä
12.1 Introduction
291(2)
12.2 Time-Frequency Masks for Speech Enhancement Using Supervised Learning
293(5)
12.2.1 Beamforming with Post-Filtering
293(1)
12.2.2 Overview of Mask Prediction
294(1)
12.2.3 Features for Mask Learning
295(2)
12.2.4 Target Mask Design
297(1)
12.3 Artificial Neural Networks
298(7)
12.3.1 Learning the Weights
299(2)
12.3.2 Generalization
301(4)
12.3.3 Deep Neural Networks
305(1)
12.4 Mask Learning: A Simulated Example
305(5)
12.4.1 Feature Extraction
306(1)
12.4.2 Target Mask Design
306(1)
12.4.3 Neural Network Training
307(1)
12.4.4 Results
308(2)
12.5 Mask Learning: A Real-World Example
310(8)
12.5.1 Brief Description of the Third CHiME Challenge Data
310(2)
12.5.2 Data Processing and Beamforming
312(1)
12.5.3 Description of Network Structure, Features, and Targets
312(2)
12.5.4 Mask Prediction Results and Discussion
314(2)
12.5.5 Speech Enhancement Results
316(2)
12.6 Conclusions
318(1)
12.7 Source Code
318(9)
12.7.1 Matlab Code for Neural-Network-Based Sawtooth Denoising Example
318(3)
12.7.2 Matlab Code for Phase Feature Extraction
321(3)
References
324(3)
Part IV Applications
327(60)
13 Upmixing and Beamforming in Professional Audio
329(18)
Christof Faller
13.1 Introduction
329(1)
13.2 Stereo-to-Multichannel Upmix Processor
329(7)
13.2.1 Product Description
329(2)
13.2.2 Considerations for Professional Audio and Broadcast
331(1)
13.2.3 Signal Processing
332(4)
13.3 Digitally Enhanced Shotgun Microphone
336(5)
13.3.1 Product Description
336(1)
13.3.2 Concept
336(1)
13.3.3 Signal Processing
336(3)
13.3.4 Evaluations and Measurements
339(2)
13.4 Surround Microphone System Based on Two Microphone Elements
341(4)
13.4.1 Product Description
341(3)
13.4.2 Concept
344(1)
13.5 Summary
345(2)
References
345(2)
14 Spatial Sound Scene Synthesis and Manipulation for Virtual Reality and Audio Effects
347(16)
Ville Pulkki
Archontis Politis
Tapani Pihlajamäki
Mikko-Ville Laitinen
14.1 Introduction
347(1)
14.2 Parametric Sound Scene Synthesis for Virtual Reality
348(7)
14.2.1 Overall Structure
348(2)
14.2.2 Synthesis of Virtual Sources
350(2)
14.2.3 Synthesis of Room Reverberation
352(1)
14.2.4 Augmentation of Virtual Reality with Real Spatial Recordings
352(1)
14.2.5 Higher-Order Processing
353(1)
14.2.6 Loudspeaker-Signal Bus
354(1)
14.3 Spatial Manipulation of Sound Scenes
355(5)
14.3.1 Parametric Directional Transformations
356(1)
14.3.2 Sweet-Spot Translation and Zooming
356(1)
14.3.3 Spatial Filtering
356(1)
14.3.4 Spatial Modulation
357(1)
14.3.5 Diffuse Field Level Control
358(1)
14.3.6 Ambience Extraction
359(1)
14.3.7 Spatialization of Monophonic Signals
360(1)
14.4 Summary
360(3)
References
361(2)
15 Parametric Spatial Audio Techniques in Teleconferencing and Remote Presence
363(24)
Anastasios Alexandridis
Despoina Pavlidi
Nikolaos Stefanakis
Athanasios Mouchtaris
15.1 Introduction and Motivation
363(2)
15.2 Background
365(1)
15.3 Immersive Audio Communication System (ImmACS)
366(10)
15.3.1 Encoder
366(7)
15.3.2 Decoder
373(3)
15.4 Capture and Reproduction of Crowded Acoustic Environments
376(8)
15.4.1 Sound Source Positioning Based on VBAP
376(1)
15.4.2 Non-Parametric Approach
377(2)
15.4.3 Parametric Approach
379(3)
15.4.4 Example Application
382(2)
15.5 Conclusions
384(3)
References
384(3)
Index 387
VILLE PULKKI, PHD, is an Associate Professor leading the Communication Acoustics Research Group in the Department of Signal Processing and Acoustics, Aalto University, Finland. He has received distinguished medal awards from the Society of Motion Picture and Television Engineers and from the Audio Engineering Society.

SYMEON DELIKARIS-MANIAS is a postdoctoral researcher affiliated with the Communication Acoustics Research Group in the Department of Signal Processing and Acoustics at Aalto University, Finland.

ARCHONTIS POLITIS, PHD, is a postdoctoral researcher affiliated with the Communication Acoustics Research Group in the Department of Signal Processing and Acoustics at Aalto University and with Tampere University of Technology, Finland.