1.1 Multimodal Speech Enhancement | 1 | (1)
1.2 Cognitively Inspired Intelligent Flexibility | 2 | (3)
| 3 | (2)
2 Audio and Visual Speech Relationship | 5 | (8)
2.1 Audio and Visual Speech Production | 5 | (1)
| 5 | (1)
2.1.2 Phonemes and Visemes | 6 | (1)
2.2 Multimodal Speech Phenomena | 6 | (1)
2.2.1 Cocktail Party Problem | 6 | (1)
| 7 | (1)
| 7 | (1)
2.3 Audiovisual Speech Correlation Background | 7 | (1)
2.4 Multimodal Correlation Analysis | 8 | (5)
2.4.1 Correlation Measurement | 9 | (1)
2.4.2 Multimodal Correlation Analysis Results | 9 | (2)
| 11 | (2)
| 13 | (22)
3.1 Application of Speech Processing Techniques to Hearing Aids | 13 | (4)
3.1.1 Directional Microphones | 14 | (2)
3.1.2 Noise Cancelling Algorithms | 16 | (1)
3.2 Audiovisual Speech Enhancement Techniques | 17 | (8)
| 17 | (1)
3.2.2 Audiovisual Blind Source Separation | 18 | (2)
3.2.3 Multimodal Fragment Decoding | 20 | (2)
3.2.4 Visually Derived Wiener Filtering | 22 | (3)
3.3 Visual Tracking and Detection | 25 | (3)
| 25 | (1)
3.3.2 Region of Interest Detection | 26 | (2)
3.4 Audiovisual Speech Corpora | 28 | (7)
3.4.1 The BANCA Speech Database | 28 | (1)
3.4.2 The Extended M2VTS Database | 29 | (1)
3.4.3 The AVICAR Speech Database | 29 | (1)
3.4.4 The VidTIMIT Multimodal Database | 29 | (1)
| 30 | (1)
| 30 | (5)
4 A Two Stage Multimodal Speech Enhancement System | 35 | (18)
4.1 Overall Design Framework of the Two-Stage Multimodal System | 35 | (1)
4.2 Reverberant Room Environment | 36 | (1)
4.3 Multiple Microphone Array | 37 | (1)
4.4 Audio Feature Extraction | 38 | (2)
4.5 Visual Feature Extraction | 40 | (5)
4.6 Visually Derived Wiener Filtering | 45 | (1)
4.7 Gaussian Mixture Model for Audiovisual Clean Speech Estimation | 46 | (2)
| 48 | (5)
| 50 | (3)
5 Experiments, Results, and Analysis | 53 | (22)
5.1 Speech Enhancement Evaluation Approaches | 53 | (3)
5.1.1 Subjective Speech Quality Evaluation Measures | 54 | (1)
5.1.2 Objective Speech Quality Evaluation Measures | 54 | (2)
5.2 Preliminary Experimentation | 56 | (1)
5.3 Automated Lip Detection Evaluation | 56 | (3)
5.3.1 Problem Description | 56 | (1)
| 57 | (1)
5.3.3 Results and Discussion | 57 | (2)
5.4 Noisy Audio Environments | 59 | (8)
5.4.1 Problem Description | 59 | (1)
| 60 | (1)
5.4.3 Results and Discussion | 60 | (7)
5.5 Testing with Novel Corpus | 67 | (3)
5.5.1 Problem Description | 67 | (1)
| 67 | (1)
5.5.3 Results and Discussion | 68 | (2)
5.6 Inconsistent Audio Environment | 70 | (5)
5.6.1 Problem Description | 70 | (1)
| 70 | (1)
5.6.3 Results and Discussion | 71 | (2)
| 73 | (2)
6 Towards Fuzzy Logic Based Multimodal Speech Filtering | 75 | (16)
6.1 Limitations of Current Two-Stage System | 75 | (1)
6.2 Fuzzy Logic Based Model Justification | 76 | (3)
6.2.1 Requirements of Autonomous, Adaptive, and Context Aware Speech Filtering | 77 | (1)
6.2.2 Fuzzy Logic Based Decision Making | 78 | (1)
6.3 Potential Alternative Approaches | 79 | (2)
6.3.1 Hidden Markov Models | 79 | (1)
| 80 | (1)
6.4 Fuzzy Based Multimodal Speech Enhancement Framework | 81 | (10)
6.4.1 Overall Design Framework of Fuzzy System | 81 | (1)
6.4.2 Fuzzy Logic Based Framework Inputs | 82 | (4)
6.4.3 Fuzzy Logic Based Switching Supervisor | 86 | (3)
| 89 | (2)
7 Evaluation of Fuzzy Logic Proof of Concept | 91 | (20)
| 91 | (1)
7.2 Experimentation Limitations | 92 | (1)
7.3 Recording of Challenging Audiovisual Speech Corpus | 92 | (3)
7.3.1 Corpus Configuration | 92 | (3)
7.4 Fuzzy Input Variable Evaluation | 95 | (6)
7.4.1 Visual Quality Fuzzy Indicator | 95 | (4)
7.4.2 Previous Frame Fuzzy Input Variable | 99 | (2)
7.5 Detailed System Evaluation | 101 | (9)
7.5.1 Problem Description | 101 | (1)
| 102 | (1)
7.5.3 Subjective Testing with Broadband Noise | 103 | (2)
7.5.4 Detailed Fuzzy Switching Performance | 105 | (5)
7.6 Discussion of Results | 110 | (1)
| 110 | (1)
8 Potential Future Research Directions | 111 | (4)
8.1 Improvement of Individual Speech Processing Components | 111 | (1)
8.2 Extension of Overall Speech Filtering Framework | 112 | (1)
8.3 Further Development of Fuzzy Logic Based Switching Controller | 112 | (1)
8.4 Practical Implementation of System | 113 | (2)
| 114 | (1)
Index | 115