Cognitively Inspired Audiovisual Speech Filtering: Towards an Intelligent, Fuzzy Based, Multimodal, Two-Stage Speech Enhancement System, 2015 ed. [Paperback]

  • Format: Paperback / softback, 121 pages, height x width: 235x155 mm, weight: 2488 g, 37 illustrations, color; 4 illustrations, black and white; XVIII, 121 p., 41 illus., 37 illus. in color
  • Series: SpringerBriefs in Cognitive Computation 5
  • Publication date: 19-Aug-2015
  • Publisher: Springer International Publishing AG
  • ISBN-10: 3319135082
  • ISBN-13: 9783319135083
  • Paperback
  • Price: 48,70 €*
  • * the price is final, i.e. no further discounts apply
  • Regular price: 57,29 €
  • You save 15%
  • Delivery of the book from the publisher takes approximately 2-4 weeks
  • Free delivery
This book summarizes the cognitively inspired basis of multimodal speech enhancement, covering the relationship between the audio and visual modalities in speech as well as recent research into audiovisual speech correlation. It also discusses a number of audiovisual speech filtering approaches that exploit this relationship. A novel multimodal speech enhancement system that uses both visual and audio information to filter speech is presented, and the book explores extending this system with fuzzy logic to demonstrate an initial implementation of an autonomous, adaptive, and context-aware multimodal system. The work also discusses the challenges of testing such a system and the limitations of many current audiovisual speech corpora, and outlines a suitable approach to developing a corpus designed to test this novel, cognitively inspired speech filtering system.


1 Introduction
1.1 Multimodal Speech Enhancement
1.2 Cognitively Inspired Intelligent Flexibility
References
2 Audio and Visual Speech Relationship
2.1 Audio and Visual Speech Production
2.1.1 Speech Production
2.1.2 Phonemes and Visemes
2.2 Multimodal Speech Phenomena
2.2.1 Cocktail Party Problem
2.2.2 McGurk Effect
2.2.3 Lombard Effect
2.3 Audiovisual Speech Correlation Background
2.4 Multimodal Correlation Analysis
2.4.1 Correlation Measurement
2.4.2 Multimodal Correlation Analysis Results
References
3 The Research Context
3.1 Application of Speech Processing Techniques to Hearing Aids
3.1.1 Directional Microphones
3.1.2 Noise Cancelling Algorithms
3.2 Audiovisual Speech Enhancement Techniques
3.2.1 Background
3.2.2 Audiovisual Blind Source Separation
3.2.3 Multimodal Fragment Decoding
3.2.4 Visually Derived Wiener Filtering
3.3 Visual Tracking and Detection
3.3.1 Lip Tracking
3.3.2 Region of Interest Detection
3.4 Audiovisual Speech Corpora
3.4.1 The BANCA Speech Database
3.4.2 The Extended M2VTS Database
3.4.3 The AVICAR Speech Database
3.4.4 The VidTIMIT Multimodal Database
3.4.5 The GRID Corpus
References
4 A Two Stage Multimodal Speech Enhancement System
4.1 Overall Design Framework of the Two-Stage Multimodal System
4.2 Reverberant Room Environment
4.3 Multiple Microphone Array
4.4 Audio Feature Extraction
4.5 Visual Feature Extraction
4.6 Visually Derived Wiener Filtering
4.7 Gaussian Mixture Model for Audiovisual Clean Speech Estimation
4.8 Beamforming
References
5 Experiments, Results, and Analysis
5.1 Speech Enhancement Evaluation Approaches
5.1.1 Subjective Speech Quality Evaluation Measures
5.1.2 Objective Speech Quality Evaluation Measures
5.2 Preliminary Experimentation
5.3 Automated Lip Detection Evaluation
5.3.1 Problem Description
5.3.2 Experiment Setup
5.3.3 Results and Discussion
5.4 Noisy Audio Environments
5.4.1 Problem Description
5.4.2 Experiment Setup
5.4.3 Results and Discussion
5.5 Testing with Novel Corpus
5.5.1 Problem Description
5.5.2 Experiment Setup
5.5.3 Results and Discussion
5.6 Inconsistent Audio Environment
5.6.1 Problem Description
5.6.2 Experiment Setup
5.6.3 Results and Discussion
References
6 Towards Fuzzy Logic Based Multimodal Speech Filtering
6.1 Limitations of Current Two-Stage System
6.2 Fuzzy Logic Based Model Justification
6.2.1 Requirements of Autonomous, Adaptive, and Context Aware Speech Filtering
6.2.2 Fuzzy Logic Based Decision Making
6.3 Potential Alternative Approaches
6.3.1 Hidden Markov Models
6.3.2 Neural Networks
6.4 Fuzzy Based Multimodal Speech Enhancement Framework
6.4.1 Overall Design Framework of Fuzzy System
6.4.2 Fuzzy Logic Based Framework Inputs
6.4.3 Fuzzy Logic Based Switching Supervisor
References
7 Evaluation of Fuzzy Logic Proof of Concept
7.1 Testing Requirements
7.2 Experimentation Limitations
7.3 Recording of Challenging Audiovisual Speech Corpus
7.3.1 Corpus Configuration
7.4 Fuzzy Input Variable Evaluation
7.4.1 Visual Quality Fuzzy Indicator
7.4.2 Previous Frame Fuzzy Input Variable
7.5 Detailed System Evaluation
7.5.1 Problem Description
7.5.2 Experiment Setup
7.5.3 Subjective Testing with Broadband Noise
7.5.4 Detailed Fuzzy Switching Performance
7.6 Discussion of Results
References
8 Potential Future Research Directions
8.1 Improvement of Individual Speech Processing Components
8.2 Extension of Overall Speech Filtering Framework
8.3 Further Development of Fuzzy Logic Based Switching Controller
8.4 Practical Implementation of System
References
Index