About the authors | xxiii | (2)
Foreword | xxv | (2)
Preface | xxvii | (2)
Acknowledgments | xxix

Part A SPEECH COMMUNICATION BY HUMANS AND MACHINES | 1 | (124)
|
Chapter 1 NATURE AND PERCEPTION OF SPEECH SOUNDS | 3 | (34)
| 4 | (5)
1.1.1 The speech apparatus | 4 | (1)
1.1.2 Articulatory phonetics | 5 | (1)
1.1.3 Articulatory models | 6 | (1)
1.1.4 Production of speech in noise | 7 | (2)
| 9 | (15)
1.2.1 Representations of speech | 9 | (1)
1.2.2 Phonemes and allophones | 10 | (1)
| 11 | (4)
| 15 | (5)
1.2.5 Acoustic-phonetic changes due to the Lombard reflex | 20 | (4)
1.3 HEARING AND PERCEPTION | 24 | (13)
1.3.1 The auditory system | 24 | (3)
1.3.2 Perception of sounds | 27 | (3)
1.3.3 Influence of the Lombard reflex on speech perception | 30 | (7)
|
Chapter 2 BACKGROUND ON SPEECH ANALYSIS | 37 | (36)
2.1 PRINCIPLES AND AIMS OF SPEECH ANALYSIS METHODS | 38 | (3)
| 38 | (1)
2.1.2 The Fourier transforms | 39 | (1)
2.1.3 Digital filter-banks | 41 | (1)
2.2 SPEECH ANALYSIS BASED ON A PRODUCTION MODEL | 41 | (4)
2.2.1 Introduction to the linear prediction analysis | 41 | (1)
| 42 | (2)
2.2.3 Spectral modeling using LPC | 44 | (1)
| 45 | (4)
| 45 | (1)
2.3.2 Typical LPC parameters used in recognition | 45 | (4)
2.3.3 Vector quantization | 49 | (1)
2.4 TIME-FREQUENCY REPRESENTATIONS OF SPEECH | 49 | (2)
| 51 | (3)
2.6 HIGHER-ORDER SPECTRAL ANALYSIS | 54 | (2)
2.7 SPEECH ANALYSIS BASED ON AUDITORY MODELS | 56 | (5)
| 56 | (2)
2.7.2 Physiological and psychoacoustic models | 58 | (2)
| 61 | (1)
2.8 LIMITS OF STANDARD ANALYSES IN PRESENCE OF NOISE | 62 | (11)
|
Chapter 3 FUNDAMENTALS OF AUTOMATIC SPEECH RECOGNITION | 73 | (52)
| 74 | (6)
| 74 | (3)
3.1.2 Historical background | 77 | (3)
| 80 | (4)
| 80 | (1)
3.2.2 Spectral distance measures | 81 | (2)
3.2.3 Distance measures and speech perception | 83 | (1)
3.3 PATTERN RECOGNITION METHODS FOR ASR | 84 | (28)
| 84 | (1)
| 85 | (5)
| 90 | (12)
| 102 | (10)
3.4 SPEAKER-DEPENDENT AND SPEAKER-INDEPENDENT RECOGNITION | 112 | (1)
| 112 | (1)
3.4.2 Template selection in pattern recognition ASR systems | 113 | (1)
3.5 PERFORMING FINE DISTINCTIONS IN ASR | 113 | (12)
Part B ROBUSTNESS IN ASR: PROBLEMS AND ISSUES | 125 | (66)

Chapter 4 SPEAKER VARIABILITY AND SPECIFICITY | 127 | (28)
4.1 VARIANTS OF SPEECH AND SPEAKING STYLES | 128 | (10)
| 128 | (4)
4.1.2 Read versus spontaneous speech | 132 | (1)
4.1.3 Stress and emotion in speech | 132 | (2)
4.1.4 Male-female differences | 134 | (1)
| 135 | (5)
4.1.6 Available databases to study speaking styles | 137 | (1)
4.2 VARIABILITY AND INVARIANCE | 138 | (17)
| 138 | (3)
4.2.2 Personal variation or intra-speaker variability | 141 | (1)
4.2.3 Inter-speaker variability | 142 | (1)
4.2.4 Environment variability | 143 | (1)
4.2.5 Linguistic variability | 144 | (1)
4.2.6 Contextual variation | 145 | (1)
4.2.7 Robust phonetic features in the presence of noise | 146 | (1)
4.2.8 Relational invariance | 147 | (8)
|
Chapter 5 DEALING WITH NOISY SPEECH AND CHANNEL DISTORTIONS | 155 | (36)
5.1 TYPICAL NOISE SOURCES AND CHANNEL DISTORTIONS | 156 | (11)
| 156 | (2)
5.1.2 Signal-to-noise ratio evaluation | 158 | (2)
5.1.3 General assumptions | 160 | (1)
5.1.4 Characteristics of some common noises | 161 | (6)
5.2 EFFECTS OF ADDITIVE NOISE ON SPEECH | 167 | (1)
5.3 HUMAN PERFORMANCE FOR SPEECH IN NOISE | 168 | (3)
5.4 SOME ISSUES IN ASR OF NOISY SPEECH | 171 | (8)
5.4.1 Introduction and specific difficulties | 171 | (2)
| 173 | (6)
5.5 THE LOMBARD REFLEX AND ITS INCIDENCE ON ASR SYSTEMS | 179 | (12)
| 179 | (1)
5.5.2 ASR of Lombard speech | 180 | (11)
Part C POSSIBLE SOLUTIONS AND SOME PERSPECTIVES | 191 | (238)

Chapter 6 THE CURRENT TECHNOLOGY AND ITS LIMITS: AN OVERVIEW | 193 | (14)
| 194 | (1)
6.2 WHERE WE ARE TODAY AND WHERE TECHNOLOGY IS HEADING | 194 | (7)
| 194 | (2)
| 196 | (4)
6.2.3 Some reasons for today's limitations | 200 | (1)
6.3 SPEECH RECOGNITION BY HUMAN LISTENERS AND MACHINES | 201 | (2)
6.4 OVERVIEW OF RECENT ADVANCES IN ROBUST SPEECH PROCESSING | 203 | (4)
|
Chapter 7 TOWARDS ROBUST SPEECH ANALYSIS | 207 | (26)
| 208 | (1)
| 208 | (4)
7.3 ROBUST SPEECH ANALYSIS | 212 | (21)
7.3.1 On the use of auditory models for better speech analysis | 212 | (3)
7.3.2 Robust spectral estimation and ARMA models | 225 | (8)
|
Chapter 8 ON THE USE OF A ROBUST SPEECH REPRESENTATION | 233 | (40)
| 234 | (1)
| 235 | (20)
8.2.1 Time derivatives of speech | 235 | (6)
8.2.2 AR modeling in the autocorrelation domain | 241 | (2)
| 243 | (7)
8.2.4 Feature transformation | 250 | (4)
8.2.5 Feature estimation in noise | 254 | (1)
8.2.6 Other techniques providing improved features | 255 | (1)
8.3 NOISE-ROBUST DISTORTION AND SIMILARITY MEASURES | 255 | (18)
| 255 | (2)
8.3.2 Robust distortion measures | 257 | (4)
8.3.3 Discriminative similarity measures | 261 | (12)
|
Chapter 9 ASR OF NOISY, STRESSED, AND CHANNEL DISTORTED SPEECH | 273 | (52)
| 274 | (2)
| 276 | (14)
9.2.1 Filtering techniques | 276 | (3)
9.2.2 Signal estimation techniques based on statistical modeling for speech enhancement | 279 | (2)
9.2.3 Linear and non-linear spectral subtraction | 281 | (5)
9.2.4 Signal restoration via a mapping transformation | 286 | (4)
| 290 | (35)
9.3.1 HMM composition and decomposition | 290 | (4)
9.3.2 Noise masking, data contamination, and noise immunity learning | 294 | (1)
9.3.3 Adaptation techniques for noisy speech recognition | 295 | (8)
9.3.4 Minimum error training | 303 | (4)
9.3.5 Stress and channel compensation | 307 | (5)
| 312 | (13)
|
Chapter 10 WORD-SPOTTING AND REJECTION | 325 | (22)
10.1 WORD-SPOTTING VERSUS ENDPOINT-BASED RECOGNITION | 326 | (12)
| 326 | (3)
10.1.2 Template matching word-spotters | 329 | (1)
10.1.3 Training garbage (or filler) models | 330 | (2)
10.1.4 Word-spotting and large vocabulary recognition | 332 | (1)
10.1.5 Vocabulary-independent word-spotting and user-defined keywords | 332 | (1)
10.1.6 Performance measures | 333 | (1)
10.1.7 Post word-spotting processing and rejection | 334 | (3)
10.1.8 Examples of word-spotting applications | 337 | (1)
10.2 CONFIDENCE MEASURES AND THE NEW WORD PROBLEM | 338 | (9)
10.2.1 Recognition confidence measures | 338 | (1)
10.2.2 Detecting out-of-vocabulary words and adding new words | 339 | (8)
|
Chapter 11 SPONTANEOUS SPEECH | 347 | (24)
| 348 | (3)
11.2 THE ATIS DATABASE AND SPONTANEOUS SPEECH CORPORA | 351 | (2)
11.3 THE SPEECH RECOGNITION-NATURAL LANGUAGE INTERFACE | 353 | (3)
| 356 | (4)
11.5 ROBUST PARSING AND INTERPRETATION | 360 | (11)
|
Chapter 12 ON THE USE OF KNOWLEDGE IN ASR | 371 | (22)
12.1 STATEMENT OF THE PROBLEM | 372 | (1)
12.2 HYBRID MODELS FOR ASR | 373 | (6)
| 373 | (1)
12.2.2 Hybrid data-based approaches | 374 | (3)
ORION: A hybrid system for isolated word recognition | 377 | (2)
12.3 MODELS FOR COOPERATION BETWEEN KNOWLEDGE SOURCES | 379 | (2)
12.3.1 Statement of the problem | 379 | (1)
12.3.2 Bottom-up versus top-down processing | 379 | (1)
12.3.3 Heterarchical models for ASR | 380 | (1)
12.4 DEDUCTIVE AND ABDUCTIVE REASONING MODELS FOR ASR | 381 | (7)
12.4.1 Use of a production rule model | 381 | (3)
12.4.2 Truth maintenance and abduction | 384 | (4)
| 388 | (5)
|
Chapter 13 APPLICATION DOMAIN, HUMAN FACTORS, AND DIALOGUE | 393 | (36)
13.1 THE APPLICATION DOMAIN | 394 | (2)
13.2 HUMAN FACTORS AND USER INTERFACE | 396 | (2)
13.3 DIALOGUE FOR IMPROVED ROBUSTNESS | 398 | (6)
13.3.1 Beyond sentences and turn taking: towards a natural interaction | 398 | (1)
13.3.2 Dialogue context and error correction | 399 | (1)
13.3.3 Multimodal dialogue systems | 400 | (1)
13.3.4 Different dialogue strategies for different applications | 401 | (3)
13.4 APPLICATION-INDEPENDENCE AND FAST PROTOTYPING | 404 | (5)
| 404 | (1)
13.4.2 Vocabulary-independent recognition | 404 | (3)
13.4.3 Application-independent dialogue strategies | 407 | (1)
13.4.4 The notion of global speech interface | 408 | (1)
13.5 THE ASSESSMENT AND ITS DIFFICULTIES | 409 | (3)
13.6 A ROBUST REAL-WORLD APPLICATION | 412 | (9)
| 412 | (1)
13.6.2 TOBIE-SOL: A conversational system for a weak-sighted operator | 413 | (8)
APPLICATION PERSPECTIVES FOR THE YEAR 2000 | 421 | (8)
Appendix | 429 | (2)
Index | 431