Robustness in Automatic Speech Recognition: Fundamentals and Applications, 1996 ed. [Hardback]

  • Hardback
  • Price: €187.67*
  • * The price is final, i.e. no further discounts apply
  • Regular price: €220.79
  • You save 15%
  • Delivery of the book from the publisher takes approximately 2-4 weeks
  • Free delivery
  • Order time: 2-4 weeks
Provides a unified view of how to improve the ability of machines to recognize speech reliably under the changing and often unpredictable conditions of real use. So far, recognition works very well in the laboratory but not as consistently elsewhere as consumers demand. Covers the problems of speech production and perception in noise, popular techniques used in speech analysis and automatic speech recognition, problems relevant to robustness and speech-based applications, variability between and within speakers, various types of distorted speech, and recent advances in dealing with such problems. Not intended as a textbook, but suitable for a graduate or undergraduate course. Annotation copyright Book News, Inc., Portland, OR.

The domain of speech processing has reached the point where researchers and engineers are concerned with how speech technology can be applied to new products and how this technology will transform our future. One important problem, and the subject of this book, is improving the robustness of speech processing under adverse conditions. Robust speech processing is a relatively new area that became a concern as the technology started moving from the laboratory to field applications. A method or algorithm is robust if it can deal with a broad range of applications and adapt to unknown conditions. Robustness in Automatic Speech Recognition addresses all of the fundamental problems and issues in the area.

The book is divided into three parts. The first part provides the background necessary for understanding the rest of the material. It also emphasizes the problems of speech production and perception in noise, along with popular techniques used in speech analysis and automatic speech recognition. Part Two discusses the problems relevant to robustness in automatic speech recognition and speech-based applications. It emphasizes intra- and inter-speaker variability as well as automatic recognition of Lombard, noisy, and channel-distorted speech. Finally, the third part covers recent advances in robust automatic speech recognition.

Audience: An invaluable reference that may also be used as a text for advanced courses on the subject.
About the authors
Foreword
Preface
Acknowledgments
Part A SPEECH COMMUNICATION BY HUMANS AND MACHINES
Chapter 1 NATURE AND PERCEPTION OF SPEECH SOUNDS
1.1 SPEECH PRODUCTION
1.1.1 The speech apparatus
1.1.2 Articulatory phonetics
1.1.3 Articulatory models
1.1.4 Production of speech in noise
1.2 ACOUSTIC PHONETICS
1.2.1 Representations of Speech
1.2.2 Phonemes and allophones
1.2.3 Vowels
1.2.4 Consonants
1.2.5 Acoustic-phonetic changes due to the Lombard reflex
1.3 HEARING AND PERCEPTION
1.3.1 The auditory system
1.3.2 Perception of sounds
1.3.3 Influence of the Lombard reflex on speech perception
Chapter 2 BACKGROUND ON SPEECH ANALYSIS
2.1 PRINCIPLES AND AIMS OF SPEECH ANALYSIS METHODS
2.1.1 Introduction
2.1.2 The Fourier transforms
2.1.3 Digital filter-banks
2.2 SPEECH ANALYSIS BASED ON A PRODUCTION MODEL
2.2.1 Introduction to the linear prediction analysis
2.2.2 The LPC Model
2.2.3 Spectral modeling using LPC
2.3 FEATURE ANALYSIS
2.3.1 Introduction
2.3.2 Typical LPC parameters used in recognition
2.3.3 Vector quantization
2.4 TIME-FREQUENCY REPRESENTATIONS OF SPEECH
2.5 WAVELETS
2.6 HIGHER-ORDER SPECTRAL ANALYSIS
2.7 SPEECH ANALYSIS BASED ON AUDITORY MODELS
2.7.1 Introduction
2.7.2 Physiological and psychoacoustic models
2.7.3 Application to ASR
2.8 LIMITS OF STANDARD ANALYSES IN PRESENCE OF NOISE
Chapter 3 FUNDAMENTALS OF AUTOMATIC SPEECH RECOGNITION
3.1 PRELIMINARIES
3.1.1 Basic principles
3.1.2 Historical background
3.2 DISTANCE MEASURES
3.2.1 Introduction
3.2.2 Spectral distance measures
3.2.3 Distance measures and speech perception
3.3 PATTERN RECOGNITION METHODS FOR ASR
3.3.1 Basic principles
3.3.2 Time normalization
3.3.3 Stochastic modeling
3.3.4 Neural networks
3.4 SPEAKER-DEPENDENT AND SPEAKER-INDEPENDENT RECOGNITION
3.4.1 Introduction
3.4.2 Template selection in pattern recognition ASR systems
3.5 PERFORMING FINE DISTINCTIONS IN ASR
Part B ROBUSTNESS IN ASR: PROBLEMS AND ISSUES
Chapter 4 SPEAKER VARIABILITY AND SPECIFICITY
4.1 VARIANTS OF SPEECH AND SPEAKING STYLES
4.1.1 Introduction
4.1.2 Read versus spontaneous speech
4.1.3 Stress and emotion in speech
4.1.4 Male-female differences
4.1.5 Voice conversion
4.1.6 Available databases to study speaking styles
4.2 VARIABILITY AND INVARIANCE
4.2.1 Preliminaries
4.2.2 Personal variation or intra-speaker variability
4.2.3 Inter-speaker variability
4.2.4 Environment variability
4.2.5 Linguistic variability
4.2.6 Contextual variation
4.2.7 Robust phonetic features in the presence of noise
4.2.8 Relational invariance
Chapter 5 DEALING WITH NOISY SPEECH AND CHANNEL DISTORTIONS
5.1 TYPICAL NOISE SOURCES AND CHANNEL DISTORTIONS
5.1.1 Preliminaries
5.1.2 Signal-to-noise ratio evaluation
5.1.3 General assumptions
5.1.4 Characteristics of some common noises
5.2 EFFECTS OF ADDITIVE NOISE ON SPEECH
5.3 HUMAN PERFORMANCE FOR SPEECH IN NOISE
5.4 SOME ISSUES IN ASR OF NOISY SPEECH
5.4.1 Introduction and specific difficulties
5.4.2 Endpoint detection
5.5 THE LOMBARD REFLEX AND ITS INCIDENCE ON ASR SYSTEMS
5.5.1 Preliminaries
5.5.2 ASR of Lombard speech
Part C POSSIBLE SOLUTIONS AND SOME PERSPECTIVES
Chapter 6 THE CURRENT TECHNOLOGY AND ITS LIMITS: AN OVERVIEW
6.1 INTRODUCTION
6.2 WHERE WE ARE TODAY AND WHERE TECHNOLOGY IS HEADING
6.2.1 Current technology
6.2.2 Real challenges
6.2.3 Some reasons for today's limitations
6.3 SPEECH RECOGNITION BY HUMAN LISTENERS AND MACHINES
6.4 OVERVIEW OF RECENT ADVANCES IN ROBUST SPEECH PROCESSING
Chapter 7 TOWARDS ROBUST SPEECH ANALYSIS
7.1 PRELIMINARIES
7.2 SIGNAL ACQUISITION
7.3 ROBUST SPEECH ANALYSIS
7.3.1 On the use of auditory models for better speech analysis
7.3.2 Robust spectral estimation and ARMA models
Chapter 8 ON THE USE OF A ROBUST SPEECH REPRESENTATION
8.1 INTRODUCTION
8.2 FEATURE EXTRACTION
8.2.1 Time derivatives of speech
8.2.2 AR modeling in the autocorrelation domain
8.2.3 Feature processing
8.2.4 Feature transformation
8.2.5 Feature estimation in noise
8.2.6 Other techniques providing improved features
8.3 NOISE-ROBUST DISTORTION AND SIMILARITY MEASURES
8.3.1 Cepstral lifters
8.3.2 Robust distortion measures
8.3.3 Discriminative similarity measures
Chapter 9 ASR OF NOISY, STRESSED, AND CHANNEL DISTORTED SPEECH
9.1 INTRODUCTION
9.2 SPEECH ENHANCEMENT
9.2.1 Filtering techniques
9.2.2 Signal estimation techniques based on statistical modeling for speech enhancement
9.2.3 Linear and non-linear spectral subtraction
9.2.4 Signal restoration via a mapping transformation
9.3 MODEL COMPENSATION
9.3.1 HMM composition and decomposition
9.3.2 Noise masking, data contamination, and noise immunity learning
9.3.3 Adaptation techniques for noisy speech recognition
9.3.4 Minimum error training
9.3.5 Stress and channel compensation
9.3.6 Concluding remarks
Chapter 10 WORD-SPOTTING AND REJECTION
10.1 WORD-SPOTTING VERSUS ENDPOINT-BASED RECOGNITION
10.1.1 Preliminaries
10.1.2 Template matching word-spotters
10.1.3 Training garbage (or filler) models
10.1.4 Word-spotting and large vocabulary recognition
10.1.5 Vocabulary-independent word-spotting and user-defined keywords
10.1.6 Performance measures
10.1.7 Post word-spotting processing and rejection
10.1.8 Examples of word-spotting applications
10.2 CONFIDENCE MEASURES AND THE NEW WORD PROBLEM
10.2.1 Recognition confidence measures
10.2.2 Detecting out-of-vocabulary words and adding new words
Chapter 11 SPONTANEOUS SPEECH
11.1 INTRODUCTION
11.2 THE ATIS DATABASE AND SPONTANEOUS SPEECH CORPORA
11.3 THE SPEECH RECOGNITION-NATURAL LANGUAGE INTERFACE
11.4 THE LANGUAGE MODEL
11.5 ROBUST PARSING AND INTERPRETATION
Chapter 12 ON THE USE OF KNOWLEDGE IN ASR
12.1 STATEMENT OF THE PROBLEM
12.2 HYBRID MODELS FOR ASR
12.2.1 Preliminaries
12.2.2 Hybrid data-based approaches
12.2.3 ORION: A hybrid system for isolated word recognition
12.3 MODELS FOR COOPERATION BETWEEN KNOWLEDGE SOURCES
12.3.1 Statement of the problem
12.3.2 Bottom-up versus top-down processing
12.3.3 Heterarchical models for ASR
12.4 DEDUCTIVE AND ABDUCTIVE REASONING MODELS FOR ASR
12.4.1 Use of a production rule model
12.4.2 Truth maintenance and abduction
12.5 CONCLUSION
Chapter 13 APPLICATION DOMAIN, HUMAN FACTORS, AND DIALOGUE
13.1 THE APPLICATION DOMAIN
13.2 HUMAN FACTORS AND USER INTERFACE
13.3 DIALOGUE FOR IMPROVED ROBUSTNESS
13.3.1 Beyond sentences and turn-taking: towards a natural interaction
13.3.2 Dialogue context and error correction
13.3.3 Multimodal dialogue systems
13.3.4 Different dialogue strategies for different applications
13.4 APPLICATION-INDEPENDENCE AND FAST PROTOTYPING
13.4.1 Introduction
13.4.2 Vocabulary-independent recognition
13.4.3 Application-independent dialogue strategies
13.4.4 The notion of global speech interface
13.5 THE ASSESSMENT AND ITS DIFFICULTIES
13.6 A ROBUST REAL-WORLD APPLICATION
13.6.1 Introduction
13.6.2 TOBIE-SOL: A conversational system for a weak-sighted operator
APPLICATION PERSPECTIVES FOR THE YEAR 2000
Appendix
Index