1 Significance of Prosody for Speaker, Language, Emotion, and Speech Recognition
1.3 Probabilistic Formulation of Recognition
1.4 Significance of Prosody for Robust Recognition
1.5 Automatic Speaker Recognition
1.5.1 Speaker Recognition by Humans
1.5.2 Speaker-Specific Aspect of Speech
1.5.3 Significance of Prosody for Automatic Speaker Recognition
1.6 Automatic Language Recognition
1.6.1 Language Recognition by Humans
1.6.2 Language-Specific Aspect of Speech
1.6.3 Significance of Prosody for Automatic Language Recognition
1.7 Automatic Emotion Recognition
1.7.1 Emotion Recognition by Humans
1.7.2 Emotion-Specific Aspect of Speech
1.7.3 Significance of Prosody for Automatic Emotion Recognition
1.8 Automatic Speech Recognition
1.8.1 Speech Recognition by Humans
1.8.2 Significance of Prosody for Automatic Speech Recognition

2 Extraction and Representation of Prosody for Speaker, Language, Emotion, and Speech Recognition
2.2 ASR-Free Approaches for Automatic Segmentation and Representation of Prosody
2.2.1 Syllable-Like Segmentation Using Location of Vowel Onset Points
2.2.2 Syllable-Like Segmentation Using Information from F0 and Energy Contour
2.2.3 Syllable-Like Segmentation Using Detection of Vowel Region
2.2.4 Segmentation Using Inflections or Start/End of Voicing
2.2.5 Segmentation as Pseudo Syllables
2.2.6 Segmentation at Predefined Intervals
2.2.7 Suprasegmental Parameterization
2.2.8 Segmentation at Sentence/Phrase and Syllable Level
2.3 ASR-Based Approaches for Extraction and Representation of Prosody
2.3.1 Segmentation into Nonuniform Extraction Regions
2.3.2 Segmentation into Pseudo Syllables

3 Modeling and Fusion of Prosody for Speaker, Language, Emotion, and Speech Recognition
3.3 Speaker Recognition Systems Based on Prosody
3.4 Language Recognition Systems Based on Prosody
3.5 Emotion Recognition Systems Based on Prosody
3.6 Speech Recognition Systems Based on Prosody
3.7 Fusion of Prosodic Evidence into the Conventional Recognition Applications

References