|
|
xi | |
|
|
xv | |
Short biographies of the editors and authors |
|
xvii | |
Preface to Voice Biometrics |
|
xxvii | |
About the editors |
|
xxix | |
|
|
1 | (1) |
|
|
|
Chapter 2 Fundamentals of voice biometrics: classical and machine learning approaches |
|
|
1 | (1) |
|
Chapter 3 Voice biometrics: attacker's perspective |
|
|
2 | (1) |
|
Chapter 4 Voice biometrics: privacy in paralinguistic and extralinguistic tasks for health applications |
|
|
3 | (1) |
|
Chapter 5 Voice privacy in biometrics: speaker de-identification |
|
|
3 | (1) |
|
Chapter 6 Performance evaluation of voice biometrics solutions |
|
|
3 | (1) |
|
Chapter 7 Voice biometrics: how the technology is standardized |
|
|
4 | (1) |
|
Chapter 8 Voice biometrics: perspective from the industry |
|
|
4 | (1) |
|
Chapter 9 Joining forces of voice and facial biometrics: a case study in the scope of NIST SRE'19 |
|
|
4 | (1) |
|
Chapter 10 Voice biometrics: future trends and challenges ahead |
|
|
5 | (2) |
|
2 Fundamentals of voice biometrics: classical and machine learning approaches |
|
|
7 | (32) |
|
|
Joaquin Gonzalez-Rodriguez |
|
|
|
|
2.1 Introduction to speaker recognition systems |
|
|
7 | (2) |
|
2.2 Metrics for system performance evaluation |
|
|
9 | (2) |
|
|
9 | (1) |
|
2.2.2 Detection cost function |
|
|
10 | (1) |
|
2.3 Text-independent speaker recognition |
|
|
11 | (10) |
|
2.3.1 Classical acoustic approaches: GMM-UBM, i-vector and PLDA |
|
|
11 | (3) |
|
|
14 | (1) |
|
2.3.2.1 Basic concepts of neural networks |
|
|
14 | (3) |
|
2.3.2.2 Some applications of DNNs to speech processing |
|
|
17 | (1) |
|
2.3.3 DNNs for speaker recognition |
|
|
17 | (4) |
|
2.4 Text-dependent speaker recognition |
|
|
21 | (2) |
|
2.4.1 Classification of systems and techniques |
|
|
21 | (1) |
|
2.4.2 Databases and benchmarks |
|
|
22 | (1) |
|
2.5 Calibration of speaker recognition scores |
|
|
23 | (16) |
|
2.5.1 Motivation: why to calibrate? |
|
|
23 | (2) |
|
2.5.2 What is calibration? |
|
|
25 | (2) |
|
2.5.3 Score-to-LR computation methods |
|
|
27 | (1) |
|
2.5.3.1 Generative calibration models: fitting distributions to scores |
|
|
28 | (1) |
|
2.5.3.2 Discriminative calibration models: transforming scores into LR values to optimize a cost function |
|
|
28 | (2) |
|
2.5.4 Performance measurement of score-to-LR methods |
|
|
30 | (1) |
|
|
31 | (8) |
|
3 Voice biometrics: attacker's perspective |
|
|
39 | (28) |
|
|
|
|
39 | (1) |
|
|
40 | (2) |
|
|
42 | (6) |
|
|
42 | (2) |
|
3.2.2 Black box hardware attacks |
|
|
44 | (1) |
|
3.2.3 Black box adversarial attacks |
|
|
45 | (3) |
|
|
48 | (5) |
|
|
48 | (3) |
|
3.3.2 Gray box hardware attacks |
|
|
51 | (1) |
|
3.3.3 Gray box and white box adversarial attacks |
|
|
51 | (2) |
|
3.4 Technological challenges |
|
|
53 | (4) |
|
3.4.1 Extracting prosodic information |
|
|
53 | (1) |
|
3.4.2 Enrolled users with malicious intent |
|
|
53 | (1) |
|
3.4.3 Number of trials permitted on the ASV |
|
|
54 | (1) |
|
3.4.4 Minuteness of the perturbation in adversarial attacks |
|
|
54 | (1) |
|
3.4.5 Privacy preservation of speech and voice privacy |
|
|
55 | (2) |
|
3.5 Conclusions and future work |
|
|
57 | (10) |
|
|
57 | (1) |
|
|
58 | (9) |
|
4 Voice biometrics: privacy in paralinguistic and extralinguistic tasks for health applications |
|
|
67 | (26) |
|
|
|
|
|
|
67 | (2) |
|
4.2 Paralinguistic and extralinguistic tasks |
|
|
69 | (3) |
|
4.2.1 Speech-affecting diseases |
|
|
70 | (1) |
|
|
71 | (1) |
|
4.3 Cryptographic primitives and MPC for PPML |
|
|
72 | (6) |
|
4.3.1 Homomorphic encryption |
|
|
72 | (1) |
|
|
73 | (1) |
|
4.3.3 Secure Multiparty Computation |
|
|
73 | (1) |
|
4.3.3.1 Yao's GCs protocol |
|
|
74 | (1) |
|
|
75 | (1) |
|
|
76 | (1) |
|
4.3.4 Distance-preserving hashing techniques |
|
|
77 | (1) |
|
|
77 | (1) |
|
|
78 | (1) |
|
4.4 PPML for paralinguistic and extralinguistic tasks |
|
|
78 | (8) |
|
4.4.1 PPML for non-health-related tasks |
|
|
78 | (1) |
|
4.4.2 PPML for health-related tasks |
|
|
79 | (1) |
|
4.4.3 Private SVM+RBF for health-related tasks |
|
|
80 | (1) |
|
4.4.3.1 Private RBF computation |
|
|
81 | (1) |
|
4.4.3.2 Private SVM computation |
|
|
81 | (1) |
|
4.4.3.3 Experimental setup |
|
|
82 | (1) |
|
4.4.3.4 Model training and parameters |
|
|
83 | (1) |
|
4.4.3.5 Private SVM implementation details |
|
|
83 | (1) |
|
4.4.3.6 Classification results |
|
|
84 | (1) |
|
4.4.3.7 Security and computational performance |
|
|
84 | (2) |
|
|
86 | (7) |
|
|
86 | (1) |
|
|
86 | (7) |
|
5 Voice privacy in biometrics: speaker de-identification |
|
|
93 | (28) |
|
|
|
|
|
93 | (2) |
|
5.2 How to evaluate speaker de-identification? |
|
|
95 | (3) |
|
5.2.1 Subjective measures |
|
|
95 | (2) |
|
|
97 | (1) |
|
5.3 Speaker de-identification techniques |
|
|
98 | (4) |
|
|
99 | (1) |
|
5.3.2 Gaussian mixture model |
|
|
100 | (1) |
|
|
101 | (1) |
|
5.3.4 Deep learning techniques |
|
|
102 | (1) |
|
5.4 Experiment definition |
|
|
102 | (6) |
|
5.4.1 Piecewise definition of transformation functions |
|
|
103 | (1) |
|
5.4.2 Pretrained transformation functions |
|
|
104 | (2) |
|
5.4.3 De-identification based on DNNs |
|
|
106 | (1) |
|
5.4.4 De-identification based on generative adversarial networks |
|
|
107 | (1) |
|
|
108 | (3) |
|
|
109 | (2) |
|
|
111 | (2) |
|
|
113 | (8) |
|
|
114 | (1) |
|
|
114 | (7) |
|
6 Performance evaluation of voice biometrics solutions |
|
|
121 | (18) |
|
|
|
|
121 | (2) |
|
6.2 Evaluating methods or technology |
|
|
123 | (8) |
|
6.2.1 Existing benchmarking evaluations |
|
|
124 | (1) |
|
6.2.2 Evaluation criteria |
|
|
125 | (1) |
|
6.2.2.1 Evaluating a system producing hard decisions |
|
|
125 | (2) |
|
6.2.2.2 Evaluating the goodness of verification scores |
|
|
127 | (2) |
|
6.2.3 Statistical significance |
|
|
129 | (1) |
|
6.2.4 Specific evaluation aspects |
|
|
129 | (1) |
|
6.2.5 Evaluating related technologies |
|
|
130 | (1) |
|
|
131 | (1) |
|
6.4 Summary and propositions |
|
|
132 | (7) |
|
|
133 | (6) |
|
7 Voice biometrics: how the technology is standardized |
|
|
139 | (24) |
|
|
|
|
140 | (1) |
|
7.2 Biometrics standardization within ISO/IEC |
|
|
140 | (9) |
|
7.2.1 Generalized system design |
|
|
141 | (2) |
|
7.2.2 Harmonized biometric vocabulary |
|
|
143 | (1) |
|
7.2.3 Performance testing and reporting |
|
|
144 | (2) |
|
7.2.4 Presentation attack detection |
|
|
146 | (1) |
|
7.2.5 Biometric information protection |
|
|
147 | (2) |
|
7.3 Data interchange formats for passports and beyond |
|
|
149 | (5) |
|
7.3.1 Motivation and background on encoding biometric data |
|
|
150 | (1) |
|
7.3.2 Data interchange standard ISO/IEC 19794 |
|
|
151 | (1) |
|
|
152 | (2) |
|
7.3.4 ISO/IEC 19794 Part 13: voice data |
|
|
154 | (1) |
|
7.4 Discussion: de facto and ISO/IEC standards |
|
|
154 | (5) |
|
7.4.1 On the general system design |
|
|
154 | (2) |
|
7.4.2 Gap analysis: performance testing and reporting |
|
|
156 | (2) |
|
7.4.3 Regarding implementations and data interchange formats |
|
|
158 | (1) |
|
|
159 | (4) |
|
|
160 | (1) |
|
|
160 | (3) |
|
8 Voice biometrics: perspective from the industry |
|
|
163 | (23) |
|
|
|
|
|
|
|
8.1 Automated password reset: an example of a commercial application using voice biometrics |
|
|
164 | (5) |
|
|
164 | (1) |
|
|
164 | (1) |
|
8.1.3 System architecture |
|
|
165 | (2) |
|
8.1.4 Voice biometric system |
|
|
167 | (2) |
|
|
169 | (1) |
|
8.2 Testing of commercial voice biometric systems |
|
|
169 | (6) |
|
|
169 | (1) |
|
8.2.1.1 Biometric testing |
|
|
170 | (2) |
|
|
172 | (2) |
|
|
174 | (1) |
|
8.3 Forensic speaker recognition |
|
|
175 | (11) |
|
|
175 | (1) |
|
8.3.2 Forensic speaker recognition and the strength of evidence |
|
|
176 | (1) |
|
8.3.3 The forensic expert's workflow |
|
|
176 | (2) |
|
8.3.4 Technical challenges |
|
|
178 | (2) |
|
8.3.4.1 Improving interpretability of scores |
|
|
180 | (1) |
|
8.3.4.2 Score normalization |
|
|
180 | (1) |
|
8.3.4.3 Score calibration |
|
|
181 | (1) |
|
8.3.4.4 Condition adaptation |
|
|
181 | (1) |
|
8.3.4.5 Dealing with multi-speaker recordings |
|
|
182 | (1) |
|
8.3.5 Training-communication between system developers and end-users |
|
|
182 | (1) |
|
|
182 | (1) |
|
|
183 | (3) |
|
9 Joining forces of voice and facial biometrics: a case study in the scope of NIST SRE'19 |
|
|
186 | (33) |
|
|
|
Dijana Petrovska Delacretaz |
|
|
9.1 Introduction to the NIST SRE'19 challenge |
|
|
188 | (2) |
|
9.1.1 The SRE'19 CTS challenge |
|
|
188 | (1) |
|
9.1.2 The SRE'19 multimedia challenge |
|
|
188 | (1) |
|
9.1.3 SRE'19 evaluation metrics |
|
|
189 | (1) |
|
9.2 TSP speaker verification system for the SRE'19 evaluation |
|
|
190 | (7) |
|
9.2.1 A brief review of state of the art in speaker verification |
|
|
190 | (1) |
|
9.2.2 TSP speaker verification common pipeline for the SRE' |
|
|
190 | (1) |
|
CTS and multimedia challenges |
|
|
191 | (1) |
|
|
191 | (1) |
|
|
192 | (1) |
|
9.2.3 TSP speaker verification system for the SRE'19 CTS challenge |
|
|
192 | (1) |
|
9.2.4 TSP speaker verification system for the SRE'19 multimedia challenge |
|
|
193 | (1) |
|
9.2.5 Results for TSP speaker verification systems on the SRE'19 CTS and multimedia challenges |
|
|
194 | (3) |
|
|
197 | (1) |
|
9.3 TSP face recognition system for SRE'19 |
|
|
197 | (11) |
|
9.3.1 Survey of face recognition systems |
|
|
197 | (3) |
|
9.3.2 TSP face recognition system pipeline |
|
|
200 | (1) |
|
9.3.3 Databases used in the TSP face recognition system |
|
|
201 | (1) |
|
|
202 | (3) |
|
9.3.5 Embedding extractor |
|
|
205 | (1) |
|
Initial version of the DNN architecture |
|
|
205 | (1) |
|
Final version of the DNN architecture |
|
|
205 | (3) |
|
|
208 | (1) |
|
9.4 Audiovisual biometric system for the SRE'19 multimedia challenge |
|
|
208 | (4) |
|
9.5 Conclusions and perspectives |
|
|
212 | (7) |
|
|
214 | (1) |
|
|
214 | (5) |
|
10 Voice biometrics: future trends and challenges ahead |
|
|
219 | (4) |
|
|
|
|
219 | (1) |
|
10.2 Privacy and security |
|
|
220 | (1) |
|
|
221 | (2) |
References |
|
223 | (4) |
Index |
|
227 | |