This book provides a broad and comprehensive overview of existing technical approaches in the area of silent speech interfaces (SSI), both in theory and in application. Each technique is described in the context of the human speech production process, allowing the reader to clearly understand the principles behind SSI in general and across the different methods. The book also explores the combined use of data from multiple sensors in order to overcome the limitations of simpler SSI approaches and to address the current challenges of the field. Finally, it surveys existing SSI applications and resources and includes a simple tutorial on how to build an SSI.
1 Introduction.- 2 SSI Modalities I: Behind the Scenes---From the Brain to the Muscles.- 3 SSI Modalities II: Articulation and Its Consequences.- 4 Combining Modalities: Multimodal SSI.- 5 Application Examples.- 6 Conclusions.
1 Introduction (p. 1)
1.2 A Speech Production Primer for SSI (p. 4)
1.3 Current SSI Modalities: An Overview (p. 6)
1.5 Main Challenges in SSI (p. 10)
2 SSI Modalities I: Behind the Scenes---From the Brain to the Muscles (p. 15)
2.1 Brain Activity and Silent Speech Interfaces (p. 16)
2.1.1 Mapping Speech-Related Brain Activity (p. 16)
2.1.2 Measuring Brain Activity (p. 17)
2.1.3 Electroencephalographic Sensors (p. 18)
2.1.4 Electrocorticographic Electrodes (p. 19)
2.2 Muscular Activity and Silent Speech Interfaces (p. 20)
2.2.1 Muscles in Speech Production (p. 20)
2.2.2 Measuring Electrical Muscular Activity (p. 22)
2.2.3 Surface Electromyography (p. 23)
3 SSI Modalities II: Articulation and Its Consequences (p. 31)
3.1 Measuring Non-visible Articulators and Changes in the Vocal Tract (p. 32)
3.1.1 Electromagnetic and Permanent Magnetic Articulography (p. 32)
3.1.2 Vocal Tract Imaging (p. 34)
3.1.3 Ultrasound and Speech Articulation (p. 34)
3.1.4 Ultrasound and Silent Speech Interfaces (p. 35)
3.2 Measuring Visible Articulators and Visible Effects of Articulation (p. 36)
3.2.1 Visual Speech Recognition Using RGB Information (p. 36)
3.2.3 Local Feature Descriptors (p. 38)
3.2.4 Ultrasonic Doppler Sensing (p. 39)
3.2.5 Ultrasonic Doppler Uses in Silent Speech Interfaces (p. 40)
3.3 Measuring Other Effects (p. 42)
3.3.1 Non-audible Murmur Microphones (p. 43)
3.3.2 Other Electromagnetic and Vibration Sensors (p. 44)
4 Combining Modalities: Multimodal SSI (p. 51)
4.1 Silent Speech Interfaces Using Multiple Modalities (p. 52)
4.2 Challenges: Which Modalities and How to Combine Them (p. 53)
4.3 A Framework for Multimodal SSI (p. 54)
4.3.1 Data Collection Method and General Setup for Acquisition (p. 56)
4.3.3 Data Processing, Feature Extraction, and Feature Selection (p. 63)
5 Application Examples (p. 73)
5.1 Basic Tutorial: How to Build a Simple Video-Based SSI Recognizer (p. 74)
5.1.1 Requirements and Setup (p. 75)
5.1.2 Overall Presentation of the Pipeline (p. 75)
5.1.3 Step 1: Database Processing (p. 76)
5.1.4 Step 2: Feature Extraction (p. 77)
5.1.5 Step 3: Train the SSI Recognizer (p. 78)
5.1.6 Step 4: Test the SSI Recognizer (p. 79)
5.1.7 Experimenting with Other Modalities (p. 79)
5.1.8 EMG-Based Recognizer (p. 81)
5.1.9 Other Experimentations (p. 81)
5.2 A More Elaborate Example: Assessing the Applicability of SEMG to Tongue Gesture Detection (p. 81)
5.4 An SSI System for a Real-World Scenario (p. 86)
5.4.1 Overall System Architecture (p. 87)
5.4.2 A Possible Implementation (p. 87)
6 Conclusions (p. 93)
Index (p. 101)
João Freitas holds a degree in Computer Engineering (2007) from the Polytechnic Institute of Lisbon, Portugal, and a PhD in Computer Science (2015) from the Universities of Minho, Porto and Aveiro. He has worked at the Microsoft Language Development Center (Lisbon, Portugal) as a Researcher and Lead Software Engineer since 2006, where he has participated in several national and international R&D projects in the areas of Speech Recognition and Synthesis, Crowd-Sourced Data Collection and Ambient Assisted Living. During this time he has also acquired experience in software development, quality assurance and data analysis. His main research activity is in the area of Silent Speech Interfaces, in which he has co-authored several international peer-reviewed publications (book chapters, international journals and proceedings of international conferences). He is also interested in other research areas, such as Human-Computer Interaction and Computer Vision.
António Teixeira is an Associate Professor (with tenure) at the University of Aveiro, Portugal. He holds a PhD in Speech Synthesis and a Master's in Electronics and Telecommunications. He chairs the Special Interest Group on Iberian Languages (SIG-IL) of the International Speech Communication Association (ISCA), has directed the Master in Speech and Hearing Sciences at the University of Aveiro since 2004, and is a senior researcher at IEETA. His main research activity is in the areas of Multimodal Human-Machine Interaction and Speech Processing, with extensive experience in spoken language interaction; architectures, new applications and services for AAL; and speech technologies and information extraction applications. He is or has been involved in several national and European research projects, as project leader or co-leader for the University of Aveiro, including Heron II, Living Usability Lab (LUL), AAL4ALL, Smartphones for Seniors, AAL PaeLIFE and Marie Curie IAPP IRIS. Since 1998 he has conducted research on spoken interaction with robots, including participation in the project CARL (Communication, Action, Reasoning and Learning in Robotics). He has supervised 8 completed PhDs (including João Freitas's, defended in 2015) and 50 Master's theses, and has co-authored more than two hundred publications (books, book chapters, international journals and proceedings of international conferences).
Miguel Sales Dias holds a bachelor's degree (1985) and a master's degree (1988) in Electrical and Computer Engineering (IST-UTL, Portugal) and a PhD in Computer Graphics and Multimedia (1998) from ISCTE-IUL, where he was an Associate Professor until 2005 and currently holds an Invited Associate Professor position, teaching and conducting research in Computer Graphics, Virtual and Augmented Reality, Ambient Assisted Living and Multimodal Human-Computer Interaction. Since November 2005 he has been the Director of Microsoft Corporation's first European R&D centre in Speech and Natural User Interaction Technologies, located in Portugal (Microsoft Language Development Center, MLDC). He is regularly commissioned by the European Commission for R&D project evaluations and reviews. He is the author of 1 patent; author, co-author or editor of 11 scientific books or journal editions; and author of 12 indexed papers in international journals, 26 chapters in indexed international books, and 144 other publications, workshops or keynotes at international conferences. Since 1992 he has participated in 33 international R&D projects (ESPRIT, RACE, ACTS, TELEMATICS, TEN-IBC, EUREKA, INTERREG, FP5 IST-IPS, FP6 IST, ESA, Marie Curie, AAL, ACP) and 15 national ones (FCT, QREN, NITEC, POSC, POCTI, POSI, ICPME, TIT). He has supervised 5 completed PhDs and received 5 scientific prizes. He is a member of ACM SIGGRAPH, Eurographics, ISCA and IEEE, of the editorial boards of several journals, and of several program committees of national and international conferences in Computer Graphics, Virtual and Augmented Reality, Speech Technologies, Accessibility and Ambient Assisted Living. He was President of ADETTI, an R&D centre associated with ISCTE-IUL, and Vice-President and Secretary of the Portuguese Computer Graphics Group, the Portuguese Chapter of Eurographics.
Samuel Silva graduated in Electronics and Telecommunications Engineering in 2003, completed an MSc (pre-Bologna) in the same area in 2007 and obtained his PhD in Informatics Engineering in 2012, all from the University of Aveiro. His main research interests include medical image processing and analysis, data and information visualization, and human-computer interaction, areas in which he has (co-)authored eleven articles in international peer-reviewed journals and more than 50 papers in international peer-reviewed conferences. As a researcher at the Institute of Electronics and Informatics Engineering of Aveiro (Portugal), S. Silva has been actively involved in several national and international R&D projects spanning the areas of Ambient Assisted Living, Multimodal Interaction and Speech Production.