
E-book: Multimodal Behavior Analysis in the Wild: Advances and Challenges

Edited by Xavier Alameda-Pineda (Research Scientist, INRIA), Nicu Sebe (Full Professor, University of Trento, Italy), and Elisa Ricci (Researcher at FBK, Assistant Professor at the University of Perugia)
  • Format: PDF+DRM
  • Price: €184.28*
  • * the price is final, i.e. no further discounts apply
  • Add to cart
  • Add to wishlist
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. More information here. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on mobile devices (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you must install Adobe Digital Editions (this is a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is most likely already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

Multimodal Behavior Analysis in the Wild: Advances and Challenges presents the state-of-the-art in behavioral signal processing using different data modalities, with a special focus on identifying the strengths and limitations of current technologies. The book focuses on audio and video modalities, while also emphasizing emerging modalities such as accelerometer or proximity data. It covers tasks at different levels of complexity, from low level (speaker detection, sensorimotor links, source separation), through middle level (conversational group detection, addresser and addressee identification), to high level (personality and emotion recognition), providing insights into how to exploit inter-level and intra-level links.

This is a valuable resource on the state-of-the-art and future research challenges of multimodal behavioral analysis in the wild. It is suitable for researchers and graduate students in the fields of computer vision, audio processing, pattern recognition, machine learning, and social signal processing.

  • Gives a comprehensive collection of information on the state-of-the-art, limitations, and challenges associated with extracting behavioral cues from real-world scenarios
  • Presents numerous applications on how different behavioral cues have been successfully extracted from different data sources
  • Provides a wide variety of methodologies used to extract behavioral cues from multi-modal data
List of Contributors xiii
About the Editors xix
Multimodal behavior analysis in the wild: An introduction 1(8)
Xavier Alameda-Pineda
Elisa Ricci
Nicu Sebe
0.1 Analyzing human behavior in the wild from multimodal data
1(2)
0.2 Scope of the book
3(3)
0.3 Summary of important points
6(1)
References
7(2)
Chapter 1 Multimodal open-domain conversations with robotic platforms 9(18)
Kristiina Jokinen
Graham Wilcock
1.1 Introduction
9(5)
1.1.1 Constructive Dialog Model
11(3)
1.2 Open-domain dialogs
14(4)
1.2.1 Topic shifts and topic trees
14(2)
1.2.2 Dialogs using Wikipedia
16(2)
1.3 Multimodal dialogs
18(3)
1.3.1 Multimodal WikiTalk for robots
19(1)
1.3.2 Multimodal topic modeling
20(1)
1.4 Future directions
21(2)
1.4.1 Dialogs using domain ontologies
21(1)
1.4.2 IoT and an integrated robot architecture
22(1)
1.5 Conclusion
23(1)
References
24(3)
Chapter 2 Audio-motor integration for robot audition 27(26)
Antoine Deleforge
Alexander Schmidt
Walter Kellermann
2.1 Introduction
27(2)
2.2 Audio-motor integration in psychophysics and robotics
29(3)
2.3 Single-microphone sound localization using head movements
32(5)
2.3.1 HRTF model and dynamic cues
32(2)
2.3.2 Learning-based sound localization
34(2)
2.3.3 Results
36(1)
2.4 Ego-noise reduction using proprioceptors
37(8)
2.4.1 Ego-noise: challenges and opportunities
37(1)
2.4.2 Proprioceptor-guided dictionary learning
37(2)
2.4.3 Phase-optimized dictionary learning
39(2)
2.4.4 Audio-motor integration via support vector machines
41(3)
2.4.5 Results
44(1)
2.5 Conclusion and perspectives
45(1)
References
46(7)
Chapter 3 Audio source separation into the wild 53(26)
Laurent Girin
Sharon Gannot
Xiaofei Li
3.1 Introduction
53(1)
3.2 Multichannel audio source separation
54(4)
3.3 Making MASS go from labs into the wild
58(10)
3.3.1 Moving sources and sensors
58(3)
3.3.2 Varying number of (active) sources
61(2)
3.3.3 Spatially diffuse sources and long mixing filters
63(4)
3.3.4 Ad hoc microphone arrays
67(1)
3.4 Conclusions and perspectives
68(2)
References
70(9)
Chapter 4 Designing audio-visual tools to support multisensory disabilities 79(24)
Nicoletta Noceti
Luca Giuliani
Joan Sosa-Garcia
Luca Brayda
Andrea Trucco
Francesca Odone
4.1 Introduction
79(3)
4.2 Related works
82(3)
4.3 The Glassense system
85(7)
4.4 Visual recognition module
88(1)
4.4.1 Object-instance recognition
88(1)
4.4.2 Experimental assessment
89(3)
4.5 Complementary hearing aid module
92(3)
4.5.1 Measurement of Glassense beam pattern
92(1)
4.5.2 Analysis of measured beam pattern
93(2)
4.6 Assessing usability with impaired users
95(3)
4.6.1 Glassense field tests with visually impaired
96(1)
4.6.2 Glassense field tests with binaural hearing loss
96(2)
4.7 Conclusion
98(1)
References
99(4)
Chapter 5 Audio-visual learning for body-worn cameras 103(18)
Andrea Cavallaro
Alessio Brutti
5.1 Introduction
103(2)
5.2 Multi-modal classification
105(2)
5.3 Cross-modal adaptation
107(3)
5.4 Audio-visual reidentification
110(1)
5.5 Reidentification dataset
111(1)
5.6 Reidentification results
112(4)
5.7 Closing remarks
116(1)
References
116(5)
Chapter 6 Activity recognition from visual lifelogs: State of the art and future challenges 121(14)
Mariella Dimiccoli
Alejandro Cartas
Petia Radeva
6.1 Introduction
121(2)
6.2 Activity recognition from egocentric images
123(2)
6.3 Activity recognition from egocentric photo-streams
125(2)
6.4 Experimental results
127(4)
6.4.1 Experimental setup
127(1)
6.4.2 Implementation
128(1)
6.4.3 Results and discussion
129(2)
6.5 Conclusion
131(1)
Acknowledgments
132(3)
References
132
Chapter 7 Lifelog retrieval for memory stimulation of people with memory impairment 135(24)
Gabriel Oliveira-Barra
Marc Bolanos
Estefania Talavera
Olga Gelonch
Maite Garolera
Petia Radeva
7.1 Introduction
135(3)
7.2 Related work
138(3)
7.3 Retrieval based on key-frame semantic selection
141(8)
7.3.1 Summarization of autobiographical episodes
143(1)
7.3.2 Semantic key-frame selection
144(2)
7.3.3 Egocentric image retrieval based on CNNs and inverted index search
146(3)
7.4 Experiments
149(5)
7.4.1 Dataset
149(1)
7.4.2 Experimental setup
150(1)
7.4.3 Evaluation measures
151(1)
7.4.4 Results
151(2)
7.4.5 Discussions
153(1)
7.5 Conclusions
154(1)
Acknowledgments
154(1)
References
155(4)
Chapter 8 Integrating signals for reasoning about visitors' behavior in cultural heritage 159(12)
Tsvi Kuflik
Eyal Dim
8.1 Introduction
159(2)
8.2 Using technology for reasoning about visitors' behavior
161(5)
8.3 Discussion
166(1)
8.4 Conclusions
167(1)
References
168(3)
Chapter 9 Wearable systems for improving tourist experience 171(28)
Lorenzo Seidenari
Claudio Baecchi
Tiberio Uricchio
Andrea Ferracani
Marco Bertini
Alberto Del Bimbo
9.1 Introduction
171(1)
9.2 Related work
172(4)
9.3 Behavior analysis for smart guides
176(1)
9.4 The indoor system
176(10)
9.5 The outdoor system
186(8)
9.6 Conclusions
194(1)
References
195(4)
Chapter 10 Recognizing social relationships from an egocentric vision perspective 199(26)
Stefano Alletto
Marcella Cornia
Lorenzo Baraldi
Giuseppe Serra
Rita Cucchiara
10.1 Introduction
199(3)
10.2 Related work
202(2)
10.2.1 Head pose estimation
202(1)
10.2.2 Social interactions
203(1)
10.3 Understanding people interactions
204(6)
10.3.1 Face detection and tracking
205(1)
10.3.2 Head pose estimation
205(3)
10.3.3 3D people localization
208(2)
10.4 Social group detection
210(2)
10.4.1 Correlation clustering via structural SVM
210(2)
10.5 Social relevance estimation
212(1)
10.6 Experimental results
213(9)
10.6.1 Head pose estimation
214(1)
10.6.2 Distance estimation
215(1)
10.6.3 Groups estimation
216(4)
10.6.4 Social relevance
220(2)
10.7 Conclusions
222(1)
References
223(2)
Chapter 11 Complex conversational scene analysis using wearable sensors 225(22)
Hayley Hung
Ekin Gedik
Laura Cabrera Quiros
11.1 Introduction
225(2)
11.2 Defining 'in the wild' and ecological validity
227(1)
11.3 Ecological validity vs. experimental control
228(1)
11.4 Ecological validity vs. robust automated perception
229(1)
11.5 Thin vs. thick slices of analysis
230(1)
11.6 Collecting data of social behavior
230(4)
11.6.1 Practical concerns when collecting data during social events
231(3)
11.7 Analyzing social actions with a single body worn accelerometer
234(7)
11.7.1 Feature extraction and classification
235(1)
11.7.2 Performance vs. sample size
236(2)
11.7.3 Transductive parameter transfer (TPT) for personalized models
238(3)
11.7.4 Discussion
241(1)
11.8 Chapter summary
241(1)
References
242(5)
Chapter 12 Detecting conversational groups in images using clustering games 247(22)
Sebastiano Vascon
Marcello Pelillo
12.1 Introduction
247(3)
12.2 Related work
250(1)
12.3 Clustering games
251(4)
12.3.1 Notations and definitions
251(2)
12.3.2 Clustering games
253(2)
12.4 Conversational groups as equilibria of clustering games
255(3)
12.4.1 Frustum of attention
255(2)
12.4.2 Quantifying pairwise interactions
257(1)
12.4.3 The algorithm
258(1)
12.5 Finding ESS-clusters using game dynamics
258(3)
12.6 Experiments and results
261(4)
12.6.1 Datasets
261(1)
12.6.2 Evaluation metrics and parameter exploration
262(1)
12.6.3 Experiments
263(2)
12.7 Conclusions
265(1)
References
265(4)
Chapter 13 We are less free than how we think: Regular patterns in nonverbal communication 269(20)
Alessandro Vinciarelli
Anna Esposito
Mohammad Tayarani
Giorgio Roffo
Filomena Scibelli
Francesco Perrone
Dong-Bach Vo
13.1 Introduction
269(2)
13.2 On spotting cues: how many and when
271(5)
13.2.1 The cues
272(1)
13.2.2 Methodology
273(2)
13.2.3 Results
275(1)
13.3 On following turns: who talks with whom
276(3)
13.3.1 Conflict
277(1)
13.3.2 Methodology
278(1)
13.3.3 Results
278(1)
13.4 On speech dancing: who imitates whom
279(7)
13.4.1 Methodology
279(3)
13.4.2 Results
282(4)
13.5 Conclusions
286(1)
References
287(2)
Chapter 14 Crowd behavior analysis from fixed and moving cameras 289(34)
Pierre Bour
Emile Cribelier
Vasileios Argyriou
14.1 Introduction
289(4)
14.2 Microscopic and macroscopic crowd modeling
293(2)
14.3 Motion information for crowd representation from fixed cameras
295(3)
14.3.1 Pre-processing and selection of areas of interest
295(1)
14.3.2 Motion-based crowd behavior analysis
296(2)
14.4 Crowd behavior and density analysis
298(3)
14.4.1 Person detection and tracking in crowded scenes
299(1)
14.4.2 Low level features for crowd density estimation
300(1)
14.5 CNN-based crowd analysis methods for surveillance and anomaly detection
301(6)
14.6 Crowd analysis using moving sensors
307(3)
14.7 Metrics and datasets
310(4)
14.7.1 Metrics for performance evaluation
310(2)
14.7.2 Datasets for crowd behavior analysis
312(2)
14.8 Conclusions
314(1)
References
315(8)
Chapter 15 Towards multi-modality invariance: A study in visual representation 323(26)
Lingxi Xie
Qi Tian
15.1 Introduction and related work
323(3)
15.2 Variances in visual representation
326(2)
15.3 Reversal invariance in BoVW
328(9)
15.3.1 Reversal symmetry and Max-SIFT
329(1)
15.3.2 RIDE: generalized reversal invariance
330(2)
15.3.3 Application to image classification
332(1)
15.3.4 Experiments
332
15.3.5 Summary
336(1)
15.4 Reversal invariance in CNN
337(7)
15.4.1 Reversal-invariant convolution (RI-Conv)
337(1)
15.4.2 Relationship to data augmentation
338(2)
15.4.3 CIFAR experiments
340(1)
15.4.4 ILSVRC2012 classification experiments
341(2)
15.4.5 Summary
343(1)
15.5 Conclusions
344(1)
References
344(5)
Chapter 16 Sentiment concept embedding for visual affect recognition 349(20)
Victor Campos
Xavier Giro-i-Nieto
Brendan Jou
Jordi Torres
Shih-Fu Chang
16.1 Introduction
349(1)
16.1.1 Embeddings for image classification
350(1)
16.1.2 Affective computing
351(1)
16.2 Visual sentiment ontology
352(1)
16.3 Building output embeddings for ANPs
353(4)
16.3.1 Combining adjectives and nouns
354(2)
16.3.2 Loss functions for the embeddings
356(1)
16.4 Experimental results
357(5)
16.4.1 Adjective noun pair detection
358(3)
16.4.2 Zero-shot concept detection
361(1)
16.5 Visual affect recognition
362(3)
16.5.1 Visual emotion prediction
363(1)
16.5.2 Visual sentiment prediction
364(1)
16.6 Conclusions and future work
365(1)
References
366(3)
Chapter 17 Video-based emotion recognition in the wild 369(18)
Albert Ali Salah
Heysem Kaya
Furkan Gurpinar
17.1 Introduction
369(1)
17.2 Related work
370(4)
17.3 Proposed approach
374(2)
17.4 Experimental results
376(3)
17.4.1 EmotiW Challenge
376(2)
17.4.2 ChaLearn Challenges
378(1)
17.5 Conclusions and discussion
379(3)
Acknowledgments
382(1)
References
382(5)
Chapter 18 Real-world automatic continuous affect recognition from audiovisual signals 387(20)
Panagiotis Tzirakis
Stefanos Zafeiriou
Bjorn Schuller
18.1 Introduction
387(2)
18.2 Real world vs laboratory settings
389(1)
18.3 Audio and video affect cues and theories of emotion
389(3)
18.3.1 Audio signals
389(1)
18.3.2 Visual signals
390(1)
18.3.3 Quantifying affect
391(1)
18.4 Affective computing
392(5)
18.4.1 Multimodal fusion techniques
392(1)
18.4.2 Related work
393(1)
18.4.3 Databases
394(2)
18.4.4 Affect recognition competitions
396(1)
18.5 Audiovisual affect recognition: a representative end-to-end learning system
397(5)
18.5.1 Proposed model
398(2)
18.5.2 Experiments & results
400(2)
18.6 Conclusions
402(1)
References
403(4)
Chapter 19 Affective facial computing: Generalizability across domains 407(36)
Jeffrey F. Cohn
Itir Onal Ertugrul
Wen-Sheng Chu
Jeffrey M. Girard
Laszlo A. Jeni
Zakia Hammal
19.1 Introduction
407(2)
19.2 Overview of AFC
409(1)
19.3 Approaches to annotation
410(1)
19.4 Reliability and performance
411(2)
19.5 Factors influencing performance
413(2)
19.6 Systematic review of studies of cross-domain generalizability
415(14)
19.6.1 Study selection
416(1)
19.6.2 Databases
416(3)
19.6.3 Cross-domain generalizability
419(8)
19.6.4 Studies using deep- vs. shallow learning
427(1)
19.6.5 Discussion
428(1)
19.7 New directions
429(4)
19.8 Summary
433(1)
Acknowledgments
434(1)
References
434(9)
Chapter 20 Automatic recognition of self-reported and perceived emotions 443(28)
Biqiao Zhang
Emily Mower Provost
20.1 Introduction
443(2)
20.2 Emotion production and perception
445(3)
20.2.1 Descriptions of emotion
445(1)
20.2.2 Brunswik's functional lens model
446(2)
20.2.3 Appraisal theory
448(1)
20.3 Observations from perception experiments
448(2)
20.4 Collection and annotation of labeled emotion data
450(3)
20.4.1 Emotion-elicitation methods
450(2)
20.4.2 Data annotation tools
452(1)
20.5 Emotion datasets
453(4)
20.5.1 Text datasets
453(2)
20.5.2 Audio, visual, physiological, and multi-modal datasets
455(2)
20.6 Recognition of self-reported and perceived emotion
457(3)
20.7 Challenges and prospects
460(3)
20.8 Concluding remarks
463
Acknowledgments
463(1)
References
463(8)
Index 471
Xavier Alameda-Pineda received his PhD from INRIA and the University of Grenoble in 2013. He was a post-doctoral researcher at CNRS/GIPSA-Lab and at the University of Trento, in the deep relational learning group. He is a research scientist at INRIA working on signal processing and machine learning for scene and behavior understanding using multimodal data. He won the best paper award at ACM MM 2015, the best student paper award at IEEE WASPAA 2015, and the best scientific paper award on image, speech, signal and video processing at IEEE ICPR 2016. He is a member of IEEE and of ACM SIGMM.

Elisa Ricci is a researcher at FBK and an assistant professor at the University of Perugia. She received her PhD from the University of Perugia in 2008. She has since been a postdoctoral researcher at Idiap and at FBK, Trento, and a visiting researcher at the University of Bristol. Her research interests focus on developing machine learning algorithms for video scene analysis, human behavior understanding, and multimedia content analysis. She was an area chair of ACM MM 2016 and of ECCV 2016. She received the IBM Best Student Paper Award at ICPR 2014.

Nicu Sebe is a full professor at the University of Trento, Italy, where he leads research in the areas of multimedia information retrieval and human behavior understanding. He was a general co-chair of FG 2008 and ACM MM 2013, and a program chair of CIVR 2007 and 2010, of ACM MM 2007 and 2011, and of ECCV 2016. He is a program chair of ICCV 2017 and of ICPR 2020, and a general chair of ICMR 2017. He is a senior member of IEEE and ACM and a fellow of IAPR.