What does it take to move from recognizing static objects in images to truly understanding dynamic human behavior in complex scenes? This book provides the answer by introducing a groundbreaking framework that fuses image engineering with spatiotemporal behavior understanding (STBU), offering an in-depth exploration of how action, interaction, and context converge in real-world image analysis.
Positioned at the intersection of image understanding, neural networks, and behavioral modeling, this volume equips researchers and engineers with the principles, methods, and architectures needed to analyze and interpret dynamic visual data. It guides readers through each stage of the pipeline, from interest point detection and trajectory learning to action classification, activity modeling, and human-object interaction analysis, culminating in advanced topics such as abnormal event detection and graph-based neural modeling. Throughout, the book introduces deep learning strategies for action and behavior recognition, high-order modeling techniques that integrate motion, posture, and context, and transformer-based approaches for human-object interaction. It also addresses practical challenges such as differential explosion and adapting recognition models to varied scene content.
This book is essential reading for graduate students, researchers, and practitioners in computer vision, artificial intelligence, and robotics who seek a comprehensive yet accessible guide to high-level image understanding. A working knowledge of machine learning and basic computer vision concepts is recommended for full benefit. Whether you're advancing academic research or building real-world intelligent systems, this volume provides both the theoretical insight and applied techniques to push the frontier of spatiotemporal image understanding.
"Chapter 1. Introduction".- "Chapter 2. Spatiotemporal Points".- "Chapter 3. Spatiotemporal Trajectory".- "Chapter 4. Action Classification and Recognition".- "Chapter 5. Activity Modeling and Recognition".- "Chapter 6. Detection of Human-Object Interaction Activity".- "Chapter 7. Behavior Recognition Networks".- "Chapter 8. Abnormal Event Detection".
Yu-Jin Zhang is a Professor of Image Engineering at Tsinghua University, Beijing, where he has been on the faculty since 1993. He received his Ph.D. in Applied Science from the State University of Liège, Belgium, in 1989, followed by postdoctoral research at Delft University of Technology in the Netherlands. With over three decades of experience in image processing, image analysis, image understanding, and visual information retrieval, he is widely recognized as a leading scholar in the field.
Professor Zhang has authored more than 550 peer-reviewed research papers and published over 20 academic books, including Handbook of Image Engineering (Springer, 2021), A Selection of Image Processing Techniques (CRC Press, 2022), A Selection of Image Analysis Techniques (CRC Press, 2023), A Selection of Image Understanding Techniques (CRC Press, 2023), and 3-D Computer Vision: Foundations and Advanced Methodologies (Springer, 2024). His work has significantly shaped the landscape of image engineering (image processing, analysis, and understanding).
In addition to his research, he has published more than 40 textbooks in image engineering and has developed and taught more than ten specialized courses at Tsinghua University and abroad, inspiring generations of students and practitioners. He is a Fellow of SPIE and the China Society of Image and Graphics (CSIG), and he served as Program Chair of ICIP 2017.