This book offers a concise yet comprehensive exploration of embodied artificial intelligence (AI) and its integration with swarm manipulation, navigation, and tracking tasks. It uniquely bridges the gap in the existing literature by providing a thorough review of swarm-embodied AI, focusing on collaborative perception and decision-making methods. Its standout features include a systematic approach, detailed discussions on advanced directions, and a practical case study on multi-robot multi-target tracking.
It commences by examining the three key elements of embodied AI: multi-sensor fusion, embodied perception, and embodied decision-making. It reviews existing works that independently optimize each of these elements. Subsequently, the book delves into swarm-embodied AI, encompassing swarm-embodied collaborative perception, collaborative decision-making, and future research directions. Specifically, it explores how swarm intelligence enhances the scalability and generalizability of embodied AI, and conversely, how embodied AI augments swarm intelligence by adapting learning models to diverse tasks and environments. Finally, the book presents a case study of multi-robot multi-target tracking, providing a practical demonstration of all algorithms discussed within. Readers can follow this case study step by step to gain a deeper understanding of the advancements and potential challenges of swarm-embodied AI.
Designed with accessibility in mind, this book caters to a wide audience, including researchers, students, and practitioners seeking insights into this rapidly evolving field. Its user-friendly format ensures ease of understanding without requiring specialized prior knowledge. By distilling complex concepts and highlighting practical applications, the book serves as an invaluable resource for anyone interested in the intersection of embodied AI and swarm intelligence.
Part I Autonomous Embodied Intelligence
Chapter 1. Embodied AI
Chapter 2. Autonomous Embodied AI
Chapter 3. Autonomous Swarm Embodied AI
Part II Self-evolving Intelligence
Chapter 4. Self-evolving Embodied Intelligence
Chapter 5. Self-evolving World Model
Chapter 6. Self-evolving Agent
Xin Wang is an associate professor at Tsinghua University, with a Ph.D. from Zhejiang University and Simon Fraser University. His research focuses on multimedia intelligence and machine learning. Dr. Wang has published over 200 high-quality research papers in prestigious journals and conferences, including ICML, NeurIPS, IEEE TPAMI, IEEE TKDE, ACM KDD, WWW, ACM SIGIR, and ACM Multimedia, and has won three best paper awards, including at ACM Multimedia Asia in 2023 and the IEEE ICME best paper runner-up in 2025. He serves as Associate Editor for IEEE Transactions on Multimedia and IEEE Transactions on Circuits and Systems for Video Technology. He was honored with the ACM China Rising Star Award and the IEEE TCMC Rising Star Award, and was named a DAMO Academy Young Fellow. He has co-authored several books, including Automated Machine Learning and Meta-Learning for Multimedia and Visual Question Answering, published by Springer in 2021 and 2022, respectively.
Tongtong Feng is a postdoctoral fellow at the Department of Computer Science and Technology, Tsinghua University. He received his Ph.D. degree in Computer Science and Technology from Beijing University of Posts and Telecommunications. His research interests include Autonomous Embodied AI, Self-evolving Agent, and Multimedia Intelligence. He has published over 20 high-quality research papers in top journals and conferences, including IEEE TMM, ESWA, ACM Multimedia, and AAAI. He received a Best Paper nomination at ACM Multimedia 2024.
Huaping Liu is a professor at Tsinghua University, where he earned his Ph.D. degree in 2004. Dr. Liu specializes in embodied intelligence, particularly robotic perception and control. He is a National Science Fund recipient and a senior editor of the International Journal of Robotics Research. Dr. Liu has co-authored several books, including Robotic Tactile Perception and Understanding and Wearable Technology for Robotic Manipulation and Learning, published by Springer in 2018 and 2020, respectively.
Wenwu Zhu is a professor at Tsinghua University. He obtained his Ph.D. degree from New York University in 1996. Previously, he held positions as a research manager at Microsoft Research Asia, chief scientist and director at Intel Research China, and Member of Technical Staff at Bell Labs, New Jersey. His research focuses on data-driven multimedia networking and multimedia intelligence, resulting in over 400 refereed papers and over 100 patents. He has received numerous awards, including the ACM SIGMM Technical Achievement Award in 2023, the IEEE Circuits and Systems Society Charles A. Desoer Technical Achievement Award in 2024, and 12 Best Paper Awards, such as ACM Multimedia in 2012 and IEEE Transactions on Circuits and Systems for Video Technology in 2001 and 2019. Dr. Zhu has served as Editor-in-Chief for IEEE Transactions on Multimedia (2017-2019) and IEEE Transactions on Circuits and Systems for Video Technology (2024-2025), and on the steering committees for IEEE Transactions on Multimedia (2015-2016) and IEEE Transactions on Mobile Computing (2007-2010). He has also chaired major conferences, including ACM Multimedia 2018 and ACM CIKM 2019. He is an AAAS Fellow, ACM Fellow, IEEE Fellow, SPIE Fellow, and a member of The Academy of Europe (Academia Europaea). Dr. Zhu has co-authored several books, including Automated Machine Learning and Meta-Learning for Multimedia and Visual Question Answering, published by Springer in 2021 and 2022, respectively.