"This book not only covers the knowledge of GPU and CUDA programming, but also provides successful real applications in many domains, including signal processing, image processing, physics, and artificial intelligence. The most recent research outcomes and the latest progress in GPU architectures are included, such as multi-GPU programming and GPU clusters. I believe it is a very good reference for GPU and CUDA parallel programming courses, as it provides detailed illustrations of GPU architectures, the programming principles of CUDA, CUDA libraries for algebra, and a series of real applications. In addition, it will definitely contribute to the progress of research in CUDA-enabled parallel computing." Professor Ying Liu, School of Computer and Control, University of Chinese Academy of Sciences