
Parallel Programming for Modern High Performance Computing Systems [Hardback]

  • Format: Hardback, 304 pages, height x width: 234x156 mm, weight: 614 g, 20 tables, black and white; 64 illustrations, black and white
  • Publication date: 28-Feb-2018
  • Publisher: CRC Press
  • ISBN-10: 1138305952
  • ISBN-13: 9781138305953
In view of the growing presence and popularity of multicore and manycore processors, accelerators, and coprocessors, as well as clusters built from such computing devices, developing efficient parallel applications has become key to exploiting the performance of such systems. This book covers the scope of parallel programming for modern high performance computing systems.

It first discusses selected and popular state-of-the-art computing devices and systems available today. These include multicore CPUs, manycore (co)processors such as Intel Xeon Phi, accelerators such as GPUs, and clusters, as well as the programming models supported on these platforms.

It next introduces parallelization through important programming paradigms, such as master-slave, geometric Single Program Multiple Data (SPMD), and divide-and-conquer.

The practical and useful elements of the most popular and important APIs for programming parallel HPC systems are discussed, including MPI, OpenMP, Pthreads, CUDA, OpenCL, and OpenACC. It also demonstrates, through selected code listings, how selected APIs can be used to implement important programming paradigms. Furthermore, it shows how the codes can be compiled and executed in a Linux environment.

The book also presents hybrid codes that integrate selected APIs for potentially multi-level parallelization and utilization of heterogeneous resources, and it shows how to use modern elements of these APIs. Selected optimization techniques are also included, such as overlapping communication and computations implemented using various APIs.

Features:

  • Discusses the popular and currently available computing devices and cluster systems
  • Includes typical paradigms used in parallel programs
  • Explores popular APIs for programming parallel applications
  • Provides code templates that can be used for implementation of paradigms
  • Provides hybrid code examples allowing multi-level parallelization
  • Covers the optimization of parallel programs
List of figures xiii
List of tables xvii
List of listings xix
Preface xxiii
Chapter 1 Understanding the need for parallel computing 1(10)
1.1 Introduction 1(1)
1.2 From Problem to Parallel Solution - Development Steps 2(2)
1.3 Approaches to Parallelization 4(2)
1.4 Selected Use Cases With Popular APIs 6(1)
1.5 Outline of the Book 7(4)
Chapter 2 Overview of selected parallel and distributed systems for high performance computing 11(18)
2.1 Generic Taxonomy of Parallel Computing Systems 11(1)
2.2 Multicore CPUs 12(2)
2.3 GPUs 14(3)
2.4 Manycore CPUs/Coprocessors 17(2)
2.5 Cluster Systems 19(1)
2.6 Growth of High Performance Computing Systems and Relevant Metrics 20(2)
2.7 Volunteer-Based Systems 22(3)
2.8 Grid Systems 25(4)
Chapter 3 Typical paradigms for parallel applications 29(40)
3.1 Aspects of Parallelization 30(5)
3.1.1 Data partitioning and granularity 30(2)
3.1.2 Communication 32(1)
3.1.3 Data allocation 32(1)
3.1.4 Load balancing 33(1)
3.1.5 HPC related metrics 34(1)
3.2 Master-Slave 35(4)
3.3 Geometric SPMD 39(16)
3.4 Pipelining 55(1)
3.5 Divide-and-Conquer 56(13)
Chapter 4 Selected APIs for parallel programming 69(116)
4.1 Message Passing Interface (MPI) 74(28)
4.1.1 Programming model and application structure 74(1)
4.1.2 The world of MPI processes and threads 75(1)
4.1.3 Initializing and finalizing usage of MPI 75(1)
4.1.4 Communication modes 76(1)
4.1.5 Basic point-to-point communication routines 76(2)
4.1.6 Basic MPI collective communication routines 78(5)
4.1.7 Packing buffers and creating custom data types 83(2)
4.1.8 Receiving a message with wildcards 85(1)
4.1.9 Receiving a message with unknown data size 86(1)
4.1.10 Various send modes 87(1)
4.1.11 Non-blocking communication 88(2)
4.1.12 One-sided MPI API 90(5)
4.1.13 A sample MPI application 95(2)
4.1.14 Multithreading in MPI 97(2)
4.1.15 Dynamic creation of processes in MPI 99(2)
4.1.16 Parallel MPI I/O 101(1)
4.2 OpenMP 102(16)
4.2.1 Programming model and application structure 102(2)
4.2.2 Commonly used directives and functions 104(5)
4.2.3 The number of threads in a parallel region 109(1)
4.2.4 Synchronization of threads within a parallel region and single thread execution 109(2)
4.2.5 Important environment variables 111(1)
4.2.6 A sample OpenMP application 112(3)
4.2.7 Selected SIMD directives 115(1)
4.2.8 Device offload instructions 115(2)
4.2.9 Tasking in OpenMP 117(1)
4.3 Pthreads 118(9)
4.3.1 Programming model and application structure 118(3)
4.3.2 Mutual exclusion 121(2)
4.3.3 Using condition variables 123(1)
4.3.4 Barrier 124(1)
4.3.5 Synchronization 125(1)
4.3.6 A sample Pthreads application 125(2)
4.4 CUDA 127(20)
4.4.1 Programming model and application structure 127(4)
4.4.2 Scheduling and synchronization 131(3)
4.4.3 Constraints 134(1)
4.4.4 A sample CUDA application 134(3)
4.4.5 Streams and asynchronous operations 137(4)
4.4.6 Dynamic parallelism 141(2)
4.4.7 Unified Memory in CUDA 143(2)
4.4.8 Management of GPU devices 145(2)
4.5 OpenCL 147(20)
4.5.1 Programming model and application structure 147(8)
4.5.2 Coordinates and Indexing 155(1)
4.5.3 Queuing data reads/writes and kernel execution 156(1)
4.5.4 Synchronization functions 157(1)
4.5.5 A sample OpenCL application 158(9)
4.6 OpenACC 167(5)
4.6.1 Programming model and application structure 167(1)
4.6.2 Common directives 168(1)
4.6.3 Data management 169(2)
4.6.4 A sample OpenACC application 171(1)
4.6.5 Asynchronous processing and synchronization 171(1)
4.6.6 Device management 172(1)
4.7 Selected Hybrid Approaches 172(13)
4.7.1 MPI+Pthreads 173(4)
4.7.2 MPI+OpenMP 177(3)
4.7.3 MPI+CUDA 180(5)
Chapter 5 Programming parallel paradigms using selected APIs 185(66)
5.1 Master-Slave 185(33)
5.1.1 MPI 186(4)
5.1.2 OpenMP 190(7)
5.1.3 MPI+OpenMP 197(2)
5.1.4 MPI+Pthreads 199(8)
5.1.5 CUDA 207(6)
5.1.6 OpenMP+CUDA 213(5)
5.2 Geometric SPMD 218(11)
5.2.1 MPI 218(2)
5.2.2 MPI+OpenMP 220(5)
5.2.3 OpenMP 225(1)
5.2.4 MPI+CUDA 225(4)
5.3 Divide-and-Conquer 229(22)
5.3.1 OpenMP 229(3)
5.3.2 CUDA 232(3)
5.3.3 MPI 235(1)
5.3.3.1 Balanced version 236(4)
5.3.3.2 Version with dynamic process creation 240(11)
Chapter 6 Optimization techniques and best practices for parallel codes 251(22)
6.1 Data Prefetching, Communication and Computations Overlapping and Increasing Computation Efficiency 252(5)
6.1.1 MPI 253(3)
6.1.2 CUDA 256(1)
6.2 Data Granularity 257(1)
6.3 Minimization of Overheads 258(2)
6.3.1 Initialization and synchronization overheads 258(2)
6.3.2 Load balancing vs cost of synchronization 260(1)
6.4 Process/Thread Affinity 260(1)
6.5 Data Types and Accuracy 261(1)
6.6 Data Organization and Arrangement 261(1)
6.7 Checkpointing 262(2)
6.8 Simulation of Parallel Application Execution 264(1)
6.9 Best Practices and Typical Optimizations 265(8)
6.9.1 GPUs/CUDA 265(1)
6.9.2 Intel Xeon Phi 266(3)
6.9.3 Clusters 269(1)
6.9.4 Hybrid systems 270(3)
Appendix A Resources 273(2)
A.1 Software Packages 273(2)
Appendix B Further reading 275(22)
B.1 Context of This Book 275(1)
B.2 Other Resources on Parallel Programming 275(22)
Index 297
Paweł Czarnul