Series Foreword  xiii
Foreword  xv
Preface  xvii
|
|
|
|
I Setting the Stage

1 Parallel Computing  5
1.1 Fundamental Concepts of Parallel Computing  5
1.2 The Rise of Concurrency  7
1.3 Parallel Hardware  9
1.3.1 Multiprocessor Systems  9
1.3.2 Graphics Processing Units (GPU)  13
1.3.3 Distributed Memory Clusters  15
1.4 Parallel Software for Multiprocessor Computers  16
|
2 The Language of Performance  21
2.1 The Basics: FLOPS, Speedup, and Parallel Efficiency  21
2.2 Amdahl's Law  23
2.3 Parallel Overhead  25
2.4 Strong Scaling vs. Weak Scaling  27
2.5 Load Balancing  29
2.6 Understanding Hardware with the Roofline Model  31
|
3 What Is OpenMP?  35
3.1 OpenMP History  35
3.2 The Common Core  38
3.3 Major Components of OpenMP  38
|
II The OpenMP Common Core |
|
|
|
4 Threads and the OpenMP Programming Model  47
4.1 …  47
4.2 The Structure of OpenMP Programs  47
4.3 Threads and the Fork-Join Pattern  50
4.4 Working with Threads  56
4.4.1 The SPMD Design Pattern  58
4.4.2 …  63
4.4.3 …  67
4.4.4 …  67
4.4.5 …  69
4.5 …  72
|
|
5 Parallel Loops  75
5.1 Worksharing-Loop Construct  76
5.2 Combined Parallel Worksharing-Loop Construct  79
5.3 Reduction  79
5.4 Loop Schedules  83
5.4.1 The Static Schedule  83
5.4.2 The Dynamic Schedule  84
5.4.3 Choosing a Schedule  86
5.5 Implicit Barriers and the Nowait Clause  90
5.6 Pi Program with Parallel Loop Worksharing  92
5.7 A Loop-Level Parallelism Strategy  94
5.8 …  96
|
6 OpenMP Data Environment  99
6.1 Default Storage Attributes  100
6.2 Modifying Storage Attributes  102
6.2.1 The Shared Clause  103
6.2.2 The Private Clause  105
6.2.3 The Firstprivate Clause  107
6.2.4 The Default Clause  108
6.3 Data Environment Examples  109
6.3.1 …  109
6.3.2 Mandelbrot Set Area  111
6.3.3 Pi Loop Example Revisited  115
6.4 …  116
6.5 …  119
|
|
7 OpenMP Tasks  121
7.1 …  121
7.2 …  125
7.3 Our First Example: Schrödinger's Program  125
7.4 …  128
7.5 …  130
7.5.1 When Do Tasks Complete?  131
7.6 Task Data Environment  132
7.6.1 Default Data Scoping for Tasks  132
7.6.2 Linked List Program Revisited with Tasks  134
7.7 Fundamental Design Patterns with Tasks  135
7.7.1 Divide and Conquer Pattern  137
7.8 …  143
|
|
8 OpenMP Memory Model  145
8.1 Memory Hierarchies Revisited  146
8.2 The OpenMP Common Core Memory Model  149
8.3 Working with Shared Memory  152
8.4 …  156
|
|
9 …  159
9.1 The Parallel Construct  160
9.2 Worksharing Constructs  161
9.3 Parallel Worksharing-Loop Combined Construct  162
9.4 The Task Construct  163
9.5 Synchronization and Memory Consistency Models  164
9.6 Data Environment Clauses  166
9.7 The Reduction Clause  167
9.8 Environment Variables and Runtime Library Routines  168
|
III Beyond the Common Core |
|
|
|
10 Multithreading beyond the Common Core  175
10.1 Additional Clauses for OpenMP Common Core Constructs  175
10.1.1 The Parallel Construct  176
10.1.2 The Worksharing-Loop Construct  178
10.1.3 The Task Construct  186
10.2 Multithreading Functionality Missing from the Common Core  190
10.2.1 …  191
10.2.2 …  195
10.2.3 …  196
10.2.4 …  197
10.2.5 Runtime Library Routines  199
10.2.5.1 omp_get_max_threads  199
10.2.5.2 …  199
10.2.5.3 …  200
10.3 …  201
|
11 Synchronization and the OpenMP Memory Model  203
11.1 Memory Consistency Models  204
11.2 Pairwise Synchronization  210
11.3 Locks and How to Use Them  217
11.4 The C++ Memory Model and OpenMP  220
11.5 …  224
|
12 Beyond OpenMP Common Core Hardware  225
12.1 Nonuniform Memory Access (NUMA) Systems  226
12.1.1 Working with NUMA Systems  228
12.1.1.1 Controlling Thread Affinity  232
12.1.1.2 Managing Data Locality  234
12.1.2 Nested Parallel Constructs  240
12.1.3 Checking the Thread Affinity  243
12.1.4 Summary: Thread Affinity and Data Locality  245
12.2 …  247
12.3 …  256
12.4 …  262
|
13 Your Continuing Education in OpenMP  265
13.1 Programmer Resources from the ARB  265
13.2 How to Read the OpenMP Specification  267
13.2.1 OpenMP with All the Formal Jargon  268
13.3 The Structure of the OpenMP Specification  272
13.4 …  275
Glossary  277
References  289
Subject Index  291