Introduction  xvii
  Concurrency and Parallelism  xvii
  Why Study Parallel Programming  xvii
  What Is in This Book  xix

1  An Introduction to Parallel Computer Architecture  1
  1.1  Parallel Organization  1
    SISD: Single Instruction, Single Data  2
    SIMD: Single Instruction, Multiple Data  2
    MIMD: Multiple Instruction, Multiple Data  3
    MISD: Multiple Instruction, Single Data  3
  1.6  Interconnect Architecture  13
    Types and Quality of Networks  14

2  Parallel Programming Models  31
  2.1  Distributed-Memory Programming Model  32
  2.2  Shared-Memory Programming Model  33
  2.4  Variants of Task Parallelism  37

3  Parallel Performance Analysis  45
  3.1  Simple Parallel Model  46
  3.2  Bulk-Synchronous Parallel Model  47
  3.4  Parallel Performance Evaluation  57
    Brent's Work-Time Scheduling Principle  63

4  Synchronization and Communication Primitives  75
  4.1  Threads and Processes  75
  4.2  Race Condition and Consistency of State  77
    FIFO and Processor Consistency  82
    Synchronization Condition  86
    Point-to-Point Communication  99

5  Parallel Program Design  111
    Task Scheduling Framework  126
    Centralized Push Scheduling Strategy  127
    Distributed Push Scheduling  129
  5.5  Debugging and Profiling  132

6  Middleware: The Practice of Parallel Programming  139
    Other Work-Sharing Pragmas  150
    Message-Passing Synchronization  158
    MPI Collective Communication  164
    MPI Groups and Communicators  176
    Partitioned Global Address Space  180
    Data and Function on Device  191
    CUDA Parallel Memory Access  203

7  Parallel Algorithms and Techniques  211
  7.1  Divide and Conquer: Prefix-Sum  212
    Parallel Prefix-Sum: Method 1  214
    Parallel Prefix-Sum: Method 2  215
    Parallel Prefix-Sum: Method 3  215
  7.2  Divide and Conquer: Merge Two Sorted Lists  217
  7.3  Accelerated Cascading: Find Minima  227
  7.4  Recursive Doubling: List Ranking  230
  7.5  Recursive Doubling: Euler Tour  231
  7.6  Recursive Doubling: Connected Components  233
  7.7  Pipelining: Merge-Sort  238
    4-Cover Property Analysis  245
  7.8  Application of Prefix-Sum: Radix-Sort  249
  7.9  Exploiting Parallelism: Quick-Sort  250
  7.10 Fixing Processor Count: Sample-Sort  254
  7.11 Exploiting Parallelism: Minimum Spanning Tree  257
    MST with Parallel Priority Queue  263

Bibliography  269
Index  277