| Preface |
|
vii | |
| About the Author |
|
ix | |
|
|
|
1 | (6) |
|
1.1 Classifications of Parallel Architectures |
|
|
1 | (1) |
|
1.2 Shared-Memory Computers |
|
|
2 | (1) |
|
1.3 Interconnection-Network Computers |
|
|
3 | (2) |
|
|
|
5 | (2) |
|
2 Shared-memory Computers (PRAM) |
|
|
7 | (88) |
|
|
|
7 | (1) |
|
2.2 The Balanced Tree Method |
|
|
8 | (2) |
|
|
|
10 | (1) |
|
2.4 Sorting in 9(1) Time on the CRCW PRAM Model |
|
|
11 | (3) |
|
2.4.1 Implementation on the CREW PRAM model |
|
|
12 | (1) |
|
2.4.2 Implementation on the EREW PRAM model |
|
|
13 | (1) |
|
|
|
14 | (4) |
|
|
|
16 | (2) |
|
|
|
18 | (1) |
|
|
|
18 | (3) |
|
|
|
21 | (1) |
|
|
|
22 | (5) |
|
|
|
25 | (1) |
|
2.8.2 Computing vertex levels in a tree |
|
|
26 | (1) |
|
|
|
27 | (5) |
|
|
|
27 | (3) |
|
|
|
30 | (1) |
|
2.9.3 Parallel bottom-up merge sorting |
|
|
31 | (1) |
|
2.10 The Zero-one Principle |
|
|
32 | (1) |
|
|
|
33 | (2) |
|
2.12 Bitonic Merging and Sorting |
|
|
35 | (8) |
|
|
|
40 | (3) |
|
|
|
43 | (7) |
|
|
|
44 | (2) |
|
2.13.2 Computing and maintaining ranks |
|
|
46 | (3) |
|
2.13.3 Analysis of the algorithm |
|
|
49 | (1) |
|
|
|
50 | (2) |
|
|
|
52 | (4) |
|
2.16 Matrix Multiplication |
|
|
56 | (2) |
|
|
|
58 | (1) |
|
|
|
58 | (1) |
|
2.19 Minimum Spanning Trees |
|
|
59 | (4) |
|
2.20 Computing the Convex Hull of a Set of Points |
|
|
63 | (5) |
|
|
|
68 | (1) |
|
|
|
69 | (8) |
|
|
|
77 | (18) |
|
|
|
95 | (64) |
|
|
|
95 | (1) |
|
|
|
96 | (3) |
|
3.3 Embeddings of the Hypercube |
|
|
99 | (5) |
|
|
|
100 | (1) |
|
3.3.2 Embedding of a linear array into the hypercube |
|
|
101 | (1) |
|
3.3.3 Embedding of a mesh into the hypercube |
|
|
102 | (1) |
|
3.3.4 Embedding of a binary tree into the hypercube |
|
|
103 | (1) |
|
3.4 Broadcasting in the Hypercube |
|
|
104 | (1) |
|
|
|
105 | (1) |
|
3.6 Permutation Routing on the Hypercube |
|
|
105 | (5) |
|
3.6.1 The greedy algorithm |
|
|
106 | (1) |
|
3.6.2 The randomized algorithm |
|
|
107 | (3) |
|
3.7 Permutation Routing on the Butterfly |
|
|
110 | (2) |
|
3.8 Computing Parallel Prefix on the Hypercube |
|
|
112 | (1) |
|
|
|
113 | (2) |
|
|
|
115 | (3) |
|
3.11 Selection on the Hypercube |
|
|
118 | (1) |
|
3.12 Multiselection on the Hypercube |
|
|
119 | (3) |
|
3.13 Load Balancing on the Hypercube |
|
|
122 | (4) |
|
3.14 Computing Parallel Prefix on the Butterfly |
|
|
126 | (1) |
|
3.15 Odd-Even Merging and Sorting on the Butterfly |
|
|
127 | (5) |
|
3.16 Matrix Multiplication on the Hypercube |
|
|
132 | (6) |
|
|
|
138 | (1) |
|
|
|
138 | (6) |
|
|
|
144 | (15) |
|
4 The Linear Array and the Mesh |
|
|
159 | (68) |
|
|
|
159 | (2) |
|
4.2 Embedding between a Mesh and a Linear Array |
|
|
161 | (1) |
|
4.3 Broadcasting in the Linear Array and the Mesh |
|
|
162 | (1) |
|
4.4 Computing Parallel Prefix on the Mesh |
|
|
163 | (1) |
|
4.5 Odd-Even Transposition Sort |
|
|
164 | (1) |
|
|
|
165 | (2) |
|
4.7 A Simple O(√n) Time Algorithm for Sorting on the Mesh |
|
|
167 | (2) |
|
4.8 Odd-Even Merging and Sorting on the Mesh |
|
|
169 | (3) |
|
4.9 Routing on the Linear Array and the Mesh |
|
|
172 | (5) |
|
4.9.1 Routing in the linear array |
|
|
172 | (1) |
|
4.9.2 Deterministic routing in the mesh |
|
|
173 | (1) |
|
4.9.3 Randomized routing on the mesh |
|
|
174 | (3) |
|
4.10 Matrix Multiplication on the Mesh |
|
|
177 | (3) |
|
4.10.1 The first algorithm |
|
|
177 | (1) |
|
4.10.2 The second algorithm |
|
|
178 | (2) |
|
4.11 Computing the Transitive Closure on the Mesh |
|
|
180 | (4) |
|
4.12 Connected Components |
|
|
184 | (1) |
|
|
|
185 | (1) |
|
4.14 Computing the Convex Hull of a Set of Points on the Mesh |
|
|
185 | (6) |
|
4.14.1 The first algorithm |
|
|
186 | (1) |
|
4.14.2 The second algorithm |
|
|
187 | (4) |
|
4.15 Labeling Connected Components |
|
|
191 | (5) |
|
4.15.1 The propagation algorithm |
|
|
191 | (1) |
|
4.15.2 The recursive algorithm |
|
|
192 | (4) |
|
|
|
196 | (6) |
|
|
|
202 | (4) |
|
4.17.1 Sorting on 3-dimensional meshes |
|
|
202 | (4) |
|
|
|
206 | (1) |
|
|
|
207 | (5) |
|
|
|
212 | (15) |
|
|
|
227 | (26) |
|
|
|
227 | (4) |
|
5.2 Implementation on the Butterfly |
|
|
231 | (1) |
|
5.3 Iterative FFT on the Butterfly |
|
|
231 | (3) |
|
5.4 The Inverse Fourier Transform |
|
|
234 | (1) |
|
5.5 Product of Polynomials |
|
|
235 | (3) |
|
5.6 Computing the Convolution of Two Vectors |
|
|
238 | (1) |
|
5.7 The Product of a Toeplitz Matrix and a Vectors |
|
|
239 | (2) |
|
5.8 Using Modular Arithmetic |
|
|
241 | (3) |
|
|
|
244 | (1) |
|
|
|
244 | (2) |
|
|
|
246 | (7) |
|
|
|
253 | (28) |
|
|
|
253 | (7) |
|
6.1.1 Semigroup operations |
|
|
254 | (1) |
|
6.1.2 Sorting by minimum extraction |
|
|
254 | (1) |
|
6.1.3 Sorting by partitioning |
|
|
255 | (1) |
|
|
|
256 | (3) |
|
6.1.5 The one-dimensional pyramid |
|
|
259 | (1) |
|
|
|
260 | (4) |
|
6.2.1 Computing parallel prefix on the pyramid |
|
|
262 | (2) |
|
|
|
264 | (6) |
|
6.3.1 Sorting on the mesh of trees |
|
|
266 | (3) |
|
6.3.2 Routing in the mesh of trees |
|
|
269 | (1) |
|
6.4 Computing Parallel Prefix on the Mesh of Trees |
|
|
270 | (2) |
|
6.5 Comparison Between the Mesh of Trees and the Pyramid |
|
|
272 | (1) |
|
|
|
273 | (1) |
|
|
|
273 | (2) |
|
|
|
275 | (6) |
|
|
|
281 | (32) |
|
|
|
281 | (2) |
|
7.2 Ranking of the Processors |
|
|
283 | (2) |
|
7.3 Routing between Substars |
|
|
285 | (2) |
|
7.4 Computing Parallel Prefix on the Star |
|
|
287 | (3) |
|
7.5 Computing the Maximum |
|
|
290 | (2) |
|
7.6 Neighborhood Broadcasting and Recursive Doubling |
|
|
292 | (2) |
|
7.7 Broadcasting in the Star |
|
|
294 | (2) |
|
7.8 The Arrangement Graph |
|
|
296 | (1) |
|
7.9 The (d, k)-Star Graph |
|
|
297 | (3) |
|
7.10 Sorting in the Sd,k Star |
|
|
300 | (3) |
|
|
|
303 | (1) |
|
|
|
303 | (2) |
|
|
|
305 | (8) |
|
8 Optical Transpose Interconnection System (OTIS) |
|
|
313 | (28) |
|
|
|
313 | (1) |
|
|
|
314 | (10) |
|
8.2.1 Data movements in the OTIS-Mesh |
|
|
315 | (1) |
|
8.2.2 Broadcasting in the OTIS-Mesh |
|
|
316 | (1) |
|
8.2.3 Semigroup operations on the OTIS-Mesh |
|
|
316 | (2) |
|
8.2.4 Parallel prefix in OTIS-Mesh |
|
|
318 | (2) |
|
8.2.5 Shift operations on the OTIS-Mesh |
|
|
320 | (2) |
|
8.2.6 Permutation routing in OTIS-Mesh |
|
|
322 | (2) |
|
8.2.7 Sorting on OTIS-Mesh |
|
|
324 | (1) |
|
|
|
324 | (4) |
|
8.3.1 Simulation of an n2-processor hypercube |
|
|
324 | (2) |
|
8.3.2 Broadcasting in the OTIS-Hypercube |
|
|
326 | (1) |
|
8.3.3 Semigroup operations on the OTIS-Hypercube |
|
|
326 | (1) |
|
8.3.4 Sorting and routing in the OTIS-Hypercube |
|
|
327 | (1) |
|
|
|
328 | (2) |
|
|
|
328 | (1) |
|
|
|
328 | (2) |
|
|
|
330 | (1) |
|
|
|
331 | (3) |
|
|
|
334 | (7) |
|
|
|
341 | (16) |
|
|
|
341 | (1) |
|
9.2 Matrix-vector Multiplication |
|
|
342 | (1) |
|
9.3 Computing the Convolution of Two Sequences |
|
|
342 | (3) |
|
9.3.1 Semisystolic solution |
|
|
343 | (1) |
|
9.3.2 Pure systolic solution |
|
|
344 | (1) |
|
9.4 A Zero-time VLSI Sorter |
|
|
345 | (2) |
|
9.5 An On-chip Bubble Sorter |
|
|
347 | (4) |
|
|
|
351 | (1) |
|
|
|
352 | (1) |
|
|
|
353 | (4) |
|
Appendix A Mathematical Preliminaries |
|
|
357 | (10) |
|
|
|
357 | (2) |
|
|
|
357 | (1) |
|
|
|
358 | (1) |
|
|
|
358 | (1) |
|
|
|
359 | (1) |
|
A.2 Divide-and-conquer Recurrences |
|
|
359 | (2) |
|
|
|
361 | (1) |
|
|
|
362 | (5) |
|
A.4.1 Random variables and expectation |
|
|
362 | (1) |
|
A.4.2 Bernoulli distribution |
|
|
363 | (1) |
|
A.4.3 Binomial distribution |
|
|
364 | (1) |
|
|
|
364 | (3) |
| Bibliography |
|
367 | (8) |
| Index |
|
375 | |