Preface   xi
Introduction   xv
Chapter 1 Computer Architectures   1
  1.1 Different types of parallelism   1
    1.1.1 Overlap, concurrency and parallelism   1
    1.1.2 Temporal and spatial parallelism for arithmetic logic units   4
    1.1.3 Parallelism and memory   6
    1.2.1 Interleaved multi-bank memory   7
    1.3.1 Graphics-type accelerators   14
Chapter 2 Parallelization and Programming Models   17
    2.2.1 Degree of parallelism   19
    2.3.3 Examples of dependence   27
    2.3.4 Reduction operations   30
  2.4 Vectorization: a case study   37
    2.4.1 Vector computers and vectorization   37
    2.4.3 Reduction operations   39
    2.4.4 Pipeline operations   41
    2.5.1 Message-passing programming   43
    2.5.2 Parallel environment management   44
    2.5.3 Point-to-point communications   45
    2.5.4 Collective communications   46
Chapter 3 Parallel Algorithm Concepts   53
  3.1 Parallel algorithms for recurrences   54
    3.1.1 The principles of reduction methods   54
    3.1.2 Overhead and stability of reduction methods   55
  3.2 Data locality and distribution: product of matrices   58
    3.2.1 Row and column algorithms   58
    3.2.3 Distributed algorithms   64
Chapter 4 Basics of Numerical Matrix Analysis   71
  4.1 Review of basic notions of linear algebra   71
    4.1.1 Vector spaces, scalar products and orthogonal projection   71
    4.1.2 Linear applications and matrices   74
  4.2 Properties of matrices   79
    4.2.1 Matrices, eigenvalues and eigenvectors   79
    4.2.4 Conditioning of a matrix   85
Chapter 5 Sparse Matrices   93
  5.1 Origins of sparse matrices   93
  5.2 Parallel formation of sparse matrices: shared memory   98
  5.3 Parallel formation by block of sparse matrices: distributed memory   99
    5.3.1 Parallelization by sets of vertices   99
    5.3.2 Parallelization by sets of elements   101
    5.3.3 Comparison: sets of vertices and elements   101
Chapter 6 Solving Linear Systems   105
Chapter 7 LU Methods for Solving Linear Systems   109
  7.1 Principle of LU decomposition   109
  7.3 Gauss-Jordan factorization   115
  7.4 Crout and Cholesky factorizations for symmetric matrices   121
Chapter 8 Parallelization of LU Methods for Dense Matrices   125
  8.2 Implementation of block factorization in a message-passing environment   130
  8.3 Parallelization of forward and backward substitutions   135
Chapter 9 LU Methods for Sparse Matrices   139
  9.1 Structure of factorized matrices   139
  9.2 Symbolic factorization and renumbering   142
  9.4 Elimination trees and dependencies   152
  9.6 Forward and backward substitutions   159
Chapter 10 Basics of Krylov Subspaces   161
  10.2 Construction of the Arnoldi basis   164
Chapter 11 Methods with Complete Orthogonalization for Symmetric Positive Definite Matrices   167
  11.1 Construction of the Lanczos basis for symmetric matrices   167
  11.3 The conjugate gradient method   173
  11.4 Comparison with the gradient method   177
  11.5 Principle of preconditioning for symmetric positive definite matrices   180
Chapter 12 Exact Orthogonalization Methods for Arbitrary Matrices   185
  12.2 The case of symmetric matrices: the MINRES method   193
  12.4 Principle of preconditioning for non-symmetric matrices   198
Chapter 13 Biorthogonalization Methods for Non-symmetric Matrices   201
  13.1 Lanczos biorthogonal basis for non-symmetric matrices   201
  13.2 The non-symmetric Lanczos method   206
  13.3 The biconjugate gradient method: BiCG   207
  13.4 The quasi-minimal residual method: QMR   211
Chapter 14 Parallelization of Krylov Methods   225
  14.1 Parallelization of dense matrix-vector product   225
  14.2 Parallelization of sparse matrix-vector product based on node sets   227
  14.3 Parallelization of sparse matrix-vector product based on element sets   229
    14.3.1 Review of the principles of domain decomposition   229
    14.3.2 Matrix-vector product   231
    14.3.3 Interface exchanges   233
    14.3.4 Asynchronous matrix-vector product with non-blocking communications   236
    14.3.5 Comparison: parallelization based on node and element sets   236
  14.4 Parallelization of the scalar product   238
  14.5 Summary of the parallelization of Krylov methods   241
Chapter 15 Parallel Preconditioning Methods   243
  15.2 Incomplete factorization methods   245
  15.3 Schur complement method   250
    15.3.1 Optimal local preconditioning   250
    15.3.2 Principle of the Schur complement method   251
    15.3.3 Properties of the Schur complement method   254
    15.4.1 Preconditioning using projection   257
    15.4.2 Algebraic construction of a coarse grid   258
    15.4.3 Algebraic multigrid methods   261
  15.5 The Schwarz additive method of preconditioning   263
    15.5.1 Principle of the overlap   263
    15.5.2 Multiplicative versus additive Schwarz methods   265
    15.5.3 Additive Schwarz preconditioning   268
    15.5.4 Restricted additive Schwarz: parallel implementation   269
  15.6 Preconditioners based on the physics   275
    15.6.1 Gauss-Seidel method   275
Appendices   279
Bibliography   339
Index   343