
E-book: Algorithms and Parallel Computing [Wiley Online]

  • Wiley Online
  • Price: €128.99*
  • * the price provides access for an unlimited number of simultaneous users for an unlimited period
"There is a software gap between the hardware potential and the performance that can be attained using today's software parallel program development tools. The tools need manual intervention by the programmer to parallelize the code. Programming a parallel computer requires closely studying the target algorithm or application, more so than in the traditional sequential programming we have all learned. The programmer must be aware of the communication and data dependencies of the algorithm or application. This book provides the techniques to explore the possible ways to program a parallel computer for a given application"--

Provided by publisher.

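The blurb's central point, that a programmer must understand the data dependencies of an algorithm before parallelizing it, can be illustrated with a short sketch (not from the book). An independent loop produces the same result no matter what order its iterations run in, so it can be parallelized safely; a loop-carried dependency, where iteration i needs the result of iteration i-1, cannot be naively split across processors:

```python
def independent(xs):
    # y[i] depends only on x[i]: iterations may run in any order,
    # so this loop is safe to parallelize.
    return [x * x for x in xs]

def dependent(xs):
    # y[i] = y[i-1] + x[i] (a running sum): each iteration needs the
    # previous result, a loop-carried dependency that blocks naive
    # parallelization.
    ys, acc = [], 0
    for x in xs:
        acc += x
        ys.append(acc)
    return ys

xs = [1, 2, 3, 4]

# Reordering the iterations of the independent loop changes nothing:
assert list(reversed(independent(list(reversed(xs))))) == independent(xs)

# The same reordering breaks the dependent loop:
assert list(reversed(dependent(list(reversed(xs))))) != dependent(xs)
```

Recognizing which category each loop of an algorithm falls into, and restructuring the dependent ones, is the kind of analysis the book's dependence-graph chapters formalize.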
Preface  xiii
List of Acronyms  xix
1 Introduction  1(28)
1.1 Introduction  1(1)
1.2 Toward Automating Parallel Programming  2(2)
1.3 Algorithms  4(8)
1.4 Parallel Computing Design Considerations  12(1)
1.5 Parallel Algorithms and Parallel Architectures  13(1)
1.6 Relating Parallel Algorithm and Parallel Architecture  14(1)
1.7 Implementation of Algorithms: A Two-Sided Problem  14(1)
1.8 Measuring Benefits of Parallel Computing  15(4)
1.9 Amdahl's Law for Multiprocessor Systems  19(2)
1.10 Gustafson-Barsis's Law  21(1)
1.11 Applications of Parallel Computing  22(7)
2 Enhancing Uniprocessor Performance  29(24)
2.1 Introduction  29(1)
2.2 Increasing Processor Clock Frequency  30(1)
2.3 Parallelizing ALU Structure  30(3)
2.4 Using Memory Hierarchy  33(6)
2.5 Pipelining  39(5)
2.6 Very Long Instruction Word (VLIW) Processors  44(1)
2.7 Instruction-Level Parallelism (ILP) and Superscalar Processors  45(4)
2.8 Multithreaded Processor  49(4)
3 Parallel Computers  53(16)
3.1 Introduction  53(1)
3.2 Parallel Computing  53(1)
3.3 Shared-Memory Multiprocessors (Uniform Memory Access [UMA])  54(2)
3.4 Distributed-Memory Multiprocessor (Nonuniform Memory Access [NUMA])  56(1)
3.5 SIMD Processors  57(1)
3.6 Systolic Processors  57(3)
3.7 Cluster Computing  60(1)
3.8 Grid (Cloud) Computing  60(1)
3.9 Multicore Systems  61(1)
3.10 SM  62(2)
3.11 Communication Between Parallel Processors  64(3)
3.12 Summary of Parallel Architectures  67(2)
4 Shared-Memory Multiprocessors  69(14)
4.1 Introduction  69(1)
4.2 Cache Coherence and Memory Consistency  70(6)
4.3 Synchronization and Mutual Exclusion  76(7)
5 Interconnection Networks  83(22)
5.1 Introduction  83(1)
5.2 Classification of Interconnection Networks by Logical Topologies  84(7)
5.3 Interconnection Network Switch Architecture  91(14)
6 Concurrency Platforms  105(26)
6.1 Introduction  105(1)
6.2 Concurrency Platforms  105(1)
6.3 Cilk++  106(6)
6.4 OpenMP  112(10)
6.5 Compute Unified Device Architecture (CUDA)  122(9)
7 Ad Hoc Techniques for Parallel Algorithms  131(12)
7.1 Introduction  131(2)
7.2 Defining Algorithm Variables  133(1)
7.3 Independent Loop Scheduling  133(1)
7.4 Dependent Loops  134(1)
7.5 Loop Spreading for Simple Dependent Loops  135(1)
7.6 Loop Unrolling  135(1)
7.7 Problem Partitioning  136(1)
7.8 Divide-and-Conquer (Recursive Partitioning) Strategies  137(2)
7.9 Pipelining  139(4)
8 Nonserial-Parallel Algorithms  143(16)
8.1 Introduction  143(1)
8.2 Comparing DAG and DCG Algorithms  143(2)
8.3 Parallelizing NSPA Algorithms Represented by a DAG  145(2)
8.4 Formal Technique for Analyzing NSPAs  147(3)
8.5 Detecting Cycles in the Algorithm  150(1)
8.6 Extracting Serial and Parallel Algorithm Performance Parameters  151(2)
8.7 Useful Theorems  153(3)
8.8 Performance of Serial and Parallel Algorithms on Parallel Computers  156(3)
9 z-Transform Analysis  159(8)
9.1 Introduction  159(1)
9.2 Definition of z-Transform  159(1)
9.3 The 1-D FIR Digital Filter Algorithm  160(1)
9.4 Software and Hardware Implementations of the z-Transform  161(1)
9.5 Design 1: Using Horner's Rule for Broadcast Input and Pipelined Output  162(1)
9.6 Design 2: Pipelined Input and Broadcast Output  163(1)
9.7 Design 3: Pipelined Input and Output  164(3)
10 Dependence Graph Analysis  167(18)
10.1 Introduction  167(1)
10.2 The 1-D FIR Digital Filter Algorithm  167(1)
10.3 The Dependence Graph of an Algorithm  168(1)
10.4 Deriving the Dependence Graph for an Algorithm  168(3)
10.5 The Scheduling Function for the 1-D FIR Filter  171(6)
10.6 Node Projection Operation  177(2)
10.7 Nonlinear Projection Operation  179(1)
10.8 Software and Hardware Implementations of the DAG Technique  180(5)
11 Computational Geometry Analysis  185(24)
11.1 Introduction  185(1)
11.2 Matrix Multiplication Algorithm  185(1)
11.3 The 3-D Dependence Graph and Computation Domain D  186(2)
11.4 The Facets and Vertices of D  188(1)
11.5 The Dependence Matrices of the Algorithm Variables  188(1)
11.6 Nullspace of Dependence Matrix: The Broadcast Subdomain B  189(3)
11.7 Design Space Exploration: Choice of Broadcasting versus Pipelining Variables  192(3)
11.8 Data Scheduling  195(5)
11.9 Projection Operation Using the Linear Projection Operator  200(5)
11.10 Effect of Projection Operation on Data  205(1)
11.11 The Resulting Multithreaded/Multiprocessor Architecture  206(1)
11.12 Summary of Work Done in This Chapter  207(2)
12 Case Study: One-Dimensional IIR Digital Filters  209(10)
12.1 Introduction  209(1)
12.2 The 1-D IIR Digital Filter Algorithm  209(1)
12.3 The IIR Filter Dependence Graph  209(7)
12.4 z-Domain Analysis of 1-D IIR Digital Filter Algorithm  216(3)
13 Case Study: Two- and Three-Dimensional Digital Filters  219(8)
13.1 Introduction  219(1)
13.2 Line and Frame Wraparound Problems  219(2)
13.3 2-D Recursive Filters  221(2)
13.4 3-D Digital Filters  223(4)
14 Case Study: Multirate Decimators and Interpolators  227(18)
14.1 Introduction  227(1)
14.2 Decimator Structures  227(1)
14.3 Decimator Dependence Graph  228(2)
14.4 Decimator Scheduling  230(1)
14.5 Decimator DAG for s1 = [1 0]  231(2)
14.6 Decimator DAG for s2 = [1 -1]  233(2)
14.7 Decimator DAG for s3 = [1 1]  235(1)
14.8 Polyphase Decimator Implementations  235(1)
14.9 Interpolator Structures  236(1)
14.10 Interpolator Dependence Graph  237(1)
14.11 Interpolator Scheduling  238(1)
14.12 Interpolator DAG for s1 = [1 0]  239(2)
14.13 Interpolator DAG for s2 = [1 -1]  241(2)
14.14 Interpolator DAG for s3 = [1 1]  243(1)
14.15 Polyphase Interpolator Implementations  243(2)
15 Case Study: Pattern Matching  245(10)
15.1 Introduction  245(1)
15.2 Expressing the Algorithm as a Regular Iterative Algorithm (RIA)  245(1)
15.3 Obtaining the Algorithm Dependence Graph  246(1)
15.4 Data Scheduling  247(1)
15.5 DAG Node Projection  248(1)
15.6 Design 1: Design Space Exploration When s = [1 1]^t  249(3)
15.7 Design 2: Design Space Exploration When s = [1 -1]^t  252(1)
15.8 Design 3: Design Space Exploration When s = [1 0]^t  253(2)
16 Case Study: Motion Estimation for Video Compression  255(12)
16.1 Introduction  255(1)
16.2 FBMAs  256(1)
16.3 Data Buffering Requirements  257(1)
16.4 Formulation of the FBMA  258(1)
16.5 Hierarchical Formulation of Motion Estimation  259(2)
16.6 Hardware Design of the Hierarchy Blocks  261(6)
17 Case Study: Multiplication over GF(2^m)  267(12)
17.1 Introduction  267(1)
17.2 The Multiplication Algorithm in GF(2^m)  268(2)
17.3 Expressing Field Multiplication as an RIA  270(1)
17.4 Field Multiplication Dependence Graph  270(1)
17.5 Data Scheduling  271(2)
17.6 DAG Node Projection  273(2)
17.7 Design 1: Using d1 = [1 0]^t  275(1)
17.8 Design 2: Using d2 = [1 1]^t  275(2)
17.9 Design 3: Using d3 = [1 -1]^t  277(1)
17.10 Applications of Finite Field Multipliers  277(2)
18 Case Study: Polynomial Division over GF(2)  279(14)
18.1 Introduction  279(1)
18.2 The Polynomial Division Algorithm  279(2)
18.3 The LFSR Dependence Graph  281(1)
18.4 Data Scheduling  282(1)
18.5 DAG Node Projection  283(1)
18.6 Design 1: Design Space Exploration When s1 = [1 -1]  284(2)
18.7 Design 2: Design Space Exploration When s2 = [1 0]  286(3)
18.8 Design 3: Design Space Exploration When s3 = [1 -0.5]  289(2)
18.9 Comparing the Three Designs  291(2)
19 The Fast Fourier Transform  293(12)
19.1 Introduction  293(2)
19.2 Decimation-in-Time FFT  295(3)
19.3 Pipeline Radix-2 Decimation-in-Time FFT Processor  298(1)
19.4 Decimation-in-Frequency FFT  299(4)
19.5 Pipeline Radix-2 Decimation-in-Frequency FFT Processor  303(2)
20 Solving Systems of Linear Equations  305(18)
20.1 Introduction  305(1)
20.2 Special Matrix Structures  305(4)
20.3 Forward Substitution (Direct Technique)  309(3)
20.4 Back Substitution  312(1)
20.5 Matrix Triangularization Algorithm  312(5)
20.6 Successive over Relaxation (SOR) (Iterative Technique)  317(4)
20.7 Problems  321(2)
21 Solving Partial Differential Equations Using Finite Difference Method  323(8)
21.1 Introduction  323(1)
21.2 FDM for 1-D Systems  324(7)
References  331(6)
Index  337
Fayez Gebali, PhD, has taught at the University of Victoria since 1984 and has served as the Associate Dean of Engineering for undergraduate programs since 2002. He has contributed to dozens of journals and technical reports and has completed four books. Dr. Gebali's primary research interests include VLSI design, processor array design, algorithms for computer arithmetic, and communication system modeling.