Tasuta saatmine! | Klienditugi: 7440010 (E-R 10-18)

Abi | Registreeri | Logi sisse

E-raamat: Algorithms and Parallel Computing [Wiley Online]

Fayez Gebali

Teised formaadid

Other digital carrier (Hind: 102,23 €) - 14-Mar-2011

Formaat: 368 pages
Sari: Wiley Series on Parallel and Distributed Computing
Ilmumisaeg: 18-Mar-2011
Kirjastus: John Wiley & Sons Inc
ISBN-10: 470932023
ISBN-13: 9780470932025

Teised raamatud teemal:

Grid & parallel computing

Wiley Online
Hind: 128,99 €*
* hind, mis tagab piiramatu üheaegsete kasutajate arvuga ligipääsu piiramatuks ajaks

Formaat: 368 pages
Sari: Wiley Series on Parallel and Distributed Computing
Ilmumisaeg: 18-Mar-2011
Kirjastus: John Wiley & Sons Inc
ISBN-10: 470932023
ISBN-13: 9780470932025

Teised raamatud teemal:

Grid & parallel computing

Rohkem infot Wiley Online kohta

Raamatu kodulehekülg: https://onlinelibrary.wiley.com/doi/book/10.1002/9780470932025

"There is a software gap between the hardware potential and the performance that can be attained using today's software parallel program development tools. The tools need manual intervention by the programmer to parallelize the code. Programming a parallel computer requires closely studying the target algorithm or application, more so than in the traditional sequential programming we have all learned. The programmer must be aware of the communication and data dependencies of the algorithm or application.This book provides the techniques to explore the possible ways to program a parallel computer for a given application"--

"This book provides the techniques to explore the possible ways to program a parallel computer for a given application"--

Provided by publisher.

There is a software gap between the hardware potential and the performance that can be attained using today's software parallel program development tools. The tools need manual intervention by the programmer to parallelize the code. Programming a parallel computer requires closely studying the target algorithm or application, more so than in the traditional sequential programming we have all learned. The programmer must be aware of the communication and data dependencies of the algorithm or application. This book provides the techniques to explore the possible ways to program a parallel computer for a given application.

Preface

xiii

List of Acronyms

xix

1 Introduction

(28)

1.1 Introduction

(1)

1.2 Toward Automating Parallel Programming

(2)

1.3 Algorithms

(8)

1.4 Parallel Computing Design Considerations

(1)

1.5 Parallel Algorithms and Parallel Architectures

(1)

1.6 Relating Parallel Algorithm and Parallel Architecture

(1)

1.7 Implementation of Algorithms: A Two-Sided Problem

(1)

1.8 Measuring Benefits of Parallel Computing

(4)

1.9 Amdahl's Law for Multiprocessor Systems

(2)

1.10 Gustafson-Barsis's Law

(1)

1.11 Applications of Parallel Computing

(7)

2 Enhancing Uniprocessor Performance

(24)

2.1 Introduction

(1)

2.2 Increasing Processor Clock Frequency

(1)

2.3 Parallelizing ALU Structure

(3)

2.4 Using Memory Hierarchy

(6)

2.5 Pipelining

(5)

2.6 Very Long Instruction Word (VLIW) Processors

(1)

2.7 Instruction-Level Parallelism (ILP) and Superscalar Processors

(4)

2.8 Multithreaded Processor

(4)

3 Parallel Computers

(16)

3.1 Introduction

(1)

3.2 Parallel Computing

(1)

3.3 Shared-Memory Multiprocessors (Uniform Memory Access [ UMA])

(2)

3.4 Distributed-Memory Multiprocessor (Nonuniform Memory Access [ NUMA])

(1)

3.5 SIMD Processors

(1)

3.6 Systolic Processors

(3)

3.7 Cluster Computing

(1)

3.8 Grid (Cloud) Computing

(1)

3.9 Multicore Systems

(1)

3.10 SM

(2)

3.11 Communication Between Parallel Processors

(3)

3.12 Summary of Parallel Architectures

(2)

4 Shared-Memory Multiprocessors

(14)

4.1 Introduction

(1)

4.2 Cache Coherence and Memory Consistency

(6)

4.3 Synchronization and Mutual Exclusion

(7)

5 Interconnection Networks

(22)

5.1 Introduction

(1)

5.2 Classification of Interconnection Networks by Logical Topologies

(7)

5.3 Interconnection Network Switch Architecture

(14)

6 Concurrency Platforms

105

(26)

6.1 Introduction

105

(1)

6.2 Concurrency Platforms

105

(1)

6.3 Cilk++

106

(6)

6.4 OpenMP

112

(10)

6.5 Compute Unified Device Architecture (CUDA)

122

(9)

7 Ad Hoc Techniques for Parallel Algorithms

131

(12)

7.1 Introduction

131

(2)

7.2 Defining Algorithm Variables

133

(1)

7.3 Independent Loop Scheduling

133

(1)

7.4 Dependent Loops

134

(1)

7.5 Loop Spreading for Simple Dependent Loops

135

(1)

7.6 Loop Unrolling

135

(1)

7.7 Problem Partitioning

136

(1)

7.8 Divide-and-Conquer (Recursive Partitioning) Strategies

137

(2)

7.9 Pipelining

139

(4)

8 Nonserial-Parallel Algorithms

143

(16)

8.1 Introduction

143

(1)

8.2 Comparing DAG and DCG Algorithms

143

(2)

8.3 Parallelizing NSPA Algorithms Represented by a DAG

145

(2)

8.4 Formal Technique for Analyzing NSPAs

147

(3)

8.5 Detecting Cycles in the Algorithm

150

(1)

8.6 Extracting Serial and Parallel Algorithm Performance Parameters

151

(2)

8.7 Useful Theorems

153

(3)

8.8 Performance of Serial and Parallel Algorithms on Parallel Computers

156

(3)

9 z-Transform Analysis

159

(8)

9.1 Introduction

159

(1)

9.2 Definition of z-Transform

159

(1)

9.3 The I-D FIR Digital Filter Algorithm

160

(1)

9.4 Software and Hardware Implementations of the z-Transform

161

(1)

9.5 Design 1: Using Horner's Rule for Broadcast Input and Pipelined Output

162

(1)

9.6 Design 2: Pipelined Input and Broadcast Output

163

(1)

9.7 Design 3: Pipelined Input and Output

164

(3)

10 Dependence Graph Analysis

167

(18)

10.1 Introduction

167

(1)

10.2 The 1-D FIR Digital Filter Algorithm

167

(1)

10.3 The Dependence Graph of an Algorithm

168

(1)

10.4 Deriving the Dependence Graph for an Algorithm

168

(3)

10.5 The Scheduling Function for the 1-D FIR Filter

171

(6)

10.6 Node Projection Operation

177

(2)

10.7 Nonlinear Projection Operation

179

(1)

10.8 Software and Hardware Implementations of the DAG Technique

180

(5)

11 Computational Geometry Analysis

185

(24)

11.1 Introduction

185

(1)

11.2 Matrix Multiplication Algorithm

185

(1)

11.3 The 3-D Dependence Graph and Computation Domain D

186

(2)

11.4 The Facets and Vertices of D

188

(1)

11.5 The Dependence Matrices of the Algorithm Variables

188

(1)

11.6 Nullspace of Dependence Matrix: The Broadcast Subdomain B

189

(3)

11.7 Design Space Exploration: Choice of Broadcasting versus Pipelining Variables

192

(3)

11.8 Data Scheduling

195

(5)

11.9 Projection Operation Using the Linear Projection Operator

200

(5)

11.10 Effect of Projection Operation on Data

205

(1)

11.11 The Resulting Multithreaded/Multiprocessor Architecture

206

(1)

11.12 Summary of Work Done in this
Chapter

207

(2)

12 Case Study: One-Dimensional IIR Digital Filters

209

(10)

12.1 Introduction

209

(1)

12.2 The 1-D IIR Digital Filter Algorithm

209

(1)

12.3 The IIR Filter Dependence Graph

209

(7)

12.4 z-Domain Analysis of 1-D IIR Digital Filter Algorithm

216

(3)

13 Case Study: Two- and Three-Dimensional Digital Filters

219

(8)

13.1 Introduction

219

(1)

13.2 Line and Frame Wraparound Problems

219

(2)

13.3 2-D Recursive Filters

221

(2)

13.4 3-D Digital Filters

223

(4)

14 Case Study: Multirate Decimators and Interpolators

227

(18)

14.1 Introduction

227

(1)

14.2 Decimator Structures

227

(1)

14.3 Decimator Dependence Graph

228

(2)

14.4 Decimator Scheduling

230

(1)

14.5 Decimator DAG for s1 = [ 1 0]

231

(2)

14.6 Decimator DAG for s2 = [ 1 -1]

233

(2)

14.7 Decimator DAG for s3 = [ 1 1]

235

(1)

14.8 Polyphase Decimator Implementations

235

(1)

14.9 Interpolator Structures

236

(1)

14.10 Interpolator Dependence Graph

237

(1)

14.11 Interpolator Scheduling

238

(1)

14.12 Interpolator DAG for s1 = [ 1 0]

239

(2)

14.13 Interpolator DAG for s2 = [ 1 -1]

241

(2)

14.14 Interpolator DAG for s3 = [ 1 1]

243

(1)

14.15 Polyphase Interpolator Implementations

243

(2)

15 Case Study: Pattern Matching

245

(10)

15.1 Introduction

245

(1)

15.2 Expressing the Algorithm as a Regular Iterative Algorithm (RIA)

245

(1)

15.3 Obtaining the Algorithm Dependence Graph

246

(1)

15.4 Data Scheduling

247

(1)

15.5 DAG Node Projection

248

(1)

15.6 DESIGN 1: Design Space Exploration When s = [ 1 1]t

249

(3)

15.7 DESIGN 2: Design Space Exploration When s = [ 1 -1]t

252

(1)

15.8 DESIGN 3: Design Space Exploration When s = [ 1 0]t

253

(2)

16 Case Study: Motion Estimation for Video Compression

255

(12)

16.1 Introduction

255

(1)

16.2 FBMAs

256

(1)

16.3 Data Buffering Requirements

257

(1)

16.4 Formulation of the FBMA

258

(1)

16.5 Hierarchical Formulation of Motion Estimation

259

(2)

16.6 Hardware Design of the Hierarchy Blocks

261

(6)

17 Case Study: Multiplication over GF(2m)

267

(12)

17.1 Introduction

267

(1)

17.2 The Multiplication Algorithm in GF(2m)

268

(2)

17.3 Expressing Field Multiplication as an RIA

270

(1)

17.4 Field Multiplication Dependence Graph

270

(1)

17.5 Data Scheduling

271

(2)

17.6 DAG Node Projection

273

(2)

17.7 Design 1: Using d1 = [ 1 0]t

275

(1)

17.8 Design 2: Using d2 = [ 1 1]t

275

(2)

17.9 Design 3: Using d3 = [ 1 -1]t

277

(1)

17.10 Applications of Finite Field Multipliers

277

(2)

18 Case Study: Polynomial Division over GF(2)

279

(14)

18.1 Introduction

279

(1)

18.2 The Polynomial Division Algorithm

279

(2)

18.3 The LFSR Dependence Graph

281

(1)

18.4 Data Scheduling

282

(1)

18.5 DAG Node Projection

283

(1)

18.6 Design 1: Design Space Exploration When s1 = [ 1 -1]

284

(2)

18.7 Design 2: Design Space Exploration When s2 = [ 1 0]

286

(3)

18.8 Design 3: Design Space Exploration When s3 = [ 1 -0.5]

289

(2)

18.9 Comparing the Three Designs

291

(2)

19 The Fast Fourier Transform

293

(12)

19.1 Introduction

293

(2)

19.2 Decimation-in-Time FFT

295

(3)

19.3 Pipeline Radix-2 Decimation-in-Time FFT Processor

298

(1)

19.4 Decimation-in-Frequency FFT

299

(4)

19.5 Pipeline Radix-2 Decimation-in-Frequency FFT Processor

303

(2)

20 Solving Systems of Linear Equations

305

(18)

20.1 Introduction

305

(1)

20.2 Special Matrix Structures

305

(4)

20.3 Forward Substitution (Direct Technique)

309

(3)

20.4 Back Substitution

312

(1)

20.5 Matrix Triangularization Algorithm

312

(5)

20.6 Successive over Relaxation (SOR) (Iterative Technique)

317

(4)

20.7 Problems

321

(2)

21 Solving Partial Differential Equations Using Finite Difference Method

323

(8)

21.1 Introduction

323

(1)

21.2 FDM for 1-D Systems

324

(7)

References

331

(6)

Index

337

Fayez Gebali, PhD, has taught at the University of Victoria since 1984 and has served as the Associate Dean of Engineering for undergraduate programs since 2002. He has contributed to dozens of journals and technical reports and has completed four books. Dr. Gebali's primary research interests include VLSI design, processor array design, algorithms for computer arithmetic, and communication system modeling.

Püsilink: https://www.kriso.ee/db/9780470932025_pe.html

Märksõnad:

E-raamat: Algorithms and Parallel Computing [Wiley Online]

Konto & seaded

Otsing

Otsingu andmebaas

Filtreeri tulemusi

Teemad Kirjastuste teemad

Vali ostukorv