Preface    xv

Chapter 1  Why parallel computing    1
    1.1  Why we need ever-increasing performance    2
    1.2  Why we're building parallel systems    3
    1.3  Why we need to write parallel programs    3
    1.4  How do we write parallel programs?    6
    1.5  What we'll be doing    8
    1.6  Concurrent, parallel, distributed    10
    1.7  The rest of the book    11
    1.8  A word of warning    11
    1.9  Typographical conventions    12
    1.10  Summary    12
    1.11  Exercises    14
|
Chapter 2  Parallel hardware and parallel software    17
    2.1  Some background    17
        2.1.1  The von Neumann architecture    17
        2.1.2  Processes, multitasking, and threads    19
    2.2  Modifications to the von Neumann model    20
        2.2.1  The basics of caching    21
        2.2.2  Cache mappings    23
        2.2.3  Caches and programs: an example    24
        2.2.4  Virtual memory    25
        2.2.5  Instruction-level parallelism    27
        2.2.6  Hardware multithreading    30
    2.3  Parallel hardware    31
        2.3.1  Classifications of parallel computers    31
        2.3.2  SIMD systems    31
        2.3.3  MIMD systems    34
        2.3.4  Interconnection networks    37
        2.3.5  Cache coherence    45
        2.3.6  Shared-memory vs. distributed-memory    49
    2.4  Parallel software    49
        2.4.1  Caveats    50
        2.4.2  Coordinating the processes/threads    50
        2.4.3  Shared-memory    51
        2.4.4  Distributed-memory    55
        2.4.5  GPU programming    58
        2.4.6  Programming hybrid systems    60
    2.5  Input and output    60
        2.5.1  MIMD systems    60
        2.5.2  GPUs    61
    2.6  Performance    61
        2.6.1  Speedup and efficiency in MIMD systems    61
        2.6.2  Amdahl's law    65
        2.6.3  Scalability in MIMD systems    66
        2.6.4  Taking timings of MIMD programs    67
        2.6.5  GPU performance    70
    2.7  Parallel program design    71
        2.7.1  An example    71
    2.8  Writing and running parallel programs    75
    2.9  Assumptions    77
    2.10  Summary    78
        2.10.1  Serial systems    78
        2.10.2  Parallel hardware    79
        2.10.3  Parallel software    81
        2.10.4  Input and output    82
        2.10.5  Performance    82
        2.10.6  Parallel program design    83
        2.10.7  Assumptions    84
    2.11  Exercises    84

Chapter 3  Distributed-memory programming with MPI    89
    3.1  Getting started    90
        3.1.1  Compilation and execution    91
        3.1.2  MPI programs    92
        3.1.3  MPI_Init and MPI_Finalize    93
        3.1.4  Communicators, MPI_Comm_size, and MPI_Comm_rank    94
        3.1.5  SPMD programs    94
        3.1.6  Communication    95
        3.1.7  MPI_Send    95
        3.1.8  MPI_Recv    97
        3.1.9  Message matching    97
        3.1.10  The status_p argument    98
        3.1.11  Semantics of MPI_Send and MPI_Recv    99
        3.1.12  Some potential pitfalls    100
    3.2  The trapezoidal rule in MPI    100
        3.2.1  The trapezoidal rule    101
        3.2.2  Parallelizing the trapezoidal rule    102
    3.3  Dealing with I/O    104
        3.3.1  Output    105
        3.3.2  Input    107
    3.4  Collective communication    108
        3.4.1  Tree-structured communication    108
        3.4.2  MPI_Reduce    110
        3.4.3  Collective vs. point-to-point communications    112
        3.4.4  MPI_Allreduce    113
        3.4.5  Broadcast    113
        3.4.6  Data distributions    116
        3.4.7  Scatter    117
        3.4.8  Gather    119
        3.4.9  Allgather    121
    3.5  MPI derived datatypes    123
    3.6  Performance evaluation of MPI programs    127
        3.6.1  Taking timings    127
        3.6.2  Results    130
        3.6.3  Speedup and efficiency    133
        3.6.4  Scalability    134
    3.7  A parallel sorting algorithm    135
        3.7.1  Some simple serial sorting algorithms    135
        3.7.2  Parallel odd-even transposition sort    137
        3.7.3  Safety in MPI programs    140
        3.7.4  Final details of parallel odd-even sort    143
    3.8  Summary    144
    3.9  Exercises    148
    3.10  Programming assignments    155

Chapter 4  Shared-memory programming with Pthreads    159
    4.1  Processes, threads, and Pthreads    160
    4.2  Hello, world    161
        4.2.1  Execution    161
        4.2.2  Preliminaries    163
        4.2.3  Starting the threads    164
        4.2.4  Running the threads    166
        4.2.5  Stopping the threads    167
        4.2.6  Error checking    168
        4.2.7  Other approaches to thread startup    168
    4.3  Matrix-vector multiplication    169
    4.4  Critical sections    171
    4.5  Busy-waiting    175
    4.6  Mutexes    178
    4.7  Producer-consumer synchronization and semaphores    182
    4.8  Barriers and condition variables    187
        4.8.1  Busy-waiting and a mutex    188
        4.8.2  Semaphores    189
        4.8.3  Condition variables    190
        4.8.4  Pthreads barriers    193
    4.9  Read-write locks    193
        4.9.1  Sorted linked list functions    193
        4.9.2  A multithreaded linked list    194
        4.9.3  Pthreads read-write locks    198
        4.9.4  Performance of the various implementations    200
        4.9.5  Implementing read-write locks    201
    4.10  Caches, cache-coherence, and false sharing    202
    4.11  Thread-safety    207
        4.11.1  Incorrect programs can produce correct output    210
    4.12  Summary    210
    4.13  Exercises    212
    4.14  Programming assignments    219

Chapter 5  Shared-memory programming with OpenMP    221
    5.1  Getting started    222
        5.1.1  Compiling and running OpenMP programs    223
        5.1.2  The program    224
        5.1.3  Error checking    227
    5.2  The trapezoidal rule    228
        5.2.1  A first OpenMP version    229
    5.3  Scope of variables    233
    5.4  The reduction clause    234
    5.5  The parallel for directive    237
        5.5.1  Caveats    238
        5.5.2  Data dependences    239
        5.5.3  Finding loop-carried dependences    241
        5.5.4  Estimating π    241
        5.5.5  More on scope    244
    5.6  More about loops in OpenMP: sorting    245
        5.6.1  Bubble sort    245
        5.6.2  Odd-even transposition sort    246
    5.7  Scheduling loops    249
        5.7.1  The schedule clause    250
        5.7.2  The static schedule type    252
        5.7.3  The dynamic and guided schedule types    252
        5.7.4  The runtime schedule type    253
        5.7.5  Which schedule?    255
    5.8  Producers and consumers    256
        5.8.1  Queues    256
        5.8.2  Message-passing    256
        5.8.3  Sending messages    257
        5.8.4  Receiving messages    257
        5.8.5  Termination detection    258
        5.8.6  Startup    259
        5.8.7  The atomic directive    259
        5.8.8  Critical sections and locks    260
        5.8.9  Using locks in the message-passing program    262
        5.8.10  Critical directives, atomic directives, or locks?    263
        5.8.11  Some caveats    264
    5.9  Caches, cache coherence, and false sharing    265
    5.10  Tasking    270
    5.11  Thread-safety    274
        5.11.1  Incorrect programs can produce correct output    276
    5.12  Summary    277
    5.13  Exercises    281
    5.14  Programming assignments    286

Chapter 6  GPU programming with CUDA    291
    6.1  GPUs and GPGPU    291
    6.2  GPU architectures    292
    6.3  Heterogeneous computing    293
    6.4  CUDA hello    295
        6.4.1  The source code    296
        6.4.2  Compiling and running the program    296
    6.5  A closer look    298
    6.6  Threads, blocks, and grids    299
    6.7  Nvidia compute capabilities and device architectures    301
    6.8  Vector addition    302
        6.8.1  The kernel    303
        6.8.2  Get_args    305
        6.8.3  Allocate_vectors and managed memory    305
        6.8.4  Other functions called from main    307
        6.8.5  Explicit memory transfers    309
    6.9  Returning results from CUDA kernels    312
    6.10  CUDA trapezoidal rule I    314
        6.10.1  The trapezoidal rule    314
        6.10.2  A CUDA implementation    315
        6.10.3  Initialization, return value, and final update    316
        6.10.4  Using the correct threads    318
        6.10.5  Updating the return value and atomicAdd    318
        6.10.6  Performance of the CUDA trapezoidal rule    319
    6.11  CUDA trapezoidal rule II: improving performance    320
        6.11.1  Tree-structured communication    320
        6.11.2  Local variables, registers, shared and global memory    322
        6.11.3  Warps and warp shuffles    324
        6.11.4  Implementing tree-structured global sum with a warp shuffle    325
        6.11.5  Shared memory and an alternative to the warp shuffle    326
    6.12  Implementation of trapezoidal rule with warpSize thread blocks    328
        6.12.1  Host code    329
        6.12.2  Kernel with warp shuffle    329
        6.12.3  Kernel with shared memory    329
        6.12.4  Performance    330
    6.13  CUDA trapezoidal rule III: blocks with more than one warp    331
        6.13.1  __syncthreads    331
        6.13.2  More shared memory    333
        6.13.3  Shared memory warp sums    333
        6.13.4  Shared memory banks    334
        6.13.5  Finishing up    336
        6.13.6  Performance    336
    6.14  Bitonic sort    338
        6.14.1  Serial bitonic sort    338
        6.14.2  Butterflies and binary representations    342
        6.14.3  Parallel bitonic sort I    345
        6.14.4  Parallel bitonic sort II    347
        6.14.5  Performance of CUDA bitonic sort    348
    6.15  Summary    349
    6.16  Exercises    355
    6.17  Programming assignments    358

Chapter 7  Parallel program development    361
    7.1  Two n-body solvers    361
        7.1.1  The problem    361
        7.1.2  Two serial programs    363
        7.1.3  Parallelizing the n-body solvers    368
        7.1.4  A word about I/O    371
        7.1.5  Parallelizing the basic solver using OpenMP    371
        7.1.6  Parallelizing the reduced solver using OpenMP    375
        7.1.7  Evaluating the OpenMP codes    379
        7.1.8  Parallelizing the solvers using Pthreads    380
        7.1.9  Parallelizing the basic solver using MPI    381
        7.1.10  Parallelizing the reduced solver using MPI    383
        7.1.11  Performance of the MPI solvers    389
        7.1.12  Parallelizing the basic solver using CUDA    390
        7.1.13  A note on cooperative groups in CUDA    393
        7.1.14  Performance of the basic CUDA n-body solver    393
        7.1.15  Improving the performance of the CUDA n-body solver    394
        7.1.16  Using shared memory in the n-body solver    395
    7.2  Sample sort    398
        7.2.1  Sample sort and bucket sort    398
        7.2.2  Choosing the sample    400
        7.2.3  A simple implementation of the Map function    401
        7.2.4  An alternative implementation of Map    402
        7.2.5  Parallelizing sample sort    407
        7.2.6  Implementing sample sort with OpenMP    410
        7.2.7  Implementing sample sort with Pthreads    415
        7.2.8  Implementing sample sort with MPI    418
        7.2.9  Implementing sample sort with CUDA    429
    7.3  A word of caution    439
    7.4  Which API?    439
    7.5  Summary    440
        7.5.1  MPI    442
    7.6  Exercises    444
    7.7  Programming assignments    451

Chapter 8  Where to go from here    455

Bibliography    459
Index    463