
E-book: Fundamentals of Parallel Multicore Architecture

4.60/5 (9 ratings on Goodreads)

Yan Solihin (Solihin Publishing and Consulting, LLC, Raleigh, North Carolina, USA)

Format: 494 pages
Series: Chapman & Hall/CRC Computational Science
Publication date: 18-Nov-2015
Publisher: Chapman & Hall/CRC
Language: English
ISBN-13: 9781498753418


Format: EPUB+DRM
Price: €58.49*
* The price is final, i.e. no further discounts apply.
This e-book is intended for personal use only. E-books cannot be returned.


DRM restrictions

Copying (copy/paste): not permitted
Printing: not permitted
Usage:

Digital Rights Management (DRM)
The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You also need to create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

Required software
To read on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

To read on a PC or Mac, you need to install Adobe Digital Editions (a free application designed specifically for reading e-books; do not confuse it with Adobe Reader, which is probably already installed on your computer).

This e-book cannot be read on Amazon Kindle.

Although multicore is now a mainstream architecture, there are few textbooks that cover parallel multicore architectures. Filling this gap, Fundamentals of Parallel Multicore Architecture provides all the material for a graduate or senior undergraduate course that focuses on the architecture of multicore processors. The book is also useful as a reference for professionals who deal with programming on multicore or designing multicore chips.

The text's coverage of fundamental topics prepares students to study research papers in the multicore architecture area. The text offers many pedagogical features, including:

- Sufficiently short chapters that can be comfortably read over a weekend
- Introducing each concept by first describing the problem and building intuition that leads to the need for the concept
- "Did you know?" boxes that present mini case studies, alternative points of view, examples, and other interesting facts or discussion items
- Thought-provoking interviews with experts who share their perspectives on multicore architectures in the past, present, and future
- Online programming assignments and solutions that enhance students' understanding

The first several chapters address programming issues in shared memory multiprocessors, such as the programming model and techniques to parallelize regular and irregular applications. The core of the book covers the architectures for shared memory multiprocessors. The final chapter contains interviews with experts in parallel multicore architecture.

Reviews

"This text provides a lucid and comprehensive treatment of hardware/software foundations of parallel architectures by a leading expert in the area." Rajeev Balasubramonian, University of Utah

"This book does an excellent job covering parallel multicore architectures and their programming models. It covers these topics in the crucial context of advanced memory hierarchy designs. The text is accessible to senior undergraduate students and graduate students in computer science and computer engineering. … a self-contained reference for the target audience; the text is comprehensive and strikes a good balance between the principles and in-depth details of modern multicore architecture designs." Robert van Engelen, Florida State University

"The author first discusses the basic hardware and history of multicore architectures, then discusses the basic ideas of how to analyze code to determine parallelism (and the basic concepts of different parallelism techniques), and then discusses the specifics of how to write shared memory parallel programs, and so on. In this way, the topics become increasingly focused on the desired content of the book, that of the details in constructing multicore architectures. This book is well organized and thought out, and I imagine that it [will be] well received by students." Daniel R. Reynolds, Southern Methodist University

"… this book would be appealing to students and practitioners who would like to get an in-depth understanding of multicore architecture and designing efficient programs for these architectures." Purushotham Bangalore, University of Alabama at Birmingham

Table of Contents

Preface
Acknowledgement
About the Author
List of Abbreviations

1 Perspectives on Multicore Architectures
1.1 The Origin of the Multicore Architecture
1.1.1 Power Consumption Issue
1.2 Perspectives on Parallel Computers
1.2.1 Flynn's Taxonomy of Parallel Computers
1.2.2 Classes of MIMD Parallel Computers
1.3 Future Multicore Architectures
1.4 Exercises

2 Perspectives on Parallel Programming
2.1 Limits on Parallel Program Performance
2.2 Parallel Programming Models
2.2.1 Comparing Shared Memory and Message Passing Models
2.2.2 A Simple Example
2.2.3 Other Programming Models
2.3 Exercises

3 Shared Memory Parallel Programming
3.1 Steps in Parallel Programming
3.2 Dependence Analysis
3.2.1 Loop-Level Dependence Analysis
3.2.2 Iteration-Space Traversal Graph and Loop-Carried Dependence Graph
3.3 Identifying Parallel Tasks in Loop Structures
3.3.1 Parallelism between Loop Iterations and DOALL Parallelism
3.3.2 DOACROSS: Synchronized Parallelism between Loop Iterations
3.3.3 Parallelism Across Statements in a Loop
3.3.4 DOPIPE: Parallelism Across Statements of a Loop
3.4 Identifying Parallelism at Other Levels
3.5 Identifying Parallelism through Algorithm Knowledge
3.6 Determining the Scope of Variables
3.6.1 Privatization
3.6.2 Reduction Variables and Operation
3.6.3 Summary of Criteria
3.7 Synchronization
3.8 Assigning Tasks to Threads
3.9 Mapping Threads to Processors
3.10 A Brief Introduction to OpenMP
3.11 Exercises

4 Parallel Programming for Linked Data Structures
4.1 Parallelization Challenges in LDS
4.1.1 Loop-Level Parallelization is Insufficient
4.2 Approaches to Parallelization of LDS
4.2.1 Parallelizing Computation vs. Traversal
4.2.2 Parallelizing Operations on the Data Structure
4.3 Parallelization Techniques for Linked Lists
4.3.1 Parallelization among Readers
4.3.2 Parallelism among LDS Traversals
4.3.3 Fine-Grain Lock Approach
4.4 The Role of Transactional Memory
4.5 Exercises

5 Introduction to Memory Hierarchy Organization
5.1 Motivation for Memory Hierarchy
5.2 Basic Architectures of a Cache
5.2.1 Placement Policy
5.2.2 Replacement Policy
5.2.3 Write Policy
5.2.4 Inclusion Policy on Multi-Level Caches
5.2.5 Unified/Split/Banked Cache Organization and Cache Pipelining
5.2.6 Cache Addressing and Translation Lookaside Buffer
5.2.7 Non-Blocking Cache
5.3 Cache Performance
5.3.1 The Power Law of Cache Misses
5.3.2 Stack Distance Profile
5.3.3 Cache Performance Metrics
5.4 Prefetching
5.4.1 Stride and Sequential Prefetching
5.4.2 Prefetching in Multiprocessor Systems
5.5 Cache Design in Multicore Architecture
5.6 Physical Cache Organization
5.6.1 United Cache Organization
5.6.2 Distributed Cache Organization
5.6.3 Hybrid United+Distributed Cache Organization
5.7 Logical Cache Organization
5.7.1 Hashing Function
5.7.2 Improving Distance Locality of Shared Cache
5.7.3 Capacity Sharing in the Private Cache Organization
5.8 Case Studies
5.8.1 IBM Power? Memory Hierarchy
5.8.2 Comparing AMD Shanghai and Intel Barcelona's Memory Hierarchy
5.9 Exercises

6 Introduction to Shared Memory Multiprocessors
6.1 The Cache Coherence Problem
6.2 Memory Consistency Problem
6.3 Synchronization Problem
6.4 Exercises

7 Basic Cache Coherence Issues
7.1 Overview
7.1.1 Basic Support for Bus-Based Multiprocessors
7.2 Cache Coherence in Bus-Based Multiprocessors
7.2.1 Coherence Protocol for Write Through Caches
7.2.2 MSI Protocol with Write Back Caches
7.2.3 MESI Protocol with Write Back Caches
7.2.4 MOESI Protocol with Write Back Caches
7.2.5 Update-Based Protocol with Write Back Caches
7.3 Impact of Cache Design on Cache Coherence Performance
7.4 Performance and Other Practical Issues
7.4.1 Prefetching and Coherence Misses
7.4.2 Multi-Level Caches
7.4.3 Snoop Filtering
7.5 Broadcast Protocol with Point-to-Point Interconnect
7.6 Exercises

8 Hardware Support for Synchronization
8.1 Lock Implementations
8.1.1 Evaluating the Performance of Lock Implementations
8.1.2 The Need for Atomic Instructions
8.1.3 Test and Set Lock
8.1.4 Test and Test and Set Lock
8.1.5 Load Linked and Store Conditional Lock
8.1.6 Ticket Lock
8.1.7 Array-Based Queuing Lock
8.1.8 Qualitative Comparison of Lock Implementations
8.2 Barrier Implementations
8.2.1 Sense-Reversing Centralized Barrier
8.2.2 Combining Tree Barrier
8.2.3 Hardware Barrier Implementation
8.3 Transactional Memory
8.4 Exercises

9 Memory Consistency Models
9.1 Programmers' Intuition
9.2 Architecture Mechanisms for Ensuring Sequential Consistency
9.2.1 Basic SC Implementation on a Bus-Based Multiprocessor
9.2.2 Techniques to Improve SC Performance
9.3 Relaxed Consistency Models
9.3.1 Safety Net
9.3.2 Processor Consistency
9.3.3 Weak Ordering
9.3.4 Release Consistency
9.3.5 Lazy Release Consistency
9.4 Synchronization in Different Memory Consistency Models
9.5 Exercises

10 Advanced Cache Coherence Issues
10.1 Directory Coherence Protocols
10.2 Overview of Directory Coherence Protocol
10.2.1 Directory Format and Location
10.3 Basic Directory Cache Coherence Protocol
10.4 Implementation Correctness and Performance
10.4.1 Handling Races Due to Out-of-Sync Directory State
10.4.2 Handling Races Due to Non-Instantaneous Processing of a Request
10.4.3 Write Propagation and Transaction Serialization
10.4.4 Synchronization Support
10.4.5 Memory Consistency Models
10.5 Contemporary Design Issues
10.5.1 Dealing with Imprecise Directory Information
10.5.2 Granularity of Coherence
10.5.3 System Partitioning
10.5.4 Accelerating Thread Migration
10.6 Exercises

11 Interconnection Network Architecture
11.1 Link, Channel, and Latency
11.2 Network Topology
11.3 Routing Policies and Algorithms
11.4 Router Architecture
11.5 Case Study: Alpha 21364 Network Architecture
11.6 Multicore Design Issues
11.6.1 Contemporary Design Issues
11.7 Exercises

12 SIMT Architecture
12.1 SIMT Programming Model
12.2 Mapping SIMT Workloads to SIMT Cores
12.3 SIMT Core Architecture
12.3.1 Scalar ISA
12.3.2 SIMDization/Vectorization: Warp Formation
12.3.3 Fine-Grain Multithreading (Warp-Level Parallelism)
12.3.4 Microarchitecture
12.3.5 Pipeline Execution
12.3.6 Control Flow Processing
12.3.7 Memory Systems
12.4 Exercises

13 Ask the Experts

Bibliography
Index

Yan Solihin is a professor of electrical and computer engineering at North Carolina State University, where he founded and leads the Architecture Research for Performance, Reliability, and Security (ARPERS) group. Dr. Solihin has been a recipient of the IBM Faculty Partnership Award, NSF Faculty Early Career Award, and AT&T Leadership Award. He is listed in the HPCA Hall of Fame and is a senior member of the IEEE. His research interests include computer architecture, computer system modeling methods, and image processing.


Permalink: https://www.kriso.ee/db/97814987534186e.html

Keywords: Multiprocessors
