
E-book: Fundamentals of Parallel Multicore Architecture

(Solihin Publishing and Consulting, LLC, Raleigh, North Carolina, USA)
  • Format: EPUB+DRM
  • Price: 58.49 €*
  • * the price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You also need to create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you need to install Adobe Digital Editions (this is a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is most likely already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

Although multicore is now a mainstream architecture, there are few textbooks that cover parallel multicore architectures. Filling this gap, Fundamentals of Parallel Multicore Architecture provides all the material for a graduate or senior undergraduate course that focuses on the architecture of multicore processors. The book is also useful as a reference for professionals who deal with programming on multicore or designing multicore chips.

The text's coverage of fundamental topics prepares students to study research papers in the multicore architecture area. The text offers many pedagogical features, including:

  • Sufficiently short chapters that can be comfortably read over a weekend
  • Introduction of each concept by first describing the problem and building intuition that leads to the need for the concept
  • "Did you know?" boxes that present mini case studies, alternative points of view, examples, and other interesting facts or discussion items
  • Thought-provoking interviews with experts who share their perspectives on multicore architectures in the past, present, and future
  • Online programming assignments and solutions that enhance students' understanding

The first several chapters address programming issues in shared memory multiprocessors, such as the programming model and techniques to parallelize regular and irregular applications. The core of the book covers the architectures for shared memory multiprocessors. The final chapter contains interviews with experts in parallel multicore architecture.

Reviews

"This text provides a lucid and comprehensive treatment of hardware/software foundations of parallel architectures by a leading expert in the area." Rajeev Balasubramonian, University of Utah

"This book does an excellent job covering parallel multicore architectures and their programming models. It covers these topics in the crucial context of advanced memory hierarchy designs. The text is accessible to senior undergraduate students and graduate students in computer science and computer engineering. A self-contained reference for the target audience, the text is comprehensive and strikes a good balance between the principles and in-depth details of modern multicore architecture designs." Robert van Engelen, Florida State University

"The author first discusses the basic hardware and history of multicore architectures, then discusses the basic ideas of how to analyze code to determine parallelism (and the basic concepts of different parallelism techniques), and then discusses the specifics of how to write shared memory parallel programs, and so on. In this way, the topics become increasingly focused on the desired content of the book, that of the details in constructing multicore architectures. This book is well organized and thought out, and I imagine that it [will be] well received by students." Daniel R. Reynolds, Southern Methodist University

"… this book would be appealing to students and practitioners who would like to get an in-depth understanding of multicore architecture and designing efficient programs for these architectures." Purushotham Bangalore, University of Alabama at Birmingham

Preface xv
Acknowledgement xix
About the Author xxi
List of Abbreviations xxiii
1 Perspectives on Multicore Architectures 1(26)
1.1 The Origin of the Multicore Architecture 2(10)
1.1.1 Power Consumption Issue 6(6)
1.2 Perspectives on Parallel Computers 12(6)
1.2.1 Flynn's Taxonomy of Parallel Computers 15(2)
1.2.2 Classes of MIMD Parallel Computers 17(1)
1.3 Future Multicore Architectures 18(6)
1.4 Exercises 24(3)
2 Perspectives on Parallel Programming 27(24)
2.1 Limits on Parallel Program Performance 28(3)
2.2 Parallel Programming Models 31(18)
2.2.1 Comparing Shared Memory and Message Passing Models 33(2)
2.2.2 A Simple Example 35(4)
2.2.3 Other Programming Models 39(10)
2.3 Exercises 49(2)
3 Shared Memory Parallel Programming 51(52)
3.1 Steps in Parallel Programming 52(1)
3.2 Dependence Analysis 53(5)
3.2.1 Loop-Level Dependence Analysis 55(1)
3.2.2 Iteration-Space Traversal Graph and Loop-Carried Dependence Graph 56(2)
3.3 Identifying Parallel Tasks in Loop Structures 58(9)
3.3.1 Parallelism between Loop Iterations and DOALL Parallelism 58(3)
3.3.2 DOACROSS: Synchronized Parallelism between Loop Iterations 61(2)
3.3.3 Parallelism Across Statements in a Loop 63(2)
3.3.4 DOPIPE: Parallelism Across Statements of a Loop 65(2)
3.4 Identifying Parallelism at Other Levels 67(2)
3.5 Identifying Parallelism through Algorithm Knowledge 69(3)
3.6 Determining the Scope of Variables 72(5)
3.6.1 Privatization 73(2)
3.6.2 Reduction Variables and Operation 75(1)
3.6.3 Summary of Criteria 76(1)
3.7 Synchronization 77(1)
3.8 Assigning Tasks to Threads 78(5)
3.9 Mapping Threads to Processors 83(4)
3.10 A Brief Introduction to OpenMP 87(7)
3.11 Exercises 94(9)
4 Parallel Programming for Linked Data Structures 103(30)
4.1 Parallelization Challenges in LDS 104(1)
4.1.1 Loop-Level Parallelization is Insufficient 104(1)
4.2 Approaches to Parallelization of LDS 105(10)
4.2.1 Parallelizing Computation vs. Traversal 105(2)
4.2.2 Parallelizing Operations on the Data Structure 107(8)
4.3 Parallelization Techniques for Linked Lists 115(11)
4.3.1 Parallelization among Readers 115(2)
4.3.2 Parallelism among LDS Traversals 117(4)
4.3.3 Fine-Grain Lock Approach 121(5)
4.4 The Role of Transactional Memory 126(2)
4.5 Exercises 128(5)
5 Introduction to Memory Hierarchy Organization 133(58)
5.1 Motivation for Memory Hierarchy 134(1)
5.2 Basic Architectures of a Cache 135(20)
5.2.1 Placement Policy 136(5)
5.2.2 Replacement Policy 141(3)
5.2.3 Write Policy 144(2)
5.2.4 Inclusion Policy on Multi-Level Caches 146(4)
5.2.5 Unified/Split/Banked Cache Organization and Cache Pipelining 150(2)
5.2.6 Cache Addressing and Translation Lookaside Buffer 152(3)
5.2.7 Non-Blocking Cache 155(1)
5.3 Cache Performance 155(7)
5.3.1 The Power Law of Cache Misses 158(1)
5.3.2 Stack Distance Profile 159(1)
5.3.3 Cache Performance Metrics 160(2)
5.4 Prefetching 162(4)
5.4.1 Stride and Sequential Prefetching 164(1)
5.4.2 Prefetching in Multiprocessor Systems 165(1)
5.5 Cache Design in Multicore Architecture 166(1)
5.6 Physical Cache Organization 167(4)
5.6.1 United Cache Organization 167(1)
5.6.2 Distributed Cache Organization 168(1)
5.6.3 Hybrid United+Distributed Cache Organization 169(2)
5.7 Logical Cache Organization 171(9)
5.7.1 Hashing Function 175(2)
5.7.2 Improving Distance Locality of Shared Cache 177(1)
5.7.3 Capacity Sharing in the Private Cache Organization 178(2)
5.8 Case Studies 180(6)
5.8.1 IBM Power7 Memory Hierarchy 180(4)
5.8.2 Comparing AMD Shanghai and Intel Barcelona's Memory Hierarchy 184(2)
5.9 Exercises 186(5)
6 Introduction to Shared Memory Multiprocessors 191(14)
6.1 The Cache Coherence Problem 192(3)
6.2 Memory Consistency Problem 195(2)
6.3 Synchronization Problem 197(5)
6.4 Exercises 202(3)
7 Basic Cache Coherence Issues 205(60)
7.1 Overview 206(6)
7.1.1 Basic Support for Bus-Based Multiprocessors 209(3)
7.2 Cache Coherence in Bus-Based Multiprocessors 212(23)
7.2.1 Coherence Protocol for Write Through Caches 212(2)
7.2.2 MSI Protocol with Write Back Caches 214(6)
7.2.3 MESI Protocol with Write Back Caches 220(6)
7.2.4 MOESI Protocol with Write Back Caches 226(5)
7.2.5 Update-Based Protocol with Write Back Caches 231(4)
7.3 Impact of Cache Design on Cache Coherence Performance 235(1)
7.4 Performance and Other Practical Issues 236(4)
7.4.1 Prefetching and Coherence Misses 236(1)
7.4.2 Multi-Level Caches 237(2)
7.4.3 Snoop Filtering 239(1)
7.5 Broadcast Protocol with Point-to-Point Interconnect 240(17)
7.6 Exercises 257(8)
8 Hardware Support for Synchronization 265(36)
8.1 Lock Implementations 266(16)
8.1.1 Evaluating the Performance of Lock Implementations 266(1)
8.1.2 The Need for Atomic Instructions 266(3)
8.1.3 Test and Set Lock 269(2)
8.1.4 Test and Test and Set Lock 271(1)
8.1.5 Load Linked and Store Conditional Lock 272(4)
8.1.6 Ticket Lock 276(2)
8.1.7 Array-Based Queuing Lock 278(2)
8.1.8 Qualitative Comparison of Lock Implementations 280(2)
8.2 Barrier Implementations 282(5)
8.2.1 Sense-Reversing Centralized Barrier 282(3)
8.2.2 Combining Tree Barrier 285(1)
8.2.3 Hardware Barrier Implementation 285(2)
8.3 Transactional Memory 287(7)
8.4 Exercises 294(7)
9 Memory Consistency Models 301(30)
9.1 Programmers' Intuition 302(4)
9.2 Architecture Mechanisms for Ensuring Sequential Consistency 306(4)
9.2.1 Basic SC Implementation on a Bus-Based Multiprocessor 306(2)
9.2.2 Techniques to Improve SC Performance 308(2)
9.3 Relaxed Consistency Models 310(10)
9.3.1 Safety Net 311(1)
9.3.2 Processor Consistency 311(2)
9.3.3 Weak Ordering 313(2)
9.3.4 Release Consistency 315(4)
9.3.5 Lazy Release Consistency 319(1)
9.4 Synchronization in Different Memory Consistency Models 320(4)
9.5 Exercises 324(7)
10 Advanced Cache Coherence Issues 331(40)
10.1 Directory Coherence Protocols 332(1)
10.2 Overview of Directory Coherence Protocol 332(7)
10.2.1 Directory Format and Location 333(6)
10.3 Basic Directory Cache Coherence Protocol 339(4)
10.4 Implementation Correctness and Performance 343(13)
10.4.1 Handling Races Due to Out-of-Sync Directory State 343(3)
10.4.2 Handling Races Due to Non-Instantaneous Processing of a Request 346(7)
10.4.3 Write Propagation and Transaction Serialization 353(2)
10.4.4 Synchronization Support 355(1)
10.4.5 Memory Consistency Models 356(1)
10.5 Contemporary Design Issues 356(11)
10.5.1 Dealing with Imprecise Directory Information 356(5)
10.5.2 Granularity of Coherence 361(2)
10.5.3 System Partitioning 363(2)
10.5.4 Accelerating Thread Migration 365(2)
10.6 Exercises 367(4)
11 Interconnection Network Architecture 371(38)
11.1 Link, Channel, and Latency 372(4)
11.2 Network Topology 376(5)
11.3 Routing Policies and Algorithms 381(12)
11.4 Router Architecture 393(4)
11.5 Case Study: Alpha 21364 Network Architecture 397(2)
11.6 Multicore Design Issues 399(4)
11.6.1 Contemporary Design Issues 401(2)
11.7 Exercises 403(6)
12 SIMT Architecture 409(18)
12.1 SIMT Programming Model 410(2)
12.2 Mapping SIMT Workloads to SIMT Cores 412(1)
12.3 SIMT Core Architecture 413(10)
12.3.1 Scalar ISA 413(1)
12.3.2 SIMDization/Vectorization: Warp Formation 413(2)
12.3.3 Fine-Grain Multithreading (Warp-Level Parallelism) 415(1)
12.3.4 Microarchitecture 416(1)
12.3.5 Pipeline Execution 417(1)
12.3.6 Control Flow Processing 418(1)
12.3.7 Memory Systems 419(4)
12.4 Exercises 423(4)
13 Ask the Experts 427(26)
Bibliography 453(6)
Index 459
Yan Solihin is a professor of electrical and computer engineering at North Carolina State University, where he founded and leads the Architecture Research for Performance, Reliability, and Security (ARPERS) group. Dr. Solihin has been a recipient of the IBM Faculty Partnership Award, NSF Faculty Early Career Award, and AT&T Leadership Award. He is listed in the HPCA Hall of Fame and is a senior member of the IEEE. His research interests include computer architecture, computer system modeling methods, and image processing.