Fundamentals of Multicore Software Development [Hardback]

Edited by Victor Pankratius, Ali-Reza Adl-Tabatabai (Intel Corporation, Santa Clara, California, USA), and Walter F. Tichy (University of Karlsruhe, Germany)
  • Format: Hardback, 330 pages, height x width: 234x156 mm, weight: 589 g, 1 table, black and white; 101 illustrations, black and white
  • Series: Chapman & Hall/CRC Computational Science
  • Publication date: 12-Dec-2011
  • Publisher: CRC Press Inc
  • ISBN-10: 143981273X
  • ISBN-13: 9781439812730
This collection of twelve articles on multi-core software programming provides detailed information on modern application development with a variety of languages and platforms. The work is divided into sections covering parallel programming basics, languages for multi-core, heterogeneous processors, and emerging technologies. Individual chapters cover topics such as parallel design patterns, parallelism in .NET and Java, programming the cell processor, and auto-tuning parallel application performance. Contributors are academics in computer science from American and European universities. Annotation ©2012 Book News, Inc., Portland, OR (booknews.com)

With multicore processors now in every computer, server, and embedded device, the need for cost-effective, reliable parallel software has never been greater. By explaining key aspects of multicore programming, Fundamentals of Multicore Software Development helps software engineers understand parallel programming and master the multicore challenge.

Accessible to newcomers to the field, the book captures the state of the art of multicore programming in computer science. It covers the fundamentals of multicore hardware, parallel design patterns, and parallel programming in C++, .NET, and Java. It also discusses manycore computing on graphics cards and heterogeneous multicore platforms, automatic parallelization, automatic performance tuning, transactional memory, and emerging applications.

As computing power increasingly comes from parallelism, software developers must embrace parallel programming. Written by leaders in the field, this book provides an overview of the existing and up-and-coming programming choices for multicores. It addresses issues in systems architecture, operating systems, languages, and compilers.

Reviews

Fundamentals of Multicore Software Development provides a well-organized overview of advances in parallel architectures and software programming. This reviewer learned much from [the book] and highly recommends it, whether for personal interest or for use as an introductory text. (Robert Schaefer, ACM SIGSOFT Software Engineering Notes, May 2012)

The individual chapters are well written and self-contained; they can be read independently yet fit together well into a coherent and logical presentation. Each chapter includes extensive references. The book will likely appeal most to researchers. (Andrew R. Huber, Computing Reviews, March 2012)

This book paints a great picture of where we are, and gives more than an inkling of where we may go next. As we gain broader, more general experience with parallel computing based on the foundation presented here, we can be sure that we are helping to rewrite the next chapter, probably the most significant one, in the amazing history of computing. (From the Foreword by Burton J. Smith, Technical Fellow, Microsoft Corporation)

Foreword
Editors
Contributors
Chapter 1 Introduction
Victor Pankratius, Ali-Reza Adl-Tabatabai, and Walter F. Tichy
1.1 Where We Are Today
1.2 How This Book Helps
1.3 Audience
1.4 Organization
1.4.1 Part I: Basics of Parallel Programming
1.4.2 Part II: Programming Languages for Multicore
1.4.3 Part III: Programming Heterogeneous Processors
1.4.4 Part IV: Emerging Technologies
Part I Basics of Parallel Programming
Chapter 2 Fundamentals of Multicore Hardware and Parallel Programming
Barry Wilkinson
2.1 Introduction
2.2 Potential for Increased Speed
2.3 Types of Parallel Computing Platforms
2.4 Processor Design
2.5 Multicore Processor Architectures
2.5.1 General
2.5.2 Symmetric Multicore Designs
2.5.3 Asymmetric Multicore Designs
2.6 Programming Multicore Systems
2.6.1 Processes and Threads
2.6.2 Thread APIs
2.6.3 OpenMP
2.7 Parallel Programming Strategies
2.7.1 Task and Data Parallelism
2.7.1.1 Embarrassingly Parallel Computations
2.7.1.2 Pipelining
2.7.1.3 Synchronous Computations
2.7.1.4 Workpool
2.8 Summary
References
Chapter 3 Parallel Design Patterns
Tim Mattson
3.1 Parallel Programming Challenge
3.2 Design Patterns: Background and History
3.3 Essential Patterns for Parallel Programming
3.3.1 Parallel Algorithm Strategy Patterns
3.3.1.1 Task Parallelism Pattern
3.3.1.2 Data Parallelism
3.3.1.3 Divide and Conquer
3.3.1.4 Pipeline
3.3.1.5 Geometric Decomposition
3.3.2 Implementation Strategy Patterns
3.3.2.1 SPMD
3.3.2.2 SIMD
3.3.2.3 Loop-Level Parallelism
3.3.2.4 Fork-Join
3.3.2.5 Master-Worker/Task-Queue
3.4 Conclusions and Next Steps
References
Part II Programming Languages for Multicore
Chapter 4 Threads and Shared Variables in C++
Hans Boehm
4.1 Basic Model and Thread Creation
4.2 Small Detour: C++0x Lambda Expressions
4.3 Complete Example
4.4 Shared Variables
4.5 More Refined Approach
4.6 Avoiding Data Races
4.7 Mutexes
4.8 Atomic Variables
4.8.1 Low-Level Atomics
4.9 Other Synchronization Mechanisms
4.9.1 Unique_lock
4.9.2 Condition Variables
4.9.3 Other Mutex Variants and Facilities
4.9.4 Call_once
4.10 Terminating a Multi-Threaded C++ Program
4.11 Task-Based Models
4.12 Relationship to Earlier Standards
4.12.1 Separate Thread Libraries
4.12.2 No Atomics
4.12.3 Adjacent Field Overwrites
4.12.4 Other Compiler-Introduced Races
4.12.5 Program Termination
References
Chapter 5 Parallelism in .NET and Java
Judith Bishop
5.1 Introduction
5.1.1 Types of Parallelism
5.1.2 Overview of the Chapter
5.2 .NET Parallel Landscape
5.3 Task Parallel Library
5.3.1 Basic Methods: For, ForEach, and Invoke
5.3.2 Breaking Out of a Loop
5.3.3 Tasks and Futures
5.4 PLINQ
5.5 Evaluation
5.6 Java Platform
5.6.1 Thread Basics
5.6.2 java.util.concurrent
5.7 Fork-Join Framework
5.7.1 Performance
5.8 ParallelArray Package
5.9 Conclusion
Acknowledgments
References
Chapter 6 OpenMP
Barbara Chapman and James LaGrone
6.1 Introduction
6.1.1 Idea of OpenMP
6.1.2 Overview of Features
6.1.3 Who Developed OpenMP? How Is It Evolving?
6.2 OpenMP 3.0 Specification
6.2.1 Parallel Regions and Worksharing
6.2.1.1 Scheduling Parallel Loops
6.2.2 Data Environment
6.2.2.1 Using Data Attributes
6.2.3 Explicit Tasks
6.2.3.1 Using Explicit Tasks
6.2.4 Synchronization
6.2.4.1 Performing Reductions in OpenMP
6.2.5 OpenMP Library Routines and Environment Variables
6.2.5.1 SPMD Programming Style
6.3 Implementation of OpenMP
6.4 Programming for Performance
6.5 Summary
References
Part III Programming Heterogeneous Processors
Chapter 7 Scalable Manycore Computing with CUDA
Michael Garland, Vinod Grover, and Kevin Skadron
7.1 Introduction
7.2 Manycore GPU Machine Model
7.3 Structure of CUDA Programs
7.3.1 Program Placement
7.3.2 Parallel Kernels
7.3.3 Communicating within Blocks
7.3.4 Device Memory Management
7.3.5 Complete CUDA Example
7.4 Execution of Kernels on the GPU
7.4.1 Kernel Scheduling
7.4.2 Coordinating Tasks in Kernels
7.4.3 Memory Consistency
7.5 Writing a CUDA Program
7.5.1 Block-Level Parallel Prefix
7.5.2 Array Reduction
7.5.3 Coordinating Whole Grids
7.6 GPU Architecture
7.7 Further Reading
References
Chapter 8 Programming the Cell Processor
Christoph W. Kessler
8.1 Introduction
8.2 Cell Processor Architecture Overview
8.2.1 Power Processing Element
8.2.2 Synergistic Processing Element
8.2.3 Element Interconnect Bus
8.2.4 DMA Communication and Memory Access
8.2.5 Channels
8.2.6 Mailboxes
8.2.7 Signals
8.3 Cell Programming with the SDK
8.3.1 PPE/SPE Thread Coordination
8.3.2 DMA Communication
8.3.3 DMA Communication and Multi-Buffering
8.3.4 Using SIMD Instructions on SPE
8.3.5 Summary: Cell Programming with the SDK
8.4 Cell SDK Compilers, Libraries, and Tools
8.4.1 Compilers
8.4.2 Full-System Simulator
8.4.3 Performance Analysis and Visualization
8.4.4 Cell IDE
8.4.5 Libraries, Components, and Frameworks
8.5 Higher-Level Programming Environments for Cell
8.5.1 OpenMP
8.5.2 CellSs
8.5.3 Sequoia
8.5.4 RapidMind
8.5.5 Sieve C++
8.5.6 Offload C++
8.5.7 NestStep
8.5.8 BlockLib
8.5.9 StarPU for Cell
8.5.10 Other High-Level Programming Environments for Cell
8.6 Algorithms and Components for Cell
8.7 Summary and Outlook
8.8 Bibliographical Remarks
Acknowledgments
Disclaimers and Declarations
Trademarks
References
Part IV Emerging Technologies
Chapter 9 Automatic Extraction of Parallelism from Sequential Code
David I. August, Jialu Huang, Thomas B. Jablin, Hanjun Kim, Thomas R. Mason, Prakash Prabhu, Arun Raman, and Yun Zhang
9.1 Introduction
9.1.1 Background
9.1.2 Techniques and Tools
9.2 Dependence Analysis
9.2.1 Introduction
9.2.2 Data Dependence Analysis
9.2.2.1 Data Dependence Graph
9.2.2.2 Analysis
9.2.3 Control Dependence Analysis
9.2.3.1 Control Dependence Graph
9.2.3.2 Analysis
9.2.4 Program Dependence Graph
9.3 DOALL Parallelization
9.3.1 Introduction
9.3.2 Code Generation
9.3.3 Advanced Topic: Reduction
9.3.4 Advanced Topic: Speculative DOALL
9.3.5 Advanced Topic: Further Techniques and Transformations
9.4 DOACROSS Parallelization
9.4.1 Introduction
9.4.2 Code Generation
9.4.3 Advanced Topic: Speculation
9.5 Pipeline Parallelization
9.5.1 Introduction
9.5.2 Code Partitioning
9.5.3 Code Generation
9.5.4 Advanced Topic: Speculation
9.6 Bringing It All Together
9.7 Conclusion
References
Chapter 10 Auto-Tuning Parallel Application Performance
Christoph A. Schaefer, Victor Pankratius, and Walter F. Tichy
10.1 Introduction
10.2 Motivating Example
10.3 Terminology
10.3.1 Auto-Tuning
10.3.2 Classification of Approaches
10.4 Overview of the Tunable Architectures Approach
10.5 Designing Tunable Applications
10.5.1 Tunable Architectures
10.5.1.1 Atomic Components
10.5.1.2 Connectors
10.5.1.3 Runtime System and Backend
10.5.1.4 A Tunable Architecture Example
10.5.2 CO2P3S
10.5.3 Comparison
10.6 Implementation with Tuning Instrumentation Languages
10.6.1 Atune-IL
10.6.2 X-Language
10.6.3 POET
10.6.4 Orio
10.6.5 Comparison
10.7 Performance Optimization
10.7.1 Auto-Tuning Cycle
10.7.2 Search Techniques
10.7.3 Auto-Tuning Systems
10.7.3.1 Atune
10.7.3.2 ATLAS/AEOS
10.7.3.3 Active Harmony
10.7.3.4 Model-Based Systems
10.7.4 Comparison
10.8 Conclusion and Outlook
References
Chapter 11 Transactional Memory
Tim Harris
11.1 Introduction
11.2 Transactional Memory Taxonomy
11.2.1 Eager/Lazy Version Management
11.2.2 Eager/Lazy Conflict Detection
11.2.3 Semantics
11.3 Hardware Transactional Memory
11.3.1 Classical Cache-Based Bounded-Size HTM
11.3.2 LogTM
11.4 Software Transactional Memory
11.4.1 Bartok-STM
11.4.2 TL2
11.5 Atomic Blocks
11.5.1 Semantics of Atomic Blocks
11.5.2 Optimizing Atomic Blocks
11.5.3 Composable Blocking
11.6 Performance
11.7 Where Next with TM?
11.8 Chapter Notes
References
Chapter 12 Emerging Applications
Pradeep Dubey
12.1 Introduction
12.2 RMS Taxonomy
12.2.1 Interactive RMS (iRMS)
12.2.2 Growing Significance of Data-Driven Models
12.2.2.1 Massive Data Computing: An Algorithmic Opportunity
12.2.3 Nested RMS
12.2.4 Structured Decomposition of RMS Applications
12.3 System Implications
12.3.1 Nature and Source of Underlying Parallelism
12.3.1.1 Approximate, Yet Real Time
12.3.1.2 Curse of Dimensionality and Irregular Access Pattern
12.3.1.3 Parallelism: Both Coarse and Fine Grain
12.3.1.4 Throughput Computing and Manycore
12.3.1.5 Revisiting Amdahl's Law for Throughput Computing
12.3.2 Scalability of RMS Applications
12.3.2.1 Scalability Implications of Dataset Growth
12.3.3 Homogenous versus Heterogeneous Decomposition
12.4 Conclusion
References
Index
Victor Pankratius heads the Multicore Software Engineering group at the Karlsruhe Institute of Technology. He is also the elected chairman of the Software Engineering for Parallel Systems (SEPARS) international working group. With a focus on making parallel programming easier, his research encompasses auto-tuning, language design, debugging, and empirical studies.

Ali-Reza Adl-Tabatabai is a senior principal engineer at Intel Corporation, where he leads a team working on compilers and scalable runtimes. His research concentrates on language features that make it easier to build reliable and scalable parallel programs for future multicore architectures.

Walter Tichy is a professor of computer science and head of the Programming Systems group at the Karlsruhe Institute of Technology. He is also a member of the board of directors of software engineering at Forschungszentrum Informatik (FZI), an independent research institution. His research covers tools and methods to simplify the engineering of general-purpose parallel software, including race detection, auto-tuning, and high-level languages for expressing parallelism.