
E-book: Heterogeneous Computing with OpenCL 2.0

(AMD, Sunnyvale, California, USA), (Northeastern University, Boston, MA, USA), (Northeastern University, Boston, MA, USA), (Northeastern University, Boston, MA, USA)
  • Format: PDF+DRM
  • Publication date: 18-Jun-2015
  • Publisher: Morgan Kaufmann Publishers In
  • Language: eng
  • ISBN-13: 9780128016497
  • Price: €62.78*
  • * The price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install dedicated software to read it. You must also create an Adobe ID (more information here). The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on mobile devices (phone or tablet), install this free application: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, install Adobe Digital Editions (this is a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

Heterogeneous Computing with OpenCL 2.0 teaches OpenCL and parallel programming for complex systems that may include a variety of device architectures: multi-core CPUs, GPUs, and fully integrated Accelerated Processing Units (APUs) such as AMD Fusion technology. This fully revised edition covers the latest enhancements in OpenCL 2.0, including:

  • Shared virtual memory, to increase programming flexibility and reduce data transfers that consume resources
  • Dynamic parallelism, which reduces processor load and avoids bottlenecks
  • Improved imaging support and integration with OpenGL, a graphics standard
  • Pipe memory, which can be optimized for specific scenarios
  • Improved integration with Android platforms

Designed to work on multiple platforms and with wide industry support, OpenCL will help you program more effectively for a heterogeneous future. Written by leaders in the parallel computing and OpenCL communities, this book gives you hands-on OpenCL experience with a range of fundamental parallel algorithms.

  • Updated throughout to cover the latest developments in OpenCL 2.0, including improvements in memory handling, parallelism, and imaging support
  • Explains principles and strategies for learning parallel programming with OpenCL, from understanding the four abstraction models to thoroughly testing and debugging complete applications
  • Covers image processing, web plugins, particle simulations, video editing, performance optimization, and more
  • Presents multiple examples and case studies that demonstrate a range of fundamental programming techniques on current system architectures, using OpenCL as the target language alongside CPUs, GPUs, and APUs
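To give a flavor of the programming model the book teaches: an OpenCL kernel expresses a data-parallel computation in which each work-item computes one element, identified by get_global_id(0). The following is a minimal plain-C sketch of that pattern (it is not the book's listing and uses no actual OpenCL API calls; the loop index stands in for the global work-item ID):

```c
#include <stddef.h>

/* Plain-C sketch of the data-parallel pattern an OpenCL
 * vector-addition kernel expresses: in OpenCL, each work-item i
 * (i = get_global_id(0)) computes one output element. Here the
 * sequential loop index plays the role of the work-item ID. */
void vector_add(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; ++i) {  /* one "work-item" per element */
        c[i] = a[i] + b[i];
    }
}
```

In an actual OpenCL program, the loop body becomes the body of a `__kernel` function and the runtime launches it over an NDRange of n work-items; the book's Chapter 3 walks through the complete host and kernel code for this example.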

Reviews

"...one of the best sources to start with OpenCL... If you need to start writing parallel programs but are intimidated by the complexity, this book will not leave you any excuses!" --Computing Reviews

Other information

Revised with the latest changes in OpenCL 2.0
List of Figures xi
List of Tables xvii
Foreword xix
Acknowledgments xxi
Chapter 1 Introduction 1(14)
1.1 Introduction to Heterogeneous Computing 1(1)
1.2 The Goals of This Book 2(1)
1.3 Thinking Parallel 2(5)
1.4 Concurrency and Parallel Programming Models 7(1)
1.5 Threads and Shared Memory 8(1)
1.6 Message-Passing Communication 9(1)
1.7 Different Grains of Parallelism 10(2)
1.7.1 Data Sharing and Synchronization 11(1)
1.7.2 Shared Virtual Memory 11(1)
1.8 Heterogeneous Computing with OpenCL 12(1)
1.9 Book Structure 13(2)
References 14(1)
Chapter 2 Device Architectures 15(26)
2.1 Introduction 15(1)
2.2 Hardware Trade-offs 15(14)
2.2.1 Performance Increase with Frequency, and its Limitations 17(1)
2.2.2 Superscalar Execution 18(1)
2.2.3 Very Long Instruction Word 19(2)
2.2.4 SIMD and Vector Processing 21(1)
2.2.5 Hardware Multithreading 22(3)
2.2.6 Multicore Architectures 25(1)
2.2.7 Integration: Systems-on-Chip and the APU 26(2)
2.2.8 Cache Hierarchies and Memory Systems 28(1)
2.3 The Architectural Design Space 29(9)
2.3.1 CPU Designs 29(4)
2.3.2 GPU Architectures 33(4)
2.3.3 APU and APU-like Designs 37(1)
2.4 Summary 38(3)
References 39(2)
Chapter 3 Introduction to OpenCL 41(34)
3.1 Introduction 41(2)
3.1.1 The OpenCL Standard 41(1)
3.1.2 The OpenCL Specification 42(1)
3.2 The OpenCL Platform Model 43(2)
3.2.1 Platforms and Devices 44(1)
3.3 The OpenCL Execution Model 45(5)
3.3.1 Contexts 45(2)
3.3.2 Command-Queues 47(1)
3.3.3 Events 48(1)
3.3.4 Device-Side Enqueuing 49(1)
3.4 Kernels and the OpenCL Programming Model 50(6)
3.4.1 Compilation and Argument Handling 53(2)
3.4.2 Starting Kernel Execution on a Device 55(1)
3.5 OpenCL Memory Model 56(6)
3.5.1 Memory Objects 56(3)
3.5.2 Data Transfer Commands 59(1)
3.5.3 Memory Regions 60(2)
3.5.4 Generic Address Space 62(1)
3.6 The OpenCL Runtime with an Example 62(7)
3.6.1 Complete Vector Addition Listing 66(3)
3.7 Vector Addition Using an OpenCL C++ Wrapper 69(2)
3.8 OpenCL for CUDA Programmers 71(2)
3.9 Summary 73(2)
Reference 73(2)
Chapter 4 Examples 75(36)
4.1 OpenCL Examples 75(1)
4.2 Histogram 75(8)
4.3 Image Rotation 83(8)
4.4 Image Convolution 91(8)
4.5 Producer-Consumer 99(8)
4.6 Utility Functions 107(2)
4.6.1 Reporting Compilation Errors 107(1)
4.6.2 Creating a Program String 108(1)
4.7 Summary 109(2)
Chapter 5 OpenCL Runtime and Concurrency Model 111(32)
5.1 Commands and the Queuing Model 111(7)
5.1.1 Blocking Memory Operations 111(1)
5.1.2 Events 112(1)
5.1.3 Command Barriers and Markers 113(1)
5.1.4 Event Callbacks 114(1)
5.1.5 Profiling Using Events 114(1)
5.1.6 User Events 115(1)
5.1.7 Out-of-Order Command-Queues 116(2)
5.2 Multiple Command-Queues 118(3)
5.3 The Kernel Execution Domain: Work-Items, Work-Groups, and NDRanges 121(9)
5.3.1 Synchronization 124(1)
5.3.2 Work-Group Barriers 125(3)
5.3.3 Built-In Work-Group Functions 128(1)
5.3.4 Predicate Evaluation Functions 128(1)
5.3.5 Broadcast Functions 129(1)
5.3.6 Parallel Primitive Functions 129(1)
5.4 Native and Built-In Kernels 130(2)
5.4.1 Native Kernels 130(2)
5.4.2 Built-In Kernels 132(1)
5.5 Device-Side Queuing 132(10)
5.5.1 Creating a Device-Side Queue 135(1)
5.5.2 Enqueuing Device-Side Kernels 136(6)
5.6 Summary 142(1)
Reference 142(1)
Chapter 6 OpenCL Host-Side Memory Model 143(20)
6.1 Memory Objects 144(4)
6.1.1 Buffers 144(1)
6.1.2 Images 145(2)
6.1.3 Pipes 147(1)
6.2 Memory Management 148(11)
6.2.1 Managing Default Memory Objects 149(6)
6.2.2 Managing Memory Objects with Allocation Options 155(4)
6.3 Shared Virtual Memory 159(2)
6.4 Summary 161(2)
Chapter 7 OpenCL Device-Side Memory Model 163(24)
7.1 Synchronization and Communication 164(4)
7.1.1 Barriers 165(1)
7.1.2 Atomics 166(2)
7.2 Global Memory 168(7)
7.2.1 Buffers 168(1)
7.2.2 Images 169(4)
7.2.3 Pipes 173(2)
7.3 Constant Memory 175(1)
7.4 Local Memory 175(3)
7.5 Private Memory 178(1)
7.6 Generic Address Space 178(2)
7.7 Memory Ordering 180(6)
7.7.1 Atomics Revisited 183(2)
7.7.2 Fences 185(1)
7.8 Summary 186(1)
Chapter 8 Dissecting OpenCL on a Heterogeneous System 187(26)
8.1 OpenCL on an AMD FX-8350 CPU 187(5)
8.1.1 Runtime Implementation 188(3)
8.1.2 Vectorizing Within a Work-Item 191(1)
8.1.3 Local Memory 191(1)
8.2 OpenCL on the AMD Radeon R9 290X GPU 192(9)
8.2.1 Threading and the Memory System 194(2)
8.2.2 Instruction Set Architecture and Execution Units 196(4)
8.2.3 Resource Allocation 200(1)
8.3 Memory Performance Considerations in OpenCL 201(10)
8.3.1 Global Memory 201(4)
8.3.2 Local Memory as a Software-Managed Cache 205(6)
8.4 Summary 211(2)
References 211(2)
Chapter 9 Case Study: Image Clustering 213(16)
9.1 Introduction 213(2)
9.2 The Feature Histogram on the CPU 215(2)
9.2.1 Sequential Implementation 215(1)
9.2.2 OpenMP Parallelization 216(1)
9.3 OpenCL Implementation 217(10)
9.3.1 Naive GPU Implementation: GPU1 217(1)
9.3.2 Coalesced Memory Accesses: GPU2 218(3)
9.3.3 Vectorizing Computation: GPU3 221(2)
9.3.4 Move SURF Features to Local Memory: GPU4 223(2)
9.3.5 Move Cluster Centroids to Constant Memory: GPU5 225(2)
9.4 Performance Analysis 227(1)
9.4.1 GPU Performance 227(1)
9.5 Conclusion 228(1)
References 228(1)
Chapter 10 OpenCL Profiling and Debugging 229(20)
10.1 Introduction 229(1)
10.2 Profiling OpenCL Code Using Events 229(2)
10.3 AMD CodeXL 231(1)
10.4 Profiling Using CodeXL 232(6)
10.4.1 Collecting OpenCL Application Traces 233(2)
10.4.2 Host API Trace View 235(1)
10.4.3 Summary Pages View 236(1)
10.4.4 Collecting GPU Kernel Performance Counters 236(1)
10.4.5 CPU Performance Profiling Using CodeXL 237(1)
10.5 Analyzing Kernels Using CodeXL 238(5)
10.5.1 KernelAnalyzer Statistics and ISA Views 239(3)
10.5.2 KernelAnalyzer Analysis View 242(1)
10.6 Debugging OpenCL Kernels Using CodeXL 243(3)
10.6.1 API-Level Debugging 244(1)
10.6.2 Kernel Debugging 245(1)
10.7 Debugging Using printf 246(1)
10.8 Summary 247(2)
Chapter 11 Mapping High-Level Programming Languages to OpenCL 2.0 249(24)
11.1 Introduction 249(1)
11.2 A Brief Introduction to C++ AMP 250(4)
11.2.1 C++ AMP array_view 251(1)
11.2.2 C++ AMP parallel_for_each, or Kernel Invocation 252(2)
11.3 OpenCL 2.0 as a Compiler Target 254(1)
11.4 Mapping Key C++ AMP Constructs to OpenCL 254(5)
11.5 C++ AMP Compilation Flow 259(1)
11.6 Compiled C++ AMP Code 260(1)
11.7 How Shared Virtual Memory in OpenCL 2.0 Fits In 261(2)
11.8 Compiler Support for Tiling in C++ AMP 263(2)
11.8.1 Dividing the Compute Domain 263(1)
11.8.2 Specifying the Address Space and Barriers 264(1)
11.9 Address Space Deduction 265(2)
11.10 Data Movement Optimization 267(1)
11.10.1 discard_data() 267(1)
11.10.2 array_view<const T, N> 268(1)
11.11 Binomial Options: A Full Example 268(2)
11.12 Preliminary Results 270(1)
11.13 Conclusion 271(2)
Reference 272(1)
Chapter 12 WebCL: Enabling OpenCL Acceleration of Web Applications 273(18)
12.1 Introduction 273(1)
12.2 Programming with WebCL 273(8)
12.3 Synchronization 281(1)
12.4 Interoperability with WebGL 282(1)
12.5 Example Application 282(3)
12.6 Security Enhancement 285(1)
12.7 WebCL on the Server 286(2)
12.8 Status and Future of WebCL 288(3)
References 288(1)
Works Cited 288(3)
Chapter 13 Foreign Lands 291(10)
13.1 Introduction 291(1)
13.2 Beyond C and C++ 291(2)
13.3 Haskell OpenCL 293(6)
13.3.1 Module Structure 294(1)
13.3.2 Environments 295(1)
13.3.3 Reference Counting 295(1)
13.3.4 Platform and Devices 296(1)
13.3.5 The Execution Environment 296(3)
13.4 Summary 299(2)
References 299(2)
Index 301
David Kaeli received a BS and PhD in Electrical Engineering from Rutgers University, and an MS in Computer Engineering from Syracuse University. He is the Associate Dean of Undergraduate Programs in the College of Engineering and a Full Professor on the ECE faculty at Northeastern University, Boston, MA, where he directs the Northeastern University Computer Architecture Research Laboratory (NUCAR). Prior to joining Northeastern in 1993, Kaeli spent 12 years at IBM, the last 7 at the T.J. Watson Research Center, Yorktown Heights, NY. Dr. Kaeli has co-authored more than 200 critically reviewed publications. His research spans a range of areas, from microarchitecture to back-end compilers and software engineering. He leads a number of research projects in the area of GPU computing. He presently serves as the Chair of the IEEE Technical Committee on Computer Architecture. Dr. Kaeli is an IEEE Fellow and a member of the ACM.

Perhaad Mistry works in AMD's developer tools group at the Boston Design Center, focusing on developing debugging and performance profiling tools for heterogeneous architectures. He is presently focused on debugger architectures for upcoming shared-memory and discrete graphics processing unit (GPU) platforms. Perhaad has been working on GPU architectures and parallel programming since CUDA 0.8 in 2007. He has enjoyed implementing medical imaging algorithms for GPGPU platforms and architecture-aware data structures for surgical simulators. His present work focuses on the design of debuggers and architectural support for performance analysis for the next generation of applications that will target GPU platforms. Perhaad graduated after 7 years with a PhD in Electrical and Computer Engineering from Northeastern University, where he was advised by Dr. David Kaeli, who leads the Northeastern University Computer Architecture Research Laboratory (NUCAR). Even after graduating, Perhaad is still a member of NUCAR and advises on research projects on the performance analysis of parallel architectures. He received a BS in Electronics Engineering from the University of Mumbai and an MS in Computer Engineering from Northeastern University in Boston. He is presently based in Boston.

Dana Schaa received a BS in Computer Engineering from Cal Poly, San Luis Obispo, and an MS and PhD in Electrical and Computer Engineering from Northeastern University. He works on GPU architecture modeling at AMD, and has interests and expertise that include memory systems, microarchitecture, performance analysis, and general-purpose computing on GPUs. His background includes the development of OpenCL-based medical imaging applications, ranging from real-time visualization of 3D ultrasound to CT image reconstruction in heterogeneous environments. Dana married his wonderful wife Jenny in 2010, and they live together in San Jose with their charming cats.