E-book: Heterogeneous Computing with OpenCL 2.0

3.83/5 (12 ratings on Goodreads)

Dong Ping Zhang (AMD, Sunnyvale, California, USA), David R. Kaeli (Northeastern University, Boston, MA, USA), Perhaad Mistry (Northeastern University, Boston, MA, USA), Dana Schaa (Northeastern University, Boston, MA, USA)

Format: PDF+DRM
Publication date: 18-Jun-2015
Publisher: Morgan Kaufmann Publishers In
Language: English
ISBN-13: 9780128016497

Price: €62.78*
* the price is final, i.e. no further discounts apply
This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

Copying (copy/paste): not allowed
Printing: not allowed
Usage:

Digital rights management (DRM)
The publisher has issued this e-book in encrypted form, which means you must install special software to read it. You also need to create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

Required software
To read on mobile devices (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books; do not confuse it with Adobe Reader, which is probably already installed on your computer).

This e-book cannot be read on an Amazon Kindle.

Heterogeneous Computing with OpenCL 2.0 teaches OpenCL and parallel programming for complex systems that may include a variety of device architectures: multi-core CPUs, GPUs, and fully integrated Accelerated Processing Units (APUs) such as AMD Fusion technology. This fully revised edition covers the latest enhancements in OpenCL 2.0, including:

- Shared virtual memory, which increases programming flexibility and reduces resource-consuming data transfers
- Dynamic parallelism, which reduces processor load and avoids bottlenecks
- Improved imaging support and integration with OpenGL, a graphics standard
- Pipe memory, which can be optimized for specific scenarios
- Improved integration with Android platforms

Designed to work on multiple platforms and with wide industry support, OpenCL will help you program more effectively for a heterogeneous future. Written by leaders in the parallel computing and OpenCL communities, this book gives you hands-on OpenCL experience with a range of fundamental parallel algorithms.

- Updated throughout to cover the latest developments in OpenCL 2.0, including improvements in memory handling, parallelism, and imaging support
- Explains principles and strategies for learning parallel programming with OpenCL, from understanding the four abstraction models to thoroughly testing and debugging complete applications
- Covers image processing, web plugins, particle simulations, video editing, performance optimization, and more
- Presents multiple examples and case studies that demonstrate a range of fundamental programming techniques on current system architectures, using OpenCL as the target language on CPUs, GPUs, and APUs

Reviews

"...one of the best sources to start with OpenCL. If you need to start writing parallel programs but are intimidated by the complexity, this book will not leave you any excuses!" --Computing Reviews

Other information

Revised with the latest changes in OpenCL 2.0

List of Figures
List of Tables
Foreword
Acknowledgments

Chapter 1 Introduction
1.1 Introduction to Heterogeneous Computing
1.2 The Goals of This Book
1.3 Thinking Parallel
1.4 Concurrency and Parallel Programming Models
1.5 Threads and Shared Memory
1.6 Message-Passing Communication
1.7 Different Grains of Parallelism
1.7.1 Data Sharing and Synchronization
1.7.2 Shared Virtual Memory
1.8 Heterogeneous Computing with OpenCL
1.9 Book Structure
References

Chapter 2 Device Architectures
2.1 Introduction
2.2 Hardware Trade-offs
2.2.1 Performance Increase with Frequency, and its Limitations
2.2.2 Superscalar Execution
2.2.3 Very Long Instruction Word
2.2.4 SIMD and Vector Processing
2.2.5 Hardware Multithreading
2.2.6 Multicore Architectures
2.2.7 Integration: Systems-on-Chip and the APU
2.2.8 Cache Hierarchies and Memory Systems
2.3 The Architectural Design Space
2.3.1 CPU Designs
2.3.2 GPU Architectures
2.3.3 APU and APU-like Designs
2.4 Summary
References

Chapter 3 Introduction to OpenCL
3.1 Introduction
3.1.1 The OpenCL Standard
3.1.2 The OpenCL Specification
3.2 The OpenCL Platform Model
3.2.1 Platforms and Devices
3.3 The OpenCL Execution Model
3.3.1 Contexts
3.3.2 Command-Queues
3.3.3 Events
3.3.4 Device-Side Enqueuing
3.4 Kernels and the OpenCL Programming Model
3.4.1 Compilation and Argument Handling
3.4.2 Starting Kernel Execution on a Device
3.5 OpenCL Memory Model
3.5.1 Memory Objects
3.5.2 Data Transfer Commands
3.5.3 Memory Regions
3.5.4 Generic Address Space
3.6 The OpenCL Runtime with an Example
3.6.1 Complete Vector Addition Listing
3.7 Vector Addition Using an OpenCL C++ Wrapper
3.8 OpenCL for CUDA Programmers
3.9 Summary
Reference

Chapter 4 Examples
4.1 OpenCL Examples
4.2 Histogram
4.3 Image Rotation
4.4 Image Convolution
4.5 Producer-Consumer
4.6 Utility Functions
4.6.1 Reporting Compilation Errors
4.6.2 Creating a Program String
4.7 Summary

Chapter 5 OpenCL Runtime and Concurrency Model
5.1 Commands and the Queuing Model
5.1.1 Blocking Memory Operations
5.1.2 Events
5.1.3 Command Barriers and Markers
5.1.4 Event Callbacks
5.1.5 Profiling Using Events
5.1.6 User Events
5.1.7 Out-of-Order Command-Queues
5.2 Multiple Command-Queues
5.3 The Kernel Execution Domain: Work-Items, Work-Groups, and NDRanges
5.3.1 Synchronization
5.3.2 Work-Group Barriers
5.3.3 Built-In Work-Group Functions
5.3.4 Predicate Evaluation Functions
5.3.5 Broadcast Functions
5.3.6 Parallel Primitive Functions
5.4 Native and Built-In Kernels
5.4.1 Native kernels
5.4.2 Built-In kernels
5.5 Device-Side Queuing
5.5.1 Creating a Device-Side Queue
5.5.2 Enqueuing Device-Side Kernels
5.6 Summary
Reference

Chapter 6 OpenCL Host-Side Memory Model
6.1 Memory Objects
6.1.1 Buffers
6.1.2 Images
6.1.3 Pipes
6.2 Memory Management
6.2.1 Managing Default Memory Objects
6.2.2 Managing Memory Objects with Allocation Options
6.3 Shared Virtual Memory
6.4 Summary

Chapter 7 OpenCL Device-Side Memory Model
7.1 Synchronization and Communication
7.1.1 Barriers
7.1.2 Atomics
7.2 Global Memory
7.2.1 Buffers
7.2.2 Images
7.2.3 Pipes
7.3 Constant Memory
7.4 Local Memory
7.5 Private Memory
7.6 Generic Address Space
7.7 Memory Ordering
7.7.1 Atomics Revisited
7.7.2 Fences
7.8 Summary

Chapter 8 Dissecting OpenCL on a Heterogeneous System
8.1 OpenCL on an AMD FX-8350 CPU
8.1.1 Runtime Implementation
8.1.2 Vectorizing Within a Work-Item
8.1.3 Local Memory
8.2 OpenCL on the AMD Radeon R9 290X GPU
8.2.1 Threading and the Memory System
8.2.2 Instruction Set Architecture and Execution Units
8.2.3 Resource Allocation
8.3 Memory Performance Considerations in OpenCL
8.3.1 Global Memory
8.3.2 Local Memory as a Software-Managed Cache
8.4 Summary
References

Chapter 9 Case study: Image clustering
9.1 Introduction
9.2 The Feature Histogram on the CPU
9.2.1 Sequential Implementation
9.2.2 OpenMP parallelization
9.3 OpenCL Implementation
9.3.1 Naive GPU Implementation: GPU1
9.3.2 Coalesced Memory Accesses: GPU2
9.3.3 Vectorizing Computation: GPU3
9.3.4 Move SURF Features to Local Memory: GPU4
9.3.5 Move Cluster Centroids to Constant Memory: GPU5
9.4 Performance Analysis
9.4.1 GPU Performance
9.5 Conclusion
References

Chapter 10 OpenCL Profiling and Debugging
10.1 Introduction
10.2 Profiling OpenCL Code Using Events
10.3 AMD CodeXL
10.4 Profiling Using CodeXL
10.4.1 Collecting OpenCL Application Traces
10.4.2 Host API Trace View
10.4.3 Summary Pages View
10.4.4 Collecting GPU Kernel Performance Counters
10.4.5 CPU Performance Profiling Using CodeXL
10.5 Analyzing Kernels Using CodeXL
10.5.1 KernelAnalyzer Statistics and ISA Views
10.5.2 KernelAnalyzer Analysis View
10.6 Debugging OpenCL Kernels Using CodeXL
10.6.1 API-Level Debugging
10.6.2 Kernel Debugging
10.7 Debugging Using printf
10.8 Summary

Chapter 11 Mapping High-Level Programming Languages to OpenCL 2.0
11.1 Introduction
11.2 A Brief Introduction to C++ AMP
11.2.1 C++ AMP array_view
11.2.2 C++ AMP parallel_for_each, or Kernel Invocation
11.3 OpenCL 2.0 as a Compiler Target
11.4 Mapping Key C++ AMP Constructs to OpenCL
11.5 C++ AMP Compilation Flow
11.6 Compiled C++ AMP Code
11.7 How Shared Virtual Memory in OpenCL 2.0 Fits in
11.8 Compiler Support for Tiling in C++ AMP
11.8.1 Dividing the Compute Domain
11.8.2 Specifying the Address Space and Barriers
11.9 Address Space Deduction
11.10 Data Movement Optimization
11.10.1 discard_data()
11.10.2 array_view<const T, N>
11.11 Binomial Options: A Full Example
11.12 Preliminary Results
11.13 Conclusion
Reference

Chapter 12 WebCL: Enabling OpenCL Acceleration of Web Applications
12.1 Introduction
12.2 Programming with WebCL
12.3 Synchronization
12.4 Interoperability with WebGL
12.5 Example Application
12.6 Security Enhancement
12.7 WebCL on the Server
12.8 Status and Future of WebCL
References
Works Cited

Chapter 13 Foreign lands
13.1 Introduction
13.2 Beyond C and C++
13.3 Haskell OpenCL
13.3.1 Module Structure
13.3.2 Environments
13.3.3 Reference Counting
13.3.4 Platform and Devices
13.3.5 The Execution Environment
13.4 Summary
References

Index

David Kaeli received a BS and PhD in Electrical Engineering from Rutgers University, and an MS in Computer Engineering from Syracuse University. He is the Associate Dean of Undergraduate Programs in the College of Engineering and a Full Professor on the ECE faculty at Northeastern University, Boston, MA, where he directs the Northeastern University Computer Architecture Research Laboratory (NUCAR). Prior to joining Northeastern in 1993, Kaeli spent 12 years at IBM, the last 7 at the T.J. Watson Research Center, Yorktown Heights, NY. Dr. Kaeli has co-authored more than 200 critically reviewed publications. His research spans a range of areas, from microarchitecture to back-end compilers and software engineering. He leads a number of research projects in the area of GPU computing. He presently serves as the Chair of the IEEE Technical Committee on Computer Architecture. Dr. Kaeli is an IEEE Fellow and a member of the ACM.

Perhaad Mistry works in AMD's developer tools group at the Boston Design Center, focusing on developing debugging and performance profiling tools for heterogeneous architectures. He is presently focused on debugger architectures for upcoming shared-memory and discrete Graphics Processing Unit (GPU) platforms. Perhaad has been working on GPU architectures and parallel programming since CUDA 0.8 in 2007. He has enjoyed implementing medical imaging algorithms for GPGPU platforms and architecture-aware data structures for surgical simulators. Perhaad's present work focuses on the design of debuggers and architectural support for performance analysis for the next generation of applications that will target GPU platforms. Perhaad graduated after 7 years with a PhD in Electrical and Computer Engineering from Northeastern University, where he was advised by Dr. David Kaeli, who leads the Northeastern University Computer Architecture Research Laboratory (NUCAR). Even after graduating, Perhaad is still a member of NUCAR and advises on research projects on performance analysis of parallel architectures. He received a BS in Electronics Engineering from the University of Mumbai and an MS in Computer Engineering from Northeastern University in Boston. He is presently based in Boston.

Dana Schaa received a BS in Computer Engineering from Cal Poly, San Luis Obispo, and an MS and PhD in Electrical and Computer Engineering from Northeastern University. He works on GPU architecture modeling at AMD and has interests and expertise in memory systems, microarchitecture, performance analysis, and general-purpose computing on GPUs. His background includes the development of OpenCL-based medical imaging applications ranging from real-time visualization of 3D ultrasound to CT image reconstruction in heterogeneous environments. Dana married his wonderful wife Jenny in 2010, and they live together in San Jose with their charming cats.

Permalink: https://www.kriso.ee/db/97801280164972e.html