Klienditugi: 7440010 (E-R 10-18)

Abi | Registreeri | Logi sisse

E-raamat: Pro TBB: C++ Parallel Programming with Threading Building Blocks

4.33/5 (7 hinnangut Goodreads-ist)

Rafael Asenjo, James Reinders, Michael Voss

Formaat: PDF+DRM
Ilmumisaeg: 09-Jul-2019
Kirjastus: APress
Keel: eng
ISBN-13: 9781484243985

Teised raamatud teemal:

Formaat - PDF+DRM
Hind: 4,08 €*
* hind on lõplik, st. muud allahindlused enam ei rakendu
Lisa ostukorvi
Lisa soovinimekirja
See e-raamat on mõeldud ainult isiklikuks kasutamiseks. E-raamatuid ei saa tagastada.

Formaat: PDF+DRM
Ilmumisaeg: 09-Jul-2019
Kirjastus: APress
Keel: eng
ISBN-13: 9781484243985

Teised raamatud teemal:

DRM piirangud

Kopeerimine (copy/paste):

ei ole lubatud
Printimine:

ei ole lubatud
Kasutamine:

Digitaalõiguste kaitse (DRM)
Kirjastus on väljastanud selle e-raamatu krüpteeritud kujul, mis tähendab, et selle lugemiseks peate installeerima spetsiaalse tarkvara. Samuti peate looma endale Adobe ID Rohkem infot siin. E-raamatut saab lugeda 1 kasutaja ning alla laadida kuni 6'de seadmesse (kõik autoriseeritud sama Adobe ID-ga).

Vajalik tarkvara
Mobiilsetes seadmetes (telefon või tahvelarvuti) lugemiseks peate installeerima selle tasuta rakenduse: PocketBook Reader (iOS / Android)

PC või Mac seadmes lugemiseks peate installima Adobe Digital Editionsi (Seeon tasuta rakendus spetsiaalselt e-raamatute lugemiseks. Seda ei tohi segamini ajada Adober Reader'iga, mis tõenäoliselt on juba teie arvutisse installeeritud )

Seda e-raamatut ei saa lugeda Amazon Kindle's.

This open access book is a modern guide for all C++ programmers to learn Intel Threading Building Blocks (TBB). Written by TBB and parallel programming experts, this book reflects their collective decades of experience in developing and teaching parallel programming with TBB, offering their insights in an approachable manner. Throughout the book the authors present numerous examples and best practices to help you become an effective TBB programmer and leverage the power of parallel systems.

Pro Intel Threading Building Blocks starts with the basics, explaining parallel algorithms and C++'s built-in standard template library for parallelism. You'll learn the key concepts of managing memory, working with data structures and how to handle typical issues with synchronization. Later chapters apply these ideas to complex systems to explain performance tradeoffs, mapping common parallel patterns, controlling threads and overhead, and extending TBB to program heterogeneous systems or system-on-chips.

What You'll Learn

Use Threading Building Blocks to produce code that is portable, simple, scalable, and more understandable
Review best practices for parallelizing computationally intensive tasks in your applications
Integrate TBB with other threading packages
Create scalable, high performance data-parallel programs
Work with generic programming to write efficient algorithms

Who This Book Is For

C++ programmers learning to run applications on multicore systems, as well as C or C++ programmers without much experience with templates. No previous experience with parallel programming or multicore processors is required.

Arvustused

Pro TBB is an invaluable book . The book provides comprehensive coverage of a full-fledged model of parallelism. Besides the TBB constructs, various mechanisms that address issues of exception handling, task partitioning, concurrent data structures, mutual exclusion, granularity, and task-thread affinity are elaborated and evaluated in great detail. The first part of the book is a light introduction to TBB, and the second part provides an in-depth presentation with examples and a performance analysis of TBB constructs. (B. Belkhouche, Computing Reviews, July 29, 2021)

About the Authors

Acknowledgments

xvii

Preface

xix

Part 1

(248)

Chapter 1 Jumping Right In: "Hello, TBB!"

(30)

Why Threading Building Blocks?

(4)

Performance: Small Overhead, Big Benefits for C++

(1)

Evolving Support for Parallelism in TBB and C++

(1)

Recent C++ Additions for Parallelism

(1)

The Threading Building Blocks (TBB) Library

(4)

Parallel Execution Interfaces

(2)

Interfaces That Are Independent of the Execution Model

(1)

Using the Building Blocks in TBB

(1)

Let's Get Started Already!

(10)

Getting the Threading Building Blocks (TBB) Library

(1)

Getting a Copy of the Examples

(1)

Writing a First "Hello, TBB!" Example

(3)

Building the Simple Examples

(1)

Building on Windows Using Microsoft Visual Studio

(1)

Building on a Linux Platform from a Terminal

(4)

A More Complete Example

(12)

Starting with a Serial Implementation

(4)

Adding a Message-Driven Layer Using a Flow Graph

(2)

Adding a Fork-Join Layer Using a parallel_for

(2)

Adding a SIMD Layer Using a Parallel STL Transform

(4)

Chapter 2 Generic Parallel Algorithms

(46)

Functional /Task Parallelism

(5)

A Slightly More Complicated Example: A Parallel Implementation of Quicksort

(2)

Loops: parallel_for, parallel_reduce, and parallel_scan

(15)

parallel_for: Applying a Body to Each Element in a Range

(4)

parallel_reduce: Calculating a Single Result Across a Range

(6)

parallel_scan: A Reduction with Intermediate Values

(2)

How Does This Work?

(2)

A Slightly More Complicated Example: Line of Sight

(1)

Cook Until Done: parallel_do and parallel_pipeline

(22)

parallel_do: Apply a Body Until There Are No More Items Left

(9)

parallel_pipeline: Streaming Items Through a Series of Filters

(12)

Chapter 3 Flow Graphs

(30)

Why Use Graphs to Express Parallelism?

(2)

The Basics of the TBB Flow Graph Interface

(9)

Step 1: Create the Graph Object

(1)

Step 2: Make the Nodes

(3)

Step 3: Add Edges

(2)

Step 4: Start the Graph

(2)

Step 5: Wait for the Graph to Complete Executing

(1)

A More Complicated Example of a Data Flow Graph

(6)

Implementing the Example as a TBB Flow Graph

(3)

Understanding the Performance of a Data Flow Graph

(1)

The Special Case of Dependency Graphs

(9)

Implementing a Dependency Graph

(6)

Estimating the Scalability of a Dependency Graph

105

(1)

Advanced Topics in TBB Flow Graphs

106

(3)

Chapter 4 TBB and the Parallel Algorithms of the C++ Standard Template Library

109

(28)

Does the C++ STL Library Belong in This Book?

110

(2)

A Parallel STL Execution Policy Analogy

112

(1)

A Simple Example Using std::for_each

113

(4)

What Algorithms Are Provided in a Parallel STL Implementation?

117

(3)

How to Get and Use a Copy of Parallel STL That Uses TBB

117

(1)

Algorithms in Intel's Parallel STL

118

(2)

Capturing More Use Cases with Custom Iterators

120

(4)

Highlighting Some of the Most Useful Algorithms

124

(6)

std::for_each, std::for_each_n

124

(2)

std::transform

126

(1)

std::reduce

127

(1)

std::transform_reduce

128

(2)

A Deeper Dive into the Execution Policies

130

(2)

The sequenced_policy

131

(1)

The parallel_policy

131

(1)

The unsequenced_policy

132

(1)

The parallel_unsequenced_policy

132

(1)

Which Execution Policy Should We Use?

132

(2)

Other Ways to Introduce SIMD Parallelism

134

(3)

Chapter 5 Synchronization: Why and How to Avoid It

137

(42)

A Running Example: Histogram of an Image

138

(3)

An Unsafe Parallel Implementation

141

(4)

A First Safe Parallel Implementation: Coarse-Grained Locking

145

(8)

Mutex Flavors

151

(2)

A Second Safe Parallel Implementation: Fine-Grained Locking

153

(5)

A Third Safe Parallel Implementation: Atomics

158

(5)

A Better Parallel Implementation: Privatization and Reduction

163

(7)

Thread Local Storage, TLS

164

(1)

enumerable_thread_specific, ETS

165

(3)

combinable

168

(2)

The Easiest Parallel Implementation: Reduction Template

170

(2)

Recap of Our Options

172

(7)

Chapter 6 Data Structures for Concurrency

179

(28)

Key Data Structures Basics

180

(2)

Unordered Associative Containers

180

(1)

Map vs. Set

181

(1)

Multiple Values

181

(1)

Hashing

181

(1)

Unordered

182

(1)

Concurrent Containers

182

(25)

Concurrent Unordered Associative Containers

185

(8)

Concurrent Queues: Regular, Bounded, and Priority

193

(9)

Concurrent Vector

202

(5)

Chapter 7 Scalable Memory Allocation

207

(26)

Modern C++ Memory Allocation

208

(1)

Scalable Memory Allocation: What

209

(1)

Scalable Memory Allocation: Why

209

(3)

Avoiding False Sharing with Padding

210

(2)

Scalable Memory Allocation Alternatives: Which

212

(2)

Compilation Considerations

214

(1)

Most Popular Usage (C/C++ Proxy Library): How

214

(6)

Linux: malloc/new Proxy Library Usage

216

(1)

macOS: malloc/new Proxy Library Usage

216

(1)

Windows: malloc/new Proxy Library Usage

217

(1)

Testing our Proxy Library Usage

218

(2)

C Functions: Scalable Memory Allocators for C

220

(1)

C++ Classes: Scalable Memory Allocators for C++

221

(1)

Allocators with std::allocator<T> Signature

222

(1)

scalable_allocator

222

(1)

tbb_allocator

222

(1)

zero_allocator

223

(1)

cached_aligned_allocator

223

(1)

Memory Pool Support: memory_pool_allocator

223

(1)

Array Allocation Support: aligned_space

224

(1)

Replacing new and delete Selectively

224

(4)

Performance Tuning: Some Control Knobs

228

(5)

What Are Huge Pages?

228

(1)

TBB Support for Huge Pages

228

(1)

scalable_allocation_mode(int mode, intptr_t value)

229

(1)

TBBMALLOC_USE_HUGE_PAGES

229

(1)

TBBMALLOC_SET_SOFT_HEAP_LIMIT

230

(1)

int scalable_allocation_command(int cmd, void *param)

230

(1)

TBBMALLOC_CLEAN_ALL_BUFFERS

230

(1)

TBBMALLOC_CLEAN_THREAD_BUFFERS

230

(3)

Chapter 8 Mapping Parallel Patterns to TBB

233

(16)

Parallel Patterns vs. Parallel Algorithms

233

(2)

Patterns Categorize Algorithms, Designs, etc.

235

(1)

Patterns That Work

236

(1)

Data Parallelism Wins

237

(1)

Nesting Pattern

238

(1)

Map Pattern

239

(1)

Workpile Pattern

240

(1)

Reduction Patterns (Reduce and Scan)

241

(2)

Fork-Join Pattern

243

(1)

Divide-and-Conquer Pattern

244

(1)

Branch-and-Bound Pattern

244

(2)

Pipeline Pattern

246

(1)

Event-Based Coordination Pattern (Reactive Streams)

247

(2)

Part 2

249

(356)

Chapter 9 The Pillars of Composability

251

(26)

What Is Composability?

253

(6)

Nested Composition

254

(2)

Concurrent Composition

256

(2)

Serial Composition

258

(1)

The Features That Make TBB a Composable Library

259

(11)

The TBB Thread Pool (the Market) and Task Arenas

260

(3)

The TBB Task Dispatcher: Work Stealing and More

263

(7)

Putting It All Together

270

(4)

Looking Forward

274

(3)

Controlling the Number of Threads

274

(1)

Work Isolation

274

(1)

Task-to-Thread and Thread-to-Core Affinity

275

(1)

Task Priorities

275

(2)

Chapter 10 Using Tasks to Create Your Own Algorithms

277

(36)

A Running Example: The Sequence

278

(2)

The High-Level Approach: parallel_invoke

280

(2)

The Highest Among the Lower: task_group

282

(2)

The Low-Level Task Interface: Part One - Task Blocking

284

(6)

The Low-Level Task Interface: Part Two - Task Continuation

290

(7)

Bypassing the Scheduler

297

(1)

The Low-Level Task Interface: Part Three - Task Recycling

297

(3)

Task Interface Checklist

300

(1)

One More Thing: FIFO (aka Fire-and-Forget) Tasks

301

(1)

Putting These Low-Level Features to Work

302

(11)

Chapter 11 Controlling the Number of Threads Used for Execution

313

(24)

A Brief Recap of the TBB Scheduler Architecture

314

(1)

Interfaces for Controlling the Number of Threads

315

(5)

Controlling Thread Count with task_scheduler_init

315

(1)

Controlling Thread Count with task_arena

316

(2)

Controlling Thread Count with global_control

318

(1)

Summary of Concepts and Classes

318

(2)

The Best Approaches for Setting the Number of Threads

320

(12)

Using a Single task_scheduler_init Object for a Simple Application

320

(3)

Using More Than One task_scheduler_init Object in a Simple Application

323

(2)

Using Multiple Arenas with Different Numbers of Slots to Influence Where TBB Places Its Worker Threads

325

(4)

Using global_control to Control How Many Threads Are Available to Fill Arena Slots

329

(1)

Using global_control to Temporarily Restrict the Number of Available Threads

330

(2)

When NOT to Control the Number of Threads

332

(2)

Figuring Out What's Gone Wrong

334

(3)

Chapter 12 Using Work Isolation for Correctness and Performance

337

(20)

Work Isolation for Correctness

338

(11)

Creating an Isolated Region with this_task_arena::isolate

343

(6)

Using Task Arenas for Isolation: A Double-Edged Sword

349

(8)

Don't Be Tempted to Use task_arenas to Create Work Isolation for Correctness

353

(4)

Chapter 13 Creating Thread-to-Core and Task-to-Thread Affinity

357

(16)

Creating Thread-to-Core Affinity

358

(4)

Creating Task-to-Thread Affinity

362

(8)

When and How Should We Use the TBB Affinity Features?

370

(3)

Chapter 14 Using Task Priorities

373

(14)

Support for Non-Preemptive Priorities in the TBB Task Class

374

(2)

Setting Static and Dynamic Priorities

376

(1)

Two Small Examples

377

(5)

Implementing Priorities Without Using TBB Task Support

382

(5)

Chapter 15 Cancellation and Exception Handling

387

(24)

How to Cancel Collective Work

388

(2)

Advanced Task Cancellation

390

(9)

Explicit Assignment of TGC

392

(3)

Default Assignment of TGC

395

(4)

Exception Handling in TBB

399

(3)

Tailoring Our Own TBB Exceptions

402

(3)

Putting All Together: Composability, Cancellation, and Exception Handling

405

(6)

Chapter 16 Tuning TBB Algorithms: Granularity, Locality, Parallelism, and Determinism

411

(40)

Task Granularity: How Big Is Big Enough?

412

(1)

Choosing Ranges and Partitioners for Loops

413

(20)

An Overview of Partitioners

415

(2)

Choosing a Grainsize (or Not) to Manage Task Granularity

417

(3)

Ranges, Partitioners, and Data Cache Performance

420

(8)

Using a static_partitioner

428

(3)

Restricting the Scheduler for Determinism

431

(2)

Tuning TBB Pipelines: Number of Filters, Modes, and Tokens

433

(6)

Understanding a Balanced Pipeline

434

(2)

Understanding an Imbalanced Pipeline

436

(2)

Pipelines and Data Locality and Thread Affinity

438

(1)

Deep in the Weeds

439

(12)

Making Your Own Range Type

439

(3)

The Pipeline Class and Thread-Bound Filters

442

(9)

Chapter 17 Flow Graphs: Beyond the Basics

451

(62)

Optimizing for Granularity, Locality, and Parallelism

452

(28)

Node Granularity: How Big Is Big Enough?

452

(10)

Memory Usage and Data Locality

462

(15)

Task Arenas and Flow Graph

477

(3)

Key FG Advice: Dos and Don'ts

480

(21)

Do: Use Nested Parallelism

480

(1)

Don't: Use Multifunction Nodes in Place of Nested Parallelism

481

(1)

Do: Use join_node, sequencer_node, or multifunction_node to Reestablish Order in a Flow Graph When Needed

481

(4)

Do: Use the Isolate Function for Nested Parallelism

485

(3)

Do: Use Cancellation and Exception Handling in Flow Graphs

488

(4)

Do: Set a Priority for a Graph Using task_group_context

492

(1)

Don't: Make an Edge Between Nodes in Different Graphs

492

(3)

Do: Use try_put to Communicate Across Graphs

495

(2)

Do: Use composite_node to Encapsulate Groups of Nodes

497

(4)

Introducing Intel Advisor: Flow Graph Analyzer

501

(12)

The FGA Design Workflow

502

(3)

The FGA Analysis Workflow

505

(2)

Diagnosing Performance Issues with FGA

507

(6)

Chapter 18 Beef Up Flow Graphs with Async Nodes

513

(22)

Async World Example

514

(5)

Why and When async_node?

519

(2)

A More Realistic Example

521

(14)

Chapter 19 Flow Graphs on Steroids: OpenCL Nodes

535

(46)

Hello OpenCL_Node Example

536

(8)

Where Are We Running Our Kernel?

544

(7)

Back to the More Realistic Example of
Chapter 18

551

(10)

The Devil Is in the Details

561

(9)

The NDRange Concept

562

(6)

Playing with the Offset

568

(1)

Specifying the OpenCL Kernel

569

(1)

Even More on Device Selection

570

(4)

A Warning Regarding the Order Is in Order!

574

(7)

Chapter 20 TBB on NUMA Architectures

581

(24)

Discovering Your Platform Topology

583

(12)

Understanding the Costs of Accessing Memory

587

(1)

Our Baseline Example

588

(1)

Mastering Data Placement and Processor Affinity

589

(6)

Putting hwloc and TBB to Work Together

595

(6)

More Advanced Alternatives

601

(4)

Appendix A: History and Inspiration

605

(18)

A Decade of "Hatchling to Soaring"

605

(6)

1 TBB's Revolution Inside Intel

605

(1)

2 TBB's First Revolution of Parallelism

606

(1)

3 TBB's Second Revolution of Parallelism

607

(1)

4 TBB's Birds

608

(3)

Inspiration for TBB

611

(12)

Relaxed Sequential Execution Model

612

(1)

Influential Libraries

613

(1)

Influential Languages

614

(1)

Influential Pragmas

615

(1)

Influences of Generic Programming

615

(1)

Considering Caches

616

(1)

Considering Costs of Time Slicing

617

(1)

E-raamat: Pro TBB: C++ Parallel Programming with Threading Building Blocks

DRM piirangud

Kopeerimine (copy/paste):

Printimine:

Kasutamine:

Arvustused

Konto & seaded

Otsing

Otsingu andmebaas

Filtreeri tulemusi

Teemad E-raamatute teemad

Vali ostukorv