E-book: Intel Xeon Phi Processor High Performance Programming: Knights Landing Edition

3.60/5 (13 ratings on Goodreads)

Avinash Sodani (PhD, Senior Principal Engineer and Chief Architect of Knights Landing Processor, Intel), James Reinders (Director and Programming Model Architect, Intel Corporation), James Jeffers (Principal Engineer and Visualization Lead, Intel Corporation)

Format: EPUB+DRM
Publication date: 31-May-2016
Publisher: Morgan Kaufmann Publishers In
Language: English
ISBN-13: 9780128091951

Price: 50.49 €*
* The price is final, i.e. no further discounts apply.
This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

Copying (copy/paste): not allowed

Printing: not allowed

Usage:

Digital rights management (DRM)
The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

Required software
To read on mobile devices (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

This e-book cannot be read on an Amazon Kindle.

Intel Xeon Phi Processor High Performance Programming, Knights Landing Edition, Second Edition, is a practical guide to code development for Intel’s Xeon Phi coprocessor. To ensure that your applications run at maximum efficiency, the authors emphasize key techniques that are essential to programming any modern parallel computing system, whether based on Intel Xeon processors, Intel Xeon Phi coprocessors, or other high-performance microprocessors. Applying these techniques will increase your program's performance on any system and better prepare you for the Xeon Phi Knights Landing coprocessor.

The book starts with a brief level-setting overview of Intel’s “Knights Landing”, including the Xeon Phi and Xeon architectures, and then quickly uses simple but informative code examples to explain the unique aspects of the new Knights Landing chipset. It then dives deeper into the hardware and software architecture behind the high-performing examples, explaining the tools, development environment, and coding best practices needed to successfully leverage wide vectors, many cores, many threads, and a high-bandwidth cache/memory architecture.

Discusses how to leverage parallel programming best practices on Intel Xeon Phi Knights Landing

Explains portable, high-performance computing in a familiar and proven threaded, scalar-vector programming model

Features input from Intel insiders with key insights and under-the-hood tips

Offers new content and new examples demonstrating the KNL architecture

Includes downloadable source code and supplemental material from the book's companion web page

Reviews

"I believe you will find this book is an invaluable reference to help develop your own Unfair Advantage." James A. Ang, Ph.D., Manager, Exascale Computing Program, Sandia National Laboratories, New Mexico, USA

Other information

Learn how to get the most out of Knights Landing, the Second Generation Intel Xeon Phi product family

Acknowledgments

Foreword

Preface

Section I Knights Landing

Chapter 1 Introduction

Introduction to Many-Core Programming

Trend: More Parallelism

Why Intel® Xeon Phi™ Processors Are Needed

Processors Versus Coprocessor

Measuring Readiness for Highly Parallel Execution

What About GPUs?

Enjoy the Lack of Porting Needed but Still Tune!

Transformation for Performance

Hyper-Threading Versus Multithreading

Programming Models

Why We Could Skip To Section II Now

For More Information

Chapter 2 Knights Landing Overview

Overview

Instruction Set

Architecture Overview

Motivation: Our Vision and Purpose

Summary

For More Information

Chapter 3 Programming MCDRAM and Cluster Modes

Programming for Cluster Modes

Programming for Memory Modes

Query Memory Mode and MCDRAM Available

SNC Performance Implications of Allocation and Threading

How to Not Hard Code the NUMA Node Numbers

Approaches to Determining What to Put in MCDRAM

Why Rebooting Is Required to Change Modes

BIOS

Summary

For More Information

Chapter 4 Knights Landing Architecture

Tile Architecture

Cluster Modes

Memory Interleaving

Memory Modes

Interactions of Cluster and Memory Modes

Summary

For More Information

Chapter 5 Intel Omni-Path Fabric

Overview

Performance and Scalability

Transport Layer APIs

Quality of Service

Virtual Fabrics

Unicast Address Resolution

Multicast Address Resolution

Summary

For More Information

Chapter 6 µarch Optimization Advice

Best Performance From 1, 2, or 4 Threads Per Core, Rarely 3

Memory Subsystem

µarch Nuances (Tile)

Direct Mapped MCDRAM Cache

Advice: Use AVX-512

Summary

For More Information

Section II Parallel Programming

Chapter 7 Programming Overview for Knights Landing

To Refactor, or Not to Refactor, That Is the Question

Evolutionary Optimization of Applications

Revolutionary Optimization of Applications

Know When to Hold'em and When to Fold'em

For More Information

Chapter 8 Tasks and Threads

OpenMP

Fortran 2008

Intel TBB

hStreams

Summary

For More Information

Chapter 9 Vectorization

Why Vectorize?

How to Vectorize

Three Approaches to Achieving Vectorization

Six-Step Vectorization Methodology

Streaming Through Caches: Data Layout, Alignment, Prefetching, and so on

Compiler Tips

Compiler Options

Compiler Directives

Use Array Sections to Encourage Vectorization

Look at What the Compiler Created: Assembly Code Inspection

Numerical Result Variations with Vectorization

Summary

For More Information

Chapter 10 Vectorization Advisor

Getting Started with Intel Advisor for Knights Landing

Enabling and Improving AVX-512 Code with the Survey Report

Memory Access Pattern Report

AVX-512 Gather/Scatter Profiler

Mask Utilization and FLOPS Profiler

Advisor Roofline Report

Explore AVX-512 Code Characteristics Without AVX-512 Hardware

Example - Analysis of a Computational Chemistry Code

Summary

For More Information

Chapter 11 Vectorization with SDLT

What Is SDLT?

Getting Started

SDLT Basics

Example Normalizing 3D Points with SIMD

What Is Wrong with AOS Memory Layout and SIMD?

SIMD Prefers Unit-Stride Memory Accesses

Alpha-Blended Overlay Reference

Alpha-Blended Overlay With SDLT

Additional Features

Summary

For More Information

Chapter 12 Vectorization with AVX-512 Intrinsics

What Are Intrinsics?

AVX-512 Overview

Migrating From Knights Corner

AVX-512 Detection

Learning AVX-512 Instructions

Learning AVX-512 Intrinsics

Step-by-Step Example Using AVX-512 Intrinsics

Results Using Our Intrinsics Code

For More Information

Chapter 13 Performance Libraries

Intel Performance Library Overview

Intel Math Kernel Library Overview

Intel Data Analytics Library Overview

Together: MKL and DAAL

Intel Integrated Performance Primitives Library Overview

Intel Performance Libraries and Intel Compilers

Native (Direct) Library Usage

Offloading to Knights Landing While Using a Library

Precision Choices and Variations

Performance Tip for Faster Dynamic Libraries

For More Information

Chapter 14 Profiling and Timing

Introduction to Knights Landing Tuning

Event-Monitoring Registers

Efficiency Metrics

Potential Performance Issues

Intel VTune Amplifier XE Product

Performance Application Programming Interface

MPI Analysis: ITAC

HPCToolkit

Tuning and Analysis Utilities

Timing

Summary

For More Information

Chapter 15 MPI

Internode Parallelism

MPI on Knights Landing

MPI Overview

How to Run MPI Applications

Analyzing MPI Application Runs

Tuning of MPI Applications

Heterogeneous Clusters

Recent Trends in MPI Coding

Putting It All Together

Summary

For More Information

Chapter 16 PGAS Programming Models

To Share or Not to Share

Why Use PGAS on Knights Landing?

Programming with PGAS

Performance Evaluation

Beyond PGAS

Summary

For More Information

Chapter 17 Software-Defined Visualization

Motivation for Software-Defined Visualization

Software-Defined Visualization Architecture

OpenSWR: OpenGL Raster-Graphics Software Rendering

Embree: High-Performance Ray Tracing Kernel Library

OSPRay: Scalable Ray Tracing Framework

Summary

Image Attributions

For More Information

Chapter 18 Offload to Knights Landing

Offload Programming Model-Using with Knights Landing

Processors Versus Coprocessor

Offload Model Considerations

OpenMP Target Directives

Concurrent Host and Target Execution

Offload Over Fabric

Summary

For More Information

Chapter 19 Power Analysis

Power Demand Gates Exascale

Power 101

Hardware-Based Power Analysis Techniques

Software-Based Knights Landing Power Analyzer

ManyCore Platform Software Package Power Tools

Running Average Power Limit

Performance Profiling on Knights Landing

Intel Remote Management Module

Summary

For More Information

Section III Pearls

Chapter 20 Optimizing Classical Molecular Dynamics in LAMMPS

Molecular Dynamics

LAMMPS

Knights Landing Processors

LAMMPS Optimizations

Data Alignment

Data Types and Layout

Vectorization

Neighbor List

Long-Range Electrostatics

MPI and OpenMP Parallelization

Performance Results

System, Build, and Run Configurations

Workloads

Organic Photovoltaic Molecules

Hydrocarbon Mixtures

Rhodopsin Protein in Solvated Lipid Bilayer

Coarse Grain Liquid Crystal Simulation

Coarse-Grain Water Simulation

Summary

Acknowledgment

For More Information

Chapter 21 High Performance Seismic Simulations

High-Order Seismic Simulations

Numerical Background

Application Characteristics

Intel Architecture as Compute Engine

Highly-Efficient Small Matrix Kernels

Sparse Matrix Kernel Generation and Sparse/Dense Kernel Selection

Dense Matrix Kernel Generation: AVX2

Dense Matrix Kernel Generation: AVX-512

Kernel Performance Benchmarking

Incorporating Knights Landing's Different Memory Subsystems

Performance Evaluation

Mount Merapi

1992 Landers

Summary and Take-Aways

For More Information

Chapter 22 Weather Research and Forecasting (WRF)

WRF Overview

WRF Execution Profile: Relatively Flat

History of WRF on Intel Many-Core (Intel Xeon Phi Product Line)

Our Early Experiences with WRF on Knights Landing

Compiling WRF for Intel Xeon and Intel Xeon Phi Systems

WRF CONUS12km Benchmark Performance

MCDRAM Bandwidth

Vectorization: Boost of AVX-512 Over AVX2

Core Scaling

Summary

For More Information

Chapter 23 N-Body Simulation

Parallel Programming for Noncomputer Scientists

Step-by-Step Improvements

N-Body Simulation

Optimization

Initial Implementation (Optimization Step 0)

Thread Parallelism (Optimization Step 1)

Scalar Performance Tuning (Optimization Step 2)

Vectorization with SOA (Optimization Step 3)

Memory Traffic (Optimization Step 4)

Impact of MCDRAM on Performance

Summary

For More Information

Chapter 24 Machine Learning

Convolutional Neural Networks

OverFeat-FAST Results

For More Information

Chapter 25 Trinity Workloads

Out of the Box Performance

Optimizing MiniGhost OpenMP Performance

Summary

For More Information

Chapter 26 Quantum Chromodynamics

LQCD

The QPhiX Library and Code Generator

Wilson-Dslash Operator

Configuring the QPhiX Code Generator

The Experimental Setup

Results

Conclusion

For More Information

Contributors

Glossary

Index

Jim Jeffers was the primary strategic planner and one of the first full-time employees on the program that became Intel® MIC. He served as lead software engineering manager on the program and formed and launched the software development team. As the program evolved, he became the workloads (applications) and software performance team manager. He has some of the deepest insight into the market, architecture, and programming usages of the MIC product line. He has been a developer and development manager for embedded and high-performance systems for close to 30 years.

James Reinders is a senior engineer who joined Intel Corporation in 1989 and has contributed to projects including the world's first TeraFLOP supercomputer (ASCI Red), as well as compilers and architecture work for a number of Intel processors and parallel systems. James has been a driver behind the development of Intel as a major provider of software development products, and serves as its chief software evangelist. He has published numerous articles, contributed to several books, and is widely interviewed on parallelism. He has managed software development groups, customer service and consulting teams, and business development and marketing teams. James is sought after to keynote on parallel programming, and is the author or co-author of three books currently in print, including Structured Parallel Programming, published by Morgan Kaufmann in 2012.

Avinash Sodani is the chief architect of the Knights Landing Xeon Phi Processor. He has many years of experience architecting high-end processors and was previously one of the architects of the first Core™ processor, codenamed Nehalem.

Permalink: https://www.kriso.ee/db/97801280919516e.html

Keywords:

High performance processors
