E-book: Intel Xeon Phi Processor High Performance Programming: Knights Landing Edition

Avinash Sodani (PhD, Senior Principal Engineer and Chief Architect of Knights Landing Processor, Intel), James Reinders (Director and Programming Model Architect, Intel Corporation), Jim Jeffers (Principal Engineer and Visualization Lead, Intel Corporation)
  • Format: EPUB+DRM
  • Publication date: 31-May-2016
  • Publisher: Morgan Kaufmann Publishers Inc
  • Language: English
  • ISBN-13: 9780128091951
  • Price: 50,49 €*
  • * the price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You also need to create an Adobe ID (more information here). The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, install Adobe Digital Editions (this is a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

Intel Xeon Phi Processor High Performance Programming, Knights Landing Edition, Second Edition, is a practical guide to code development for Intel's Xeon Phi coprocessor. To ensure that your applications run at maximum efficiency, the authors emphasize key techniques that are essential to programming any modern parallel computing system, whether based on Intel Xeon processors, Intel Xeon Phi coprocessors, or other high-performance microprocessors. Applying these techniques will increase your program performance on any system and better prepare you for the Xeon Phi Knights Landing coprocessor. The book starts with a brief, level-setting overview of Intel's "Knights Landing", including the Xeon Phi and Xeon architectures, and then quickly uses simple but informative code examples to explain the unique aspects of the new Knights Landing chipset. It then dives deeper into the hardware and software architecture behind the high-performing examples, explaining the tools, development environment, and coding best practices needed to successfully leverage wide vectors, many cores, many threads, and the high-bandwidth cache/memory architecture.

  • Discusses how to leverage parallel programming best practices on Intel Xeon Phi Knights Landing
  • Explains portable, high-performance computing in a familiar and proven threaded, scalar-vector programming model
  • Features input from Intel insiders with key insights and under-the-hood tips
  • Offers new content and new examples demonstrating the KNL architecture
  • Includes downloadable source code and supplemental material from the book's companion web page
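To give a flavor of the threaded, scalar-vector programming style the book targets, here is a minimal, hypothetical sketch (not taken from the book) of a triad-style loop that is threaded and vectorized with OpenMP and optionally places its arrays in Knights Landing's MCDRAM through the memkind library's hbwmalloc interface. The array names, problem size, and fallback logic are illustrative assumptions.

/*
 * Hypothetical sketch: thread + vectorize a simple kernel and, if
 * high-bandwidth memory (MCDRAM) is available, allocate the working
 * set there via memkind's hbwmalloc API; otherwise fall back to malloc.
 */
#include <stdio.h>
#include <stdlib.h>
#include <hbwmalloc.h>   /* memkind high-bandwidth-memory allocator */

#define N (1 << 24)      /* illustrative problem size */

int main(void)
{
    int use_hbw = (hbw_check_available() == 0);   /* 0 means MCDRAM is usable */
    double *a = use_hbw ? hbw_malloc(N * sizeof(double)) : malloc(N * sizeof(double));
    double *b = use_hbw ? hbw_malloc(N * sizeof(double)) : malloc(N * sizeof(double));
    double *c = use_hbw ? hbw_malloc(N * sizeof(double)) : malloc(N * sizeof(double));
    if (!a || !b || !c) return 1;

    /* Spread iterations over the many cores and let the compiler emit
       vector (e.g., AVX-512) code for the inner SIMD lanes. */
    #pragma omp parallel for simd
    for (long i = 0; i < N; i++) {
        b[i] = (double)i;
        c[i] = 2.0 * (double)i;
        a[i] = b[i] + 3.0 * c[i];
    }

    printf("a[42] = %f (MCDRAM used: %s)\n", a[42], use_hbw ? "yes" : "no");

    if (use_hbw) { hbw_free(a); hbw_free(b); hbw_free(c); }
    else         { free(a); free(b); free(c); }
    return 0;
}

Compiled with something like icc -qopenmp -xMIC-AVX512 example.c -lmemkind (flags indicative only). The book's Chapter 3 covers MCDRAM memory modes and allocation approaches in depth, while Chapters 8 and 9 cover the threading and vectorization techniques such a loop relies on.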

Reviews

"I believe you will find this book is an invaluable reference to help develop your own Unfair Advantage." James A. Ang, Ph.D., Manager, Exascale Computing Program, Sandia National Laboratories, New Mexico, USA

Other information

Learn how to get the most out of Knights Landing, the Second Generation Intel Xeon Phi product family
Acknowledgments xiii
Foreword xvii
Preface xxiii
Section I Knights Landing
Chapter 1 Introduction 3(12)
  Introduction to Many-Core Programming 4(1)
  Trend: More Parallelism 4(2)
  Why Intel® Xeon Phi™ Processors Are Needed 6(2)
  Processors Versus Coprocessor 8(1)
  Measuring Readiness for Highly Parallel Execution 9(1)
  What About GPUs? 10(1)
  Enjoy the Lack of Porting Needed but Still Tune! 10(1)
  Transformation for Performance 11(1)
  Hyper-Threading Versus Multithreading 11(1)
  Programming Models 12(1)
  Why We Could Skip To Section II Now 12(1)
  For More Information 13(2)
Chapter 2 Knights Landing Overview 15(10)
  Overview 15(1)
  Instruction Set 16(1)
  Architecture Overview 17(4)
  Motivation: Our Vision and Purpose 21(2)
  Summary 23(1)
  For More Information 24(1)
Chapter 3 Programming MCDRAM and Cluster Modes 25(38)
  Programming for Cluster Modes 26(1)
  Programming for Memory Modes 27(18)
  Query Memory Mode and MCDRAM Available 45(1)
  SNC Performance Implications of Allocation and Threading 45(2)
  How to Not Hard Code the NUMA Node Numbers 47(1)
  Approaches to Determining What to Put in MCDRAM 48(8)
  Why Rebooting Is Required to Change Modes 56(1)
  BIOS 56(4)
  Summary 60(1)
  For More Information 60(3)
Chapter 4 Knights Landing Architecture 63(22)
  Tile Architecture 63(8)
  Cluster Modes 71(5)
  Memory Interleaving 76(2)
  Memory Modes 78(4)
  Interactions of Cluster and Memory Modes 82(2)
  Summary 84(1)
  For More Information 84(1)
Chapter 5 Intel Omni-Path Fabric 85(22)
  Overview 85(3)
  Performance and Scalability 88(2)
  Transport Layer APIs 90(2)
  Quality of Service 92(3)
  Virtual Fabrics 95(6)
  Unicast Address Resolution 101(2)
  Multicast Address Resolution 103(1)
  Summary 104(1)
  For More Information 105(2)
Chapter 6 µarch Optimization Advice 107(42)
  Best Performance From 1, 2, or 4 Threads Per Core, Rarely 3 107(2)
  Memory Subsystem 109(1)
  µarch Nuances (Tile) 110(9)
  Direct Mapped MCDRAM Cache 119(1)
  Advice: Use AVX-512 120(24)
  Summary 144(1)
  For More Information 145(4)
Section II Parallel Programming
Chapter 7 Programming Overview for Knights Landing 149(6)
  To Refactor, or Not to Refactor, That Is the Question 150(1)
  Evolutionary Optimization of Applications 151(1)
  Revolutionary Optimization of Applications 152(1)
  Know When to Hold'em and When to Fold'em 153(1)
  For More Information 154(1)
Chapter 8 Tasks and Threads 155(18)
  OpenMP 157(5)
  Fortran 2008 162(3)
  Intel TBB 165(5)
  hStreams 170(1)
  Summary 171(1)
  For More Information 172(1)
Chapter 9 Vectorization 173(40)
  Why Vectorize? 174(1)
  How to Vectorize 174(1)
  Three Approaches to Achieving Vectorization 174(2)
  Six-Step Vectorization Methodology 176(2)
  Streaming Through Caches: Data Layout, Alignment, Prefetching, and so on 178(9)
  Compiler Tips 187(3)
  Compiler Options 190(2)
  Compiler Directives 192(14)
  Use Array Sections to Encourage Vectorization 206(3)
  Look at What the Compiler Created: Assembly Code Inspection 209(2)
  Numerical Result Variations with Vectorization 211(1)
  Summary 211(1)
  For More Information 211(2)
Chapter 10 Vectorization Advisor 213(38)
  Getting Started with Intel Advisor for Knights Landing 214(2)
  Enabling and Improving AVX-512 Code with the Survey Report 216(16)
  Memory Access Pattern Report 232(1)
  AVX-512 Gather/Scatter Profiler 233(3)
  Mask Utilization and FLOPS Profiler 236(2)
  Advisor Roofline Report 238(2)
  Explore AVX-512 Code Characteristics Without AVX-512 Hardware 240(2)
  Example - Analysis of a Computational Chemistry Code 242(8)
  Summary 250(1)
  For More Information 250(1)
Chapter 11 Vectorization with SDLT 251(18)
  What Is SDLT? 251(1)
  Getting Started 252(2)
  SDLT Basics 254(2)
  Example Normalizing 3d Points with SIMD 256(2)
  What Is Wrong with AOS Memory Layout and SIMD? 258(1)
  SIMD Prefers Unit-Stride Memory Accesses 259(1)
  Alpha-Blended Overlay Reference 260(3)
  Alpha-Blended Overlay With SDLT 263(3)
  Additional Features 266(1)
  Summary 266(1)
  For More Information 267(2)
Chapter 12 Vectorization with AVX-512 Intrinsics 269(28)
  What Are Intrinsics? 269(5)
  AVX-512 Overview 274(3)
  Migrating From Knights Corner 277(1)
  AVX-512 Detection 278(3)
  Learning AVX-512 Instructions 281(1)
  Learning AVX-512 Intrinsics 281(2)
  Step-by-Step Example Using AVX-512 Intrinsics 283(11)
  Results Using Our Intrinsics Code 294(1)
  For More Information 295(2)
Chapter 13 Performance Libraries 297(18)
  Intel Performance Library Overview 297(2)
  Intel Math Kernel Library Overview 299(1)
  Intel Data Analytics Library Overview 300(2)
  Together: MKL and DAAL 302(1)
  Intel Integrated Performance Primitives Library Overview 303(2)
  Intel Performance Libraries and Intel Compilers 305(1)
  Native (Direct) Library Usage 306(2)
  Offloading to Knights Landing While Using a Library 308(4)
  Precision Choices and Variations 312(1)
  Performance Tip for Faster Dynamic Libraries 313(1)
  For More Information 314(1)
Chapter 14 Profiling and Timing 315(24)
  Introduction to Knights Landing Tuning 315(1)
  Event-Monitoring Registers 316(1)
  Efficiency Metrics 317(6)
  Potential Performance Issues 323(10)
  Intel VTune Amplifier XE Product 333(1)
  Performance Application Programming Interface 334(1)
  MPI Analysis: ITAC 334(1)
  HPCToolkit 335(1)
  Tuning and Analysis Utilities 335(1)
  Timing 335(2)
  Summary 337(1)
  For More Information 337(2)
Chapter 15 MPI 339(30)
  Internode Parallelism 339(1)
  MPI on Knights Landing 339(1)
  MPI Overview 340(1)
  How to Run MPI Applications 341(6)
  Analyzing MPI Application Runs 347(5)
  Tuning of MPI Applications 352(3)
  Heterogeneous Clusters 355(2)
  Recent Trends in MPI Coding 357(5)
  Putting it all Together 362(3)
  Summary 365(1)
  For More Information 365(4)
Chapter 16 PGAS Programming Models 369(14)
  To Share or not to Share 369(3)
  Why Use PGAS on Knights Landing? 372(1)
  Programming with PGAS 373(5)
  Performance Evaluation 378(3)
  Beyond PGAS 381(1)
  Summary 381(1)
  For More Information 382(1)
Chapter 17 Software-Defined Visualization 383(20)
  Motivation for Software-Defined Visualization 384(3)
  Software-Defined Visualization Architecture 387(1)
  OpenSWR: OpenGL Raster-Graphics Software Rendering 388(2)
  Embree: High-Performance Ray Tracing Kernel Library 390(2)
  OSPRay: Scalable Ray Tracing Framework 392(7)
  Summary 399(1)
  Image Attributions 400(1)
  For More Information 400(3)
Chapter 18 Offload to Knights Landing 403(10)
  Offload Programming Model-Using with Knights Landing 403(1)
  Processors Versus Coprocessor 404(1)
  Offload Model Considerations 405(1)
  OpenMP Target Directives 406(2)
  Concurrent Host and Target Execution 408(2)
  Offload Over Fabric 410(1)
  Summary 411(1)
  For More Information 411(2)
Chapter 19 Power Analysis 413(30)
  Power Demand Gates Exascale 413(2)
  Power 101 415(1)
  Hardware-Based Power Analysis Techniques 416(3)
  Software-Based Knights Landing Power Analyzer 419(10)
  ManyCore Platform Software Package Power Tools 429(1)
  Running Average Power Limit 430(4)
  Performance Profiling on Knights Landing 434(2)
  Intel Remote Management Module 436(2)
  Summary 438(1)
  For More Information 439(4)
Section III Pearls
Chapter 20 Optimizing Classical Molecular Dynamics in LAMMPS 443(28)
  Molecular Dynamics 443(3)
  LAMMPS 446(1)
  Knights Landing Processors 447(2)
  LAMMPS Optimizations 449(1)
  Data Alignment 449(1)
  Data Types and Layout 450(2)
  Vectorization 452(7)
  Neighbor List 459(3)
  Long-Range Electrostatics 462(1)
  MPI and OpenMP Parallelization 462(3)
  Performance Results 465(1)
  System, Build, and Run Configurations 465(1)
  Workloads 466(1)
  Organic Photovoltaic Molecules 467(1)
  Hydrocarbon Mixtures 467(1)
  Rhodopsin Protein in Solvated Lipid Bilayer 468(1)
  Coarse Grain Liquid Crystal Simulation 468(1)
  Coarse-Grain Water Simulation 468(1)
  Summary 469(1)
  Acknowledgment 470(1)
  For More Information 470(1)
Chapter 21 High Performance Seismic Simulations 471(28)
  High-Order Seismic Simulations 472(1)
  Numerical Background 472(4)
  Application Characteristics 476(8)
  Intel Architecture as Compute Engine 484(1)
  Highly-Efficient Small Matrix Kernels 484(1)
  Sparse Matrix Kernel Generation and Sparse/Dense Kernel Selection 485(1)
  Dense Matrix Kernel Generation: AVX2 486(1)
  Dense Matrix Kernel Generation: AVX-512 487(2)
  Kernel Performance Benchmarking 489(1)
  Incorporating Knights Landing's Different Memory Subsystems 490(3)
  Performance Evaluation 493(1)
  Mount Merapi 493(2)
  1992 Landers 495(2)
  Summary and Take-Aways 497(1)
  For More Information 498(1)
Chapter 22 Weather Research and Forecasting (WRF) 499(12)
  WRF Overview 499(1)
  WRF Execution Profile: Relatively Flat 500(1)
  History of WRF on Intel Many-Core (Intel Xeon Phi Product Line) 500(1)
  Our Early Experiences with WRF on Knights Landing 501(2)
  Compiling WRF for Intel Xeon and Intel Xeon Phi Systems 503(1)
  WRF CONUS12km Benchmark Performance 504(1)
  MCDRAM Bandwidth 504(3)
  Vectorization: Boost of AVX-512 Over AVX2 507(1)
  Core Scaling 508(1)
  Summary 509(1)
  For More Information 509(2)
Chapter 23 N-Body Simulation 511(16)
  Parallel Programming for Noncomputer Scientists 511(1)
  Step-by-Step Improvements 512(1)
  N-Body Simulation 513(2)
  Optimization 515(1)
  Initial Implementation (Optimization Step 0) 515(1)
  Thread Parallelism (Optimization Step 1) 516(2)
  Scalar Performance Tuning (Optimization Step 2) 518(1)
  Vectorization with SOA (Optimization Step 3) 519(2)
  Memory Traffic (Optimization Step 4) 521(2)
  Impact of MCDRAM on Performance 523(1)
  Summary 524(1)
  For More Information 525(2)
Chapter 24 Machine Learning 527(22)
  Convolutional Neural Networks 528(10)
  OverFeat-FAST Results 538(10)
  For More Information 548(1)
Chapter 25 Trinity Workloads 549(32)
  Out of the Box Performance 549(22)
  Optimizing MiniGhost OpenMP Performance 571(7)
  Summary 578(1)
  For More Information 579(2)
Chapter 26 Quantum Chromodynamics 581(18)
  LQCD 581(1)
  The QPhiX Library and Code Generator 582(1)
  Wilson-Dslash Operator 583(3)
  Configuring the QPhiX Code Generator 586(3)
  The Experimental Setup 589(1)
  Results 590(7)
  Conclusion 597(1)
  For More Information 597(2)
Contributors 599(14)
Glossary 613(10)
Index 623
Jim Jeffers was the primary strategic planner and one of the first full-time employees on the program that became Intel® MIC. He served as lead SW Engineering Manager on the program and formed and launched the SW development team. As the program evolved, he became the workloads (applications) and SW performance team manager. He has some of the deepest insight into the market, architecture, and programming usages of the MIC product line. He has been a developer and development manager for embedded and high-performance systems for close to 30 years.

James Reinders is a senior engineer who joined Intel Corporation in 1989 and has contributed to projects including the world's first TeraFLOP supercomputer (ASCI Red), as well as compiler and architecture work for a number of Intel processors and parallel systems. James has been a driver behind the development of Intel as a major provider of software development products and serves as their chief software evangelist. He has published numerous articles, contributed to several books, and is widely interviewed on parallelism. He has managed software development groups, customer service and consulting teams, and business development and marketing teams. James is sought after to keynote on parallel programming and is the author/co-author of three books currently in print, including Structured Parallel Programming, published by Morgan Kaufmann in 2012.

Avinash Sodani is the chief architect of the Knights Landing Xeon Phi processor. He has many years of experience architecting high-end processors and was previously one of the architects of the first Core™ processor, codenamed Nehalem.