About the Author | xi | |
About the Technical Reviewer | xiii | |
Acknowledgments | xv | |
Introduction | xvii | |
Chapter 1 SIMD Fundamentals | 1 | (16) |
| 1 | (3) |
Historical Overview of x86 SIMD | 4 | (1) |
| 5 | (2) |
| 7 | (5) |
| 7 | (3) |
SIMD Floating-Point Arithmetic | 10 | (2) |
SIMD Data Manipulation Operations | 12 | (3) |
| 15 | (1) |
| 16 | (1) |
Chapter 2 AVX C++ Programming: Part 1 | 17 | (36) |
| 17 | (13) |
| 17 | (5) |
| 22 | (2) |
| 24 | (6) |
Integer Bitwise Logical and Shift Operations | 30 | (5) |
Bitwise Logical Operations | 30 | (3) |
| 33 | (2) |
C++ SIMD Intrinsic Function Naming Conventions | 35 | (2) |
Image Processing Algorithms | 37 | (13) |
Pixel Minimum and Maximum | 37 | (8) |
| 45 | (5) |
| 50 | (3) |
Chapter 3 AVX C++ Programming: Part 2 | 53 | (44) |
Floating-Point Operations | 53 | (15) |
Floating-Point Arithmetic | 53 | (6) |
| 59 | (5) |
Floating-Point Conversions | 64 | (4) |
| 68 | (20) |
Mean and Standard Deviation | 69 | (9) |
| 78 | (10) |
| 88 | (7) |
| 89 | (6) |
| 95 | (2) |
Chapter 4 AVX2 C++ Programming: Part 1 | 97 | (48) |
| 97 | (16) |
| 97 | (5) |
| 102 | (5) |
| 107 | (6) |
| 113 | (29) |
| 114 | (5) |
| 119 | (9) |
| 128 | (9) |
| 137 | (5) |
| 142 | (3) |
Chapter 5 AVX2 C++ Programming: Part 2 | 145 | (44) |
| 145 | (6) |
| 146 | (5) |
| 151 | (37) |
| 152 | (9) |
Matrix (4 × 4) Multiplication | 161 | (8) |
Matrix (4 × 4) Vector Multiplication | 169 | (12) |
| 181 | (7) |
| 188 | (1) |
Chapter 6 AVX2 C++ Programming: Part 3 | 189 | (34) |
| 189 | (5) |
| 189 | (3) |
| 192 | (2) |
| 194 | (12) |
| 206 | (16) |
| 206 | (9) |
| 215 | (7) |
| 222 | (1) |
Chapter 7 AVX-512 C++ Programming: Part 1 | 223 | (36) |
| 223 | (1) |
| 224 | (13) |
| 224 | (6) |
Merge Masking and Zero Masking | 230 | (7) |
| 237 | (18) |
| 237 | (3) |
| 240 | (7) |
| 247 | (8) |
| 255 | (4) |
Chapter 8 AVX-512 C++ Programming: Part 2 | 259 | (44) |
Floating-Point Arithmetic | 259 | (10) |
| 259 | (6) |
| 265 | (4) |
| 269 | (3) |
| 272 | (17) |
| 272 | (7) |
| 279 | (4) |
Matrix (4 × 4) Vector Multiplication | 283 | (6) |
| 289 | (11) |
| 289 | (5) |
| 294 | (6) |
| 300 | (3) |
Chapter 9 Supplemental C++ SIMD Programming | 303 | (30) |
| 303 | (12) |
Short Vector Math Library | 315 | (16) |
Rectangular to Polar Coordinates | 316 | (9) |
| 325 | (6) |
| 331 | (2) |
Chapter 10 X86-64 Processor Architecture | 333 | (16) |
| 333 | (3) |
| 334 | (1) |
| 335 | (1) |
| 335 | (1) |
| 335 | (1) |
| 336 | (7) |
General-Purpose Registers | 337 | (1) |
| 338 | (1) |
| 338 | (2) |
Floating-Point and SIMD Registers | 340 | (2) |
| 342 | (1) |
| 343 | (1) |
| 344 | (2) |
| 346 | (1) |
| 347 | (2) |
Chapter 11 Core Assembly Language Programming: Part 1 | 349 | (40) |
| 349 | (13) |
| 350 | (3) |
| 353 | (4) |
| 357 | (5) |
Calling Convention: Part 1 | 362 | (6) |
| 368 | (5) |
| 373 | (3) |
| 376 | (5) |
| 381 | (5) |
| 386 | (3) |
Chapter 12 Core Assembly Language Programming: Part 2 | 389 | (44) |
Scalar Floating-Point Arithmetic | 389 | (20) |
Single-Precision Arithmetic | 389 | (4) |
Double-Precision Arithmetic | 393 | (4) |
| 397 | (3) |
| 400 | (9) |
Scalar Floating-Point Arrays | 409 | (2) |
Calling Convention: Part 2 | 411 | (20) |
| 412 | (4) |
Using Nonvolatile General-Purpose Registers | 416 | (4) |
Using Nonvolatile SIMD Registers | 420 | (5) |
Macros for Function Prologues and Epilogues | 425 | (6) |
| 431 | (2) |
Chapter 13 AVX Assembly Language Programming: Part 1 | 433 | (22) |
| 433 | (11) |
| 433 | (4) |
| 437 | (4) |
Bitwise Logical Operations | 441 | (2) |
Arithmetic and Logical Shifts | 443 | (1) |
Image Processing Algorithms | 444 | (9) |
Pixel Minimum and Maximum | 444 | (4) |
| 448 | (5) |
| 453 | (2) |
Chapter 14 AVX Assembly Language Programming: Part 2 | 455 | (28) |
Floating-Point Operations | 455 | (10) |
Floating-Point Arithmetic | 455 | (6) |
| 461 | (4) |
| 465 | (12) |
Mean and Standard Deviation | 466 | (4) |
| 470 | (7) |
| 477 | (4) |
| 481 | (2) |
Chapter 15 AVX2 Assembly Language Programming: Part 1 | 483 | (22) |
| 483 | (7) |
| 483 | (3) |
| 486 | (4) |
| 490 | (14) |
| 491 | (4) |
| 495 | (6) |
| 501 | (3) |
| 504 | (1) |
Chapter 16 AVX2 Assembly Language Programming: Part 2 | 505 | (28) |
| 505 | (5) |
| 510 | (15) |
| 510 | (4) |
Matrix (4 × 4) Multiplication | 514 | (4) |
Matrix (4 × 4) Vector Multiplication | 518 | (7) |
| 525 | (5) |
| 530 | (3) |
Chapter 17 AVX-512 Assembly Language Programming: Part 1 | 533 | (20) |
| 533 | (9) |
| 533 | (4) |
| 537 | (5) |
| 542 | (10) |
| 542 | (4) |
| 546 | (6) |
| 552 | (1) |
Chapter 18 AVX-512 Assembly Language Programming: Part 2 | 553 | (34) |
Floating-Point Arithmetic | 553 | (8) |
| 553 | (5) |
| 558 | (3) |
| 561 | (25) |
| 561 | (7) |
| 568 | (5) |
Matrix (4 × 4) Vector Multiplication | 573 | (5) |
| 578 | (8) |
| 586 | (1) |
Chapter 19 SIMD Usage and Optimization Guidelines | 587 | (16) |
| 587 | (2) |
C++ SIMD Intrinsic Functions or x86 Assembly Language | 588 | (1) |
SIMD Software Development Guidelines | 589 | (2) |
Identify Functions for SIMD Techniques | 589 | (1) |
Select Default and Explicit SIMD Instruction Sets | 589 | (1) |
Establish Benchmark Timing Objectives | 590 | (1) |
Code Explicit SIMD Functions | 590 | (1) |
Benchmark Code to Measure Performance | 590 | (1) |
Optimize Explicit SIMD Code | 591 | (1) |
Repeat Benchmarking and Optimization Steps | 591 | (1) |
Optimization Guidelines and Techniques | 591 | (3) |
| 591 | (1) |
Assembly Language Optimization Techniques | 592 | (2) |
SIMD Code Complexity vs. Performance | 594 | (8) |
| 602 | (1) |
Appendix A Source Code and Development Tools | 603 | (18) |
Source Code Download and Setup | 603 | (1) |
| 604 | (17) |
Visual Studio and Windows | 604 | (12) |
| 616 | (5) |
Appendix B References and Resources | 621 | (4) |
C++ SIMD Intrinsic Function Documentation | 621 | (1) |
X86 Programming References | 621 | (1) |
X86 Processor Information | 622 | (1) |
Software Development Tools | 622 | (1) |
| 622 | (1) |
| 623 | (1) |
Utilities, Tools, and Libraries | 624 | (1) |
Index | 625 | |