List of Figures |
|
xiii | |
List of Tables |
|
xv | |
Listings |
|
xvii | |
Preface |
|
xxv | |
Trademarks |
|
xxvii | |
1 Introduction |
|
1 | (4) |
2 CPU Computing |
|
5 | (88) |
|
|
5 | (9) |
|
2.1.1 The Curse: An Example from Games |
|
|
5 | (5) |
|
2.1.2 The Curse: An Example from Science |
|
|
10 | (1) |
|
2.1.3 The Need to Understand Floating-Point Systems |
|
|
11 | (3) |
|
2.2 Balancing Robustness, Accuracy, and Speed |
|
|
14 | (8) |
|
|
14 | (5) |
|
2.2.1.1 Formal Definitions |
|
|
14 | (2) |
|
2.2.1.2 Algorithms and Implementations |
|
|
16 | (1) |
|
2.2.1.3 Practical Definitions |
|
|
17 | (2) |
|
|
19 | (1) |
|
|
20 | (1) |
|
2.2.4 Computer Science Is a Study of Trade-offs |
|
|
20 | (2) |
|
2.3 IEEE Floating Point Standard |
|
|
22 | (1) |
|
2.4 Binary Scientific Notation |
|
|
23 | (8) |
|
2.4.1 Conversion from Rational to Binary Scientific Numbers |
|
|
24 | (3) |
|
2.4.2 Arithmetic Properties of Binary Scientific Numbers |
|
|
27 | (3) |
|
2.4.2.1 Addition of Binary Scientific Numbers |
|
|
28 | (1) |
|
2.4.2.2 Subtraction of Binary Scientific Numbers |
|
|
28 | (1) |
|
2.4.2.3 Multiplication of Binary Scientific Numbers |
|
|
28 | (1) |
|
2.4.2.4 Division of Binary Scientific Numbers |
|
|
29 | (1) |
|
2.4.3 Algebraic Properties of Binary Scientific Numbers |
|
|
30 | (1) |
|
2.5 Floating-Point Arithmetic |
|
|
31 | (62) |
|
|
32 | (18) |
|
2.5.1.1 8-bit Floating-Point Numbers |
|
|
33 | (3) |
|
2.5.1.2 16-Bit Floating-Point Numbers |
|
|
36 | (3) |
|
2.5.1.3 32-Bit Floating-Point Numbers |
|
|
39 | (2) |
|
2.5.1.4 64-Bit Floating-Point Numbers |
|
|
41 | (1) |
|
2.5.1.5 n-Bit Floating-Point Numbers |
|
|
42 | (3) |
|
2.5.1.6 Classifications of Floating-Point Numbers |
|
|
45 | (5) |
|
2.5.2 Rounding and Conversions |
|
|
50 | (31) |
|
2.5.2.1 Rounding with Ties-to-Even |
|
|
51 | (1) |
|
2.5.2.2 Rounding with Ties-to-Away |
|
|
52 | (1) |
|
2.5.2.3 Rounding toward Zero |
|
|
52 | (1) |
|
2.5.2.4 Rounding toward Positive |
|
|
52 | (1) |
|
2.5.2.5 Rounding toward Negative |
|
|
53 | (1) |
|
2.5.2.6 Rounding from Floating-Point to Integral Floating-Point |
|
|
54 | (6) |
|
2.5.2.7 Conversion from Integer to Floating-Point |
|
|
60 | (4) |
|
2.5.2.8 Conversion from Floating-Point to Rational |
|
|
64 | (3) |
|
2.5.2.9 Conversion from Rational to Floating-Point |
|
|
67 | (3) |
|
2.5.2.10 Conversion to Wider Format |
|
|
70 | (3) |
|
2.5.2.11 Conversion to Narrower Format |
|
|
73 | (8) |
|
2.5.3 Arithmetic Operations |
|
|
81 | (1) |
|
2.5.4 Mathematical Functions |
|
|
82 | (1) |
|
2.5.5 Floating-Point Oddities |
|
|
83 | (10) |
|
2.5.5.1 Where Have My Digits Gone? |
|
|
83 | (5) |
|
2.5.5.2 Have a Nice Stay! |
|
|
88 | (1) |
|
2.5.5.3 The Best I Can Do Is That Bad? |
|
|
89 | (2) |
|
2.5.5.4 You Have Been More Than Helpful |
|
|
91 | (1) |
|
2.5.5.5 Hardware and Optimizing Compiler Issues |
|
|
92 | (1) |
3 SIMD Computing |
|
93 | (30) |
|
3.1 Intel Streaming SIMD Extensions |
|
|
93 | (13) |
|
3.1.1 Shuffling Components |
|
|
94 | (1) |
|
3.1.2 Single-Component versus All-Component Access |
|
|
95 | (1) |
|
3.1.3 Load and Store Instructions |
|
|
95 | (3) |
|
3.1.4 Logical Instructions |
|
|
98 | (1) |
|
3.1.5 Comparison Instructions |
|
|
99 | (1) |
|
3.1.6 Arithmetic Instructions |
|
|
100 | (1) |
|
3.1.7 Matrix Multiplication and Transpose |
|
|
100 | (3) |
|
3.1.8 IEEE Floating-Point Support |
|
|
103 | (1) |
|
3.1.9 Keep the Pipeline Running |
|
|
103 | (1) |
|
3.1.10 Flattening of Branches |
|
|
104 | (2) |
|
|
106 | (1) |
|
3.3 Function Approximations |
|
|
107 | (16) |
|
3.3.1 Minimax Approximations |
|
|
107 | (3) |
|
3.3.2 Inverse Square Root Function Using Root Finding |
|
|
110 | (1) |
|
3.3.3 Square Root Function |
|
|
111 | (3) |
|
3.3.4 Inverse Square Root Using a Minimax Algorithm |
|
|
114 | (2) |
|
|
116 | (1) |
|
|
116 | (1) |
|
|
117 | (1) |
|
3.3.8 Inverse Sine Function |
|
|
117 | (2) |
|
3.3.9 Inverse Cosine Function |
|
|
119 | (1) |
|
3.3.10 Inverse Tangent Function |
|
|
119 | (1) |
|
3.3.11 Exponential Functions |
|
|
120 | (1) |
|
3.3.12 Logarithmic Functions |
|
|
120 | (3) |
4 GPU Computing |
|
123 | (100) |
|
|
123 | (11) |
|
|
123 | (1) |
|
|
123 | (1) |
|
|
124 | (1) |
|
|
125 | (4) |
|
|
129 | (1) |
|
4.1.6 Summary of the Transformations |
|
|
130 | (1) |
|
|
131 | (3) |
|
4.2 High Level Shading Language (HLSL) |
|
|
134 | (34) |
|
4.2.1 Vertex and Pixel Shaders |
|
|
134 | (4) |
|
|
138 | (3) |
|
|
141 | (3) |
|
4.2.4 Compiling HLSL Shaders |
|
|
144 | (16) |
|
4.2.4.1 Compiling the Vertex Coloring Shaders |
|
|
147 | (4) |
|
4.2.4.2 Compiling the Texturing Shaders |
|
|
151 | (1) |
|
4.2.4.3 Compiling the Billboard Shaders |
|
|
152 | (4) |
|
4.2.4.4 Compiling the Gaussian Blurring Shaders |
|
|
156 | (4) |
|
4.2.5 Reflecting HLSL Shaders |
|
|
160 | (8) |
|
4.3 Devices, Contexts, and Swap Chains |
|
|
168 | (5) |
|
4.3.1 Creating a Device and an Immediate Context |
|
|
168 | (2) |
|
4.3.2 Creating Swap Chains |
|
|
170 | (1) |
|
4.3.3 Creating the Back Buffer |
|
|
171 | (2) |
|
|
173 | (28) |
|
4.4.1 Resource Usage and CPU Access |
|
|
174 | (1) |
|
|
175 | (3) |
|
|
178 | (1) |
|
|
179 | (10) |
|
|
181 | (1) |
|
|
181 | (1) |
|
|
182 | (2) |
|
|
184 | (1) |
|
4.4.4.5 Structured Buffers |
|
|
184 | (3) |
|
|
187 | (2) |
|
4.4.4.7 Indirect-Argument Buffers |
|
|
189 | (1) |
|
|
189 | (6) |
|
|
192 | (1) |
|
|
193 | (1) |
|
|
194 | (1) |
|
|
195 | (3) |
|
4.4.6.1 1D Texture Arrays |
|
|
195 | (1) |
|
4.4.6.2 2D Texture Arrays |
|
|
196 | (1) |
|
|
197 | (1) |
|
4.4.6.4 Cubemap Texture Arrays |
|
|
198 | (1) |
|
|
198 | (3) |
|
|
201 | (1) |
|
|
202 | (5) |
|
|
202 | (1) |
|
4.6.2 Vertex, Geometry, and Pixel Shader Execution |
|
|
202 | (3) |
|
4.6.3 Compute Shader Execution |
|
|
205 | (2) |
|
4.7 Copying Data between CPU and GPU |
|
|
207 | (7) |
|
4.7.1 Mapped Writes for Dynamic Update |
|
|
207 | (3) |
|
|
210 | (1) |
|
4.7.3 Copy from CPU to GPU |
|
|
211 | (1) |
|
4.7.4 Copy from GPU to CPU |
|
|
212 | (1) |
|
4.7.5 Copy from GPU to GPU |
|
|
213 | (1) |
|
|
214 | (3) |
|
4.8.1 Enumerating the Adapters |
|
|
214 | (1) |
|
4.8.2 Copying Data between Multiple GPUs |
|
|
215 | (2) |
|
4.9 IEEE Floating-Point on the GPU |
|
|
217 | (6) |
5 Practical Matters |
|
223 | (34) |
|
5.1 Engine Design and Architecture |
|
|
223 | (9) |
|
5.1.1 A Simple Low-Level D3D11 Application |
|
|
223 | (3) |
|
5.1.2 HLSL Compilation in Microsoft Visual Studio |
|
|
226 | (1) |
|
5.1.3 Design Goals for the Geometric Tools Engine |
|
|
227 | (5) |
|
|
227 | (1) |
|
|
228 | (2) |
|
|
230 | (1) |
|
5.1.3.4 Visual Objects and Scene Graphs |
|
|
231 | (1) |
|
|
231 | (1) |
|
|
232 | (9) |
|
5.2.1 Debugging on the CPU |
|
|
232 | (1) |
|
5.2.2 Debugging on the GPU |
|
|
233 | (2) |
|
5.2.3 Be Mindful of Your Surroundings |
|
|
235 | (6) |
|
5.2.3.1 An Example of an HLSL Compiler Bug |
|
|
236 | (3) |
|
5.2.3.2 An Example of a Programmer Bug |
|
|
239 | (2) |
|
|
241 | (8) |
|
5.3.1 Performance on the CPU |
|
|
241 | (2) |
|
5.3.2 Performance on the GPU |
|
|
243 | (4) |
|
5.3.3 Performance Guidelines |
|
|
247 | (2) |
|
|
249 | (8) |
|
5.4.1 Topics in Code Testing |
|
|
250 | (4) |
|
5.4.2 Code Coverage and Unit Testing on the GPU |
|
|
254 | (3) |
6 Linear and Affine Algebra |
|
257 | (84) |
|
|
257 | (16) |
|
6.1.1 Robust Length and Normalization Computations |
|
|
258 | (2) |
|
|
260 | (4) |
|
6.1.2.1 Orthogonality in 2D |
|
|
260 | (1) |
|
6.1.2.2 Orthogonality in 3D |
|
|
261 | (1) |
|
6.1.2.3 Orthogonality in 4D |
|
|
262 | (1) |
|
6.1.2.4 Gram-Schmidt Orthonormalization |
|
|
263 | (1) |
|
|
264 | (5) |
|
6.1.3.1 Orthonormal Sets in 2D |
|
|
265 | (1) |
|
6.1.3.2 Orthonormal Sets in 3D |
|
|
265 | (2) |
|
6.1.3.3 Orthonormal Sets in 4D |
|
|
267 | (2) |
|
6.1.4 Barycentric Coordinates |
|
|
269 | (1) |
|
6.1.5 Intrinsic Dimensionality |
|
|
270 | (3) |
|
|
273 | (15) |
|
6.2.1 Matrix Storage and Transform Conventions |
|
|
274 | (1) |
|
6.2.2 Base Class Matrix Operations |
|
|
275 | (3) |
|
6.2.3 Square Matrix Operations in 2D |
|
|
278 | (1) |
|
6.2.4 Square Matrix Operations in 3D |
|
|
279 | (2) |
|
6.2.5 Square Matrix Operations in 4D |
|
|
281 | (1) |
|
6.2.6 The Laplace Expansion Theorem |
|
|
282 | (6) |
|
|
288 | (30) |
|
|
288 | (1) |
|
|
289 | (3) |
|
|
292 | (2) |
|
|
294 | (11) |
|
6.3.4.1 Algebraic Operations |
|
|
295 | (2) |
|
6.3.4.2 Relationship of Quaternions to Rotations |
|
|
297 | (2) |
|
6.3.4.3 Spherical Linear Interpolation of Quaternions |
|
|
299 | (6) |
|
|
305 | (3) |
|
6.3.5.1 World Coordinates versus Body Coordinates |
|
|
306 | (2) |
|
6.3.6 Conversion between Representations |
|
|
308 | (10) |
|
6.3.6.1 Quaternion to Matrix |
|
|
309 | (1) |
|
6.3.6.2 Matrix to Quaternion |
|
|
309 | (2) |
|
6.3.6.3 Axis-Angle to Matrix |
|
|
311 | (1) |
|
6.3.6.4 Matrix to Axis-Angle |
|
|
311 | (1) |
|
6.3.6.5 Axis-Angle to Quaternion |
|
|
312 | (1) |
|
6.3.6.6 Quaternion to Axis-Angle |
|
|
312 | (1) |
|
6.3.6.7 Euler Angles to Matrix |
|
|
313 | (1) |
|
6.3.6.8 Matrix to Euler Angles |
|
|
313 | (4) |
|
6.3.6.9 Euler Angles to and from Quaternion or Axis-Angle |
|
|
317 | (1) |
|
|
318 | (23) |
|
6.4.1 Geometry and Affine Algebra |
|
|
319 | (3) |
|
|
322 | (10) |
|
6.4.2.1 Composition of Affine Transformations |
|
|
322 | (5) |
|
6.4.2.2 Decomposition of Affine Transformations |
|
|
327 | (3) |
|
6.4.2.3 A Simple Transformation Factory |
|
|
330 | (2) |
|
6.4.3 Coordinate System Conventions |
|
|
332 | (4) |
|
6.4.4 Converting between Coordinate Systems |
|
|
336 | (5) |
7 Sample Applications |
|
341 | (88) |
|
|
341 | (5) |
|
7.1.1 The VideoStream Class |
|
|
341 | (1) |
|
7.1.2 The VideoStreamManager Class |
|
|
342 | (4) |
|
|
346 | (9) |
|
|
346 | (1) |
|
|
347 | (1) |
|
|
348 | (2) |
|
7.2.4 Exhaustive Evaluation |
|
|
350 | (5) |
|
7.2.4.1 CPU Root Finding Using a Single Thread |
|
|
350 | (1) |
|
7.2.4.2 CPU Root Finding Using Multiple Threads |
|
|
351 | (1) |
|
|
352 | (3) |
|
7.3 Least Squares Fitting |
|
|
355 | (9) |
|
7.3.1 Fit a Line to 2D Points |
|
|
355 | (3) |
|
7.3.2 Fit a Plane to 3D Points |
|
|
358 | (1) |
|
7.3.3 Orthogonal Regression |
|
|
359 | (3) |
|
7.3.3.1 Fitting with Lines |
|
|
359 | (2) |
|
7.3.3.2 Fitting with Planes |
|
|
361 | (1) |
|
7.3.4 Estimation of Tangent Planes |
|
|
362 | (2) |
|
|
364 | (4) |
|
7.5 All-Pairs Triangle Intersection |
|
|
368 | (3) |
|
7.6 Shortest Path in a Weighted Graph |
|
|
371 | (6) |
|
|
377 | (7) |
|
|
384 | (8) |
|
|
385 | (2) |
|
7.8.2 Median of 3 x 3 Using Min-Max Operations |
|
|
387 | (2) |
|
7.8.3 Median of 5 x 5 Using Min-Max Operations |
|
|
389 | (3) |
|
7.9 Level Surface Extraction |
|
|
392 | (6) |
|
|
398 | (2) |
|
|
400 | (29) |
|
|
401 | (2) |
|
7.11.2 Solving Fluid Flow in 2D |
|
|
403 | (13) |
|
7.11.2.1 Initialization of State |
|
|
405 | (1) |
|
7.11.2.2 Initialization of External Forces |
|
|
406 | (3) |
|
7.11.2.3 Updating the State with Advection |
|
|
409 | (1) |
|
7.11.2.4 Applying the State Boundary Conditions |
|
|
410 | (3) |
|
7.11.2.5 Computing the Divergence of Velocity |
|
|
413 | (1) |
|
7.11.2.6 Solving the Poisson Equation |
|
|
413 | (1) |
|
7.11.2.7 Updating the Velocity to Be Divergence Free |
|
|
414 | (1) |
|
7.11.2.8 Screen Captures from the Simulation |
|
|
415 | (1) |
|
7.11.3 Solving Fluid Flow in 3D |
|
|
416 | (13) |
|
7.11.3.1 Initialization of State |
|
|
418 | (1) |
|
7.11.3.2 Initialization of External Forces |
|
|
419 | (3) |
|
7.11.3.3 Updating the State with Advection |
|
|
422 | (2) |
|
7.11.3.4 Applying the State Boundary Conditions |
|
|
424 | (1) |
|
7.11.3.5 Computing the Divergence of Velocity |
|
|
425 | (1) |
|
7.11.3.6 Solving the Poisson Equation |
|
|
425 | (2) |
|
7.11.3.7 Updating the Velocity to Be Divergence Free |
|
|
427 | (1) |
|
7.11.3.8 Screen Captures from the Simulation |
|
|
427 | (2) |
Bibliography |
|
429 | (6) |
Index |
|
435 | |