Acknowledgments |
|
xvii | |
Preface |
|
1 | (6) |
Introduction |
|
7 | (8) |
|
1 By The C, By The C, By The Beautiful C |
|
|
15 | (48) |
|
1.1 Getting Started Programming in C |
|
|
16 | (10) |
|
1.1.1 Compiling and Running C Programs |
|
|
19 | (3) |
|
|
22 | (4) |
|
1.2 Input/Output (printf and scanf) |
|
|
26 | (4) |
|
|
26 | (2) |
|
|
28 | (2) |
|
1.3 Conditionals and Loops |
|
|
30 | (8) |
|
1.3.1 Boolean Values in C |
|
|
32 | (1) |
|
|
33 | (5) |
|
|
38 | (6) |
|
|
41 | (3) |
|
|
44 | (8) |
|
1.5.1 Introduction to Arrays |
|
|
44 | (3) |
|
1.5.2 Array Access Methods |
|
|
47 | (1) |
|
1.5.3 Arrays and Functions |
|
|
48 | (2) |
|
1.5.4 Introduction to Strings and the C String Library |
|
|
50 | (2) |
|
|
52 | (8) |
|
1.6.1 Defining a Struct Type |
|
|
53 | (1) |
|
1.6.2 Declaring Variables of Struct Types |
|
|
53 | (1) |
|
1.6.3 Accessing Field Values |
|
|
53 | (5) |
|
1.6.4 Passing Structs to Functions |
|
|
58 | (2) |
|
|
60 | (3) |
|
2 A Deeper Dive Into C Programming |
|
|
63 | (86) |
|
2.1 Parts of Program Memory and Scope |
|
|
64 | (2) |
|
2.2 C's Pointer Variables |
|
|
66 | (5) |
|
|
67 | (4) |
|
2.3 Pointers and Functions |
|
|
71 | (3) |
|
2.4 Dynamic Memory Allocation |
|
|
74 | (7) |
|
|
74 | (1) |
|
|
75 | (2) |
|
2.4.3 Dynamically Allocated Arrays and Strings |
|
|
77 | (3) |
|
2.4.4 Pointers to Heap Memory and Functions |
|
|
80 | (1) |
|
|
81 | (12) |
|
2.5.1 Single-Dimensional Arrays |
|
|
81 | (3) |
|
2.5.2 Two-Dimensional Arrays |
|
|
84 | (9) |
|
2.6 Strings and the String Library |
|
|
93 | (10) |
|
2.6.1 C's Support for Statically Allocated Strings (Arrays of char) |
|
|
94 | (1) |
|
2.6.2 Dynamically Allocating Strings |
|
|
94 | (2) |
|
2.6.3 Libraries for Manipulating C Strings and Characters |
|
|
96 | (7) |
|
|
103 | (10) |
|
2.7.1 Review of the C struct Type |
|
|
104 | (2) |
|
2.7.2 Pointers and Structs |
|
|
106 | (2) |
|
2.7.3 Pointer Fields in Structs |
|
|
108 | (2) |
|
|
110 | (1) |
|
2.7.5 Self-Referential Structs |
|
|
111 | (2) |
|
2.8 I/O in C (Standard and File) |
|
|
113 | (9) |
|
2.8.1 Standard Input/Output |
|
|
113 | (4) |
|
|
117 | (1) |
|
2.8.3 Using Text Files in C |
|
|
118 | (1) |
|
2.8.4 Standard and File I/O Functions in stdio.h |
|
|
119 | (3) |
|
2.9 Some Advanced C Features |
|
|
122 | (26) |
|
|
122 | (3) |
|
2.9.2 Command Line Arguments |
|
|
125 | (1) |
|
2.9.3 The void * Type and Type Recasting |
|
|
126 | (2) |
|
|
128 | (5) |
|
2.9.5 C Libraries: Using, Compiling, and Linking |
|
|
133 | (6) |
|
2.9.6 Writing and Using Your Own C Libraries |
|
|
139 | (6) |
|
2.9.7 Compiling C to Assembly, and Compiling and Linking Assembly and CCode |
|
|
145 | (3) |
|
|
148 | (1) |
|
|
149 | (40) |
|
|
150 | (11) |
|
3.1.1 Getting Started with GDB |
|
|
151 | (1) |
|
3.1.2 Example GDB Sessions |
|
|
151 | (10) |
|
3.2 GDB Commands in Detail |
|
|
161 | (7) |
|
3.2.1 Keyboard Shortcuts in GDB |
|
|
161 | (1) |
|
3.2.2 Common GDB Commands |
|
|
161 | (7) |
|
3.3 Debugging Memory with Valgrind |
|
|
168 | (6) |
|
3.3.1 An Example Program with a Heap Memory Access Error |
|
|
169 | (3) |
|
3.3.2 How to Use Memcheck |
|
|
172 | (2) |
|
3.4 Advanced GDB Features |
|
|
174 | (3) |
|
|
174 | (1) |
|
3.4.2 Attaching GDB to a Running Process |
|
|
175 | (1) |
|
3.4.3 Following a Process on a Fork |
|
|
176 | (1) |
|
|
176 | (1) |
|
3.4.5 DDD Settings and Bug Fixes |
|
|
177 | (1) |
|
3.5 Debugging Assembly Code |
|
|
177 | (5) |
|
3.5.1 Using GDB to Examine Binary Code |
|
|
178 | (1) |
|
3.5.2 Using DDD to Debug at the Assembly Level |
|
|
179 | (1) |
|
3.5.3 GDB Assembly Code Debugging Commands and Examples |
|
|
180 | (1) |
|
3.5.4 Quick Summary of Common Commands for Assembly Debugging |
|
|
181 | (1) |
|
3.6 Debugging Multithreaded Programs with GDB |
|
|
182 | (4) |
|
|
182 | (1) |
|
3.6.2 GDB Thread-Specific Commands |
|
|
183 | (1) |
|
|
183 | (3) |
|
|
186 | (3) |
|
4 Binary And Data Representation |
|
|
189 | (42) |
|
4.1 Number Bases and Unsigned Integers |
|
|
192 | (5) |
|
|
192 | (1) |
|
4.1.2 Unsigned Binary Numbers |
|
|
193 | (2) |
|
|
195 | (1) |
|
4.1.4 Storage Limitations |
|
|
196 | (1) |
|
4.2 Converting Between Bases |
|
|
197 | (5) |
|
4.2.1 Converting Between Binary and Hexadecimal |
|
|
198 | (1) |
|
4.2.2 Converting to Decimal |
|
|
198 | (1) |
|
4.2.3 Converting from Decimal |
|
|
199 | (3) |
|
4.3 Signed Binary Integers |
|
|
202 | (5) |
|
|
202 | (2) |
|
|
204 | (3) |
|
4.4 Binary Integer Arithmetic |
|
|
207 | (4) |
|
|
207 | (2) |
|
|
209 | (1) |
|
4.4.3 Multiplication and Division |
|
|
210 | (1) |
|
|
211 | (7) |
|
|
212 | (1) |
|
4.5.2 Binary Integer Overflow |
|
|
213 | (4) |
|
|
217 | (1) |
|
4.5.4 Overflow Consequences |
|
|
217 | (1) |
|
|
218 | (6) |
|
|
219 | (1) |
|
|
220 | (1) |
|
4.6.3 Bitwise XOR (Exclusive OR) |
|
|
220 | (1) |
|
|
221 | (1) |
|
|
222 | (2) |
|
|
224 | (2) |
|
4.8 Real Numbers in Binary |
|
|
226 | (3) |
|
4.8.1 Fixed-Point Representation |
|
|
226 | (2) |
|
4.8.2 Floating-Point Representation |
|
|
228 | (1) |
|
4.8.3 Rounding Consequences |
|
|
229 | (1) |
|
|
229 | (2) |
|
5 What Von Neumann Knew: Computer Architecture |
|
|
231 | (58) |
|
5.1 The Origin of Modern Computing Architectures |
|
|
233 | (4) |
|
|
234 | (1) |
|
5.1.2 Early Electronic Computers |
|
|
235 | (1) |
|
5.1.3 So What Did von Neumann Know? |
|
|
236 | (1) |
|
5.2 The von Neumann Architecture |
|
|
237 | (5) |
|
|
238 | (1) |
|
5.2.2 The Processing Unit |
|
|
238 | (1) |
|
|
239 | (1) |
|
|
239 | (1) |
|
5.2.5 The Input and Output (I/O) Units |
|
|
240 | (1) |
|
5.2.6 The von Neumann Machine in Action: Executing a Program |
|
|
240 | (2) |
|
|
242 | (4) |
|
|
243 | (1) |
|
|
244 | (2) |
|
|
246 | (14) |
|
5.4.1 Arithmetic and Logic Circuits |
|
|
246 | (6) |
|
|
252 | (5) |
|
|
257 | (3) |
|
5.5 Building a Processor: Putting It All Together |
|
|
260 | (6) |
|
|
261 | (2) |
|
|
263 | (1) |
|
|
264 | (2) |
|
5.6 The Processor's Execution of Program Instructions |
|
|
266 | (8) |
|
5.6.1 Clock-Driven Execution |
|
|
270 | (3) |
|
5.6.2 Putting It All Together: The CPU in a Full Computer |
|
|
273 | (1) |
|
5.7 Pipelining: Making the CPU Faster |
|
|
274 | (2) |
|
5.8 Advanced Pipelined Instruction Considerations |
|
|
276 | (5) |
|
5.8.1 Pipelining Consideration: Data Hazards |
|
|
278 | (1) |
|
5.8.2 Pipelining Hazards: Control Hazards |
|
|
279 | (2) |
|
5.9 Looking Ahead: CPUs Today |
|
|
281 | (5) |
|
5.9.1 Instruction-Level Parallelism |
|
|
282 | (1) |
|
5.9.2 Multicore and Hardware Multithreading |
|
|
283 | (3) |
|
5.9.3 Some Example Processors |
|
|
286 | (1) |
|
|
286 | (3) |
|
6 Under The C: Diving Into Assembly |
|
|
289 | (4) |
|
7 64-BIT X86 Assembly (X86-64) |
|
|
293 | (84) |
|
7.1 Diving into Assembly: Basics |
|
|
294 | (6) |
|
|
296 | (1) |
|
7.1.2 Advanced Register Notation |
|
|
296 | (2) |
|
7.1.3 Instruction Structure |
|
|
298 | (1) |
|
7.1.4 An Example with Operands |
|
|
298 | (2) |
|
7.1.5 Instruction Suffixes |
|
|
300 | (1) |
|
|
300 | (7) |
|
7.2.1 Putting It All Together: A More Concrete Example |
|
|
302 | (5) |
|
7.3 Arithmetic Instructions |
|
|
307 | (3) |
|
7.3.1 Bit Shifting Instructions |
|
|
308 | (1) |
|
7.3.2 Bitwise Instructions |
|
|
309 | (1) |
|
7.3.3 The Load Effective Address Instruction |
|
|
310 | (1) |
|
7.4 Conditional Control and Loops |
|
|
310 | (16) |
|
|
311 | (4) |
|
7.4.2 If Statements in Assembly |
|
|
315 | (6) |
|
|
321 | (5) |
|
7.5 Functions in Assembly |
|
|
326 | (21) |
|
7.5.1 Function Parameters |
|
|
329 | (1) |
|
7.5.2 Tracing Through an Example |
|
|
329 | (2) |
|
7.5.3 Tracing Through main |
|
|
331 | (16) |
|
|
347 | (2) |
|
7.6.1 Animation: Observing How the Call Stack Changes |
|
|
348 | (1) |
|
|
349 | (3) |
|
|
352 | (6) |
|
7.8.1 Contiguous Two-Dimensional Arrays |
|
|
353 | (2) |
|
7.8.2 Noncontiguous Matrix |
|
|
355 | (3) |
|
|
358 | (4) |
|
7.9.1 Data Alignment and structs |
|
|
361 | (1) |
|
7.10 Real World: Buffer Overflow |
|
|
362 | (15) |
|
7.10.1 Famous Examples of Buffer Overflow |
|
|
362 | (1) |
|
7.10.2 A First Look: The Guessing Game |
|
|
363 | (2) |
|
7.10.3 Taking a Closer Look (Under the C) |
|
|
365 | (3) |
|
7.10.4 Buffer Overflow: First Attempt |
|
|
368 | (2) |
|
7.10.5 A Smarter Buffer Overflow: Second Attempt |
|
|
370 | (2) |
|
7.10.6 Protecting Against Buffer Overflow |
|
|
372 | (5) |
|
8 32-BIT X86 Assembly (IA32) |
|
|
377 | (84) |
|
8.1 Diving into Assembly: Basics |
|
|
378 | (5) |
|
|
380 | (1) |
|
8.1.2 Advanced Register Notation |
|
|
380 | (1) |
|
8.1.3 Instruction Structure |
|
|
381 | (1) |
|
8.1.4 An Example with Operands |
|
|
382 | (1) |
|
8.1.5 Instruction Suffixes |
|
|
383 | (1) |
|
|
383 | (7) |
|
8.2.1 Putting It All Together: A More Concrete Example |
|
|
385 | (5) |
|
8.3 Arithmetic Instructions |
|
|
390 | (3) |
|
8.3.1 Bit Shifting Instructions |
|
|
391 | (1) |
|
8.3.2 Bitwise Instructions |
|
|
391 | (1) |
|
8.3.3 The Load Effective Address Instruction |
|
|
392 | (1) |
|
8.4 Conditional Control and Loops |
|
|
393 | (15) |
|
|
393 | (5) |
|
8.4.2 If Statements in Assembly |
|
|
398 | (5) |
|
|
403 | (5) |
|
8.5 Functions in Assembly |
|
|
408 | (24) |
|
8.5.1 Tracing Through an Example |
|
|
410 | (1) |
|
8.5.2 Tracing Through main |
|
|
411 | (21) |
|
|
432 | (2) |
|
8.6.1 Animation: Observing How the Call Stack Changes |
|
|
434 | (1) |
|
|
434 | (3) |
|
|
437 | (6) |
|
8.8.1 Contiguous Two-Dimensional Arrays |
|
|
438 | (3) |
|
8.8.2 Noncontiguous Matrix |
|
|
441 | (2) |
|
|
443 | (4) |
|
8.9.1 Data Alignment and structs |
|
|
445 | (2) |
|
8.10 Real World: Buffer Overflow |
|
|
447 | (14) |
|
8.10.1 Famous Examples of Buffer Overflow |
|
|
447 | (1) |
|
8.10.2 A First Look: The Guessing Game |
|
|
448 | (1) |
|
8.10.3 Taking a Closer Look (Under the C) |
|
|
449 | (3) |
|
8.10.4 Buffer Overflow: First Attempt |
|
|
452 | (2) |
|
8.10.5 A Smarter Buffer Overflow: Second Attempt |
|
|
454 | (2) |
|
8.10.6 Protecting Against Buffer Overflow |
|
|
456 | (5) |
|
|
461 | (78) |
|
9.1 Diving into Assembly: Basics |
|
|
462 | (5) |
|
|
464 | (1) |
|
9.1.2 Advanced Register Notation |
|
|
464 | (1) |
|
9.1.3 Instruction Structure |
|
|
465 | (1) |
|
9.1.4 An Example with Operands |
|
|
465 | (2) |
|
|
467 | (6) |
|
9.2.1 Putting It All Together: A More Concrete Example |
|
|
469 | (4) |
|
9.3 Arithmetic Instructions |
|
|
473 | (3) |
|
9.3.1 Common Arithmetic Instructions |
|
|
473 | (1) |
|
9.3.2 Bit Shifting Instructions |
|
|
474 | (1) |
|
9.3.3 Bitwise Instructions |
|
|
475 | (1) |
|
9.4 Conditional Control and Loops |
|
|
476 | (16) |
|
|
476 | (4) |
|
9.4.2 If Statements in Assembly |
|
|
480 | (7) |
|
|
487 | (5) |
|
9.5 Functions in Assembly |
|
|
492 | (18) |
|
9.5.1 Function Parameters |
|
|
495 | (1) |
|
9.5.2 Tracing Through an Example |
|
|
495 | (1) |
|
9.5.3 Tracing Through main |
|
|
496 | (14) |
|
|
510 | (2) |
|
9.6.1 Animation: Observing How the Call Stack Changes |
|
|
512 | (1) |
|
|
512 | (3) |
|
|
515 | (7) |
|
9.8.1 Contiguous Two-Dimensional Arrays |
|
|
516 | (3) |
|
9.8.2 Noncontiguous Matrix |
|
|
519 | (3) |
|
|
522 | (3) |
|
9.9.1 Data Alignment and structs |
|
|
524 | (1) |
|
9.10 Real World: Buffer Overflow |
|
|
525 | (14) |
|
9.10.1 Famous Examples of Buffer Overflow |
|
|
525 | (1) |
|
9.10.2 A First Look: The Guessing Game |
|
|
526 | (2) |
|
9.10.3 Taking a Closer Look (Under the C) |
|
|
528 | (3) |
|
9.10.4 Buffer Overflow: First Attempt |
|
|
531 | (1) |
|
9.10.5 A Smarter Buffer Overflow: Second Attempt |
|
|
532 | (3) |
|
9.10.6 Protecting Against Buffer Overflow |
|
|
535 | (4) |
|
10 Key Assembly Takeaways |
|
|
539 | (4) |
|
|
539 | (2) |
|
|
541 | (2) |
|
11 Storage And The Memory Hierarchy |
|
|
543 | (46) |
|
11.1 The Memory Hierarchy |
|
|
545 | (1) |
|
|
546 | (6) |
|
|
547 | (2) |
|
|
549 | (3) |
|
|
552 | (5) |
|
11.3.1 Locality Examples in Code |
|
|
553 | (2) |
|
11.3.2 From Locality to Caches |
|
|
555 | (1) |
|
|
555 | (1) |
|
|
556 | (1) |
|
|
557 | (18) |
|
11.4.1 Direct-Mapped Caches |
|
|
558 | (10) |
|
11.4.2 Cache Misses and Associative Designs |
|
|
568 | (2) |
|
11.4.3 Set Associative Caches |
|
|
570 | (5) |
|
11.5 Cache Analysis and Valgrind |
|
|
575 | (6) |
|
11.5.1 A First Cut: Theoretical Analysis and Benchmarking |
|
|
577 | (1) |
|
11.5.2 Cache Analysis in the Real World: Cachegrind |
|
|
578 | (3) |
|
11.6 Looking Ahead: Caching on Multicore Processors |
|
|
581 | (6) |
|
|
583 | (1) |
|
|
584 | (3) |
|
11.6.3 Implementing Cache Coherency Protocols |
|
|
587 | (1) |
|
11.6.4 More About Multicore Caching |
|
|
587 | (1) |
|
|
587 | (2) |
|
|
589 | (28) |
|
12.1 Code Optimization First Steps: Code Profiling |
|
|
597 | (7) |
|
12.1.1 Using Callgrind to Profile |
|
|
600 | (2) |
|
12.1.2 Loop-Invariant Code Motion |
|
|
602 | (2) |
|
12.2 Other Compiler Optimizations: Loop Unrolling and Function Inlining |
|
|
604 | (3) |
|
|
604 | (1) |
|
|
605 | (2) |
|
12.3 Memory Considerations |
|
|
607 | (6) |
|
|
608 | (1) |
|
12.3.2 Some Other Compiler Optimizations for Improving Locality: Fission and Fusion |
|
|
609 | (2) |
|
12.3.3 Memory Profiling with Massif |
|
|
611 | (2) |
|
12.4 Key Takeaways and Summary |
|
|
613 | (4) |
|
|
617 | (52) |
|
13.1 How the OS Works and How It Runs |
|
|
620 | (4) |
|
|
620 | (1) |
|
13.1.2 Getting the OS to Do Something: Interrupts and Traps |
|
|
621 | (3) |
|
|
624 | (15) |
|
13.2.1 Multiprogramming and Context Switching |
|
|
625 | (2) |
|
|
627 | (2) |
|
13.2.3 Creating (and Destroying) Processes |
|
|
629 | (4) |
|
|
633 | (2) |
|
|
635 | (4) |
|
|
639 | (17) |
|
|
642 | (2) |
|
13.3.2 Virtual Address to Physical Address Translation |
|
|
644 | (1) |
|
|
645 | (8) |
|
|
653 | (3) |
|
13.4 Interprocess Communication |
|
|
656 | (10) |
|
|
657 | (5) |
|
|
662 | (2) |
|
|
664 | (2) |
|
13.5 Summary and Other OS Functionality |
|
|
666 | (3) |
|
14 Leveraging Shared Memory In The Multicore Era |
|
|
669 | (66) |
|
14.1 Programming Multicore Systems |
|
|
672 | (5) |
|
14.1.1 The Impact of Multicore Systems on Process Execution |
|
|
672 | (2) |
|
14.1.2 Expediting Process Execution with Threads |
|
|
674 | (3) |
|
14.2 Hello Threading! Writing Your First Multithreaded Program |
|
|
677 | (9) |
|
14.2.1 Creating and Joining Threads |
|
|
679 | (2) |
|
14.2.2 The Thread Function |
|
|
681 | (1) |
|
|
681 | (1) |
|
14.2.4 Revisiting Scalar Multiplication |
|
|
682 | (2) |
|
14.2.5 Improving Scalar Multiplication: Multiple Arguments |
|
|
684 | (2) |
|
14.3 Synchronizing Threads |
|
|
686 | (23) |
|
|
694 | (8) |
|
|
702 | (1) |
|
14.3.3 Other Synchronization Constructs |
|
|
703 | (6) |
|
14.4 Measuring the Performance of Parallel Programs |
|
|
709 | (6) |
|
14.4.1 Parallel Performance Basics |
|
|
709 | (4) |
|
|
713 | (2) |
|
14.5 Cache Coherence and False Sharing |
|
|
715 | (7) |
|
14.5.1 Caches on Multicore Systems |
|
|
716 | (1) |
|
|
717 | (3) |
|
14.5.3 Fixing False Sharing |
|
|
720 | (2) |
|
|
722 | (4) |
|
14.6.1 Fixing Issues of Thread Safety |
|
|
722 | (4) |
|
14.7 Implicit Threading with OpenMP |
|
|
726 | (6) |
|
|
726 | (2) |
|
14.7.2 Hello Threading: OpenMP Flavored |
|
|
728 | (1) |
|
14.7.3 A More Complex Example: CountSort in OpenMP |
|
|
729 | (3) |
|
14.7.4 Learning More About OpenMP |
|
|
732 | (1) |
|
|
732 | (3) |
|
15 Looking Ahead: Other Parallel Systems And Parallel Programming Models |
|
|
735 | (36) |
|
15.1 Heterogeneous Computing: Hardware Accelerators, GPGPU Computing, and CUDA |
|
|
737 | (9) |
|
15.1.1 Hardware Accelerators |
|
|
737 | (1) |
|
15.1.2 GPU Architecture Overview |
|
|
738 | (1) |
|
|
739 | (1) |
|
|
740 | (5) |
|
15.1.5 Other Languages for GPGPU Programming |
|
|
745 | (1) |
|
15.2 Distributed Memory Systems, Message Passing, and MPI |
|
|
746 | (13) |
|
15.2.1 Parallel and Distributed Processing Models |
|
|
747 | (1) |
|
15.2.2 Communication Protocols |
|
|
748 | (1) |
|
15.2.3 Message Passing Interface |
|
|
749 | (1) |
|
|
750 | (1) |
|
15.2.5 MPI Scalar Multiplication |
|
|
751 | (8) |
|
15.2.6 Distributed Systems Challenges |
|
|
759 | (1) |
|
15.3 To Exascale and Beyond: Cloud Computing, Big Data, and the Future of Computing |
|
|
759 | (12) |
|
|
761 | (1) |
|
|
762 | (5) |
|
15.3.3 Looking Toward the Future: Opportunities and Challenges |
|
|
767 | (4) |
Index |
|
771 | |