Preface |
|
xv | |
|
1 Computer Abstractions and Technology |
|
|
2 | (58) |
|
|
3 | (8) |
|
1.2 Eight Great Ideas in Computer Architecture |
|
|
11 | (2) |
|
|
13 | (3) |
|
|
16 | (8) |
|
1.5 Technologies for Building Processors and Memory |
|
|
24 | (4) |
|
|
28 | (12) |
|
|
40 | (3) |
|
1.8 The Sea Change: The Switch from Uniprocessors to Multiprocessors |
|
|
43 | (3) |
|
1.9 Real Stuff: Benchmarking the Intel Core i7 |
|
|
46 | (3) |
|
1.10 Fallacies and Pitfalls |
|
|
49 | (3) |
|
|
52 | (2) |
|
1.12 Historical Perspective and Further Reading |
|
|
54 | (1) |
|
|
54 | (6) |
|
2 Instructions: Language of the Computer |
|
|
60 | (126) |
|
|
62 | (1) |
|
2.2 Operations of the Computer Hardware |
|
|
63 | (4) |
|
2.3 Operands of the Computer Hardware |
|
|
67 | (8) |
|
2.4 Signed and Unsigned Numbers |
|
|
75 | (7) |
|
2.5 Representing Instructions in the Computer |
|
|
82 | (8) |
|
|
90 | (3) |
|
2.7 Instructions for Making Decisions |
|
|
93 | (7) |
|
2.8 Supporting Procedures in Computer Hardware |
|
|
100 | (10) |
|
2.9 Communicating with People |
|
|
110 | (5) |
|
2.10 LEGv8 Addressing for Wide Immediates and Addresses |
|
|
115 | (10) |
|
2.11 Parallelism and Instructions: Synchronization |
|
|
125 | (3) |
|
2.12 Translating and Starting a Program |
|
|
128 | (9) |
|
2.13 A C Sort Example to Put It All Together |
|
|
137 | (9) |
|
2.14 Arrays versus Pointers |
|
|
146 | (4) |
|
2.15 Advanced Material: Compiling C and Interpreting Java |
|
|
150 | (1) |
|
2.16 Real Stuff: MIPS Instructions |
|
|
150 | (2) |
|
2.17 Real Stuff: ARMv7 (32-bit) Instructions |
|
|
152 | (2) |
|
2.18 Real Stuff: x86 Instructions |
|
|
154 | (9) |
|
2.19 Real Stuff: The Rest of the ARMv8 Instruction Set |
|
|
163 | (6) |
|
2.20 Fallacies and Pitfalls |
|
|
169 | (2) |
|
|
171 | (2) |
|
2.22 Historical Perspective and Further Reading |
|
|
173 | (1) |
|
|
174 | (12) |
|
3 Arithmetic for Computers |
|
|
186 | (68) |
|
|
188 | (1) |
|
3.2 Addition and Subtraction |
|
|
188 | (3) |
|
|
191 | (6) |
|
|
197 | (8) |
|
|
205 | (25) |
|
3.6 Parallelism and Computer Arithmetic: Subword Parallelism |
|
|
230 | (2) |
|
3.7 Real Stuff: Streaming SIMD Extensions and Advanced Vector Extensions in x86 |
|
|
232 | (2) |
|
3.8 Real Stuff: The Rest of the ARMv8 Arithmetic Instructions |
|
|
234 | (4) |
|
3.9 Going Faster: Subword Parallelism and Matrix Multiply |
|
|
238 | (4) |
|
3.10 Fallacies and Pitfalls |
|
|
242 | (3) |
|
|
245 | (3) |
|
3.12 Historical Perspective and Further Reading |
|
|
248 | (1) |
|
|
249 | (5) |
|
|
254 | (132) |
|
|
256 | (4) |
|
4.2 Logic Design Conventions |
|
|
260 | (3) |
|
|
263 | (8) |
|
4.4 A Simple Implementation Scheme |
|
|
271 | (12) |
|
4.5 An Overview of Pipelining |
|
|
283 | (14) |
|
4.6 Pipelined Datapath and Control |
|
|
297 | (19) |
|
4.7 Data Hazards: Forwarding versus Stalling |
|
|
316 | (12) |
|
|
328 | (8) |
|
|
336 | (6) |
|
4.10 Parallelism via Instructions |
|
|
342 | (13) |
|
4.11 Real Stuff: The ARM Cortex-A53 and Intel Core i7 Pipelines |
|
|
355 | (8) |
|
4.12 Going Faster: Instruction-Level Parallelism and Matrix Multiply |
|
|
363 | (3) |
|
4.13 Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations |
|
|
366 | (1) |
|
4.14 Fallacies and Pitfalls |
|
|
366 | (1) |
|
|
367 | (1) |
|
4.16 Historical Perspective and Further Reading |
|
|
368 | (1) |
|
|
368 | (18) |
|
5 Large and Fast: Exploiting Memory Hierarchy |
|
|
386 | (128) |
|
|
388 | (4) |
|
|
392 | (5) |
|
|
397 | (15) |
|
5.4 Measuring and Improving Cache Performance |
|
|
412 | (20) |
|
5.5 Dependable Memory Hierarchy |
|
|
432 | (6) |
|
|
438 | (3) |
|
|
441 | (24) |
|
5.8 A Common Framework for Memory Hierarchy |
|
|
465 | (7) |
|
5.9 Using a Finite-State Machine to Control a Simple Cache |
|
|
472 | (5) |
|
5.10 Parallelism and Memory Hierarchy: Cache Coherence |
|
|
477 | (4) |
|
5.11 Parallelism and Memory Hierarchy: Redundant Arrays of Inexpensive Disks |
|
|
481 | (1) |
|
5.12 Advanced Material: Implementing Cache Controllers |
|
|
482 | (1) |
|
5.13 Real Stuff: The ARM Cortex-A53 and Intel Core i7 Memory Hierarchies |
|
|
482 | (5) |
|
5.14 Real Stuff: The Rest of the ARMv8 System and Special Instructions |
|
|
487 | (1) |
|
5.15 Going Faster: Cache Blocking and Matrix Multiply |
|
|
488 | (3) |
|
5.16 Fallacies and Pitfalls |
|
|
491 | (5) |
|
|
496 | (1) |
|
5.18 Historical Perspective and Further Reading |
|
|
497 | (1) |
|
|
497 | (17) |
|
6 Parallel Processors from Client to Cloud |
|
|
514 | |
|
|
516 | (2) |
|
6.2 The Difficulty of Creating Parallel Processing Programs |
|
|
518 | (5) |
|
6.3 SISD, MIMD, SIMD, SPMD, and Vector |
|
|
523 | (7) |
|
6.4 Hardware Multithreading |
|
|
530 | (3) |
|
6.5 Multicore and Other Shared Memory Multiprocessors |
|
|
533 | (5) |
|
6.6 Introduction to Graphics Processing Units |
|
|
538 | (7) |
|
6.7 Clusters, Warehouse Scale Computers, and Other Message-Passing Multiprocessors |
|
|
545 | (5) |
|
6.8 Introduction to Multiprocessor Network Topologies |
|
|
550 | (3) |
|
6.9 Communicating to the Outside World: Cluster Networking |
|
|
553 | (1) |
|
6.10 Multiprocessor Benchmarks and Performance Models |
|
|
554 | (10) |
|
6.11 Real Stuff: Benchmarking and Rooflines of the Intel Core i7 960 and the NVIDIA Tesla GPU |
|
|
564 | (5) |
|
6.12 Going Faster: Multiple Processors and Matrix Multiply |
|
|
569 | (3) |
|
6.13 Fallacies and Pitfalls |
|
|
572 | (2) |
|
|
574 | (3) |
|
6.15 Historical Perspective and Further Reading |
|
|
577 | (1) |
|
|
577 | |