Preface |
|
xi | |
1 Computer Abstractions and Technology |
|
2 | (64) |
|
|
3 | (7) |
|
1.2 Seven Great Ideas in Computer Architecture |
|
|
10 | (3) |
|
|
13 | (3) |
|
|
16 | (9) |
|
1.5 Technologies for Building Processors and Memory |
|
|
25 | (4) |
|
|
29 | (11) |
|
|
40 | (3) |
|
1.8 The Sea Change: The Switch from Uniprocessors to Multiprocessors |
|
|
43 | (3) |
|
1.9 Real Stuff: Benchmarking the Intel Core i7 |
|
|
46 | (3) |
|
1.10 Going Faster: Matrix Multiply in Python |
|
|
49 | (1) |
|
1.11 Fallacies and Pitfalls |
|
|
50 | (3) |
|
|
53 | (2) |
|
1.13 Historical Perspective and Further Reading |
|
|
55 | (1) |
|
|
55 | (4) |
|
|
59 | (7) |
2 Instructions: Language of the Computer |
|
66 | (122) |
|
|
68 | (1) |
|
2.2 Operations of the Computer Hardware |
|
|
69 | (4) |
|
2.3 Operands of the Computer Hardware |
|
|
73 | (7) |
|
2.4 Signed and Unsigned Numbers |
|
|
80 | (7) |
|
2.5 Representing Instructions in the Computer |
|
|
87 | (8) |
|
|
95 | (3) |
|
2.7 Instructions for Making Decisions |
|
|
98 | (6) |
|
2.8 Supporting Procedures in Computer Hardware |
|
|
104 | (10) |
|
2.9 Communicating with People |
|
|
114 | (6) |
|
2.10 RISC-V Addressing for Wide Immediates and Addresses |
|
|
120 | (8) |
|
2.11 Parallelism and Instructions: Synchronization |
|
|
128 | (3) |
|
2.12 Translating and Starting a Program |
|
|
131 | (9) |
|
2.13 A C Sort Example to Put it All Together |
|
|
140 | (8) |
|
2.14 Arrays versus Pointers |
|
|
148 | (3) |
|
2.15 Advanced Material: Compiling C and Interpreting Java |
|
|
151 | (1) |
|
2.16 Real Stuff: MIPS Instructions |
|
|
152 | (1) |
|
2.17 Real Stuff: ARMv7 (32-bit) Instructions |
|
|
153 | (4) |
|
2.18 Real Stuff: ARMv8 (64-bit) Instructions |
|
|
157 | (1) |
|
2.19 Real Stuff: x86 Instructions |
|
|
158 | (9) |
|
2.20 Real Stuff: The Rest of the RISC-V Instruction Set |
|
|
167 | (1) |
|
2.21 Going Faster: Matrix Multiply in C |
|
|
168 | (2) |
|
2.22 Fallacies and Pitfalls |
|
|
170 | (2) |
|
|
172 | (2) |
|
2.24 Historical Perspective and Further Reading |
|
|
174 | (1) |
|
|
175 | (3) |
|
|
178 | (10) |
3 Arithmetic for Computers |
|
188 | (64) |
|
|
190 | (1) |
|
3.2 Addition and Subtraction |
|
|
190 | (3) |
|
|
193 | (6) |
|
|
199 | (9) |
|
|
208 | (25) |
|
3.6 Parallelism and Computer Arithmetic: Subword Parallelism |
|
|
233 | (1) |
|
3.7 Real Stuff: Streaming SIMD Extensions and Advanced Vector Extensions in x86 |
|
|
234 | (2) |
|
3.8 Going Faster: Subword Parallelism and Matrix Multiply |
|
|
236 | (2) |
|
3.9 Fallacies and Pitfalls |
|
|
238 | (3) |
|
|
241 | (1) |
|
3.11 Historical Perspective and Further Reading |
|
|
242 | (1) |
|
|
242 | (4) |
|
|
246 | (6) |
4 The Processor |
|
252 | (134) |
|
|
254 | (4) |
|
4.2 Logic Design Conventions |
|
|
258 | (3) |
|
|
261 | (8) |
|
4.4 A Simple Implementation Scheme |
|
|
269 | (13) |
|
4.5 Multicycle Implementation |
|
|
282 | (1) |
|
4.6 An Overview of Pipelining |
|
|
283 | (13) |
|
4.7 Pipelined Datapath and Control |
|
|
296 | (17) |
|
4.8 Data Hazards: Forwarding versus Stalling |
|
|
313 | (12) |
|
|
325 | (8) |
|
|
333 | (7) |
|
4.11 Parallelism via Instructions |
|
|
340 | (14) |
|
4.12 Putting it All Together: The Intel Core i7 6700 and ARM Cortex-A53 |
|
|
354 | (9) |
|
4.13 Going Faster: Instruction-Level Parallelism and Matrix Multiply |
|
|
363 | (2) |
|
4.14 Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations |
|
|
365 | (1) |
|
4.15 Fallacies and Pitfalls |
|
|
365 | (2) |
|
|
367 | (1) |
|
4.17 Historical Perspective and Further Reading |
|
|
368 | (1) |
|
|
368 | (1) |
|
|
369 | (17) |
5 Large and Fast: Exploiting Memory Hierarchy |
|
386 | (132) |
|
|
388 | (4) |
|
|
392 | (6) |
|
|
398 | (14) |
|
5.4 Measuring and Improving Cache Performance |
|
|
412 | (19) |
|
5.5 Dependable Memory Hierarchy |
|
|
431 | (5) |
|
|
436 | (4) |
|
|
440 | (24) |
|
5.8 A Common Framework for Memory Hierarchy |
|
|
464 | (6) |
|
5.9 Using a Finite-State Machine to Control a Simple Cache |
|
|
470 | (5) |
|
5.10 Parallelism and Memory Hierarchy: Cache Coherence |
|
|
475 | (4) |
|
5.11 Parallelism and Memory Hierarchy: Redundant Arrays of Inexpensive Disks |
|
|
479 | (1) |
|
5.12 Advanced Material: Implementing Cache Controllers |
|
|
480 | (1) |
|
5.13 Real Stuff: The ARM Cortex-A8 and Intel Core i7 Memory Hierarchies |
|
|
480 | (6) |
|
5.14 Real Stuff: The Rest of the RISC-V System and Special Instructions |
|
|
486 | (2) |
|
5.15 Going Faster: Cache Blocking and Matrix Multiply |
|
|
488 | (1) |
|
5.16 Fallacies and Pitfalls |
|
|
489 | (5) |
|
|
494 | (1) |
|
5.18 Historical Perspective and Further Reading |
|
|
495 | (1) |
|
|
495 | (4) |
|
|
499 | (19) |
6 Parallel Processors from Client to Cloud |
|
518 | |
|
|
520 | (2) |
|
6.2 The Difficulty of Creating Parallel Processing Programs |
|
|
522 | (5) |
|
6.3 SISD, MIMD, SIMD, SPMD, and Vector |
|
|
527 | (7) |
|
6.4 Hardware Multithreading |
|
|
534 | (3) |
|
6.5 Multicore and Other Shared Memory Multiprocessors |
|
|
537 | (5) |
|
6.6 Introduction to Graphics Processing Units |
|
|
542 | (7) |
|
6.7 Domain-Specific Architectures |
|
|
549 | (3) |
|
6.8 Clusters, Warehouse Scale Computers, and Other Message-Passing Multiprocessors |
|
|
552 | (5) |
|
6.9 Introduction to Multiprocessor Network Topologies |
|
|
557 | (4) |
|
6.10 Communicating to the Outside World: Cluster Networking |
|
|
561 | (1) |
|
6.11 Multiprocessor Benchmarks and Performance Models |
|
|
561 | (11) |
|
6.12 Real Stuff: Benchmarking the Google TPUv3 Supercomputer and an NVIDIA Volta GPU Cluster |
|
|
572 | (8) |
|
6.13 Going Faster: Multiple Processors and Matrix Multiply |
|
|
580 | (3) |
|
6.14 Fallacies and Pitfalls |
|
|
583 | (2) |
|
|
585 | (2) |
|
6.16 Historical Perspective and Further Reading |
|
|
587 | (1) |
|
|
588 | (2) |
|
|
590 | |
Appendix |
|
|
A The Basics of Logic Design |
|
|
A-2 | |
|
|
A-3 | |
|
A.2 Gates, Truth Tables, and Logic Equations |
|
|
A-4 | |
|
|
A-9 | |
|
A.4 Using a Hardware Description Language |
|
|
A-20 | |
|
A.5 Constructing a Basic Arithmetic Logic Unit |
|
|
A-26 | |
|
A.6 Faster Addition: Carry Lookahead |
|
|
A-37 | |
|
|
A-47 | |
|
A.8 Memory Elements: Flip-Flops, Latches, and Registers |
|
|
A-49 | |
|
A.9 Memory Elements: SRAMs and DRAMs |
|
|
A-57 | |
|
A.10 Finite-State Machines |
|
|
A-66 | |
|
A.11 Timing Methodologies |
|
|
A-71 | |
|
A.12 Field Programmable Devices |
|
|
A-77 | |
|
|
A-78 | |
|
|
A-79 | |
Index |
|
I-1 | |