Preface | xv
1 Computer Abstractions and Technology | 2 | (58)
| 3 | (8)
1.2 Eight Great Ideas in Computer Architecture | 11 | (2)
| 13 | (3)
| 16 | (8)
1.5 Technologies for Building Processors and Memory | 24 | (4)
| 28 | (12)
| 40 | (3)
1.8 The Sea Change: The Switch from Uniprocessors to Multiprocessors | 43 | (3)
1.9 Real Stuff: Benchmarking the Intel Core i7 | 46 | (3)
1.10 Fallacies and Pitfalls | 49 | (3)
| 52 | (2)
1.12 Historical Perspective and Further Reading | 54 | (1)
| 54 | (6)
2 Instructions: Language of the Computer | 60 | (112)
| 62 | (1)
2.2 Operations of the Computer Hardware | 63 | (4)
2.3 Operands of the Computer Hardware | 67 | (7)
2.4 Signed and Unsigned Numbers | 74 | (7)
2.5 Representing Instructions in the Computer | 81 | (8)
| 89 | (3)
2.7 Instructions for Making Decisions | 92 | (6)
2.8 Supporting Procedures in Computer Hardware | 98 | (10)
2.9 Communicating with People | 108 | (5)
2.10 RISC-V Addressing for Wide Immediates and Addresses | 113 | (8)
2.11 Parallelism and Instructions: Synchronization | 121 | (3)
2.12 Translating and Starting a Program | 124 | (9)
2.13 A C Sort Example to Put it All Together | 133 | (8)
2.14 Arrays versus Pointers | 141 | (3)
2.15 Advanced Material: Compiling C and Interpreting Java | 144 | (1)
2.16 Real Stuff: MIPS Instructions | 145 | (1)
2.17 Real Stuff: x86 Instructions | 146 | (9)
2.18 Real Stuff: The Rest of the RISC-V Instruction Set | 155 | (2)
2.19 Fallacies and Pitfalls | 157 | (2)
| 159 | (3)
2.21 Historical Perspective and Further Reading | 162 | (1)
| 162 | (10)
3 Arithmetic for Computers | 172 | (62)
| 174 | (1)
3.2 Addition and Subtraction | 174 | (3)
| 177 | (6)
| 183 | (8)
| 191 | (25)
3.6 Parallelism and Computer Arithmetic: Subword Parallelism | 216 | (1)
3.7 Real Stuff: Streaming SIMD Extensions and Advanced Vector Extensions in x86 | 217 | (1)
3.8 Going Faster: Subword Parallelism and Matrix Multiply | 218 | (4)
3.9 Fallacies and Pitfalls | 222 | (3)
| 225 | (2)
3.11 Historical Perspective and Further Reading | 227 | (1)
| 227 | (7)
| 234 | (130)
| 236 | (4)
4.2 Logic Design Conventions | 240 | (3)
| 243 | (8)
4.4 A Simple Implementation Scheme | 251 | (11)
4.5 An Overview of Pipelining | 262 | (14)
4.6 Pipelined Datapath and Control | 276 | (18)
4.7 Data Hazards: Forwarding versus Stalling | 294 | (13)
| 307 | (8)
| 315 | (6)
4.10 Parallelism via Instructions | 321 | (13)
4.11 Real Stuff: The ARM Cortex-A53 and Intel Core i7 Pipelines | 334 | (8)
4.12 Going Faster: Instruction-Level Parallelism and Matrix Multiply | 342 | (3)
4.13 Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations | 345 | (1)
4.14 Fallacies and Pitfalls | 345 | (1)
| 346 | (1)
4.16 Historical Perspective and Further Reading | 347 | (1)
| 347 | (17)
5 Large and Fast: Exploiting Memory Hierarchy | 364 | (126)
| 366 | (4)
| 370 | (5)
| 375 | (15)
5.4 Measuring and Improving Cache Performance | 390 | (20)
5.5 Dependable Memory Hierarchy | 410 | (6)
| 416 | (3)
| 419 | (24)
5.8 A Common Framework for Memory Hierarchy | 443 | (6)
5.9 Using a Finite-State Machine to Control a Simple Cache | 449 | (5)
5.10 Parallelism and Memory Hierarchy: Cache Coherence | 454 | (4)
5.11 Parallelism and Memory Hierarchy: Redundant Arrays of Inexpensive Disks | 458 | (1)
5.12 Advanced Material: Implementing Cache Controllers | 459 | (1)
5.13 Real Stuff: The ARM Cortex-A53 and Intel Core i7 Memory Hierarchies | 459 | (5)
5.14 Real Stuff: The Rest of the RISC-V System and Special Instructions | 464 | (1)
5.15 Going Faster: Cache Blocking and Matrix Multiply | 465 | (3)
5.16 Fallacies and Pitfalls | 468 | (4)
| 472 | (1)
5.18 Historical Perspective and Further Reading | 473 | (1)
| 473 | (17)
6 Parallel Processors from Client to Cloud | 490
| 492 | (2)
6.2 The Difficulty of Creating Parallel Processing Programs | 494 | (5)
6.3 SISD, MIMD, SIMD, SPMD, and Vector | 499 | (7)
6.4 Hardware Multithreading | 506 | (3)
6.5 Multicore and Other Shared Memory Multiprocessors | 509 | (5)
6.6 Introduction to Graphics Processing Units | 514 | (7)
6.7 Clusters, Warehouse Scale Computers, and Other Message-Passing Multiprocessors | 521 | (5)
6.8 Introduction to Multiprocessor Network Topologies | 526 | (3)
6.9 Communicating to the Outside World: Cluster Networking | 529 | (1)
6.10 Multiprocessor Benchmarks and Performance Models | 530 | (10)
6.11 Real Stuff: Benchmarking and Rooflines of the Intel Core i7 960 and the NVIDIA Tesla GPU | 540 | (5)
6.12 Going Faster: Multiple Processors and Matrix Multiply | 545 | (3)
6.13 Fallacies and Pitfalls | 548 | (2)
| 550 | (3)
6.15 Historical Perspective and Further Reading | 553 | (1)
| 553
A The Basics of Logic Design | 2
| 3 | (1)
A.2 Gates, Truth Tables, and Logic Equations | 4 | (5)
| 9 | (11)
A.4 Using a Hardware Description Language | 20 | (6)
A.5 Constructing a Basic Arithmetic Logic Unit | 26 | (11)
A.6 Faster Addition: Carry Lookahead | 37 | (10)
| 47 | (2)
A.8 Memory Elements: Flip-Flops, Latches, and Registers | 49 | (8)
A.9 Memory Elements: SRAMs and DRAMs | 57 | (9)
A.10 Finite-State Machines | 66 | (5)
A.11 Timing Methodologies | 71 | (6)
A.12 Field Programmable Devices | 77 | (1)
| 78 | (1)
| 79
Index | 1