Foreword |
|
ix | |
Preface |
|
xv | |
Acknowledgments |
|
xxiii | |
|
Chapter 1 Fundamentals of Quantitative Design and Analysis |
|
|
|
|
2 | (3) |
|
|
5 | (6) |
|
1.3 Defining Computer Architecture |
|
|
11 | (6) |
|
|
17 | (4) |
|
1.5 Trends in Power and Energy in Integrated Circuits |
|
|
21 | (6) |
|
|
27 | (6) |
|
|
33 | (3) |
|
1.8 Measuring, Reporting, and Summarizing Performance |
|
|
36 | (8) |
|
1.9 Quantitative Principles of Computer Design |
|
|
44 | (8) |
|
1.10 Putting It All Together: Performance, Price, and Power |
|
|
52 | (3) |
|
1.11 Fallacies and Pitfalls |
|
|
55 | (4) |
|
|
59 | (2) |
|
1.13 Historical Perspectives and References |
|
|
61 | (11) |
|
Case Studies and Exercises by Diana Franklin |
|
|
61 | (11) |
|
Chapter 2 Memory Hierarchy Design |
|
|
|
|
72 | (6) |
|
2.2 Ten Advanced Optimizations of Cache Performance |
|
|
78 | (18) |
|
2.3 Memory Technology and Optimizations |
|
|
96 | (9) |
|
2.4 Protection: Virtual Memory and Virtual Machines |
|
|
105 | (7) |
|
2.5 Crosscutting Issues: The Design of Memory Hierarchies |
|
|
112 | (1) |
|
2.6 Putting It All Together: Memory Hierarchies in the ARM Cortex-A8 and Intel Core i7 |
|
|
113 | (12) |
|
2.7 Fallacies and Pitfalls |
|
|
125 | (4) |
|
2.8 Concluding Remarks: Looking Ahead |
|
|
129 | (2) |
|
2.9 Historical Perspective and References |
|
|
131 | (17) |
|
Case Studies and Exercises by Norman P. Jouppi, Naveen Muralimanohar, and Sheng Li |
|
|
131 | (17) |
|
Chapter 3 Instruction-Level Parallelism and Its Exploitation |
|
|
|
3.1 Instruction-Level Parallelism: Concepts and Challenges |
|
|
148 | (8) |
|
3.2 Basic Compiler Techniques for Exposing ILP |
|
|
156 | (6) |
|
3.3 Reducing Branch Costs with Advanced Branch Prediction |
|
|
162 | (5) |
|
3.4 Overcoming Data Hazards with Dynamic Scheduling |
|
|
167 | (9) |
|
3.5 Dynamic Scheduling: Examples and the Algorithm |
|
|
176 | (7) |
|
3.6 Hardware-Based Speculation |
|
|
183 | (9) |
|
3.7 Exploiting ILP Using Multiple Issue and Static Scheduling |
|
|
192 | (5) |
|
3.8 Exploiting ILP Using Dynamic Scheduling, Multiple Issue, and Speculation |
|
|
197 | (5) |
|
3.9 Advanced Techniques for Instruction Delivery and Speculation |
|
|
202 | (11) |
|
3.10 Studies of the Limitations of ILP |
|
|
213 | (8) |
|
3.11 Cross-Cutting Issues: ILP Approaches and the Memory System |
|
|
221 | (2) |
|
3.12 Multithreading: Exploiting Thread-Level Parallelism to Improve Uniprocessor Throughput |
|
|
223 | (10) |
|
3.13 Putting It All Together: The Intel Core i7 and ARM Cortex-A8 |
|
|
233 | (8) |
|
3.14 Fallacies and Pitfalls |
|
|
241 | (4) |
|
3.15 Concluding Remarks: What's Ahead? |
|
|
245 | (2) |
|
3.16 Historical Perspective and References |
|
|
247 | (15) |
|
Case Studies and Exercises |
|
|
247 | (15) |
|
|
|
Chapter 4 Data-Level Parallelism in Vector, SIMD, and GPU Architectures |
|
|
|
|
262 | (2) |
|
|
264 | (18) |
|
4.3 SIMD Instruction Set Extensions for Multimedia |
|
|
282 | (6) |
|
4.4 Graphics Processing Units |
|
|
288 | (27) |
|
4.5 Detecting and Enhancing Loop-Level Parallelism |
|
|
315 | (7) |
|
|
322 | (1) |
|
4.7 Putting It All Together: Mobile versus Server GPUs and Tesla versus Core i7 |
|
|
323 | (7) |
|
4.8 Fallacies and Pitfalls |
|
|
330 | (2) |
|
|
332 | (2) |
|
4.10 Historical Perspective and References |
|
|
334 | (10) |
|
|
334 | (10) |
|
|
Chapter 5 Thread-Level Parallelism |
|
|
|
|
344 | (7) |
|
5.2 Centralized Shared-Memory Architectures |
|
|
351 | (15) |
|
5.3 Performance of Symmetric Shared-Memory Multiprocessors |
|
|
366 | (12) |
|
5.4 Distributed Shared-Memory and Directory-Based Coherence |
|
|
378 | (8) |
|
5.5 Synchronization: The Basics |
|
|
386 | (6) |
|
5.6 Models of Memory Consistency: An Introduction |
|
|
392 | (3) |
|
|
395 | (5) |
|
5.8 Putting It All Together: Multicore Processors and Their Performance |
|
|
400 | (5) |
|
5.9 Fallacies and Pitfalls |
|
|
405 | (4) |
|
|
409 | (3) |
|
5.11 Historical Perspectives and References |
|
|
412 | (20) |
|
Case Studies and Exercises |
|
|
412 | (20) |
|
|
|
Chapter 6 Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism |
|
|
|
|
432 | (4) |
|
6.2 Programming Models and Workloads for Warehouse-Scale Computers |
|
|
436 | (5) |
|
6.3 Computer Architecture of Warehouse-Scale Computers |
|
|
441 | (5) |
|
6.4 Physical Infrastructure and Costs of Warehouse-Scale Computers |
|
|
446 | (9) |
|
6.5 Cloud Computing: The Return of Utility Computing |
|
|
455 | (6) |
|
|
461 | (3) |
|
6.7 Putting It All Together: A Google Warehouse-Scale Computer |
|
|
464 | (7) |
|
6.8 Fallacies and Pitfalls |
|
|
471 | (4) |
|
|
475 | (1) |
|
6.10 Historical Perspectives and References |
|
|
476 | |
|
Case Studies and Exercises by Parthasarathy Ranganathan |
|
|
476 | |
|
|
|
Appendix A Instruction Set Principles |
|
|
|
|
2 | (1) |
|
A.2 Classifying Instruction Set Architectures |
|
|
3 | (4) |
|
|
7 | (6) |
|
A.4 Type and Size of Operands |
|
|
13 | (1) |
|
A.5 Operations in the Instruction Set |
|
|
14 | (2) |
|
A.6 Instructions for Control Flow |
|
|
16 | (5) |
|
A.7 Encoding an Instruction Set |
|
|
21 | (3) |
|
A.8 Crosscutting Issues: The Role of Compilers |
|
|
24 | (8) |
|
A.9 Putting It All Together: The MIPS Architecture |
|
|
32 | (7) |
|
A.10 Fallacies and Pitfalls |
|
|
39 | (6) |
|
|
45 | (2) |
|
A.12 Historical Perspective and References |
|
|
47 | |
|
|
47 | |
|
|
Appendix B Review of Memory Hierarchy |
|
|
|
|
2 | (14) |
|
|
16 | (6) |
|
B.3 Six Basic Cache Optimizations |
|
|
22 | (18) |
|
|
40 | (9) |
|
B.5 Protection and Examples of Virtual Memory |
|
|
49 | (8) |
|
B.6 Fallacies and Pitfalls |
|
|
57 | (2) |
|
|
59 | (1) |
|
B.8 Historical Perspective and References |
|
|
59 | |
|
|
60 | |
|
|
Appendix C Pipelining: Basic and Intermediate Concepts |
|
|
|
|
2 | (9) |
|
C.2 The Major Hurdle of Pipelining---Pipeline Hazards |
|
|
11 | (19) |
|
C.3 How Is Pipelining Implemented? |
|
|
30 | (13) |
|
C.4 What Makes Pipelining Hard to Implement? |
|
|
43 | (8) |
|
C.5 Extending the MIPS Pipeline to Handle Multicycle Operations |
|
|
51 | (10) |
|
C.6 Putting It All Together: The MIPS R4000 Pipeline |
|
|
61 | (9) |
|
|
70 | (10) |
|
C.8 Fallacies and Pitfalls |
|
|
80 | (1) |
|
|
81 | (1) |
|
C.10 Historical Perspective and References |
|
|
81 | |
|
|
82 | |
|
|
|
|
Appendix D Storage Systems |
|
|
|
Appendix E Embedded Systems |
|
|
|
|
Appendix F Interconnection Networks |
|
|
|
|
|
Appendix G Vector Processors in More Depth |
|
|
|
|
Appendix H Hardware and Software for VLIW and EPIC |
|
|
|
Appendix I Large-Scale Multiprocessors and Scientific Applications |
|
|
|
Appendix J Computer Arithmetic |
|
|
|
|
Appendix K Survey of Instruction Set Architectures |
|
|
|
Appendix L Historical Perspectives and References |
|
|
References |
|
1 | (1) |
Index |
|
1 | |