Preface to the Second Edition |
|
xvii | |
Preface to the First Edition |
|
xix | |
|
1 Introduction to Consistency and Coherence |
|
|
1 | (8) |
|
1.1 Consistency (a.k.a., Memory Consistency, Memory Consistency Model, or Memory Model) |
|
|
2 | (2) |
|
1.2 Coherence (a.k.a., Cache Coherence) |
|
|
4 | (2) |
|
1.3 Consistency and Coherence for Heterogeneous Systems |
|
|
6 | (1) |
|
1.4 Specifying and Validating Memory Consistency Models and Cache Coherence |
|
|
6 | (1) |
|
1.5 A Consistency and Coherence Quiz |
|
|
6 | (1) |
|
1.6 What This Primer Does Not Do |
|
|
7 | (1) |
|
|
8 | (1) |
|
|
9 | (8) |
|
2.1 Baseline System Model |
|
|
9 | (1) |
|
2.2 The Problem: How Incoherence Could Possibly Occur |
|
|
10 | (1) |
|
2.3 The Cache Coherence Interface |
|
|
11 | (1) |
|
2.4 (Consistency-Agnostic) Coherence Invariants |
|
|
12 | (4) |
|
2.4.1 Maintaining the Coherence Invariants |
|
|
14 | (1) |
|
2.4.2 The Granularity of Coherence |
|
|
14 | (1) |
|
2.4.3 When is Coherence Relevant? |
|
|
15 | (1) |
|
|
16 | (1) |
|
3 Memory Consistency Motivation and Sequential Consistency |
|
|
17 | (22) |
|
3.1 Problems with Shared Memory Behavior |
|
|
17 | (3) |
|
3.2 What is a Memory Consistency Model? |
|
|
20 | (1) |
|
3.3 Consistency vs. Coherence |
|
|
21 | (1) |
|
3.4 Basic Idea of Sequential Consistency (SC) |
|
|
22 | (1) |
|
3.5 A Little SC Formalism |
|
|
23 | (3) |
|
3.6 Naive SC Implementations |
|
|
26 | (1) |
|
3.7 A Basic SC Implementation with Cache Coherence |
|
|
27 | (2) |
|
3.8 Optimized SC Implementations with Cache Coherence |
|
|
29 | (4) |
|
3.9 Atomic Operations with SC |
|
|
33 | (1) |
|
3.10 Putting it All Together: MIPS R10000 |
|
|
34 | (1) |
|
3.11 Further Reading Regarding SC |
|
|
35 | (1) |
|
|
36 | (3) |
|
4 Total Store Order and the x86 Memory Model |
|
|
39 | (16) |
|
4.1 Motivation for TSO/x86 |
|
|
39 | (1) |
|
4.2 Basic Idea of TSO/x86 |
|
|
40 | (2) |
|
4.3 A Little TSO/x86 Formalism |
|
|
42 | (5) |
|
|
47 | (3) |
|
4.4.1 Implementing Atomic Instructions |
|
|
47 | (1) |
|
4.4.2 Implementing Fences |
|
|
48 | (2) |
|
4.5 Further Reading Regarding TSO |
|
|
50 | (1) |
|
|
50 | (2) |
|
|
52 | (3) |
|
5 Relaxed Memory Consistency |
|
|
55 | (36) |
|
|
55 | (3) |
|
5.1.1 Opportunities to Reorder Memory Operations |
|
|
56 | (1) |
|
5.1.2 Opportunities to Exploit Reordering |
|
|
57 | (1) |
|
5.2 An Example Relaxed Consistency Model (XC) |
|
|
58 | (7) |
|
5.2.1 The Basic Idea of the XC Model |
|
|
59 | (1) |
|
5.2.2 Examples Using Fences Under XC |
|
|
60 | (1) |
|
|
60 | (2) |
|
5.2.4 Examples Showing XC Operating Correctly |
|
|
62 | (3) |
|
|
65 | (3) |
|
5.3.1 Atomic Instructions with XC |
|
|
66 | (1) |
|
|
67 | (1) |
|
|
68 | (1) |
|
5.4 Sequential Consistency for Data-Race-Free Programs |
|
|
68 | (4) |
|
5.5 Some Relaxed Model Concepts |
|
|
72 | (3) |
|
5.5.1 Release Consistency |
|
|
72 | (1) |
|
5.5.2 Causality and Write Atomicity |
|
|
73 | (2) |
|
5.6 Relaxed Memory Model Case Studies |
|
|
75 | (6) |
|
5.6.1 RISC-V Weak Memory Order (RVWMO) |
|
|
75 | (3) |
|
|
78 | (3) |
|
5.7 Further Reading and Commercial Relaxed Memory Models |
|
|
81 | (1) |
|
5.7.1 Academic Literature |
|
|
81 | (1) |
|
|
81 | (1) |
|
5.8 Comparing Memory Models |
|
|
82 | (1) |
|
5.8.1 How Do Relaxed Memory Models Relate to Each Other and TSO and SC? |
|
|
82 | (1) |
|
5.8.2 How Good Are Relaxed Models? |
|
|
82 | (1) |
|
5.9 High-Level Language Models |
|
|
83 | (3) |
|
|
86 | (5) |
|
|
91 | (16) |
|
|
91 | (2) |
|
6.2 Specifying Coherence Protocols |
|
|
93 | (1) |
|
6.3 Example of a Simple Coherence Protocol |
|
|
94 | (2) |
|
6.4 Overview of Coherence Protocol Design Space |
|
|
96 | (9) |
|
|
97 | (4) |
|
|
101 | (2) |
|
6.4.3 Major Protocol Design Options |
|
|
103 | (2) |
|
|
105 | (2) |
|
7 Snooping Coherence Protocols |
|
|
107 | (44) |
|
7.1 Introduction to Snooping |
|
|
107 | (4) |
|
7.2 Baseline Snooping Protocol |
|
|
111 | (12) |
|
7.2.1 High-Level Protocol Specification |
|
|
112 | (1) |
|
7.2.2 Simple Snooping System Model: Atomic Requests, Atomic Transactions |
|
|
113 | (4) |
|
7.2.3 Baseline Snooping System Model: Non-Atomic Requests, Atomic Transactions |
|
|
117 | (4) |
|
|
121 | (1) |
|
7.2.5 Protocol Simplifications |
|
|
122 | (1) |
|
7.3 Adding the Exclusive State |
|
|
123 | (3) |
|
|
123 | (1) |
|
7.3.2 Getting to the Exclusive State |
|
|
123 | (1) |
|
7.3.3 High-Level Specification of Protocol |
|
|
124 | (2) |
|
7.3.4 Detailed Specification |
|
|
126 | (1) |
|
|
126 | (1) |
|
7.4 Adding the Owned State |
|
|
126 | (6) |
|
|
128 | (1) |
|
7.4.2 High-Level Protocol Specification |
|
|
129 | (1) |
|
7.4.3 Detailed Protocol Specification |
|
|
130 | (2) |
|
|
132 | (1) |
|
|
132 | (10) |
|
|
132 | (1) |
|
7.5.2 In-Order vs. Out-of-Order Responses |
|
|
133 | (1) |
|
7.5.3 Non-Atomic System Model |
|
|
134 | (1) |
|
7.5.4 An MSI Protocol with a SpUt-Transaction Bus |
|
|
135 | (5) |
|
7.5.5 An Optimized, Non-Stalling MSI Protocol with a Split-Transaction Bus |
|
|
140 | (2) |
|
7.6 Optimizations to the Bus Interconnection Network |
|
|
142 | (2) |
|
7.6.1 Separate Non-Bus Network for Data Responses |
|
|
142 | (1) |
|
7.6.2 Logical Bus for Coherence Requests |
|
|
143 | (1) |
|
|
144 | (4) |
|
7.7.1 Sun Starfire E10000 |
|
|
144 | (1) |
|
|
145 | (3) |
|
7.8 Discussion and the Future of Snooping |
|
|
148 | (1) |
|
|
148 | (3) |
|
8 Directory Coherence Protocols |
|
|
151 | (40) |
|
8.1 Introduction to Directory Protocols |
|
|
151 | (2) |
|
8.2 Baseline Directory System |
|
|
153 | (9) |
|
8.2.1 Directory System Model |
|
|
153 | (1) |
|
8.2.2 High-Level Protocol Specification |
|
|
153 | (2) |
|
|
155 | (3) |
|
8.2.4 Detailed Protocol Specification |
|
|
158 | (1) |
|
|
159 | (2) |
|
8.2.6 Protocol Simplifications |
|
|
161 | (1) |
|
8.3 Adding the Exclusive State |
|
|
162 | (2) |
|
8.3.1 High-Level Protocol Specification |
|
|
162 | (2) |
|
8.3.2 Detailed Protocol Specification |
|
|
164 | (1) |
|
8.4 Adding the Owned State |
|
|
164 | (4) |
|
8.4.1 High-Level Protocol Specification |
|
|
164 | (4) |
|
8.4.2 Detailed Protocol Specification |
|
|
168 | (1) |
|
8.5 Representing Directory State |
|
|
168 | (4) |
|
|
168 | (3) |
|
8.5.2 Limited Pointer Directory |
|
|
171 | (1) |
|
8.6 Directory Organization |
|
|
172 | (5) |
|
8.6.1 Directory Cache Backed by DRAM |
|
|
173 | (1) |
|
8.6.2 Inclusive Directory Caches |
|
|
174 | (2) |
|
8.6.3 Null Directory Cache (With no Backing Store) |
|
|
176 | (1) |
|
8.7 Performance and Scalability Optimizations |
|
|
177 | (6) |
|
8.7.1 Distributed Directories |
|
|
177 | (1) |
|
8.7.2 Non-Stalling Directory Protocols |
|
|
178 | (2) |
|
8.7.3 Interconnection Networks Without Point-to-Point Ordering |
|
|
180 | (2) |
|
8.7.4 Silent vs. Non-Silent Evictions of Blocks in State S |
|
|
182 | (1) |
|
|
183 | (6) |
|
|
183 | (2) |
|
8.8.2 Coherent HyperTransport |
|
|
185 | (2) |
|
8.8.3 Hypertransport Assist |
|
|
187 | (1) |
|
|
187 | (2) |
|
8.9 Discussion and the Future of Directory Protocols |
|
|
189 | (1) |
|
|
189 | (2) |
|
9 Advanced Topics in Coherence |
|
|
191 | (20) |
|
|
191 | (7) |
|
|
191 | (1) |
|
9.1.2 Translation Lookaside Buffers (TLBs) |
|
|
192 | (1) |
|
|
192 | (1) |
|
9.1.4 Write-Through Caches |
|
|
193 | (1) |
|
9.1.5 Coherent Direct Memory Access (DMA) |
|
|
194 | (1) |
|
9.1.6 Multi-Level Caches and Hierarchical Coherence Protocols |
|
|
195 | (3) |
|
9.2 Performance Optimizations |
|
|
198 | (2) |
|
9.2.1 Migratory Sharing Optimization |
|
|
198 | (1) |
|
9.2.2 False Sharing Optimizations |
|
|
199 | (1) |
|
|
200 | (7) |
|
|
200 | (3) |
|
|
203 | (3) |
|
|
206 | (1) |
|
|
207 | (1) |
|
9.5 The Future of Coherence |
|
|
207 | (1) |
|
|
207 | (4) |
|
10 Consistency and Coherence for Heterogeneous Systems |
|
|
211 | (40) |
|
10.1 GPU Consistency and Coherence |
|
|
211 | (26) |
|
10.1.1 Early GPUs: Architecture and Programming Model |
|
|
212 | (4) |
|
10.1.2 Big Picture: GPGPU Consistency and Coherence |
|
|
216 | (1) |
|
10.1.3 Temporal Coherence |
|
|
217 | (9) |
|
10.1.4 Release Consistency-directed Coherence |
|
|
226 | (11) |
|
10.2 More Heterogeneity Than Just GPUs |
|
|
237 | (10) |
|
10.2.1 Heterogeneous Consistency Models |
|
|
237 | (3) |
|
10.2.2 Heterogeneous Coherence Protocols |
|
|
240 | (7) |
|
|
247 | (1) |
|
|
247 | (4) |
|
11 Specifying and Validating Memory Consistency Models and Cache Coherence |
|
|
251 | (22) |
|
|
251 | (9) |
|
11.1.1 Operational Specification |
|
|
252 | (4) |
|
11.1.2 Axiomatic Specification |
|
|
256 | (4) |
|
11.2 Exploring the Behavior of Memory Consistency Models |
|
|
260 | (2) |
|
|
260 | (1) |
|
|
261 | (1) |
|
11.3 Validating Implementations |
|
|
262 | (5) |
|
|
262 | (3) |
|
|
265 | (2) |
|
11.4 History and Further Reading |
|
|
267 | (1) |
|
|
267 | (6) |
Authors' Biographies |
|
273 | |