Acknowledgments |
|
xvii | |
Preface |
|
xxi | |
Suggested Ways to Teach the Art of Multiprocessor Programming |
|
xxiii | |
|
|
1 | (18) |
|
1.1 Shared Objects and Synchronization |
|
|
3 | (3) |
|
|
6 | (4) |
|
1.2.1 Properties of Mutual Exclusion |
|
|
8 | (1) |
|
|
9 | (1) |
|
1.3 The Producer-Consumer Problem |
|
|
10 | (2) |
|
1.4 The Readers-Writers Problem |
|
|
12 | (1) |
|
1.5 The Harsh Realities of Parallelization |
|
|
13 | (2) |
|
|
15 | (1) |
|
|
15 | (1) |
|
|
16 | (3) |
|
|
19 | (120) |
|
|
21 | (24) |
|
|
21 | (1) |
|
|
22 | (6) |
|
|
24 | (1) |
|
|
25 | (1) |
|
|
26 | (1) |
|
|
27 | (1) |
|
|
28 | (3) |
|
|
31 | (1) |
|
2.6 Lamport's Bakery Algorithm |
|
|
31 | (2) |
|
|
33 | (4) |
|
2.8 Lower Bounds on the Number of Locations |
|
|
37 | (3) |
|
|
40 | (1) |
|
|
41 | (4) |
|
|
45 | (26) |
|
3.1 Concurrency and Correctness |
|
|
45 | (3) |
|
|
48 | (1) |
|
3.3 Quiescent Consistency |
|
|
49 | (2) |
|
|
51 | (1) |
|
3.4 Sequential Consistency |
|
|
51 | (3) |
|
|
52 | (2) |
|
|
54 | (1) |
|
3.5.1 Linearization Points |
|
|
55 | (1) |
|
|
55 | (1) |
|
|
55 | (4) |
|
|
57 | (1) |
|
3.6.2 Compositional Linearizability |
|
|
57 | (1) |
|
3.6.3 The Nonblocking Property |
|
|
58 | (1) |
|
|
59 | (2) |
|
3.7.1 Dependent Progress Conditions |
|
|
60 | (1) |
|
3.8 The Java Memory Model |
|
|
61 | (3) |
|
3.8.1 Locks and Synchronized Blocks |
|
|
62 | (1) |
|
|
63 | (1) |
|
|
63 | (1) |
|
|
64 | (1) |
|
|
65 | (1) |
|
|
66 | (5) |
|
4 Foundations of Shared Memory |
|
|
71 | (28) |
|
4.1 The Space of Registers |
|
|
72 | (5) |
|
4.2 Register Constructions |
|
|
77 | (10) |
|
4.2.1 MRSW Safe Registers |
|
|
78 | (1) |
|
4.2.2 A Regular Boolean MRSW Register |
|
|
78 | (1) |
|
4.2.3 A Regular M-Valued MRSW Register |
|
|
79 | (2) |
|
4.2.4 An Atomic SRSW Register |
|
|
81 | (1) |
|
4.2.5 An Atomic MRSW Register |
|
|
82 | (3) |
|
4.2.6 An Atomic MRMW Register |
|
|
85 | (2) |
|
|
87 | (6) |
|
4.3.1 An Obstruction-Free Snapshot |
|
|
87 | (1) |
|
4.3.2 A Wait-Free Snapshot |
|
|
88 | (2) |
|
4.3.3 Correctness Arguments |
|
|
90 | (3) |
|
|
93 | (1) |
|
|
94 | (5) |
|
5 The Relative Power of Primitive Synchronization Operations |
|
|
99 | (26) |
|
|
100 | (3) |
|
|
101 | (2) |
|
|
103 | (3) |
|
|
106 | (1) |
|
|
106 | (4) |
|
5.5 Multiple Assignment Objects |
|
|
110 | (2) |
|
5.6 Read-Modify-Write Operations |
|
|
112 | (2) |
|
5.7 Common2 RMW Operations |
|
|
114 | (2) |
|
5.8 The compareAndSet() Operation |
|
|
116 | (1) |
|
|
117 | (1) |
|
|
118 | (7) |
|
6 Universality of Consensus |
|
|
125 | (14) |
|
|
125 | (1) |
|
|
126 | (1) |
|
6.3 A Lock-Free Universal Construction |
|
|
126 | (4) |
|
6.4 A Wait-Free Universal Construction |
|
|
130 | (6) |
|
|
136 | (1) |
|
|
137 | (2) |
|
|
139 | (312) |
|
7 Spin Locks and Contention |
|
|
141 | (36) |
|
7.1 Welcome to the Real World |
|
|
141 | (3) |
|
|
144 | (2) |
|
7.3 TAS-Based Spin Locks Revisited |
|
|
146 | (1) |
|
|
147 | (2) |
|
|
149 | (8) |
|
|
150 | (1) |
|
|
151 | (3) |
|
|
154 | (3) |
|
7.6 A Queue Lock with Timeouts |
|
|
157 | (2) |
|
|
159 | (8) |
|
7.7.1 A Fast-Path Composite Lock |
|
|
165 | (2) |
|
|
167 | (6) |
|
7.8.1 A Hierarchical Backoff Lock |
|
|
167 | (1) |
|
7.8.2 A Hierarchical CLH Queue Lock |
|
|
168 | (5) |
|
7.9 One Lock To Rule Them All |
|
|
173 | (1) |
|
|
173 | (1) |
|
|
174 | (3) |
|
8 Monitors and Blocking Synchronization |
|
|
177 | (18) |
|
|
177 | (1) |
|
8.2 Monitor Locks and Conditions |
|
|
178 | (5) |
|
|
179 | (2) |
|
8.2.2 The Lost-Wakeup Problem |
|
|
181 | (2) |
|
8.3 Readers-Writers Locks |
|
|
183 | (4) |
|
8.3.1 Simple Readers-Writers Lock |
|
|
184 | (1) |
|
8.3.2 Fair Readers-Writers Lock |
|
|
185 | (2) |
|
8.4 Our Own Reentrant Lock |
|
|
187 | (2) |
|
|
189 | (1) |
|
|
189 | (1) |
|
|
190 | (5) |
|
9 Linked Lists: The Role of Locking |
|
|
195 | (28) |
|
|
195 | (1) |
|
|
196 | (2) |
|
|
198 | (2) |
|
9.4 Coarse-Grained Synchronization |
|
|
200 | (1) |
|
9.5 Fine-Grained Synchronization |
|
|
201 | (4) |
|
9.6 Optimistic Synchronization |
|
|
205 | (3) |
|
|
208 | (5) |
|
9.8 Non-Blocking Synchronization |
|
|
213 | (5) |
|
|
218 | (1) |
|
|
219 | (1) |
|
|
219 | (4) |
|
10 Concurrent Queues and the ABA Problem |
|
|
223 | (22) |
|
|
223 | (1) |
|
|
224 | (1) |
|
10.3 A Bounded Partial Queue |
|
|
225 | (4) |
|
10.4 An Unbounded Total Queue |
|
|
229 | (1) |
|
10.5 An Unbounded Lock-Free Queue |
|
|
230 | (6) |
|
10.6 Memory Reclamation and the ABA Problem |
|
|
236 | (2) |
|
10.6.1 A Naive Synchronous Queue |
|
|
237 | (1) |
|
10.7 Dual Data Structures |
|
|
238 | (3) |
|
|
241 | (1) |
|
|
241 | (4) |
|
11 Concurrent Stacks and Elimination |
|
|
245 | (14) |
|
|
245 | (1) |
|
11.2 An Unbounded Lock-Free Stack |
|
|
245 | (3) |
|
|
248 | (1) |
|
11.4 The Elimination Backoff Stack |
|
|
249 | (5) |
|
11.4.1 A Lock-Free Exchanger |
|
|
249 | (2) |
|
11.4.2 The Elimination Array |
|
|
251 | (3) |
|
|
254 | (1) |
|
|
255 | (4) |
|
12 Counting, Sorting, and Distributed Coordination |
|
|
259 | (40) |
|
|
259 | (1) |
|
|
259 | (1) |
|
|
260 | (9) |
|
|
261 | (6) |
|
12.3.2 An Extended Example |
|
|
267 | (2) |
|
12.3.3 Performance and Robustness |
|
|
269 | (1) |
|
12.4 Quiescently Consistent Pools and Counters |
|
|
269 | (1) |
|
|
270 | (12) |
|
12.5.1 Networks That Count |
|
|
270 | (3) |
|
12.5.2 The Bitonic Counting Network |
|
|
273 | (7) |
|
12.5.3 Performance and Pipelining |
|
|
280 | (2) |
|
|
282 | (4) |
|
|
286 | (1) |
|
|
286 | (4) |
|
12.8.1 Designing a Sorting Network |
|
|
287 | (3) |
|
|
290 | (1) |
|
12.10 Distributed Coordination |
|
|
291 | (1) |
|
|
292 | (1) |
|
|
293 | (6) |
|
13 Concurrent Hashing and Natural Parallelism |
|
|
299 | (30) |
|
|
299 | (1) |
|
13.2 Closed-Address Hash Sets |
|
|
300 | (9) |
|
13.2.1 A Coarse-Grained Hash Set |
|
|
302 | (1) |
|
13.2.2 A Striped Hash Set |
|
|
303 | (2) |
|
13.2.3 A Refinable Hash Set |
|
|
305 | (4) |
|
13.3 A Lock-Free Hash Set |
|
|
309 | (7) |
|
13.3.1 Recursive Split-Ordering |
|
|
309 | (3) |
|
13.3.2 The Bucket List Class |
|
|
312 | (1) |
|
13.3.3 The LockFreeHashSet<T>Class |
|
|
313 | (3) |
|
13.4 An Open-Addressed Hash Set |
|
|
316 | (9) |
|
|
316 | (2) |
|
13.4.2 Concurrent Cuckoo Hashing |
|
|
318 | (4) |
|
13.4.3 Striped Concurrent Cuckoo Hashing |
|
|
322 | (2) |
|
13.4.4 A Refinable Concurrent Cuckoo Hash Set |
|
|
324 | (1) |
|
|
325 | (1) |
|
|
326 | (3) |
|
14 Skiplists and Balanced Search |
|
|
329 | (22) |
|
|
329 | (1) |
|
14.2 Sequential Skiplists |
|
|
329 | (2) |
|
14.3 A Lock-Based Concurrent Skiplist |
|
|
331 | (8) |
|
|
331 | (2) |
|
|
333 | (6) |
|
14.4 A Lock-Free Concurrent Skiplist |
|
|
339 | (9) |
|
|
339 | (2) |
|
14.4.2 The Algorithm in Detail |
|
|
341 | (7) |
|
14.5 Concurrent Skiplists |
|
|
348 | (1) |
|
|
348 | (1) |
|
|
349 | (2) |
|
|
351 | (18) |
|
|
351 | (1) |
|
15.1.1 Concurrent Priority Queues |
|
|
351 | (1) |
|
15.2 An Array-Based Bounded Priority Queue |
|
|
352 | (1) |
|
15.3 A Tree-Based Bounded Priority Queue |
|
|
353 | (2) |
|
15.4 An Unbounded Heap-Based Priority Queue |
|
|
355 | (8) |
|
|
356 | (1) |
|
|
357 | (6) |
|
15.5 A Skiplist-Based Unbounded Priority Queue |
|
|
363 | (3) |
|
|
366 | (1) |
|
|
366 | (3) |
|
16 Futures, Scheduling, and Work Distribution |
|
|
369 | (28) |
|
|
369 | (6) |
|
16.2 Analyzing Parallelism |
|
|
375 | (3) |
|
16.3 Realistic Multiprocessor Scheduling |
|
|
378 | (2) |
|
|
380 | (2) |
|
|
380 | (1) |
|
16.4.2 Yielding and Multiprogramming |
|
|
381 | (1) |
|
16.5 Work-Stealing Dequeues |
|
|
382 | (9) |
|
16.5.1 A Bounded Work-Stealing Dequeue |
|
|
382 | (4) |
|
16.5.2 An Unbounded Work-Stealing DEQueue |
|
|
386 | (3) |
|
|
389 | (2) |
|
|
391 | (1) |
|
|
392 | (5) |
|
|
397 | (20) |
|
|
397 | (1) |
|
17.2 Barrier Implementations |
|
|
398 | (1) |
|
17.3 Sense-Reversing Barrier |
|
|
399 | (1) |
|
17.4 Combining Tree Barrier |
|
|
400 | (2) |
|
|
402 | (2) |
|
17.6 Termination Detecting Barriers |
|
|
404 | (4) |
|
|
408 | (1) |
|
|
409 | (8) |
|
|
417 | (34) |
|
|
417 | (4) |
|
18.1.1 What is Wrong with Locking? |
|
|
417 | (1) |
|
18.1.2 What is Wrong with compareAndSet()? |
|
|
418 | (2) |
|
18.1.3 What is Wrong with Compositionality? |
|
|
420 | (1) |
|
18.1.4 What can We Do about It? |
|
|
421 | (1) |
|
18.2 Transactions and Atomicity |
|
|
421 | (3) |
|
18.3 Software Transactional Memory |
|
|
424 | (21) |
|
18.3.1 Transactions and Transactional Threads |
|
|
427 | (1) |
|
18.3.2 Zombies and Consistency |
|
|
428 | (1) |
|
|
429 | (2) |
|
18.3.4 Dependent or Independent Progress? |
|
|
431 | (1) |
|
18.3.5 Contention Managers |
|
|
431 | (2) |
|
18.3.6 Implementing Atomic Objects |
|
|
433 | (1) |
|
18.3.7 An Obstruction-Free Atomic Object |
|
|
434 | (4) |
|
18.3.8 A Lock-Based Atomic Object |
|
|
438 | (7) |
|
18.4 Hardware Transactional Memory |
|
|
445 | (3) |
|
|
446 | (1) |
|
18.4.2 Transactional Cache Coherence |
|
|
447 | (1) |
|
|
447 | (1) |
|
|
448 | (1) |
|
|
449 | (2) |
|
|
451 | (32) |
|
|
453 | (16) |
|
|
453 | (1) |
|
|
453 | (1) |
|
|
453 | (2) |
|
|
455 | (3) |
|
A.2.3 Yielding and Sleeping |
|
|
458 | (1) |
|
A.2.4 Thread-Local Objects |
|
|
458 | (2) |
|
|
460 | (1) |
|
|
460 | (1) |
|
|
461 | (1) |
|
A.3.3 Thread-Local Objects |
|
|
462 | (2) |
|
|
464 | (1) |
|
A.4.1 Thread-Local Storage |
|
|
465 | (1) |
|
|
466 | (3) |
|
|
469 | (14) |
|
B.1 Introduction (and a Puzzle) |
|
|
469 | (3) |
|
B.2 Processors and Threads |
|
|
472 | (1) |
|
|
472 | (1) |
|
|
473 | (1) |
|
|
473 | (1) |
|
|
474 | (2) |
|
|
476 | (1) |
|
B.6 Cache-Conscious Programming, or the Puzzle Solved |
|
|
476 | (1) |
|
B.7 Multi-Core and Multi-Threaded Architectures |
|
|
477 | (1) |
|
B.7.1 Relaxed Memory Consistency |
|
|
478 | (1) |
|
B.8 Hardware Synchronization Instructions |
|
|
479 | (2) |
|
|
481 | (1) |
|
|
481 | (2) |
Bibliography |
|
483 | (12) |
Index |
|
495 | |