Preface | xv
Acknowledgments | xix
Suggested ways to teach the art of multiprocessor programming | xxi
|
|
Chapter 1 Introduction | 1 | (20)
1.1 Shared objects and synchronization | 3 | (3)
| 6 | (3)
1.2.1 Properties of a mutual exclusion protocol | 8 | (1)
| 9 | (1)
1.3 The producer-consumer problem | 9 | (2)
1.4 The readers-writers problem | 11 | (1)
1.5 The harsh realities of parallelization | 12 | (2)
| 14 | (1)
| 15 | (1)
| 15 | (6)
|
|
|
Chapter 2 Mutual exclusion | 21 | (28)
| 21 | (1)
| 22 | (3)
| 25 | (4)
| 25 | (1)
| 26 | (1)
| 27 | (2)
| 29 | (1)
| 30 | (3)
| 33 | (1)
2.7 Lamport's Bakery algorithm | 34 | (1)
| 35 | (4)
2.9 Lower bounds on the number of locations | 39 | (2)
| 41 | (1)
| 42 | (7)
|
Chapter 3 Concurrent objects | 49 | (26)
3.1 Concurrency and correctness | 49 | (3)
| 52 | (1)
3.3 Sequential consistency | 53 | (5)
3.3.1 Sequential consistency versus real-time order | 55 | (1)
3.3.2 Sequential consistency is nonblocking | 56 | (1)
| 57 | (1)
| 58 | (1)
3.4.1 Linearization points | 58 | (1)
3.4.2 Linearizability versus sequential consistency | 59 | (1)
3.5 Quiescent consistency | 59 | (1)
3.5.1 Properties of quiescent consistency | 60 | (1)
| 60 | (4)
| 60 | (1)
| 61 | (2)
3.6.3 Linearizability is compositional | 63 | (1)
3.6.4 Linearizability is nonblocking | 63 | (1)
3.7 Memory consistency models | 64 | (1)
| 64 | (4)
| 65 | (1)
| 65 | (1)
3.8.3 Obstruction-freedom | 66 | (1)
3.8.4 Blocking progress conditions | 67 | (1)
3.8.5 Characterizing progress conditions | 67 | (1)
| 68 | (1)
| 69 | (1)
| 70 | (5)
|
Chapter 4 Foundations of shared memory | 75 | (28)
4.1 The space of registers | 76 | (5)
4.2 Register constructions | 81 | (11)
4.2.1 Safe MRSW registers | 82 | (1)
4.2.2 A regular Boolean MRSW register | 83 | (1)
4.2.3 A regular M-valued MRSW register | 84 | (1)
4.2.4 An atomic SRSW register | 85 | (2)
4.2.5 An atomic MRSW register | 87 | (3)
4.2.6 An atomic MRMW register | 90 | (2)
| 92 | (6)
4.3.1 An obstruction-free snapshot | 92 | (1)
4.3.2 A wait-free snapshot | 93 | (4)
4.3.3 Correctness arguments | 97 | (1)
| 98 | (1)
| 99 | (4)
|
Chapter 5 The relative power of primitive synchronization operations | 103 | (26)
| 104 | (2)
| 105 | (1)
| 106 | (3)
| 109 | (1)
| 110 | (3)
5.5 Multiple assignment objects | 113 | (3)
5.6 Read-modify-write operations | 116 | (1)
5.7 Common2 RMW operations | 117 | (2)
5.8 The compareAndSet operation | 119 | (1)
| 120 | (1)
| 121 | (8)
|
Chapter 6 Universality of consensus | 129 | (18)
| 129 | (1)
| 130 | (1)
6.3 A lock-free universal construction | 130 | (4)
6.4 A wait-free universal construction | 134 | (6)
| 140 | (1)
| 141 | (6)
|
|
|
Chapter 7 Spin locks and contention | 147 | (36)
7.1 Welcome to the real world | 147 | (3)
7.2 Volatile fields and atomic objects | 150 | (1)
| 150 | (4)
| 154 | (2)
| 156 | (7)
| 156 | (3)
| 159 | (2)
| 161 | (2)
7.6 A queue lock with timeouts | 163 | (3)
| 166 | (5)
7.7.1 A hierarchical back-off lock | 167 | (1)
| 167 | (3)
7.7.3 A cohort lock implementation | 170 | (1)
| 171 | (7)
7.9 A fast path for threads running alone | 178 | (2)
7.10 One lock to rule them all | 180 | (1)
| 180 | (1)
| 181 | (2)
|
Chapter 8 Monitors and blocking synchronization | 183 | (18)
| 183 | (1)
8.2 Monitor locks and conditions | 183 | (6)
| 185 | (2)
8.2.2 The lost-wakeup problem | 187 | (2)
8.3 Readers-writers locks | 189 | (5)
8.3.1 Simple readers-writers lock | 190 | (2)
8.3.2 Fair readers-writers lock | 192 | (2)
8.4 Our own reentrant lock | 194 | (1)
| 194 | (2)
| 196 | (1)
| 197 | (4)
|
Chapter 9 Linked lists: The role of locking | 201 | (28)
| 201 | (1)
| 202 | (2)
| 204 | (2)
9.4 Coarse-grained synchronization | 206 | (1)
9.5 Fine-grained synchronization | 207 | (4)
9.6 Optimistic synchronization | 211 | (4)
| 215 | (5)
9.8 Nonblocking synchronization | 220 | (5)
| 225 | (1)
| 226 | (1)
| 226 | (3)
|
Chapter 10 Queues, memory management, and the ABA problem | 229 | (22)
| 229 | (1)
| 230 | (1)
10.3 A bounded partial queue | 230 | (5)
10.4 An unbounded total queue | 235 | (1)
10.5 A lock-free unbounded queue | 236 | (4)
10.6 Memory reclamation and the ABA problem | 240 | (4)
10.6.1 A naive synchronous queue | 244 | (1)
10.7 Dual data structures | 244 | (4)
| 248 | (1)
| 248 | (3)
|
Chapter 11 Stacks and elimination | 251 | (14)
| 251 | (1)
11.2 An unbounded lock-free stack | 251 | (3)
| 254 | (1)
11.4 The elimination back-off stack | 255 | (5)
11.4.1 A lock-free exchanger | 255 | (2)
11.4.2 The elimination array | 257 | (3)
| 260 | (1)
| 261 | (4)
|
Chapter 12 Counting, sorting, and distributed coordination | 265 | (40)
| 265 | (1)
| 265 | (1)
| 266 | (10)
| 267 | (7)
12.3.2 An extended example | 274 | (1)
12.3.3 Performance and robustness | 275 | (1)
12.4 Quiescently consistent pools and counters | 276 | (1)
| 276 | (12)
12.5.1 Networks that count | 276 | (3)
12.5.2 The bitonic counting network | 279 | (8)
12.5.3 Performance and pipelining | 287 | (1)
| 288 | (4)
| 292 | (1)
| 293 | (3)
12.8.1 Designing a sorting network | 294 | (2)
| 296 | (2)
12.10 Distributed coordination | 298 | (1)
| 299 | (1)
| 300 | (5)
|
Chapter 13 Concurrent hashing and natural parallelism | 305 | (30)
| 305 | (1)
13.2 Closed-address hash sets | 306 | (9)
13.2.1 A coarse-grained hash set | 308 | (2)
13.2.2 A striped hash set | 310 | (1)
13.2.3 A refinable hash set | 311 | (4)
13.3 A lock-free hash set | 315 | (8)
13.3.1 Recursive split-ordering | 315 | (3)
13.3.2 The BucketList class | 318 | (1)
13.3.3 The LockFreeHashSet<T> class | 319 | (4)
13.4 An open-address hash set | 323 | (9)
| 323 | (1)
13.4.2 Concurrent cuckoo hashing | 324 | (5)
13.4.3 Striped concurrent cuckoo hashing | 329 | (2)
13.4.4 A refinable concurrent cuckoo hash set | 331 | (1)
| 332 | (2)
| 334 | (1)
|
Chapter 14 Skiplists and balanced search | 335 | (24)
| 335 | (1)
14.2 Sequential skiplists | 335 | (2)
14.3 A lock-based concurrent skiplist | 337 | (8)
| 337 | (2)
| 339 | (6)
14.4 A lock-free concurrent skiplist | 345 | (10)
| 345 | (3)
14.4.2 The algorithm in detail | 348 | (7)
14.5 Concurrent skiplists | 355 | (1)
| 356 | (1)
| 356 | (3)
|
Chapter 15 Priority queues | 359 | (18)
| 359 | (1)
15.1.1 Concurrent priority queues | 359 | (1)
15.2 An array-based bounded priority queue | 360 | (1)
15.3 A tree-based bounded priority queue | 361 | (2)
15.4 An unbounded heap-based priority queue | 363 | (8)
| 363 | (2)
| 365 | (6)
15.5 A skiplist-based unbounded priority queue | 371 | (3)
| 374 | (1)
| 375 | (2)
|
Chapter 16 Scheduling and work distribution | 377 | (28)
| 377 | (7)
16.2 Analyzing parallelism | 384 | (3)
16.3 Realistic multiprocessor scheduling | 387 | (2)
| 389 | (1)
| 389 | (1)
16.4.2 Yielding and multiprogramming | 390 | (1)
16.5 Work-stealing deques | 390 | (10)
16.5.1 A bounded work-stealing deque | 391 | (4)
16.5.2 An unbounded work-stealing deque | 395 | (2)
| 397 | (3)
| 400 | (1)
| 401 | (4)
|
Chapter 17 Data parallelism | 405 | (26)
| 407 | (7)
17.1.1 The MapReduce framework | 408 | (2)
17.1.2 A MapReduce-based WordCount application | 410 | (1)
17.1.3 A MapReduce-based KMeans application | 411 | (1)
17.1.4 The MapReduce implementation | 411 | (3)
| 414 | (8)
17.2.1 A stream-based WordCount application | 416 | (1)
17.2.2 A stream-based KMeans application | 417 | (2)
17.2.3 Making aggregate operations parallel | 419 | (3)
| 422 | (1)
| 423 | (8)
|
|
Chapter 18 Barriers | 431 | (20)
| 431 | (1)
18.2 Barrier implementations | 432 | (1)
18.3 Sense reversing barrier | 433 | (1)
18.4 Combining tree barrier | 434 | (2)
| 436 | (2)
18.6 Termination detection barriers | 438 | (4)
| 442 | (1)
| 443 | (8)
|
Chapter 19 Optimism and manual memory management | 451 | (16)
19.1 Transitioning from Java to C++ | 451 | (1)
19.2 Optimism and explicit reclamation | 451 | (3)
19.3 Protecting pending operations | 454 | (1)
19.4 An object for managing memory | 455 | (1)
| 456 | (2)
| 458 | (4)
19.7 Epoch-based reclamation | 462 | (3)
| 465 | (1)
| 466 | (1)
|
Chapter 20 Transactional programming | 467 | (30)
20.1 Challenges in concurrent programming | 467 | (5)
20.1.1 Problems with locking | 467 | (1)
20.1.2 Problems with explicit speculation | 468 | (2)
20.1.3 Problems with nonblocking algorithms | 470 | (1)
20.1.4 Problems with compositionality | 471 | (1)
| 472 | (1)
20.2 Transactional programming | 472 | (3)
20.2.1 An example of transactional programming | 473 | (2)
20.3 Hardware support for transactional programming | 475 | (3)
20.3.1 Hardware speculation | 475 | (1)
20.3.2 Basic cache coherence | 475 | (1)
20.3.3 Transactional cache coherence | 476 | (1)
20.3.4 Limitations of hardware support | 477 | (1)
20.4 Transactional lock elision | 478 | (3)
| 480 | (1)
20.5 Transactional memory | 481 | (2)
20.5.1 Run-time scheduling | 482 | (1)
20.5.2 Explicit self-abort | 483 | (1)
20.6 Software transactions | 483 | (9)
20.6.1 Transactions with ownership records | 485 | (5)
20.6.2 Transactions with value-based validation | 490 | (2)
20.7 Combining hardware and software transactions | 492 | (1)
20.8 Transactional data structure design | 493 | (1)
| 494 | (1)
| 494 | (3)
|
Appendix A Software basics | 497 | (22)
| 497 | (1)
| 497 | (7)
| 497 | (1)
| 498 | (3)
A.2.3 Yielding and sleeping | 501 | (1)
A.2.4 Thread-local objects | 502 | (1)
| 503 | (1)
A.3 The Java memory model | 504 | (4)
A.3.1 Locks and synchronized blocks | 505 | (1)
| 506 | (1)
| 506 | (2)
| 508 | (6)
| 508 | (1)
| 509 | (1)
A.4.3 Condition variables | 510 | (2)
| 512 | (1)
A.4.5 Thread-local storage | 513 | (1)
| 514 | (4)
| 514 | (1)
| 515 | (2)
A.5.3 Thread-local objects | 517 | (1)
| 518 | (1)
|
Appendix B Hardware basics | 519 | (14)
B.1 Introduction (and a puzzle) | 519 | (3)
B.2 Processors and threads | 522 | (1)
| 522 | (1)
| 523 | (1)
| 523 | (3)
| 524 | (2)
| 526 | (1)
B.6 Cache-conscious programming, or the puzzle solved | 526 | (1)
B.7 Multicore and multithreaded architectures | 527 | (2)
B.7.1 Relaxed memory consistency | 528 | (1)
B.8 Hardware synchronization instructions | 529 | (1)
| 530 | (1)
| 531 | (2)

Bibliography | 533 | (8)
Index | 541