|
Experiences with Shared Virtual Memory on System Area Network Clusters: System Simulation, Implementation, and Emulation |
|
|
1 | (50) |
|
|
1 | (2) |
|
Overall Methodology and Results |
|
|
3 | (4) |
|
|
3 | (1) |
|
|
4 | (2) |
|
|
6 | (1) |
|
|
7 | (10) |
|
|
7 | (2) |
|
|
9 | (1) |
|
|
10 | (1) |
|
Effects of Communication Parameters |
|
|
11 | (3) |
|
Limitations on Application Performance |
|
|
14 | (2) |
|
|
16 | (1) |
|
|
17 | (9) |
|
Network Interface and SVM Protocol Extensions |
|
|
18 | (2) |
|
|
20 | (1) |
|
|
21 | (1) |
|
|
21 | (3) |
|
|
24 | (2) |
|
|
26 | (10) |
|
|
26 | (4) |
|
Impact of Fast Interconnection Networks |
|
|
30 | (2) |
|
Impact of Wide, CC-NUMA Nodes |
|
|
32 | (4) |
|
Discussion on Methodology |
|
|
36 | (1) |
|
|
37 | (2) |
|
|
39 | (2) |
|
|
41 | (1) |
|
|
41 | (10) |
|
Average-Case Scalability Analysis of Parallel Computations |
|
|
51 | (45) |
|
|
51 | (3) |
|
|
51 | (1) |
|
|
52 | (2) |
|
Scalability of Parallel Computations |
|
|
54 | (4) |
|
Isoefficiency Scalability |
|
|
54 | (2) |
|
|
56 | (1) |
|
|
57 | (1) |
|
Task Precedence Graphs on Multiprocessors |
|
|
58 | (15) |
|
|
58 | (1) |
|
|
59 | (4) |
|
|
63 | (1) |
|
|
63 | (1) |
|
|
64 | (3) |
|
|
67 | (2) |
|
|
69 | (2) |
|
|
71 | (2) |
|
Task Interaction Graphs on Multicomputers |
|
|
73 | (18) |
|
|
73 | (1) |
|
Symmetric Static Networks |
|
|
74 | (5) |
|
Completely Connected Networks |
|
|
79 | (1) |
|
|
80 | (1) |
|
Meshes with Wraparound Connections (Tori) |
|
|
81 | (2) |
|
|
83 | (2) |
|
|
85 | (5) |
|
Comparison and Discussion |
|
|
90 | (1) |
|
|
91 | (1) |
|
|
91 | (2) |
|
|
93 | (3) |
|
Parallel I/O Prefetching and Caching |
|
|
96 | (56) |
|
|
96 | (11) |
|
Improving I/O Performance with Parallel I/O |
|
|
97 | (1) |
|
Overview of Parallel I/O Organizations |
|
|
98 | (1) |
|
|
99 | (2) |
|
Simplifying the Problem: Read-Once Accesses |
|
|
101 | (2) |
|
Caching: Dealing with Repeated Accesses |
|
|
103 | (1) |
|
An Application to Real-Time Video Scheduling |
|
|
104 | (2) |
|
Organization of This Chapter |
|
|
106 | (1) |
|
Formalizing the Scheduling Problem |
|
|
107 | (2) |
|
Model of Parallel I/O System |
|
|
107 | (1) |
|
The Parallel I/O Scheduling Problem |
|
|
108 | (1) |
|
Offline Prefetching and Caching |
|
|
109 | (8) |
|
|
109 | (2) |
|
|
111 | (6) |
|
Online Prefetching and Caching |
|
|
117 | (30) |
|
|
118 | (1) |
|
|
119 | (1) |
|
|
120 | (10) |
|
|
130 | (11) |
|
|
141 | (6) |
|
|
147 | (1) |
|
|
148 | (4) |
|
A C++/Tuple-Lock Implementation for Distributed Objects |
|
|
152 | (30) |
|
|
152 | (2) |
|
|
153 | (1) |
|
|
154 | (1) |
|
|
154 | (3) |
|
|
157 | (1) |
|
|
157 | (3) |
|
Design and Implementation |
|
|
160 | (8) |
|
|
160 | (1) |
|
|
160 | (1) |
|
|
161 | (3) |
|
|
164 | (1) |
|
|
164 | (1) |
|
|
165 | (3) |
|
|
168 | (7) |
|
|
169 | (1) |
|
|
169 | (3) |
|
|
172 | (3) |
|
|
175 | (3) |
|
|
176 | (1) |
|
|
176 | (2) |
|
Advantages and Disadvantages |
|
|
178 | (1) |
|
|
179 | (1) |
|
|
179 | (3) |
|
Static Data Allocation and Load Balancing Techniques for Heterogeneous Systems |
|
|
182 | (43) |
|
|
182 | (1) |
|
Static Data Allocation Strategies for Linear Arrays |
|
|
183 | (10) |
|
|
184 | (1) |
|
Distributing Independent Chunks |
|
|
185 | (3) |
|
Finite-Difference Computations |
|
|
188 | (1) |
|
|
189 | (4) |
|
Static Data Allocation Strategies for Geometric Problems |
|
|
193 | (13) |
|
|
193 | (3) |
|
The Heterogeneous MM Optimization Problem |
|
|
196 | (7) |
|
|
203 | (3) |
|
|
206 | (8) |
|
|
207 | (2) |
|
With Initial and Final Communications |
|
|
209 | (1) |
|
Solution With an Initial Scattering of Data |
|
|
210 | (1) |
|
Solution With Initial and Final Communications |
|
|
211 | (3) |
|
|
214 | (1) |
|
|
214 | (1) |
|
|
215 | (10) |
|
Building a Global Object Space for Supporting Single System Image on a Cluster |
|
|
225 | (36) |
|
|
225 | (1) |
|
|
226 | (5) |
|
|
227 | (1) |
|
JESSICA System Architecture |
|
|
228 | (2) |
|
Preemptive Thread Migration |
|
|
230 | (1) |
|
|
231 | (1) |
|
Design Issues of the Global Object Space |
|
|
231 | (7) |
|
|
231 | (2) |
|
GOS Initialization and Object Allocation |
|
|
233 | (1) |
|
|
234 | (1) |
|
Criteria for an Efficient GOS |
|
|
234 | (4) |
|
Factors Contributing to GOS Efficiency |
|
|
238 | (13) |
|
Memory Consistency Models |
|
|
238 | (5) |
|
|
243 | (6) |
|
Implementation Optimizations |
|
|
249 | (2) |
|
|
251 | (3) |
|
|
254 | (1) |
|
|
255 | (2) |
|
|
257 | (1) |
|
|
257 | (1) |
|
|
257 | (4) |
|
A Computation-Centric Multilocation Consistency Model for Shared Memory |
|
|
261 | (20) |
|
|
261 | (3) |
|
Computation-Centric Framework |
|
|
264 | (2) |
|
|
266 | (3) |
|
Weakening MLC: The MLC Hierarchy |
|
|
269 | (1) |
|
Backer is Multilocation Consistent |
|
|
270 | (2) |
|
Strengthening MLC: Acquires and Releases |
|
|
272 | (1) |
|
|
273 | (1) |
|
|
273 | (2) |
|
|
275 | (6) |
|
Proof of Acyclicity Theorem (i) |
|
|
277 | (1) |
|
Proof of Reasonableness Theorem |
|
|
278 | (2) |
|
|
280 | (1) |
|
|
281 | |