Terminology |
|
vii | |
|
|
1 | (12) |
|
1.1 Automatic Parallelization |
|
|
1 | (2) |
|
1.2 Processor Array Structures |
|
|
3 | (2) |
|
1.3 Nested Loops and Data Dependencies |
|
|
5 | (3) |
|
|
8 | (2) |
|
|
10 | (3) |
|
2 Survey and Analysis of Partitioning and Mapping |
|
|
13 | (14) |
|
|
13 | (1) |
|
2.2 Independent Partitioning |
|
|
14 | (2) |
|
2.3 General Partitioning and Projecting For Space-Time Mapping |
|
|
16 | (8) |
|
2.3.1 Projecting-Partitioning Methods |
|
|
17 | (5) |
|
2.3.2 Partitioning-Projecting Methods |
|
|
22 | (1) |
|
|
23 | (1) |
|
2.4 Lower-Dimensional Mapping |
|
|
24 | (2) |
|
|
26 | (1) |
|
3 Improvements to Some Existing Partitioning and Mapping Methods |
|
|
27 | (20) |
|
3.1 On the Maximal Independent Partitioning |
|
|
27 | (6) |
|
|
27 | (1) |
|
3.1.2 Improved Methods of Independent Partitioning |
|
|
28 | (1) |
|
|
29 | (2) |
|
3.1.4 Algorithm Generation |
|
|
31 | (2) |
|
3.2 A Synthesis Method using LSGP Partitioning for Given-Shape Regular Arrays |
|
|
33 | (10) |
|
|
33 | (1) |
|
3.2.2 A New LSGP Synthesis Method |
|
|
33 | (3) |
|
3.2.3 The Conditions for Valid Q and t |
|
|
36 | (4) |
|
|
40 | (1) |
|
3.2.5 A Particular Area of Application |
|
|
41 | (2) |
|
|
43 | (2) |
|
|
43 | (1) |
|
|
44 | (1) |
|
|
45 | (2) |
|
4 A Methodology of Partitioning and Mapping for Given (N-1)-D Regular Arrays |
|
|
47 | (28) |
|
|
47 | (2) |
|
4.2 Transforming to Canonical Dependencies |
|
|
49 | (6) |
|
4.2.1 A Cone Including Dependencies |
|
|
49 | (3) |
|
4.2.2 Forming Canonical Dependencies in Supernode Space |
|
|
52 | (3) |
|
|
55 | (5) |
|
4.3.1 S-T Transformation and Interconnection Primitives |
|
|
55 | (1) |
|
|
56 | (3) |
|
4.3.3 Permuting the Space Projection Matrix |
|
|
59 | (1) |
|
4.4 Further LSGP Partitioning for SBC |
|
|
60 | (1) |
|
4.5 Scaling and Optimisation |
|
|
61 | (7) |
|
|
63 | (1) |
|
|
64 | (1) |
|
|
65 | (2) |
|
4.5.4 Integralization of the Quasi-supernode Transformation Matrix |
|
|
67 | (1) |
|
4.6 Examples and Discussions |
|
|
68 | (3) |
|
|
68 | (3) |
|
4.6.2 More Results and Discussions |
|
|
71 | (1) |
|
|
71 | (4) |
|
5 Optimal Mapping onto Lower-Dimensional Regular Arrays |
|
|
75 | (24) |
|
5.1 A Methodology for Partitioning and Mapping onto Lower Dimensional Arrays |
|
|
75 | (2) |
|
5.2 The Transformations into K-D Time Domain and M-D Processor Array |
|
|
77 | (2) |
|
5.2.1 Selecting a Family of T |
|
|
77 | (2) |
|
5.2.2 Scaling the Supernode Parallelepiped |
|
|
79 | (1) |
|
5.3 Maximum Local K-D Time Domain |
|
|
79 | (3) |
|
5.3.1 Intersecting a Polyhedron with a Hyperplane |
|
|
79 | (1) |
|
5.3.2 Maximum Local Supernode Domain |
|
|
80 | (1) |
|
5.3.3 Mapping to K-D Time Domain |
|
|
81 | (1) |
|
5.4 Valid Minimum Projecting Vector |
|
|
82 | (3) |
|
5.5 A General Methodology |
|
|
85 | (4) |
|
5.5.1 The First and Last Nodes of Polyhedrons |
|
|
85 | (1) |
|
|
86 | (3) |
|
|
89 | (4) |
|
|
89 | (1) |
|
|
90 | (3) |
|
5.7 Special Example: Partitioning and Mapping a Knapsack Problem onto a Linear Array |
|
|
93 | (4) |
|
5.7.1 Description of Computational Structure |
|
|
93 | (2) |
|
5.7.2 Supernode Polyhedron |
|
|
95 | (1) |
|
5.7.3 Transformation onto a Time-Processor Domain |
|
|
95 | (2) |
|
|
97 | (2) |
|
6 The Structure of Parallel Programs |
|
|
99 | (30) |
|
|
99 | (2) |
|
|
101 | (8) |
|
6.2.1 Boundary of the Supernode Polyhedron |
|
|
102 | (5) |
|
6.2.2 Vertices of the Enlarged Quasi-Supernode Polyhedron |
|
|
107 | (1) |
|
6.2.3 Boundary of a Single Supernode |
|
|
108 | (1) |
|
|
109 | (2) |
|
6.3.1 Outgoing Data of a Single Supernode |
|
|
109 | (1) |
|
6.3.2 Outgoing Data Packets of a Processor |
|
|
110 | (1) |
|
6.4 Algorithm Generation for a Lower-Dimensional Processor Array |
|
|
111 | (4) |
|
6.4.1 K-D Parallel Algorithms |
|
|
111 | (1) |
|
6.4.2 K-D Time Domain to 1-D Time Domain |
|
|
112 | (3) |
|
6.5 Algorithm Generation For LSGP Case |
|
|
115 | (7) |
|
6.5.1 Rectangular Boundary in Virtual Processor-Time Domain |
|
|
116 | (1) |
|
6.5.2 Algorithm Generation Involving LSGP |
|
|
117 | (2) |
|
6.5.3 Outgoing Data after LSGP |
|
|
119 | (1) |
|
|
120 | (2) |
|
6.5.5 Algorithm Generation for (N-1)-D SUC Case |
|
|
122 | (1) |
|
6.6 From Inequalities To Boundaries |
|
|
122 | (4) |
|
6.6.1 Finding All Possible Lower and Upper Bounds |
|
|
123 | (1) |
|
6.2.2 Deleting Redundant Bounds |
|
|
124 | (2) |
|
|
126 | (3) |
|
7 Parallel Code Generation |
|
|
129 | (22) |
|
|
129 | (1) |
|
|
129 | (2) |
|
|
131 | (4) |
|
7.3.1 From Supernode Domain to Local Supernode Domains |
|
|
132 | (1) |
|
7.3.2 Supernode Storage and in/out Data Storage |
|
|
133 | (2) |
|
|
135 | (8) |
|
|
136 | (5) |
|
7.4.2 Non LSGP Case and Direct Data Flows |
|
|
141 | (2) |
|
7.5 Outline of the Parallel Code |
|
|
143 | (7) |
|
7.5.1 Parallel Codes for Non-LSGP |
|
|
144 | (4) |
|
7.5.2 Parallel Codes with LSGP |
|
|
148 | (2) |
|
|
150 | (1) |
|
8 Experimental Results and Discussions |
|
|
151 | (12) |
|
|
151 | (6) |
|
8.1.1 Test of the Correctness |
|
|
151 | (1) |
|
8.1.2 Measuring Performance |
|
|
152 | (5) |
|
8.2 Discussions on Factors Affecting Efficiency |
|
|
157 | (5) |
|
8.2.1 Space and Time Fitness of the Mapping |
|
|
157 | (3) |
|
|
160 | (1) |
|
8.2.3 Single Supernode Loops |
|
|
161 | (1) |
|
8.2.4 The Effect of Granularity |
|
|
161 | (1) |
|
|
162 | (1) |
|
|
163 | (6) |
|
9.1 Summary of the Whole Design Procedure |
|
|
163 | (1) |
|
9.2 Evaluations and Comparisons |
|
|
164 | (3) |
|
|
164 | (2) |
|
|
166 | (1) |
|
|
167 | (1) |
|
|
167 | (2) |
|
|
169 | (80) |
|
|
177 | (6) |
|
A.1 Initial Wu'iS and Wl'iS |
|
|
177 | (2) |
|
A.2 SF as a Function of fk |
|
|
179 | (1) |
|
A.2.1 Determining fa with a Known Set of Wui and Wli |
|
|
179 | (1) |
|
A.2.2 Determining the Truning Points |
|
|
179 | (1) |
|
A.2.3 Selecting Wui and wli from Multi-Candidates |
|
|
180 | (1) |
|
A.3 Delimit fk According to Dependencies |
|
|
181 | (2) |
|
B Collection of Algorithms for (N-1)-D Partitioning and Mapping |
|
|
183 | (4) |
|
|
183 | (1) |
|
|
184 | (3) |
|
C Collection of Algorithms for Lower-Dimensional Mapping |
|
|
187 | (4) |
|
D Parallel Algorithm for Pure LSGP Method |
|
|
191 | (2) |
|
E Generating Data Flows and Relays for LSGP |
|
|
193 | (4) |
|
F The Collections of Experimental Results |
|
|
197 | (28) |
|
G Examples of Automatically Generated Parallel Codes |
|
|
225 | (24) |
|
G.1 Parallel Codes for Non-LSGP |
|
|
225 | (1) |
|
|
225 | (5) |
|
|
230 | (7) |
|
G.2 Parallel Codes for LSGP |
|
|
237 | (1) |
|
|
237 | (4) |
|
|
241 | (8) |
Index |
|
249 | |