|
|
1 | (12) |
|
Multiprocessor DSP Systems |
|
|
2 | (2) |
|
Application-Specific Multiprocessors |
|
|
4 | (1) |
|
Exploitation of Parallelism |
|
|
5 | (1) |
|
Dataflow Modeling for DSP Design |
|
|
6 | (3) |
|
Utility of Dataflow for DSP |
|
|
9 | (2) |
|
|
11 | (2) |
|
Application-Specific Multiprocessors |
|
|
13 | (22) |
|
Parallel Architecture Classifications |
|
|
13 | (2) |
|
Exploiting Instruction Level Parallelism |
|
|
15 | (5) |
|
ILP in Programmable DSP Processors |
|
|
15 | (2) |
|
|
17 | (1) |
|
|
18 | (2) |
|
Dataflow DSP Architectures |
|
|
20 | (1) |
|
Systolic and Wavefront Arrays |
|
|
20 | (1) |
|
Multiprocessor DSP Architectures |
|
|
21 | (2) |
|
Single-Chip Multiprocessors |
|
|
23 | (7) |
|
|
24 | (1) |
|
|
24 | (1) |
|
|
25 | (1) |
|
|
25 | (1) |
|
The Sandbridge Sandblaster |
|
|
26 | (1) |
|
|
26 | (1) |
|
Single-Chip Multiprocessors from Texas Instruments |
|
|
27 | (2) |
|
|
29 | (1) |
|
Heterogeneous Multiprocessors |
|
|
29 | (1) |
|
|
30 | (2) |
|
Architectures that Exploit Predictable IPC |
|
|
32 | (2) |
|
|
34 | (1) |
|
Background Terminology and Notation |
|
|
35 | (24) |
|
|
35 | (1) |
|
|
36 | (1) |
|
|
37 | (1) |
|
|
37 | (1) |
|
|
38 | (1) |
|
Analytical Properties of SDF Graphs |
|
|
39 | (1) |
|
Converting a General SDF Graph into a Homogeneous SDF Graph |
|
|
40 | (2) |
|
Acyclic Precedence Expansion Graph |
|
|
42 | (2) |
|
|
44 | (1) |
|
|
44 | (3) |
|
HSDFG Concepts and Notations |
|
|
47 | (2) |
|
|
49 | (2) |
|
Shortest and Longest Paths in Graphs |
|
|
51 | (3) |
|
|
52 | (1) |
|
The Bellman-Ford Algorithm |
|
|
52 | (1) |
|
The Floyd-Warshall Algorithm |
|
|
52 | (2) |
|
Solving Difference Constraints Using Shortest Paths |
|
|
54 | (2) |
|
|
56 | (1) |
|
|
57 | (2) |
|
DSP-Oriented Dataflow Models of Computation |
|
|
59 | (26) |
|
Scalable Synchronous Dataflow |
|
|
60 | (2) |
|
|
62 | (5) |
|
Lumped SDF Representations |
|
|
63 | (3) |
|
Phase Repetitions Vectors |
|
|
66 | (1) |
|
|
66 | (1) |
|
|
67 | (1) |
|
Multidimensional Synchronous Dataflow |
|
|
67 | (4) |
|
|
71 | (2) |
|
Reactive Process Networks |
|
|
73 | (1) |
|
Integrating Dataflow and State Machine Models |
|
|
74 | (4) |
|
|
75 | (1) |
|
|
76 | (1) |
|
Starcharts and Heterochronous Dataflow |
|
|
76 | (1) |
|
|
77 | (1) |
|
|
78 | (1) |
|
Controlled Dataflow Actors |
|
|
78 | (5) |
|
|
78 | (1) |
|
|
79 | (4) |
|
|
83 | (1) |
|
|
83 | (2) |
|
Multiprocessor Scheduling Models |
|
|
85 | (20) |
|
Task-Level Parallelism and Data Parallelism |
|
|
85 | (1) |
|
Static versus Dynamic Scheduling Strategies |
|
|
86 | (1) |
|
|
87 | (5) |
|
|
92 | (2) |
|
|
94 | (1) |
|
|
94 | (1) |
|
|
95 | (4) |
|
|
99 | (2) |
|
Execution Time Estimates and Static Schedules |
|
|
101 | (3) |
|
|
104 | (1) |
|
IPC-Conscious Scheduling Algorithms |
|
|
105 | (26) |
|
|
105 | (1) |
|
Stone's Assignment Algorithm |
|
|
106 | (4) |
|
List Scheduling Algorithms |
|
|
110 | (7) |
|
|
111 | (3) |
|
The Basic Algorithms --- HLFET and ETF |
|
|
114 | (1) |
|
|
114 | (1) |
|
|
115 | (1) |
|
Dynamic Critical Path Scheduling |
|
|
116 | (1) |
|
|
117 | (5) |
|
|
118 | (1) |
|
|
119 | (1) |
|
Dominant Sequence Clustering |
|
|
119 | (1) |
|
|
120 | (2) |
|
Integrated Scheduling Algorithms |
|
|
122 | (2) |
|
|
124 | (5) |
|
|
129 | (2) |
|
The Ordered-Transactions Strategy |
|
|
131 | (34) |
|
The Ordered-Transactions Strategy |
|
|
131 | (2) |
|
|
133 | (1) |
|
Interprocessor Communication Mechanisms |
|
|
134 | (3) |
|
Using the Ordered-Transactions Approach |
|
|
137 | (1) |
|
Design of an Ordered Memory Access Multiprocessor |
|
|
138 | (4) |
|
High Level Design Description |
|
|
138 | (1) |
|
|
139 | (3) |
|
Design Details of a Prototype |
|
|
142 | (13) |
|
|
142 | (2) |
|
Transaction Order Controller |
|
|
144 | (5) |
|
|
149 | (1) |
|
|
150 | (1) |
|
|
151 | (3) |
|
|
154 | (1) |
|
Connecting Multiple Boards |
|
|
154 | (1) |
|
Hardware and Software Implementation |
|
|
155 | (2) |
|
|
155 | (1) |
|
|
156 | (1) |
|
Ordered I/O and Parameter Control |
|
|
157 | (2) |
|
|
159 | (3) |
|
|
159 | (1) |
|
|
159 | (3) |
|
1024 Point Complex Fast Fourier Transform (FFT) |
|
|
162 | (1) |
|
|
162 | (3) |
|
Analysis of the Ordered-Transactions Strategy |
|
|
165 | (24) |
|
Inter-processor Communication Graph (Gipc) |
|
|
168 | (5) |
|
|
173 | (1) |
|
Ordering Constraints Viewed as Added Edges |
|
|
174 | (1) |
|
|
175 | (1) |
|
|
176 | (3) |
|
Effects of Changes in Execution Times |
|
|
179 | (7) |
|
|
180 | (2) |
|
Modeling Run-time Variations in Execution Times |
|
|
182 | (2) |
|
Bounds on the Average Iteration Period |
|
|
184 | (1) |
|
Implications for the Ordered Transactions Schedule |
|
|
185 | (1) |
|
Effects of Interprocessor Communication Costs |
|
|
186 | (2) |
|
|
188 | (1) |
|
Extending the OMA Architecture |
|
|
189 | (16) |
|
|
189 | (2) |
|
Parallel Implementation on Shared Memory Machines |
|
|
191 | (11) |
|
|
191 | (3) |
|
Implementation on the OMA |
|
|
194 | (2) |
|
|
196 | (3) |
|
Generating the Annotated Bus Access List |
|
|
199 | (3) |
|
|
202 | (1) |
|
|
203 | (2) |
|
Synchronization in Self-Timed Systems |
|
|
205 | (36) |
|
The Barrier MIMD Technique |
|
|
206 | (1) |
|
Redundant Synchronization Removal in Non-Iterative Dataflow |
|
|
207 | (3) |
|
Analysis of Self-Timed Execution |
|
|
210 | (1) |
|
|
210 | (1) |
|
Strongly Connected Components and Buffer Size Bounds |
|
|
210 | (3) |
|
|
213 | (5) |
|
Synchronization Protocols |
|
|
213 | (2) |
|
The Synchronization Graph |
|
|
215 | (3) |
|
A Synchronization Cost Metric |
|
|
218 | (1) |
|
Removing Redundant Synchronizations |
|
|
219 | (6) |
|
The Independence of Redundant Synchronizations |
|
|
220 | (1) |
|
Removing Redundant Synchronizations |
|
|
221 | (2) |
|
Comparison with Shaffer's Approach |
|
|
223 | (1) |
|
|
223 | (2) |
|
Making the Synchronization Graph Strongly Connected |
|
|
225 | (4) |
|
Adding Edges to the Synchronization Graph |
|
|
227 | (2) |
|
|
229 | (10) |
|
Analysis of DetermineDelays |
|
|
231 | (3) |
|
|
234 | (2) |
|
|
236 | (1) |
|
|
237 | (1) |
|
|
238 | (1) |
|
|
239 | (2) |
|
|
241 | (32) |
|
Definition of Resynchronization |
|
|
241 | (2) |
|
Properties of Resynchronization |
|
|
243 | (3) |
|
Relationship to Set Covering |
|
|
246 | (3) |
|
Intractability of Resynchronization |
|
|
249 | (3) |
|
|
252 | (12) |
|
Applying Set Covering Techniques to Pairs of SCCs |
|
|
252 | (3) |
|
|
255 | (4) |
|
Unit-Subsumption Resynchronization Edges |
|
|
259 | (2) |
|
|
261 | (2) |
|
|
263 | (1) |
|
Chainable Synchronization Graphs |
|
|
264 | (6) |
|
Chainable Synchronization Graph SCCs |
|
|
264 | (2) |
|
Comparison to the Global-Resynchronize Heuristic |
|
|
266 | (2) |
|
A Generalization of the Chaining Technique |
|
|
268 | (1) |
|
Incorporating the Chaining Technique |
|
|
268 | (2) |
|
Resynchronization of Constraint Graphs for Relative Scheduling |
|
|
270 | (1) |
|
|
271 | (2) |
|
Latency-Constrained Resynchronization |
|
|
273 | (46) |
|
Elimination of Synchronization Edges |
|
|
274 | (1) |
|
Latency-Constrained Resynchronization |
|
|
275 | (6) |
|
|
281 | (7) |
|
|
288 | (16) |
|
|
289 | (1) |
|
Two-Processor Latency-Constrained Resynchronization |
|
|
289 | (4) |
|
Taking Delays into Account |
|
|
293 | (11) |
|
A Heuristic for General Synchronization Graphs |
|
|
304 | (11) |
|
Customization to Transparent Synchronization Graphs |
|
|
305 | (2) |
|
|
307 | (1) |
|
|
308 | (7) |
|
|
315 | (4) |
|
Integrated Synchronization Optimization |
|
|
319 | (6) |
|
|
319 | (1) |
|
A Framework for Self-Timed Implementation |
|
|
320 | (2) |
|
|
322 | (3) |
|
Future Research Directions |
|
|
325 | (4) |
Bibliography |
|
329 | (24) |
Index |
|
353 | (8) |
About the Authors |
|
361 | |