Foreword |
|
v | |
Preface |
|
vii | |
Acknowledgments |
|
ix | |
Acronyms |
|
xix | |
Part I Introduction and Basic Concepts |
|
|
|
1 | (16) |
|
|
1 | (1) |
|
|
2 | (2) |
|
Amdahl's Law and Structured Parallel Design |
|
|
4 | (1) |
|
Introduction to PPF Systems |
|
|
4 | (4) |
|
|
8 | (9) |
|
|
10 | (1) |
|
Simple Design Example: The H.261 Decoder |
|
|
10 | (7) |
|
|
17 | (20) |
|
|
20 | (4) |
|
|
24 | (3) |
|
|
25 | (1) |
|
|
26 | (1) |
|
Data Farming and Demand-based Scheduling |
|
|
27 | (1) |
|
Data-farm Performance Criteria |
|
|
28 | (2) |
|
|
30 | (7) |
|
|
31 | (1) |
|
|
31 | (6) |
|
|
37 | (20) |
|
|
38 | (1) |
|
|
39 | (1) |
|
Parallelization of the Postcode Recognizer |
|
|
39 | (8) |
|
Partitioning the postcode recognizer |
|
|
40 | (1) |
|
Scaling the postcode recognizer |
|
|
41 | (2) |
|
|
43 | (4) |
|
Parallelization of the address verifier |
|
|
47 | (4) |
|
Partitioning the address verifier |
|
|
47 | (2) |
|
Scaling the address verifier |
|
|
49 | (1) |
|
Address verification farms |
|
|
50 | (1) |
|
Overall performance achieved |
|
|
50 | (1) |
|
Meeting the Specification |
|
|
51 | (2) |
|
|
53 | (4) |
|
|
53 | (1) |
|
Other Parallel Postcode Recognition Systems |
|
|
53 | (4) |
|
Development of PPF Applications |
|
|
57 | (10) |
|
|
58 | (1) |
|
|
59 | (1) |
|
|
60 | (2) |
|
|
62 | (5) |
Part II Analysis and Partitioning of Sequential Applications |
|
|
Initial Development of an Application |
|
|
67 | (14) |
|
|
67 | (2) |
|
Automatic and Semi-automatic Parallelization |
|
|
69 | (2) |
|
|
71 | (1) |
|
|
72 | (1) |
|
Semi-automatic Partitioning |
|
|
73 | (2) |
|
|
75 | (2) |
|
|
77 | (1) |
|
|
77 | (2) |
|
|
79 | (2) |
|
Graphical Simulation and Performance Analysis of PPFs |
|
|
81 | (14) |
|
Simulating Asynchronous Pipelines |
|
|
82 | (1) |
|
Simulation Implementation |
|
|
82 | (2) |
|
|
84 | (4) |
|
|
88 | (1) |
|
Cross-architectural Comparison |
|
|
89 | (4) |
|
|
93 | (2) |
|
Template-based Implementation |
|
|
95 | (22) |
|
Template Design Principles |
|
|
96 | (3) |
|
|
99 | (1) |
|
Parallel Logic Implementation |
|
|
100 | (1) |
|
Target Machine Implementation |
|
|
101 | (3) |
|
Common implementation issues |
|
|
102 | (2) |
|
`NOW' Implementation for Logic Debugging |
|
|
104 | (5) |
|
Target Machine Implementations for Performance Tuning |
|
|
109 | (3) |
|
|
112 | (1) |
|
|
113 | (4) |
Part III Case Studies |
|
|
|
117 | (46) |
|
Case Study 1: H.261 Encoder |
|
|
118 | (14) |
|
Purpose of parallelization |
|
|
119 | (1) |
|
`Per macroblock' quantization without motion estimation |
|
|
119 | (4) |
|
`Per picture' quantization without motion estimation |
|
|
123 | (2) |
|
`Per picture' quantization with motion estimation |
|
|
125 | (1) |
|
Implementation of the parallel encoders |
|
|
126 | (2) |
|
H.261 encoders without motion estimation |
|
|
128 | (1) |
|
H.261 encoder with motion estimation |
|
|
129 | (2) |
|
|
131 | (1) |
|
Case Study 2: H263 Encoder/Decoder |
|
|
132 | (7) |
|
Static analysis of H.263 algorithm |
|
|
134 | (1) |
|
Results from parallelizing H.263 |
|
|
135 | (4) |
|
Case Study 3: `Eigenfaces' - Face Detection |
|
|
139 | (6) |
|
|
139 | (1) |
|
|
140 | (1) |
|
|
141 | (2) |
|
Introduction of second and third farms |
|
|
143 | (2) |
|
Case Study 4: Optical Flow |
|
|
145 | (16) |
|
|
145 | (2) |
|
Existing sequential implementation |
|
|
147 | (1) |
|
|
147 | (3) |
|
|
150 | (4) |
|
|
154 | (2) |
|
|
156 | (2) |
|
|
158 | (2) |
|
|
160 | (1) |
|
|
161 | (2) |
|
|
163 | (26) |
|
Case Study 1: Karhunen-Loeve Transform (KLT) |
|
|
164 | (7) |
|
|
164 | (1) |
|
|
165 | (1) |
|
Parallelization of the KLT |
|
|
165 | (3) |
|
|
168 | (3) |
|
|
171 | (1) |
|
Case Study 2: 2D-Wavelet Transform |
|
|
171 | (8) |
|
|
172 | (1) |
|
|
173 | (1) |
|
Parallel implementation of Discrete Wavelet Transform (DWT) |
|
|
173 | (3) |
|
Parallel implementation of oversampled WT |
|
|
176 | (3) |
|
Case Study 3: Vector Quantization |
|
|
179 | (7) |
|
|
180 | (1) |
|
|
181 | (2) |
|
|
183 | (3) |
|
|
186 | (3) |
|
|
189 | (22) |
|
Case Study 1: Large Vocabulary Continuous-Speech Recognition |
|
|
190 | (6) |
|
|
190 | (1) |
|
Static analysis of the LVCR system |
|
|
191 | (2) |
|
|
193 | (2) |
|
|
195 | (1) |
|
Case Study 2: Model-based Coding |
|
|
196 | (6) |
|
Parallelization of the model-based coder |
|
|
196 | (2) |
|
|
198 | (4) |
|
Case Study 3: Microphone Beam Array |
|
|
202 | (4) |
|
Griffiths-Jim beam-former |
|
|
202 | (1) |
|
Sequential implementation |
|
|
203 | (1) |
|
Parallel implementation of the G-J Algorithm |
|
|
204 | (2) |
|
|
206 | (5) |
Part IV Underlying Theory and Analysis |
|
|
|
211 | (36) |
|
|
212 | (1) |
|
|
212 | (8) |
|
|
213 | (3) |
|
|
216 | (1) |
|
|
217 | (2) |
|
|
219 | (1) |
|
Gathering Performance Data |
|
|
220 | (1) |
|
Performance Prediction Equations |
|
|
221 | (2) |
|
|
223 | (2) |
|
|
224 | (1) |
|
|
225 | (2) |
|
Asynchronous Pipeline Estimate |
|
|
227 | (3) |
|
|
230 | (5) |
|
|
235 | (3) |
|
|
236 | (1) |
|
|
236 | (1) |
|
Heuristic scheduling schemes |
|
|
237 | (1) |
|
|
238 | (1) |
|
|
238 | (3) |
|
|
238 | (2) |
|
|
240 | (1) |
|
|
241 | (6) |
|
|
242 | (1) |
|
Outline derivation of Kruskal-Weiss prediction equation |
|
|
242 | (1) |
|
Factoring regime derivation |
|
|
243 | (4) |
|
Instrumentation of Templates |
|
|
247 | (16) |
|
|
248 | (1) |
|
|
249 | (1) |
|
|
249 | (1) |
|
|
250 | (3) |
|
Establishing a Refresh Interval |
|
|
253 | (3) |
|
|
256 | (1) |
|
Implementation on the Paramid |
|
|
257 | (2) |
|
|
259 | (4) |
Part V Future Trends |
|
|
|
263 | (6) |
|
Designing for Differing Embedded Hardware |
|
|
265 | (1) |
|
Adapting to Mobile Networked Computation |
|
|
265 | (2) |
|
|
267 | (2) |
References |
|
269 | (30) |
Index |
|
299 | |