List of Figures |
|
xiii | |
List of Tables |
|
xxi | |
Foreword |
|
xxiii | |
Preface |
|
xxv | |
1 Multi-Core Architectures for Embedded Systems |
|
1 | (30) |
|
|
|
2 | (7) |
|
1.1.1 What Makes Multiprocessor Solutions Attractive? |
|
|
3 | (6) |
|
1.2 Architectural Considerations |
|
|
9 | (2) |
|
1.3 Interconnection Networks |
|
|
11 | (2) |
|
1.4 Software Optimizations |
|
|
13 | (1) |
|
|
14 | (11) |
|
1.5.1 HiBRID-SoC for Multimedia Signal Processing |
|
|
14 | (2) |
|
1.5.2 VIPER Multiprocessor SoC |
|
|
16 | (1) |
|
1.5.3 Defect-Tolerant and Reconfigurable MPSoC |
|
|
17 | (1) |
|
1.5.4 Homogeneous Multiprocessor for Embedded Printer Application |
|
|
18 | (2) |
|
1.5.5 General Purpose Multiprocessor DSP |
|
|
20 | (1) |
|
1.5.6 Multiprocessor DSP for Mobile Applications |
|
|
21 | (2) |
|
1.5.7 Multi-Core DSP Platforms |
|
|
23 | (2) |
|
|
25 | (1) |
|
|
25 | (2) |
|
|
27 | (4) |
2 Application-Specific Customizable Embedded Systems |
|
31 | (40) |
|
|
|
32 | (2) |
|
2.2 Challenges and Opportunities |
|
|
34 | (3) |
|
|
35 | (2) |
|
|
37 | (4) |
|
2.3.1 Customized Application-Specific Processor Techniques |
|
|
37 | (3) |
|
2.3.2 Customized Application-Specific On-Chip Interconnect Techniques |
|
|
40 | (1) |
|
2.4 Configurable Processors and Instruction Set Synthesis |
|
|
41 | (11) |
|
2.4.1 Design Methodology for Processor Customization |
|
|
43 | (1) |
|
2.4.2 Instruction Set Extension Techniques |
|
|
44 | (4) |
|
2.4.3 Application-Specific Memory-Aware Customization |
|
|
48 | (1) |
|
2.4.4 Customizing On-Chip Communication Interconnect |
|
|
48 | (1) |
|
2.4.5 Customization of MPSoCs |
|
|
49 | (3) |
|
2.5 Reconfigurable Instruction Set Processors |
|
|
52 | (2) |
|
|
53 | (1) |
|
2.6 Hardware/Software Codesign |
|
|
54 | (1) |
|
2.7 Hardware Architecture Description Languages |
|
|
55 | (3) |
|
2.7.1 LISATek Design Platform |
|
|
57 | (1) |
|
|
58 | (2) |
|
2.9 Case Study: Realizing Customizable Multi-Core Designs |
|
|
60 | (2) |
|
2.10 The Future: System Design with Customizable Architectures, Software, and Tools |
|
|
62 | (1) |
|
|
63 | (1) |
|
|
63 | (8) |
3 Power Optimization in Multi-Core System-on-Chip |
|
71 | (40) |
|
|
|
|
|
|
72 | (2) |
|
|
74 | (8) |
|
|
75 | (5) |
|
3.2.2 Power Analysis Tools |
|
|
80 | (2) |
|
|
82 | (5) |
|
|
82 | (1) |
|
|
83 | (1) |
|
|
84 | (1) |
|
|
85 | (1) |
|
3.3.5 Application Examples |
|
|
86 | (1) |
|
3.4 On-Chip Communication Architectures |
|
|
87 | (3) |
|
|
90 | (5) |
|
|
91 | (4) |
|
3.6 DPM and DVS in Multi-Core Systems |
|
|
95 | (5) |
|
|
100 | (1) |
|
|
101 | (1) |
|
|
102 | (9) |
4 Routing Algorithms for Irregular Mesh-Based Network-on-Chip |
|
111 | (44) |
|
|
|
|
112 | (1) |
|
4.2 An Overview of Irregular Mesh Topology |
|
|
113 | (2) |
|
|
113 | (1) |
|
4.2.2 Irregular Mesh Topology |
|
|
113 | (2) |
|
4.3 Fault-Tolerant Routing Algorithms for 2D Meshes |
|
|
115 | (11) |
|
4.3.1 Fault-Tolerant Routing Using Virtual Channels |
|
|
116 | (1) |
|
4.3.2 Fault-Tolerant Routing with Turn Model |
|
|
117 | (9) |
|
4.4 Routing Algorithms for Irregular Mesh Topology |
|
|
126 | (10) |
|
4.4.1 Traffic-Balanced OAPR Routing Algorithm |
|
|
127 | (5) |
|
4.4.2 Application-Specific Routing Algorithm |
|
|
132 | (4) |
|
4.5 Placement for Irregular Mesh Topology |
|
|
136 | (7) |
|
4.5.1 OIP Placements Based on Chen and Chiu's Algorithm |
|
|
137 | (3) |
|
4.5.2 OIP Placements Based on OAPR |
|
|
140 | (3) |
|
4.6 Hardware Efficient Routing Algorithms |
|
|
143 | (8) |
|
4.6.1 Turns-Table Routing (TT) |
|
|
146 | (1) |
|
4.6.2 XY-Deviation Table Routing (XYDT) |
|
|
147 | (1) |
|
4.6.3 Source Routing for Deviation Points (SRDP) |
|
|
147 | (1) |
|
4.6.4 Degree Priority Routing Algorithm |
|
|
148 | (3) |
|
|
151 | (1) |
|
|
151 | (1) |
|
|
151 | (4) |
5 Debugging Multi-Core Systems-on-Chip |
|
155 | (46) |
|
|
|
|
156 | (2) |
|
5.2 Why Debugging Is Difficult |
|
|
158 | (5) |
|
5.2.1 Limited Internal Observability |
|
|
158 | (1) |
|
5.2.2 Asynchronicity and Consistent Global States |
|
|
159 | (2) |
|
5.2.3 Non-Determinism and Multiple Traces |
|
|
161 | (2) |
|
|
163 | (6) |
|
|
164 | (1) |
|
5.3.2 Example Erroneous System |
|
|
165 | (1) |
|
|
166 | (3) |
|
|
169 | (5) |
|
|
169 | (2) |
|
5.4.2 Comparing Existing Debug Methods |
|
|
171 | (3) |
|
|
174 | (4) |
|
5.5.1 Communication-Centric Debug |
|
|
175 | (1) |
|
|
175 | (1) |
|
5.5.3 Run/Stop-Based Debug |
|
|
176 | (1) |
|
5.5.4 Abstraction-Based Debug |
|
|
176 | (2) |
|
5.6 On-Chip Debug Infrastructure |
|
|
178 | (6) |
|
|
178 | (1) |
|
|
178 | (2) |
|
5.6.3 Computation-Specific Instrument |
|
|
180 | (1) |
|
5.6.4 Protocol-Specific Instrument |
|
|
181 | (1) |
|
5.6.5 Event Distribution Interconnect |
|
|
182 | (1) |
|
5.6.6 Debug Control Interconnect |
|
|
183 | (1) |
|
5.6.7 Debug Data Interconnect |
|
|
183 | (1) |
|
5.7 Off-Chip Debug Infrastructure |
|
|
184 | (6) |
|
|
184 | (1) |
|
5.7.2 Abstractions Used by Debugger Software |
|
|
184 | (6) |
|
|
190 | (3) |
|
|
193 | (1) |
|
|
194 | (1) |
|
|
194 | (7) |
6 System-Level Tools for NoC-Based Multi-Core Design |
|
201 | (42) |
|
|
|
|
|
202 | (4) |
|
|
204 | (2) |
|
6.2 Synthetic Traffic Models |
|
|
206 | (1) |
|
6.3 Graph Theoretical Analysis |
|
|
207 | (3) |
|
6.3.1 Generating Synthetic Graphs Using TGFF |
|
|
209 | (1) |
|
6.4 Task Mapping for SoC Applications |
|
|
210 | (6) |
|
6.4.1 Application Task Embedding and Quality Metrics |
|
|
210 | (4) |
|
6.4.2 SCOTCH Partitioning Tool |
|
|
214 | (2) |
|
6.5 OMNeT++ Simulation Framework |
|
|
216 | (1) |
|
|
217 | (14) |
|
6.6.1 Application Task Graphs |
|
|
217 | (1) |
|
6.6.2 Prospective NoC Topology Models |
|
|
218 | (1) |
|
6.6.3 Spidergon Network on Chip |
|
|
219 | (2) |
|
6.6.4 Task Graph Embedding and Analysis |
|
|
221 | (2) |
|
6.6.5 Simulation Models for Proposed NoC Topologies |
|
|
223 | (4) |
|
6.6.6 MPEG4: A Realistic Scenario |
|
|
227 | (4) |
|
6.7 Conclusions and Extensions |
|
|
231 | (3) |
|
|
234 | (1) |
|
|
235 | (8) |
7 Compiler Techniques for Application Level Memory Optimization for MPSoC |
|
243 | (26) |
|
|
|
|
|
|
|
|
244 | (1) |
|
7.2 Loop Transformation for Single and Multiprocessors |
|
|
245 | (1) |
|
7.3 Program Transformation Concepts |
|
|
246 | (2) |
|
7.4 Memory Optimization Techniques |
|
|
248 | (2) |
|
|
249 | (1) |
|
|
249 | (1) |
|
|
249 | (1) |
|
7.5 MPSoC Memory Optimization Techniques |
|
|
250 | (5) |
|
|
251 | (1) |
|
7.5.2 Comparison of Lexicographically Positive and Positive Dependency |
|
|
252 | (1) |
|
|
253 | (1) |
|
|
254 | (1) |
|
|
255 | (1) |
|
|
255 | (1) |
|
|
256 | (1) |
|
7.7 Improvement in Optimization Techniques |
|
|
256 | (5) |
|
7.7.1 Parallel Processing Area and Partitioning |
|
|
256 | (3) |
|
7.7.2 Modulo Operator Elimination |
|
|
259 | (1) |
|
7.7.3 Unimodular Transformation |
|
|
260 | (1) |
|
|
261 | (2) |
|
7.8.1 Cache Ratio and Memory Space |
|
|
262 | (1) |
|
7.8.2 Processing Time and Code Size |
|
|
263 | (1) |
|
|
263 | (1) |
|
|
264 | (1) |
|
|
265 | (1) |
|
|
266 | (3) |
8 Programming Models for Multi-Core Embedded Software |
|
269 | (40) |
|
|
|
|
|
|
270 | (2) |
|
8.2 Thread Libraries for Multi-Threaded Programming |
|
|
272 | (4) |
|
8.3 Protections for Data Integrity in a Multi-Threaded Environment |
|
|
276 | (3) |
|
8.3.1 Mutual Exclusion Primitives for Deterministic Output |
|
|
276 | (2) |
|
8.3.2 Transactional Memory |
|
|
278 | (1) |
|
8.4 Programming Models for Shared Memory and Distributed Memory |
|
|
279 | (3) |
|
|
279 | (1) |
|
8.4.2 Thread Building Blocks |
|
|
280 | (1) |
|
8.4.3 Message Passing Interface |
|
|
281 | (1) |
|
8.5 Parallel Programming on Multiprocessors |
|
|
282 | (1) |
|
8.6 Parallel Programming Using Graphic Processors |
|
|
283 | (1) |
|
8.7 Model-Driven Code Generation for Multi-Core Systems |
|
|
284 | (2) |
|
|
285 | (1) |
|
8.8 Synchronous Programming Languages |
|
|
286 | (2) |
|
8.9 Imperative Synchronous Language: Esterel |
|
|
288 | (2) |
|
|
288 | (1) |
|
8.9.2 Multi-Core Implementations and Their Compilation Schemes |
|
|
289 | (1) |
|
8.10 Declarative Synchronous Language: LUSTRE |
|
|
290 | (2) |
|
|
291 | (1) |
|
8.10.2 Multi-Core Implementations from LUSTRE Specifications |
|
|
291 | (1) |
|
8.11 Multi-Rate Synchronous Language: SIGNAL |
|
|
292 | (7) |
|
|
292 | (1) |
|
8.11.2 Characterization and Compilation of SIGNAL |
|
|
293 | (1) |
|
8.11.3 SIGNAL Implementations on Distributed Systems |
|
|
294 | (2) |
|
8.11.4 Multi-Threaded Programming Models for SIGNAL |
|
|
296 | (3) |
|
8.12 Programming Models for Real-Time Software |
|
|
299 | (2) |
|
8.12.1 Real-Time Extensions to Synchronous Languages |
|
|
300 | (1) |
|
8.13 Future Directions for Multi-Core Programming |
|
|
301 | (1) |
|
|
302 | (3) |
|
|
305 | (4) |
9 Operating System Support for Multi-Core Systems-on-Chips |
|
309 | (28) |
|
|
|
|
310 | (1) |
|
9.2 Ideal Software Organization |
|
|
311 | (2) |
|
9.3 Programming Challenges |
|
|
313 | (1) |
|
|
314 | (8) |
|
9.4.1 Board Support Package |
|
|
314 | (3) |
|
9.4.2 General Purpose Operating System |
|
|
317 | (5) |
|
9.5 Real-Time and Component-Based Operating System Models |
|
|
322 | (7) |
|
9.5.1 Automated Application Code Generation and RTOS Modeling |
|
|
322 | (4) |
|
9.5.2 Component-Based Operating System |
|
|
326 | (3) |
|
|
329 | (1) |
|
|
330 | (2) |
|
|
332 | (1) |
|
|
333 | (4) |
10 Autonomous Power Management in Embedded Multi-Cores |
|
337 | (32) |
|
|
|
|
|
|
|
338 | (4) |
|
10.1.1 Why Is Autonomous Power Management Necessary? |
|
|
339 | (3) |
|
10.2 Survey of Autonomous Power Management Techniques |
|
|
342 | (5) |
|
|
342 | (1) |
|
|
343 | (1) |
|
10.2.3 Dynamic Voltage and Frequency Scaling |
|
|
343 | (1) |
|
|
344 | (1) |
|
|
345 | (1) |
|
10.2.6 Commercial Power Management Tools |
|
|
346 | (1) |
|
10.3 Power Management and RTOS |
|
|
347 | (2) |
|
10.4 Power-Smart RTOS and Processor Simulators |
|
|
349 | (2) |
|
10.4.1 Chip Multi-Threading (CMT) Architecture Simulator |
|
|
350 | (1) |
|
10.5 Autonomous Power Saving in Multi-Core Processors |
|
|
351 | (7) |
|
10.5.1 Opportunities to Save Power |
|
|
353 | (1) |
|
10.5.2 Strategies to Save Power |
|
|
354 | (2) |
|
10.5.3 Case Study: Power Saving in Intel Centrino |
|
|
356 | (2) |
|
10.6 Power Saving Algorithms |
|
|
358 | (2) |
|
10.6.1 Local PMU Algorithm |
|
|
358 | (1) |
|
10.6.2 Global PMU Algorithm |
|
|
358 | (2) |
|
|
360 | (2) |
|
|
362 | (1) |
|
|
363 | (6) |
11 Multi-Core System-on-Chip in Real World Products |
|
369 | (30) |
|
|
|
|
|
|
370 | (1) |
|
11.2 Overview of picoArray Architecture |
|
|
371 | (4) |
|
11.2.1 Basic Processor Architecture |
|
|
371 | (2) |
|
11.2.2 Communications Interconnect |
|
|
373 | (1) |
|
11.2.3 Peripherals and Hardware Functional Accelerators |
|
|
373 | (2) |
|
|
375 | (6) |
|
11.3.1 picoVhdl Parser (Analyzer, Elaborator, Assembler) |
|
|
376 | (1) |
|
|
376 | (2) |
|
|
378 | (3) |
|
11.3.4 Design Partitioning for Multiple Devices |
|
|
381 | (1) |
|
|
381 | (1) |
|
|
381 | (1) |
|
11.4 picoArray Debug and Analysis |
|
|
381 | (7) |
|
|
382 | (1) |
|
|
383 | (1) |
|
|
383 | (2) |
|
|
385 | (2) |
|
|
387 | (1) |
|
|
387 | (1) |
|
11.5 Hardening Process in Practice |
|
|
388 | (4) |
|
11.5.1 Viterbi Decoder Hardening |
|
|
389 | (3) |
|
|
392 | (4) |
|
|
396 | (1) |
|
|
396 | (1) |
|
|
397 | (2) |
12 Embedded Multi-Core Processing for Networking |
|
399 | (66) |
|
|
|
|
400 | (3) |
|
12.2 Overview of Proposed NPU Architectures |
|
|
403 | (9) |
|
12.2.1 Multi-Core Embedded Systems for Multi-Service Broadband Access and Multimedia Home Networks |
|
|
403 | (2) |
|
12.2.2 SoC Integration of Network Components and Examples of Commercial Access NPUs |
|
|
405 | (2) |
|
12.2.3 NPU Architectures for Core Network Nodes and High-Speed Networking and Switching |
|
|
407 | (5) |
|
12.3 Programmable Packet Processing Engines |
|
|
412 | (10) |
|
|
413 | (5) |
|
12.3.2 Multi-Threading Support |
|
|
418 | (3) |
|
12.3.3 Specialized Instruction Set Architectures |
|
|
421 | (1) |
|
12.4 Address Lookup and Packet Classification Engines |
|
|
422 | (9) |
|
12.4.1 Classification Techniques |
|
|
424 | (2) |
|
|
426 | (5) |
|
12.5 Packet Buffering and Queue Management Engines |
|
|
431 | (11) |
|
12.5.1 Performance Issues |
|
|
433 | (2) |
|
12.5.2 Design of Specialized Core for Implementation of Queue Management in Hardware |
|
|
435 | (7) |
|
|
442 | (11) |
|
12.6.1 Data Structures in Scheduling Architectures |
|
|
443 | (1) |
|
|
444 | (6) |
|
12.6.3 Traffic Scheduling |
|
|
450 | (3) |
|
|
453 | (2) |
|
|
455 | (4) |
|
|
459 | (6) |
Index |
|
465 | |