Preface | vii
| xiii
| xvii
1 Multicore Systems Design Methodology | 1 | (16)
| 1 | (1)
1.2 MCSoCs Design Problems | 2 | (2)
1.3 Multicore architecture platform | 4 | (1)
1.4 Application specific MCSoC design method | 5 | (1)
1.5 QueueCore architecture | 6 | (6)
1.5.1 Hardware pipeline structure | 7 | (1)
1.5.2 Floating point organization | 8 | (4)
1.6 QueueCore synthesis and evaluation results | 12 | (2)
| 12 | (1)
| 13 | (1)
| 14 | (3)
2 Design for Low Power Systems | 17 | (18)
| 17 | (2)
2.2 Power Aware Technological-level Design optimizations | 19 | (3)
2.2.1 Factors affecting CMOS power consumption | 19 | (1)
2.2.2 Reducing voltage and frequency | 20 | (1)
2.2.3 Reducing capacitance | 21 | (1)
2.3 Power Aware Logic-level Design Optimizations | 22 | (2)
| 22 | (1)
| 23 | (1)
| 23 | (1)
2.4 Power-Aware System Level Design Optimizations | 24 | (10)
2.4.1 Hardware system architecture power consumption optimizations | 24 | (4)
2.4.2 Operating system power consumption optimization | 28 | (1)
2.4.3 Application, compilation techniques and algorithm | 29 | (1)
2.4.4 Energy reduction in network protocols | 30 | (4)
| 34 | (1)
3 Network-on-Chip for Multi- and Many-Core Systems | 35 | (26)
| 35 | (2)
| 37 | (11)
3.2.1 Switching Technique | 38 | (1)
| 39 | (1)
3.2.3 Pipeline Processing | 40 | (8)
| 48 | (7)
3.3.1 Interconnection Topologies | 48 | (4)
3.3.2 Topological Properties | 52 | (3)
| 55 | (4)
3.4.1 Deadlocks and Livelocks | 55 | (1)
| 56 | (3)
| 59 | (2)
4 Parallelizing Compiler for High Performance Computing | 61 | (18)
4.1 Instruction Level Parallelism | 61 | (3)
4.2 Parallel Queue Compiler | 64 | (1)
4.3 Parallel Queue Compiler Frame Work | 65 | (13)
4.3.1 1-offset P-Code Generation Phase | 66 | (2)
4.3.2 Offset Calculation Phase | 68 | (2)
4.3.3 Instruction Scheduling Phase | 70 | (1)
4.3.4 Natural Instruction Level Parallelism Extraction: Statement Merging Transformation | 71 | (2)
4.3.5 Assembly Generation Phase | 73 | (1)
| 74 | (4)
| 78 | (1)
5 Dual-Execution Processor Architecture for Embedded Computing | 79 | (28)
| 79 | (2)
| 81 | (16)
| 82 | (1)
| 83 | (1)
| 84 | (1)
5.2.4 Dynamic switching mechanism | 84 | (1)
5.2.5 Calculation of produced and consumed data | 85 | (1)
5.2.6 Queue-Stack Computation unit | 85 | (1)
5.2.7 Sources-Results computing mechanism | 86 | (3)
| 89 | (2)
| 91 | (1)
5.2.10 Shared storage mechanism | 92 | (1)
5.2.11 Covop instruction execution mechanism | 93 | (1)
5.2.12 Interrupt handling mechanism | 94 | (3)
5.3 Sub-routine call handling mechanism | 97 | (3)
5.4 Hardware design and Evaluation results | 100 | (6)
5.4.1 DEP System pipeline control | 101 | (1)
5.4.2 Hardware Design Result | 102 | (3)
| 105 | (1)
| 106 | (1)
6 Low Power Embedded Core Architecture | 107 | (20)
| 107 | (1)
6.2 Produced Order Queue Computing Overview | 108 | (3)
6.3 QC-2 Core Architecture | 111 | (8)
6.3.1 Instruction Set Design Considerations | 111 | (1)
6.3.2 Instruction Pipeline Structure | 112 | (2)
6.3.3 Dynamic Operands Addresses Calculation | 114 | (1)
6.3.4 QC-2 FPA Organization | 115 | (3)
6.3.5 Circular Queue-Register Structure | 118 | (1)
6.4 Synthesis of the QC-2 Core | 119 | (2)
| 119 | (2)
6.5 Results and Discussions | 121 | (4)
6.5.1 Execution Speedup and Code Analysis | 121 | (1)
| 122 | (2)
6.5.3 Speed and Power Consumption Comparison with Synthesizable CPU Cores | 124 | (1)
| 125 | (2)
7 Reconfigurable Multicore Architectures | 127 | (32)
| 127 | (3)
7.1.1 Performance-Driven Approaches in Software | 128 | (1)
7.1.2 Performance-Driven Approaches in Hardware | 129 | (1)
7.1.3 Potential of FPGA SOCs | 130 | (1)
7.2 Runtime Hardware Adaptation | 130 | (10)
7.2.1 Processor Array Architectures | 131 | (4)
7.2.2 Datapath Array Architectures | 135 | (3)
7.2.3 Single Processor Architectures | 138 | (2)
7.3 Summary of Hardware Adaptation | 140 | (2)
7.4 Runtime Software Adaptation | 142 | (11)
| 143 | (5)
7.4.2 Dynamic Instruction Merging | 148 | (4)
7.4.3 Summary of Software Adaptation | 152 | (1)
7.5 Future Directions in Runtime Adaptation | 153 | (4)
7.5.1 Future Hardware Adaptation | 153 | (2)
7.5.2 Future Software Adaptation | 155 | (1)
7.5.3 Future Reconfiguration Infrastructures in FPGA SOCs | 156 | (1)
| 157 | (2)
Bibliography | 159 | (18)
Index | 177