Preface |
|
xv | |
Contributors |
|
xvii | |
|
1 Low Power Multicore Processors for Embedded Systems |
|
|
1 | (60) |
|
|
1.1 Multicore Chip with Highly Efficient Cores |
|
|
1 | (4) |
|
1.2 SuperH™ RISC Engine Family (SH) Processor Cores |
|
|
5 | (4) |
|
1.2.1 History of SH Processor Cores |
|
|
5 | (2) |
|
1.2.2 Highly Efficient ISA |
|
|
7 | (1) |
|
1.2.3 Asymmetric In-Order Dual-Issue Superscalar Architecture |
|
|
8 | (1) |
|
1.3 SH-X: A Highly Efficient CPU Core |
|
|
9 | (11) |
|
1.3.1 Microarchitecture Selections |
|
|
11 | (1) |
|
1.3.2 Improved Superpipeline Structure |
|
|
12 | (1) |
|
1.3.3 Branch Prediction and Out-of-Order Branch Issue |
|
|
13 | (2) |
|
1.3.4 Low Power Technologies |
|
|
15 | (2) |
|
1.3.5 Performance and Efficiency Evaluations |
|
|
17 | (3) |
|
1.4 SH-X FPU: A Highly Efficient FPU |
|
|
20 | (13) |
|
1.4.1 FPU Architecture of SH Processors |
|
|
21 | (3) |
|
1.4.2 Implementation of SH-X FPU |
|
|
24 | (5) |
|
1.4.3 Performance Evaluations with 3D Graphics Benchmark |
|
|
29 | (4) |
|
1.5 SH-X2: Frequency and Efficiency Enhanced Core |
|
|
33 | (1) |
|
1.5.1 Frequency Enhancement |
|
|
33 | (1) |
|
1.5.2 Low Power Technologies |
|
|
34 | (1) |
|
1.6 SH-X3: Multicore Architecture Extension |
|
|
34 | (13) |
|
1.6.1 SH-X3 Core Specifications |
|
|
34 | (1) |
|
1.6.2 Symmetric and Asymmetric Multiprocessor Support |
|
|
35 | (1) |
|
1.6.3 Core Snoop Sequence Optimization |
|
|
36 | (3) |
|
1.6.4 Dynamic Power Management |
|
|
39 | (1) |
|
1.6.5 RP-1 Prototype Chip |
|
|
40 | (1) |
|
1.6.5.1 RP-1 Specifications |
|
|
40 | (1) |
|
1.6.5.2 Chip Integration and Evaluations |
|
|
41 | (2) |
|
1.6.6 RP-2 Prototype Chip |
|
|
43 | (1) |
|
1.6.6.1 RP-2 Specifications |
|
|
43 | (1) |
|
1.6.6.2 Power Domain and Partial Power Off |
|
|
43 | (2) |
|
1.6.6.3 Synchronization Support Hardware |
|
|
45 | (2) |
|
1.6.6.4 Chip Integration and Evaluations |
|
|
47 | (1) |
|
1.7 SH-X4: ISA and Address Space Extension |
|
|
47 | (14) |
|
1.7.1 SH-X4 Core Specifications |
|
|
48 | (1) |
|
1.7.2 Efficient ISA Extension |
|
|
49 | (3) |
|
1.7.3 Address Space Extension |
|
|
52 | (1) |
|
|
53 | (1) |
|
1.7.5 RP-X Prototype Chip |
|
|
54 | (1) |
|
1.7.5.1 RP-X Specifications |
|
|
54 | (2) |
|
1.7.5.2 Chip Integration and Evaluations |
|
|
56 | (1) |
|
|
57 | (4) |
|
2 Special-Purpose Hardware for Computational Biology |
|
|
61 | (24) |
|
|
2.1 Molecular Dynamics Simulations on Graphics Processing Units |
|
|
62 | (10) |
|
2.1.1 Molecular Mechanics Force Fields |
|
|
63 | (1) |
|
|
63 | (1) |
|
|
63 | (1) |
|
|
63 | (1) |
|
2.1.1.4 Van der Waals Term |
|
|
64 | (1) |
|
|
65 | (1) |
|
2.1.2 Graphics Processing Units for MD Simulations |
|
|
65 | (1) |
|
2.1.2.1 GPU Architecture Case Study: NVIDIA Fermi |
|
|
66 | (3) |
|
2.1.2.2 Force Computation on GPUs |
|
|
69 | (3) |
|
2.2 Special-Purpose Hardware and Network Topologies for MD Simulations |
|
|
72 | (5) |
|
2.2.1 High-Throughput Interaction Subsystem |
|
|
72 | (2) |
|
2.2.1.1 Pairwise Point Interaction Modules |
|
|
74 | (1) |
|
2.2.1.2 Particle Distribution Network |
|
|
74 | (1) |
|
2.2.2 Hardware Description of the Flexible Subsystem |
|
|
75 | (2) |
|
2.2.3 Performance and Conclusions |
|
|
77 | (1) |
|
2.3 Quantum MC Applications on Field-Programmable Gate Arrays |
|
|
77 | (5) |
|
2.3.1 Energy Computation and WF Kernels |
|
|
78 | (1) |
|
2.3.2 Hardware Architecture |
|
|
79 | (1) |
|
|
79 | (1) |
|
2.3.3 PE and WF Computation Kernels |
|
|
80 | (1) |
|
2.3.3.1 Distance Computation Unit (DCU) |
|
|
81 | (1) |
|
2.3.3.2 Calculate Function Unit and Accumulate Function Kernels |
|
|
81 | (1) |
|
2.4 Conclusions and Future Directions |
|
|
82 | (3) |
|
|
82 | (3) |
|
|
85 | (22) |
|
|
|
|
85 | (1) |
|
|
86 | (2) |
|
3.3 Graphics Modules Design |
|
|
88 | (7) |
|
|
88 | (1) |
|
|
89 | (1) |
|
3.3.2.1 Geometry Transformation |
|
|
90 | (1) |
|
3.3.2.2 Unified Multifunction Unit |
|
|
90 | (1) |
|
|
91 | (1) |
|
|
92 | (3) |
|
3.4 System Power Management |
|
|
95 | (4) |
|
3.4.1 Multiple Power-Domain Management |
|
|
95 | (3) |
|
3.4.2 Power Management Unit |
|
|
98 | (1) |
|
3.5 Implementation Results |
|
|
99 | (3) |
|
3.5.1 Chip Implementation |
|
|
99 | (1) |
|
|
100 | (2) |
|
|
102 | (5) |
|
|
105 | (2) |
|
4 Low-Cost VLSI Architecture for Random Block-Based Access of Pixels in Modern Image Sensors |
|
|
107 | (20) |
|
|
|
|
107 | (1) |
|
|
108 | (1) |
|
4.3 The iBRIDGE-BB Architecture |
|
|
109 | (7) |
|
4.3.1 Configuring the iBRIDGE-BB |
|
|
110 | (1) |
|
4.3.2 Operation of the iBRIDGE-BB |
|
|
110 | (2) |
|
4.3.3 Description of Internal Blocks |
|
|
112 | (1) |
|
|
112 | (1) |
|
|
112 | (2) |
|
4.3.3.3 Memory Addressing and Control |
|
|
114 | (1) |
|
4.3.3.4 Random Access Memory (RAM) |
|
|
114 | (1) |
|
4.3.3.5 Column and Row Calculator |
|
|
115 | (1) |
|
4.3.3.6 Physical Memory Address Generator |
|
|
115 | (1) |
|
|
116 | (1) |
|
4.4 Hardware Implementation |
|
|
116 | (7) |
|
4.4.1 Verification in Field-Programmable Gate Array |
|
|
116 | (2) |
|
4.4.2 Application in Image Compression |
|
|
118 | (3) |
|
4.4.3 Application-Specific Integrated Circuit (ASIC) Synthesis and Performance Analysis |
|
|
121 | (2) |
|
|
123 | (4) |
|
|
123 | (2) |
|
|
125 | (2) |
|
5 Embedded Computing Systems on FPGAs |
|
|
127 | (12) |
|
|
|
128 | (1) |
|
5.2 FPGA Configuration Technology |
|
|
129 | (4) |
|
5.2.1 Traditional SRAM-Based FPGAs |
|
|
130 | (1) |
|
|
130 | (1) |
|
5.2.1.2 Challenges for SRAM-Based Embedded Computing System |
|
|
131 | (1) |
|
5.2.1.3 Other SRAM-Based FPGAs |
|
|
132 | (1) |
|
|
133 | (1) |
|
|
133 | (2) |
|
5.3.1 Synthesis and Design Tools |
|
|
134 | (1) |
|
|
135 | (1) |
|
5.4 Final Summary of Challenges and Opportunities for Embedded Computing Design on FPGAs |
|
|
135 | (4) |
|
|
136 | (3) |
|
6 FPGA-Based Emulation Support for Design Space Exploration |
|
|
139 | (30) |
|
|
|
|
|
139 | (1) |
|
|
140 | (4) |
|
6.2.1 FPGA-Only Emulation Techniques |
|
|
141 | (1) |
|
6.2.2 FPGA-Based Cosimulation Techniques |
|
|
142 | (2) |
|
6.2.3 FPGA-Based Emulation for DSE Purposes: A Limiting Factor |
|
|
144 | (1) |
|
6.3 A Tool for Energy-Aware FPGA-Based Emulation: The MADNESS Project Experience |
|
|
144 | (3) |
|
6.3.1 Models for Prospective ASIC Implementation |
|
|
146 | (1) |
|
6.3.2 Performance Extraction |
|
|
147 | (1) |
|
6.4 Enabling FPGA-Based DSE: Runtime-Reconfigurable Emulators |
|
|
147 | (14) |
|
6.4.1 Enabling Fast NoC Topology Selection |
|
|
148 | (1) |
|
6.4.1.1 WCT Definition Algorithm |
|
|
148 | (3) |
|
6.4.1.2 The Extended Topology Builder |
|
|
151 | (1) |
|
6.4.1.3 Hardware Support for Runtime Reconfiguration |
|
|
152 | (1) |
|
6.4.1.4 Software Support for Runtime Reconfiguration |
|
|
153 | (1) |
|
6.4.2 Enabling Fast ASIP Configuration Selection |
|
|
154 | (1) |
|
6.4.2.1 The Reference Design Flow |
|
|
155 | (1) |
|
6.4.2.2 The Extended Design Flow |
|
|
156 | (1) |
|
6.4.2.3 The WCC Synthesis Algorithm |
|
|
157 | (1) |
|
6.4.2.4 Hardware Support for Runtime Reconfiguration |
|
|
158 | (3) |
|
6.4.2.5 Software Support for Runtime Reconfiguration |
|
|
161 | (1) |
|
|
161 | (8) |
|
6.5.1 Hardware Overhead Due to Runtime Configurability |
|
|
164 | (2) |
|
|
166 | (3) |
|
7 FPGA Coprocessing Solution for Real-Time Protein Identification Using Tandem Mass Spectrometry |
|
|
169 | (16) |
|
|
|
|
|
169 | (2) |
|
7.2 Protein Identification by Sequence Database Searching Using MS/MS Data |
|
|
171 | (3) |
|
7.3 Reconfigurable Computing Platform |
|
|
174 | (2) |
|
7.4 FPGA Implementation of the MS/MS Search Engine |
|
|
176 | (4) |
|
7.4.1 Protein Database Encoding |
|
|
176 | (1) |
|
7.4.2 Overview of the Database Search Engine |
|
|
177 | (1) |
|
7.4.3 Search Processor Architecture |
|
|
178 | (2) |
|
|
180 | (1) |
|
|
180 | (5) |
|
|
181 | (1) |
|
|
181 | (4) |
|
8 Real-Time Configurable Phase-Coherent Pipelines |
|
|
185 | (26) |
|
|
|
8.1 Introduction and Purpose |
|
|
185 | (3) |
|
8.1.1 Efficiency of Pipelined Computation |
|
|
185 | (1) |
|
8.1.2 Direct Datapath (Systolic Array) |
|
|
186 | (1) |
|
8.1.3 Custom Soft Processors |
|
|
186 | (1) |
|
8.1.4 Implementation Framework (e.g., C to VHDL) |
|
|
186 | (1) |
|
|
187 | (1) |
|
8.1.6 Pipeline Data-Feeding Considerations |
|
|
187 | (1) |
|
8.1.7 Purpose of Configurable Phase-Coherent Pipeline Approach |
|
|
187 | (1) |
|
8.2 History and Related Methods |
|
|
188 | (3) |
|
8.2.1 Issues in Tracking Data through Pipelines |
|
|
188 | (1) |
|
8.2.2 Decentralized Tag-Based Control |
|
|
189 | (1) |
|
8.2.3 Tags in Instruction Pipelines |
|
|
189 | (1) |
|
8.2.4 Similar Techniques in Nonpipelined Applications |
|
|
190 | (1) |
|
8.2.5 Development-Friendly Approach |
|
|
190 | (1) |
|
8.3 Implementation Framework |
|
|
191 | (13) |
|
8.3.1 Dynamically Configurable Pipeline |
|
|
191 | (1) |
|
8.3.1.1 Catching up with Synthesis of In-Line Operations |
|
|
192 | (1) |
|
8.3.1.2 Reconfiguration Example with Sparse Data Input |
|
|
193 | (2) |
|
|
195 | (1) |
|
|
195 | (1) |
|
8.3.2.2 Data Entity Record Types |
|
|
195 | (1) |
|
|
196 | (1) |
|
|
197 | (1) |
|
|
197 | (1) |
|
|
198 | (1) |
|
|
198 | (1) |
|
|
198 | (1) |
|
8.3.2.9 Reusing Functional Units |
|
|
199 | (1) |
|
8.3.2.10 Tag-Controlled Single-Adder Implementation |
|
|
199 | (1) |
|
|
200 | (1) |
|
8.3.3 Phase-Coherent Resource Allocation |
|
|
200 | (1) |
|
8.3.3.1 Determining the Reuse Interval |
|
|
201 | (1) |
|
8.3.3.2 Buffering Burst Data |
|
|
201 | (1) |
|
8.3.3.3 Allocation to Phases |
|
|
202 | (1) |
|
|
202 | (1) |
|
8.3.3.5 Allocation Algorithm |
|
|
202 | (1) |
|
|
203 | (1) |
|
8.3.3.7 External Interface Units |
|
|
204 | (1) |
|
8.4 Prototype Implementation |
|
|
204 | (3) |
|
8.4.1 Coordinate Conversion and Regridding |
|
|
205 | (1) |
|
|
206 | (1) |
|
8.4.3 Experimental Results |
|
|
206 | (1) |
|
8.5 Assessment Compared with Related Methods |
|
|
207 | (4) |
|
|
208 | (3) |
|
9 Low Overhead Radiation Hardening Techniques for Embedded Architectures |
|
|
211 | (28) |
|
|
|
|
|
211 | (2) |
|
9.2 Recently Proposed SEU Tolerance Techniques |
|
|
213 | (10) |
|
9.2.1 Radiation Hardened Latch Design |
|
|
214 | (1) |
|
9.2.2 Radiation-Hardened Circuit Design Using Differential Cascode Voltage Swing Logic |
|
|
215 | (3) |
|
9.2.3 SEU Detection and Correction Using Decoupled Ground Bus |
|
|
218 | (5) |
|
9.3 Radiation-Hardened Reconfigurable Array with Instruction Rollback |
|
|
223 | (11) |
|
9.3.1 Overview of the MORA Architecture |
|
|
223 | (4) |
|
9.3.2 Single-Cycle Instruction Rollback |
|
|
227 | (3) |
|
9.3.3 MORA RC with Rollback Mechanism |
|
|
230 | (2) |
|
9.3.4 Impact of the Rollback Scheme on Throughput of the Architecture |
|
|
232 | (2) |
|
9.3.5 Comparison of Proposed Schemes with Competing SEU Hardening Schemes |
|
|
234 | (1) |
|
|
234 | (5) |
|
|
236 | (3) |
|
10 Hybrid Partially Adaptive Fault-Tolerant Routing for 3D Networks-on-Chip |
|
|
239 | (20) |
|
|
|
|
239 | (1) |
|
|
240 | (2) |
|
10.3 Proposed 4NP-First Routing Scheme |
|
|
242 | (8) |
|
|
242 | (1) |
|
10.3.2 4NP-First Overview |
|
|
243 | (1) |
|
10.3.3 Turn Restriction Checks |
|
|
244 | (2) |
|
10.3.4 Prioritized Valid Path Selection |
|
|
246 | (1) |
|
10.3.4.1 Prioritized Valid Path Selection for Case 1 |
|
|
246 | (1) |
|
10.3.4.2 Prioritized Valid Path Selection for Case 2 |
|
|
246 | (2) |
|
10.3.5 4NP-First Router Implementation |
|
|
248 | (2) |
|
|
250 | (5) |
|
10.4.1 Experimental Setup |
|
|
250 | (1) |
|
10.4.2 Comparison with Existing FT Routing Schemes |
|
|
251 | (4) |
|
|
255 | (4) |
|
|
256 | (3) |
|
11 Interoperability in Electronic Systems |
|
|
259 | (14) |
|
|
|
259 | (2) |
|
11.2 The Basis for Interoperability: The OSI Model |
|
|
261 | (2) |
|
|
263 | (3) |
|
|
266 | (2) |
|
11.5 Partitioning the System |
|
|
268 | (2) |
|
11.6 Examples of Interoperable Systems |
|
|
270 | (3) |
|
12 Software Modeling Approaches for Presilicon System Performance Analysis |
|
|
273 | (18) |
|
|
|
|
273 | (2) |
|
|
275 | (8) |
|
12.2.1 High-Level Software Description |
|
|
276 | (2) |
|
12.2.2 Transaction Trace File |
|
|
278 | (4) |
|
12.2.3 Stochastic Traffic Generator |
|
|
282 | (1) |
|
|
283 | (5) |
|
12.3.1 Audio Power Consumption: High-Level Software Description |
|
|
284 | (1) |
|
12.3.2 Cache Analysis: Transaction Trace File |
|
|
285 | (2) |
|
12.3.3 GPU Traffic Characterization: Stochastic Traffic Generator |
|
|
287 | (1) |
|
|
288 | (3) |
|
|
288 | (3) |
|
13 Advanced Encryption Standard (AES) Implementation in Embedded Systems |
|
|
291 | (28) |
|
|
|
|
|
291 | (1) |
|
|
292 | (1) |
|
13.2.1 Addition in Finite Field |
|
|
293 | (1) |
|
13.2.2 Multiplication in Finite Field |
|
|
293 | (1) |
|
|
293 | (7) |
|
13.3.1 Shift Rows/Inverse Shift Rows |
|
|
295 | (1) |
|
13.3.2 Byte Substitution and Inverse Byte Substitution |
|
|
295 | (1) |
|
13.3.2.1 Multiplicative Inverse Calculation |
|
|
296 | (1) |
|
13.3.2.2 Affine Transformation |
|
|
297 | (1) |
|
13.3.3 Mix Columns/Inverse Mix Columns Steps |
|
|
298 | (1) |
|
13.3.4 Key Expansion and Add Round Key Step |
|
|
299 | (1) |
|
13.4 Hardware Implementations for AES |
|
|
300 | (6) |
|
13.4.1 Composite Field Arithmetic S-BOX |
|
|
302 | (3) |
|
13.4.2 Very High Speed AES Design |
|
|
305 | (1) |
|
13.5 High-Speed AES Encryptor with Efficient Merging Techniques |
|
|
306 | (9) |
|
13.5.1 The Integrated-BOX |
|
|
306 | (4) |
|
13.5.2 Key Expansion Unit |
|
|
310 | (1) |
|
13.5.3 The AES Encryptor with the Merging Technique |
|
|
311 | (1) |
|
13.5.4 Results and Comparison |
|
|
312 | (1) |
|
13.5.4.1 Subpipelining Simulation Results |
|
|
312 | (2) |
|
13.5.4.2 Comparison with Previous Designs |
|
|
314 | (1) |
|
|
315 | (4) |
|
|
315 | (4) |
|
14 Reconfigurable Architecture for Cryptography over Binary Finite Fields |
|
|
319 | (44) |
|
|
|
|
|
319 | (1) |
|
|
320 | (13) |
|
14.2.1 Elliptic Curve Cryptography |
|
|
320 | (1) |
|
14.2.1.1 Finite Field Arithmetic |
|
|
321 | (4) |
|
14.2.1.2 Elliptic Curve Arithmetic |
|
|
325 | (4) |
|
14.2.2 Advanced Encryption Standard |
|
|
329 | (4) |
|
14.2.3 Random Number Generators |
|
|
333 | (1) |
|
14.3 Reconfigurable Processor |
|
|
333 | (17) |
|
14.3.1 Processing Unit for Elliptic Curve Cryptography |
|
|
335 | (1) |
|
|
336 | (1) |
|
|
336 | (1) |
|
14.3.1.3 Multiplication Logic |
|
|
336 | (1) |
|
14.3.1.4 Reduction and Squaring Logic |
|
|
337 | (5) |
|
14.3.2 Processing Unit for the AES |
|
|
342 | (2) |
|
14.3.3 Random Number Generator |
|
|
344 | (3) |
|
14.3.4 Microinstructions and Access Arbiter |
|
|
347 | (3) |
|
|
350 | (8) |
|
14.4.1 Individual Components |
|
|
351 | (5) |
|
14.4.2 Complete Processor Evaluation |
|
|
356 | (2) |
|
|
358 | (5) |
|
|
359 | (4) |
Index |
|
363 | |