|
|
1 | (16) |
|
1.1 Trends in the Semiconductor Industry |
|
|
2 | (2) |
|
1.2 Energy Considerations in Power-Limited and Battery-Powered Systems |
|
|
4 | (5) |
|
1.3 Wide Power-Performance Tradeoff and System Requirements |
|
|
9 | (3) |
|
1.3.1 Importance of Wide Power-Performance Tradeoff in Duty-Cycled and Always-on Systems |
|
|
9 | (2) |
|
1.3.2 Wide Voltage Scaling |
|
|
11 | (1) |
|
1.4 Challenges in Wide Voltage Scaling and Motivation |
|
|
12 | (2) |
|
|
14 | (1) |
|
|
14 | (3) |
|
2 Reconfigurable Microarchitectures Down to Pipestage and Memory Bank Level |
|
|
17 | (38) |
|
2.1 Pipestage as Basic Building Block of Synchronous Microarchitectures |
|
|
18 | (6) |
|
2.1.1 Background on Pipeline Stages and Timing Constraints |
|
|
18 | (3) |
|
2.1.2 Pipelining for Microarchitecture Speed-Up |
|
|
21 | (3) |
|
2.2 Elementary Microarchitectures |
|
|
24 | (2) |
|
2.3 Impact of Logic Depth on Energy |
|
|
26 | (2) |
|
2.4 Dynamically Adaptable Pipelines |
|
|
28 | (8) |
|
2.4.1 Wide Dynamic Voltage Frequency Scaling |
|
|
28 | (2) |
|
2.4.2 Dynamically Adaptable Pipeline |
|
|
30 | (5) |
|
2.4.3 Run-Time Pipeline Adaptation via Augmented DVFS Look-Up Table |
|
|
35 | (1) |
|
2.5 Microprocessor Microarchitectures: Opportunities and Challenges Under Reconfiguration |
|
|
36 | (4) |
|
2.5.1 Wide DVFS in Microprocessors and Considerations at the Application Level |
|
|
36 | (1) |
|
2.5.2 Control Flow and Hazards in Microprocessor Microarchitectures with Different Pipedepths |
|
|
37 | (2) |
|
2.5.3 Limitations of Re-pipelining in Existing Microprocessor Architectures |
|
|
39 | (1) |
|
2.6 Enabling Microarchitectural Reconfiguration in Microprocessors |
|
|
40 | (2) |
|
2.7 Dynamically Adaptable Time-Interleaved Microprocessors |
|
|
42 | (5) |
|
2.8 Static Random Access Memory (SRAM) |
|
|
47 | (3) |
|
2.9 Methods for SRAM Speed-Up via Reconfigurable Array Organization |
|
|
50 | (1) |
|
|
51 | (1) |
|
|
51 | (4) |
|
3 Automated Design Flows and Run-Time Optimization for Reconfigurable Microarchitectures |
|
|
55 | (38) |
|
3.1 Prior Art in Reconfigurable Microarchitectures |
|
|
56 | (2) |
|
3.2 Overview of Systematic Methodologies and Design Flows for Microarchitectural Reconfiguration |
|
|
58 | (1) |
|
3.3 Automated Design Flow for Pipeline-Level Reconfiguration: Re-pipelining and Retiming (Steps 1-2) |
|
|
59 | (4) |
|
3.3.1 Re-pipelining (Step 1) |
|
|
59 | (2) |
|
|
61 | (2) |
|
3.4 Automated Design Flow for Pipeline-Level Reconfiguration: Register Identification (Step 3, Phase I) |
|
|
63 | (3) |
|
3.4.1 Netlist to Skeleton Graph (Step 3.1, Phase I) |
|
|
63 | (2) |
|
3.4.2 Weighted Skeleton Graph (Step 3.2, Phase I) |
|
|
65 | (1) |
|
3.5 Automated Design Flow for Pipeline-Level Reconfiguration: Register Identification in Linear Pipelines (Step 3, Phase II) |
|
|
66 | (1) |
|
3.6 Automated Design Flow for Pipeline-Level Reconfiguration: Register Identification in Non-linear Pipelines (Step 3, Phase II) |
|
|
67 | (8) |
|
3.6.1 Graph Feedforward Cutsets and Properties |
|
|
68 | (2) |
|
3.6.2 Cutset Identification (Step 3.3B) |
|
|
70 | (3) |
|
3.6.3 Cutset-to-Pipeline Mapping (Step 3.3C) |
|
|
73 | (2) |
|
3.7 Automated Design Flow for Pipeline-Level Reconfiguration: Bypassable Registers Choice (Step 3, Phase III) |
|
|
75 | (1) |
|
3.8 Automated Design Flow for Pipeline-Level Reconfiguration: Bypassable Register Replacement (Step 4) |
|
|
76 | (2) |
|
3.9 Automated Design Flow Extension to Thread-Level Time-Interleaved Reconfiguration |
|
|
78 | (2) |
|
3.10 SRAM Reconfiguration at the Bank Level |
|
|
80 | (9) |
|
3.10.1 Design Considerations on Memory Reconfiguration |
|
|
80 | (1) |
|
3.10.2 Background on Low-Power and Reconfigurable Memories |
|
|
81 | (4) |
|
3.10.3 Row Aggregation Technique and Reconfiguration for Selective Performance Enhancement Beyond Nominal Voltage |
|
|
85 | (1) |
|
3.10.4 Embedding Reconfigurable Row Aggregation Through Minor Modifications of Existing Compiled Memories |
|
|
85 | (4) |
|
|
89 | (1) |
|
|
90 | (3) |
|
4 Case Studies of Reconfigurable Microarchitectures: Accelerators, Microprocessors, and Memories |
|
|
93 | (22) |
|
4.1 Fast Fourier Transform (FFT) Accelerator |
|
|
94 | (11) |
|
4.1.1 Microarchitecture and Design of Its Dynamically Adaptable Pipeline Counterpart |
|
|
94 | (2) |
|
4.1.2 Measurement Results on a Single Die |
|
|
96 | (2) |
|
4.1.3 Impact of Variations and Comparison of Measurement Results Across Multiple Dice |
|
|
98 | (1) |
|
4.1.4 Overhead Due to Microarchitecture Reconfiguration |
|
|
99 | (6) |
|
4.2 Finite Impulse Response Filter (FIR) and Fixed-Point Multiplier |
|
|
105 | (2) |
|
4.3 Reconfigurable Thread-Level in ARM Cortex Microcontroller and SRAM Row Aggregation |
|
|
107 | (5) |
|
4.3.1 Reconfigurable Memory with Selective Row Aggregation |
|
|
109 | (2) |
|
4.3.2 ARM Cortex-M0 Microcontroller |
|
|
111 | (1) |
|
|
112 | (1) |
|
|
113 | (2) |
|
5 Reconfigurable Clock Networks, Automated Design Flows, Run-Time Optimization, and Case Study |
|
|
115 | (30) |
|
5.1 Impact of Clock Network Topology on Clock Skew, Performance, and Hold Margin under Wide Voltage Scaling |
|
|
116 | (6) |
|
5.1.1 Impact of Clock Skew on Performance, Robustness Against Hold Violations, and Energy Efficiency |
|
|
116 | (2) |
|
5.1.2 Impact of Supply Voltage on Clock Network Optimization and Clock Skew in Conventional Static Clock Networks |
|
|
118 | (3) |
|
5.1.3 Prior Art in Clock Networks for Low- or Wide-Voltage Operation |
|
|
121 | (1) |
|
5.2 Reconfigurable Clock Networks: Principles and Fundamentals |
|
|
122 | (2) |
|
5.3 Bypassable Repeaters and Other Clock Cells |
|
|
124 | (4) |
|
5.4 Gate-Boostable Clock Root Repeater |
|
|
128 | (1) |
|
5.5 Automated Design Flows for Reconfigurable Clock Networks and Integration with DVFS |
|
|
129 | (5) |
|
5.5.1 Automated Clock Tree Design and Level Balance Principle |
|
|
129 | (3) |
|
5.5.2 Optimal Configuration Selection and Integration with DVFS |
|
|
132 | (2) |
|
5.6 Case Study: Reconfigurable Clock Network in FFT Accelerator |
|
|
134 | (8) |
|
|
134 | (4) |
|
5.6.2 Clock Skew Measurement Results |
|
|
138 | (3) |
|
5.6.3 Improvements in Performance, Robustness, and Energy Offered by Reconfigurable Clock Networks |
|
|
141 | (1) |
|
|
142 | (1) |
|
|
143 | (2) |
|
|
145 | (4) |
|
|
148 | (1) |
Appendix |
|
149 | (14) |
Index |
|
163 | |