|
|
1 | (4) |
|
|
1 | (1) |
|
1.2 Contributions and Structure of This Thesis |
|
|
2 | (3) |
|
|
4 | (1) |
|
|
5 | (18) |
|
|
6 | (8) |
|
|
6 | (2) |
|
2.1.2 Generation Through Automation |
|
|
8 | (1) |
|
2.1.3 Steps of High Level Synthesis |
|
|
9 | (2) |
|
2.1.4 HLS: A Brief Retrospection |
|
|
11 | (1) |
|
2.1.5 The Current Generation of HLS |
|
|
12 | (2) |
|
2.2 High Level Synthesis for Cryptographic Workloads |
|
|
14 | (1) |
|
2.3 ASIC Design Flow Setup |
|
|
15 | (4) |
|
2.3.1 The Standard Cell Digital Design Flow |
|
|
16 | (1) |
|
2.3.2 ADL Based Design Flow |
|
|
17 | (1) |
|
|
17 | (2) |
|
2.4 Experimental Setup for CPU-GPGPUs Environment |
|
|
19 | (1) |
|
|
19 | (4) |
|
|
19 | (4) |
|
|
23 | (28) |
|
3.1 Berkeley Dwarfs for Parallel Computing |
|
|
23 | (1) |
|
3.2 Cryptology Background |
|
|
24 | (4) |
|
|
25 | (1) |
|
|
26 | (1) |
|
|
27 | (1) |
|
3.3 Block Ciphers: Major Ingredient of Symmetric Key Cryptography |
|
|
28 | (3) |
|
3.3.1 Transformations Under Modes Of Operation |
|
|
28 | (2) |
|
3.3.2 Basic Building Blocks for Symmetric Key Cryptography |
|
|
30 | (1) |
|
3.4 Cipher Algorithmic Configuration Space |
|
|
31 | (15) |
|
|
31 | (6) |
|
|
37 | (8) |
|
|
45 | (1) |
|
|
46 | (5) |
|
|
46 | (5) |
|
4 High Level Synthesis for Symmetric Key Cryptography |
|
|
51 | (40) |
|
4.1 CRYKET (CRYptographic Kernels Toolkit) |
|
|
52 | (1) |
|
|
53 | (18) |
|
4.2.1 Design Specification Compilation |
|
|
53 | (1) |
|
4.2.2 Specification Validation and Formal Model Creation |
|
|
54 | (3) |
|
4.2.3 Software Generation Engine |
|
|
57 | (1) |
|
4.2.4 Hardware Generation Engine |
|
|
58 | (5) |
|
4.2.5 Results and Analysis: Software Efficiency |
|
|
63 | (1) |
|
4.2.6 Results and Analysis: Hardware Efficiency |
|
|
64 | (7) |
|
|
71 | (18) |
|
4.3.1 Design Specification Compilation |
|
|
72 | (1) |
|
4.3.2 Specification Validation and Formal Model Creation |
|
|
73 | (3) |
|
4.3.3 Software Generation Engine |
|
|
76 | (2) |
|
4.3.4 Hardware Generation Engine |
|
|
78 | (4) |
|
|
82 | (4) |
|
4.3.6 Comparison with Manual Implementations |
|
|
86 | (3) |
|
|
89 | (2) |
|
|
89 | (2) |
|
5 Manual Optimizations for Efficient Designs |
|
|
91 | (36) |
|
5.1 Optimization Strategies |
|
|
91 | (1) |
|
5.1.1 Memory Bank Structure Optimizations |
|
|
91 | (1) |
|
5.1.2 Unification of Multiple Cryptographic Proposals |
|
|
91 | (1) |
|
5.2 Memory Bank Structure Optimizations |
|
|
92 | (15) |
|
5.2.1 Reviewing Known Techniques |
|
|
93 | (1) |
|
5.2.2 Optimized Memory Utilization for HC-128 |
|
|
93 | (1) |
|
5.2.3 Design Space Exploration of HC-128 Accelerator |
|
|
94 | (4) |
|
5.2.4 State Split Optimizations for HC-128 |
|
|
98 | (6) |
|
5.2.5 Performance Evaluation |
|
|
104 | (3) |
|
5.3 Integrated Implementation of Multiple Cryptographic Primitives |
|
|
107 | (16) |
|
|
107 | (1) |
|
|
107 | (1) |
|
5.3.3 Contribution: HiPAcc-LTE-Integrated Accelerator for SNOW 3G and ZUC |
|
|
107 | (1) |
|
5.3.4 Structural Comparison |
|
|
108 | (2) |
|
5.3.5 Integrating the Main LFSR |
|
|
110 | (1) |
|
5.3.6 Integrating the FSM |
|
|
111 | (4) |
|
5.3.7 ASIC Implementation of HiPAcc-LTE |
|
|
115 | (8) |
|
|
123 | (4) |
|
|
124 | (3) |
|
|
127 | (42) |
|
|
127 | (1) |
|
|
128 | (1) |
|
6.3 CoARX: A Coprocessor for ARX-Based Cryptographic Algorithms |
|
|
129 | (14) |
|
|
129 | (3) |
|
6.3.2 Design Space Exploration |
|
|
132 | (4) |
|
6.3.3 Mapping of the ARX Algorithms |
|
|
136 | (4) |
|
6.3.4 Implementation and Benchmarking |
|
|
140 | (3) |
|
6.4 RC4-AccSuite: A Hardware Acceleration Suite for RC4-like Stream Ciphers |
|
|
143 | (22) |
|
6.4.1 RC4 Stream Cipher Algorithm |
|
|
144 | (1) |
|
|
144 | (2) |
|
|
146 | (1) |
|
6.4.4 High-Level Architecture of RC4-AccSuite |
|
|
146 | (2) |
|
6.4.5 Performance Enhancement by Memory Replication Technique |
|
|
148 | (4) |
|
6.4.6 Resource Economization in RC4-AccSuite |
|
|
152 | (6) |
|
6.4.7 Implementation and Benchmarking |
|
|
158 | (7) |
|
|
165 | (4) |
|
|
166 | (3) |
|
|
169 | (26) |
|
|
169 | (1) |
|
|
170 | (1) |
|
7.3 The Compute Unified Device Architecture (CUDA) Overview |
|
|
170 | (2) |
|
7.3.1 Kernel Execution Model |
|
|
171 | (1) |
|
|
172 | (1) |
|
7.4 Block Ciphers Performance Acceleration on GPUs |
|
|
172 | (1) |
|
7.5 Mapping Salsa20 Stream Cipher on GPUs |
|
|
173 | (11) |
|
7.5.1 Analyzing Parallelism Opportunities of Salsa20 |
|
|
173 | (1) |
|
7.5.2 Batch Processing Framework |
|
|
174 | (2) |
|
7.5.3 CUDA Coding Guidelines |
|
|
176 | (1) |
|
7.5.4 Optimization for Salsa20 |
|
|
177 | (1) |
|
7.5.5 Autotuning for Throughput Optimization |
|
|
177 | (3) |
|
7.5.6 Results and Analysis |
|
|
180 | (4) |
|
7.6 Mapping HC-128 Stream Cipher on GPUs |
|
|
184 | (10) |
|
7.6.1 Hurdles in Parallelization of HC Ciphers |
|
|
185 | (2) |
|
7.6.2 Optimization Strategies |
|
|
187 | (3) |
|
7.6.3 Experimental Analysis |
|
|
190 | (4) |
|
|
194 | (1) |
|
|
194 | (1) |
|
8 Efficient Cryptanalytic Hardware |
|
|
195 | (20) |
|
|
195 | (1) |
|
|
196 | (1) |
|
8.2.1 Attacks Against SHA-1 |
|
|
196 | (1) |
|
8.2.2 Reported Hardware Attacks |
|
|
197 | (1) |
|
|
197 | (10) |
|
|
197 | (2) |
|
8.3.2 Kraken Architecture |
|
|
199 | (8) |
|
8.4 Performance Analysis and Comparisons |
|
|
207 | (5) |
|
|
207 | (1) |
|
|
207 | (1) |
|
8.4.3 Cost-Performance Approximation with Memories |
|
|
208 | (2) |
|
8.4.4 Power Consumption Aggregates |
|
|
210 | (1) |
|
8.4.5 Mapping Kraken on FPGAs |
|
|
210 | (1) |
|
8.4.6 Comparison with Other Implementations |
|
|
210 | (2) |
|
|
212 | (3) |
|
|
213 | (2) |
|
9 Conclusion and Future Work |
|
|
215 | (4) |
|
|
216 | (3) |
Appendix A RunFein Generated AES-128 Code |
|
219 | (2) |
Appendix B RunFein GUI Snapshots |
|
221 | (2) |
Appendix C Description of Some ARX Based Cryptographic Functions |
|
223 | (12) |
Appendix D Overview of SNOW 3G and ZUC Stream Ciphers |
|
235 | |