Acronyms | ix
Preface | xi
About the authors | xv
|
|
Part I | 1 (76)

1 | 3 (16)
1.1 | 3 (11)
1.1.1 Memory wall and power wall | 3 (2)
1.1.2 Semiconductor memory | 5 (6)
1.1.3 Nonvolatile IMC architecture | 11 (3)
1.2 Challenges and contributions | 14 (2)
1.3 | 16 (3)
|
2 The need of in-memory computing | 19 (26)
2.1 | 19 (1)
2.2 Neuromorphic computing devices | 20 (4)
2.2.1 Resistive random-access memory | 21 (1)
2.2.2 Spin-transfer-torque magnetic random-access memory | 22 (1)
2.2.3 Phase change memory | 23 (1)
2.3 Characteristics of NVM devices for neuromorphic computing | 24 (1)
2.4 IMC architectures for machine learning | 25 (15)
2.4.1 Operating principles of IMC architectures | 26 (1)
2.4.2 Analog and digitized fashion of IMC | 27 (2)
2.4.3 | 29 (5)
2.4.4 | 34 (1)
2.4.5 Literature review of IMC | 34 (6)
2.5 Analysis of IMC architectures | 40 (5)
|
3 The background of ReRAM devices | 45 (32)
3.1 ReRAM device and SPICE model | 45 (9)
3.1.1 Drift-type ReRAM device | 45 (7)
3.1.2 Diffusive-type ReRAM device | 52 (2)
3.2 ReRAM-crossbar structure | 54 (5)
3.2.1 Analog and digitized ReRAM crossbar | 55 (2)
3.2.2 Connection of ReRAM crossbar | 57 (2)
3.3 ReRAM-based oscillator | 59 (2)
3.4 Write-in scheme for multibit ReRAM storage | 61 (9)
3.4.1 | 61 (1)
3.4.2 Multi-threshold resistance for data storage | 62 (1)
3.4.3 | 63 (2)
3.4.4 | 65 (2)
3.4.5 Encoding and 3-bit storage | 67 (3)
3.5 Logic functional units with ReRAM | 70 (1)
3.5.1 | 70 (1)
3.5.2 | 70 (1)
3.6 ReRAM for logic operations | 71 (6)
3.6.1 Simulation settings | 72 (1)
3.6.2 ReRAM-based circuits | 73 (1)
3.6.3 ReRAM as a computational unit-cum-memory | 74 (3)
|
Part II Machine learning accelerators | 77 (88)
|
4 The background of machine learning algorithms | 79 (20)
4.1 SVM-based machine learning | 79 (1)
4.2 Single-layer feedforward neural network-based machine learning | 80 (7)
4.2.1 Single-layer feedforward network | 80 (4)
4.2.2 L2-norm-gradient-based learning | 84 (3)
4.3 DCNN-based machine learning | 87 (6)
4.3.1 Deep learning for multilayer neural network | 87 (1)
4.3.2 Convolutional neural network | 87 (1)
4.3.3 Binary convolutional neural network | 88 (5)
4.4 TNN-based machine learning | 93 (6)
4.4.1 Tensor-train decomposition and compression | 93 (1)
4.4.2 Tensor-train-based neural network | 94 (2)
4.4.3 | 96 (3)
|
5 XIMA: the in-ReRAM machine learning architecture | 99 (16)
5.1 ReRAM network-based ML operations | 99 (9)
5.1.1 ReRAM-crossbar network | 99 (7)
5.1.2 Coupled ReRAM oscillator network | 106 (2)
5.2 ReRAM network-based in-memory ML accelerator | 108 (7)
5.2.1 Distributed ReRAM-crossbar in-memory architecture | 109 (2)
5.2.2 | 111 (4)
|
6 The mapping of machine learning algorithms on XIMA | 115 (50)
6.1 Machine learning algorithms on XIMA | 115 (26)
6.1.1 SLFN-based learning and inference acceleration | 115 (2)
6.1.2 BCNN-based inference acceleration on passive array | 117 (4)
6.1.3 BCNN-based inference acceleration on 1S1R array | 121 (1)
6.1.4 L2-norm gradient-based learning and inference acceleration | 122 (4)
6.1.5 Experimental evaluation of machine learning algorithms on XIMA architecture | 126 (15)
6.2 Machine learning algorithms on 3D XIMA | 141 (24)
6.2.1 On-chip design for SLFN | 141 (4)
6.2.2 On-chip design for TNNs | 145 (6)
6.2.3 Experimental evaluation of machine learning algorithms on 3D CMOS-ReRAM | 151 (14)
|
|
Part III | 165 (52)
|
7 Large-scale case study: accelerator for ResNet | 167 (22)
7.1 | 167 (1)
7.2 Deep neural network with quantization | 168 (6)
7.2.1 | 168 (2)
7.2.2 Quantized convolution and residual block | 170 (2)
7.2.3 | 172 (1)
7.2.4 Quantized activation function and pooling | 172 (1)
7.2.5 Quantized deep neural network overview | 173 (1)
7.2.6 | 173 (1)
7.3 Device for in-memory computing | 174 (3)
7.3.1 | 174 (2)
7.3.2 Customized DAC and ADC circuits | 176 (1)
7.3.3 In-memory computing architecture | 176 (1)
7.4 Quantized ResNet on ReRAM crossbar | 177 (3)
7.4.1 | 177 (1)
7.4.2 Overall architecture | 178 (2)
7.5 | 180 (9)
7.5.1 Experiment settings | 180 (1)
7.5.2 | 181 (1)
7.5.3 | 182 (3)
7.5.4 Performance analysis | 185 (4)
|
8 Large-scale case study: accelerator for compressive sensing | 189 (26)
8.1 | 189 (3)
8.2 | 192 (2)
8.2.1 Compressive sensing and isometric distortion | 192 (1)
8.2.2 Optimized near-isometric embedding | 192 (2)
8.3 Boolean embedding for signal acquisition front end | 194 (3)
8.3.1 CMOS-based Boolean embedding circuit | 194 (1)
8.3.2 ReRAM crossbar-based Boolean embedding circuit | 195 (2)
8.3.3 Problem formulation | 197 (1)
8.4 | 197 (3)
8.4.1 Orthogonal rotation | 198 (1)
8.4.2 | 199 (1)
8.4.3 Overall optimization algorithm | 199 (1)
8.5 Row generation algorithm | 200 (3)
8.5.1 Elimination of norm equality constraint | 200 (1)
8.5.2 Convex relaxation of orthogonal constraint | 201 (1)
8.5.3 Overall optimization algorithm | 202 (1)
8.6 | 203 (12)
8.6.1 | 203 (1)
8.6.2 IH algorithm on high-D ECG signals | 204 (3)
8.6.3 Row generation algorithm on low-D image patches | 207 (3)
8.6.4 Hardware performance evaluation | 210 (5)
|
9 Conclusions: wrap-up, open questions and challenges | 215 (2)
9.1 | 215 (1)
9.2 | 216 (1)

References | 217 (20)
Index | 237