| Biographies | xi | |
| Preface for the English version | xiii | |
| Preface | xv | |
| Motivation for this book | xv | |
| Value of an AI computing systems course | xvi | |
| Content of the AI computing systems course | xvii | |
| Writing of this book | xx | |
| | 1 | (16) |
| 1.1 Artificial intelligence | 1 | (8) |
| 1.1.1 What is artificial intelligence? | 1 | (1) |
| | 1 | (3) |
| | 4 | (5) |
| | 9 | (4) |
| 1.2.1 What are AI computing systems? | 9 | (1) |
| 1.2.2 The necessity of AICSs | 9 | (1) |
| | 10 | (3) |
| | 13 | (2) |
| | 15 | (2) |
| | 16 | (1) |
| Chapter 2 Fundamentals of neural networks | 17 | (36) |
| 2.1 From machine learning to neural networks | 17 | (12) |
| | 17 | (1) |
| | 18 | (4) |
| | 22 | (2) |
| 2.1.4 Two-layer neural network: multilayer perceptron | 24 | (2) |
| 2.1.5 Deep neural networks (deep learning) | 26 | (1) |
| 2.1.6 The history of neural networks | 27 | (2) |
| 2.2 Neural network training | 29 | (5) |
| 2.2.1 Forward propagation | 30 | (2) |
| 2.2.2 Backward propagation | 32 | (2) |
| 2.3 Neural network design: the principle | 34 | (8) |
| | 34 | (1) |
| 2.3.2 Activation function | 35 | (4) |
| | 39 | (3) |
| 2.4 Overfitting and regularization | 42 | (6) |
| | 42 | (1) |
| | 43 | (5) |
| | 48 | (2) |
| | 50 | (3) |
| | 50 | (3) |
| | 53 | (70) |
| 3.1 Convolutional neural networks for image processing | 53 | (12) |
| | 55 | (1) |
| 3.1.2 Convolutional layer | 55 | (7) |
| | 62 | (1) |
| 3.1.4 Fully connected layer | 63 | (1) |
| | 63 | (1) |
| | 63 | (2) |
| 3.2 CNN-based classification algorithms | 65 | (20) |
| | 66 | (4) |
| | 70 | (3) |
| | 73 | (8) |
| | 81 | (4) |
| 3.3 CNN-based object detection algorithms | 85 | (16) |
| | 85 | (3) |
| | 88 | (7) |
| | 95 | (3) |
| | 98 | (2) |
| | 100 | (1) |
| 3.4 Sequence models: recurrent neural networks | 101 | (8) |
| | 101 | (5) |
| | 106 | (2) |
| | 108 | (1) |
| | 109 | (1) |
| 3.5 Generative adversarial networks | 109 | (6) |
| | 110 | (1) |
| | 110 | (3) |
| | 113 | (2) |
| | 115 | (6) |
| 3.6.1 CNN-based image style transfer | 116 | (3) |
| 3.6.2 Real-time style transfer | 119 | (2) |
| | 121 | (2) |
| | 121 | (2) |
| Chapter 4 Fundamentals of programming frameworks | 123 | (44) |
| 4.1 Necessities of programming frameworks | 124 | (1) |
| 4.2 Fundamentals of programming frameworks | 124 | (2) |
| 4.2.1 Generic programming frameworks | 124 | (1) |
| | 125 | (1) |
| 4.3 TensorFlow: model and tutorial | 126 | (16) |
| 4.3.1 Computational graph | 126 | (2) |
| | 128 | (1) |
| | 129 | (3) |
| | 132 | (5) |
| | 137 | (3) |
| | 140 | (2) |
| | 142 | (1) |
| 4.4 Deep learning inference in TensorFlow | 142 | (6) |
| | 143 | (1) |
| 4.4.2 Define the basic operations | 144 | (2) |
| 4.4.3 Create neural network models | 146 | (2) |
| | 148 | (1) |
| 4.5 Deep learning training in TensorFlow | 148 | (15) |
| | 148 | (6) |
| | 154 | (7) |
| | 161 | (2) |
| 4.5.4 Image style transfer training | 163 | (1) |
| | 163 | (4) |
| | 164 | (3) |
| Chapter 5 Programming framework principles | 167 | (40) |
| 5.1 TensorFlow design principles | 167 | (1) |
| | 167 | (1) |
| | 168 | (1) |
| | 168 | (1) |
| 5.2 TensorFlow computational graph mechanism | 168 | (16) |
| 5.2.1 Computational graph | 169 | (8) |
| 5.2.2 Local execution of a computational graph | 177 | (6) |
| 5.2.3 Distributed execution of computational graphs | 183 | (1) |
| 5.3 TensorFlow system implementation | 184 | (14) |
| 5.3.1 Overall architecture | 184 | (2) |
| 5.3.2 Computational graph execution module | 186 | (2) |
| 5.3.3 Device abstraction and management | 188 | (4) |
| 5.3.4 Network and communication | 192 | (3) |
| 5.3.5 Operator definition | 195 | (3) |
| 5.4 Programming framework comparison | 198 | (6) |
| | 199 | (4) |
| | 203 | (1) |
| | 204 | (1) |
| | 204 | (1) |
| | 204 | (3) |
| | 205 | (2) |
| Chapter 6 Deep learning processors | 207 | (40) |
| 6.1 Deep learning processors (DLPs) | 207 | (5) |
| 6.1.1 The purpose of DLPs | 207 | (1) |
| 6.1.2 The development history of DLPs | 208 | (3) |
| 6.1.3 The design motivation | 211 | (1) |
| 6.2 Deep learning algorithm analysis | 212 | (9) |
| 6.2.1 Computational characteristics | 212 | (4) |
| 6.2.2 Memory access patterns | 216 | (5) |
| | 221 | (11) |
| 6.3.1 Instruction set architecture | 222 | (3) |
| | 225 | (2) |
| | 227 | (3) |
| | 230 | (1) |
| 6.3.5 Mapping from algorithm to chip | 231 | (1) |
| | 232 | (1) |
| | 232 | (7) |
| 6.4.1 Scalar MAC-based computing unit | 233 | (2) |
| | 235 | (2) |
| | 237 | (2) |
| 6.5 Performance evaluation | 239 | (3) |
| 6.5.1 Performance metrics | 239 | (1) |
| | 240 | (1) |
| 6.5.3 Factors affecting performance | 241 | (1) |
| | 242 | (2) |
| 6.6.1 The GPU architecture | 242 | (1) |
| 6.6.2 The FPGA architecture | 243 | (1) |
| 6.6.3 Comparison of DLPs, GPU, and FPGA | 244 | (1) |
| | 244 | (3) |
| | 245 | (2) |
| Chapter 7 Architecture for AI computing systems | 247 | (24) |
| 7.1 Single-core deep learning processor | 247 | (10) |
| 7.1.1 Overall architecture | 248 | (1) |
| | 249 | (4) |
| | 253 | (3) |
| | 256 | (1) |
| 7.1.5 Summary of single-core deep learning processor | 256 | (1) |
| 7.2 The multicore deep learning processor | 257 | (10) |
| 7.2.1 The DLP-M architecture | 258 | (1) |
| 7.2.2 The cluster architecture | 258 | (6) |
| 7.2.3 Interconnection architecture | 264 | (2) |
| 7.2.4 Summary of multicore deep learning processors | 266 | (1) |
| | 267 | (4) |
| | 268 | (3) |
| Chapter 8 AI programming language for AI computing systems | 271 | (108) |
| 8.1 Necessity of AI programming language | 271 | (9) |
| | 272 | (1) |
| | 273 | (5) |
| | 278 | (1) |
| | 278 | (2) |
| 8.2 Abstraction of AI programming language | 280 | (5) |
| 8.2.1 Abstract hardware architecture | 280 | (1) |
| 8.2.2 Typical AI computing system | 280 | (2) |
| | 282 | (1) |
| | 283 | (1) |
| | 283 | (2) |
| | 285 | (10) |
| 8.3.1 Heterogeneous programming model | 285 | (5) |
| 8.3.2 General AI programming model | 290 | (5) |
| 8.4 Fundamentals of AI programming language | 295 | (9) |
| | 295 | (2) |
| | 297 | (2) |
| 8.4.3 Macros, constants, and built-in variables | 299 | (1) |
| | 299 | (1) |
| | 300 | (1) |
| | 301 | (1) |
| | 302 | (1) |
| 8.4.8 Serial program example | 302 | (1) |
| 8.4.9 Parallel program example | 303 | (1) |
| 8.5 Programming interface of AI applications | 304 | (8) |
| 8.5.1 Kernel function interface | 305 | (2) |
| | 307 | (2) |
| | 309 | (3) |
| 8.6 Debugging AI applications | 312 | (19) |
| 8.6.1 Functional debugging method | 313 | (5) |
| 8.6.2 Function debugging interface | 318 | (3) |
| 8.6.3 Function debugging tool | 321 | (3) |
| 8.6.4 Precision debugging method | 324 | (1) |
| 8.6.5 Function debugging practice | 325 | (6) |
| 8.7 Optimizing AI applications | 331 | (15) |
| 8.7.1 Performance tuning method | 332 | (3) |
| 8.7.2 Performance tuning interface | 335 | (2) |
| 8.7.3 Performance tuning tools | 337 | (3) |
| 8.7.4 Performance tuning practice | 340 | (6) |
| 8.8 System development on AI programming language | 346 | (28) |
| 8.8.1 High-performance library operator development | 347 | (4) |
| 8.8.2 Programming framework operator development | 351 | (6) |
| 8.8.3 System development and optimization practice | 357 | (17) |
| | 374 | (5) |
| Chapter 9 Practice: AI computing systems | 379 | (18) |
| 9.1 Basic practice: image style transfer | 379 | (11) |
| 9.1.1 Operator implementation based on AI programming language | 379 | (3) |
| 9.1.2 Implementation of image style transfer | 382 | (4) |
| 9.1.3 Image style transfer practice | 386 | (4) |
| 9.2 Advanced practice: object detection | 390 | (4) |
| 9.2.1 Operator implementation based on AI programming language | 390 | (3) |
| 9.2.2 Implementation of object detection | 393 | (1) |
| | 394 | (3) |
| APPENDIX A Fundamentals of computer architecture | 397 | (6) |
| A.1 The instruction set of general-purpose CPUs | 397 | (2) |
| A.2 Memory hierarchy in computing systems | 399 | (4) |
| | 399 | (2) |
| | 401 | (2) |
| APPENDIX B Experimental environment | 403 | (4) |
| | 403 | (2) |
| | 403 | (1) |
| | 403 | (1) |
| | 404 | (1) |
| B.1.4 Unzipping the file package | 404 | (1) |
| B.1.5 Setting environment variables | 404 | (1) |
| | 405 | (2) |
| References | 407 | (10) |
| Final words | 417 | (4) |
| Index | 421 | |