…  xiii
…  xvii

Chapter 1 …  1
1.1 What Is Machine Learning On Devices?  1
1.2 On-Device Learning and TinyML Systems  3
1.2.1 Properties of On-Device Learning  3
1.2.2 Objectives of TinyML Systems  4
1.3 Challenges for Realistic Implementation  5
1.4 Problem Statement of Building TinyML Systems  6
1.5 Deployment Prospects and Downstream Applications  7
1.5.1 Evaluation Metrics for Practical Methods  7
1.5.2 Intelligent Medical Diagnosis  8
1.5.3 AI-Enhanced Motion Tracking  8
1.5.4 Domain-Specific Acceleration Chips  8
1.6 The Scope and Organization of This Book  9

Chapter 2 Fundamentals: On-Device Learning Paradigm  11
2.1 …  11
2.1.1 Drawbacks of In-Cloud Learning  12
2.1.2 Rise of On-Device Learning  12
2.1.3 Bit Precision and Data Quantization  13
2.1.4 …  14
2.1.5 Why Not Existing Quantization Methods?  14
2.2 Basic Training Algorithms  15
2.2.1 Stochastic Gradient Descent  15
2.2.2 Mini-Batch Stochastic Gradient Descent  16
2.2.3 Training of Neural Networks  17
2.3 Parameter Synchronization for Distributed Training  18
2.3.1 Parameter Server Paradigm  18
2.3.2 Parameter Synchronization Pace  19
2.3.3 Heterogeneity-Aware Distributed Training  19
2.4 Multi-Client On-Device Learning  20
2.4.1 Preliminary Experiments  20
2.4.2 …  20
2.4.2.1 Training Convergence Efficiency  20
2.4.2.2 Synchronization Frequency  21
2.4.2.3 Communication Traffic  21
…  22
2.5 Developing Kits and Evaluation Platforms  22
…  22
…  22
…  22
…  23

Chapter 3 Preliminary: Theories and Algorithms  25
3.1 Elements of Neural Networks  25
3.1.1 Fully Connected Network  25
3.1.2 Convolutional Neural Network  26
3.1.3 Attention-Based Neural Network  26
3.2 Model-Oriented Optimization Algorithms  27
3.2.1 …  27
3.2.2 Quantization Strategy for Transformer  30
3.3 Practice on Simple Convolutional Neural Networks  31
3.3.1 PyTorch Installation  31
…  31
…  32
…  32
3.3.3 Construction of CNN Model  34
3.3.3.1 Convolutional Layers  34
3.3.3.2 Activation Layers  35
3.3.3.3 …  35
3.3.3.4 Fully Connected Layers  37
3.3.3.5 Structure of LeNet-5  38
3.3.4 …  39
3.3.5 …  40
3.3.6 …  41
3.3.6.1 CUDA Installation  42
3.3.6.2 Programming for GPU  42
3.3.7 Load Pre-Trained CNNs  43

Chapter 4 Model-Level Design: Computation Acceleration and Communication Saving  45
4.1 Optimization of Network Architecture  45
4.1.1 Network-Aware Parameter Pruning  46
4.1.1.1 …  46
4.1.1.2 …  47
4.1.1.3 …  47
4.1.1.4 …  48
4.1.2 Knowledge Distillation  48
4.1.2.1 Combination of Loss Functions  49
4.1.2.2 Tuning of Hyper-Parameters  50
4.1.2.3 Usage of Model Training  50
4.1.2.4 …  51
4.1.3 …  51
4.1.3.1 Transfer Learning  51
4.1.3.2 Layer-Wise Freezing and Updating  52
4.1.3.3 Model-Wise Feature Sharing  53
4.1.3.4 …  54
4.1.4 Neural Architecture Search  54
4.1.4.1 Search Space of HW-NAS  55
4.1.4.2 Targeted Hardware Platforms  56
4.1.4.3 Trend of Current HW-NAS Methods  56
4.2 Optimization of Training Algorithm  56
4.2.1 Low-Rank Factorization  57
4.2.2 Data-Adaptive Regularization  57
4.2.2.1 …  57
4.2.2.2 On-Device Network Sparsification  58
4.2.2.3 Block-Wise Regularization  59
4.2.2.4 …  59
4.2.3 Data Representation and Numerical Quantization  60
4.2.3.1 Elements of Quantization  61
4.2.3.2 Post-Training Quantization  63
4.2.3.3 Quantization-Aware Training  63
…  66
…  66

Chapter 5 Hardware-Level Design: Neural Engines and Tensor Accelerators  67
5.1 On-Chip Resource Scheduling  67
5.1.1 Embedded Memory Controlling  67
5.1.2 Underlying Computational Primitives  68
5.1.3 Low-Level Arithmetical Instructions  68
5.1.4 MIMO-Based Communication  69
5.2 Domain-Specific Hardware Acceleration  70
5.2.1 Multiple Processing Primitives Scheduling  70
5.2.2 I/O Connection Optimization  70
5.2.3 …  71
5.2.4 Topology Construction  71
5.3 Cross-Device Energy Efficiency  72
5.3.1 Multi-Client Collaboration  72
5.3.2 Efficiency Analysis  72
5.3.3 Problem Formulation for Energy Saving  74
5.3.4 Algorithm Design and Pipeline Overview  75
5.4 Distributed On-Device Learning  75
5.4.1 Community-Aware Synchronous Parallel  76
5.4.2 Infrastructure Design  77
5.4.3 …  77
5.4.4 …  77
5.4.4.1 Distance Metric Learning  77
5.4.4.2 Asynchronous Advantage Actor-Critic  78
5.4.4.3 Agent Learning Methodology  79
5.4.5 Distributed Training Controller  80
5.4.5.1 Intra-Community Synchronization  80
5.4.5.2 Inter-Community Synchronization  80
5.4.5.3 Communication Traffic Aggregation  81
…  81

Chapter 6 Infrastructure-Level Design: Serverless and Decentralized Machine Learning  83
6.1 …  83
6.1.1 Definition of Serverless Computing  83
6.1.2 Architecture of Serverless Computing  86
6.1.2.1 Virtualization Layer  86
6.1.2.2 Encapsulation Layer  87
6.1.2.3 System Orchestration Layer  88
6.1.2.4 System Coordination Layer  90
6.1.3 Benefits of Serverless Computing  90
6.1.4 Challenges of Serverless Computing  91
6.1.4.1 Programming and Modeling  91
6.1.4.2 Pricing and Cost Prediction  91
6.1.4.3 …  91
6.1.4.4 Intra-Communications of Functions  92
6.1.4.5 …  93
6.1.4.6 Security and Privacy  93
6.2 Serverless Machine Learning  94
6.2.1 …  94
6.2.2 Machine Learning and Data Management  94
6.2.3 Training Large Models in Serverless Computing  94
6.2.3.1 Data Transfer and Parallelism in Serverless Computing  95
6.2.3.2 Data Parallelism for Model Training in Serverless Computing  95
6.2.3.3 Optimizing Parallelism Structure in Serverless Training  96
6.2.4 Cost-Efficiency in Serverless Computing  96
…  97

Chapter 7 System-Level Design: From Standalone to Clusters  99
7.1 Staleness-Aware Pipelining  99
7.1.1 …  100
7.1.2 …  100
7.1.2.1 …  101
7.1.2.2 Non-Linear Neural Networks  101
7.1.3 …  102
7.1.4 Extension of Training Parallelism  102
…  103
7.2 Introduction to Federated Learning  103
7.3 Training With Non-IID Data  106
7.3.1 The Definition of Non-IID Data  107
7.3.2 Enabling Technologies for Non-IID Data  108
7.3.2.1 …  108
7.3.2.2 Robust Aggregation Methods  109
7.3.2.3 Other Optimized Methods  111
7.4 Large-Scale Collaborative Learning  112
7.4.1 …  113
7.4.2 Decentralized P2P Scheme  113
7.4.3 Collective Communication-Based AllReduce  114
7.4.4 Data Flow-Based Graph  114
7.5 Personalized Learning  115
7.5.1 Data-Based Approaches  115
7.5.2 Model-Based Approaches  115
7.5.2.1 Single Model-Based Methods  116
7.5.2.2 Multiple Model-Based Methods  116
7.6 Practice on FL Implementation  117
7.6.1 …  117
7.6.2 …  118
7.6.3 Local Model Training  120
7.6.4 Global Model Aggregation  121
…  121
…  122

Chapter 8 Application: Image-Based Visual Perception  123
8.1 …  123
8.1.1 Traditional Image Classification Methods  123
8.1.2 Deep Learning-Based Image Classification Methods  124
8.1.3 …  127
8.2 Image Restoration and Super-Resolution  127
8.2.1 …  127
8.2.2 A Unified Framework for Image Restoration and Super-Resolution  129
8.2.3 A Demo of Single Image Super-Resolution  130
8.2.3.1 Network Architecture  131
8.2.3.2 Local-Aware Attention  132
8.2.3.3 Global-Aware Attention  133
…  134
8.3 Self-Attention and Vision Transformers  135
8.4 Environment Perception: Image Segmentation and Object Detection  138
8.4.1 …  138
8.4.1.1 Traditional Object Detection Model  138
8.4.1.2 Deep Learning-Based Object Detection Model  139
8.4.2 …  141
8.4.2.1 Semantic Segmentation  141
8.4.2.2 Instance Segmentation  142
8.4.2.3 Panoramic Segmentation  143
…  144

Chapter 9 Application: Video-Based Real-Time Processing  145
9.1 Video Recognition: Evolving From Images  145
9.1.1 …  146
9.1.2 …  147
9.1.2.1 Two-Stream Networks  147
9.1.2.2 …  148
9.2 Motion Tracking: Learn From Time-Spatial Sequences  149
9.2.1 Deep Learning-Based Tracking  149
9.2.2 Optical Flow-Based Tracking  150
9.3 Pose Estimation: Key Point Extraction  153
9.3.1 2D-Based Extraction  153
9.3.1.1 Single Person Estimation  153
9.3.1.2 Multiple Human Estimation  154
9.3.2 3D-Based Extraction  155
9.4 Practice: Real-Time Mobile Human Pose Tracking  156
9.4.1 Prerequisites and Data Preparation  156
9.4.2 Hyper-Parameter Configuration and Model Training  158
9.4.3 Realistic Inference and Performance Evaluation  158
…  159

Chapter 10 Application: Privacy, Security, Robustness and Trustworthiness in Edge AI  161
10.1 Privacy Protection Methods  161
10.1.1 Homomorphic Encryption-Enabled Methods  162
10.1.2 Differential Privacy-Enabled Methods  167
10.1.3 Secure Multi-Party Computation  170
10.1.4 Lightweight Private Computation Techniques for Edge AI  172
10.1.4.1 Example 1: Lightweight and Secure Decision Tree Classification  173
10.1.4.2 Example 2: Lightweight and Secure SVM Classification  173
10.2 Security and Robustness  176
10.2.1 …  176
10.2.2 …  178
…  180
10.3 …  180
10.3.1 Blockchain and Swarm Learning  180
10.3.2 Trusted Execution Environment and Federated Learning  183
…  185

Bibliography  187
Index  247