…  xiii
…  xvii

Chapter 1 …  1
1.1 What Is Machine Learning On Devices?  1
1.2 On-Device Learning and TinyML Systems  3
1.2.1 Properties of On-Device Learning  3
1.2.2 Objectives of TinyML Systems  4
1.3 Challenges for Realistic Implementation  5
1.4 Problem Statement of Building TinyML Systems  6
1.5 Deployment Prospects and Downstream Applications  7
1.5.1 Evaluation Metrics for Practical Methods  7
1.5.2 Intelligent Medical Diagnosis  8
1.5.3 AI-Enhanced Motion Tracking  8
1.5.4 Domain-Specific Acceleration Chips  8
1.6 The Scope and Organization of This Book  9

Chapter 2 Fundamentals: On-Device Learning Paradigm  11
2.1 …  11
2.1.1 Drawbacks of In-Cloud Learning  12
2.1.2 Rise of On-Device Learning  12
2.1.3 Bit Precision and Data Quantization  13
2.1.4 …  14
2.1.5 Why Not Existing Quantization Methods?  14
2.2 Basic Training Algorithms  15
2.2.1 Stochastic Gradient Descent  15
2.2.2 Mini-Batch Stochastic Gradient Descent  16
2.2.3 Training of Neural Networks  17
2.3 Parameter Synchronization for Distributed Training  18
2.3.1 Parameter Server Paradigm  18
2.3.2 Parameter Synchronization Pace  19
2.3.3 Heterogeneity-Aware Distributed Training  19
2.4 Multi-Client On-Device Learning  20
2.4.1 Preliminary Experiments  20
2.4.2 …  20
2.4.2.1 Training Convergence Efficiency  20
2.4.2.2 Synchronization Frequency  21
2.4.2.3 Communication Traffic  21
…  22
2.5 Developing Kits and Evaluation Platforms  22
…  22
…  22
…  22
…  23

Chapter 3 Preliminary: Theories and Algorithms  25
3.1 Elements of Neural Networks  25
3.1.1 Fully Connected Network  25
3.1.2 Convolutional Neural Network  26
3.1.3 Attention-Based Neural Network  26
3.2 Model-Oriented Optimization Algorithms  27
3.2.1 …  27
3.2.2 Quantization Strategy for Transformer  30
3.3 Practice on Simple Convolutional Neural Networks  31
3.3.1 PyTorch Installation  31
…  31
…  32
…  32
3.3.3 Construction of CNN Model  34
3.3.3.1 Convolutional Layers  34
3.3.3.2 Activation Layers  35
3.3.3.3 …  35
3.3.3.4 Fully Connected Layers  37
3.3.3.5 Structure of LeNet-5  38
3.3.4 …  39
3.3.5 …  40
3.3.6 …  41
3.3.6.1 CUDA Installation  42
3.3.6.2 Programming for GPU  42
3.3.7 Load Pre-Trained CNNs  43

Chapter 4 Model-Level Design: Computation Acceleration and Communication Saving  45
4.1 Optimization of Network Architecture  45
4.1.1 Network-Aware Parameter Pruning  46
4.1.1.1 …  46
4.1.1.2 …  47
4.1.1.3 …  47
4.1.1.4 …  48
4.1.2 Knowledge Distillation  48
4.1.2.1 Combination of Loss Functions  49
4.1.2.2 Tuning of Hyper-Parameters  50
4.1.2.3 Usage of Model Training  50
4.1.2.4 …  51
4.1.3 …  51
4.1.3.1 Transfer Learning  51
4.1.3.2 Layer-Wise Freezing and Updating  52
4.1.3.3 Model-Wise Feature Sharing  53
4.1.3.4 …  54
4.1.4 Neural Architecture Search  54
4.1.4.1 Search Space of HW-NAS  55
4.1.4.2 Targeted Hardware Platforms  56
4.1.4.3 Trend of Current HW-NAS Methods  56
4.2 Optimization of Training Algorithm  56
4.2.1 Low-Rank Factorization  57
4.2.2 Data-Adaptive Regularization  57
4.2.2.1 …  57
4.2.2.2 On-Device Network Sparsification  58
4.2.2.3 Block-Wise Regularization  59
4.2.2.4 …  59
4.2.3 Data Representation and Numerical Quantization  60
4.2.3.1 Elements of Quantization  61
4.2.3.2 Post-Training Quantization  63
4.2.3.3 Quantization-Aware Training  63
…  66
…  66

Chapter 5 Hardware-Level Design: Neural Engines and Tensor Accelerators  67
5.1 On-Chip Resource Scheduling  67
5.1.1 Embedded Memory Controlling  67
5.1.2 Underlying Computational Primitives  68
5.1.3 Low-Level Arithmetical Instructions  68
5.1.4 MIMO-Based Communication  69
5.2 Domain-Specific Hardware Acceleration  70
5.2.1 Multiple Processing Primitives Scheduling  70
5.2.2 I/O Connection Optimization  70
5.2.3 …  71
5.2.4 Topology Construction  71
5.3 Cross-Device Energy Efficiency  72
5.3.1 Multi-Client Collaboration  72
5.3.2 Efficiency Analysis  72
5.3.3 Problem Formulation for Energy Saving  74
5.3.4 Algorithm Design and Pipeline Overview  75
5.4 Distributed On-Device Learning  75
5.4.1 Community-Aware Synchronous Parallel  76
5.4.2 Infrastructure Design  77
5.4.3 …  77
5.4.4 …  77
5.4.4.1 Distance Metric Learning  77
5.4.4.2 Asynchronous Advantage Actor-Critic  78
5.4.4.3 Agent Learning Methodology  79
5.4.5 Distributed Training Controller  80
5.4.5.1 Intra-Community Synchronization  80
5.4.5.2 Inter-Community Synchronization  80
5.4.5.3 Communication Traffic Aggregation  81
…  81

Chapter 6 Infrastructure-Level Design: Serverless and Decentralized Machine Learning  83
6.1 …  83
6.1.1 Definition of Serverless Computing  83
6.1.2 Architecture of Serverless Computing  86
6.1.2.1 Virtualization Layer  86
6.1.2.2 Encapsulation Layer  87
6.1.2.3 System Orchestration Layer  88
6.1.2.4 System Coordination Layer  90
6.1.3 Benefits of Serverless Computing  90
6.1.4 Challenges of Serverless Computing  91
6.1.4.1 Programming and Modeling  91
6.1.4.2 Pricing and Cost Prediction  91
6.1.4.3 …  91
6.1.4.4 Intra-Communications of Functions  92
6.1.4.5 …  93
6.1.4.6 Security and Privacy  93
6.2 Serverless Machine Learning  94
6.2.1 …  94
6.2.2 Machine Learning and Data Management  94
6.2.3 Training Large Models in Serverless Computing  94
6.2.3.1 Data Transfer and Parallelism in Serverless Computing  95
6.2.3.2 Data Parallelism for Model Training in Serverless Computing  95
6.2.3.3 Optimizing Parallelism Structure in Serverless Training  96
6.2.4 Cost-Efficiency in Serverless Computing  96
…  97

Chapter 7 System-Level Design: From Standalone to Clusters  99
7.1 Staleness-Aware Pipelining  99
7.1.1 …  100
7.1.2 …  100
7.1.2.1 …  101
7.1.2.2 Non-Linear Neural Networks  101
7.1.3 …  102
7.1.4 Extension of Training Parallelism  102
…  103
7.2 Introduction to Federated Learning  103
7.3 Training With Non-IID Data  106
7.3.1 The Definition of Non-IID Data  107
7.3.2 Enabling Technologies for Non-IID Data  108
7.3.2.1 …  108
7.3.2.2 Robust Aggregation Methods  109
7.3.2.3 Other Optimized Methods  111
7.4 Large-Scale Collaborative Learning  112
7.4.1 …  113
7.4.2 Decentralized P2P Scheme  113
7.4.3 Collective Communication-Based AllReduce  114
7.4.4 Data Flow-Based Graph  114
7.5 Personalized Learning  115
7.5.1 Data-Based Approaches  115
7.5.2 Model-Based Approaches  115
7.5.2.1 Single Model-Based Methods  116
7.5.2.2 Multiple Model-Based Methods  116
7.6 Practice on FL Implementation  117
7.6.1 …  117
7.6.2 …  118
7.6.3 Local Model Training  120
7.6.4 Global Model Aggregation  121
…  121
…  122

Chapter 8 Application: Image-Based Visual Perception  123
8.1 …  123
8.1.1 Traditional Image Classification Methods  123
8.1.2 Deep Learning-Based Image Classification Methods  124
8.1.3 …  127
8.2 Image Restoration and Super-Resolution  127
8.2.1 …  127
8.2.2 A Unified Framework for Image Restoration and Super-Resolution  129
8.2.3 A Demo of Single Image Super-Resolution  130
8.2.3.1 Network Architecture  131
8.2.3.2 Local-Aware Attention  132
8.2.3.3 Global-Aware Attention  133
…  134
8.3 Self-Attention and Vision Transformers  135
8.4 Environment Perception: Image Segmentation and Object Detection  138
8.4.1 …  138
8.4.1.1 Traditional Object Detection Model  138
8.4.1.2 Deep Learning-Based Object Detection Model  139
8.4.2 …  141
8.4.2.1 Semantic Segmentation  141
8.4.2.2 Instance Segmentation  142
8.4.2.3 Panoramic Segmentation  143
…  144

Chapter 9 Application: Video-Based Real-Time Processing  145
9.1 Video Recognition: Evolving From Images  145
9.1.1 …  146
9.1.2 …  147
9.1.2.1 Two-Stream Networks  147
9.1.2.2 …  148
9.2 Motion Tracking: Learn From Time-Spatial Sequences  149
9.2.1 Deep Learning-Based Tracking  149
9.2.2 Optical Flow-Based Tracking  150
9.3 Pose Estimation: Key Point Extraction  153
9.3.1 2D-Based Extraction  153
9.3.1.1 Single Person Estimation  153
9.3.1.2 Multiple Human Estimation  154
9.3.2 3D-Based Extraction  155
9.4 Practice: Real-Time Mobile Human Pose Tracking  156
9.4.1 Prerequisites and Data Preparation  156
9.4.2 Hyper-Parameter Configuration and Model Training  158
9.4.3 Realistic Inference and Performance Evaluation  158
…  159

Chapter 10 Application: Privacy, Security, Robustness and Trustworthiness in Edge AI  161
10.1 Privacy Protection Methods  161
10.1.1 Homomorphic Encryption-Enabled Methods  162
10.1.2 Differential Privacy-Enabled Methods  167
10.1.3 Secure Multi-Party Computation  170
10.1.4 Lightweight Private Computation Techniques for Edge AI  172
10.1.4.1 Example 1: Lightweight and Secure Decision Tree Classification  173
10.1.4.2 Example 2: Lightweight and Secure SVM Classification  173
10.2 Security and Robustness  176
10.2.1 …  176
10.2.2 …  178
…  180
10.3 …  180
10.3.1 Blockchain and Swarm Learning  180
10.3.2 Trusted Execution Environment and Federated Learning  183
…  185

Bibliography  187
Index  247