
Low-Power Computer Vision: Improve the Efficiency of Artificial Intelligence [Hardback]

Edited by George K. Thiruvathukal (Loyola University Chicago, Chicago, Illinois), Yung-Hsiang Lu, Jaeyoun Kim, Yiran Chen, and Bo Chen
  • Format: Hardback, 436 pages, height x width: 234x156 mm, weight: 820 g, 58 Tables, black and white; 61 Line drawings, color; 39 Line drawings, black and white; 1 Halftones, color; 62 Illustrations, color; 39 Illustrations, black and white
  • Series: Chapman & Hall/CRC Computer Vision
  • Publication date: 23-Feb-2022
  • Publisher: Chapman & Hall/CRC
  • ISBN-10: 0367744708
  • ISBN-13: 9780367744700
"Energy efficiency is critical for running computer vision on battery-powered systems, such as mobile phones or UAVs (unmanned aerial vehicles, or drones). This book collects the methods that have won the annual IEEE Low-Power Computer Vision Challenges since 2015. The winners share their solutions and provide insight on how to improve the efficiency of machine learning systems"--

Energy efficiency is critical for running computer vision on battery-powered systems, such as mobile phones or UAVs (unmanned aerial vehicles, or drones). This book collects the methods that have won the annual IEEE Low-Power Computer Vision Challenges since 2015. 



Energy efficiency is critical for running computer vision on battery-powered systems, such as mobile phones or UAVs (unmanned aerial vehicles, or drones). This book collects the methods that have won the annual IEEE Low-Power Computer Vision Challenges since 2015. The winners share their solutions and provide insight on how to improve the efficiency of machine learning systems.
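
To give a flavor of the techniques the winners describe, below is a minimal, hypothetical sketch of uniform symmetric 8-bit quantization (the topic of Chapters 6, 11, and 13) in PyTorch. The function names are illustrative and do not come from the book.

import torch

def quantize_symmetric(x: torch.Tensor, num_bits: int = 8):
    """Map a float tensor to signed integers using one per-tensor scale."""
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for 8-bit
    scale = x.abs().max().clamp(min=1e-8) / qmax   # symmetric range
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximation of the original float tensor."""
    return q.to(torch.float32) * scale

# Quantizing a weight tensor shrinks it 4x (float32 -> int8) at the cost
# of a small, bounded rounding error.
w = torch.randn(64, 64)
w_q, s = quantize_symmetric(w)
print((w - dequantize(w_q, s)).abs().max())        # error is at most scale/2

This sketch covers only the per-tensor, symmetric case; the chapters listed in the table of contents below also treat asymmetric, per-channel, and sub-8-bit variants.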

Reviews

On-device AI has become increasingly important for reasons of latency, privacy, and overall autonomy as computing becomes more and more ambient. Making AI, in particular computer vision, efficient enough to run well in low-resource computing environments using frameworks like PyTorch is an industry priority. The IEEE Low-Power Computer Vision Challenge is one such effort that has pushed, and continues to push, the field forward. Facebook has been a proud sponsor and supporter of this challenge since 2018. This book presents the winners' solutions from previous challenges and can guide researchers, engineers, and students in designing efficient on-device AI. -- Joe Spisak, Product Lead at Facebook Artificial Intelligence

Computer vision is at the center of recent breakthroughs in artificial intelligence. Being able to process visual data in low-power computing environments will enable great advances in the field in areas such as edge computing and the Internet of Things. This book presents work by experts in the field and their winning solutions. It is an indispensable resource for anyone interested in creating AI technologies in resource-constrained computing environments. -- Mark Liao, Director, Institute of Information Science, Academia Sinica

From mobile phones to wearable health monitors, improved energy efficiency is the enabling technology of everything we take for granted today. Computer vision is at the center of artificial intelligence and machine learning. Today, artificial intelligence and low power are often at different ends of the spectrum. Low-power computer vision will enable greater adoption of the technologies in battery-powered IoT (Internet of Things) systems. This book collects the winners' solutions of the Low-Power Computer Vision Challenge and provides insight into how to improve the efficiency of artificial intelligence. -- Edwin Park, Principal Engineer at Qualcomm

Foreword xvii
Rebooting Computing and Low-Power Computer Vision xix
Editors xxi
Section I Introduction
Chapter 1 Book Introduction 3(14)
Yung-Hsiang Lu
George K. Thiruvathukal
Jaeyoun Kim
Yiran Chen
Bo Chen
1.1 About The Book 4(1)
1.2 Chapter Summaries 4(13)
1.2.1 History of Low-Power Computer Vision Challenge 4(1)
1.2.2 Survey on Energy-Efficient Deep Neural Networks for Computer Vision 5(1)
1.2.3 Hardware Design and Software Practices for Efficient Neural Network Inference 6(1)
1.2.4 Progressive Automatic Design of Search Space for One-Shot Neural Architecture Search 6(1)
1.2.5 Fast Adjustable Threshold for Uniform Neural Network Quantization 7(1)
1.2.6 Power-efficient Neural Network Scheduling on Heterogeneous Systems-on-Chip (SoCs) 8(1)
1.2.7 Efficient Neural Architecture Search 9(1)
1.2.8 Design Methodology for Low-Power Image Recognition Systems 10(1)
1.2.9 Guided Design for Efficient On-device Object Detection Model 11(1)
1.2.10 Quantizing Neural Networks for Low-Power Computer Vision 12(1)
1.2.11 A Practical Guide to Designing Efficient Mobile Architectures 13(1)
1.2.12 A Survey of Quantization Methods for Efficient Neural Network Inference 14(3)
Chapter 2 History of Low-Power Computer Vision Challenge 17(8)
Yung-Hsiang Lu
Xiao Hu
Yiran Chen
Joe Spisak
Gaurav Aggarwal
Mike Zheng Shou
George K. Thiruvathukal
2.1 Rebooting Computing 17(1)
2.2 Low-Power Image Recognition Challenge (LPIRC): 2015-2019 18(2)
2.3 Low-Power Computer Vision Challenge (LPCVC): 2020 20(1)
2.4 Winners 21(2)
2.5 Acknowledgments 23(2)
Chapter 3 Survey on Energy-Efficient Deep Neural Networks for Computer Vision 25(30)
Abhinav Goel
Caleb Tung
Xiao Hu
Haobo Wang
Yung-Hsiang Lu
George K. Thiruvathukal
3.1 Introduction 26(4)
3.2 Background 30(2)
3.2.1 Computation Intensity of Deep Neural Networks 30(1)
3.2.2 Low-Power Deep Neural Networks 31(1)
3.3 Parameter Quantization 32(3)
3.4 Deep Neural Network Pruning 35(2)
3.5 Deep Neural Network Layer And Filter Compression 37(2)
3.6 Parameter Matrix Decomposition Techniques 39(1)
3.7 Neural Architecture Search 40(2)
3.8 Knowledge Distillation 42(2)
3.9 Energy Consumption-Accuracy Tradeoff With Deep Neural Networks 44(2)
3.10 Guidelines For Low-Power Computer Vision 46(2)
3.10.1 Relationship Between Low-Power Computer Vision Techniques 46(1)
3.10.2 Deep Neural Network And Resolution Scaling 47(1)
3.11 Evaluation Metrics 48(2)
3.11.1 Accuracy Measurements On Popular Datasets 48(1)
3.11.2 Memory Requirement And Number Of Operations 49(1)
3.11.3 On-Device Energy Consumption And Latency 50(1)
3.12 Summary And Conclusions 50(5)
Section II Competition Winners
Chapter 4 Hardware Design and Software Practices for Efficient Neural Network Inference 55(36)
Yu Wang
Xuefei Ning
Shulin Zeng
Yi Cai
Kaiyuan Guo
Hanbo Sun
Changcheng Tang
Tianyi Lu
Shuang Liang
Tianchen Zhao
4.1 Hardware And Software Design Framework For Efficient Neural Network Inference 56(4)
4.1.1 Introduction 56(2)
4.1.2 From Model to Instructions 58(2)
4.2 ISA-Based CNN Accelerator: Angel-Eye 60(15)
4.2.1 Hardware Architecture 61(4)
4.2.2 Compiler 65(4)
4.2.3 Runtime Workflow 69(1)
4.2.4 Extension Support of Upsampling Layers 69(2)
4.2.5 Evaluation 71(3)
4.2.6 Practice on DAC-SDC Low-Power Object Detection Challenge 74(1)
4.3 Neural Network Model Optimization 75(15)
4.3.1 Pruning and Quantization 75(1)
4.3.1.1 Network Pruning 76(2)
4.3.1.2 Network Quantization 78(1)
4.3.1.3 Evaluation and Practices 79(2)
4.3.2 Pruning with Hardware Cost Model 81(1)
4.3.2.1 Iterative Search-based Pruning Methods 81(1)
4.3.2.2 Local Programming-based Pruning and the Practice in LPCVC'19 82(3)
4.3.3 Architecture Search Framework 85(1)
4.3.3.1 Framework Design 85(3)
4.3.3.2 Case Study Using the aw_nas Framework: Black-box Search Space Tuning for Hardware-aware NAS 88(2)
4.4 Summary 90(1)
Chapter 5 Progressive Automatic Design of Search Space for One-Shot Neural Architecture Search 91(20)
Xin Xia
Xuefeng Xiao
Xing Wang
5.1 Abstract 92(1)
5.2 Introduction 92(3)
5.3 Related Work 95(1)
5.4 Method 96(5)
5.4.1 Problem Formulation and Motivation 96(2)
5.4.2 Progressive Automatic Design of Search Space 98(3)
5.5 Experiments 101(9)
5.5.1 Dataset and Implementation Details 101(2)
5.5.2 Comparison with State-of-the-art Methods 103(3)
5.5.3 Automatically Designed Search Space 106(3)
5.5.4 Ablation Studies 109(1)
5.6 Conclusion 110(1)
Chapter 6 Fast Adjustable Threshold for Uniform Neural Network Quantization 111(16)
Alexander Goncharenko
Andrey Denisov
Sergey Alyamkin
6.1 Introduction 112(1)
6.2 Related Work 113(3)
6.2.1 Quantization with Knowledge Distillation 115(1)
6.2.2 Quantization without Fine-tuning 115(1)
6.2.3 Quantization with Training/Fine-tuning 115(1)
6.3 Method Description 116(7)
6.3.1 Quantization with Threshold Fine-tuning 116(1)
6.3.1.1 Differentiable Quantization Threshold 116(2)
6.3.1.2 Batch Normalization Folding 118(1)
6.3.1.3 Threshold Scale 118(1)
6.3.1.4 Training of Asymmetric Thresholds 119(1)
6.3.1.5 Vector Quantization 120(1)
6.3.2 Training on the Unlabeled Data 120(1)
6.3.3 Quantization of Depth-wise Separable Convolution 121(1)
6.3.3.1 Scaling the Weights for MobileNet-V2 (with ReLU6) 122(1)
6.4 Experiments And Results 123(2)
6.4.1 Experiments Description 123(1)
6.4.1.1 Researched Architectures 123(1)
6.4.1.2 Training Procedure 124(1)
6.4.2 Results 124(1)
6.5 Conclusion 125(2)
Chapter 7 Power-efficient Neural Network Scheduling 127(46)
Ying Wang
Xuyi Cai
Xiandong Zhao
7.1 Introduction To Neural Network Scheduling On Heterogeneous SoCs 128(3)
7.1.1 Heterogeneous SoC 129(1)
7.1.2 Network Scheduling 130(1)
7.2 Coarse-Grained Scheduling For Neural Network Tasks: A Case Study Of Champion Solution In LPIRC 2016 131(9)
7.2.1 Introduction to the LPIRC 2016 Mission and the Solutions 131(2)
7.2.2 Static Scheduling for the Image Recognition Task 133(1)
7.2.3 Manual Load Balancing for Pipelined Fast R-CNN 134(4)
7.2.4 The Result of Static Scheduling 138(2)
7.3 Fine-Grained Neural Network Scheduling On Power-Efficient Processors 140(14)
7.3.1 Network Scheduling on SUs: Compiler-Level Techniques 140(1)
7.3.2 Memory-Efficient Network Scheduling 141(1)
7.3.3 The Formulation of the Layer-Fusion Problem by Computational Graphs 142(3)
7.3.4 Cost Estimation of Fused Layer-Groups 145(4)
7.3.5 Hardware-Aware Network Fusion Algorithm (HaNF) 149(1)
7.3.6 Implementation of the Network Fusion Algorithm 150(2)
7.3.7 Evaluation of Memory Overhead 152(1)
7.3.8 Performance on Different Processors 153(1)
7.4 Scheduler-Friendly Network Quantizations 154(16)
7.4.1 The Problem of Layer Pipelining between CPU and Integer SUs 154(1)
7.4.2 Introduction to Neural Network Quantization for Integer Neural Accelerators 155(4)
7.4.3 Related Work of Neural Network Quantization 159(1)
7.4.4 Linear Symmetric Quantization for Low-Precision Integer Hardware 160(1)
7.4.5 Making Full Use of the Pre-Trained Parameters 161(1)
7.4.6 Low-Precision Representation and Quantization Algorithm 161(2)
7.4.7 BN Layer Fusion of Quantized Networks 163(1)
7.4.8 Bias and Scaling Factor Quantization for Low-Precision Integer Operation 164(1)
7.4.9 Evaluation Results 165(5)
7.5 Summary 170(3)
Chapter 8 Efficient Neural Network Architectures 173(18)
Han Cai
Song Han
8.1 Standard Convolution Layer 174(1)
8.2 Efficient Convolution Layers 175(1)
8.3 Manually Designed Efficient CNN Models 175(4)
8.4 Neural Architecture Search 179(3)
8.5 Hardware-Aware Neural Architecture Search 182(7)
8.5.1 Latency Prediction 184(1)
8.5.2 Specialized Models for Different Hardware 185(1)
8.5.3 Handling Many Platforms and Constraints 186(3)
8.6 Conclusion 189(2)
Chapter 9 Design Methodology for Low-Power Image Recognition Systems 191(30)
Soonhoi Ha
EunJin Jeong
Duseok Kang
Jangryul Kim
Donghyun Kang
9.1 Design Methodology Used In LPIRC 2017 193(8)
9.1.1 Object Detection Networks 194(1)
9.1.2 Throughput Maximization by Pipelining 195(1)
9.1.3 Software Optimization Techniques 196(1)
9.1.3.1 Tucker Decomposition 197(1)
9.1.3.2 CPU Parallelization 198(1)
9.1.3.3 16-bit Quantization 198(2)
9.1.3.4 Post Processing 200(1)
9.2 Image Recognition Network Exploration 201(7)
9.2.1 Single Stage Detectors 202(2)
9.2.2 Software Optimization Techniques 204(1)
9.2.3 Post Processing 205(1)
9.2.4 Network Exploration 206(1)
9.2.5 LPIRC 2018 Solution 207(1)
9.3 Network Pipelining For Heterogeneous Processor Systems 208(9)
9.3.1 Network Pipelining Problem 209(2)
9.3.2 Network Pipelining Heuristic 211(2)
9.3.3 Software Framework for Network Pipelining 213(1)
9.3.4 Experimental Results 214(3)
9.4 Conclusion And Future Work 217(4)
Chapter 10 Guided Design for Efficient On-device Object Detection Model 221(14)
Tao Sheng
Yang Liu
10.1 Introduction 222(2)
10.1.1 LPIRC Track 1 in 2018 and 2019 223(1)
10.1.2 Three Awards for the Amazon Team 223(1)
10.2 Background 224(1)
10.3 Award-Winning Methods 225(7)
10.3.1 Quantization Friendly Model 225(1)
10.3.2 Network Architecture Optimization 226(1)
10.3.3 Training Hyper-parameters 226(1)
10.3.4 Optimal Model Architecture 227(1)
10.3.5 Neural Architecture Search 228(1)
10.3.6 Dataset Filtering 228(2)
10.3.7 Non-maximum Suppression Threshold 230(1)
10.3.8 Combination 231(1)
10.4 Conclusion 232(3)
Section III Invited Articles
Chapter 11 Quantizing Neural Networks 235(38)
Marios Fournarakis
Markus Nagel
Rana Ali Amjad
Yelysei Bondarenko
Mart van Baalen
Tijmen Blankevoort
11.1 Introduction 236(2)
11.2 Quantization Fundamentals 238(10)
11.2.1 Hardware Background 238(2)
11.2.2 Uniform Affine Quantization 240(2)
11.2.2.1 Symmetric Uniform Quantization 242(1)
11.2.2.2 Power-of-two Quantizer 242(1)
11.2.2.3 Quantization Granularity 243(1)
11.2.3 Quantization Simulation 243(1)
11.2.3.1 Batch Normalization Folding 244(1)
11.2.3.2 Activation Function Fusing 245(1)
11.2.3.3 Other Layers and Quantization 246(1)
11.2.4 Practical Considerations 247(1)
11.2.4.1 Symmetric vs. Asymmetric Quantization 247(1)
11.2.4.2 Per-tensor and Per-channel Quantization 248(1)
11.3 Post-Training Quantization 248(14)
11.3.1 Quantization Range Setting 249(2)
11.3.2 Cross-Layer Equalization 251(4)
11.3.3 Bias Correction 255(1)
11.3.4 AdaRound 256(4)
11.3.5 Standard PTQ Pipeline 260(1)
11.3.6 Experiments 261(1)
11.4 Quantization-Aware Training 262(9)
11.4.1 Simulating Quantization for the Backward Path 263(2)
11.4.2 Batch Normalization Folding and QAT 265(2)
11.4.3 Initialization for QAT 267(1)
11.4.4 Standard QAT Pipeline 268(2)
11.4.5 Experiments 270(1)
11.5 Summary And Conclusions 271(2)
Chapter 12 Building Efficient Mobile Architectures 273(18)
Mark Sandler
Andrew Howard
12.1 Introduction 274(2)
12.2 Architecture Parameterizations 276(5)
12.2.1 Network Width Multiplier 277(1)
12.2.2 Input Resolution Multiplier 277(1)
12.2.3 Data and Internal Resolution 278(1)
12.2.4 Network Depth Multiplier 279(1)
12.2.5 Adjusting Multipliers for Multi-criteria Optimizations 280(1)
12.3 Optimizing Early Layers 281(2)
12.4 Optimizing The Final Layers 283(2)
12.4.1 Adjusting the Resolution of the Final Spatial Layer 283(1)
12.4.2 Reducing the Size of the Embedding Layer 284(1)
12.5 Adjusting Non-Linearities: H-Swish And H-Sigmoid 285(2)
12.6 Putting It All Together 287(4)
Chapter 13 A Survey of Quantization Methods for Efficient Neural Network Inference 291(36)
Amir Gholami
Sehoon Kim
Zhen Dong
Zhewei Yao
Michael W. Mahoney
Kurt Keutzer
13.1 Introduction 292(4)
13.2 General History Of Quantization 296(2)
13.3 Basic Concepts Of Quantization 298(15)
13.3.1 Problem Setup and Notations 299(1)
13.3.2 Uniform Quantization 299(1)
13.3.3 Symmetric and Asymmetric Quantization 300(2)
13.3.4 Range Calibration Algorithms: Static vs. Dynamic Quantization 302(1)
13.3.5 Quantization Granularity 303(2)
13.3.6 Non-Uniform Quantization 305(1)
13.3.7 Fine-tuning Methods 306(1)
13.3.7.1 Quantization-Aware Training 306(3)
13.3.7.2 Post-Training Quantization 309(1)
13.3.7.3 Zero-shot Quantization 310(2)
13.3.8 Stochastic Quantization 312(1)
13.4 Advanced Concepts: Quantization Below 8 Bits 313(9)
13.4.1 Simulated and Integer-only Quantization 313(2)
13.4.2 Mixed-Precision Quantization 315(2)
13.4.3 Hardware-Aware Quantization 317(1)
13.4.4 Distillation-Assisted Quantization 317(1)
13.4.5 Extreme Quantization 318(3)
13.4.6 Vector Quantization 321(1)
13.5 Quantization And Hardware Processors 322(1)
13.6 Future Directions For Research In Quantization 323(2)
13.7 Summary And Conclusions 325(2)
Bibliography 327(76)
Index 403
George K. Thiruvathukal is a professor of Computer Science at Loyola University Chicago, Illinois, USA. He is also a visiting faculty member at Argonne National Laboratory. His research areas include high-performance and distributed computing, software engineering, and programming languages.

Yung-Hsiang Lu is a professor of Electrical and Computer Engineering at Purdue University, Indiana, USA. He is the first director of Purdue's John Martinson Engineering Entrepreneurial Center. He is a fellow of the IEEE and a distinguished scientist of the ACM. His research interests include computer vision, mobile systems, and cloud computing.

Jaeyoun Kim is a technical program manager at Google, California, USA. He leads AI research projects, including MobileNets and TensorFlow Model Garden, to build state-of-the-art machine learning models and modeling libraries for computer vision and natural language processing.

Yiran Chen is a professor of Electrical and Computer Engineering at Duke University, North Carolina, USA. He is a fellow of the ACM and the IEEE. His research areas include new memory and storage systems, machine learning and neuromorphic computing, and mobile computing systems.

Bo Chen is the Director of AutoML at DJI, Guangdong, China. Before joining DJI, he was a researcher at Google, California, USA. His research interests are the optimization of neural network software and hardware, as well as deploying AI technology in products with stringent resource constraints.