About the Author |
|
xi | |
About the Technical Reviewer |
|
xiii | |
Acknowledgments |
|
xv | |
Introduction |
|
xvii | |
Foreword |
|
xix | |
Chapter 1 Introduction to Computer Vision and Deep Learning |
|
1 | (40) |
|
1.1 Technical requirements |
|
|
2 | (1) |
|
1.2 Image Processing using OpenCV |
|
|
3 | (3) |
|
1.2.1 Color detection using OpenCV |
|
|
4 | (2) |
|
1.3 Shape detection using OpenCV |
|
|
6 | (6) |
|
1.3.1 Face detection using OpenCV |
|
|
9 | (3) |
|
1.4 Fundamentals of Deep Learning |
|
|
12 | (20) |
|
1.4.1 The motivation behind Neural Networks |
|
|
14 | (1) |
|
1.4.2 Layers in a Neural Network |
|
|
15 | (1) |
|
|
16 | (1) |
|
|
17 | (1) |
|
1.4.5 Connections and weights of an ANN |
|
|
18 | (1) |
|
|
18 | (1) |
|
1.4.7 Activation functions |
|
|
19 | (6) |
|
|
25 | (1) |
|
|
26 | (2) |
|
|
28 | (1) |
|
|
29 | (2) |
|
|
31 | (1) |
|
1.5 How does Deep Learning work? |
|
|
32 | (6) |
|
1.5.1 Popular Deep Learning libraries |
|
|
36 | (2) |
|
|
38 | (3) |
|
|
39 | (2) |
Chapter 2 Nuts and Bolts of Deep Learning for Computer Vision |
|
41 | (26) |
|
2.1 Technical requirements |
|
|
42 | (1) |
|
2.2 Deep Learning using TensorFlow and Keras |
|
|
42 | (1) |
|
|
43 | (10) |
|
2.3.1 What is a Convolutional Neural Network? |
|
|
45 | (1) |
|
2.3.2 What is convolution? |
|
|
46 | (5) |
|
2.3.3 What is a Pooling Layer? |
|
|
51 | (1) |
|
2.3.4 What is a Fully Connected Layer? |
|
|
52 | (1) |
|
2.4 Developing a DL solution using CNN |
|
|
53 | (11) |
|
|
64 | (3) |
|
|
66 | (1) |
Chapter 3 Image Classification Using LeNet |
|
67 | (36) |
|
3.1 Technical requirements |
|
|
68 | (1) |
|
3.2 Deep Learning architectures |
|
|
68 | (1) |
|
|
69 | (1) |
|
|
70 | (1) |
|
|
71 | (1) |
|
|
72 | (3) |
|
3.7 Boosted LeNet-4 architecture |
|
|
75 | (1) |
|
3.8 Creating image classification models using LeNet |
|
|
76 | (1) |
|
3.9 MNIST classification using LeNet |
|
|
77 | (7) |
|
3.10 German traffic sign identification using LeNet |
|
|
84 | (16) |
|
|
100 | (3) |
|
|
101 | (2) |
Chapter 4 VGGNet and AlexNet Networks |
|
103 | (38) |
|
4.1 Technical requirements |
|
|
104 | (1) |
|
4.2 AlexNet and VGG Neural Networks |
|
|
104 | (1) |
|
4.3 What is AlexNet Neural Network? |
|
|
105 | (2) |
|
4.4 What is VGG Neural Network? |
|
|
107 | (1) |
|
|
107 | (3) |
|
4.6 Difference between VGG16 and VGG19 |
|
|
110 | (1) |
|
4.7 Developing solutions using AlexNet and VGG |
|
|
111 | (2) |
|
4.8 Working on CIFAR-10 using AlexNet |
|
|
113 | (15) |
|
4.9 Working on CIFAR-10 using VGG |
|
|
128 | (8) |
|
4.10 Comparing AlexNet and VGG |
|
|
136 | (1) |
|
4.11 Working with CIFAR-100 |
|
|
137 | (1) |
|
|
138 | (3) |
|
|
139 | (2) |
Chapter 5 Object Detection Using Deep Learning |
|
141 | (46) |
|
5.1 Technical requirements |
|
|
142 | (1) |
|
|
142 | (4) |
|
5.2.1 Object classification vs. object localization vs. object detection |
|
|
143 | (1) |
|
5.2.2 Use cases of Object Detection |
|
|
144 | (2) |
|
5.3 Object Detection methods |
|
|
146 | (1) |
|
5.4 Deep Learning frameworks for Object Detection |
|
|
147 | (3) |
|
5.4.1 Sliding window approach for Object Detection |
|
|
148 | (2) |
|
5.5 Bounding box approach |
|
|
150 | (2) |
|
5.6 Intersection over Union (IoU) |
|
|
152 | (2) |
|
|
154 | (1) |
|
|
155 | (2) |
|
5.9 Deep Learning architectures |
|
|
157 | (3) |
|
5.9.1 Region-based CNN (R-CNN) |
|
|
157 | (3) |
|
|
160 | (2) |
|
|
162 | (3) |
|
5.12 You Only Look Once (YOLO) |
|
|
165 | (7) |
|
5.12.1 Salient features of YOLO |
|
|
166 | (1) |
|
5.12.2 Loss function in YOLO |
|
|
167 | (2) |
|
|
169 | (3) |
|
5.13 Single Shot MultiBox Detector (SSD) |
|
|
172 | (5) |
|
|
177 | (2) |
|
5.15 Python implementation |
|
|
179 | (3) |
|
|
182 | (5) |
|
|
184 | (3) |
Chapter 6 Face Recognition and Gesture Recognition |
|
187 | (34) |
|
|
188 | (1) |
|
|
188 | (29) |
|
6.2.1 Applications of face recognition |
|
|
190 | (2) |
|
6.2.2 Process of face recognition |
|
|
192 | (2) |
|
6.2.3 DeepFace solution by Facebook |
|
|
194 | (5) |
|
6.2.4 FaceNet for face recognition |
|
|
199 | (7) |
|
6.2.5 Python implementation using FaceNet |
|
|
206 | (2) |
|
6.2.6 Python solution for gesture recognition |
|
|
208 | (9) |
|
|
217 | (4) |
|
|
219 | (2) |
Chapter 7 Video Analytics Using Deep Learning |
|
221 | (36) |
|
|
222 | (1) |
|
|
222 | (1) |
|
7.3 Use cases of video analytics |
|
|
223 | (2) |
|
7.4 Vanishing gradient and exploding gradient problem |
|
|
225 | (5) |
|
|
230 | (13) |
|
7.5.1 ResNet and skip connection |
|
|
230 | (4) |
|
|
234 | (3) |
|
7.5.3 GoogLeNet architecture |
|
|
237 | (2) |
|
7.5.4 Improvements in Inception v2 |
|
|
239 | (4) |
|
|
243 | (1) |
|
7.7 Python solution using ResNet and Inception v3 |
|
|
244 | (10) |
|
|
254 | (3) |
|
|
255 | (2) |
Chapter 8 End-to-End Model Development |
|
257 | (40) |
|
8.1 Technical requirements |
|
|
258 | (1) |
|
8.2 Deep Learning project requirements |
|
|
258 | (4) |
|
8.3 Deep Learning project process |
|
|
262 | (1) |
|
8.4 Business problem definition |
|
|
263 | (7) |
|
8.4.1 Face detection for surveillance |
|
|
265 | (3) |
|
8.4.2 Source data or data discovery phase |
|
|
268 | (2) |
|
8.5 Data ingestion or data management |
|
|
270 | (2) |
|
8.6 Data preparation and augmentation |
|
|
272 | (7) |
|
|
274 | (5) |
|
8.7 Deep Learning modeling process |
|
|
279 | (10) |
|
|
282 | (2) |
|
8.7.2 Common mistakes/challenges and boosting performance |
|
|
284 | (5) |
|
8.8 Model deployment and maintenance |
|
|
289 | (5) |
|
|
294 | (3) |
|
|
296 | (1) |
References |
|
297 | (6) |
|
Major activation functions and layers used in CNN |
|
|
297 | (1) |
|
|
298 | (5) |
Index |
|
303 | |