Preface  xv
Acknowledgments  xvi
About this book  xvii
About the author  xx
1  Introducing deep learning: why you should learn it  3
     Welcome to Grokking Deep Learning  3
     Why you should learn deep learning  4
     Will this be difficult to learn?  5
     Why you should read this book  5
     What you need to get started  7
     You'll probably need some Python knowledge  8
2  Fundamental concepts: how do machines learn?  9
     What is machine learning?  11
     Supervised machine learning  12
     Unsupervised machine learning  13
     Parametric vs. nonparametric learning  14
     Supervised parametric learning  15
     Unsupervised parametric learning  17
3  Introduction to neural prediction: forward propagation  21
     A simple neural network making a prediction  24
     What is a neural network?  25
     What does this neural network do?  26
     Making a prediction with multiple inputs  28
     Multiple inputs: What does this neural network do?  30
     Multiple inputs: Complete runnable code  35
     Making a prediction with multiple outputs  36
     Predicting with multiple inputs and outputs  38
     Multiple inputs and outputs: How does it work?  40
     Predicting on predictions  42
4  Introduction to neural learning: gradient descent  47
     Predict, compare, and learn  48
     Compare: Does your network make good predictions?  50
     What's the simplest form of neural learning?  52
     Characteristics of hot and cold learning  55
     Calculating both direction and amount from error  56
     One iteration of gradient descent  58
     Learning is just reducing error  60
     Let's watch several steps of learning  62
     Why does this work? What is weight_delta, really?  64
     Tunnel vision on one concept  66
     A box with rods poking out of it  67
     What you really need to know  69
     What you don't really need to know  69
     How to use a derivative to learn  70
     Breaking gradient descent  72
     Visualizing the overcorrections  73
5  Learning multiple weights at a time: generalizing gradient descent  79
     Gradient descent learning with multiple inputs  80
     Gradient descent with multiple inputs explained  82
     Let's watch several steps of learning  86
     Freezing one weight: What does it do?  88
     Gradient descent learning with multiple outputs  90
     Gradient descent with multiple inputs and outputs  92
     What do these weights learn?  94
     Visualizing weight values  96
     Visualizing dot products (weighted sums)  97
6  Building your first deep neural network: introduction to backpropagation  99
     Matrices and the matrix relationship  103
     Creating a matrix or two in Python  106
     Building a neural network  107
     Learning the whole dataset  108
     Full, batch, and stochastic gradient descent  109
     Neural networks learn correlation  110
     Edge case: Conflicting pressure  114
     Learning indirect correlation  116
     Stacking neural networks: A review  118
     Backpropagation: Long-distance error attribution  119
     Backpropagation: Why does this work?  120
     Why the neural network still doesn't work  122
     The secret to sometimes correlation  123
     Your first deep neural network  125
     One iteration of backpropagation  128
     Why do deep networks matter?  131
7  How to picture neural networks: in your head and on paper  133
     Correlation summarization  135
     The previously overcomplicated visualization  136
     The simplified visualization  137
     Let's see this network predict  139
     Visualizing using letters instead of pictures  140
     The importance of visualization tools  143
8  Learning signal and ignoring noise: introduction to regularization and batching  145
     Three-layer network on MNIST  146
     Memorization vs. generalization  149
     Overfitting in neural networks  150
     Where overfitting comes from  151
     The simplest regularization: Early stopping  152
     Industry standard regularization: Dropout  153
     Why dropout works: Ensembling works  154
     Dropout evaluated on MNIST  157
9  Modeling probabilities and nonlinearities: activation functions  161
     What is an activation function?  162
     Standard hidden-layer activation functions  165
     Standard output layer activation functions  166
     The core issue: Inputs have similarity  168
     Activation installation instructions  170
     Multiplying delta by the slope  172
     Converting output to slope (derivative)  173
     Upgrading the MNIST network  174
10  Neural learning about edges and corners: intro to convolutional neural networks  177
     Reusing weights in multiple places  178
     A simple implementation in NumPy  181
11  Neural networks that understand language: king - man + woman == ?  187
     What does it mean to understand language?  188
     Natural language processing (NLP)  189
     IMDB movie reviews dataset  191
     Capturing word correlation in input data  192
     Intro to an embedding layer  194
     Comparing word embeddings  199
     What is the meaning of a neuron?  200
     Meaning is derived from loss  203
     King - Man + Woman almost = Queen  206
12  Neural networks that write like Shakespeare: recurrent layers for variable-length data  209
     The challenge of arbitrary length  210
     Do comparisons really matter?  211
     The surprising power of averaged word vectors  212
     How is information stored in these embeddings?  213
     How does a neural network use embeddings?  214
     The limitations of bag-of-words vectors  215
     Using identity vectors to sum word embeddings  216
     Matrices that change absolutely nothing  217
     Learning the transition matrices  218
     Learning to create useful sentence vectors  219
     Forward propagation in Python  220
     How do you backpropagate into this?  221
     Forward propagation with arbitrary length  224
     Backpropagation with arbitrary length  225
     Weight update with arbitrary length  226
     Execution and output analysis  227
13  Introducing automatic optimization: let's build a deep learning framework  231
     What is a deep learning framework?  232
     Introduction to automatic gradient computation (autograd)  234
     Tensors that are used multiple times  237
     Upgrading autograd to support multiuse tensors  238
     How does addition backpropagation work?  240
     Adding support for negation  241
     Adding support for additional functions  242
     Using autograd to train a neural network  246
     Adding automatic optimization  248
     Adding support for layer types  249
     Layers that contain layers  250
     Adding indexing to autograd  256
     The embedding layer (revisited)  257
     The recurrent neural network layer  260
14  Learning to write like Shakespeare: long short-term memory  265
     Character language modeling  266
     The need for truncated backpropagation  267
     Truncated backpropagation  268
     Vanishing and exploding gradients  272
     A toy example of RNN backpropagation  273
     Long short-term memory (LSTM) cells  274
     Some intuition about LSTM gates  275
     The long short-term memory layer  276
     Upgrading the character language model  277
     Training the LSTM character language model  278
     Tuning the LSTM character language model  279
15  Deep learning on unseen data: introducing federated learning  281
     The problem of privacy in deep learning  282
     Hacking into federated learning  287
     Homomorphically encrypted federated learning  290
16  Where to go from here: a brief guide  293
     Step 1: Start learning PyTorch  294
     Step 2: Start another deep learning course  295
     Step 3: Grab a mathy deep learning textbook  295
     Step 4: Start a blog, and teach deep learning  296
     Step 6: Implement academic papers  297
     Step 7: Acquire access to a GPU (or many)  297
     Step 8: Get paid to practice  298
     Step 9: Join an open source project  298
     Step 10: Develop your local community  299
Index  301