Foreword    xvii
Acknowledgments    xxi
Introduction    xxiii
    Who Is This Book For?    xxiv
    About This Book    xxiv
Reading and Writing to Disk    10
Humans Are Bad at Probability    19
Joint and Marginal Probability    32
Chain Rule for Probability    37
Probability Distributions    41
Histograms and Probabilities    42
Discrete Probability Distributions    45
Continuous Probability Distributions    51
Bayes' Theorem in Machine Learning    62
Using Nominal Data in Deep Learning    69
Scalars, Vectors, Matrices, and Tensors    104
Transpose, Trace, and Powers    129
Symmetric, Orthogonal, and Unitary Matrices    139
Definiteness of a Symmetric Matrix    140
Eigenvectors and Eigenvalues    141
Finding Eigenvalues and Eigenvectors    141
Vector Norms and Distance Metrics    144
L-Norms and Distance Metrics    145
Kullback-Leibler Divergence    151
Principal Component Analysis    153
Singular Value Decomposition and Pseudoinverse    157
Rules for Trigonometric Functions    172
Rules for Exponentials and Logarithms    175
Minima and Maxima of Functions    177
Mixed Partial Derivatives    183
The Chain Rule for Partial Derivatives    184
A Vector Function by a Scalar Argument    194
A Scalar Function by a Vector Argument    196
A Vector Function by a Vector    197
A Matrix Function by a Scalar    198
A Scalar Function by a Matrix    198
A Scalar Function by a Vector    199
A Vector Function by a Scalar    202
A Vector Function by a Vector    203
A Scalar Function by a Matrix    203
Some Examples of Matrix Calculus Derivatives    217
Derivative of Element-Wise Operations    217
Derivative of the Activation Function    218
9 Data Flow in Neural Networks    221
    Traditional Neural Networks    222
    Deep Convolutional Networks    223
    Data Flow in Traditional Neural Networks    225
    Data Flow in Convolutional Neural Networks    229
    Data Flow Through a Convolutional Neural Network    239
Calculating the Partial Derivatives    246
Training and Testing the Model    253
Backpropagation for Fully Connected Networks    254
Backpropagating the Error    255
Calculating Partial Derivatives of the Weights and Biases    258
Gradient Descent in One Dimension    272
Gradient Descent in Two Dimensions    276
Stochastic Gradient Descent    282
Training Models with Momentum    289
Adaptive Gradient Descent    297
Some Thoughts About Optimizers    301
Epilogue    303
Appendix: Going Further    305
    Probability and Statistics    305
    Linear Algebra    306
    Calculus    306
    Deep Learning    307
Index    309