Preface    xi
Acknowledgments    xii
About this book    xiv
About the authors    xvii
About the cover illustration    xviii
|
PART 1  BASICS OF DEEP LEARNING    1 (90)

1 Introduction to probabilistic deep learning    3 (22)
  1.1 A first look at probabilistic models    4 (2)
  1.2 A first brief look at deep learning (DL)    6 (2)
      A success story    8 (1)
  1.3 Classification    8 (8)
      Traditional approach to image classification    9 (3)
      Deep learning approach to image classification    12 (2)
      Non-probabilistic classification    14 (1)
      Probabilistic classification    14 (2)
      Bayesian probabilistic classification    16 (1)
  1.4 Curve fitting    16 (5)
      Non-probabilistic curve fitting    17 (1)
      Probabilistic curve fitting    18 (2)
      Bayesian probabilistic curve fitting    20 (1)
  1.5 When to use and when not to use DL?    21 (2)
      When not to use DL    21 (1)
      When to use DL    22 (1)
      When to use and when not to use probabilistic models    22 (1)
  1.6 What you'll learn in this book    23 (2)
|
2 Neural network architectures    25 (37)
  2.1 Fully connected neural networks (fcNNs)    26 (18)
      The biology that inspired the design of artificial NNs    26 (2)
      Getting started with implementing an NN    28 (10)
      Using a fully connected NN (fcNN) to classify images    38 (6)
  2.2 Convolutional NNs for image-like data    44 (12)
      Main ideas in a CNN architecture    44 (3)
      A minimal CNN for edge lovers    47 (3)
      Biological inspiration for a CNN architecture    50 (2)
      Building and understanding a CNN    52 (4)
  2.3 One-dimensional CNNs for ordered data    56 (6)
      Format of time-ordered data    57 (1)
      What's special about ordered data?    58 (1)
      Architectures for time-ordered data    59 (3)
|
3 Principles of curve fitting    62 (29)
  3.1 "Hello world" in curve fitting    63 (6)
      Fitting a linear regression model based on a loss function    65 (4)
  3.2 Gradient descent method    69 (9)
      Loss with one free model parameter    69 (4)
      Loss with two free model parameters    73 (5)
  3.3 Special DL sauce    78 (2)
      Mini-batch gradient descent    78 (1)
      Using SGD variants to speed up the learning    79 (1)
      Automatic differentiation    79 (1)
  3.4 Backpropagation in DL frameworks    80 (11)
      Static graph frameworks    81 (7)
      Dynamic graph frameworks    88 (3)
|
PART 2  MAXIMUM LIKELIHOOD APPROACHES FOR PROBABILISTIC DL MODELS    91 (106)

4 Building loss functions with the likelihood approach    93 (35)
  4.1 Introduction to the MaxLike principle: The mother of all loss functions    94 (5)
  4.2 Deriving a loss function for a classification problem    99 (12)
      Binary classification problem    99 (6)
      Classification problems with more than two classes    105 (4)
      Relationship between NLL, cross entropy, and Kullback-Leibler divergence    109 (2)
  4.3 Deriving a loss function for regression problems    111 (17)
      Using an NN without hidden layers and one output neuron for modeling a linear relationship between input and output    111 (8)
      Using an NN with hidden layers to model non-linear relationships between input and output    119 (2)
      Using an NN with an additional output for regression tasks with nonconstant variance    121 (7)
|
5 Probabilistic deep learning models with TensorFlow Probability    128 (29)
  5.1 Evaluating and comparing different probabilistic prediction models    130 (2)
  5.2 Introducing TensorFlow Probability (TFP)    132 (3)
  5.3 Modeling continuous data with TFP    135 (10)
      Fitting and evaluating a linear regression model with constant variance    136 (4)
      Fitting and evaluating a linear regression model with a nonconstant standard deviation    140 (5)
  5.4 Modeling count data with TensorFlow Probability    145 (12)
      The Poisson distribution for count data    148 (5)
      Extending the Poisson distribution to a zero-inflated Poisson (ZIP) distribution    153 (4)
|
6 Probabilistic deep learning models in the wild    157 (40)
  6.1 Flexible probability distributions in state-of-the-art DL models    159 (6)
      Multinomial distribution as a flexible distribution    160 (2)
      Making sense of discretized logistic mixture    162 (3)
  6.2 Case study: Bavarian roadkills    165 (1)
  6.3 Go with the flow: Introduction to normalizing flows (NFs)    166 (31)
      The principal idea of NFs    168 (2)
      The change of variable technique for probabilities    170 (5)
      Fitting an NF to data    175 (2)
      Going deeper by chaining flows    177 (4)
      Transformation between higher dimensional spaces    181 (2)
      Using networks to control flows    183 (5)
      Fun with flows: Sampling faces    188 (9)
|
PART 3  BAYESIAN APPROACHES FOR PROBABILISTIC DL MODELS

7 Bayesian learning    197 (32)
  7.1 What's wrong with non-Bayesian DL: The elephant in the room    198 (3)
  7.2 The first encounter with a Bayesian approach    201 (6)
      Bayesian model: The hacker's way    202 (4)
      What did we get from the Bayesian approach?    206 (1)
  7.3 The Bayesian approach for probabilistic models    207 (22)
      Training and prediction with a Bayesian model    208 (16)
      A coin toss as a Hello World example for Bayesian models    213*
      Revisiting the Bayesian linear regression model    224 (5)
|
8 Bayesian neural networks    229 (35)
  8.1 Bayesian neural networks (BNNs)    230 (2)
  8.2 Variational inference (VI) as an approximative Bayes approach    232 (11)
      Looking under the hood of VI*    233 (5)
      Applying VI to the toy problem*    238 (5)
  8.3 Variational inference with TensorFlow Probability    243 (2)
  8.4 MC dropout as an approximate Bayes approach    245 (7)
      Classical dropout used during training    246 (3)
      MC dropout used during training and test times    249 (3)
  8.5 Case studies    252 (12)
      Regression case study on extrapolation    252 (4)
      Classification case study with novel classes    256 (8)

Glossary of terms and abbreviations    264 (5)
Index    269