Preface |
|
v | |
|
|
xiii | |
|
|
xv | |
|
1 Mathematical foundations |
|
|
1 | (40) |
|
|
1 | (3) |
|
|
1 | (2) |
|
|
3 | (1) |
|
|
4 | (1) |
|
|
5 | (7) |
|
|
7 | (1) |
|
|
7 | (1) |
|
Square and invertible matrices |
|
|
8 | (1) |
|
Eigenvalues and eigenvectors |
|
|
9 | (1) |
|
|
10 | (2) |
|
1.4 Probability and stochastic processes |
|
|
12 | (16) |
|
Sample spaces, events, measures and distributions |
|
|
12 | (2) |
|
Joint random variables: independence, conditionals, and marginals |
|
|
14 | (2) |
|
|
16 | (1) |
|
Expectation, generating functions and characteristic functions |
|
|
17 | (2) |
|
Empirical distribution function and sample expectations |
|
|
19 | (1) |
|
Transforming random variables |
|
|
20 | (1) |
|
Multivariate Gaussian and other limiting distributions |
|
|
21 | (2) |
|
|
23 | (2) |
|
|
25 | (3) |
|
1.5 Data compression and information theory |
|
|
28 | (6) |
|
The importance of the information map |
|
|
31 | (1) |
|
Mutual information and Kullback-Leibler (K-L) divergence |
|
|
32 | (2) |
|
|
34 | (2) |
|
|
35 | (1) |
|
|
36 | (1) |
|
1.8 Computational complexity |
|
|
37 | (4) |
|
Complexity order classes and big-O notation |
|
|
38 | (1) |
|
Tractable versus intractable problems: NP-completeness |
|
|
38 | (3) |
|
|
41 | (30) |
|
|
41 | (7) |
|
Continuous differentiable problems and critical points |
|
|
41 | (1) |
|
Continuous optimization under equality constraints: Lagrange multipliers |
|
|
42 | (2) |
|
Inequality constraints: duality and the Karush-Kuhn-Tucker conditions |
|
|
44 | (1) |
|
Convergence and convergence rates for iterative methods |
|
|
45 | (1) |
|
Non-differentiable continuous problems |
|
|
46 | (1) |
|
Discrete (combinatorial) optimization problems |
|
|
47 | (1) |
|
2.2 Analytical methods for continuous convex problems |
|
|
48 | (3) |
|
L2-norm objective functions |
|
|
49 | (1) |
|
Mixed L2-L1 norm objective functions |
|
|
50 | (1) |
|
2.3 Numerical methods for continuous convex problems |
|
|
51 | (8) |
|
Iteratively reweighted least squares (IRLS) |
|
|
51 | (2) |
|
|
53 | (1) |
|
Adapting the step sizes: line search |
|
|
54 | (2) |
|
|
56 | (2) |
|
Other gradient descent methods |
|
|
58 | (1) |
|
2.4 Non-differentiable continuous convex problems |
|
|
59 | (6) |
|
|
59 | (1) |
|
|
60 | (1) |
|
|
60 | (2) |
|
Primal-dual interior-point methods |
|
|
62 | (2) |
|
|
64 | (1) |
|
2.5 Continuous non-convex problems |
|
|
65 | (1) |
|
2.6 Heuristics for discrete (combinatorial) optimization |
|
|
66 | (5) |
|
|
67 | (1) |
|
|
67 | (1) |
|
|
68 | (1) |
|
|
69 | (2) |
|
|
71 | (22) |
|
3.1 Generating (uniform) random numbers |
|
|
71 | (1) |
|
3.2 Sampling from continuous distributions |
|
|
72 | (7) |
|
Quantile function (inverse CDF) and inverse transform sampling |
|
|
72 | (2) |
|
Random variable transformation methods |
|
|
74 | (1) |
|
|
74 | (1) |
|
Adaptive rejection sampling (ARS) for log-concave densities |
|
|
75 | (3) |
|
Special methods for particular distributions |
|
|
78 | (1) |
|
3.3 Sampling from discrete distributions |
|
|
79 | (2) |
|
Inverse transform sampling by sequential search |
|
|
79 | (1) |
|
Rejection sampling for discrete variables |
|
|
80 | (1) |
|
Binary search inversion for (large) finite sample spaces |
|
|
81 | (1) |
|
3.4 Sampling from general multivariate distributions |
|
|
81 | (12) |
|
|
82 | (1) |
|
|
83 | (2) |
|
|
85 | (3) |
|
|
88 | (5) |
|
4 Statistical modelling and inference |
|
|
93 | (40) |
|
|
93 | (2) |
|
Parametric versus nonparametric models |
|
|
93 | (1) |
|
Bayesian and non-Bayesian models |
|
|
94 | (1) |
|
4.2 Optimal probability inferences |
|
|
95 | (13) |
|
Maximum likelihood and minimum K-L divergence |
|
|
95 | (3) |
|
Loss functions and empirical risk estimation |
|
|
98 | (1) |
|
Maximum a-posteriori and regularization |
|
|
99 | (2) |
|
Regularization, model complexity and data compression |
|
|
101 | (4) |
|
Cross-validation and regularization |
|
|
105 | (2) |
|
|
107 | (1) |
|
|
108 | (2) |
|
4.4 Distributions associated with metrics and norms |
|
|
110 | (5) |
|
|
111 | (1) |
|
|
111 | (1) |
|
Covariance, weighted norms and Mahalanobis distance |
|
|
112 | (3) |
|
4.5 The exponential family (EF) |
|
|
115 | (9) |
|
Maximum entropy distributions |
|
|
115 | (1) |
|
Sufficient statistics and canonical EFs |
|
|
116 | (2) |
|
|
118 | (4) |
|
Prior and posterior predictive EFs |
|
|
122 | (1) |
|
Conjugate EF prior mixtures |
|
|
123 | (1) |
|
4.6 Distributions defined through quantiles |
|
|
124 | (2) |
|
4.7 Densities associated with piecewise linear loss functions |
|
|
126 | (3) |
|
4.8 Nonparametric density estimation |
|
|
129 | (1) |
|
4.9 Inference by sampling |
|
|
130 | (3) |
|
|
130 | (1) |
|
Assessing convergence in MCMC methods |
|
|
130 | (3) |
|
5 Probabilistic graphical models |
|
|
133 | (16) |
|
5.1 Statistical modelling with PGMs |
|
|
133 | (3) |
|
5.2 Exploring conditional independence in PGMs |
|
|
136 | (3) |
|
Hidden versus observed variables |
|
|
136 | (1) |
|
Directed connection and separation |
|
|
137 | (1) |
|
The Markov blanket of a node |
|
|
138 | (1) |
|
|
139 | (10) |
|
|
140 | (3) |
|
|
143 | (6) |
|
6 Statistical machine learning |
|
|
149 | (38) |
|
6.1 Feature and kernel functions |
|
|
149 | (1) |
|
|
150 | (4) |
|
Gibbs sampling for the mixture model |
|
|
150 | (2) |
|
|
152 | (2) |
|
|
154 | (8) |
|
Quadratic and linear discriminant analysis (QDA and LDA) |
|
|
155 | (1) |
|
|
156 | (2) |
|
Support vector machines (SVM) |
|
|
158 | (3) |
|
Classification loss functions and misclassification count |
|
|
161 | (1) |
|
Which classifier to choose? |
|
|
161 | (1) |
|
|
162 | (9) |
|
|
162 | (1) |
|
Bayesian and regularized linear regression |
|
|
163 | (1) |
|
Linear-in parameters regression |
|
|
164 | (1) |
|
Generalized linear models (GLMs) |
|
|
165 | (2) |
|
Nonparametric, nonlinear regression |
|
|
167 | (2) |
|
|
169 | (2) |
|
|
171 | (7) |
|
|
171 | (3) |
|
Soft K-means, mean shift and variants |
|
|
174 | (2) |
|
Semi-supervised clustering and classification |
|
|
176 | (1) |
|
Choosing the number of clusters |
|
|
177 | (1) |
|
|
178 | (1) |
|
6.6 Dimensionality reduction |
|
|
178 | (9) |
|
Principal components analysis (PCA) |
|
|
179 | (3) |
|
|
182 | (2) |
|
Nonlinear dimensionality reduction |
|
|
184 | (3) |
|
7 Linear-Gaussian systems and signal processing |
|
|
187 | (78) |
|
|
187 | (4) |
|
Delta signals and related functions |
|
|
187 | (2) |
|
Complex numbers, the unit root and complex exponentials |
|
|
189 | (1) |
|
Marginals and conditionals of linear-Gaussian models |
|
|
190 | (1) |
|
7.2 Linear, time-invariant (LTI) systems |
|
|
191 | (21) |
|
Convolution and impulse response |
|
|
191 | (1) |
|
The discrete-time Fourier transform (DTFT) |
|
|
192 | (6) |
|
Finite-length, periodic signals: the discrete Fourier transform (DFT) |
|
|
198 | (3) |
|
Continuous-time LTI systems |
|
|
201 | (2) |
|
|
203 | (2) |
|
|
205 | (1) |
|
Transfer function analysis of discrete-time LTI systems |
|
|
206 | (2) |
|
Fast Fourier transforms (FFT) |
|
|
208 | (4) |
|
7.3 LTI signal processing |
|
|
212 | (14) |
|
Rational filter design: FIR, IIR filtering |
|
|
212 | (8) |
|
|
220 | (2) |
|
Fourier filtering of very long signals |
|
|
222 | (2) |
|
Kernel regression as discrete convolution |
|
|
224 | (2) |
|
7.4 Exploiting statistical stability for linear-Gaussian DSP |
|
|
226 | (16) |
|
Discrete-time Gaussian processes (GPs) and DSP |
|
|
226 | (5) |
|
Nonparametric power spectral density (PSD) estimation |
|
|
231 | (5) |
|
Parametric PSD estimation |
|
|
236 | (2) |
|
Subspace analysis: using PCA in DSP |
|
|
238 | (4) |
|
7.5 The Kalman filter (KF) |
|
|
242 | (10) |
|
Junction tree algorithm (JT) for KF computations |
|
|
243 | (1) |
|
|
244 | (2) |
|
|
246 | (1) |
|
Incomplete data likelihood |
|
|
247 | (1) |
|
|
247 | (2) |
|
Baum-Welch parameter estimation |
|
|
249 | (2) |
|
Kalman filtering as signal subspace analysis |
|
|
251 | (1) |
|
7.6 Time-varying linear systems |
|
|
252 | (13) |
|
Short-time Fourier transform (STFT) and perfect reconstruction |
|
|
253 | (2) |
|
Continuous-time wavelet transforms (CWT) |
|
|
255 | (2) |
|
Discretization and the discrete wavelet transform (DWT) |
|
|
257 | (4) |
|
|
261 | (1) |
|
|
262 | (3) |
|
8 Discrete signals: sampling, quantization and coding |
|
|
265 | (34) |
|
8.1 Discrete-time sampling |
|
|
266 | (7) |
|
|
267 | (1) |
|
Uniform bandlimited sampling: Shannon-Whittaker interpolation |
|
|
267 | (3) |
|
Generalized uniform sampling |
|
|
270 | (3) |
|
|
273 | (15) |
|
|
275 | (3) |
|
Lloyd-Max and entropy-constrained quantizer design |
|
|
278 | (4) |
|
Statistical quantization and dithering |
|
|
282 | (4) |
|
|
286 | (2) |
|
8.3 Lossy signal compression |
|
|
288 | (5) |
|
|
288 | (1) |
|
Linear predictive coding (LPC) |
|
|
289 | (2) |
|
|
291 | (2) |
|
8.4 Compressive sensing (CS) |
|
|
293 | (6) |
|
|
294 | (1) |
|
Exact reconstruction by convex optimization |
|
|
295 | (1) |
|
Compressive sensing in practice |
|
|
296 | (3) |
|
9 Nonlinear and non-Gaussian signal processing |
|
|
299 | (14) |
|
9.1 Running window filters |
|
|
299 | (3) |
|
Maximum likelihood filters |
|
|
300 | (1) |
|
|
301 | (1) |
|
|
302 | (1) |
|
9.3 Global nonlinear filtering |
|
|
302 | (2) |
|
9.4 Hidden Markov models (HMMs) |
|
|
304 | (7) |
|
Junction tree (JT) for efficient HMM computations |
|
|
305 | (1) |
|
|
306 | (1) |
|
Baum-Welch parameter estimation |
|
|
306 | (3) |
|
Model evaluation and structured data classification |
|
|
309 | (1) |
|
Viterbi parameter estimation |
|
|
309 | (1) |
|
Avoiding numerical underflow in message passing |
|
|
310 | (1) |
|
9.5 Homomorphic signal processing |
|
|
311 | (2) |
|
10 Nonparametric Bayesian machine learning and signal processing |
|
|
313 | (32) |
|
|
313 | (5) |
|
Exchangeability and de Finetti's theorem |
|
|
314 | (2) |
|
Representations of stochastic processes |
|
|
316 | (1) |
|
Partitions and equivalence classes |
|
|
317 | (1) |
|
10.2 Gaussian processes (GP) |
|
|
318 | (9) |
|
From basis regression to kernel regression |
|
|
318 | (1) |
|
Distributions over function spaces: GPs |
|
|
319 | (2) |
|
Bayesian GP kernel regression |
|
|
321 | (4) |
|
GP regression and Wiener filtering |
|
|
325 | (1) |
|
|
326 | (1) |
|
10.3 Dirichlet processes (DP) |
|
|
327 | (18) |
|
The Dirichlet distribution: canonical prior for the categorical distribution |
|
|
328 | (3) |
|
Defining the Dirichlet and related processes |
|
|
331 | (3) |
|
Infinite mixture models (DPMMs) |
|
|
334 | (9) |
|
Can DP-based models actually infer the number of components? |
|
|
343 | (2) |
Bibliography |
|
345 | (8) |
Index |
|
353 | |