
Machine Learning for Signal Processing: Data Science, Algorithms, and Computational Statistics [Hardback]

Max A. Little (Professor of Mathematics, Aston University, Birmingham)
  • Format: Hardback, 384 pages, height x width x thickness: 250x194x25 mm, weight: 982 g, 77 grayscale and 52 color line figures, 1 color halftone
  • Publication date: 13-Aug-2019
  • Publisher: Oxford University Press
  • ISBN-10: 0198714939
  • ISBN-13: 9780198714934
This book describes in detail the fundamental mathematics and algorithms of machine learning (a branch of artificial intelligence) and signal processing, two of the most important and exciting technologies in the modern information economy. Taking a gradual approach, it builds up concepts in a solid, step-by-step fashion so that the ideas and algorithms can be implemented in practical software applications.

Digital signal processing (DSP) is one of the 'foundational' engineering topics of the modern world, without which technologies such as the mobile phone, television, CD and MP3 players, WiFi, and radar would not be possible. A relative newcomer by comparison, statistical machine learning is the theoretical backbone of exciting technologies such as automatic techniques for car registration plate recognition, speech recognition, stock market prediction, defect detection on assembly lines, robot guidance, and autonomous car navigation. Statistical machine learning exploits the analogy between intelligent information processing in biological brains and sophisticated statistical modelling and inference.

DSP and statistical machine learning are of such wide importance to the knowledge economy that both have undergone rapid changes and seen radical improvements in scope and applicability. Both draw on key topics in applied mathematics such as probability and statistics, algebra, calculus, and graphs and networks. Intimate formal links exist between the two subjects, and these overlaps can be exploited to produce new DSP tools of surprising utility, highly suited to the contemporary world of pervasive digital sensors and high-powered, yet cheap, computing hardware. This book gives a solid mathematical foundation to, and details the key concepts and algorithms in, this important topic.

Reviews

This book provides an excellent pathway for gaining first-class expertise in machine learning. It provides both the technical background that explains why certain approaches, but not others, are best practice in real-world problems, and a framework for how to think about and approach new problems. I highly recommend it for people with a signal processing background who are seeking to become an expert in machine learning.
* Alex 'Sandy' Pentland, Toshiba Professor of Media Arts and Sciences, Massachusetts Institute of Technology *

Over the past decade in signal processing, machine learning has gone from a disparate research field known only to people working on topics such as speech and image processing, to permeating all aspects of it. With this book, Prof. Little has taken an important step in unifying machine learning and signal processing. As a whole, this book covers many topics, new and old, that are important in their own right and equips the reader with a broader perspective than traditional signal processing textbooks. In particular, I would highlight the combination of statistical modeling, convex optimization, and graphs as particularly potent. Machine learning and signal processing are no longer separate, and there is no doubt in my mind that this is the way to teach signal processing in the future.
* Mads Christensen, Full Professor in Audio Processing, Aalborg University, Denmark *

This book gives a solid mathematical foundation to, and details the key concepts and algorithms in, this important topic.
* MathSciNet *

Table of Contents

Preface
List of Algorithms
List of Figures
1 Mathematical foundations
  1.1 Abstract algebras
    Groups
    Rings
  1.2 Metrics
  1.3 Vector spaces
    Linear operators
    Matrix algebra
    Square and invertible matrices
    Eigenvalues and eigenvectors
    Special matrices
  1.4 Probability and stochastic processes
    Sample spaces, events, measures and distributions
    Joint random variables: independence, conditionals, and marginals
    Bayes' rule
    Expectation, generating functions and characteristic functions
    Empirical distribution function and sample expectations
    Transforming random variables
    Multivariate Gaussian and other limiting distributions
    Stochastic processes
    Markov chains
  1.5 Data compression and information theory
    The importance of the information map
    Mutual information and Kullback-Leibler (K-L) divergence
  1.6 Graphs
    Special graphs
  1.7 Convexity
  1.8 Computational complexity
    Complexity order classes and big-O notation
    Tractable versus intractable problems: NP-completeness
2 Optimization
  2.1 Preliminaries
    Continuous differentiable problems and critical points
    Continuous optimization under equality constraints: Lagrange multipliers
    Inequality constraints: duality and the Karush-Kuhn-Tucker conditions
    Convergence and convergence rates for iterative methods
    Non-differentiable continuous problems
    Discrete (combinatorial) optimization problems
  2.2 Analytical methods for continuous convex problems
    L2-norm objective functions
    Mixed L2-L1 norm objective functions
  2.3 Numerical methods for continuous convex problems
    Iteratively reweighted least squares (IRLS)
    Gradient descent
    Adapting the step sizes: line search
    Newton's method
    Other gradient descent methods
  2.4 Non-differentiable continuous convex problems
    Linear programming
    Quadratic programming
    Subgradient methods
    Primal-dual interior-point methods
    Path-following methods
  2.5 Continuous non-convex problems
  2.6 Heuristics for discrete (combinatorial) optimization
    Greedy search
    (Simple) tabu search
    Simulated annealing
    Random restarting
3 Random sampling
  3.1 Generating (uniform) random numbers
  3.2 Sampling from continuous distributions
    Quantile function (inverse CDF) and inverse transform sampling
    Random variable transformation methods
    Rejection sampling
    Adaptive rejection sampling (ARS) for log-concave densities
    Special methods for particular distributions
  3.3 Sampling from discrete distributions
    Inverse transform sampling by sequential search
    Rejection sampling for discrete variables
    Binary search inversion for (large) finite sample spaces
  3.4 Sampling from general multivariate distributions
    Ancestral sampling
    Gibbs sampling
    Metropolis-Hastings
    Other MCMC methods
4 Statistical modelling and inference
  4.1 Statistical models
    Parametric versus nonparametric models
    Bayesian and non-Bayesian models
  4.2 Optimal probability inferences
    Maximum likelihood and minimum K-L divergence
    Loss functions and empirical risk estimation
    Maximum a-posteriori and regularization
    Regularization, model complexity and data compression
    Cross-validation and regularization
    The bootstrap
  4.3 Bayesian inference
  4.4 Distributions associated with metrics and norms
    Least squares
    Least Lq-norms
    Covariance, weighted norms and Mahalanobis distance
  4.5 The exponential family (EF)
    Maximum entropy distributions
    Sufficient statistics and canonical EFs
    Conjugate priors
    Prior and posterior predictive EFs
    Conjugate EF prior mixtures
  4.6 Distributions defined through quantiles
  4.7 Densities associated with piecewise linear loss functions
  4.8 Nonparametric density estimation
  4.9 Inference by sampling
    MCMC inference
    Assessing convergence in MCMC methods
5 Probabilistic graphical models
  5.1 Statistical modelling with PGMs
  5.2 Exploring conditional independence in PGMs
    Hidden versus observed variables
    Directed connection and separation
    The Markov blanket of a node
  5.3 Inference on PGMs
    Exact inference
    Approximate inference
6 Statistical machine learning
  6.1 Feature and kernel functions
  6.2 Mixture modelling
    Gibbs sampling for the mixture model
    E-M for mixture models
  6.3 Classification
    Quadratic and linear discriminant analysis (QDA and LDA)
    Logistic regression
    Support vector machines (SVM)
    Classification loss functions and misclassification count
    Which classifier to choose?
  6.4 Regression
    Linear regression
    Bayesian and regularized linear regression
    Linear-in-parameters regression
    Generalized linear models (GLMs)
    Nonparametric, nonlinear regression
    Variable selection
  6.5 Clustering
    K-means and variants
    Soft K-means, mean shift and variants
    Semi-supervised clustering and classification
    Choosing the number of clusters
    Other clustering methods
  6.6 Dimensionality reduction
    Principal components analysis (PCA)
    Probabilistic PCA (PPCA)
    Nonlinear dimensionality reduction
7 Linear-Gaussian systems and signal processing
  7.1 Preliminaries
    Delta signals and related functions
    Complex numbers, the unit root and complex exponentials
    Marginals and conditionals of linear-Gaussian models
  7.2 Linear, time-invariant (LTI) systems
    Convolution and impulse response
    The discrete-time Fourier transform (DTFT)
    Finite-length, periodic signals: the discrete Fourier transform (DFT)
    Continuous-time LTI systems
    Heisenberg uncertainty
    Gibbs phenomena
    Transfer function analysis of discrete-time LTI systems
    Fast Fourier transforms (FFT)
  7.3 LTI signal processing
    Rational filter design: FIR, IIR filtering
    Digital filter recipes
    Fourier filtering of very long signals
    Kernel regression as discrete convolution
  7.4 Exploiting statistical stability for linear-Gaussian DSP
    Discrete-time Gaussian processes (GPs) and DSP
    Nonparametric power spectral density (PSD) estimation
    Parametric PSD estimation
    Subspace analysis: using PCA in DSP
  7.5 The Kalman filter (KF)
    Junction tree algorithm (JT) for KF computations
    Forward filtering
    Backward smoothing
    Incomplete data likelihood
    Viterbi decoding
    Baum-Welch parameter estimation
    Kalman filtering as signal subspace analysis
  7.6 Time-varying linear systems
    Short-time Fourier transform (STFT) and perfect reconstruction
    Continuous-time wavelet transforms (CWT)
    Discretization and the discrete wavelet transform (DWT)
    Wavelet design
    Applications of the DWT
8 Discrete signals: sampling, quantization and coding
  8.1 Discrete-time sampling
    Bandlimited sampling
    Uniform bandlimited sampling: Shannon-Whittaker interpolation
    Generalized uniform sampling
  8.2 Quantization
    Rate-distortion theory
    Lloyd-Max and entropy-constrained quantizer design
    Statistical quantization and dithering
    Vector quantization
  8.3 Lossy signal compression
    Audio companding
    Linear predictive coding (LPC)
    Transform coding
  8.4 Compressive sensing (CS)
    Sparsity and incoherence
    Exact reconstruction by convex optimization
    Compressive sensing in practice
9 Nonlinear and non-Gaussian signal processing
  9.1 Running window filters
    Maximum likelihood filters
    Change point detection
  9.2 Recursive filtering
  9.3 Global nonlinear filtering
  9.4 Hidden Markov models (HMMs)
    Junction tree (JT) for efficient HMM computations
    Viterbi decoding
    Baum-Welch parameter estimation
    Model evaluation and structured data classification
    Viterbi parameter estimation
    Avoiding numerical underflow in message passing
  9.5 Homomorphic signal processing
10 Nonparametric Bayesian machine learning and signal processing
  10.1 Preliminaries
    Exchangeability and de Finetti's theorem
    Representations of stochastic processes
    Partitions and equivalence classes
  10.2 Gaussian processes (GP)
    From basis regression to kernel regression
    Distributions over function spaces: GPs
    Bayesian GP kernel regression
    GP regression and Wiener filtering
    Other GP-related topics
  10.3 Dirichlet processes (DP)
    The Dirichlet distribution: canonical prior for the categorical distribution
    Defining the Dirichlet and related processes
    Infinite mixture models (DPMMs)
    Can DP-based models actually infer the number of components?
Bibliography
Index
About the author

Max A. Little is Professor of Mathematics at Aston University, UK, and a world-leading expert in signal processing and machine learning. His research in machine learning for digital health is highly influential and underpins advances in basic and applied research into quantifying neurological disorders such as Parkinson's disease. He has published over 60 articles on the topic in the scientific literature, along with two patents and a textbook. He is an advisor to governments and leading international corporations on topics such as machine learning for health.