Preface  xv

1 Introduction  1
1.1 Classical versus high-dimensional theory  1
1.2 What can go wrong in high dimensions?  2
1.2.1 Linear discriminant analysis  2
1.2.2 Covariance estimation  5
1.2.3 Nonparametric regression  7
1.3 What can help us in high dimensions?  9
1.3.1 Sparsity in vectors  10
1.3.2 Structure in covariance matrices  11
1.3.3 Structured forms of regression  12
1.4 What is the non-asymptotic viewpoint?  14
1.5 Overview of the book  15
1.5.1 Chapter structure and synopses  15
1.5.2 Recommended background  17
1.5.3 Teaching possibilities and a flow diagram  17
1.6 Bibliographic details and background  19

2 Basic tail and concentration bounds  21
2.1 Classical bounds  21
2.1.1 From Markov to Chernoff  21
2.1.2 Sub-Gaussian variables and Hoeffding bounds  22
2.1.3 Sub-exponential variables and Bernstein bounds  25
2.1.4 Some one-sided results  31
2.2 Martingale-based methods  32
2.2.1 Background  33
2.2.2 Concentration bounds for martingale difference sequences  35
2.3 Lipschitz functions of Gaussian variables  40
2.4 Appendix A: Equivalent versions of sub-Gaussian variables  45
2.5 Appendix B: Equivalent versions of sub-exponential variables  48
2.6 Bibliographic details and background  49
Exercises  50

3 Concentration of measure  58
3.1 Concentration by entropic techniques  58
3.1.1 Entropy and its properties  58
3.1.2 Herbst argument and its extensions  60
3.1.3 Separately convex functions and the entropic method  62
3.1.4 Tensorization and separately convex functions  64
3.2 A geometric perspective on concentration  67
3.2.1 Concentration functions  67
3.2.2 Connection to Lipschitz functions  70
3.2.3 From geometry to concentration  72
3.3 Wasserstein distances and information inequalities  76
3.3.1 Wasserstein distances  76
3.3.2 Transportation cost and concentration inequalities  78
3.3.3 Tensorization for transportation cost  80
3.3.4 Transportation cost inequalities for Markov chains  82
3.3.5 Asymmetric coupling cost  84
3.4 Tail bounds for empirical processes  87
3.4.1 A functional Hoeffding inequality  87
3.4.2 A functional Bernstein inequality  89
3.5 Bibliographic details and background  91
Exercises  92

4 Uniform laws of large numbers  98
4.1 Motivation  98
4.1.1 Uniform convergence of cumulative distribution functions  98
4.1.2 Uniform laws for more general function classes  101
4.2 A uniform law via Rademacher complexity  104
4.2.1 Necessary conditions with Rademacher complexity  107
4.3 Upper bounds on the Rademacher complexity  109
4.3.1 Classes with polynomial discrimination  109
4.3.2 Vapnik–Chervonenkis dimension  111
4.3.3 Controlling the VC dimension  115
4.4 Bibliographic details and background  117
Exercises  117

5 Metric entropy and its uses  121
5.1 Covering and packing  121
5.2 Gaussian and Rademacher complexity  132
5.3 Metric entropy and sub-Gaussian processes  134
5.3.1 Upper bound by one-step discretization  135
5.3.2 Some examples of discretization bounds  137
5.3.3 Chaining and Dudley's entropy integral  139
5.4 Some Gaussian comparison inequalities  143
5.4.1 A general comparison result  143
5.4.2 Slepian and Sudakov–Fernique inequalities  145
5.4.3 Gaussian contraction inequality  146
5.5 Sudakov's lower bound  148
5.6 Chaining and Orlicz processes  150
5.7 Bibliographic details and background  153
Exercises  154

6 Random matrices and covariance estimation  159
6.1 Some preliminaries  159
6.1.1 Notation and basic facts  159
6.1.2 Set-up of covariance estimation  160
6.2 Wishart matrices and their behavior  161
6.3 Covariance matrices from sub-Gaussian ensembles  165
6.4 Bounds for general matrices  168
6.4.1 Background on matrix analysis  168
6.4.2 Tail conditions for matrices  169
6.4.3 Matrix Chernoff approach and independent decompositions  172
6.4.4 Upper tail bounds for random matrices  174
6.4.5 Consequences for covariance matrices  179
6.5 Bounds for structured covariance matrices  180
6.5.1 Unknown sparsity and thresholding  180
6.5.2 Approximate sparsity  183
6.6 Appendix: Proof of Theorem 6.1  185
6.7 Bibliographic details and background  188
Exercises  189

7 Sparse linear models in high dimensions  194
7.1 Problem formulation and applications  194
7.1.1 Different sparsity models  194
7.1.2 Applications of sparse linear models  196
7.2 Recovery in the noiseless setting  199
7.2.1 ℓ1-based relaxation  200
7.2.2 Exact recovery and restricted nullspace  200
7.2.3 Sufficient conditions for restricted nullspace  202
7.3 Estimation in noisy settings  206
7.3.1 Restricted eigenvalue condition  207
7.3.2 Bounds on ℓ2-error for hard sparse models  209
7.3.3 Restricted nullspace and eigenvalues for random designs  213
7.4 Bounds on prediction error  216
7.5 Variable or subset selection  218
7.5.1 Variable selection consistency for the Lasso  219
7.5.2 Proof of Theorem 7.21  222
7.6 Appendix: Proof of Theorem 7.16  224
7.7 Bibliographic details and background  227
Exercises  229

8 Principal component analysis in high dimensions  236
8.1 Principal components and dimension reduction  236
8.1.1 Interpretations and uses of PCA  237
8.1.2 Perturbations of eigenvalues and eigenspaces  241
8.2 Bounds for generic eigenvectors  242
8.2.1 A general deterministic result  242
8.2.2 Consequences for a spiked ensemble  245
8.3 Sparse principal component analysis  248
8.3.1 A general deterministic result  249
8.3.2 Consequences for the spiked model with sparsity  252
8.4 Bibliographic details and background  255
Exercises  256

9 Decomposability and restricted strong convexity  259
9.1 A general regularized M-estimator  259
9.2 Decomposable regularizers and their utility  269
9.2.1 Definition and some examples  269
9.2.2 A key consequence of decomposability  272
9.3 Restricted curvature conditions  276
9.3.1 Restricted strong convexity  277
9.4 Some general theorems  279
9.4.1 Guarantees under restricted strong convexity  280
9.4.2 Bounds under Φ*-curvature  284
9.5 Bounds for sparse vector regression  286
9.5.1 Generalized linear models with sparsity  286
9.5.2 Bounds under restricted strong convexity  287
9.5.3 Bounds under ℓ∞-curvature conditions  288
9.6 Bounds for group-structured sparsity  290
9.7 Bounds for overlapping decomposition-based norms  293
9.8 Techniques for proving restricted strong convexity  297
9.8.1 Lipschitz cost functions and Rademacher complexity  298
9.8.2 A one-sided bound via truncation  302
9.9 Appendix: Star-shaped property  306
9.10 Bibliographic details and background  306
Exercises  307

10 Matrix estimation with rank constraints  312
10.1 Matrix regression and applications  312
10.2 Analysis of nuclear norm regularization  317
10.2.1 Decomposability and subspaces  317
10.2.2 Restricted strong convexity and error bounds  319
10.2.3 Bounds under operator norm curvature  320
10.3 Matrix compressed sensing  321
10.4 Bounds for phase retrieval  326
10.5 Multivariate regression with low-rank constraints  329
10.6 Matrix completion  330
10.7 Additive matrix decompositions  337
10.8 Bibliographic details and background  341
Exercises  343

11 Graphical models for high-dimensional data  347
11.1 Some basics  347
11.1.1 Factorization  347
11.1.2 Conditional independence  350
11.1.3 Hammersley–Clifford equivalence  351
11.1.4 Estimation of graphical models  352
11.2 Estimation of Gaussian graphical models  352
11.2.1 Graphical Lasso: ℓ1-regularized maximum likelihood  353
11.2.2 Neighborhood-based methods  359
11.3 Graphical models in exponential form  365
11.3.1 A general form of neighborhood regression  366
11.3.2 Graph selection for Ising models  367
11.4 Graphs with corrupted or hidden variables  368
11.4.1 Gaussian graph estimation with corrupted data  368
11.4.2 Gaussian graph selection with hidden variables  373
11.5 Bibliographic details and background  376
Exercises  378

12 Reproducing kernel Hilbert spaces  383
12.1 Basics of Hilbert spaces  383
12.2 Reproducing kernel Hilbert spaces  385
12.2.1 Positive semidefinite kernel functions  386
12.2.2 Feature maps in ℓ2(N)  387
12.2.3 Constructing an RKHS from a kernel  388
12.2.4 A more abstract viewpoint and further examples  390
12.3 Mercer's theorem and its consequences  394
12.4 Operations on reproducing kernel Hilbert spaces  400
12.4.1 Sums of reproducing kernels  400
12.4.2 Tensor products  403
12.5 Interpolation and fitting  405
12.5.1 Function interpolation  405
12.5.2 Fitting via kernel ridge regression  407
12.6 Distances between probability measures  409
12.7 Bibliographic details and background  411
Exercises  412

13 Nonparametric least squares  416
13.1 Problem set-up  416
13.1.1 Different measures of quality  416
13.1.2 Estimation via constrained least squares  417
13.1.3 Some examples  418
13.2 Bounding the prediction error  420
13.2.1 Bounds via metric entropy  425
13.2.2 Bounds for high-dimensional parametric problems  427
13.2.3 Bounds for nonparametric problems  429
13.2.4 Proof of Theorem 13.5  430
13.3 Oracle inequalities  432
13.3.1 Some examples of oracle inequalities  434
13.3.2 Proof of Theorem 13.13  437
13.4 Regularized estimators  439
13.4.1 Oracle inequalities for regularized estimators  439
13.4.2 Consequences for kernel ridge regression  439
13.4.3 Proof of Corollary 13.18  443
13.4.4 Proof of Theorem 13.17  444
13.5 Bibliographic details and background  448
Exercises  449

14 Localization and uniform laws  453
14.1 Population and empirical L2-norms  453
14.1.1 A uniform law with localization  454
14.1.2 Specialization to kernel classes  458
14.1.3 Proof of Theorem 14.1  460
14.2 A one-sided uniform law  462
14.2.1 Consequences for nonparametric least squares  466
14.2.2 Proof of Theorem 14.12  468
14.3 A uniform law for Lipschitz cost functions  469
14.3.1 General prediction problems  469
14.3.2 Uniform law for Lipschitz cost functions  472
14.4 Some consequences for nonparametric density estimation  475
14.4.1 Density estimation via the nonparametric maximum likelihood estimate  475
14.4.2 Density estimation via projections  477
14.5 Appendix: Population and empirical Rademacher complexities  480
14.6 Bibliographic details and background  481
Exercises  482

15 Minimax lower bounds  485
15.1 Basic framework  485
15.1.1 Minimax risks  486
15.1.2 From estimation to testing  487
15.1.3 Some divergence measures  489
15.2 Binary testing and Le Cam's method  491
15.2.1 Bayes error and total variation distance  491
15.2.2 Le Cam's convex hull method  497
15.3 Fano's method  500
15.3.1 Kullback–Leibler divergence and mutual information  501
15.3.2 Fano lower bound on minimax risk  501
15.3.3 Bounds based on local packings  503
15.3.4 Local packings with Gaussian entropy bounds  506
15.3.5 Yang–Barron version of Fano's method  512
15.4 Appendix: Basic background in information theory  515
15.5 Bibliographic details and background  518
Exercises  519

References  524
Subject index  540
Author index  548