Foreword  xv
Preface  xix
Acknowledgements  xxviii

1 Introduction  1
  1.1 A Universal Task: Pursuit of Low-Dimensional Structure  1
    1.1.1 Identifying Dynamical Systems and Serial Data  1
    1.1.2 Patterns and Orders in a Man-Made World  3
    1.1.3 Efficient Data Acquisition and Processing  5
    1.1.4 Interpretation of Data with Graphical Models  7
  1.2 A Brief History  9
    1.2.1 Neural Science: Sparse Coding  9
    1.2.2 Signal Processing: Sparse Error Correction  12
    1.2.3 Classical Statistics: Sparse Regression Analysis  15
    1.2.4 Data Analysis: Principal Component Analysis  18
  1.3 The Modern Era  20
    1.3.1 From Curses to Blessings of High Dimensionality  20
    1.3.2 Compressive Sensing, Error Correction, and Deep Learning  22
    1.3.3 High-Dimensional Geometry and Nonasymptotic Statistics  24
    1.3.4 Scalable Optimization: Convex and Nonconvex  26
  1.4 Notes  28
  1.5 Exercises  29

Part I  Principles of Low-Dimensional Models  31

2 Sparse Signal Models  33
  2.1 Applications of Sparse Signal Modeling  33
    2.1.1 An Example from Medical Imaging  34
    2.1.2 An Example from Image Processing  37
    2.1.3 An Example from Face Recognition  39
  2.2 Recovering a Sparse Solution  41
    2.2.1 Norms on Vector Spaces  42
    2.2.2 The ℓ0 Norm  44
    2.2.3 The Sparsest Solution: Minimizing the ℓ0 Norm  45
    2.2.4 Computational Complexity of ℓ0 Minimization  48
  2.3 Relaxing the Sparse Recovery Problem  51
    2.3.1 Convex Functions  51
    2.3.2 A Convex Surrogate for the ℓ0 Norm: the ℓ1 Norm  54
    2.3.3 A Simple Test of ℓ1 Minimization  55
    2.3.4 Sparse Error Correction via Logan's Phenomenon  60
  2.4 Summary  62
  2.5 Notes  63
  2.6 Exercises  64

3 Convex Methods for Sparse Signal Recovery  69
  3.1 Why Does ℓ1 Minimization Succeed? Geometric Intuitions  69
  3.2 A First Correctness Result for Incoherent Matrices  72
    3.2.1 Coherence of a Matrix  72
    3.2.2 Correctness of ℓ1 Minimization  74
    3.2.3 Constructing an Incoherent Matrix  77
    3.2.4 Limitations of Incoherence  79
  3.3 Towards Stronger Correctness Results  82
    3.3.1 The Restricted Isometry Property (RIP)  82
    3.3.2 Restricted Strong Convexity Condition  84
    3.3.3 Success of ℓ1 Minimization under RIP  88
  3.4 Matrices with Restricted Isometry Property  91
    3.4.1 The Johnson-Lindenstrauss Lemma  91
    3.4.2 RIP of Gaussian Random Matrices  94
    3.4.3 RIP of Non-Gaussian Matrices  99
  3.5 Noisy Observations or Approximate Sparsity  101
    3.5.1 Stable Recovery of Sparse Signals  103
    3.5.2 Recovery of Inexact Sparse Signals  110
  3.6 Phase Transitions in Sparse Recovery  112
    3.6.1 Phase Transitions: Main Conclusions  114
    3.6.2 Phase Transitions via Coefficient-Space Geometry  115
    3.6.3 Phase Transitions via Observation-Space Geometry  118
    3.6.4 Phase Transitions in Support Recovery  119
  3.7 Summary  127
  3.8 Notes  128
  3.9 Exercises  128

4 Convex Methods for Low-Rank Matrix Recovery  132
  4.1 Motivating Examples of Low-Rank Modeling  132
    4.1.1 3D Shape from Photometric Measurements  132
    4.1.2 Recommendation Systems  134
    4.1.3 Euclidean Distance Matrix Embedding  136
    4.1.4 Latent Semantic Analysis  136
  4.2 Representing a Low-Rank Matrix via SVD  137
    4.2.1 Singular Vectors via Nonconvex Optimization  138
    4.2.2 Best Low-Rank Matrix Approximation  141
  4.3 Recovering a Low-Rank Matrix  142
    4.3.1 General Rank Minimization Problems  142
    4.3.2 Convex Relaxation of Rank Minimization  143
    4.3.3 Nuclear Norm as a Convex Envelope of Rank  146
    4.3.4 Success of Nuclear Norm under Rank-RIP  147
    4.3.5 Rank-RIP of Random Measurements  153
    4.3.6 Noise, Inexact Low Rank, and Phase Transition  158
  4.4 Low-Rank Matrix Completion  163
    4.4.1 Nuclear Norm Minimization for Matrix Completion  164
    4.4.2 Algorithm via Augmented Lagrange Multiplier  165
    4.4.3 When Does Nuclear Norm Minimization Succeed?  168
    4.4.4 Proving Correctness of Nuclear Norm Minimization  170
    4.4.5 Stable Matrix Completion with Noise  181
  4.5 Summary  183
  4.6 Notes  184
  4.7 Exercises  185

5 Decomposing Low-Rank and Sparse Matrices  191
  5.1 Robust PCA and Motivating Examples  191
    5.1.1 Problem Formulation  191
    5.1.2 Matrix Rigidity and Planted Clique  193
    5.1.3 Applications of Robust PCA  194
  5.2 Robust PCA via Principal Component Pursuit  197
    5.2.1 Convex Relaxation for Sparse Low-Rank Separation  197
    5.2.2 Solving PCP via the Alternating Directions Method  198
    5.2.3 Numerical Simulations and Experiments with PCP  200
  5.3 Identifiability and Exact Recovery  205
    5.3.1 Identifiability Conditions  205
    5.3.2 Correctness of Principal Component Pursuit  207
    5.3.3 Some Extensions to the Main Result  217
  5.4 Stable Principal Component Pursuit with Noise  219
  5.5 Compressive Principal Component Pursuit  223
  5.6 Matrix Completion with Corrupted Entries  225
  5.7 Summary  227
  5.8 Notes  228
  5.9 Exercises  229

6 Recovering General Low-Dimensional Models  233
  6.1 Concise Signal Models  233
    6.1.1 Atomic Sets and Examples  234
    6.1.2 Atomic Norm Minimization for Structured Signals  237
  6.2 Geometry, Measure Concentration, and Phase Transition  241
    6.2.1 Success Condition as Two Nonintersecting Cones  241
    6.2.2 Intrinsic Volumes and Kinematic Formula  243
    6.2.3 Statistical Dimension and Phase Transition  247
    6.2.4 Statistical Dimension of the Descent Cone of the ℓ1 Norm  249
    6.2.5 Phase Transition in Decomposing Structured Signals  253
  6.3 Limitations of Convex Relaxation  255
    6.3.1 Suboptimality of Convex Relaxation for Multiple Structures  255
    6.3.2 Intractable Convex Relaxation for High-Order Tensors  257
    6.3.3 Lack of Convex Relaxation for Bilinear Problems  258
    6.3.4 Nonlinear Low-Dimensional Structures  259
    6.3.5 Return of Nonconvex Formulation and Optimization  259
  6.4 Notes  260
  6.5 Exercises  260

7 Nonconvex Methods for Low-Dimensional Models  263
  7.1 Introduction  263
    7.1.1 Nonlinearity, Symmetry, and Nonconvexity  264
    7.1.2 Symmetry and the Global Geometry of Optimization  267
    7.1.3 A Taxonomy of Symmetric Nonconvex Problems  269
  7.2 Nonconvex Problems with Rotational Symmetries  271
    7.2.1 Minimal Example: Phase Retrieval with One Unknown  271
    7.2.2 Generalized Phase Retrieval  272
    7.2.3 Low-Rank Matrix Recovery  276
    7.2.4 Other Nonconvex Problems with Rotational Symmetry  282
  7.3 Nonconvex Problems with Discrete Symmetries  283
    7.3.1 Minimal Example: Dictionary Learning with One-Sparsity  283
    7.3.2 Dictionary Learning  286
    7.3.3 Sparse Blind Deconvolution  289
    7.3.4 Other Nonconvex Problems with Discrete Symmetry  292
  7.4 Notes and Open Problems  294
  7.5 Exercises  297

Part II  Computation for Large-Scale Problems  303

8 Convex Optimization for Structured Signal Recovery  305
  8.1 Challenges and Opportunities  306
  8.2 Proximal Gradient Methods  308
    8.2.1 Convergence of Gradient Descent  308
    8.2.2 From Gradient to Proximal Gradient  310
    8.2.3 Proximal Gradient for the Lasso and Stable PCP  314
    8.2.4 Convergence of Proximal Gradient  316
  8.3 Accelerated Proximal Gradient Methods  318
    8.3.1 Acceleration via Nesterov's Method  319
    8.3.2 APG for Basis Pursuit Denoising  322
    8.3.3 APG for Stable Principal Component Pursuit  322
    8.3.4 Convergence of APG  323
    8.3.5 Further Developments on Acceleration  326
  8.4 Augmented Lagrange Multipliers  327
    8.4.1 ALM for Basis Pursuit  332
    8.4.2 ALM for Principal Component Pursuit  332
    8.4.3 …  333
  8.5 Alternating Direction Method of Multipliers  334
    8.5.1 ADMM for Principal Component Pursuit  335
    8.5.2 …  336
    8.5.3 Convergence of ALM and ADMM  340
  8.6 Leveraging Problem Structures for Better Scalability  346
    8.6.1 Frank-Wolfe for Structured Constraint Sets  347
    8.6.2 Frank-Wolfe for Stable Matrix Completion  351
    8.6.3 Connection to Greedy Methods for Sparsity  352
    8.6.4 Stochastic Gradient Descent for Finite Sums  356
  8.7 Notes  358
  8.8 Exercises  360

9 Nonconvex Optimization for High-Dimensional Problems  365
  9.1 Challenges and Opportunities  366
    9.1.1 Finding Critical Points via Gradient Descent  367
    9.1.2 Finding Critical Points via Newton's Method  370
  9.2 Cubic Regularization of Newton's Method  372
    9.2.1 Convergence to Second-Order Stationary Points  373
    9.2.2 More Scalable Solution to the Subproblem  377
  9.3 Gradient and Negative Curvature Descent  377
    9.3.1 Hybrid Gradient and Negative Curvature Descent  378
    9.3.2 Computing Negative Curvature via the Lanczos Method  380
    9.3.3 Overall Complexity in First-Order Oracle  383
  9.4 Negative Curvature and Newton Descent  385
    9.4.1 Curvature-Guided Newton Descent  385
    9.4.2 Inexact Negative Curvature and Newton Descent  389
    9.4.3 Overall Complexity in First-Order Oracle  393
  9.5 Gradient Descent with Small Random Noise  395
    9.5.1 Diffusion Process and Laplace's Method  395
    9.5.2 Noisy Gradient with Langevin Monte Carlo  398
    9.5.3 Negative Curvature Descent with Random Noise  400
    9.5.4 Complexity of Perturbed Gradient Descent  405
  9.6 Leveraging Symmetry Structure: Generalized Power Iteration  407
    9.6.1 Power Iteration for Computing Singular Vectors  407
    9.6.2 Complete Dictionary Learning  409
    9.6.3 Optimization over Stiefel Manifolds  410
    9.6.4 Fixed Point of a Contraction Mapping  411
  9.7 Notes  413
  9.8 Exercises  414

Part III  Applications to Real-World Problems  419

10 Magnetic Resonance Imaging  421
  10.1 Introduction  421
  10.2 Formation of MR Images  422
    10.2.1 …  422
    10.2.2 Selective Excitation and Spatial Encoding  424
    10.2.3 Sampling and Reconstruction  425
  10.3 Sparsity and Compressive Sampling of MR Images  427
    10.3.1 Sparsity of MR Images  427
    10.3.2 Compressive Sampling of MR Images  430
  10.4 Algorithms for MR Image Recovery  433
  10.5 Notes  437
  10.6 Exercises  438

11 Wideband Spectrum Sensing  440
  11.1 Introduction  440
    11.1.1 Wideband Communications  440
    11.1.2 Nyquist Sampling and Beyond  441
  11.2 Wideband Interferer Detection  442
    11.2.1 Conventional Scanning Approaches  443
    11.2.2 Compressive Sensing in the Frequency Domain  445
  11.3 System Implementation and Performance  447
    11.3.1 Quadrature Analog-to-Information Converter  448
    11.3.2 A Prototype Circuit Implementation  449
    11.3.3 Recent Developments in Hardware Implementation  454
  11.4 Notes  455

12 Scientific Imaging Problems  456
  12.1 Introduction  456
  12.2 Data Model and Optimization Formulation  456
  12.3 Symmetry in Short-and-Sparse Deconvolution  459
  12.4 Algorithms for Short-and-Sparse Deconvolution  461
    12.4.1 Alternating Descent Method  462
    12.4.2 Additional Heuristics for Highly Coherent Problems  463
    12.4.3 Computational Examples  465
  12.5 Extensions: Multiple Motifs  465
  12.6 Notes  467

13 Robust Face Recognition  468
  13.1 Introduction  468
  13.2 Classification Based on Sparse Representation  470
  13.3 Robustness to Occlusion or Corruption  473
  13.4 Dense Error Correction with the Cross-and-Bouquet Model  477
  13.5 Notes  479
  13.6 Exercises  480

14 Robust Photometric Stereo  482
  14.1 Introduction  482
  14.2 Photometric Stereo via Low-Rank Matrix Recovery  483
    14.2.1 Lambertian Surface under Directional Lights  484
    14.2.2 Modeling Shadows and Specularities  486
  14.3 Robust Matrix Completion Algorithm  489
  14.4 Experimental Evaluation  491
    14.4.1 Quantitative Evaluation with Synthetic Images  492
    14.4.2 Qualitative Evaluation with Real Images  496
  14.5 Notes  497

15 Structured Texture Recovery  499
  15.1 Introduction  499
  15.2 …  500
  15.3 Structured Texture Inpainting  502
  15.4 Transform-Invariant Low-Rank Textures  507
    15.4.1 Deformed and Corrupted Low-Rank Textures  507
    15.4.2 The TILT Algorithm  509
  15.5 Applications of TILT  512
    15.5.1 Rectifying Planar Low-Rank Textures  513
    15.5.2 Rectifying Generalized Cylindrical Surfaces  514
    15.5.3 Calibrating Camera Lens Distortion  517
  15.6 Notes  523

16 Deep Networks for Classification  526
  16.1 Introduction  526
    16.1.1 Deep Learning in a Nutshell  527
    16.1.2 The Practice of Deep Learning  529
    16.1.3 Challenges with Nonlinearity and Discriminativeness  531
  16.2 Desiderata for Learning a Discriminative Representation  532
    16.2.1 Measure of Compactness for a Representation  533
    16.2.2 Principle of Maximal Coding Rate Reduction  536
    16.2.3 Properties of the Rate Reduction Function  537
    16.2.4 Experiments on Real Data  539
  16.3 Deep Networks from First Principles  542
    16.3.1 Deep Networks from Optimizing Rate Reduction  542
    16.3.2 Convolutional Networks from Invariant Rate Reduction  548
    16.3.3 Simulations and Experiments  555
  16.4 Guaranteed Manifold Classification by Deep Networks  560
    16.4.1 Minimal Case: Two 1D Submanifolds  560
    16.4.2 Problem Formulation and Analysis  562
    16.4.3 …  564
  16.5 Epilogue: Open Problems and Future Directions  565
  16.6 Notes  570

Appendices  573

Appendix A  Facts from Linear Algebra and Matrix Analysis  575
Appendix B  Convex Sets and Functions  598
Appendix C  Optimization Problems and Optimality Conditions  608
Appendix D  Methods for Optimization  614
Appendix E  Facts from High-Dimensional Statistics  626
References  633
List of Symbols  671
Index  674