
Statistical Learning with Sparsity: The Lasso and Generalizations [Paperback]

Trevor Hastie (Stanford University, California, USA), Robert Tibshirani (Stanford University, California, USA), Martin Wainwright (Department of Statistics, University of California, Berkeley)

Discover New Methods for Dealing with High-Dimensional Data





A sparse statistical model has only a small number of nonzero parameters or weights; therefore, it is much easier to estimate and interpret than a dense model. Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data.
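For orientation, the lasso of the book's title is the canonical example of such a sparse model: in a linear regression with N observations and p predictors, it estimates the coefficient vector by solving (stated here in standard LaTeX notation; the intercept is omitted, assuming centered data)

\[
\hat{\beta} \;=\; \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p}
\left\{ \frac{1}{2N} \sum_{i=1}^{N} \Bigl( y_i - \sum_{j=1}^{p} x_{ij} \beta_j \Bigr)^{2}
\;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\}.
\]

The \ell_1 penalty with weight \lambda \ge 0 shrinks the coefficients toward zero and, once \lambda is large enough, sets many of them exactly to zero; that exact zeroing is the sparsity the description refers to.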





Top experts in this rapidly evolving field, the authors describe the lasso for linear regression and a simple coordinate descent algorithm for its computation. They discuss the application of ℓ1 penalties to generalized linear models and support vector machines, cover generalized penalties such as the elastic net and group lasso, and review numerical methods for optimization. They also present statistical inference methods for fitted (lasso) models, including the bootstrap, Bayesian methods, and recently developed approaches. In addition, the book examines matrix decomposition, sparse multivariate analysis, graphical models, and compressed sensing. It concludes with a survey of theoretical results for the lasso.
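To make the coordinate descent idea above concrete, here is a minimal Python/NumPy sketch of cyclic coordinate descent with soft-thresholding for the plain lasso. It is only an illustration under simplifying assumptions (centered response, roughly standardized predictors, a fixed number of passes), not the book's glmnet implementation; the function and variable names are invented for this example.

import numpy as np

def soft_threshold(z, gamma):
    # Soft-thresholding operator: sign(z) * max(|z| - gamma, 0).
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_coordinate_descent(X, y, lam, n_passes=100):
    # Minimizes (1/(2N)) * ||y - X @ beta||^2 + lam * ||beta||_1
    # by cycling through the coordinates and solving each
    # one-dimensional subproblem in closed form.
    N, p = X.shape
    beta = np.zeros(p)
    residual = y.copy()                      # residual for beta = 0
    for _ in range(n_passes):
        for j in range(p):
            # Partial residual: add back predictor j's current contribution.
            r_j = residual + X[:, j] * beta[j]
            # Univariate least-squares fit on the partial residual,
            # then shrink it toward zero by the penalty.
            z_j = X[:, j] @ r_j / N
            beta_j_new = soft_threshold(z_j, lam) / (X[:, j] @ X[:, j] / N)
            residual = r_j - X[:, j] * beta_j_new
            beta[j] = beta_j_new
    return beta

# Tiny usage example: only 3 of 20 coefficients are truly nonzero.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + X[:, 2] + rng.standard_normal(100)
y = y - y.mean()
beta_hat = lasso_coordinate_descent(X, y, lam=0.1)

With a suitably chosen lam, most entries of beta_hat come out exactly zero, which is the behavior the description exploits for estimation and interpretation in high dimensions.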





In this age of big data, the number of features measured on a person or object can be large and might be larger than the number of observations. This book shows how the sparsity assumption allows us to tackle these problems and extract useful and reproducible patterns from big datasets. Data analysts, computer scientists, and theorists will appreciate this thorough and up-to-date treatment of sparse statistical modeling.

Reviews

"The authors study and analyze methods using the sparsity property of some statistical models in order to recover the underlying signal in a dataset. They focus on the Lasso technique as an alternative to the standard least-squares method." Zentralblatt MATH 1319

Table of Contents

Preface xv
1 Introduction 1
2 The Lasso for Linear Models 7
2.1 Introduction 7
2.2 The Lasso Estimator 8
2.3 Cross-Validation and Inference 13
2.4 Computation of the Lasso Solution 14
2.4.1 Single Predictor: Soft Thresholding 15
2.4.2 Multiple Predictors: Cyclic Coordinate Descent 15
2.4.3 Soft-Thresholding and Orthogonal Bases 17
2.5 Degrees of Freedom 17
2.6 Uniqueness of the Lasso Solutions 19
2.7 A Glimpse at the Theory 20
2.8 The Nonnegative Garrote 20
2.9 ℓq Penalties and Bayes Estimates 22
2.10 Some Perspective 23
Exercises 24
3 Generalized Linear Models 29
3.1 Introduction 29
3.2 Logistic Regression 31
3.2.1 Example: Document Classification 32
3.2.2 Algorithms 35
3.3 Multiclass Logistic Regression 36
3.3.1 Example: Handwritten Digits 37
3.3.2 Algorithms 39
3.3.3 Grouped-Lasso Multinomial 39
3.4 Log-Linear Models and the Poisson GLM 40
3.4.1 Example: Distribution Smoothing 40
3.5 Cox Proportional Hazards Models 42
3.5.1 Cross-Validation 43
3.5.2 Pre-Validation 45
3.6 Support Vector Machines 46
3.6.1 Logistic Regression with Separable Data 49
3.7 Computational Details and glmnet 50
Bibliographic Notes 52
Exercises 53
4 Generalizations of the Lasso Penalty 55
4.1 Introduction 55
4.2 The Elastic Net 56
4.3 The Group Lasso 58
4.3.1 Computation for the Group Lasso 62
4.3.2 Sparse Group Lasso 64
4.3.3 The Overlap Group Lasso 65
4.4 Sparse Additive Models and the Group Lasso 69
4.4.1 Additive Models and Backfitting 69
4.4.2 Sparse Additive Models and Backfitting 70
4.4.3 Approaches Using Optimization and the Group Lasso 72
4.4.4 Multiple Penalization for Sparse Additive Models 74
4.5 The Fused Lasso 76
4.5.1 Fitting the Fused Lasso 77
4.5.1.1 Reparametrization 78
4.5.1.2 A Path Algorithm 79
4.5.1.3 A Dual Path Algorithm 79
4.5.1.4 Dynamic Programming for the Fused Lasso 80
4.5.2 Trend Filtering 81
4.5.3 Nearly Isotonic Regression 83
4.6 Nonconvex Penalties 84
Bibliographic Notes 86
Exercises 88
5 Optimization Methods 95
5.1 Introduction 95
5.2 Convex Optimality Conditions 95
5.2.1 Optimality for Differentiable Problems 95
5.2.2 Nondifferentiable Functions and Subgradients 98
5.3 Gradient Descent 100
5.3.1 Unconstrained Gradient Descent 101
5.3.2 Projected Gradient Methods 102
5.3.3 Proximal Gradient Methods 103
5.3.4 Accelerated Gradient Methods 107
5.4 Coordinate Descent 109
5.4.1 Separability and Coordinate Descent 110
5.4.2 Linear Regression and the Lasso 112
5.4.3 Logistic Regression and Generalized Linear Models 115
5.5 A Simulation Study 117
5.6 Least Angle Regression 118
5.7 Alternating Direction Method of Multipliers 121
5.8 Minorization-Maximization Algorithms 123
5.9 Biconvexity and Alternating Minimization 124
5.10 Screening Rules 127
Bibliographic Notes 131
Appendix 132
Exercises 134
6 Statistical Inference 139
6.1 The Bayesian Lasso 139
6.2 The Bootstrap 142
6.3 Post-Selection Inference for the Lasso 147
6.3.1 The Covariance Test 147
6.3.2 A General Scheme for Post-Selection Inference 150
6.3.2.1 Fixed-λ Inference for the Lasso 154
6.3.2.2 The Spacing Test for LAR 156
6.3.3 What Hypothesis Is Being Tested? 157
6.3.4 Back to Forward Stepwise Regression 158
6.4 Inference via a Debiased Lasso 158
6.5 Other Proposals for Post-Selection Inference 160
Bibliographic Notes 161
Exercises 162
7 Matrix Decompositions, Approximations, and Completion 167
7.1 Introduction 167
7.2 The Singular Value Decomposition 169
7.3 Missing Data and Matrix Completion 169
7.3.1 The Netflix Movie Challenge 170
7.3.2 Matrix Completion Using Nuclear Norm 174
7.3.3 Theoretical Results for Matrix Completion 177
7.3.4 Maximum Margin Factorization and Related Methods 181
7.4 Reduced-Rank Regression 184
7.5 A General Matrix Regression Framework 185
7.6 Penalized Matrix Decomposition 187
7.7 Additive Matrix Decomposition 190
Bibliographic Notes 195
Exercises 196
8 Sparse Multivariate Methods 201
8.1 Introduction 201
8.2 Sparse Principal Components Analysis 202
8.2.1 Some Background 202
8.2.2 Sparse Principal Components 204
8.2.2.1 Sparsity from Maximum Variance 204
8.2.2.2 Methods Based on Reconstruction 206
8.2.3 Higher-Rank Solutions 207
8.2.3.1 Illustrative Application of Sparse PCA 209
8.2.4 Sparse PCA via Fantope Projection 210
8.2.5 Sparse Autoencoders and Deep Learning 210
8.2.6 Some Theory for Sparse PCA 212
8.3 Sparse Canonical Correlation Analysis 213
8.3.1 Example: Netflix Movie Rating Data 215
8.4 Sparse Linear Discriminant Analysis 217
8.4.1 Normal Theory and Bayes' Rule 217
8.4.2 Nearest Shrunken Centroids 218
8.4.3 Fisher's Linear Discriminant Analysis 221
8.4.3.1 Example: Simulated Data with Five Classes 222
8.4.4 Optimal Scoring 225
8.4.4.1 Example: Face Silhouettes 226
8.5 Sparse Clustering 227
8.5.1 Some Background on Clustering 227
8.5.1.1 Example: Simulated Data with Six Classes 228
8.5.2 Sparse Hierarchical Clustering 228
8.5.3 Sparse K-Means Clustering 230
8.5.4 Convex Clustering 231
Bibliographic Notes 232
Exercises 234
9 Graphs and Model Selection 241
9.1 Introduction 241
9.2 Basics of Graphical Models 241
9.2.1 Factorization and Markov Properties 241
9.2.1.1 Factorization Property 242
9.2.1.2 Markov Property 243
9.2.1.3 Equivalence of Factorization and Markov Properties 243
9.2.2 Some Examples 244
9.2.2.1 Discrete Graphical Models 244
9.2.2.2 Gaussian Graphical Models 245
9.3 Graph Selection via Penalized Likelihood 246
9.3.1 Global Likelihoods for Gaussian Models 247
9.3.2 Graphical Lasso Algorithm 248
9.3.3 Exploiting Block-Diagonal Structure 251
9.3.4 Theoretical Guarantees for the Graphical Lasso 252
9.3.5 Global Likelihood for Discrete Models 253
9.4 Graph Selection via Conditional Inference 254
9.4.1 Neighborhood-Based Likelihood for Gaussians 255
9.4.2 Neighborhood-Based Likelihood for Discrete Models 256
9.4.3 Pseudo-Likelihood for Mixed Models 259
9.5 Graphical Models with Hidden Variables 261
Bibliographic Notes 261
Exercises 263
10 Signal Approximation and Compressed Sensing 269
10.1 Introduction 269
10.2 Signals and Sparse Representations 269
10.2.1 Orthogonal Bases 269
10.2.2 Approximation in Orthogonal Bases 271
10.2.3 Reconstruction in Overcomplete Bases 274
10.3 Random Projection and Approximation 276
10.3.1 Johnson-Lindenstrauss Approximation 277
10.3.2 Compressed Sensing 278
10.4 Equivalence between ℓ0 and ℓ1 Recovery 280
10.4.1 Restricted Nullspace Property 281
10.4.2 Sufficient Conditions for Restricted Nullspace 282
10.4.3 Proofs 284
10.4.3.1 Proof of Theorem 10.1 284
10.4.3.2 Proof of Proposition 10.1 284
Bibliographic Notes 285
Exercises 286
11 Theoretical Results for the Lasso 289
11.1 Introduction 289
11.1.1 Types of Loss Functions 289
11.1.2 Types of Sparsity Models 290
11.2 Bounds on Lasso ℓ2-Error 291
11.2.1 Strong Convexity in the Classical Setting 291
11.2.2 Restricted Eigenvalues for Regression 293
11.2.3 A Basic Consistency Result 294
11.3 Bounds on Prediction Error 299
11.4 Support Recovery in Linear Regression 301
11.4.1 Variable-Selection Consistency for the Lasso 301
11.4.1.1 Some Numerical Studies 303
11.4.2 Proof of Theorem 11.3 305
11.5 Beyond the Basic Lasso 310
Bibliographic Notes 311
Exercises 312
Bibliography 315
Author Index 337
Index 343
Trevor Hastie is the John A. Overdeck Professor of Statistics at Stanford University. Prior to joining Stanford University, Professor Hastie worked at AT&T Bell Laboratories, where he helped develop the statistical modeling environment popular in the R computing system. Professor Hastie is known for his research in applied statistics, particularly in the fields of data mining, bioinformatics, and machine learning. He has published five books and over 180 research articles in these areas. In 2014, he received the Emanuel and Carol Parzen Prize for Statistical Innovation. He earned a PhD from Stanford University.





Robert Tibshirani is a professor in the Departments of Statistics and Health Research and Policy at Stanford University. He has authored five books, co-authored three books, and published over 200 research articles. He has made important contributions to the analysis of complex datasets, including the lasso and significance analysis of microarrays (SAM). He also co-authored the first study that linked cell phone usage with car accidents, a widely cited article that has played a role in the introduction of legislation restricting the use of phones while driving. Professor Tibshirani received the prestigious COPSS Presidents' Award in 1996 and was elected to the National Academy of Sciences in 2012.





Martin Wainwright is a professor in the Department of Statistics and the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. Professor Wainwright is known for theoretical and methodological research at the interface between statistics and computation, with particular emphasis on high-dimensional statistics, machine learning, graphical models, and information theory. He has published over 80 papers and one book in these areas, received the COPSS Presidents' Award in 2014, and was a section lecturer at the International Congress of Mathematicians in 2014. He received his PhD in EECS from the Massachusetts Institute of Technology (MIT).