
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (2009) [Hardback]

4.43/5 (2,189 ratings on Goodreads)
  • Format: Hardback, XXII + 745 pages, height x width: 235x155 mm, weight: 1451 g, 658 illustrations (604 in color, 54 in black and white)
  • Series: Springer Series in Statistics
  • Publication date: 09-Feb-2009
  • Publisher: Springer-Verlag New York Inc.
  • ISBN-10: 0387848576
  • ISBN-13: 9780387848570
  • Hardback
  • Price: 71,86 €*
  • * the price is final, i.e. no further discounts apply
  • Regular price: 84,54 €
  • You save 15%
  • Delivery from the publisher takes approximately 2-4 weeks
  • Free shipping
During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting (the first comprehensive treatment of this topic in any book).

This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates.

Reviews

From the reviews:

"Like the first edition, the current one is a welcome edition to researchers and academicians equally... Almost all of the chapters are revised... The material is nicely reorganized and repackaged, with the general layout being the same as that of the first edition... If you bought the first edition, I suggest that you buy the second edition for maximum effect, and if you haven't, then I still strongly recommend you have this book at your desk. Is it a good investment, statistically speaking!" (Book Review Editor, Technometrics, August 2009, Vol. 51, No. 3)

From the reviews of the second edition:

"This second edition pays tribute to the many developments in recent years in this field, and new material was added to several existing chapters as well as four new chapters ... were included. ... These additions make this book worthwhile to obtain ... . In general this is a well written book which gives a good overview on statistical learning and can be recommended to everyone interested in this field. The book is so comprehensive that it offers material for several courses." (Klaus Nordhausen, International Statistical Review, Vol. 77 (3), 2009)

"The second edition ... features about 200 pages of substantial new additions in the form of four new chapters, as well as various complements to existing chapters. ... the book may also be of interest to a theoretically inclined reader looking for an entry point to the area and wanting to get an initial understanding of which mathematical issues are relevant in relation to practice. ... this is a welcome update to an already fine book, which will surely reinforce its status as a reference." (Gilles Blanchard, Mathematical Reviews, Issue 2012 d)

"The book would be ideal for statistics graduate students ... . This book really is the standard in the field, referenced in most papers and books on the subject, and it is easy to see why. The book is very well written, with informative graphics on almost every other page. It looks great and inviting. You can flip the book open to any page, read a sentence or two and be hooked for the next hour or so." (Peter Rabinovitch, The Mathematical Association of America, May, 2012)

Preface to the Second Edition vii
Preface to the First Edition xi
Introduction
1(8)
Overview of Supervised Learning
9(34)
Introduction
9(1)
Variable Types and Terminology
9(2)
Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors
11(7)
Linear Models and Least Squares
11(3)
Nearest-Neighbor Methods
14(2)
From Least Squares to Nearest Neighbors
16(2)
Statistical Decision Theory
18(4)
Local Methods in High Dimensions
22(6)
Statistical Models, Supervised Learning and Function Approximation
28(4)
A Statistical Model for the Joint Distribution Pr(X, Y)
28(1)
Supervised Learning
29(1)
Function Approximation
29(3)
Structured Regression Models
32(1)
Difficulty of the Problem
32(1)
Classes of Restricted Estimators
33(4)
Roughness Penalty and Bayesian Methods
34(1)
Kernel Methods and Local Regression
34(1)
Basis Functions and Dictionary Methods
35(2)
Model Selection and the Bias–Variance Tradeoff
37(2)
Bibliographic Notes
39(1)
Exercises
39(4)
Linear Methods for Regression
43(58)
Introduction
43(1)
Linear Regression Models and Least Squares
44(13)
Example: Prostate Cancer
49(2)
The Gauss–Markov Theorem
51(1)
Multiple Regression from Simple Univariate Regression
52(4)
Multiple Outputs
56(1)
Subset Selection
57(4)
Best-Subset Selection
57(1)
Forward- and Backward-Stepwise Selection
58(2)
Forward-Stagewise Regression
60(1)
Prostate Cancer Data Example (Continued)
61(1)
Shrinkage Methods
61(18)
Ridge Regression
61(7)
The Lasso
68(1)
Discussion: Subset Selection, Ridge Regression and the Lasso
69(4)
Least Angle Regression
73(6)
Methods Using Derived Input Directions
79(3)
Principal Components Regression
79(1)
Partial Least Squares
80(2)
Discussion: A Comparison of the Selection and Shrinkage Methods
82(2)
Multiple Outcome Shrinkage and Selection
84(2)
More on the Lasso and Related Path Algorithms
86(7)
Incremental Forward Stagewise Regression
86(3)
Piecewise-Linear Path Algorithms
89(1)
The Dantzig Selector
89(1)
The Grouped Lasso
90(1)
Further Properties of the Lasso
91(1)
Pathwise Coordinate Optimization
92(1)
Computational Considerations
93(1)
Bibliographic Notes
94(1)
Exercises
94(7)
Linear Methods for Classification
101(38)
Introduction
101(2)
Linear Regression of an Indicator Matrix
103(3)
Linear Discriminant Analysis
106(13)
Regularized Discriminant Analysis
112(1)
Computations for LDA
113(1)
Reduced-Rank Linear Discriminant Analysis
113(6)
Logistic Regression
119(10)
Fitting Logistic Regression Models
120(2)
Example: South African Heart Disease
122(2)
Quadratic Approximations and Inference
124(1)
L1 Regularized Logistic Regression
125(2)
Logistic Regression or LDA?
127(2)
Separating Hyperplanes
129(6)
Rosenblatt's Perceptron Learning Algorithm
130(2)
Optimal Separating Hyperplanes
132(3)
Bibliographic Notes
135(1)
Exercises
135(4)
Basis Expansions and Regularization
139(52)
Introduction
139(2)
Piecewise Polynomials and Splines
141(9)
Natural Cubic Splines
144(2)
Example: South African Heart Disease (Continued)
146(2)
Example: Phoneme Recognition
148(2)
Filtering and Feature Extraction
150(1)
Smoothing Splines
151(5)
Degrees of Freedom and Smoother Matrices
153(3)
Automatic Selection of the Smoothing Parameters
156(5)
Fixing the Degrees of Freedom
158(1)
The Bias–Variance Tradeoff
158(3)
Nonparametric Logistic Regression
161(1)
Multidimensional Splines
162(5)
Regularization and Reproducing Kernel Hilbert Spaces
167(7)
Spaces of Functions Generated by Kernels
168(2)
Examples of RKHS
170(4)
Wavelet Smoothing
174(7)
Wavelet Bases and the Wavelet Transform
176(3)
Adaptive Wavelet Filtering
179(2)
Bibliographic Notes
181(1)
Exercises
181(5)
Appendix: Computational Considerations for Splines
186(5)
Appendix: B-splines
186(3)
Appendix: Computations for Smoothing Splines
189(2)
Kernel Smoothing Methods
191(28)
One-Dimensional Kernel Smoothers
192(6)
Local Linear Regression
194(3)
Local Polynomial Regression
197(1)
Selecting the Width of the Kernel
198(2)
Local Regression in ℝ^p
200(1)
Structured Local Regression Models in ℝ^p
201(4)
Structured Kernels
203(1)
Structured Regression Functions
203(2)
Local Likelihood and Other Models
205(3)
Kernel Density Estimation and Classification
208(4)
Kernel Density Estimation
208(2)
Kernel Density Classification
210(1)
The Naive Bayes Classifier
210(2)
Radial Basis Functions and Kernels
212(2)
Mixture Models for Density Estimation and Classification
214(2)
Computational Considerations
216(1)
Bibliographic Notes
216(1)
Exercises
216(3)
Model Assessment and Selection
219(42)
Introduction
219(1)
Bias, Variance and Model Complexity
219(4)
The Bias–Variance Decomposition
223(5)
Example: Bias–Variance Tradeoff
226(2)
Optimism of the Training Error Rate
228(2)
Estimates of In-Sample Prediction Error
230(2)
The Effective Number of Parameters
232(1)
The Bayesian Approach and BIC
233(2)
Minimum Description Length
235(2)
Vapnik–Chervonenkis Dimension
237(4)
Example (Continued)
239(2)
Cross-Validation
241(8)
K-Fold Cross-Validation
241(4)
The Wrong and Right Way to Do Cross-validation
245(2)
Does Cross-Validation Really Work?
247(2)
Bootstrap Methods
249(5)
Example (Continued)
252(2)
Conditional or Expected Test Error?
254(3)
Bibliographic Notes
257(1)
Exercises
257(4)
Model Inference and Averaging
261(34)
Introduction
261(1)
The Bootstrap and Maximum Likelihood Methods
261(6)
A Smoothing Example
261(4)
Maximum Likelihood Inference
265(2)
Bootstrap versus Maximum Likelihood
267(1)
Bayesian Methods
267(4)
Relationship Between the Bootstrap and Bayesian Inference
271(1)
The EM Algorithm
272(7)
Two-Component Mixture Model
272(4)
The EM Algorithm in General
276(1)
EM as a Maximization–Maximization Procedure
277(2)
MCMC for Sampling from the Posterior
279(3)
Bagging
282(6)
Example: Trees with Simulated Data
283(5)
Model Averaging and Stacking
288(2)
Stochastic Search: Bumping
290(2)
Bibliographic Notes
292(1)
Exercises
293(2)
Additive Models, Trees, and Related Methods
295(42)
Generalized Additive Models
295(10)
Fitting Additive Models
297(2)
Example: Additive Logistic Regression
299(5)
Summary
304(1)
Tree-Based Methods
305(12)
Background
305(2)
Regression Trees
307(1)
Classification Trees
308(2)
Other Issues
310(3)
Spam Example (Continued)
313(4)
PRIM: Bump Hunting
317(4)
Spam Example (Continued)
320(1)
MARS: Multivariate Adaptive Regression Splines
321(8)
Spam Example (Continued)
326(1)
Example (Simulated Data)
327(1)
Other Issues
328(1)
Hierarchical Mixtures of Experts
329(3)
Missing Data
332(2)
Computational Considerations
334(1)
Bibliographic Notes
334(1)
Exercises
335(2)
Boosting and Additive Trees
337(52)
Boosting Methods
337(4)
Outline of This Chapter
340(1)
Boosting Fits an Additive Model
341(1)
Forward Stagewise Additive Modeling
342(1)
Exponential Loss and AdaBoost
343(2)
Why Exponential Loss?
345(1)
Loss Functions and Robustness
346(4)
"Off-the-Shelf" Procedures for Data Mining
350(2)
Example: Spam Data
352(1)
Boosting Trees
353(5)
Numerical Optimization via Gradient Boosting
358(3)
Steepest Descent
358(1)
Gradient Boosting
359(1)
Implementations of Gradient Boosting
360(1)
Right-Sized Trees for Boosting
361(3)
Regularization
364(3)
Shrinkage
364(1)
Subsampling
365(2)
Interpretation
367(4)
Relative Importance of Predictor Variables
367(2)
Partial Dependence Plots
369(2)
Illustrations
371(9)
California Housing
371(4)
New Zealand Fish
375(4)
Demographics Data
379(1)
Bibliographic Notes
380(4)
Exercises
384(5)
Neural Networks
389(28)
Introduction
389(1)
Projection Pursuit Regression
389(3)
Neural Networks
392(3)
Fitting Neural Networks
395(2)
Some Issues in Training Neural Networks
397(4)
Starting Values
397(1)
Overfitting
398(1)
Scaling of the Inputs
398(2)
Number of Hidden Units and Layers
400(1)
Multiple Minima
400(1)
Example: Simulated Data
401(3)
Example: ZIP Code Data
404(4)
Discussion
408(1)
Bayesian Neural Nets and the NIPS 2003 Challenge
409(5)
Bayes, Boosting and Bagging
410(2)
Performance Comparisons
412(2)
Computational Considerations
414(1)
Bibliographic Notes
415(1)
Exercises
415(2)
Support Vector Machines and Flexible Discriminants
417(42)
Introduction
417(1)
The Support Vector Classifier
417(6)
Computing the Support Vector Classifier
420(1)
Mixture Example (Continued)
421(2)
Support Vector Machines and Kernels
423(15)
Computing the SVM for Classification
423(3)
The SVM as a Penalization Method
426(2)
Function Estimation and Reproducing Kernels
428(3)
SVMs and the Curse of Dimensionality
431(1)
A Path Algorithm for the SVM Classifier
432(2)
Support Vector Machines for Regression
434(2)
Regression and Kernels
436(2)
Discussion
438(1)
Generalizing Linear Discriminant Analysis
438(2)
Flexible Discriminant Analysis
440(6)
Computing the FDA Estimates
444(2)
Penalized Discriminant Analysis
446(3)
Mixture Discriminant Analysis
449(6)
Example: Waveform Data
451(4)
Bibliographic Notes
455(1)
Exercises
455(4)
Prototype Methods and Nearest-Neighbors
459(26)
Introduction
459(1)
Prototype Methods
459(4)
K-means Clustering
460(2)
Learning Vector Quantization
462(1)
Gaussian Mixtures
463(1)
k-Nearest-Neighbor Classifiers
463(12)
Example: A Comparative Study
468(2)
Example: k-Nearest-Neighbors and Image Scene Classification
470(1)
Invariant Metrics and Tangent Distance
471(4)
Adaptive Nearest-Neighbor Methods
475(5)
Example
478(1)
Global Dimension Reduction for Nearest-Neighbors
479(1)
Computational Considerations
480(1)
Bibliographic Notes
481(1)
Exercises
481(4)
Unsupervised Learning
485(102)
Introduction
485(2)
Association Rules
487(14)
Market Basket Analysis
488(1)
The Apriori Algorithm
489(3)
Example: Market Basket Analysis
492(3)
Unsupervised as Supervised Learning
495(2)
Generalized Association Rules
497(2)
Choice of Supervised Learning Method
499(1)
Example: Market Basket Analysis (Continued)
499(2)
Cluster Analysis
501(27)
Proximity Matrices
503(1)
Dissimilarities Based on Attributes
503(2)
Object Dissimilarity
505(2)
Clustering Algorithms
507(1)
Combinatorial Algorithms
507(2)
K-means
509(1)
Gaussian Mixtures as Soft K-means Clustering
510(2)
Example: Human Tumor Microarray Data
512(2)
Vector Quantization
514(1)
K-medoids
515(3)
Practical Issues
518(2)
Hierarchical Clustering
520(8)
Self-Organizing Maps
528(6)
Principal Components, Curves and Surfaces
534(19)
Principal Components
534(7)
Principal Curves and Surfaces
541(3)
Spectral Clustering
544(3)
Kernel Principal Components
547(3)
Sparse Principal Components
550(3)
Non-negative Matrix Factorization
553(4)
Archetypal Analysis
554(3)
Independent Component Analysis and Exploratory Projection Pursuit
557(13)
Latent Variables and Factor Analysis
558(2)
Independent Component Analysis
560(5)
Exploratory Projection Pursuit
565(1)
A Direct Approach to ICA
565(5)
Multidimensional Scaling
570(2)
Nonlinear Dimension Reduction and Local Multidimensional Scaling
572(4)
The Google PageRank Algorithm
576(2)
Bibliographic Notes
578(1)
Exercises
579(8)
Random Forests
587(18)
Introduction
587(1)
Definition of Random Forests
587(5)
Details of Random Forests
592(5)
Out of Bag Samples
592(1)
Variable Importance
593(2)
Proximity Plots
595(1)
Random Forests and Overfitting
596(1)
Analysis of Random Forests
597(5)
Variance and the De-Correlation Effect
597(3)
Bias
600(1)
Adaptive Nearest Neighbors
601(1)
Bibliographic Notes
602(1)
Exercises
603(2)
Ensemble Learning
605(20)
Introduction
605(2)
Boosting and Regularization Paths
607(9)
Penalized Regression
607(3)
The "Bet on Sparsity" Principle
610(3)
Regularization Paths, Over-fitting and Margins
613(3)
Learning Ensembles
616(7)
Learning a Good Ensemble
617(5)
Rule Ensembles
622(1)
Bibliographic Notes
623(1)
Exercises
624(1)
Undirected Graphical Models
625(24)
Introduction
625(2)
Markov Graphs and Their Properties
627(3)
Undirected Graphical Models for Continuous Variables
630(8)
Estimation of the Parameters when the Graph Structure is Known
631(4)
Estimation of the Graph Structure
635(3)
Undirected Graphical Models for Discrete Variables
638(7)
Estimation of the Parameters when the Graph Structure is Known
639(2)
Hidden Nodes
641(1)
Estimation of the Graph Structure
642(1)
Restricted Boltzmann Machines
643(2)
Exercises
645(4)
High-Dimensional Problems: p ≫ N
649(50)
When p is Much Bigger than N
649(2)
Diagonal Linear Discriminant Analysis and Nearest Shrunken Centroids
651(3)
Linear Classifiers with Quadratic Regularization
654(7)
Regularized Discriminant Analysis
656(1)
Logistic Regression with Quadratic Regularization
657(1)
The Support Vector Classifier
657(1)
Feature Selection
658(1)
Computational Shortcuts When p ≫ N
659(2)
Linear Classifiers with L1 Regularization
661(7)
Application of Lasso to Protein Mass Spectroscopy
664(2)
The Fused Lasso for Functional Data
666(2)
Classification When Features are Unavailable
668(6)
Example: String Kernels and Protein Classification
668(2)
Classification and Other Models Using Inner-Product Kernels and Pairwise Distances
670(2)
Example: Abstracts Classification
672(2)
High-Dimensional Regression: Supervised Principal Components
674(9)
Connection to Latent-Variable Modeling
678(2)
Relationship with Partial Least Squares
680(1)
Pre-Conditioning for Feature Selection
681(2)
Feature Assessment and the Multiple-Testing Problem
683(10)
The False Discovery Rate
687(3)
Asymmetric Cutpoints and the SAM Procedure
690(2)
A Bayesian Interpretation of the FDR
692(1)
Bibliographic Notes
693(1)
Exercises
694(5)
References 699(30)
Author Index 729(8)
Index 737
Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.