
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (2009) [Hardback]

4.43/5 (2,189 ratings on Goodreads)
  • Format: Hardback, XXII + 745 pages, height x width: 235x155 mm, weight: 1451 g, 658 illustrations (604 in color, 54 in black and white)
  • Series: Springer Series in Statistics
  • Publication date: 09-Feb-2009
  • Publisher: Springer-Verlag New York Inc.
  • ISBN-10: 0387848576
  • ISBN-13: 9780387848570
  • Hardback
  • Price: 71,86 €*
  • * the price is final, i.e. no further discounts apply
  • Regular price: 84,54 €
  • You save 15%
  • Delivery from the publisher takes approximately 2-4 weeks
  • Free shipping
During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting (the first comprehensive treatment of this topic in any book).

This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates.

Reviews

From the reviews:

"Like the first edition, the current one is a welcome edition to researchers and academicians equally... Almost all of the chapters are revised... The material is nicely reorganized and repackaged, with the general layout being the same as that of the first edition... If you bought the first edition, I suggest that you buy the second edition for maximum effect, and if you haven't, then I still strongly recommend you have this book at your desk. Is it a good investment, statistically speaking!" (Book Review Editor, Technometrics, August 2009, Vol. 51, No. 3)

From the reviews of the second edition:

"This second edition pays tribute to the many developments in recent years in this field, and new material was added to several existing chapters as well as four new chapters ... were included. ... These additions make this book worthwhile to obtain ... . In general this is a well written book which gives a good overview on statistical learning and can be recommended to everyone interested in this field. The book is so comprehensive that it offers material for several courses." (Klaus Nordhausen, International Statistical Review, Vol. 77 (3), 2009)

"The second edition ... features about 200 pages of substantial new additions in the form of four new chapters, as well as various complements to existing chapters. ... the book may also be of interest to a theoretically inclined reader looking for an entry point to the area and wanting to get an initial understanding of which mathematical issues are relevant in relation to practice. ... this is a welcome update to an already fine book, which will surely reinforce its status as a reference." (Gilles Blanchard, Mathematical Reviews, Issue 2012 d)

"The book would be ideal for statistics graduate students ... . This book really is the standard in the field, referenced in most papers and books on the subject, and it is easy to see why. The book is very well written, with informative graphics on almost every other page. It looks great and inviting. You can flip the book open to any page, read a sentence or two and be hooked for the next hour or so." (Peter Rabinovitch, The Mathematical Association of America, May, 2012)

Preface to the Second Edition vii
Preface to the First Edition xi
Introduction
1(8)
Overview of Supervised Learning
9(34)
Introduction
9(1)
Variable Types and Terminology
9(2)
Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors
11(7)
Linear Models and Least Squares
11(3)
Nearest-Neighbor Methods
14(2)
From Least Squares to Nearest Neighbors
16(2)
Statistical Decision Theory
18(4)
Local Methods in High Dimensions
22(6)
Statistical Models, Supervised Learning and Function Approximation
28(4)
A Statistical Model for the Joint Distribution Pr(X, Y)
28(1)
Supervised Learning
29(1)
Function Approximation
29(3)
Structured Regression Models
32(1)
Difficulty of the Problem
32(1)
Classes of Restricted Estimators
33(4)
Roughness Penalty and Bayesian Methods
34(1)
Kernel Methods and Local Regression
34(1)
Basis Functions and Dictionary Methods
35(2)
Model Selection and the Bias–Variance Tradeoff
37(2)
Bibliographic Notes
39(1)
Exercises
39(4)
Linear Methods for Regression
43(58)
Introduction
43(1)
Linear Regression Models and Least Squares
44(13)
Example: Prostate Cancer
49(2)
The Gauss–Markov Theorem
51(1)
Multiple Regression from Simple Univariate Regression
52(4)
Multiple Outputs
56(1)
Subset Selection
57(4)
Best-Subset Selection
57(1)
Forward- and Backward-Stepwise Selection
58(2)
Forward-Stagewise Regression
60(1)
Prostate Cancer Data Example (Continued)
61(1)
Shrinkage Methods
61(18)
Ridge Regression
61(7)
The Lasso
68(1)
Discussion: Subset Selection, Ridge Regression and the Lasso
69(4)
Least Angle Regression
73(6)
Methods Using Derived Input Directions
79(3)
Principal Components Regression
79(1)
Partial Least Squares
80(2)
Discussion: A Comparison of the Selection and Shrinkage Methods
82(2)
Multiple Outcome Shrinkage and Selection
84(2)
More on the Lasso and Related Path Algorithms
86(7)
Incremental Forward Stagewise Regression
86(3)
Piecewise-Linear Path Algorithms
89(1)
The Dantzig Selector
89(1)
The Grouped Lasso
90(1)
Further Properties of the Lasso
91(1)
Pathwise Coordinate Optimization
92(1)
Computational Considerations
93(1)
Bibliographic Notes
94(1)
Exercises
94(7)
Linear Methods for Classification
101(38)
Introduction
101(2)
Linear Regression of an Indicator Matrix
103(3)
Linear Discriminant Analysis
106(13)
Regularized Discriminant Analysis
112(1)
Computations for LDA
113(1)
Reduced-Rank Linear Discriminant Analysis
113(6)
Logistic Regression
119(10)
Fitting Logistic Regression Models
120(2)
Example: South African Heart Disease
122(2)
Quadratic Approximations and Inference
124(1)
L1 Regularized Logistic Regression
125(2)
Logistic Regression or LDA?
127(2)
Separating Hyperplanes
129(6)
Rosenblatt's Perceptron Learning Algorithm
130(2)
Optimal Separating Hyperplanes
132(3)
Bibliographic Notes
135(1)
Exercises
135(4)
Basis Expansions and Regularization
139(52)
Introduction
139(2)
Piecewise Polynomials and Splines
141(9)
Natural Cubic Splines
144(2)
Example: South African Heart Disease (Continued)
146(2)
Example: Phoneme Recognition
148(2)
Filtering and Feature Extraction
150(1)
Smoothing Splines
151(5)
Degrees of Freedom and Smoother Matrices
153(3)
Automatic Selection of the Smoothing Parameters
156(5)
Fixing the Degrees of Freedom
158(1)
The Bias–Variance Tradeoff
158(3)
Nonparametric Logistic Regression
161(1)
Multidimensional Splines
162(5)
Regularization and Reproducing Kernel Hilbert Spaces
167(7)
Spaces of Functions Generated by Kernels
168(2)
Examples of RKHS
170(4)
Wavelet Smoothing
174(7)
Wavelet Bases and the Wavelet Transform
176(3)
Adaptive Wavelet Filtering
179(2)
Bibliographic Notes
181(1)
Exercises
181(5)
Appendix: Computational Considerations for Splines
186(5)
Appendix: B-splines
186(3)
Appendix: Computations for Smoothing Splines
189(2)
Kernel Smoothing Methods
191(28)
One-Dimensional Kernel Smoothers
192(6)
Local Linear Regression
194(3)
Local Polynomial Regression
197(1)
Selecting the Width of the Kernel
198(2)
Local Regression in ℝ^p
200(1)
Structured Local Regression Models in ℝ^p
201(4)
Structured Kernels
203(1)
Structured Regression Functions
203(2)
Local Likelihood and Other Models
205(3)
Kernel Density Estimation and Classification
208(4)
Kernel Density Estimation
208(2)
Kernel Density Classification
210(1)
The Naive Bayes Classifier
210(2)
Radial Basis Functions and Kernels
212(2)
Mixture Models for Density Estimation and Classification
214(2)
Computational Considerations
216(1)
Bibliographic Notes
216(1)
Exercises
216(3)
Model Assessment and Selection
219(42)
Introduction
219(1)
Bias, Variance and Model Complexity
219(4)
The Bias–Variance Decomposition
223(5)
Example: Bias–Variance Tradeoff
226(2)
Optimism of the Training Error Rate
228(2)
Estimates of In-Sample Prediction Error
230(2)
The Effective Number of Parameters
232(1)
The Bayesian Approach and BIC
233(2)
Minimum Description Length
235(2)
Vapnik–Chervonenkis Dimension
237(4)
Example (Continued)
239(2)
Cross-Validation
241(8)
K-Fold Cross-Validation
241(4)
The Wrong and Right Way to Do Cross-validation
245(2)
Does Cross-Validation Really Work?
247(2)
Bootstrap Methods
249(5)
Example (Continued)
252(2)
Conditional or Expected Test Error?
254(3)
Bibliographic Notes
257(1)
Exercises
257(4)
Model Inference and Averaging
261(34)
Introduction
261(1)
The Bootstrap and Maximum Likelihood Methods
261(6)
A Smoothing Example
261(4)
Maximum Likelihood Inference
265(2)
Bootstrap versus Maximum Likelihood
267(1)
Bayesian Methods
267(4)
Relationship Between the Bootstrap and Bayesian Inference
271(1)
The EM Algorithm
272(7)
Two-Component Mixture Model
272(4)
The EM Algorithm in General
276(1)
EM as a Maximization–Maximization Procedure
277(2)
MCMC for Sampling from the Posterior
279(3)
Bagging
282(6)
Example: Trees with Simulated Data
283(5)
Model Averaging and Stacking
288(2)
Stochastic Search: Bumping
290(2)
Bibliographic Notes
292(1)
Exercises
293(2)
Additive Models, Trees, and Related Methods
295(42)
Generalized Additive Models
295(10)
Fitting Additive Models
297(2)
Example: Additive Logistic Regression
299(5)
Summary
304(1)
Tree-Based Methods
305(12)
Background
305(2)
Regression Trees
307(1)
Classification Trees
308(2)
Other Issues
310(3)
Spam Example (Continued)
313(4)
PRIM: Bump Hunting
317(4)
Spam Example (Continued)
320(1)
MARS: Multivariate Adaptive Regression Splines
321(8)
Spam Example (Continued)
326(1)
Example (Simulated Data)
327(1)
Other Issues
328(1)
Hierarchical Mixtures of Experts
329(3)
Missing Data
332(2)
Computational Considerations
334(1)
Bibliographic Notes
334(1)
Exercises
335(2)
Boosting and Additive Trees
337(52)
Boosting Methods
337(4)
Outline of This Chapter
340(1)
Boosting Fits an Additive Model
341(1)
Forward Stagewise Additive Modeling
342(1)
Exponential Loss and AdaBoost
343(2)
Why Exponential Loss?
345(1)
Loss Functions and Robustness
346(4)
"Off-the-Shelf" Procedures for Data Mining
350(2)
Example: Spam Data
352(1)
Boosting Trees
353(5)
Numerical Optimization via Gradient Boosting
358(3)
Steepest Descent
358(1)
Gradient Boosting
359(1)
Implementations of Gradient Boosting
360(1)
Right-Sized Trees for Boosting
361(3)
Regularization
364(3)
Shrinkage
364(1)
Subsampling
365(2)
Interpretation
367(4)
Relative Importance of Predictor Variables
367(2)
Partial Dependence Plots
369(2)
Illustrations
371(9)
California Housing
371(4)
New Zealand Fish
375(4)
Demographics Data
379(1)
Bibliographic Notes
380(4)
Exercises
384(5)
Neural Networks
389(28)
Introduction
389(1)
Projection Pursuit Regression
389(3)
Neural Networks
392(3)
Fitting Neural Networks
395(2)
Some Issues in Training Neural Networks
397(4)
Starting Values
397(1)
Overfitting
398(1)
Scaling of the Inputs
398(2)
Number of Hidden Units and Layers
400(1)
Multiple Minima
400(1)
Example: Simulated Data
401(3)
Example: ZIP Code Data
404(4)
Discussion
408(1)
Bayesian Neural Nets and the NIPS 2003 Challenge
409(5)
Bayes, Boosting and Bagging
410(2)
Performance Comparisons
412(2)
Computational Considerations
414(1)
Bibliographic Notes
415(1)
Exercises
415(2)
Support Vector Machines and Flexible Discriminants
417(42)
Introduction
417(1)
The Support Vector Classifier
417(6)
Computing the Support Vector Classifier
420(1)
Mixture Example (Continued)
421(2)
Support Vector Machines and Kernels
423(15)
Computing the SVM for Classification
423(3)
The SVM as a Penalization Method
426(2)
Function Estimation and Reproducing Kernels
428(3)
SVMs and the Curse of Dimensionality
431(1)
A Path Algorithm for the SVM Classifier
432(2)
Support Vector Machines for Regression
434(2)
Regression and Kernels
436(2)
Discussion
438(1)
Generalizing Linear Discriminant Analysis
438(2)
Flexible Discriminant Analysis
440(6)
Computing the FDA Estimates
444(2)
Penalized Discriminant Analysis
446(3)
Mixture Discriminant Analysis
449(6)
Example: Waveform Data
451(4)
Bibliographic Notes
455(1)
Exercises
455(4)
Prototype Methods and Nearest-Neighbors
459(26)
Introduction
459(1)
Prototype Methods
459(4)
K-means Clustering
460(2)
Learning Vector Quantization
462(1)
Gaussian Mixtures
463(1)
k-Nearest-Neighbor Classifiers
463(12)
Example: A Comparative Study
468(2)
Example: k-Nearest-Neighbors and Image Scene Classification
470(1)
Invariant Metrics and Tangent Distance
471(4)
Adaptive Nearest-Neighbor Methods
475(5)
Example
478(1)
Global Dimension Reduction for Nearest-Neighbors
479(1)
Computational Considerations
480(1)
Bibliographic Notes
481(1)
Exercises
481(4)
Unsupervised Learning
485(102)
Introduction
485(2)
Association Rules
487(14)
Market Basket Analysis
488(1)
The Apriori Algorithm
489(3)
Example: Market Basket Analysis
492(3)
Unsupervised as Supervised Learning
495(2)
Generalized Association Rules
497(2)
Choice of Supervised Learning Method
499(1)
Example: Market Basket Analysis (Continued)
499(2)
Cluster Analysis
501(27)
Proximity Matrices
503(1)
Dissimilarities Based on Attributes
503(2)
Object Dissimilarity
505(2)
Clustering Algorithms
507(1)
Combinatorial Algorithms
507(2)
K-means
509(1)
Gaussian Mixtures as Soft K-means Clustering
510(2)
Example: Human Tumor Microarray Data
512(2)
Vector Quantization
514(1)
K-medoids
515(3)
Practical Issues
518(2)
Hierarchical Clustering
520(8)
Self-Organizing Maps
528(6)
Principal Components, Curves and Surfaces
534(19)
Principal Components
534(7)
Principal Curves and Surfaces
541(3)
Spectral Clustering
544(3)
Kernel Principal Components
547(3)
Sparse Principal Components
550(3)
Non-negative Matrix Factorization
553(4)
Archetypal Analysis
554(3)
Independent Component Analysis and Exploratory Projection Pursuit
557(13)
Latent Variables and Factor Analysis
558(2)
Independent Component Analysis
560(5)
Exploratory Projection Pursuit
565(1)
A Direct Approach to ICA
565(5)
Multidimensional Scaling
570(2)
Nonlinear Dimension Reduction and Local Multidimensional Scaling
572(4)
The Google PageRank Algorithm
576(2)
Bibliographic Notes
578(1)
Exercises
579(8)
Random Forests
587(18)
Introduction
587(1)
Definition of Random Forests
587(5)
Details of Random Forests
592(5)
Out of Bag Samples
592(1)
Variable Importance
593(2)
Proximity Plots
595(1)
Random Forests and Overfitting
596(1)
Analysis of Random Forests
597(5)
Variance and the De-Correlation Effect
597(3)
Bias
600(1)
Adaptive Nearest Neighbors
601(1)
Bibliographic Notes
602(1)
Exercises
603(2)
Ensemble Learning
605(20)
Introduction
605(2)
Boosting and Regularization Paths
607(9)
Penalized Regression
607(3)
The "Bet on Sparsity" Principle
610(3)
Regularization Paths, Over-fitting and Margins
613(3)
Learning Ensembles
616(7)
Learning a Good Ensemble
617(5)
Rule Ensembles
622(1)
Bibliographic Notes
623(1)
Exercises
624(1)
Undirected Graphical Models
625(24)
Introduction
625(2)
Markov Graphs and Their Properties
627(3)
Undirected Graphical Models for Continuous Variables
630(8)
Estimation of the Parameters when the Graph Structure is Known
631(4)
Estimation of the Graph Structure
635(3)
Undirected Graphical Models for Discrete Variables
638(7)
Estimation of the Parameters when the Graph Structure is Known
639(2)
Hidden Nodes
641(1)
Estimation of the Graph Structure
642(1)
Restricted Boltzmann Machines
643(2)
Exercises
645(4)
High-Dimensional Problems: p ≫ N
649(50)
When p is Much Bigger than N
649(2)
Diagonal Linear Discriminant Analysis and Nearest Shrunken Centroids
651(3)
Linear Classifiers with Quadratic Regularization
654(7)
Regularized Discriminant Analysis
656(1)
Logistic Regression with Quadratic Regularization
657(1)
The Support Vector Classifier
657(1)
Feature Selection
658(1)
Computational Shortcuts When p ≫ N
659(2)
Linear Classifiers with L1 Regularization
661(7)
Application of Lasso to Protein Mass Spectroscopy
664(2)
The Fused Lasso for Functional Data
666(2)
Classification When Features are Unavailable
668(6)
Example: String Kernels and Protein Classification
668(2)
Classification and Other Models Using Inner-Product Kernels and Pairwise Distances
670(2)
Example: Abstracts Classification
672(2)
High-Dimensional Regression: Supervised Principal Components
674(9)
Connection to Latent-Variable Modeling
678(2)
Relationship with Partial Least Squares
680(1)
Pre-Conditioning for Feature Selection
681(2)
Feature Assessment and the Multiple-Testing Problem
683(10)
The False Discovery Rate
687(3)
Asymmetric Cutpoints and the SAM Procedure
690(2)
A Bayesian Interpretation of the FDR
692(1)
Bibliographic Notes
693(1)
Exercises
694(5)
References 699(30)
Author Index 729(8)
Index 737
Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.