Preface vii

1 Introduction 1

2 Statistical Learning 15
2.1 What Is Statistical Learning? 15
2.1.1 Why Estimate f? 17
2.1.2 How Do We Estimate f? 21
2.1.3 The Trade-Off Between Prediction Accuracy and Model Interpretability 24
2.1.4 Supervised Versus Unsupervised Learning 26
2.1.5 Regression Versus Classification Problems 28
2.2 Assessing Model Accuracy 29
2.2.1 Measuring the Quality of Fit 29
2.2.2 The Bias-Variance Trade-Off 33
2.2.3 The Classification Setting 37
2.3 Lab: Introduction to R 42
2.3.1 Basic Commands 43
2.3.2 Graphics 45
2.3.3 Indexing Data 47
2.3.4 Loading Data 48
2.3.5 Additional Graphical and Numerical Summaries 50
2.4 Exercises 52

3 Linear Regression 59
3.1 Simple Linear Regression 60
3.1.1 Estimating the Coefficients 61
3.1.2 Assessing the Accuracy of the Coefficient Estimates 63
3.1.3 Assessing the Accuracy of the Model 68
3.2 Multiple Linear Regression 71
3.2.1 Estimating the Regression Coefficients 72
3.2.2 Some Important Questions 75
3.3 Other Considerations in the Regression Model 83
3.3.1 Qualitative Predictors 83
3.3.2 Extensions of the Linear Model 87
3.3.3 Potential Problems 92
3.4 The Marketing Plan 103
3.5 Comparison of Linear Regression with K-Nearest Neighbors 105
3.6 Lab: Linear Regression 110
3.6.1 Libraries 110
3.6.2 Simple Linear Regression 111
3.6.3 Multiple Linear Regression 114
3.6.4 Interaction Terms 116
3.6.5 Non-linear Transformations of the Predictors 116
3.6.6 Qualitative Predictors 119
3.6.7 Writing Functions 120
3.7 Exercises 121

4 Classification 129
4.1 An Overview of Classification 130
4.2 Why Not Linear Regression? 131
4.3 Logistic Regression 133
4.3.1 The Logistic Model 133
4.3.2 Estimating the Regression Coefficients 135
4.3.3 Making Predictions 136
4.3.4 Multiple Logistic Regression 137
4.3.5 Multinomial Logistic Regression 140
4.4 Generative Models for Classification 141
4.4.1 Linear Discriminant Analysis for p = 1 142
4.4.2 Linear Discriminant Analysis for p > 1 145
4.4.3 Quadratic Discriminant Analysis 152
4.4.4 Naive Bayes 153
4.5 A Comparison of Classification Methods 158
4.5.1 An Analytical Comparison 158
4.5.2 An Empirical Comparison 161
4.6 Generalized Linear Models 164
4.6.1 Linear Regression on the Bikeshare Data 164
4.6.2 Poisson Regression on the Bikeshare Data 167
4.6.3 Generalized Linear Models in Greater Generality 170
4.7 Lab: Classification Methods 171
4.7.1 The Stock Market Data 171
4.7.2 Logistic Regression 172
4.7.3 Linear Discriminant Analysis 177
4.7.4 Quadratic Discriminant Analysis 179
4.7.5 Naive Bayes 180
4.7.6 K-Nearest Neighbors 181
4.7.7 Poisson Regression 185
4.8 Exercises 189

5 Resampling Methods 197
5.1 Cross-Validation 198
5.1.1 The Validation Set Approach 198
5.1.2 Leave-One-Out Cross-Validation 200
5.1.3 k-Fold Cross-Validation 203
5.1.4 Bias-Variance Trade-Off for k-Fold Cross-Validation 205
5.1.5 Cross-Validation on Classification Problems 206
5.2 The Bootstrap 209
5.3 Lab: Cross-Validation and the Bootstrap 212
5.3.1 The Validation Set Approach 213
5.3.2 Leave-One-Out Cross-Validation 214
5.3.3 k-Fold Cross-Validation 215
5.3.4 The Bootstrap 216
5.4 Exercises 219

6 Linear Model Selection and Regularization 225
6.1 Subset Selection 227
6.1.1 Best Subset Selection 227
6.1.2 Stepwise Selection 229
6.1.3 Choosing the Optimal Model 232
6.2 Shrinkage Methods 237
6.2.1 Ridge Regression 237
6.2.2 The Lasso 241
6.2.3 Selecting the Tuning Parameter 250
6.3 Dimension Reduction Methods 251
6.3.1 Principal Components Regression 252
6.3.2 Partial Least Squares 259
6.4 Considerations in High Dimensions 261
6.4.1 High-Dimensional Data 261
6.4.2 What Goes Wrong in High Dimensions? 262
6.4.3 Regression in High Dimensions 264
6.4.4 Interpreting Results in High Dimensions 266
6.5 Lab: Linear Models and Regularization Methods 267
6.5.1 Subset Selection Methods 267
6.5.2 Ridge Regression and the Lasso 274
6.5.3 PCR and PLS Regression 279
6.6 Exercises 282

7 Moving Beyond Linearity 289
7.1 Polynomial Regression 290
7.2 Step Functions 292
7.3 Basis Functions 294
7.4 Regression Splines 295
7.4.1 Piecewise Polynomials 295
7.4.2 Constraints and Splines 295
7.4.3 The Spline Basis Representation 297
7.4.4 Choosing the Number and Locations of the Knots 298
7.4.5 Comparison to Polynomial Regression 300
7.5 Smoothing Splines 301
7.5.1 An Overview of Smoothing Splines 301
7.5.2 Choosing the Smoothing Parameter λ 302
7.6 Local Regression 304
7.7 Generalized Additive Models 306
7.7.1 GAMs for Regression Problems 307
7.7.2 GAMs for Classification Problems 310
7.8 Lab: Non-linear Modeling 311
7.8.1 Polynomial Regression and Step Functions 312
7.8.2 Splines 317
7.8.3 GAMs 318
7.9 Exercises 321

8 Tree-Based Methods 327
8.1 The Basics of Decision Trees 327
8.1.1 Regression Trees 328
8.1.2 Classification Trees 335
8.1.3 Trees Versus Linear Models 338
8.1.4 Advantages and Disadvantages of Trees 339
8.2 Bagging, Random Forests, Boosting, and Bayesian Additive Regression Trees 340
8.2.1 Bagging 340
8.2.2 Random Forests 343
8.2.3 Boosting 345
8.2.4 Bayesian Additive Regression Trees 348
8.2.5 Summary of Tree Ensemble Methods 351
8.3 Lab: Decision Trees 353
8.3.1 Fitting Classification Trees 353
8.3.2 Fitting Regression Trees 356
8.3.3 Bagging and Random Forests 357
8.3.4 Boosting 359
8.3.5 Bayesian Additive Regression Trees 360
8.4 Exercises 361

9 Support Vector Machines 367
9.1 Maximal Margin Classifier 368
9.1.1 What Is a Hyperplane? 368
9.1.2 Classification Using a Separating Hyperplane 369
9.1.3 The Maximal Margin Classifier 371
9.1.4 Construction of the Maximal Margin Classifier 372
9.1.5 The Non-separable Case 373
9.2 Support Vector Classifiers 373
9.2.1 Overview of the Support Vector Classifier 373
9.2.2 Details of the Support Vector Classifier 375
9.3 Support Vector Machines 379
9.3.1 Classification with Non-Linear Decision Boundaries 379
9.3.2 The Support Vector Machine 380
9.3.3 An Application to the Heart Disease Data 383
9.4 SVMs with More than Two Classes 385
9.4.1 One-Versus-One Classification 385
9.4.2 One-Versus-All Classification 385
9.5 Relationship to Logistic Regression 386
9.6 Lab: Support Vector Machines 388
9.6.1 Support Vector Classifier 389
9.6.2 Support Vector Machine 392
9.6.3 ROC Curves 394
9.6.4 SVM with Multiple Classes 396
9.6.5 Application to Gene Expression Data 396
9.7 Exercises 398

10 Deep Learning 403
10.1 Single Layer Neural Networks 404
10.2 Multilayer Neural Networks 407
10.3 Convolutional Neural Networks 411
10.3.1 Convolution Layers 412
10.3.2 Pooling Layers 415
10.3.3 Architecture of a Convolutional Neural Network 415
10.3.4 Data Augmentation 417
10.3.5 Results Using a Pretrained Classifier 417
10.4 Document Classification 419
10.5 Recurrent Neural Networks 421
10.5.1 Sequential Models for Document Classification 424
10.5.2 Time Series Forecasting 427
10.5.3 Summary of RNNs 431
10.6 When to Use Deep Learning 432
10.7 Fitting a Neural Network 434
10.7.1 Backpropagation 435
10.7.2 Regularization and Stochastic Gradient Descent 436
10.7.3 Dropout Learning 438
10.7.4 Network Tuning 438
10.8 Interpolation and Double Descent 439
10.9 Lab: Deep Learning 443
10.9.1 A Single Layer Network on the Hitters Data 443
10.9.2 A Multilayer Network on the MNIST Digit Data 445
10.9.3 Convolutional Neural Networks 448
10.9.4 Using Pretrained CNN Models 451
10.9.5 IMDb Document Classification 452
10.9.6 Recurrent Neural Networks 454
10.10 Exercises 458

11 Survival Analysis and Censored Data 461
11.1 Survival and Censoring Times 462
11.2 A Closer Look at Censoring 463
11.3 The Kaplan-Meier Survival Curve 464
11.4 The Log-Rank Test 466
11.5 Regression Models With a Survival Response 469
11.5.1 The Hazard Function 469
11.5.2 Proportional Hazards 471
11.5.3 Example: Brain Cancer Data 475
11.5.4 Example: Publication Data 475
11.6 Shrinkage for the Cox Model 478
11.7 Additional Topics 480
11.7.1 Area Under the Curve for Survival Analysis 480
11.7.2 Choice of Time Scale 481
11.7.3 Time-Dependent Covariates 481
11.7.4 Checking the Proportional Hazards Assumption 482
11.7.5 Survival Trees 482
11.8 Lab: Survival Analysis 483
11.8.1 Brain Cancer Data 483
11.8.2 Publication Data 486
11.8.3 Call Center Data 487
11.9 Exercises 490

12 Unsupervised Learning 497
12.1 The Challenge of Unsupervised Learning 497
12.2 Principal Components Analysis 498
12.2.1 What Are Principal Components? 499
12.2.2 Another Interpretation of Principal Components 503
12.2.3 The Proportion of Variance Explained 505
12.2.4 More on PCA 507
12.2.5 Other Uses for Principal Components 510
12.3 Missing Values and Matrix Completion 510
12.4 Clustering Methods 516
12.4.1 K-Means Clustering 517
12.4.2 Hierarchical Clustering 521
12.4.3 Practical Issues in Clustering 530
12.5 Lab: Unsupervised Learning 532
12.5.1 Principal Components Analysis 532
12.5.2 Matrix Completion 535
12.5.3 Clustering 538
12.5.4 NCI60 Data Example 542
12.6 Exercises 548

13 Multiple Testing 553
13.1 A Quick Review of Hypothesis Testing 554
13.1.1 Testing a Hypothesis 555
13.1.2 Type I and Type II Errors 559
13.2 The Challenge of Multiple Testing 560
13.3 The Family-Wise Error Rate 561
13.3.1 What is the Family-Wise Error Rate? 562
13.3.2 Approaches to Control the Family-Wise Error Rate 564
13.3.3 Trade-Off Between the FWER and Power 570
13.4 The False Discovery Rate 571
13.4.1 Intuition for the False Discovery Rate 571
13.4.2 The Benjamini-Hochberg Procedure 573
13.5 A Re-Sampling Approach to p-Values and False Discovery Rates 575
13.5.1 A Re-Sampling Approach to the p-Value 576
13.5.2 A Re-Sampling Approach to the False Discovery Rate 578
13.5.3 When Are Re-Sampling Approaches Useful? 581
13.6 Lab: Multiple Testing 582
13.6.1 Review of Hypothesis Tests 582
13.6.2 The Family-Wise Error Rate 583
13.6.3 The False Discovery Rate 586
13.6.4 A Re-Sampling Approach 588
13.7 Exercises 591

Index 597