Section | Page | Page count
Preface | xiii |
1 Introduction | 1 | 36
1.1 Select topics in statistical and machine learning | 2 | 8
1.1.1 Statistical jargon and conventions | 3 | 1
1.1.2 Supervised learning | 4 | 1
| 5 | 1
| 6 | 1
1.1.2.3 Classification vs. regression | 7 | 1
1.1.2.4 Discrimination vs. prediction | 7 | 1
1.1.2.5 The bias-variance tradeoff | 8 | 2
1.1.3 Unsupervised learning | 10 | 1
| 10 | 7
1.2.1 A brief history of decision trees | 12 | 2
1.2.2 The anatomy of a simple decision tree | 14 | 1
1.2.2.1 Example: survival on the Titanic | 15 | 2
| 17 | 3
| 17 | 2
1.3.2 Software information and conventions | 19 | 1
1.4 Some example data sets | 20 | 15
| 21 | 1
1.4.2 New York air quality measurements | 21 | 2
1.4.3 The Friedman 1 benchmark problem | 23 | 1
| 24 | 1
| 25 | 3
| 28 | 1
1.4.7 Predicting home prices in Ames, Iowa | 29 | 1
1.4.8 Wine quality ratings | 30 | 1
1.4.9 Mayo Clinic primary biliary cholangitis study | 31 | 4
1.5 There ain't no such thing as a free lunch | 35 | 1
| 35 | 2
| 37 | 140
2 Binary recursive partitioning with CART | 39 | 72
| 39 | 2
| 41 | 17
2.2.1 Splits on ordered variables | 43 | 4
2.2.1.1 So which is it in practice, Gini or entropy? | 47 | 1
2.2.2 Example: Swiss banknotes | 48 | 3
2.2.3 Fitted values and predictions | 51 | 1
2.2.4 Class priors and misclassification costs | 52 | 2
| 54 | 1
2.2.4.2 Example: employee attrition | 55 | 3
| 58 | 4
2.3.1 Example: New York air quality measurements | 59 | 3
| 62 | 7
2.4.1 Example: mushroom edibility | 64 | 3
2.4.2 Be wary of categoricals with high cardinality | 67 | 1
2.4.3 To encode, or not to encode? | 68 | 1
2.5 Building a decision tree | 69 | 9
2.5.1 Cost-complexity pruning | 71 | 3
2.5.1.1 Example: mushroom edibility | 74 | 3
| 77 | 1
| 78 | 1
2.6 Hyperparameters and tuning | 78 | 1
2.7 Missing data and surrogate splits | 78 | 4
2.7.1 Other missing value strategies | 80 | 2
| 82 | 1
2.9 Software and examples | 83 | 22
2.9.1 Example: Swiss banknotes | 84 | 4
2.9.2 Example: mushroom edibility | 88 | 8
2.9.3 Example: predicting home prices | 96 | 4
2.9.4 Example: employee attrition | 100 | 3
2.9.5 Example: letter image recognition | 103 | 2
| 105 | 3
2.10.1 Advantages of CART | 105 | 1
2.10.2 Disadvantages of CART | 106 | 2
| 108 | 3
3 Conditional inference trees | 111 | 36
| 111 | 1
3.2 Early attempts at unbiased recursive partitioning | 112 | 2
3.3 A quick digression into conditional inference | 114 | 7
3.3.1 Example: X and Y are both univariate continuous | 117 | 1
3.3.2 Example: X and Y are both nominal categorical | 118 | 2
3.3.3 Which test statistic should you use? | 120 | 1
3.4 Conditional inference trees | 121 | 11
3.4.1 Selecting the splitting variable | 121 | 2
3.4.1.1 Example: New York air quality measurements | 123 | 1
3.4.1.2 Example: Swiss banknotes | 124 | 1
3.4.2 Finding the optimal split point | 125 | 1
3.4.2.1 Example: New York air quality measurements | 126 | 2
| 128 | 1
| 128 | 1
3.4.5 Choice of α, g(), and h() | 128 | 3
3.4.6 Fitted values and predictions | 131 | 1
| 131 | 1
3.4.8 Variable importance | 132 | 1
3.5 Software and examples | 132 | 11
3.5.1 Example: New York air quality measurements | 133 | 4
3.5.2 Example: wine quality ratings | 137 | 3
3.5.3 Example: Mayo Clinic liver transplant data | 140 | 3
| 143 | 4
4 The hitchhiker's GUIDE to modern decision trees | 147 | 30
| 148 | 2
4.2 A GUIDE for regression | 150 | 7
4.2.1 Piecewise constant models | 150 | 2
4.2.1.1 Example: New York air quality measurements | 152 | 1
| 153 | 1
| 154 | 1
4.2.3.1 Example: predicting home prices | 155 | 2
4.2.3.2 Bootstrap bias correction | 157 | 1
4.3 A GUIDE for classification | 157 | 5
4.3.1 Linear/oblique splits | 157 | 1
4.3.1.1 Example: classifying the Palmer penguins | 158 | 3
4.3.2 Priors and misclassification costs | 161 | 1
| 161 | 1
4.3.3.1 Kernel-based and k-nearest neighbor fits | 162 | 1
| 162 | 1
| 163 | 1
4.6 Fitted values and predictions | 163 | 1
| 163 | 1
| 164 | 1
4.9 Software and examples | 165 | 7
4.9.1 Example: credit card default | 165 | 7
| 172 | 5
| 177 | 182
5 Ensemble algorithms | 179 | 24
5.1 Bootstrap aggregating (bagging) | 181 | 7
5.1.1 When does bagging work? | 184 | 1
5.1.2 Bagging from scratch: classifying email spam | 184 | 3
5.1.3 Sampling without replacement | 187 | 1
5.1.4 Hyperparameters and tuning | 187 | 1
| 188 | 1
| 188 | 7
5.2.1 AdaBoost.M1 for binary outcomes | 189 | 1
5.2.2 Boosting from scratch: classifying email spam | 190 | 2
| 192 | 1
5.2.4 Forward stagewise additive modeling and exponential loss | 192 | 2
| 194 | 1
5.3 Bagging or boosting: which should you use? | 195 | 1
| 195 | 1
5.5 Importance sampled learning ensembles | 196 | 6
5.5.1 Example: post-processing a bagged tree ensemble | 197 | 5
| 202 | 1
6 Peeking inside the "black box": post-hoc interpretability | 203 | 26
| 204 | 4
6.1.1 Permutation importance | 204 | 2
| 206 | 1
6.1.3 Example: predicting home prices | 206 | 2
| 208 | 9
| 208 | 1
6.2.1.1 Classification problems | 209 | 1
6.2.2 Interaction effects | 210 | 1
6.2.3 Individual conditional expectations | 210 | 1
| 211 | 1
6.2.5 Example: predicting home prices | 211 | 4
6.2.6 Example: Edgar Anderson's iris data | 215 | 2
6.3 Feature contributions | 217 | 8
| 217 | 2
6.3.2 Explaining predictions with Shapley values | 219 | 1
| 220 | 1
6.3.2.2 Monte Carlo-based Shapley explanations | 221 | 2
| 223 | 1
6.3.4 Example: predicting home prices | 223 | 2
6.4 Drawbacks of existing methods | 225 | 1
| 226 | 3
7 Random forests | 229 | 80
| 229 | 1
7.2 The random forest algorithm | 229 | 10
7.2.1 Voting and probability estimation | 232 | 2
7.2.1.1 Example: Mease model simulation | 234 | 2
7.2.2 Subsampling (without replacement) | 236 | 1
7.2.3 Random forest from scratch: predicting home prices | 237 | 2
7.3 Out-of-bag (OOB) data | 239 | 4
7.4 Hyperparameters and tuning | 243 | 2
| 245 | 4
7.5.1 Impurity-based importance | 245 | 2
7.5.2 OOB-based permutation importance | 247 | 1
7.5.2.1 Holdout permutation importance | 248 | 1
7.5.2.2 Conditional permutation importance | 249 | 1
| 249 | 7
7.6.1 Detecting anomalies and outliers | 251 | 1
7.6.1.1 Example: Swiss banknotes | 251 | 1
7.6.2 Missing value imputation | 252 | 1
7.6.3 Unsupervised random forests | 253 | 1
7.6.3.1 Example: Swiss banknotes | 254 | 1
7.6.4 Case-specific random forests | 254 | 2
7.7 Prediction standard errors | 256 | 2
7.7.1 Example: predicting email spam | 257 | 1
7.8 Random forest extensions | 258 | 18
7.8.1 Oblique random forests | 258 | 1
7.8.2 Quantile regression forests | 259 | 1
7.8.2.1 Example: predicting home prices (with prediction intervals) | 260 | 1
7.8.3 Rotation forests and random rotation forests | 261 | 2
7.8.3.1 Random rotation forests | 263 | 1
7.8.3.2 Example: Gaussian mixture data | 264 | 3
7.8.4 Extremely randomized trees | 267 | 2
7.8.5 Anomaly detection with isolation forests | 269 | 2
7.8.5.1 Extended isolation forests | 271 | 1
7.8.5.2 Example: detecting credit card fraud | 271 | 5
7.9 Software and examples | 276 | 30
7.9.1 Example: mushroom edibility | 277 | 1
7.9.2 Example: "deforesting" a random forest | 277 | 6
7.9.3 Example: survival on the Titanic | 283 | 1
7.9.3.1 Missing value imputation | 284 | 3
7.9.3.2 Analyzing the imputed data sets | 287 | 7
7.9.4 Example: class imbalance (the good, the bad, and the ugly) | 294 | 6
7.9.5 Example: partial dependence with Spark MLlib | 300 | 6
| 306 | 3
8 Gradient boosting machines | 309 | 50
8.1 Steepest descent (a brief overview) | 310 | 1
8.2 Gradient tree boosting | 311 | 6
| 314 | 3
8.2.0.2 Always a regression tree? | 317 | 1
8.2.0.3 Priors and misclassification cost | 317 | 1
8.3 Hyperparameters and tuning | 317 | 5
8.3.1 Boosting-specific hyperparameters | 318 | 1
8.3.1.1 The number of trees in the ensemble: B | 318 | 1
8.3.1.2 Regularization and shrinkage | 319 | 1
8.3.1.3 Example: predicting ALS progression | 320 | 1
8.3.2 Tree-specific hyperparameters | 321 | 1
8.3.3 A simple tuning strategy | 322 | 1
8.4 Stochastic gradient boosting | 322 | 1
| 323 | 1
8.5 Gradient tree boosting from scratch | 323 | 4
8.5.1 Example: predicting home prices | 326 | 1
| 327 | 5
8.6.1 Faster partial dependence with the recursion method | 328 | 1
8.6.1.1 Example: predicting email spam | 329 | 1
8.6.2 Monotonic constraints | 329 | 1
8.6.2.1 Example: bank marketing data | 330 | 2
| 332 | 3
8.7.1 Level-wise vs. leaf-wise tree induction | 332 | 1
| 333 | 1
8.7.3 Explainable boosting machines | 333 | 1
8.7.4 Probabilistic regression via natural gradient boosting | 334 | 1
8.8 Specialized implementations | 335 | 4
8.8.1 Extreme Gradient Boosting: XGBoost | 335 | 2
8.8.2 Light Gradient Boosting Machine: LightGBM | 337 | 1
| 338 | 1
8.9 Software and examples | 339 | 17
8.9.1 Example: Mayo Clinic liver transplant data | 339 | 7
8.9.2 Example: probabilistic predictions with NGBoost (in Python) | 346 | 1
8.9.3 Example: post-processing GBMs with the LASSO | 347 | 4
8.9.4 Example: direct marketing campaigns with XGBoost | 351 | 5
| 356 | 3
Bibliography | 359 | 22
Index | 381 |