
Tree-Based Methods for Statistical Learning in R [Hardback]

Brandon M. Greenwell (University of Cincinnati, Cincinnati, USA)
  • Format: Hardback, 404 pages, height x width: 234x156 mm, weight: 740 g, 5 Tables, black and white; 115 Line drawings, black and white; 1 Halftones, black and white; 116 Illustrations, black and white
  • Series: Chapman & Hall/CRC Data Science Series
  • Pub. Date: 23-Jun-2022
  • Publisher: Chapman & Hall/CRC
  • ISBN-10: 0367532468
  • ISBN-13: 9780367532468
Tree-Based Methods for Statistical Learning in R provides a thorough introduction to both individual decision tree algorithms (Part I) and ensembles thereof (Part II). Part I brings several different tree algorithms into focus, both conventional and contemporary. Building a strong foundation in how individual decision trees work will help readers understand, at a deeper level, tree-based ensembles, which lie at the cutting edge of modern statistical and machine learning methodology.

The book follows up most ideas and mathematical concepts with code-based examples in the R statistical language, with an emphasis on using as few external packages as possible. For example, readers are shown how to write their own random forest and gradient tree boosting functions using simple for loops and basic tree-fitting software (like rpart and party/partykit). The core chapters also end with a detailed section on relevant software, covering both R and open-source alternatives (e.g., Python, Spark, and Julia), along with example usage on real data sets. While the book mostly uses R, it is meant to be equally accessible and useful to non-R programmers.
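To give a flavor of that from-scratch style, here is a minimal sketch (not code from the book or its companion package) of a bagged ensemble of rpart trees with a random feature subset drawn per tree; this is a simplification of a true random forest, which samples features at every split. It uses R's built-in airquality data, which also appears among the book's example data sets.

library(rpart)

# Built-in New York air quality data; drop rows with missing values for simplicity
aq <- na.omit(airquality)
set.seed(101)

B <- 100                                   # number of trees
mtry <- 2                                  # features sampled per tree (a simplification)
features <- setdiff(names(aq), "Ozone")
trees <- vector("list", B)

for (b in seq_len(B)) {
  boot <- aq[sample(nrow(aq), replace = TRUE), ]  # bootstrap sample
  vars <- sample(features, mtry)                  # random feature subset
  trees[[b]] <- rpart(reformulate(vars, response = "Ozone"),
                      data = boot, cp = 0)        # grow a deep regression tree
}

# Ensemble prediction: average the individual trees' predictions
preds <- rowMeans(sapply(trees, predict, newdata = aq))
head(preds)

The book's own from-scratch sections (e.g., 5.1.2, 7.2.3, and 8.5) develop these ideas far more carefully.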

Readers will come away with a solid foundation in (and appreciation for) tree-based methods and how they can be used to solve the practical problems and challenges data scientists often face in applied work.

Features:

  • Thorough coverage, from the ground up, of tree-based methods (e.g., CART, conditional inference trees, bagging, boosting, and random forests).
  • A companion website containing additional supplementary material and the code to reproduce every example and figure in the book.
  • A companion R package, called treemisc, which contains several data sets and functions used throughout the book (e.g., there's an implementation of gradient tree boosting with LAD loss that shows how to perform the line search step by updating the terminal node estimates of a fitted rpart tree).
  • Interesting examples that are of practical use; for example, how to construct partial dependence plots from a fitted model in Spark MLlib (using only Spark operations), or how to post-process tree ensembles via the LASSO to reduce the number of trees while maintaining, or even improving, performance (a rough sketch of the LASSO idea appears after this list).
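As a rough illustration of the LASSO post-processing idea in the last bullet (treated properly in the book, e.g., Sections 5.5.1, 7.9.2, and 8.9.3), the sketch below, which is not the book's implementation, treats each tree's predictions as a column of a design matrix and lets a cross-validated LASSO zero out redundant trees; it reuses the toy bagged rpart ensemble idea from above on the built-in airquality data.

library(rpart)
library(glmnet)

aq <- na.omit(airquality)
set.seed(102)

# A small bagged ensemble of regression trees
trees <- lapply(1:50, function(b) {
  boot <- aq[sample(nrow(aq), replace = TRUE), ]
  rpart(Ozone ~ ., data = boot, cp = 0)
})

# Columns of X are the individual trees' predictions on the training data
X <- sapply(trees, predict, newdata = aq)

# Cross-validated LASSO; trees whose coefficients shrink to zero are dropped
cvfit <- cv.glmnet(X, aq$Ozone, alpha = 1)
coefs <- as.vector(coef(cvfit, s = "lambda.1se"))[-1]  # drop the intercept
sum(coefs != 0)                                        # number of trees retained

In practice the reduced ensemble would be assessed on holdout data rather than on the training set used here.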

Reviews

Tree-based algorithms have been a workhorse for data science teams for decades, but the data science field has lacked an all-encompassing review of trees - and their modern variants like XGBoost - until now. Greenwell has written the ultimate guide for tree-based methods: how they work, their pitfalls, and alternative solutions. He puts it all together in a readable and immediately usable book. You're guaranteed to learn new tips and tricks to help your data science team.

-Alex Gutman, Director of Data Science, author of Becoming a Data Head

"Here's a new title that is a must-have for any data scientist who uses the R language. It's a wonderful learning resource for tree-based techniques in statistical learning, one that's become my go-to text when I find the need to do a deep dive into various ML topic areas for my work."

-Daniel D. Gutierrez, Editor-in-Chief for insideBIGDATA, USA (insideBIGDATA, February 2023)

Table of Contents

Preface xiii
1 Introduction 1(36)
1.1 Select topics in statistical and machine learning 2(8)
1.1.1 Statistical jargon and conventions 3(1)
1.1.2 Supervised learning 4(1)
1.1.2.1 Description 5(1)
1.1.2.2 Prediction 6(1)
1.1.2.3 Classification vs. regression 7(1)
1.1.2.4 Discrimination vs. prediction 7(1)
1.1.2.5 The bias-variance tradeoff 8(2)
1.1.3 Unsupervised learning 10(1)
1.2 Why trees? 10(7)
1.2.1 A brief history of decision trees 12(2)
1.2.2 The anatomy of a simple decision tree 14(1)
1.2.2.1 Example: survival on the Titanic 15(2)
1.3 Why R? 17(3)
1.3.1 No really, why R? 17(2)
1.3.2 Software information and conventions 19(1)
1.4 Some example data sets 20(15)
1.4.1 Swiss banknotes 21(1)
1.4.2 New York air quality measurements 21(2)
1.4.3 The Friedman 1 benchmark problem 23(1)
1.4.4 Mushroom edibility 24(1)
1.4.5 Spam or ham? 25(3)
1.4.6 Employee attrition 28(1)
1.4.7 Predicting home prices in Ames, Iowa 29(1)
1.4.8 Wine quality ratings 30(1)
1.4.9 Mayo Clinic primary biliary cholangitis study 31(4)
1.5 There ain't no such thing as a free lunch 35(1)
1.6 Outline of this book 35(2)
I Decision trees 37(140)
2 Binary recursive partitioning with CART 39(72)
2.1 Introduction 39(2)
2.2 Classification trees 41(17)
2.2.1 Splits on ordered variables 43(4)
2.2.1.1 So which is it in practice, Gini or entropy? 47(1)
2.2.2 Example: Swiss banknotes 48(3)
2.2.3 Fitted values and predictions 51(1)
2.2.4 Class priors and misclassification costs 52(2)
2.2.4.1 Altered priors 54(1)
2.2.4.2 Example: employee attrition 55(3)
2.3 Regression trees 58(4)
2.3.1 Example: New York air quality measurements 59(3)
2.4 Categorical splits 62(7)
2.4.1 Example: mushroom edibility 64(3)
2.4.2 Be wary of categoricals with high cardinality 67(1)
2.4.3 To encode, or not to encode? 68(1)
2.5 Building a decision tree 69(9)
2.5.1 Cost-complexity pruning 71(3)
2.5.1.1 Example: mushroom edibility 74(3)
2.5.2 Cross-validation 77(1)
2.5.2.1 The 1-SE rule 78(1)
2.6 Hyperparameters and tuning 78(1)
2.7 Missing data and surrogate splits 78(4)
2.7.1 Other missing value strategies 80(2)
2.8 Variable importance 82(1)
2.9 Software and examples 83(22)
2.9.1 Example: Swiss banknotes 84(4)
2.9.2 Example: mushroom edibility 88(8)
2.9.3 Example: predicting home prices 96(4)
2.9.4 Example: employee attrition 100(3)
2.9.5 Example: letter image recognition 103(2)
2.10 Discussion 105(3)
2.10.1 Advantages of CART 105(1)
2.10.2 Disadvantages of CART 106(2)
2.11 Recommended reading 108(3)
3 Conditional inference trees 111(36)
3.1 Introduction 111(1)
3.2 Early attempts at unbiased recursive partitioning 112(2)
3.3 A quick digression into conditional inference 114(7)
3.3.1 Example: X and Y are both univariate continuous 117(1)
3.3.2 Example: X and Y are both nominal categorical 118(2)
3.3.3 Which test statistic should you use? 120(1)
3.4 Conditional inference trees 121(11)
3.4.1 Selecting the splitting variable 121(2)
3.4.1.1 Example: New York air quality measurements 123(1)
3.4.1.2 Example: Swiss banknotes 124(1)
3.4.2 Finding the optimal split point 125(1)
3.4.2.1 Example: New York air quality measurements 126(2)
3.4.3 Pruning 128(1)
3.4.4 Missing values 128(1)
3.4.5 Choice of α, g(), and h() 128(3)
3.4.6 Fitted values and predictions 131(1)
3.4.7 Imbalanced classes 131(1)
3.4.8 Variable importance 132(1)
3.5 Software and examples 132(11)
3.5.1 Example: New York air quality measurements 133(4)
3.5.2 Example: wine quality ratings 137(3)
3.5.3 Example: Mayo Clinic liver transplant data 140(3)
3.6 Final thoughts 143(4)
4 The hitchhiker's GUIDE to modern decision trees 147(30)
4.1 Introduction 148(2)
4.2 A GUIDE for regression 150(7)
4.2.1 Piecewise constant models 150(2)
4.2.1.1 Example: New York air quality measurements 152(1)
4.2.2 Interaction tests 153(1)
4.2.3 Non-constant fits 154(1)
4.2.3.1 Example: predicting home prices 155(2)
4.2.3.2 Bootstrap bias correction 157(1)
4.3 A GUIDE for classification 157(5)
4.3.1 Linear/oblique splits 157(1)
4.3.1.1 Example: classifying the Palmer penguins 158(3)
4.3.2 Priors and misclassification costs 161(1)
4.3.3 Non-constant fits 161(1)
4.3.3.1 Kernel-based and k-nearest neighbor fits 162(1)
4.4 Pruning 162(1)
4.5 Missing values 163(1)
4.6 Fitted values and predictions 163(1)
4.7 Variable importance 163(1)
4.8 Ensembles 164(1)
4.9 Software and examples 165(7)
4.9.1 Example: credit card default 165(7)
4.10 Final thoughts 172(5)
II Tree-based ensembles 177(182)
5 Ensemble algorithms 179(24)
5.1 Bootstrap aggregating (bagging) 181(7)
5.1.1 When does bagging work? 184(1)
5.1.2 Bagging from scratch: classifying email spam 184(3)
5.1.3 Sampling without replacement 187(1)
5.1.4 Hyperparameters and tuning 187(1)
5.1.5 Software 188(1)
5.2 Boosting 188(7)
5.2.1 AdaBoost.M1 for binary outcomes 189(1)
5.2.2 Boosting from scratch: classifying email spam 190(2)
5.2.3 Tuning 192(1)
5.2.4 Forward stagewise additive modeling and exponential loss 192(2)
5.2.5 Software 194(1)
5.3 Bagging or boosting: which should you use? 195(1)
5.4 Variable importance 195(1)
5.5 Importance sampled learning ensembles 196(6)
5.5.1 Example: post-processing a bagged tree ensemble 197(5)
5.6 Final thoughts 202(1)
6 Peeking inside the "black box": post-hoc interpretability 203(26)
6.1 Feature importance 204(4)
6.1.1 Permutation importance 204(2)
6.1.2 Software 206(1)
6.1.3 Example: predicting home prices 206(2)
6.2 Feature effects 208(9)
6.2.1 Partial dependence 208(1)
6.2.1.1 Classification problems 209(1)
6.2.2 Interaction effects 210(1)
6.2.3 Individual conditional expectations 210(1)
6.2.4 Software 211(1)
6.2.5 Example: predicting home prices 211(4)
6.2.6 Example: Edgar Anderson's iris data 215(2)
6.3 Feature contributions 217(8)
6.3.1 Shapley values 217(2)
6.3.2 Explaining predictions with Shapley values 219(1)
6.3.2.1 TreeSHAP 220(1)
6.3.2.2 Monte Carlo-based Shapley explanations 221(2)
6.3.3 Software 223(1)
6.3.4 Example: predicting home prices 223(2)
6.4 Drawbacks of existing methods 225(1)
6.5 Final thoughts 226(3)
7 Random forests 229(80)
7.1 Introduction 229(1)
7.2 The random forest algorithm 229(10)
7.2.1 Voting and probability estimation 232(2)
7.2.1.1 Example: Mease model simulation 234(2)
7.2.2 Subsampling (without replacement) 236(1)
7.2.3 Random forest from scratch: predicting home prices 237(2)
7.3 Out-of-bag (OOB) data 239(4)
7.4 Hyperparameters and tuning 243(2)
7.5 Variable importance 245(4)
7.5.1 Impurity-based importance 245(2)
7.5.2 OOB-based permutation importance 247(1)
7.5.2.1 Holdout permutation importance 248(1)
7.5.2.2 Conditional permutation importance 249(1)
7.6 Casewise proximities 249(7)
7.6.1 Detecting anomalies and outliers 251(1)
7.6.1.1 Example: Swiss banknotes 251(1)
7.6.2 Missing value imputation 252(1)
7.6.3 Unsupervised random forests 253(1)
7.6.3.1 Example: Swiss banknotes 254(1)
7.6.4 Case-specific random forests 254(2)
7.7 Prediction standard errors 256(2)
7.7.1 Example: predicting email spam 257(1)
7.8 Random forest extensions 258(18)
7.8.1 Oblique random forests 258(1)
7.8.2 Quantile regression forests 259(1)
7.8.2.1 Example: predicting home prices (with prediction intervals) 260(1)
7.8.3 Rotation forests and random rotation forests 261(2)
7.8.3.1 Random rotation forests 263(1)
7.8.3.2 Example: Gaussian mixture data 264(3)
7.8.4 Extremely randomized trees 267(2)
7.8.5 Anomaly detection with isolation forests 269(2)
7.8.5.1 Extended isolation forests 271(1)
7.8.5.2 Example: detecting credit card fraud 271(5)
7.9 Software and examples 276(30)
7.9.1 Example: mushroom edibility 277(1)
7.9.2 Example: "deforesting" a random forest 277(6)
7.9.3 Example: survival on the Titanic 283(1)
7.9.3.1 Missing value imputation 284(3)
7.9.3.2 Analyzing the imputed data sets 287(7)
7.9.4 Example: class imbalance (the good, the bad, and the ugly) 294(6)
7.9.5 Example: partial dependence with Spark MLlib 300(6)
7.10 Final thoughts 306(3)
8 Gradient boosting machines 309(50)
8.1 Steepest descent (a brief overview) 310(1)
8.2 Gradient tree boosting 311(6)
8.2.0.1 Loss functions 314(3)
8.2.0.2 Always a regression tree? 317(1)
8.2.0.3 Priors and misclassification cost 317(1)
8.3 Hyperparameters and tuning 317(5)
8.3.1 Boosting-specific hyperparameters 318(1)
8.3.1.1 The number of trees in the ensemble: B 318(1)
8.3.1.2 Regularization and shrinkage 319(1)
8.3.1.3 Example: predicting ALS progression 320(1)
8.3.2 Tree-specific hyperparameters 321(1)
8.3.3 A simple tuning strategy 322(1)
8.4 Stochastic gradient boosting 322(1)
8.4.1 Column subsampling 323(1)
8.5 Gradient tree boosting from scratch 323(4)
8.5.1 Example: predicting home prices 326(1)
8.6 Interpretability 327(5)
8.6.1 Faster partial dependence with the recursion method 328(1)
8.6.1.1 Example: predicting email spam 329(1)
8.6.2 Monotonic constraints 329(1)
8.6.2.1 Example: bank marketing data 330(2)
8.7 Specialized topics 332(3)
8.7.1 Level-wise vs. leaf-wise tree induction 332(1)
8.7.2 Histogram binning 333(1)
8.7.3 Explainable boosting machines 333(1)
8.7.4 Probabilistic regression via natural gradient boosting 334(1)
8.8 Specialized implementations 335(4)
8.8.1 Extreme Gradient Boosting: XGBoost 335(2)
8.8.2 Light Gradient Boosting Machine: LightGBM 337(1)
8.8.3 CatBoost 338(1)
8.9 Software and examples 339(17)
8.9.1 Example: Mayo Clinic liver transplant data 339(7)
8.9.2 Example: probabilistic predictions with NGBoost (in Python) 346(1)
8.9.3 Example: post-processing GBMs with the LASSO 347(4)
8.9.4 Example: direct marketing campaigns with XGBoost 351(5)
8.10 Final thoughts 356(3)
Bibliography 359(22)
Index 381
About the author

Brandon M. Greenwell is a data scientist at 84.51°, where he works on a diverse team to enable, empower, and enculturate statistical and machine learning best practices, where applicable, to help others solve real business problems. He received a B.S. in Statistics and an M.S. in Applied Statistics from Wright State University, and a Ph.D. in Applied Mathematics from the Air Force Institute of Technology. He is currently part of the Adjunct Graduate Faculty at Wright State University, an Adjunct Instructor at the University of Cincinnati, the lead developer and maintainer of several R packages available on CRAN (and off CRAN), and a co-author of Hands-On Machine Learning with R.