About the Authors  vi
Preface for the Second Edition  vii
Preface for the First Edition  ix

1 Introduction to Decision Trees  1
1.4 Knowledge Discovery in Databases (KDD)  4
1.5 Taxonomy of Data Mining Methods  8
1.8 Characteristics of Classification Trees  12
1.8.2 The Hierarchical Nature of Decision Trees  15
1.9 Relation to Rule Induction  15

2 Training Decision Trees  17
2.2 Preparing the Training Set  17
2.3 Training the Decision Tree  19

3 A Generic Algorithm for Top-Down Induction of Decision Trees  23
3.2 Definition of the Classification Problem  25
3.4 Probability Estimation in Decision Trees  26
3.5 Algorithmic Framework for Decision Trees  28

4 Evaluation of Classification Trees  31
4.2.1 Theoretical Estimation of Generalization Error  32
4.2.2 Empirical Estimation of Generalization Error  32
4.2.3 Alternatives to the Accuracy Measure  34
4.2.6 Classifier Evaluation under Limited Resources  37
4.2.6.3 Qrecall (Quota Recall)  40
4.2.6.5 Pearson Correlation Coefficient  41
4.2.6.6 Area Under Curve (AUC)  43
4.2.6.9 Potential Extract Measure (PEM)  45
4.2.7 Which Decision Tree Classifier is Better?  48
4.2.7.2 A Test for the Difference of Two Proportions  50
4.2.7.3 The Resampled Paired t Test  51
4.2.7.4 The k-fold Cross-validated Paired t Test  51
4.3 Computational Complexity  52
4.5 Scalability to Large Datasets  53
4.8 Interestingness Measures  56
4.9 Overfitting and Underfitting  57
4.10 "No Free Lunch" Theorem  58

5 Splitting Criteria  61
5.1 Univariate Splitting Criteria  61
5.1.2 Impurity-based Criteria  61
5.1.5 Likelihood Ratio Chi-squared Statistics  63
5.1.7 Normalized Impurity-based Criteria  63
5.1.12 Orthogonal Criterion  65
5.1.13 Kolmogorov-Smirnov Criterion  66
5.1.14 AUC Splitting Criteria  66
5.1.15 Other Univariate Splitting Criteria  66
5.1.16 Comparison of Univariate Splitting Criteria  66
5.2 Handling Missing Values  67

6 Pruning Trees  69
6.2.2 Cost Complexity Pruning  70
6.2.3 Reduced Error Pruning  70
6.2.4 Minimum Error Pruning (MEP)  71
6.2.5 Pessimistic Pruning  71
6.2.6 Error-Based Pruning (EBP)  72
6.2.7 Minimum Description Length (MDL) Pruning  73
6.2.8 Other Pruning Methods  73
6.2.9 Comparison of Pruning Methods  73

7 Popular Decision Trees Induction Algorithms  77
7.7 Reference to Other Algorithms  80
7.8 Advantages and Disadvantages of Decision Trees  81

8 Beyond Classification Tasks  85
8.4.2 Minkowski: Distance Measures for Numeric Attributes  90
8.4.2.1 Distance Measures for Binary Attributes  90
8.4.2.2 Distance Measures for Nominal Attributes  91
8.4.2.3 Distance Metrics for Ordinal Attributes  91
8.4.2.4 Distance Metrics for Mixed-Type Attributes  92
8.4.3 Similarity Functions  92
8.4.3.2 Pearson Correlation Measure  93
8.4.3.3 Extended Jaccard Measure  93
8.4.3.4 Dice Coefficient Measure  93
8.5 Hidden Markov Model Trees  94

9 Decision Forests  99
9.3.1.2 Performance Weighting  109
9.3.1.3 Distribution Summation  109
9.3.1.4 Bayesian Combination  109
9.3.1.8 Entropy Weighting  110
9.3.1.9 Density-based Weighting  111
9.3.1.10 DEA Weighting Method  111
9.3.1.11 Logarithmic Opinion Pool  111
9.3.1.13 Order Statistics  113
9.3.2 Meta-combination Methods  113
9.4 Classifier Dependency  118
9.4.1.1 Model-guided Instance Selection  118
9.4.1.2 Incremental Batch Learning  122
9.4.2 Independent Methods  122
9.4.2.5 Cross-validated Committees  129
9.5.1 Manipulating the Inducer  131
9.5.1.1 Manipulation of the Inducer's Parameters  131
9.5.1.2 Starting Point in Hypothesis Space  132
9.5.1.3 Hypothesis Space Traversal  132
9.5.1.3.1 Random-based Strategy  132
9.5.1.3.2 Collective-Performance-based Strategy  132
9.5.2 Manipulating the Training Samples  133
9.5.3 Manipulating the Target Attribute Representation  134
9.5.4 Partitioning the Search Space  136
9.5.4.1 Divide and Conquer  136
9.5.4.2 Feature Subset-based Ensemble Methods  137
9.5.4.2.1 Random-based Strategy  138
9.5.4.2.2 Reduct-based Strategy  138
9.5.4.2.3 Collective-Performance-based Strategy  139
9.5.4.2.4 Feature Set Partitioning  139
9.5.6 Measuring the Diversity  143
9.6.1 Selecting the Ensemble Size  144
9.6.2 Pre-selection of the Ensemble Size  145
9.6.3 Selection of the Ensemble Size while Training  145
9.6.4 Pruning: Post Selection of the Ensemble Size  146
9.6.4.1 Pre-combining Pruning  146
9.6.4.2 Post-combining Pruning  146
9.8 Multistrategy Ensemble Learning  148
9.9 Which Ensemble Method Should be Used?  148
9.10 Open Source for Decision Trees Forests  149

10 A Walk-through Guide for Using Decision Trees Software  151
10.2.1 Training a Classification Tree  153
10.3.3 Other Types of Trees  163

11 Advanced Decision Trees  167
11.1 Oblivious Decision Trees  167
11.2 Online Adaptive Decision Trees  168
11.6 Oblique Decision Trees  172
11.7 Incremental Learning of Decision Trees  175
11.7.1 The Motives for Incremental Learning  175
11.7.2 The Inefficiency Challenge  176
11.7.3 The Concept Drift Challenge  177
11.8 Decision Trees Inducers for Large Datasets  179
11.8.1 Accelerating Tree Induction  180
11.8.2 Parallel Induction of Trees  182

12 Cost-sensitive Active and Proactive Learning of Decision Trees  183
12.4 Induction of Cost-sensitive Decision Trees  188
12.6 Proactive Data Mining  196
12.6.1 Changing the Input Data  197
12.6.2 Attribute Changing Cost and Benefit Functions  198
12.6.3 Maximizing Utility  199
12.6.4 An Algorithmic Framework for Proactive Data Mining  200

13 Feature Selection  203
13.2 The "Curse of Dimensionality"  203
13.3 Techniques for Feature Selection  206
13.3.1.3 Using a Learning Algorithm as a Filter  207
13.3.1.4 An Information Theoretic Feature Filter  208
13.3.1.5 RELIEF Algorithm  208
13.3.1.6 Simba and G-flip  208
13.3.1.7 Contextual Merit (CM) Algorithm  209
13.3.2 Using Traditional Statistics for Filtering  209
13.3.2.2 AIC, BIC and F-ratio  209
13.3.2.3 Principal Component Analysis (PCA)  210
13.3.2.4 Factor Analysis (FA)  210
13.3.2.5 Projection Pursuit (PP)  210
13.3.3.1 Wrappers for Decision Tree Learners  211
13.4 Feature Selection as a Means of Creating Ensembles  211
13.5 Ensemble Methodology for Improving Feature Selection  213
13.5.1 Independent Algorithmic Framework  215
13.5.2 Combining Procedure  216
13.5.2.1 Simple Weighted Voting  216
13.5.2.2 Using Artificial Contrasts  218
13.5.3 Feature Ensemble Generator  220
13.5.3.1 Multiple Feature Selectors  220
13.6 Using Decision Trees for Feature Selection  221
13.7 Limitation of Feature Selection Methods  222

14 Fuzzy Decision Trees  225
14.3 Fuzzy Classification Problems  227
14.4 Fuzzy Set Operations  228
14.5 Fuzzy Classification Rules  229
14.6 Creating a Fuzzy Decision Tree  230
14.6.1 Fuzzifying Numeric Attributes  230
14.6.2 Inducing a Fuzzy Decision Tree  232
14.7 Simplifying the Decision Tree  234
14.8 Classification of New Instances  234
14.9 Other Fuzzy Decision Tree Inducers  234

15 Hybridization of Decision Trees with Other Techniques  237
15.2 A Framework for Instance-Space Decomposition  237
15.2.3 Split Validation Examinations  241
15.3 The Contrasted Population Miner (CPOM) Algorithm  242
15.3.2 The Grouped Gain Ratio Splitting Rule  244
15.4 Induction of Decision Trees by an Evolutionary Algorithm (EA)  246

16 Decision Trees and Recommender Systems  251
16.2 Using Decision Trees for Recommending Items  252
16.2.1 RS-Adapted Decision Tree  253
16.2.2 Least Probable Intersections  257
16.3 Using Decision Trees for Preference Elicitation  259
16.3.2 Dynamic Methods and Decision Trees  262
16.3.3 SVD-based CF Method  263
16.3.4 Pairwise Comparisons  264
16.3.5 Profile Representation  266
16.3.6 Selecting the Next Pairwise Comparison  267
16.3.7 Clustering the Items  269
16.3.8 Training a Lazy Decision Tree  270

Bibliography  273
Index  303