
E-book: Data Mining With Decision Trees: Theory And Applications (2nd Edition)

Lior Rokach (Ben-Gurion Univ of the Negev, Israel), Oded Maimon (Tel-Aviv Univ, Israel)
  • Format: EPUB+DRM
  • Price: 42,12 €*
  • * the price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.
DRM restrictions

  • Copying (copy/paste): not allowed
  • Printing: not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has supplied this e-book in encrypted form, which means that you need to install special software in order to read it. You also need to create an Adobe ID (more information here). The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, install Adobe Digital Editions (a free application designed specifically for reading e-books; not to be confused with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

The textbook is for graduate and undergraduate courses in data mining for students in the information sciences. Nearly a quarter of the second edition is new material, say Rokach and Maimon, including four new chapters. Their topics include training decision trees, evaluating classification trees, splitting criteria, popular decision trees induction algorithms, a walk-through guide for using decision trees software, the cost-sensitive active and proactive learning of decision trees, feature selection, and decision trees and recommender systems. Annotation ©2015 Ringgold, Inc., Portland, OR (protoview.com)

Decision trees have become one of the most powerful and popular approaches in knowledge discovery and data mining, the science of exploring large and complex bodies of data in order to discover useful patterns. Decision tree learning continues to evolve over time: existing methods are constantly being improved and new methods introduced. This 2nd Edition is dedicated entirely to the field of decision trees in data mining and covers all aspects of this important technique, as well as improved and new methods developed after the publication of the first edition. In this new edition, all chapters have been revised and new topics brought in. New topics include Cost-Sensitive Active Learning, Learning with Uncertain and Imbalanced Data, Using Decision Trees beyond Classification Tasks, Privacy Preserving Decision Tree Learning, Lessons Learned from Comparative Studies, and Learning Decision Trees for Big Data. A walk-through guide to existing open-source data mining software is also included in this edition.

This book invites readers to explore the many benefits that decision trees offer in data mining:

  • Self-explanatory and easy to follow when compacted
  • Able to handle a variety of input data: nominal, numeric and textual
  • Scale well to big data
  • Able to process datasets that may have errors or missing values
  • High predictive performance for a relatively small computational effort
  • Available in many open-source data mining packages over a variety of platforms
  • Useful for various tasks, such as classification, regression, clustering and feature selection
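As a taste of the kind of workflow the book's walk-through chapter demonstrates (Chapter 10 covers Weka and R), here is a minimal sketch of the same idea in Python with scikit-learn. This is an illustrative assumption, not an example from the book: the Iris dataset, the train/test split and the max_depth=3 setting are arbitrary choices made for the demonstration.

    # Minimal sketch (assumed example, not from the book): train a decision
    # tree classifier and print it as human-readable if/else rules.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier, export_text

    data = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        data.data, data.target, test_size=0.3, random_state=0)

    # A shallow tree keeps the model "self-explanatory and easy to follow".
    clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
    clf.fit(X_train, y_train)

    print(f"Test accuracy: {clf.score(X_test, y_test):.3f}")
    # Each root-to-leaf path prints as a classification rule.
    print(export_text(clf, feature_names=list(data.feature_names)))

Printing the tree as nested rules illustrates the self-explanatory property listed above: every path from the root to a leaf reads directly as a classification rule.

Table of Contents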
About the Authors vi
Preface for the Second Edition vii
Preface for the First Edition ix
1 Introduction to Decision Trees 1(16)
1.1 Data Science 1(1)
1.2 Data Mining 2(1)
1.3 The Four-Layer Model 3(1)
1.4 Knowledge Discovery in Databases (KDD) 4(4)
1.5 Taxonomy of Data Mining Methods 8(1)
1.6 Supervised Methods 9(1)
1.6.1 Overview 9(1)
1.7 Classification Trees 10(2)
1.8 Characteristics of Classification Trees 12(3)
1.8.1 Tree Size 14(1)
1.8.2 The Hierarchical Nature of Decision Trees 15(1)
1.9 Relation to Rule Induction 15(2)
2 Training Decision Trees 17(6)
2.1 What is Learning? 17(1)
2.2 Preparing the Training Set 17(2)
2.3 Training the Decision Tree 19(4)
3 A Generic Algorithm for Top-Down Induction of Decision Trees 23(8)
3.1 Training Set 23(2)
3.2 Definition of the Classification Problem 25(1)
3.3 Induction Algorithms 26(1)
3.4 Probability Estimation in Decision Trees 26(2)
3.4.1 Laplace Correction 27(1)
3.4.2 No Match 28(1)
3.5 Algorithmic Framework for Decision Trees 28(2)
3.6 Stopping Criteria 30(1)
4 Evaluation of Classification Trees 31(30)
4.1 Overview 31(1)
4.2 Generalization Error 31(21)
4.2.1 Theoretical Estimation of Generalization Error 32(1)
4.2.2 Empirical Estimation of Generalization Error 32(2)
4.2.3 Alternatives to the Accuracy Measure 34(1)
4.2.4 The F-Measure 35(1)
4.2.5 Confusion Matrix 36(1)
4.2.6 Classifier Evaluation under Limited Resources 37(11)
4.2.6.1 ROC Curves 39(1)
4.2.6.2 Hit-Rate Curve 40(1)
4.2.6.3 Qrecall (Quota Recall) 40(1)
4.2.6.4 Lift Curve 41(1)
4.2.6.5 Pearson Correlation Coefficient 41(2)
4.2.6.6 Area Under Curve (AUC) 43(1)
4.2.6.7 Average Hit-Rate 44(1)
4.2.6.8 Average Qrecall 44(1)
4.2.6.9 Potential Extract Measure (PEM) 45(3)
4.2.7 Which Decision Tree Classifier is Better? 48(13)
4.2.7.1 McNemar's Test 48(2)
4.2.7.2 A Test for the Difference of Two Proportions 50(1)
4.2.7.3 The Resampled Paired t Test 51(1)
4.2.7.4 The k-fold Cross-validated Paired t Test 51(1)
4.3 Computational Complexity 52(1)
4.4 Comprehensibility 52(1)
4.5 Scalability to Large Datasets 53(2)
4.6 Robustness 55(1)
4.7 Stability 55(1)
4.8 Interestingness Measures 56(1)
4.9 Overfitting and Underfitting 57(1)
4.10 "No Free Lunch" Theorem 58(3)
5 Splitting Criteria 61(8)
5.1 Univariate Splitting Criteria 61(6)
5.1.1 Overview 61(1)
5.1.2 Impurity-based Criteria 61(1)
5.1.3 Information Gain 62(1)
5.1.4 Gini Index 62(1)
5.1.5 Likelihood Ratio Chi-squared Statistics 63(1)
5.1.6 DKM Criterion 63(1)
5.1.7 Normalized Impurity-based Criteria 63(1)
5.1.8 Gain Ratio 64(1)
5.1.9 Distance Measure 64(1)
5.1.10 Binary Criteria 64(1)
5.1.11 Twoing Criterion 65(1)
5.1.12 Orthogonal Criterion 65(1)
5.1.13 Kolmogorov-Smirnov Criterion 66(1)
5.1.14 AUC Splitting Criteria 66(1)
5.1.15 Other Univariate Splitting Criteria 66(1)
5.1.16 Comparison of Univariate Splitting Criteria 66(1)
5.2 Handling Missing Values 67(2)
6 Pruning Trees 69(8)
6.1 Stopping Criteria 69(1)
6.2 Heuristic Pruning 69(5)
6.2.1 Overview 69(1)
6.2.2 Cost Complexity Pruning 70(1)
6.2.3 Reduced Error Pruning 70(1)
6.2.4 Minimum Error Pruning (MEP) 71(1)
6.2.5 Pessimistic Pruning 71(1)
6.2.6 Error-Based Pruning (EBP) 72(1)
6.2.7 Minimum Description Length (MDL) Pruning 73(1)
6.2.8 Other Pruning Methods 73(1)
6.2.9 Comparison of Pruning Methods 73(1)
6.3 Optimal Pruning 74(3)
7 Popular Decision Trees Induction Algorithms 77(8)
7.1 Overview 77(1)
7.2 ID3 77(1)
7.3 C4.5 78(1)
7.4 CART 79(1)
7.5 CHAID 79(1)
7.6 QUEST 80(1)
7.7 Reference to Other Algorithms 80(1)
7.8 Advantages and Disadvantages of Decision Trees 81(4)
8 Beyond Classification Tasks 85(14)
8.1 Introduction 85(1)
8.2 Regression Trees 85(1)
8.3 Survival Trees 86(3)
8.4 Clustering Tree 89(5)
8.4.1 Distance Measures 89(1)
8.4.2 Minkowski: Distance Measures for Numeric Attributes 90(2)
8.4.2.1 Distance Measures for Binary Attributes 90(1)
8.4.2.2 Distance Measures for Nominal Attributes 91(1)
8.4.2.3 Distance Metrics for Ordinal Attributes 91(1)
8.4.2.4 Distance Metrics for Mixed-Type Attributes 92(1)
8.4.3 Similarity Functions 92(1)
8.4.3.1 Cosine Measure 93(1)
8.4.3.2 Pearson Correlation Measure 93(1)
8.4.3.3 Extended Jaccard Measure 93(1)
8.4.3.4 Dice Coefficient Measure 93(1)
8.4.4 The OCCT Algorithm 93(1)
8.5 Hidden Markov Model Trees 94(5)
9 Decision Forests 99(52)
9.1 Introduction 99(1)
9.2 Back to the Roots 99(9)
9.3 Combination Methods 108(10)
9.3.1 Weighting Methods 108(5)
9.3.1.1 Majority Voting 108(1)
9.3.1.2 Performance Weighting 109(1)
9.3.1.3 Distribution Summation 109(1)
9.3.1.4 Bayesian Combination 109(1)
9.3.1.5 Dempster-Shafer 110(1)
9.3.1.6 Vogging 110(1)
9.3.1.7 Naive Bayes 110(1)
9.3.1.8 Entropy Weighting 110(1)
9.3.1.9 Density-based Weighting 111(1)
9.3.1.10 DEA Weighting Method 111(1)
9.3.1.11 Logarithmic Opinion Pool 111(1)
9.3.1.12 Gating Network 112(1)
9.3.1.13 Order Statistics 113(1)
9.3.2 Meta-combination Methods 113(5)
9.3.2.1 Stacking 113(1)
9.3.2.2 Arbiter Trees 114(2)
9.3.2.3 Combiner Trees 116(1)
9.3.2.4 Grading 117(1)
9.4 Classifier Dependency 118(12)
9.4.1 Dependent Methods 118(4)
9.4.1.1 Model-guided Instance Selection 118(4)
9.4.1.2 Incremental Batch Learning 122(1)
9.4.2 Independent Methods 122(8)
9.4.2.1 Bagging 122(2)
9.4.2.2 Wagging 124(1)
9.4.2.3 Random Forest 125(1)
9.4.2.4 Rotation Forest 126(3)
9.4.2.5 Cross-validated Committees 129(1)
9.5 Ensemble Diversity 130(14)
9.5.1 Manipulating the Inducer 131(2)
9.5.1.1 Manipulation of the Inducer's Parameters 131(1)
9.5.1.2 Starting Point in Hypothesis Space 132(1)
9.5.1.3 Hypothesis Space Traversal 132(1)
9.5.1.3.1 Random-based Strategy 132(1)
9.5.1.3.2 Collective-Performance-based Strategy 132(1)
9.5.2 Manipulating the Training Samples 133(1)
9.5.2.1 Resampling 133(1)
9.5.2.2 Creation 133(1)
9.5.2.3 Partitioning 134(1)
9.5.3 Manipulating the Target Attribute Representation 134(2)
9.5.4 Partitioning the Search Space 136(6)
9.5.4.1 Divide and Conquer 136(1)
9.5.4.2 Feature Subset-based Ensemble Methods 137(9)
9.5.4.2.1 Random-based Strategy 138(1)
9.5.4.2.2 Reduct-based Strategy 138(1)
9.5.4.2.3 Collective-Performance-based Strategy 139(1)
9.5.4.2.4 Feature Set Partitioning 139(3)
9.5.5 Multi-Inducers 142(1)
9.5.6 Measuring the Diversity 143(1)
9.6 Ensemble Size 144(3)
9.6.1 Selecting the Ensemble Size 144(1)
9.6.2 Pre-selection of the Ensemble Size 145(1)
9.6.3 Selection of the Ensemble Size while Training 145(1)
9.6.4 Pruning - Post Selection of the Ensemble Size 146(6)
9.6.4.1 Pre-combining Pruning 146(1)
9.6.4.2 Post-combining Pruning 146(1)
9.7 Cross-Inducer 147(1)
9.8 Multistrategy Ensemble Learning 148(1)
9.9 Which Ensemble Method Should be Used? 148(1)
9.10 Open Source for Decision Trees Forests 149(2)
10 A Walk-through Guide for Using Decision Trees Software 151(16)
10.1 Introduction 151(1)
10.2 Weka 152(7)
10.2.1 Training a Classification Tree 153(5)
10.2.2 Building a Forest 158(1)
10.3 R 159(8)
10.3.1 Party Package 159(3)
10.3.2 Forest 162(1)
10.3.3 Other Types of Trees 163(1)
10.3.4 The Rpart Package 164(1)
10.3.5 RandomForest 165(2)
11 Advanced Decision Trees 167(16)
11.1 Oblivious Decision Trees 167(1)
11.2 Online Adaptive Decision Trees 168(1)
11.3 Lazy Tree 168(1)
11.4 Option Tree 169(3)
11.5 Lookahead 172(1)
11.6 Oblique Decision Trees 172(3)
11.7 Incremental Learning of Decision Trees 175(4)
11.7.1 The Motives for Incremental Learning 175(1)
11.7.2 The Inefficiency Challenge 176(1)
11.7.3 The Concept Drift Challenge 177(2)
11.8 Decision Trees Inducers for Large Datasets 179(4)
11.8.1 Accelerating Tree Induction 180(2)
11.8.2 Parallel Induction of Trees 182(1)
12 Cost-sensitive Active and Proactive Learning of Decision Trees 183(20)
12.1 Overview 183(1)
12.2 Type of Costs 184(1)
12.3 Learning with Costs 185(3)
12.4 Induction of Cost Sensitive Decision Trees 188(1)
12.5 Active Learning 189(7)
12.6 Proactive Data Mining 196(7)
12.6.1 Changing the Input Data 197(1)
12.6.2 Attribute Changing Cost and Benefit Functions 198(1)
12.6.3 Maximizing Utility 199(1)
12.6.4 An Algorithmic Framework for Proactive Data Mining 200(3)
13 Feature Selection 203(22)
13.1 Overview 203(1)
13.2 The "Curse of Dimensionality" 203(3)
13.3 Techniques for Feature Selection 206(5)
13.3.1 Feature Filters 207(2)
13.3.1.1 FOCUS 207(1)
13.3.1.2 LVF 207(1)
13.3.1.3 Using a Learning Algorithm as a Filter 207(1)
13.3.1.4 An Information Theoretic Feature Filter 208(1)
13.3.1.5 RELIEF Algorithm 208(1)
13.3.1.6 Simba and G-flip 208(1)
13.3.1.7 Contextual Merit (CM) Algorithm 209(1)
13.3.2 Using Traditional Statistics for Filtering 209(2)
13.3.2.1 Mallows Cp 209(1)
13.3.2.2 AIC, BIC and F-ratio 209(1)
13.3.2.3 Principal Component Analysis (PCA) 210(1)
13.3.2.4 Factor Analysis (FA) 210(1)
13.3.2.5 Projection Pursuit (PP) 210(1)
13.3.3 Wrappers 211(2)
13.3.3.1 Wrappers for Decision Tree Learners 211(1)
13.4 Feature Selection as a Means of Creating Ensembles 211(2)
13.5 Ensemble Methodology for Improving Feature Selection 213(8)
13.5.1 Independent Algorithmic Framework 215(1)
13.5.2 Combining Procedure 216(4)
13.5.2.1 Simple Weighted Voting 216(2)
13.5.2.2 Using Artificial Contrasts 218(2)
13.5.3 Feature Ensemble Generator 220(10)
13.5.3.1 Multiple Feature Selectors 220(1)
13.5.3.2 Bagging 221(1)
13.6 Using Decision Trees for Feature Selection 221(1)
13.7 Limitation of Feature Selection Methods 222(3)
14 Fuzzy Decision Trees 225(12)
14.1 Overview 225(1)
14.2 Membership Function 226(1)
14.3 Fuzzy Classification Problems 227(1)
14.4 Fuzzy Set Operations 228(1)
14.5 Fuzzy Classification Rules 229(1)
14.6 Creating Fuzzy Decision Tree 230(4)
14.6.1 Fuzzifying Numeric Attributes 230(2)
14.6.2 Inducing of Fuzzy Decision Tree 232(2)
14.7 Simplifying the Decision Tree 234(1)
14.8 Classification of New Instances 234(1)
14.9 Other Fuzzy Decision Tree Inducers 234(3)
15 Hybridization of Decision Trees with other Techniques 237(14)
15.1 Introduction 237(1)
15.2 A Framework for Instance-Space Decomposition 237(5)
15.2.1 Stopping Rules 240(1)
15.2.2 Splitting Rules 241(1)
15.2.3 Split Validation Examinations 241(1)
15.3 The Contrasted Population Miner (CPOM) Algorithm 242(4)
15.3.1 CPOM Outline 242(2)
15.3.2 The Grouped Gain Ratio Splitting Rule 244(2)
15.4 Induction of Decision Trees by an Evolutionary Algorithm (EA) 246(5)
16 Decision Trees and Recommender Systems 251(22)
16.1 Introduction 251(1)
16.2 Using Decision Trees for Recommending Items 252(7)
16.2.1 RS-Adapted Decision Tree 253(4)
16.2.2 Least Probable Intersections 257(2)
16.3 Using Decision Trees for Preferences Elicitation 259(14)
16.3.1 Static Methods 261(1)
16.3.2 Dynamic Methods and Decision Trees 262(1)
16.3.3 SVD-based CF Method 263(1)
16.3.4 Pairwise Comparisons 264(2)
16.3.5 Profile Representation 266(1)
16.3.6 Selecting the Next Pairwise Comparison 267(2)
16.3.7 Clustering the Items 269(1)
16.3.8 Training a Lazy Decision Tree 270(3)
Bibliography 273(30)
Index 303