Contents

1  Introduction ................................................... 1
   1.1  Data Mining and Knowledge Discovery ....................... 1
   1.2  ........................................................... 2
   1.3  ........................................................... 6
   1.4  Unsupervised Learning ..................................... 7
        1.4.1  ..................................................... 8
        1.4.2  ..................................................... 8
   1.5  Other Learning Paradigms .................................. 8
        1.5.1  Imbalanced Learning ................................. 8
        1.5.2  Multi-instance Learning ............................. 9
        1.5.3  Multi-label Classification .......................... 9
        1.5.4  Semi-supervised Learning ............................ 9
        1.5.5  ..................................................... 9
        1.5.6  .................................................... 10
        1.5.7  Data Stream Learning ............................... 10
   1.6  Introduction to Data Preprocessing ....................... 10
        1.6.1  .................................................... 11
        1.6.2  .................................................... 13
   References .................................................... 16

2  Data Sets and Proper Statistical Analysis of Data Mining Techniques ... 19
   2.1  Data Sets and Partitions ................................. 19
        2.1.1  Data Set Partitioning .............................. 21
        2.1.2  Performance Measures ............................... 24
   2.2  Using Statistical Tests to Compare Methods ............... 25
        2.2.1  Conditions for the Safe Use of Parametric Tests .... 26
        2.2.2  Normality Test over the Group of Data Sets and Algorithms ... 27
        2.2.3  Non-parametric Tests for Comparing Two Algorithms in Multiple Data Set Analysis ... 29
        2.2.4  Non-parametric Tests for Multiple Comparisons Among More than Two Algorithms ... 32
   References .................................................... 37

3  Data Preparation Basic Models ................................. 39
   3.1  ........................................................... 39
   3.2  Data Integration ......................................... 40
        3.2.1  Finding Redundant Attributes ....................... 41
        3.2.2  Detecting Tuple Duplication and Inconsistency ...... 43
   3.3  ........................................................... 45
   3.4  Data Normalization ....................................... 46
        3.4.1  Min-Max Normalization .............................. 46
        3.4.2  Z-score Normalization .............................. 47
        3.4.3  Decimal Scaling Normalization ...................... 48
   3.5  Data Transformation ...................................... 48
        3.5.1  Linear Transformations ............................. 49
        3.5.2  Quadratic Transformations .......................... 49
        3.5.3  Non-polynomial Approximations of Transformations ... 50
        3.5.4  Polynomial Approximations of Transformations ....... 51
        3.5.5  Rank Transformations ............................... 52
        3.5.6  Box-Cox Transformations ............................ 53
        3.5.7  Spreading the Histogram ............................ 54
        3.5.8  Nominal to Binary Transformation ................... 54
        3.5.9  Transformations via Data Reduction ................. 55
   References .................................................... 55

4  Dealing with Missing Values ................................... 59
   4.1  ........................................................... 59
   4.2  Assumptions and Missing Data Mechanisms .................. 61
   4.3  Simple Approaches to Missing Data ........................ 63
   4.4  Maximum Likelihood Imputation Methods .................... 64
        4.4.1  Expectation-Maximization (EM) ...................... 65
        4.4.2  Multiple Imputation ................................ 68
        4.4.3  Bayesian Principal Component Analysis (BPCA) ....... 72
   4.5  Imputation of Missing Values: Machine Learning-Based Methods ... 76
        4.5.1  Imputation with K-Nearest Neighbor (KNNI) .......... 76
        4.5.2  Weighted Imputation with K-Nearest Neighbor (WKNNI) ... 77
        4.5.3  K-means Clustering Imputation (KMI) ................ 78
        4.5.4  Imputation with Fuzzy K-means Clustering (FKMI) .... 78
        4.5.5  Support Vector Machines Imputation (SVMI) .......... 79
        4.5.6  Event Covering (EC) ................................ 82
        4.5.7  Singular Value Decomposition Imputation (SVDI) ..... 86
        4.5.8  Local Least Squares Imputation (LLSI) .............. 86
        4.5.9  Recent Machine Learning Approaches to Missing Values Imputation ... 90
   4.6  Experimental Comparative Analysis ........................ 90
        4.6.1  Effect of the Imputation Methods on the Attributes' Relationships ... 90
        4.6.2  Best Imputation Methods for Classification Methods ... 97
        4.6.3  Interesting Comments .............................. 100
   References ................................................... 101

5  Dealing with Noisy Data ...................................... 107
   5.1  .......................................................... 107
   5.2  Types of Noisy Data: Class Noise and Attribute Noise .... 110
        5.2.1  Noise Introduction Mechanisms ..................... 111
        5.2.2  Simulating the Noise of Real-World Data Sets ...... 114
   5.3  Noise Filtering at Data Level ........................... 115
        5.3.1  ................................................... 116
        5.3.2  Cross-Validated Committees Filter ................. 117
        5.3.3  Iterative-Partitioning Filter ..................... 117
        5.3.4  More Filtering Methods ............................ 118
   5.4  Robust Learners Against Noise ........................... 118
        5.4.1  Multiple Classifier Systems for Classification Tasks ... 120
        5.4.2  Addressing Multi-class Classification Problems by Decomposition ... 123
   5.5  Empirical Analysis of Noise Filters and Robust Strategies ... 125
        5.5.1  ................................................... 125
        5.5.2  Noise Filters for Class Noise ..................... 127
        5.5.3  Noise Filtering Efficacy Prediction by Data Complexity Measures ... 129
        5.5.4  Multiple Classifier Systems with Noise ............ 133
        5.5.5  Analysis of the OVO Decomposition with Noise ...... 136
   References ................................................... 140

6  Data Reduction ............................................... 147
   6.1  .......................................................... 147
   6.2  The Curse of Dimensionality ............................. 148
        6.2.1  Principal Components Analysis ..................... 149
        6.2.2  ................................................... 151
        6.2.3  Multidimensional Scaling .......................... 152
        6.2.4  Locally Linear Embedding .......................... 155
   6.3  .......................................................... 156
        6.3.1  ................................................... 158
        6.3.2  ................................................... 159
        6.3.3  ................................................... 159
   6.4  Binning and Reduction of Cardinality .................... 161
   References ................................................... 162

7  Feature Selection ............................................ 163
   7.1  .......................................................... 163
   7.2  .......................................................... 164
        7.2.1  The Search for a Subset of Features ............... 164
        7.2.2  ................................................... 168
        7.2.3  Filter, Wrapper and Embedded Feature Selection .... 173
   7.3  .......................................................... 176
        7.3.1  Output of Feature Selection ....................... 176
        7.3.2  ................................................... 177
        7.3.3  ................................................... 179
        7.3.4  Using Decision Trees for Feature Selection ........ 179
   7.4  Description of the Most Representative Feature Selection Methods ... 180
        7.4.1  ................................................... 181
        7.4.2  ................................................... 182
        7.4.3  Nondeterministic Methods .......................... 182
        7.4.4  Feature Weighting Methods ......................... 184
   7.5  Related and Advanced Topics ............................. 185
        7.5.1  Leading and Recent Feature Selection Techniques ... 186
        7.5.2  ................................................... 188
        7.5.3  Feature Construction .............................. 189
   7.6  Experimental Comparative Analyses in Feature Selection ... 190
   References ................................................... 191

8  Instance Selection ........................................... 195
   8.1  .......................................................... 195
   8.2  Training Set Selection Versus Prototype Selection ....... 197
   8.3  Prototype Selection Taxonomy ............................ 199
        8.3.1  Common Properties in Prototype Selection Methods ... 199
        8.3.2  Prototype Selection Methods ....................... 202
        8.3.3  Taxonomy of Prototype Selection Methods ........... 202
   8.4  Description of Methods .................................. 206
        8.4.1  Condensation Algorithms ........................... 206
        8.4.2  ................................................... 210
        8.4.3  ................................................... 212
   8.5  Related and Advanced Topics ............................. 221
        8.5.1  Prototype Generation .............................. 221
        8.5.2  Distance Metrics, Feature Weighting and Combinations with Feature Selection ... 221
        8.5.3  Hybridizations with Other Learning Methods and Ensembles ... 222
        8.5.4  Scaling-Up Approaches ............................. 223
        8.5.5  ................................................... 223
   8.6  Experimental Comparative Analysis in Prototype Selection ... 224
        8.6.1  Analysis and Empirical Results on Small Size Data Sets ... 225
        8.6.2  Analysis and Empirical Results on Medium Size Data Sets ... 230
        8.6.3  Global View of the Obtained Results ............... 231
        8.6.4  Visualization of Data Subsets: A Case Study Based on the Banana Data Set ... 233
   References ................................................... 236

9  Discretization ............................................... 245
   9.1  .......................................................... 245
   9.2  Perspectives and Background ............................. 247
        9.2.1  Discretization Process ............................ 247
        9.2.2  Related and Advanced Work ......................... 250
   9.3  Properties and Taxonomy ................................. 251
        9.3.1  ................................................... 251
        9.3.2  Methods and Taxonomy .............................. 255
        9.3.3  Description of the Most Representative Discretization Methods ... 259
   9.4  Experimental Comparative Analysis ....................... 265
        9.4.1  Experimental Setup ................................ 265
        9.4.2  Analysis and Empirical Results .................... 268
   References ................................................... 278

10  A Data Mining Software Package Including Data Preparation and Reduction: KEEL ... 285
    10.1  Data Mining Software and Toolboxes .................... 285
    10.2  KEEL: Knowledge Extraction Based on Evolutionary Learning ... 287
          10.2.1  ................................................ 288
          10.2.2  ................................................ 289
          10.2.3  Design of Experiments: Off-Line Module ......... 291
          10.2.4  Computer-Based Education: On-Line Module ....... 293
    10.3  ........................................................ 294
          10.3.1  Data Sets Web Pages ............................ 294
          10.3.2  Experimental Study Web Pages ................... 297
    10.4  Integration of New Algorithms into the KEEL Tool ...... 298
          10.4.1  Introduction to the KEEL Codification Features ... 298
    10.5  KEEL Statistical Tests ................................ 303
          10.5.1  ................................................ 304
    10.6  Summarizing Comments .................................. 310
    References .................................................. 311

Index ........................................................... 315