Preface  xv
Acknowledgements  xxi

1 Fundamentals of Pattern Recognition  1
1.1 Basic Concepts: Class, Feature, Data Set  1
1.1.1 Classes and Class Labels  1
1.1.2 Features  2
1.1.3 Data Set  3
1.1.4 Generate Your Own Data  6
1.2 Classifier, Discriminant Functions, Classification Regions  9
1.3 Classification Error and Classification Accuracy  11
1.3.1 Where Does the Error Come From? Bias and Variance  11
1.3.2 Estimation of the Error  13
1.3.3 Confusion Matrices and Loss Matrices  14
1.3.4 Training and Testing Protocols  15
1.3.5 Overtraining and Peeking  17
1.4 Experimental Comparison of Classifiers  19
1.4.1 Two Trained Classifiers and a Fixed Testing Set  20
1.4.2 Two Classifier Models and a Single Data Set  22
1.4.3 Two Classifier Models and Multiple Data Sets  26
1.4.4 Multiple Classifier Models and Multiple Data Sets  27
1.5 Bayes Decision Theory  30
1.5.1 Probabilistic Framework  30
1.5.2 Discriminant Functions and Decision Boundaries  31
1.5.3 Bayes Error  33
1.6 Clustering and Feature Selection  35
1.6.1 Clustering  35
1.6.2 Feature Selection  37
1.7 Challenges of Real-Life Data  40
Appendix  41
1.A.1 Data Generation  41
1.A.2 Comparison of Classifiers  42
1.A.2.1 MATLAB Functions for Comparing Classifiers  42
1.A.2.2 Critical Values for Wilcoxon and Sign Test  45
1.A.3 Feature Selection  47

2 Base Classifiers  49
2.1 Linear and Quadratic Classifiers  49
2.1.1 Linear Discriminant Classifier  49
2.1.2 Nearest Mean Classifier  52
2.1.3 Quadratic Discriminant Classifier  52
2.1.4 Stability of LDC and QDC  53
2.2 Decision Tree Classifiers  55
2.2.1 Basics and Terminology  55
2.2.2 Training of Decision Tree Classifiers  57
2.2.3 Selection of the Feature for a Node  58
2.2.4 Stopping Criterion  60
2.2.5 Pruning of the Decision Tree  63
2.2.6 C4.5 and ID3  64
2.2.7 Instability of Decision Trees  64
2.2.8 Random Trees  65
2.3 The Naive Bayes Classifier  66
2.4 Neural Networks  68
2.4.1 Neurons  68
2.4.2 Rosenblatt's Perceptron  70
2.4.3 Multi-Layer Perceptron  71
2.5 Support Vector Machines  73
2.5.1 Why Would It Work?  73
2.5.2 Classification Margins  74
2.5.3 Optimal Linear Boundary  76
2.5.4 Parameters and Classification Boundaries of SVM  78
2.6 The k-Nearest Neighbor Classifier (k-nn)  80
2.7 Final Remarks  82
2.7.1 Simple or Complex Models?  82
2.7.2 The Triangle Diagram  83
2.7.3 Choosing a Base Classifier for Ensembles  85
Appendix  85
2.A.1 MATLAB Code for the Fish Data  85
2.A.2 MATLAB Code for Individual Classifiers  86
2.A.2.1 Decision Tree  86
2.A.2.2 Naive Bayes  89
2.A.2.3 Multi-Layer Perceptron  90
2.A.2.4 Support Vector Machines  92

3 An Overview of the Field  94
3.1 Philosophy  94
3.2 Two Examples  98
3.2.1 The Wisdom of the "Classifier Crowd"  98
3.2.2 The Power of Divide-and-Conquer  98
3.3 Structure of the Area  100
3.3.1 Terminology  100
3.3.2 A Taxonomy of Classifier Ensemble Methods  100
3.3.3 Classifier Fusion and Classifier Selection  104
3.4 Quo Vadis?  105
3.4.1 Reinventing the Wheel?  105
3.4.2 The Illusion of Progress?  106
3.4.3 A Bibliometric Snapshot  107

4 Combining Label Outputs  111
4.1 Types of Classifier Outputs  111
4.2 A Probabilistic Framework for Combining Label Outputs  112
4.3 Majority Vote  113
4.3.1 "Democracy" in Classifier Combination |
|
|
113 | (1) |
|
4.3.2 Accuracy of the Majority Vote |
|
|
114 | (3) |
|
4.3.3 Limits on the Majority Vote Accuracy: An Example |
|
|
117 | (2) |
|
4.3.4 Patterns of Success and Failure |
|
|
119 | (5) |
|
4.3.5 Optimality of the Majority Vote Combiner |
|
|
124 | (1) |
|
4.4 Weighted Majority Vote |
|
|
125 | (3) |
|
|
4.4.1 Two Examples  126
4.4.2 Optimality of the Weighted Majority Vote Combiner  127
4.5 Naive Bayes Combiner  128
4.5.1 Optimality of the Naive Bayes Combiner  128
4.5.2 Implementation of the NB Combiner  130
4.6 Multinomial Methods  132
4.7 Comparison of Combination Methods for Label Outputs  135
Appendix  137
4.A.1 Matan's Proof for the Limits on the Majority Vote Accuracy  137
4.A.2 Selected MATLAB Code  139

5 Combining Continuous-Valued Outputs  143
5.1 Decision Profile  143
5.2 How Do We Get Probability Outputs?  144
5.2.1 Probabilities Based on Discriminant Scores  144
5.2.2 Probabilities Based on Counts: Laplace Estimator  147
5.3 Nontrainable (Fixed) Combination Rules  150
5.3.1 A Generic Formulation  150
5.3.2 Equivalence of Simple Combination Rules  152
5.3.3 Generalized Mean Combiner  153
5.3.4 A Theoretical Comparison of Simple Combiners  156
5.3.5 Where Do They Come From?  160
5.4 The Weighted Average (Linear Combiner)  166
5.4.1 Consensus Theory  166
5.4.2 Added Error for the Weighted Mean Combination  167
5.4.3 Linear Regression  168
5.5 A Classifier as a Combiner  172
5.5.1 The Supra Bayesian Approach  172
5.5.2 Decision Templates  173
5.5.3 A Linear Classifier  175
5.6 An Example of Nine Combiners for Continuous-Valued Outputs  175
5.7 To Train or Not to Train?  176
Appendix  178
5.A.1 Theoretical Classification Error for the Simple Combiners  178
5.A.1.1 Set-up and Assumptions  178
5.A.1.2 Individual Error  180
5.A.1.3 Minimum and Maximum  180
5.A.1.4 Average (Sum)  181
5.A.1.5 Median and Majority Vote  182
5.A.1.6 Oracle  183
5.A.2 Selected MATLAB Code  183

6 Ensemble Methods  186
6.1 Bagging  186
6.1.1 The Origins: Bagging Predictors  186
6.1.2 Why Does Bagging Work?  187
6.1.3 Out-of-bag Estimates  189
6.1.4 Variants of Bagging  190
6.2 Random Forests  190
6.3 AdaBoost  192
6.3.1 The AdaBoost Algorithm  192
6.3.2 The arc-x4 Algorithm  194
6.3.3 Why Does AdaBoost Work?  195
6.3.4 Variants of Boosting  199
6.3.5 A Famous Application: AdaBoost for Face Detection  199
6.4 Random Subspace Ensembles  203
6.5 Rotation Forest  204
6.6 Random Linear Oracle  208
6.7 Error Correcting Output Codes (ECOC)  211
6.7.1 Code Designs  212
6.7.2 Decoding  214
6.7.3 Ensembles of Nested Dichotomies  216
Appendix  218
6.A.1 Bagging  218
6.A.2 AdaBoost  220
6.A.3 Random Subspace  223
6.A.4 Rotation Forest  225
6.A.5 Random Linear Oracle  228
6.A.6 ECOC  229

7 Classifier Selection  230
7.1 Preliminaries  230
7.2 Why Classifier Selection Works  231
7.3 Estimating Local Competence Dynamically  233
7.3.1 Decision-Independent Estimates  233
7.3.2 Decision-Dependent Estimates  238
7.4 Pre-Estimation of the Competence Regions  239
7.4.1 Bespoke Classifiers  240
7.4.2 Clustering and Selection  241
7.5 Simultaneous Training of Regions and Classifiers  242
7.6 Cascade Classifiers  244
Appendix: Selected MATLAB Code  244
7.A.1 Banana Data  244
7.A.2 Evolutionary Algorithm for a Selection Ensemble for the Banana Data  245

8 Diversity in Classifier Ensembles  247
8.1 What Is Diversity?  247
8.1.1 Diversity for a Point-Value Estimate  248
8.1.2 Diversity in Software Engineering  248
8.1.3 Statistical Measures of Relationship  249
8.2 Measuring Diversity in Classifier Ensembles  250
8.2.1 Pairwise Measures  250
8.2.2 Nonpairwise Measures  251
8.3 Relationship Between Diversity and Accuracy  256
8.3.1 An Example  256
8.3.2 Relationship Patterns  258
8.3.3 A Caveat: Independent Outputs ≠ Independent Errors  262
8.3.4 Independence Is Not the Best Scenario  265
8.3.5 Diversity and Ensemble Margins  267
8.4 Using Diversity  270
8.4.1 Diversity for Finding Bounds and Theoretical Relationships  270
8.4.2 Kappa-error Diagrams and Ensemble Maps  271
8.4.3 Overproduce and Select  275
8.5 Conclusions: Diversity of Diversity  279
Appendix  280
8.A.1 Derivation of Diversity Measures for Oracle Outputs  280
8.A.1.1 Correlation ρ  280
8.A.1.2 Interrater Agreement κ  281
8.A.2 Diversity Measure Equivalence  282
8.A.3 Independent Outputs ≠ Independent Errors  284
8.A.4 A Bound on the Kappa-Error Diagram  286
8.A.5 Calculation of the Pareto Frontier  287

9 Ensemble Feature Selection  290
9.1 Preliminaries  290
9.1.1 Right and Wrong Protocols  290
9.1.2 Ensemble Feature Selection Approaches  294
9.1.3 Natural Grouping  294
9.2 Ranking by Decision Tree Ensembles  295
9.2.1 Simple Count and Split Criterion  295
9.2.2 Permuted Features or the "Noised-up" Method  297
9.3 Ensembles of Rankers  299
9.3.1 The Approach  299
9.3.2 Ranking Methods (Criteria)  300
9.4 Random Feature Selection for the Ensemble  305
9.4.1 Random Subspace Revisited  305
9.4.2 Usability, Coverage, and Feature Diversity  306
9.4.3 Genetic Algorithms  312
9.5 Nonrandom Selection  315
9.5.1 The "Favorite Class" Model |
|
|
315 | (1) |
|
9.5.2 The Iterative Model |
|
|
315 | (1) |
|
9.5.3 The Incremental Model |
|
|
316 | (1) |
|
|
9.6 A Stability Index  317
9.6.1 Consistency Between a Pair of Subsets  317
9.6.2 A Stability Index for K Sequences  319
9.6.3 An Example of Applying the Stability Index  320
Appendix  322
9.A.1 MATLAB Code for the Numerical Example of Ensemble Ranking  322
9.A.2 MATLAB GA Nuggets  322
9.A.3 MATLAB Code for the Stability Index  324

10 A Final Thought  326

References  327
Index  353