Preface to the Second Edition  xi
Preface to the First Edition  xiii
Acknowledgments  xix
Examples  xxi
|
|
1 What Is Clustering  1
1.1.1.2 Primates and Human Origin  5
1.1.1.3 Gene Presence-Absence Profiles  6
1.1.1.4 Knowledge Structure: Algebraic Functions  8
1.1.2.1 Describing Iris Genera  10
1.1.3.1 Digits and Patterns of Confusion between Them  13
1.1.5 Visualization of Data Structure  22
1.1.5.1 One-Dimensional Data  22
1.1.5.2 One-Dimensional Data within Groups  23
1.1.5.3 Two-Dimensional Display  24
1.1.5.6 Visualization Using an Inherent Topology  27
1.2.1 Definition: Data and Cluster Structure  28
1.2.1.1 Data  28
1.2.1.2 Cluster Structure  29
1.2.2 Criteria for Obtaining a Good Cluster Structure  30
1.2.3 Three Types of Cluster Description  31
1.2.4 Stages of a Clustering Application  32
1.2.5 Clustering and Other Disciplines  33
1.2.6 Different Perspectives of Clustering  34
1.2.6.1 Classical Statistics Perspective  34
1.2.6.2 Machine-Learning Perspective  35
1.2.6.3 Data-Mining Perspective  35
1.2.6.4 Classification and Knowledge-Discovery Perspective  36

2 What Is Data  39
2.1 Feature Characteristics  41
2.1.1 Feature Scale Types  41
2.2.1 Two Quantitative Variables  49
2.2.2 Nominal and Quantitative Variables  51
2.2.3 Two Nominal Variables Cross Classified  52
2.2.4 Relation between the Correlation and Contingency Measures  58
2.2.5 Meaning of the Correlation  59
2.3 Feature Space and Data Scatter  62
2.3.2 Feature Space: Distance and Inner Product  63
2.4 Pre-Processing and Standardizing Mixed Data  66
2.5.2 Contingency and Redistribution Tables  74
2.5.3 Affinity and Kernel Data  77
2.5.5 Similarity Data Pre-Processing  80
2.5.5.1 Removal of Low Similarities: Thresholding  81
2.5.5.2 Subtraction of Background Noise  82
2.5.5.3 Laplace Transformation  83

3 K-Means Clustering and Related Approaches  87
3.1.2 Square Error Criterion  93
3.1.3 Incremental Versions of K-Means  96
3.2 Choice of K and Initialization of K-Means  98
3.2.1 Conventional Approaches to Initial Setting  98
3.2.1.1 Random Selection of Centroids  99
3.2.1.2 Expert-Driven Selection of Centroids  99
3.2.2 MaxMin for Producing Deviate Centroids  100
3.2.3 Anomalous Centroids with Anomalous Pattern  102
3.2.4 Anomalous Centroids with Method Build  104
3.2.5 Choosing the Number of Clusters at the Post-Processing Stage  106
3.2.5.1 Variance-Based Approach  106
3.2.5.2 Within-Cluster Cohesion versus Between-Cluster Separation  107
3.2.5.3 Combining Multiple Clusterings  108
3.2.5.4 Resampling Methods  108
3.2.5.5 Data Structure or Granularity Level?  109
3.3 Intelligent K-Means: Iterated Anomalous Pattern  110
3.4 Minkowski Metric K-Means and Feature Weighting  114
3.4.1 Minkowski Distance and Minkowski Centers  114
3.4.2 Feature Weighting at Minkowski Metric K-Means  116
3.5 Extensions of K-Means Clustering  120
3.5.1 Clustering Criteria and Implementation  120
3.5.2 Partitioning around Medoids  122
3.5.4 Regression-Wise Clustering  125
3.5.5 Mixture of Distributions and EM Algorithm  126
3.5.6 Kohonen Self-Organizing Maps  129

4 Least-Squares Hierarchical Clustering  133
4.1 Hierarchical Cluster Structures  134
4.2 Agglomeration: Ward Algorithm  137
4.3 Least-Squares Divisive Clustering  141
4.3.1 Ward Criterion and Distance  141
4.3.2 Bisecting K-Means: 2-Splitting  143
4.3.3 Splitting by Separation  144
4.3.4 Principal Direction Partitioning  147
4.3.5 Beating the Noise by Randomness  149
4.3.6 Gower's Controversy  151
4.4 Conceptual Clustering  152
4.5 Extensions of Ward Clustering  156
4.5.1 Agglomerative Clustering with Dissimilarity Data  156
4.5.2 Hierarchical Clustering for Contingency Data  156

5 Similarity Clustering: Uniform, Modularity, Additive, Spectral, Consensus, and Single Linkage  161
5.1 Summary Similarity Clustering  164
5.1.1 Summary Similarity Clusters at Genuine Similarity Data  165
5.1.2 Summary Similarity Criterion at Flat Network Data  167
5.1.3 Summary Similarity Clustering at Affinity Data  172
5.2 Normalized Cut and Spectral Clustering  174
5.3.1 Additive Cluster Model  179
5.3.2 One-by-One Additive Clustering Strategy  180
5.4.1 Ensemble and Combined Consensus Concepts  187
5.4.2 Experimental Verification of Least-Squares Consensus Methods  194
5.5 Single Linkage, Minimum Spanning Tree, and Connected Components  195

6 Validation and Interpretation  201
6.1 General: Internal and External Validity  203
6.2 Testing Internal Validity  204
6.2.1 Scoring Correspondence between Clusters and Data  204
6.2.1.1 Measures of Cluster Cohesion versus Isolation  204
6.2.1.2 Indexes Derived Using the Data Recovery Approach  205
6.2.1.3 Indexes Derived from Probabilistic Clustering Models  206
6.2.2 Resampling Data for Validation  206
6.2.3 Cross Validation of iK-Means Results  210
6.3 Interpretation Aids in the Data Recovery Perspective  213
6.3.1 Conventional Interpretation Aids  213
6.3.2 Contribution and Relative Contribution Tables  214
6.3.3 Cluster Representatives  221
6.3.4 Measures of Association from ScaD Tables  224
6.3.4.1 Quantitative Feature Case: Correlation Ratio  224
6.3.4.2 Categorical Feature Case: Chi-Squared and Other Contingency Coefficients  224
6.3.5 Interpretation Aids for Cluster Up-Hierarchies  226
6.4 Conceptual Description of Clusters  229
6.4.1 False Positives and Negatives  229
6.4.2 Describing a Cluster with Production Rules  230
6.4.3 Comprehensive Conjunctive Description of a Cluster  231
6.4.4 Describing a Partition with Classification Trees  234
6.5 Mapping Clusters to Knowledge  241
6.5.1 Mapping a Cluster to Category  241
6.5.2 Mapping between Partitions  243
6.5.2.1 Match-Based Similarity versus Quetelet Association  247
6.5.2.2 Average Distance in a Set of Partitions  248

7 Least-Squares Data Recovery Clustering Models  261
7.1 Statistics Modeling as Data Recovery  263
7.1.1 Data Recovery Equation  263
7.1.4 Principal Component Analysis  266
7.1.5 Correspondence Factor Analysis  271
7.1.6 Data Summarization versus Learning in Data Recovery  274
7.2 K-Means as a Data Recovery Method  275
7.2.1 Clustering Equation and Data Scatter Decomposition  275
7.2.2 Contributions of Clusters, Features and Entities  276
7.2.3 Correlation Ratio as Contribution  277
7.2.4 Partition Contingency Coefficients  278
7.2.5 Equivalent Reformulations of the Least-Squares Clustering Criterion  279
7.2.6 Principal Cluster Analysis: Anomalous Pattern Clustering Method  282
7.2.7 Weighting Variables in K-Means Model and Minkowski Metric  284
7.3 Data Recovery Models for Hierarchical Clustering  288
7.3.1 Data Recovery Models with Cluster Hierarchies  288
7.3.2 Covariances, Variances and Data Scatter Decomposed  289
7.3.3 Split Base Vectors and Matrix Equations for the Data Recovery Model  291
7.3.4 Divisive Partitioning: Four Splitting Algorithms  292
7.3.4.1 Bisecting K-Means, or 2-Splitting  293
7.3.4.2 Principal Direction Division  294
7.3.4.3 Conceptual Clustering  296
7.3.4.4 Separating a Cluster  298
7.3.5 Organizing an Up-Hierarchy: To Split or Not to Split  299
7.3.6 A Straightforward Proof of the Equivalence between Bisecting K-Means and Ward Criteria  300
7.3.7 Anomalous Pattern versus Splitting  301
7.4 Data Recovery Models for Similarity Clustering  303
7.4.1 Cut, Normalized Cut, and Spectral Clustering  303
7.4.2 Similarity Clustering Induced by K-Means and Ward Criteria  307
7.4.2.1 All Clusters at Once  308
7.4.2.2 Hierarchical Clustering  309
7.4.2.3 One-by-One Clustering  311
7.4.3 Additive Clustering  313
7.4.4 Agglomeration and Aggregation of Contingency Data  315
7.5 Consensus and Ensemble Clustering  316
7.5.1 Ensemble Clustering  316
7.5.2 Combined Consensus Clustering  321
7.5.3 Concordant Partition  323
7.5.4 Muchnik's Consensus Partition Test  324
7.5.5 Algorithms for Consensus Partition  326

References  329
Index  341