|
|
xv | |
|
|
xix | |
Preface |
|
xxi | |
|
I Data Clustering and C++ Preliminaries |
|
|
1 | (100) |
|
1 Introduction to Data Clustering |
|
|
3 | (26) |
|
|
3 | (4) |
|
1.1.1 Clustering versus Classification |
|
|
4 | (1) |
|
1.1.2 Definition of Clusters |
|
|
5 | (2) |
|
|
7 | (1) |
|
1.3 Dissimilarity and Similarity Measures |
|
|
8 | (3) |
|
1.3.1 Measures for Continuous Data |
|
|
9 | (1) |
|
1.3.2 Measures for Discrete Data |
|
|
10 | (1) |
|
1.3.3 Measures for Mixed-Type Data |
|
|
10 | (1) |
|
1.4 Hierarchical Clustering Algorithms |
|
|
11 | (4) |
|
1.4.1 Agglomerative Hierarchical Algorithms |
|
|
12 | (2) |
|
1.4.2 Divisive Hierarchical Algorithms |
|
|
14 | (1) |
|
1.4.3 Other Hierarchical Algorithms |
|
|
14 | (1) |
|
|
15 | (1) |
|
1.5 Partitional Clustering Algorithms |
|
|
15 | (8) |
|
1.5.1 Center-Based Clustering Algorithms |
|
|
17 | (1) |
|
1.5.2 Search-Based Clustering Algorithms |
|
|
18 | (1) |
|
1.5.3 Graph-Based Clustering Algorithms |
|
|
19 | (1) |
|
1.5.4 Grid-Based Clustering Algorithms |
|
|
20 | (1) |
|
1.5.5 Density-Based Clustering Algorithms |
|
|
20 | (1) |
|
1.5.6 Model-Based Clustering Algorithms |
|
|
21 | (1) |
|
1.5.7 Subspace Clustering Algorithms |
|
|
22 | (1) |
|
1.5.8 Neural Network-Based Clustering Algorithms |
|
|
22 | (1) |
|
1.5.9 Fuzzy Clustering Algorithms |
|
|
23 | (1) |
|
|
23 | (1) |
|
1.7 Clustering Applications |
|
|
24 | (1) |
|
1.8 Literature of Clustering Algorithms |
|
|
25 | (3) |
|
1.8.1 Books on Data Clustering |
|
|
25 | (1) |
|
1.8.2 Surveys on Data Clustering |
|
|
26 | (2) |
|
|
28 | (1) |
|
2 The Unified Modeling Language |
|
|
29 | (12) |
|
|
29 | (3) |
|
|
32 | (4) |
|
|
36 | (2) |
|
|
38 | (1) |
|
|
39 | (1) |
|
|
40 | (1) |
|
3 Object-Oriented Programming and C++ |
|
|
41 | (16) |
|
3.1 Object-Oriented Programming |
|
|
41 | (1) |
|
3.2 The C++ Programming Language |
|
|
42 | (3) |
|
|
45 | (3) |
|
|
48 | (2) |
|
|
50 | (4) |
|
3.5.1 Dynamic Polymorphism |
|
|
51 | (1) |
|
3.5.2 Static Polymorphism |
|
|
52 | (2) |
|
|
54 | (2) |
|
|
56 | (1) |
|
|
57 | (20) |
|
|
58 | (3) |
|
|
61 | (3) |
|
|
64 | (3) |
|
|
67 | (2) |
|
|
69 | (3) |
|
|
72 | (3) |
|
|
75 | (2) |
|
5 C++ Libraries and Tools |
|
|
77 | (24) |
|
5.1 The Standard Template Library |
|
|
77 | (9) |
|
|
77 | (5) |
|
|
82 | (2) |
|
|
84 | (2) |
|
|
86 | (9) |
|
|
87 | (2) |
|
|
89 | (1) |
|
|
90 | (2) |
|
|
92 | (1) |
|
5.2.5 Unit Test Framework |
|
|
93 | (2) |
|
|
95 | (3) |
|
|
96 | (1) |
|
|
97 | (1) |
|
|
97 | (1) |
|
5.3.4 Using GNU Autotools |
|
|
98 | (1) |
|
|
98 | (1) |
|
|
99 | (2) |
|
II A C++ Data Clustering Framework |
|
|
101 | (82) |
|
|
103 | (12) |
|
6.1 Directory Structure and Filenames |
|
|
103 | (2) |
|
|
105 | (4) |
|
|
105 | (1) |
|
|
106 | (3) |
|
6.3 Macros and typedef Declarations |
|
|
109 | (2) |
|
|
111 | (1) |
|
|
112 | (1) |
|
6.6 Compilation and Installation |
|
|
113 | (1) |
|
|
114 | (1) |
|
|
115 | (16) |
|
|
115 | (7) |
|
7.1.1 The Attribute Value Class |
|
|
115 | (2) |
|
7.1.2 The Base Attribute Information Class |
|
|
117 | (2) |
|
7.1.3 The Continuous Attribute Information Class |
|
|
119 | (1) |
|
7.1.4 The Discrete Attribute Information Class |
|
|
120 | (2) |
|
|
122 | (3) |
|
|
122 | (2) |
|
|
124 | (1) |
|
|
125 | (2) |
|
|
127 | (3) |
|
|
130 | (1) |
|
|
131 | (8) |
|
|
131 | (2) |
|
8.2 Partitional Clustering |
|
|
133 | (2) |
|
8.3 Hierarchical Clustering |
|
|
135 | (3) |
|
|
138 | (1) |
|
|
139 | (10) |
|
9.1 The Distance Base Class |
|
|
139 | (1) |
|
|
140 | (1) |
|
|
141 | (1) |
|
9.4 Simple Matching Distance |
|
|
142 | (1) |
|
|
143 | (1) |
|
|
144 | (3) |
|
|
147 | (2) |
|
|
149 | (12) |
|
|
149 | (1) |
|
|
150 | (1) |
|
|
151 | (3) |
|
10.4 A Dummy Clustering Algorithm |
|
|
154 | (4) |
|
|
158 | (3) |
|
|
161 | (22) |
|
|
161 | (3) |
|
11.2 The Double-Key Map Class |
|
|
164 | (3) |
|
11.3 The Dataset Adapters |
|
|
167 | (8) |
|
11.3.1 A CSV Dataset Reader |
|
|
167 | (3) |
|
11.3.2 A Dataset Generator |
|
|
170 | (3) |
|
11.3.3 A Dataset Normalizer |
|
|
173 | (2) |
|
|
175 | (2) |
|
11.4.1 The Join Value Visitor |
|
|
175 | (1) |
|
11.4.2 The Partition Creation Visitor |
|
|
176 | (1) |
|
11.5 The Dendrogram Class |
|
|
177 | (2) |
|
11.6 The Dendrogram Visitor |
|
|
179 | (1) |
|
|
180 | (3) |
|
III Data Clustering Algorithms |
|
|
183 | (140) |
|
12 Agglomerative Hierarchical Algorithms |
|
|
185 | (32) |
|
12.1 Description of the Algorithm |
|
|
185 | (2) |
|
|
187 | (10) |
|
12.2.1 The Single Linkage Algorithm |
|
|
192 | (1) |
|
12.2.2 The Complete Linkage Algorithm |
|
|
192 | (1) |
|
12.2.3 The Group Average Algorithm |
|
|
193 | (1) |
|
12.2.4 The Weighted Group Average Algorithm |
|
|
194 | (1) |
|
12.2.5 The Centroid Algorithm |
|
|
194 | (1) |
|
12.2.6 The Median Algorithm |
|
|
195 | (1) |
|
|
196 | (1) |
|
|
197 | (17) |
|
12.3.1 The Single Linkage Algorithm |
|
|
198 | (2) |
|
12.3.2 The Complete Linkage Algorithm |
|
|
200 | (2) |
|
12.3.3 The Group Average Algorithm |
|
|
202 | (2) |
|
12.3.4 The Weighted Group Average Algorithm |
|
|
204 | (3) |
|
12.3.5 The Centroid Algorithm |
|
|
207 | (3) |
|
12.3.6 The Median Algorithm |
|
|
210 | (2) |
|
|
212 | (2) |
|
|
214 | (3) |
|
|
217 | (12) |
|
13.1 Description of the Algorithm |
|
|
217 | (1) |
|
|
218 | (5) |
|
|
223 | (4) |
|
|
227 | (2) |
|
|
229 | (12) |
|
14.1 Description of the Algorithm |
|
|
229 | (1) |
|
|
230 | (5) |
|
|
235 | (5) |
|
|
240 | (1) |
|
|
241 | (14) |
|
15.1 Description of the Algorithm |
|
|
241 | (1) |
|
|
242 | (4) |
|
|
246 | (7) |
|
|
253 | (2) |
|
16 The k-prototypes Algorithm |
|
|
255 | (10) |
|
16.1 Description of the Algorithm |
|
|
255 | (1) |
|
|
256 | (2) |
|
|
258 | (5) |
|
|
263 | (2) |
|
17 The Genetic k-modes Algorithm |
|
|
265 | (14) |
|
17.1 Description of the Algorithm |
|
|
265 | (2) |
|
|
267 | (7) |
|
|
274 | (3) |
|
|
277 | (2) |
|
|
279 | (12) |
|
18.1 Description of the Algorithm |
|
|
279 | (2) |
|
|
281 | (3) |
|
|
284 | (6) |
|
|
290 | (1) |
|
19 The Gaussian Mixture Algorithm |
|
|
291 | (16) |
|
19.1 Description of the Algorithm |
|
|
291 | (2) |
|
|
293 | (7) |
|
|
300 | (6) |
|
|
306 | (1) |
|
20 A Parallel k-means Algorithm |
|
|
307 | (16) |
|
20.1 Message Passing Interface |
|
|
307 | (3) |
|
20.2 Description of the Algorithm |
|
|
310 | (1) |
|
|
311 | (5) |
|
|
316 | (4) |
|
|
320 | (3) |
|
|
323 | (2) |
|
|
325 | (136) |
|
B.1 Files in Folder ClusLib |
|
|
325 | (3) |
|
B.1.1 Configuration File configure.ac |
|
|
325 | (1) |
|
B.1.2 m4 Macro File ac include. m4 |
|
|
326 | (1) |
|
|
327 | (1) |
|
|
328 | (3) |
|
|
328 | (1) |
|
B.2.2 Macros and typedef Declarations |
|
|
328 | (1) |
|
|
329 | (2) |
|
B.3 Files in Folder cl/algorithms |
|
|
331 | (37) |
|
|
331 | (1) |
|
|
332 | (2) |
|
|
334 | (1) |
|
|
334 | (1) |
|
|
335 | (4) |
|
|
339 | (1) |
|
|
339 | (4) |
|
|
343 | (4) |
|
|
347 | (6) |
|
|
353 | (5) |
|
|
358 | (3) |
|
|
361 | (1) |
|
|
362 | (2) |
|
|
364 | (1) |
|
|
365 | (1) |
|
|
366 | (1) |
|
|
367 | (1) |
|
B.4 Files in Folder cl/clusters |
|
|
368 | (8) |
|
|
368 | (1) |
|
B.4.2 Class CenterCluster |
|
|
368 | (1) |
|
|
369 | (1) |
|
|
370 | (2) |
|
|
372 | (3) |
|
B.4.6 Class SubspaceCluster |
|
|
375 | (1) |
|
B.5 Files in Folder cl/datasets |
|
|
376 | (16) |
|
|
376 | (1) |
|
|
376 | (1) |
|
|
377 | (2) |
|
|
379 | (2) |
|
|
381 | (3) |
|
|
384 | (2) |
|
|
386 | (2) |
|
|
388 | (4) |
|
B.6 Files in Folder cl/distances |
|
|
392 | (6) |
|
|
392 | (1) |
|
|
392 | (1) |
|
B.6.3 Class EuclideanDistance |
|
|
393 | (1) |
|
B.6.4 Class MahalanobisDistance |
|
|
394 | (1) |
|
B.6.5 Class MinkowskiDistance |
|
|
395 | (1) |
|
B.6.6 Class MixedDistance |
|
|
396 | (1) |
|
B.6.7 Class SimpleMatchingDistance |
|
|
397 | (1) |
|
B.7 Files in Folder cl/patterns |
|
|
398 | (10) |
|
|
398 | (1) |
|
B.7.2 Class DendrogramVisitor |
|
|
399 | (2) |
|
|
401 | (2) |
|
|
403 | (1) |
|
|
404 | (1) |
|
|
405 | (1) |
|
B.7.7 Class JoinValueVisitor |
|
|
405 | (2) |
|
|
407 | (1) |
|
B.8 Files in Folder cl/utilities |
|
|
408 | (18) |
|
|
408 | (1) |
|
|
409 | (2) |
|
|
411 | (1) |
|
B.8.4 Class DatasetGenerator |
|
|
411 | (2) |
|
B.8.5 Class DatasetNormalizer |
|
|
413 | (2) |
|
B.8.6 Class DatasetReader |
|
|
415 | (3) |
|
|
418 | (3) |
|
|
421 | (2) |
|
|
423 | (2) |
|
|
425 | (1) |
|
B.9 Files in Folder examples |
|
|
426 | (24) |
|
|
426 | (1) |
|
B.9.2 Agglomerative Hierarchical Algorithms |
|
|
426 | (3) |
|
B.9.3 A Divisive Hierarchical Algorithm |
|
|
429 | (1) |
|
B.9.4 The k-means Algorithm |
|
|
430 | (3) |
|
B.9.5 The c-means Algorithm |
|
|
433 | (2) |
|
B.9.6 The k-prototypes Algorithm |
|
|
435 | (2) |
|
B.9.7 The Genetic k-modes Algorithm |
|
|
437 | (2) |
|
|
439 | (2) |
|
B.9.9 The Ganssian Mixture Clustering Algorithm |
|
|
441 | (3) |
|
B.9.10 A Parallel k-means Algorithm |
|
|
444 | (6) |
|
B.10 Files in Folder test-suite |
|
|
450 | (11) |
|
|
450 | (1) |
|
B.10.2 The Master Test Suite |
|
|
451 | (1) |
|
|
451 | (2) |
|
|
453 | (1) |
|
|
454 | (2) |
|
|
456 | (2) |
|
|
458 | (1) |
|
|
459 | (2) |
|
|
461 | (8) |
|
C.1 An Introduction to Makefiles |
|
|
461 | (2) |
|
|
461 | (1) |
|
|
462 | (1) |
|
|
463 | (2) |
|
|
463 | (1) |
|
C.2.2 Boost for Cygwin or Linux |
|
|
464 | (1) |
|
|
465 | (1) |
|
|
465 | (1) |
|
C.5 Installing MPICH2 and Boost MPI |
|
|
466 | (3) |
Bibliography |
|
469 | (18) |
Author Index |
|
487 | (6) |
Subject Index |
|
493 | |