|
1 On Some Facets of the Partition Set of a Finite Set |
|
|
1 | (60) |
|
1.1 Lattice of Partition Set of a Finite Set |
|
|
1 | (17) |
|
1.1.1 Definition and General Properties |
|
|
1 | (8) |
|
|
9 | (9) |
|
1.2 Partitions of an Integer |
|
|
18 | (3) |
|
|
18 | (2) |
|
|
20 | (1) |
|
1.3 Type of a Partition and Cardinality of the Associated Equivalence Binary Relation |
|
|
21 | (9) |
|
1.4 Ultrametric Spaces and Partition Chain Representation |
|
|
30 | (22) |
|
1.4.1 Definition and Properties of Ultrametric Spaces |
|
|
30 | (3) |
|
1.4.2 Partition Lattice Chains of a Finite Set and the Associated Ultrametric Spaces |
|
|
33 | (4) |
|
1.4.3 Partition Lattice Chains and the Associated Ultrametric Preordonances |
|
|
37 | (2) |
|
1.4.4 Partition Hierarchies and Dendrograms |
|
|
39 | (6) |
|
1.4.5 From a Symmetrical Binary Hierarchy to a Directed Binary Hierarchy |
|
|
45 | (7) |
|
1.5 Polyhedral Representation of the Partition Set of a Finite Set |
|
|
52 | (9) |
|
|
58 | (3) |
|
2 Two Methods of Non-hierarchical Clustering |
|
|
61 | (40) |
|
|
61 | (1) |
|
2.2 Central Partition Method |
|
|
62 | (18) |
|
2.2.1 Data Structure and Clustering Criterion |
|
|
62 | (7) |
|
2.2.2 Transfer Algorithm and Central Partition |
|
|
69 | (3) |
|
2.2.3 Objects with the Same Representation |
|
|
72 | (2) |
|
2.2.4 Statistical Asymptotic Analysis |
|
|
74 | (4) |
|
2.2.5 Remarks on the Application of the Central Partition Method and Developments |
|
|
78 | (2) |
|
2.3 Dynamic and Adaptative Clustering Method |
|
|
80 | (21) |
|
2.3.1 Data Structure and Clustering Criterion |
|
|
80 | (4) |
|
2.3.2 The AT-Means Algorithm |
|
|
84 | (2) |
|
2.3.3 Dynamic Cluster Algorithm |
|
|
86 | (5) |
|
2.3.4 Following the Definition of the Algorithm |
|
|
91 | (7) |
|
|
98 | (3) |
|
3 Structure and Mathematical Representation of Data |
|
|
101 | (48) |
|
3.1 Objects, Categories and Attributes |
|
|
101 | (2) |
|
3.2 Representation of the Attributes of Type I |
|
|
103 | (6) |
|
3.2.1 The Boolean Attribute |
|
|
104 | (1) |
|
3.2.2 The Numerical Attribute |
|
|
105 | (2) |
|
3.2.3 Defining a Categorical Attribute from a Numerical One |
|
|
107 | (2) |
|
3.3 Representation of the Attributes of Type II |
|
|
109 | (12) |
|
3.3.1 The Nominal Categorical Attribute |
|
|
110 | (3) |
|
3.3.2 The Ordinal Categorical Attribute |
|
|
113 | (3) |
|
3.3.3 The Ranking Attribute |
|
|
116 | (2) |
|
3.3.4 The Categorical Attribute Valuated by a Numerical Similarity |
|
|
118 | (2) |
|
3.3.5 The Valuated Binary Relation Attribute |
|
|
120 | (1) |
|
3.4 Representation of the Attributes of Type III |
|
|
121 | (16) |
|
3.4.1 The Preordonance Categorical Attribute |
|
|
121 | (3) |
|
3.4.2 The Taxonomic Categorical Attribute |
|
|
124 | (5) |
|
3.4.3 The Taxonomic Preordonance Attribute |
|
|
129 | (3) |
|
3.4.4 Coding the Different Attributes in Terms of Preordonance or Similarity Categorical Attributes |
|
|
132 | (5) |
|
3.5 Attribute Representations When Describing a Set C of Categories |
|
|
137 | (12) |
|
|
137 | (1) |
|
3.5.2 Attributes of Type I |
|
|
138 | (1) |
|
3.5.3 Nominal or Ordinal Categorical Attributes |
|
|
138 | (4) |
|
3.5.4 Ordinal (preordonance) or Numerical Similarity Categorical Attributes |
|
|
142 | (1) |
|
3.5.5 The Data Table: A Tarski System Tor a Statistical System S |
|
|
143 | (3) |
|
|
146 | (3) |
|
4 Ordinal and Metrical Analysis of the Resemblance Notion |
|
|
149 | (50) |
|
|
149 | (3) |
|
4.2 Formal Analysis in the Case of a Description of an Object Set O by Attributes of Type I; Extensions |
|
|
152 | (26) |
|
4.2.1 Similarity Index in the Case of Boolean Data |
|
|
152 | (13) |
|
4.2.2 Preordonance Associated with a Similarity Index in the Case of Boolean Data |
|
|
165 | (13) |
|
4.3 Extension of the Indices Defined in the Boolean Case to Attributes of Type II or III |
|
|
178 | (21) |
|
|
178 | (2) |
|
4.3.2 Comparing Nominal Categorical Attributes |
|
|
180 | (3) |
|
4.3.3 Comparing Ordinal Categorical Attributes |
|
|
183 | (8) |
|
4.3.4 Comparing Preordonance Categorical Attributes |
|
|
191 | (5) |
|
|
196 | (3) |
|
5 Comparing Attributes by Probabilistic and Statistical Association I |
|
|
199 | (52) |
|
|
199 | (2) |
|
5.2 Comparing Attributes of Type I for an Object Set Description by the Likelihood Linkage Analysis Approach |
|
|
201 | (32) |
|
|
201 | (20) |
|
5.2.2 Comparing Numerical Attributes in the LLA approach |
|
|
221 | (12) |
|
5.3 Comparing Attributes for a Description of a Set of Categories |
|
|
233 | (18) |
|
|
233 | (1) |
|
5.3.2 Case of a Description by Boolean Attributes |
|
|
234 | (8) |
|
5.3.3 Comparing Distributions of Numerical, Ordinal Categorical and Nominal Categorical Attributes |
|
|
242 | (5) |
|
|
247 | (4) |
|
6 Comparing Attributes by a Probabilistic and Statistical Association II |
|
|
251 | (74) |
|
|
251 | (1) |
|
6.2 Comparing Attributes of Type II for an Object Set Description; the LLA Approach |
|
|
252 | (73) |
|
6.2.1 Introduction; Alternatives in Normalizing Association Coefficients |
|
|
252 | (4) |
|
6.2.2 Comparing Two Ranking Attributes |
|
|
256 | (5) |
|
6.2.3 Comparing Two Nominal Categorical Attributes |
|
|
261 | (15) |
|
6.2.4 Comparing Two Ordinal Categorical Attributes |
|
|
276 | (10) |
|
6.2.5 Comparing Two Valuated Binary Relation Attributes |
|
|
286 | (23) |
|
6.2.6 From the Total Association to the Partial One |
|
|
309 | (12) |
|
|
321 | (4) |
|
7 Comparing Objects or Categories Described by Attributes |
|
|
325 | (32) |
|
|
325 | (3) |
|
7.2 Comparing Objects or Categories by the LLA Method |
|
|
328 | (29) |
|
7.2.1 The Outline of the LLA Method for Comparing Objects or Categories |
|
|
328 | (3) |
|
7.2.2 Similarity Index Between Objects Described by Numerical or Boolean Attributes |
|
|
331 | (3) |
|
7.2.3 Similarity Index Between Objects Described by Nominal or Ordinal Categorical Attributes |
|
|
334 | (4) |
|
7.2.4 Similarity Index Between Objects Described by Preordonance or Valuated Categorical Attributes |
|
|
338 | (3) |
|
7.2.5 Similarity Index Between Objects Described by Taxonomic Attributes. A Solution for the Classification Consensus Problem |
|
|
341 | (3) |
|
7.2.6 Similarity Index Between Objects Described by a Mixed Attribute Types: Heterogenous Description |
|
|
344 | (1) |
|
7.2.7 The Goodall Similarity Index |
|
|
345 | (4) |
|
7.2.8 Similarity Index Between Rows of a Juxtaposition of Contingency Tables |
|
|
349 | (4) |
|
7.2.9 Other Similarity Indices on the Row Set I of a Contingency Table |
|
|
353 | (2) |
|
|
355 | (2) |
|
8 The Notion of "Natural" Class, Tools for Its Interpretation. The Classifiability Concept |
|
|
357 | (78) |
|
8.1 Introduction; Monothetic Class and Polythetic Class |
|
|
357 | (6) |
|
8.1.1 The Intuitive Approaches of Beckner and Adanson; from Beckner to Adanson |
|
|
360 | (3) |
|
8.2 Discriminating a Cluster of Objects by a Descriptive Attribute |
|
|
363 | (6) |
|
|
363 | (1) |
|
8.2.2 Case of Attributes of Type I: Numerical and Boolean |
|
|
364 | (2) |
|
8.2.3 Discrimination a Partition by a Categorical Attribute |
|
|
366 | (3) |
|
8.3 "Responsibility" Degree of an Object in an Attribute Cluster Formation |
|
|
369 | (8) |
|
8.3.1 A is Composed of Attributes of Type I |
|
|
370 | (5) |
|
8.3.2 The Attribute Set A is Composed of Categorical or Ranking Attributes |
|
|
375 | (2) |
|
8.4 Rows or Columns of Contingency Tables |
|
|
377 | (5) |
|
8.4.1 Case of a Single Contingency Table |
|
|
377 | (4) |
|
8.4.2 Case of an Horizontal Juxtaposition of Contingency Tables |
|
|
381 | (1) |
|
8.5 On Two Ways of Measuring the "Importance" of a Descriptive Attribute |
|
|
382 | (9) |
|
|
382 | (4) |
|
8.5.2 Comparing Clustering "Importance" and Projective "Importance" of a Descriptive Attribute |
|
|
386 | (5) |
|
8.6 Crossing Fuzzy Categorical Attributes or Fuzzy Classifications (Clusterings) |
|
|
391 | (28) |
|
8.6.1 General Introduction |
|
|
391 | (3) |
|
8.6.2 Crossing Net Classifications; Introduction to Other Crossings |
|
|
394 | (6) |
|
8.6.3 Crossing a Net and a Fuzzy Dichotomous Classifications |
|
|
400 | (4) |
|
8.6.4 Crossing Two Fuzzy Dichotomous Classifications |
|
|
404 | (4) |
|
8.6.5 Crossing Two Typologies |
|
|
408 | (3) |
|
8.6.6 Extension to Crossing Fuzzy Relational Categorical Attributes |
|
|
411 | (8) |
|
|
419 | (16) |
|
|
419 | (1) |
|
8.7.2 Discrepancy Between the Preordonance Structure and that Ultrametric, on a Data Set |
|
|
420 | (6) |
|
8.7.3 Classifiability Distribution Under a Random Hypothesis of Non-ultrametricity |
|
|
426 | (5) |
|
8.7.4 The Murtagh Contribution |
|
|
431 | (1) |
|
|
432 | (3) |
|
9 Quality Measures in Clustering |
|
|
435 | (78) |
|
|
435 | (3) |
|
9.2 The Direct Clustering Approach: An Example of a Criterion |
|
|
438 | (5) |
|
9.2.1 General Presentation |
|
|
438 | (2) |
|
|
440 | (3) |
|
9.3 Quality of a Partition Based on the Pairwise Similarities |
|
|
443 | (46) |
|
9.3.1 Criteria Based on a Data Preordonance |
|
|
444 | (7) |
|
9.3.2 Approximating a Symmetrical Binary Relation by an Equivalence Relation: The Zahn Problem |
|
|
451 | (5) |
|
9.3.3 Comparing Two Basic Criteria |
|
|
456 | (12) |
|
9.3.4 Distribution of the Intersection Criterion on the Partition Set with a Fixed Type |
|
|
468 | (6) |
|
9.3.5 Extensions of the Previous Criterion |
|
|
474 | (9) |
|
9.3.6 "Significant Levels" and "Significant Nodes" of a Classification Tree |
|
|
483 | (6) |
|
9.4 Measuring the Fitting Quality of a Partition Chain (Classification Tree) |
|
|
489 | (24) |
|
|
489 | (1) |
|
9.4.2 Generalization of the Set Theoretic and Metrical Criteria |
|
|
490 | (3) |
|
9.4.3 Distribution of the Cardinality of the Graph Intersection Criterion |
|
|
493 | (9) |
|
9.4.4 Pure Ordinal Criteria: The Lateral Order and the Lexicographic Order Criteria |
|
|
502 | (2) |
|
9.4.5 Lexicographic Ranking and Inversion Number Criteria |
|
|
504 | (7) |
|
|
511 | (2) |
|
10 Building a Classification Tree |
|
|
513 | (70) |
|
|
513 | (6) |
|
10.2 "Lexicographic" Ordinal Algorithm |
|
|
519 | (8) |
|
10.2.1 Definition of an Ultrametric Preordonance Associated with a Preordonance Data |
|
|
519 | (2) |
|
10.2.2 Algorithm for Determining ωu Defined by the H Function |
|
|
521 | (2) |
|
10.2.3 Property of Optimality |
|
|
523 | (1) |
|
10.2.4 Case Where ω Is a Total Ordonance |
|
|
524 | (3) |
|
10.3 Ascendant Agglomerative Hierarchical Clustering Algorithm; Classical Aggregation Criteria |
|
|
527 | (8) |
|
|
527 | (1) |
|
10.3.2 "Single Linkage", "Complete Linkage" and "Average Linkage" Criteria |
|
|
528 | (2) |
|
10.3.3 "Inertia Variation (or Ward) Criterion" |
|
|
530 | (4) |
|
10.3.4 From "Lexicographic" Ordinal Algorithm to "Single Linkage" or "Maximal Link" Algorithm |
|
|
534 | (1) |
|
10.4 AAHC Algorithms; Likelihood Linkage Criteria |
|
|
535 | (14) |
|
10.4.1 Family of Criteria of the Maximal Likelihood Linkage |
|
|
535 | (10) |
|
10.4.2 Minimal Likelihood Linkage and Average Likelihood Linkage in the LLA Analysis |
|
|
545 | (4) |
|
10.5 AAHC for Clustering Rows or Columns of a Contingency Table |
|
|
549 | (6) |
|
|
549 | (1) |
|
10.5.2 Chi Square Criterion: A Transposition of the Ward Criterion |
|
|
550 | (2) |
|
10.5.3 Mutual Information Criterion |
|
|
552 | (3) |
|
10.6 Efficient Algorithms in Ascendant Agglomerative Hierarchical Classification (Clustering) |
|
|
555 | (28) |
|
|
555 | (3) |
|
10.6.2 Complexity Considerations of the Basic AAHC Algorithm |
|
|
558 | (2) |
|
10.6.3 Reactualization Formulas in the Cases of Binary and Multiple Aggregations |
|
|
560 | (6) |
|
10.6.4 Reducibility, Monotonic Criterion, Reducible Neighborhoods and Reciprocal Nearest Neighborhoods |
|
|
566 | (6) |
|
10.6.5 Ascendant Agglomerative Hierarchical Clustering (AAHC) Under a Contiguity Constraint |
|
|
572 | (4) |
|
10.6.6 Ascendant Agglomerative Parallel Hierarchical Clustering |
|
|
576 | (4) |
|
|
580 | (3) |
|
11 Applying the LLA Method to Real Data |
|
|
583 | (56) |
|
11.1 Introduction: the CHAVL Software (Classification Hierarchique par Analyse de la Vraisemblance des Liens) |
|
|
583 | (3) |
|
11.2 Real Data: Outline Presentation of Some Processings |
|
|
586 | (4) |
|
11.3 Types of Child Characters Through Children's Literature |
|
|
590 | (10) |
|
11.3.1 Preamble: Technical Data Sheet |
|
|
590 | (1) |
|
11.3.2 General Objective and Data Description |
|
|
591 | (2) |
|
11.3.3 Profiles Extracted from the Classification Tree on A |
|
|
593 | (3) |
|
|
596 | (1) |
|
11.3.5 Standardized Association Coefficient with Respect to the Hypergeometric Model |
|
|
596 | (1) |
|
11.3.6 Return to Individuals |
|
|
597 | (3) |
|
11.4 Dayhoff, Henikoffs and LLA Matrices for Comparing Proteic Sequences |
|
|
600 | (29) |
|
11.4.1 Preamble: Technical Data Sheet |
|
|
600 | (1) |
|
|
601 | (2) |
|
11.4.3 Construction of the Dayhoff Matrix |
|
|
603 | (9) |
|
11.4.4 The Henikoffs Matrix: Comparison with the Dayhoff Matrix |
|
|
612 | (4) |
|
|
616 | (3) |
|
11.4.6 LLA Similarity Index on a Set of Proteic Aligned Sequences |
|
|
619 | (6) |
|
|
625 | (4) |
|
11.5 Specific Results in Clustering Categorical Attributes by LLA Methodology |
|
|
629 | (10) |
|
11.5.1 Structuring the Sets of Values of Categorical Attributes |
|
|
629 | (3) |
|
11.5.2 From Total Associations Between Categorical Attributes to Partial Ones |
|
|
632 | (5) |
|
|
637 | (2) |
|
12 Conclusion and Thoughts for Future Works |
|
|
639 | |
|
12.1 Contribution to Challenges in Cluster Analysis |
|
|
639 | (2) |
|
12.2 Around Two Books Concerning Relational Aspects |
|
|
641 | (2) |
|
12.3 Developments in the Framework of the LLA Approach |
|
|
643 | (3) |
|
12.3.1 Principal Component Analysis |
|
|
643 | (1) |
|
12.3.2 Multidimensional Scaling |
|
|
644 | (1) |
|
12.3.3 In What LLA Hierarchical Clustering Method Is a Probabilistic Method? |
|
|
645 | (1) |
|
12.3.4 Semi-supervised Hierarchical Classification |
|
|
645 | (1) |
|
|
646 | |
|
|
646 | |