
E-book: Clustering: A Data Recovery Approach, Second Edition

  • Length: 374 pages
  • Publication date: 19-Apr-2016
  • Publisher: Chapman & Hall/CRC
  • Language: English
  • ISBN-13: 9781439838426
  • Format: PDF+DRM
  • Price: 80.59 €*
  • * the price is final, i.e. no further discounts apply
  • This e-book is for personal use only. E-books are non-returnable.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has supplied this e-book in encrypted form, which means that you need to install special software in order to read it. You also need to create an Adobe ID. More information here. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, install Adobe Digital Editions (a free application designed specifically for reading e-books; do not confuse it with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

Often considered more of an art than a science, books on clustering have been dominated by learning through example, with techniques chosen almost through trial and error. Even the two most popular, and most related, clustering methods, K-Means for partitioning and Ward's method for hierarchical clustering, have lacked the theoretical underpinning required to establish a firm relationship between the two methods and relevant interpretation aids. Other approaches, such as spectral clustering or consensus clustering, are considered absolutely unrelated to each other or to the two methods mentioned above.
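The two methods named above are standard textbook material, so a minimal generic sketch can illustrate what K-Means computes; this is not code from the book, only a plain NumPy illustration on synthetic data, seeded with the MaxMin ("deviate centroids") idea that the book's table of contents mentions:

```python
import numpy as np

def maxmin_init(X, k):
    """MaxMin seeding: start from the first point, then repeatedly add
    the point farthest from the centroids chosen so far."""
    centroids = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    return np.array(centroids)

def kmeans(X, k, iters=100):
    """Generic K-Means: alternate nearest-centroid assignment and
    centroid (mean) update until the centroids stop moving."""
    centroids = maxmin_init(X, k)
    for _ in range(iters):
        # assign each point to its nearest centroid (squared Euclidean distance)
        labels = np.argmin(((X[:, None, :] - centroids[None]) ** 2).sum(axis=2), axis=1)
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# two well-separated synthetic blobs, around (0, 0) and around (5, 5)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
               rng.normal(5.0, 0.1, (20, 2))])
labels, centroids = kmeans(X, 2)
```

With this data the two blobs end up in two distinct clusters; Ward's method would build the same kind of partition bottom-up by merging the pair of clusters that least increases the within-cluster sum of squares.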

Clustering: A Data Recovery Approach, Second Edition presents a unified modeling approach for the most popular clustering methods: the K-Means and hierarchical techniques, especially for divisive clustering. It significantly expands coverage of the mathematics of data recovery, and includes a new chapter covering recent popular network clustering approaches (spectral, modularity and uniform, additive, and consensus) treated within the same data recovery approach. Another added chapter covers cluster validation and interpretation, including recent developments for ontology-driven interpretation of clusters. Altogether, the insertions added a hundred pages to the book, even though fragments unrelated to the main topics were removed.

Illustrated using a set of small real-world datasets and more than a hundred examples, the book is oriented towards students, practitioners, and theoreticians of cluster analysis. Covering topics that are beyond the scope of most texts, the author's explanations of data recovery methods, theory-based advice, pre- and post-processing issues, and his clear, practical instructions for real-world data mining make this book ideally suited for teaching, self-study, and professional reference.

Reviews

"This book represents the second edition, aiming to consolidate, strengthen, and extend the presentation of K-means partitioning and Ward hierarchical clustering by adding new material such as five equivalent formulations for K-means, usage of split base vectors in hierarchical clustering, an effective version of least-squares divisive clustering, consensus clustering, etc. In addition, the book presents state-of-the-art material on validation and interpretation of clusters. The book is intended for teaching, self-study, and professional use." -Marina Gorunescu, Zentralblatt MATH 1297 "The second edition is a refinement of Mirkin's well-received first edition. ... an excellent starting point for those interested in the algorithmic underpinning and theory of cluster analysis..." -Journal of the American Statistical Association, June 2014 Praise for the First Edition: "The particular decomposition studied in this book is the decomposition of the total sum of squares matrix into, between, and within cluster components, and the book develops this decomposition, and its associated diagnostics, further than I have seen them developed for cluster analysis before. Overall, the book presents an unusual ... approach to cluster analysis, from the perspective of someone who is clearly an enthusiast for the insights these tools can bring to understanding data." -D.J. Hand, Short Book Reviews of the ISI

Contents

Preface to the Second Edition xi
Preface to the First Edition xiii
Acknowledgments xix
Examples xxi
1 What Is Clustering?  1(38)
Key Concepts  1(1)
1.1 Case Study Problems  2(26)
1.1.1 Structuring  3(1)
1.1.1.1 Market Towns  3(2)
1.1.1.2 Primates and Human Origin  5(1)
1.1.1.3 Gene Presence-Absence Profiles  6(2)
1.1.1.4 Knowledge Structure: Algebraic Functions  8(2)
1.1.2 Description  10(1)
1.1.2.1 Describing Iris Genera  10(2)
1.1.2.2 Body Mass  12(1)
1.1.3 Association  12(1)
1.1.3.1 Digits and Patterns of Confusion between Them  13(3)
1.1.3.2 Colleges  16(2)
1.1.4 Generalization  18(4)
1.1.5 Visualization of Data Structure  22(1)
1.1.5.1 One-Dimensional Data  22(1)
1.1.5.2 One-Dimensional Data within Groups  23(1)
1.1.5.3 Two-Dimensional Display  24(1)
1.1.5.4 Block Structure  25(2)
1.1.5.5 Structure  27(1)
1.1.5.6 Visualization Using an Inherent Topology  27(1)
1.2 Bird's-Eye View  28(11)
1.2.1 Definition: Data and Cluster Structure  28(1)
1.2.1.1 Data  28(1)
1.2.1.2 Cluster Structure  29(1)
1.2.2 Criteria for Obtaining a Good Cluster Structure  30(1)
1.2.3 Three Types of Cluster Description  31(1)
1.2.4 Stages of a Clustering Application  32(1)
1.2.5 Clustering and Other Disciplines  33(1)
1.2.6 Different Perspectives of Clustering  34(1)
1.2.6.1 Classical Statistics Perspective  34(1)
1.2.6.2 Machine-Learning Perspective  35(1)
1.2.6.3 Data-Mining Perspective  35(1)
1.2.6.4 Classification and Knowledge-Discovery Perspective  36(3)
2 What Is Data?  39(48)
Key Concepts  39(2)
2.1 Feature Characteristics  41(7)
2.1.1 Feature Scale Types  41(2)
2.1.2 Quantitative Case  43(4)
2.1.3 Categorical Case  47(1)
2.2 Bivariate Analysis  48(14)
2.2.1 Two Quantitative Variables  49(2)
2.2.2 Nominal and Quantitative Variables  51(1)
2.2.3 Two Nominal Variables Cross Classified  52(6)
2.2.4 Relation between the Correlation and Contingency Measures  58(1)
2.2.5 Meaning of the Correlation  59(3)
2.3 Feature Space and Data Scatter  62(4)
2.3.1 Data Matrix  62(1)
2.3.2 Feature Space: Distance and Inner Product  63(3)
2.3.3 Data Scatter  66(1)
2.4 Pre-Processing and Standardizing Mixed Data  66(7)
2.5 Similarity Data  73(14)
2.5.1 General  73(1)
2.5.2 Contingency and Redistribution Tables  74(3)
2.5.3 Affinity and Kernel Data  77(2)
2.5.4 Network Data  79(1)
2.5.5 Similarity Data Pre-Processing  80(1)
2.5.5.1 Removal of Low Similarities: Thresholding  81(1)
2.5.5.2 Subtraction of Background Noise  82(1)
2.5.5.3 Laplace Transformation  83(4)
3 K-Means Clustering and Related Approaches  87(46)
Key Concepts  87(2)
3.1 Conventional K-Means  89(9)
3.1.1 Generic K-Means  89(4)
3.1.2 Square Error Criterion  93(3)
3.1.3 Incremental Versions of K-Means  96(2)
3.2 Choice of K and Initialization of K-Means  98(12)
3.2.1 Conventional Approaches to Initial Setting  98(1)
3.2.1.1 Random Selection of Centroids  99(1)
3.2.1.2 Expert-Driven Selection of Centroids  99(1)
3.2.2 MaxMin for Producing Deviate Centroids  100(2)
3.2.3 Anomalous Centroids with Anomalous Pattern  102(2)
3.2.4 Anomalous Centroids with Method Build  104(2)
3.2.5 Choosing the Number of Clusters at the Post-Processing Stage  106(1)
3.2.5.1 Variance-Based Approach  106(1)
3.2.5.2 Within-Cluster Cohesion versus Between-Cluster Separation  107(1)
3.2.5.3 Combining Multiple Clusterings  108(1)
3.2.5.4 Resampling Methods  108(1)
3.2.5.5 Data Structure or Granularity Level?  109(1)
3.3 Intelligent K-Means: Iterated Anomalous Pattern  110(4)
3.4 Minkowski Metric K-Means and Feature Weighting  114(6)
3.4.1 Minkowski Distance and Minkowski Centers  114(2)
3.4.2 Feature Weighting at Minkowski Metric K-Means  116(4)
3.5 Extensions of K-Means Clustering  120(11)
3.5.1 Clustering Criteria and Implementation  120(2)
3.5.2 Partitioning around Medoids  122(1)
3.5.3 Fuzzy Clustering  123(2)
3.5.4 Regression-Wise Clustering  125(1)
3.5.5 Mixture of Distributions and EM Algorithm  126(3)
3.5.6 Kohonen Self-Organizing Maps  129(2)
3.6 Overall Assessment  131(2)
4 Least-Squares Hierarchical Clustering  133(28)
Key Concepts  133(1)
4.1 Hierarchical Cluster Structures  134(3)
4.2 Agglomeration: Ward Algorithm  137(4)
4.3 Least-Squares Divisive Clustering  141(11)
4.3.1 Ward Criterion and Distance  141(2)
4.3.2 Bisecting K-Means: 2-Splitting  143(1)
4.3.3 Splitting by Separation  144(3)
4.3.4 Principal Direction Partitioning  147(2)
4.3.5 Beating the Noise by Randomness  149(2)
4.3.6 Gower's Controversy  151(1)
4.4 Conceptual Clustering  152(4)
4.5 Extensions of Ward Clustering  156(2)
4.5.1 Agglomerative Clustering with Dissimilarity Data  156(1)
4.5.2 Hierarchical Clustering for Contingency Data  156(2)
4.6 Overall Assessment  158(3)
5 Similarity Clustering: Uniform, Modularity, Additive, Spectral, Consensus, and Single Linkage  161(40)
Key Concepts  161(3)
5.1 Summary Similarity Clustering  164(10)
5.1.1 Summary Similarity Clusters at Genuine Similarity Data  165(2)
5.1.2 Summary Similarity Criterion at Flat Network Data  167(5)
5.1.3 Summary Similarity Clustering at Affinity Data  172(2)
5.2 Normalized Cut and Spectral Clustering  174(5)
5.3 Additive Clustering  179(8)
5.3.1 Additive Cluster Model  179(1)
5.3.2 One-by-One Additive Clustering Strategy  180(7)
5.4 Consensus Clustering  187(8)
5.4.1 Ensemble and Combined Consensus Concepts  187(7)
5.4.2 Experimental Verification of Least-Squares Consensus Methods  194(1)
5.5 Single Linkage, Minimum Spanning Tree, and Connected Components  195(4)
5.6 Overall Assessment  199(2)
6 Validation and Interpretation  201(60)
Key Concepts  201(2)
6.1 General: Internal and External Validity  203(1)
6.2 Testing Internal Validity  204(9)
6.2.1 Scoring Correspondence between Clusters and Data  204(1)
6.2.1.1 Measures of Cluster Cohesion versus Isolation  204(1)
6.2.1.2 Indexes Derived Using the Data Recovery Approach  205(1)
6.2.1.3 Indexes Derived from Probabilistic Clustering Models  206(1)
6.2.2 Resampling Data for Validation  206(4)
6.2.3 Cross Validation of iK-Means Results  210(3)
6.3 Interpretation Aids in the Data Recovery Perspective  213(16)
6.3.1 Conventional Interpretation Aids  213(1)
6.3.2 Contribution and Relative Contribution Tables  214(7)
6.3.3 Cluster Representatives  221(3)
6.3.4 Measures of Association from ScaD Tables  224(1)
6.3.4.1 Quantitative Feature Case: Correlation Ratio  224(1)
6.3.4.2 Categorical Feature Case: Chi-Squared and Other Contingency Coefficients  224(2)
6.3.5 Interpretation Aids for Cluster Up-Hierarchies  226(3)
6.4 Conceptual Description of Clusters  229(12)
6.4.1 False Positives and Negatives  229(1)
6.4.2 Describing a Cluster with Production Rules  230(1)
6.4.3 Comprehensive Conjunctive Description of a Cluster  231(3)
6.4.4 Describing a Partition with Classification Trees  234(7)
6.5 Mapping Clusters to Knowledge  241(19)
6.5.1 Mapping a Cluster to Category  241(2)
6.5.2 Mapping between Partitions  243(4)
6.5.2.1 Match-Based Similarity versus Quetelet Association  247(1)
6.5.2.2 Average Distance in a Set of Partitions  248(1)
6.5.3 External Tree  249(1)
6.5.4 External Taxonomy  250(4)
6.5.5 Lifting Method  254(6)
6.6 Overall Assessment  260(1)
7 Least-Squares Data Recovery Clustering Models  261(68)
Key Concepts  261(2)
7.1 Statistics Modeling as Data Recovery  263(12)
7.1.1 Data Recovery Equation  263(1)
7.1.2 Averaging  264(1)
7.1.3 Linear Regression  265(1)
7.1.4 Principal Component Analysis  266(5)
7.1.5 Correspondence Factor Analysis  271(3)
7.1.6 Data Summarization versus Learning in Data Recovery  274(1)
7.2 K-Means as a Data Recovery Method  275(13)
7.2.1 Clustering Equation and Data Scatter Decomposition  275(1)
7.2.2 Contributions of Clusters, Features and Entities  276(1)
7.2.3 Correlation Ratio as Contribution  277(1)
7.2.4 Partition Contingency Coefficients  278(1)
7.2.5 Equivalent Reformulations of the Least-Squares Clustering Criterion  279(3)
7.2.6 Principal Cluster Analysis: Anomalous Pattern Clustering Method  282(2)
7.2.7 Weighting Variables in K-Means Model and Minkowski Metric  284(4)
7.3 Data Recovery Models for Hierarchical Clustering  288(15)
7.3.1 Data Recovery Models with Cluster Hierarchies  288(1)
7.3.2 Covariances, Variances and Data Scatter Decomposed  289(2)
7.3.3 Split Base Vectors and Matrix Equations for the Data Recovery Model  291(1)
7.3.4 Divisive Partitioning: Four Splitting Algorithms  292(1)
7.3.4.1 Bisecting K-Means, or 2-Splitting  293(1)
7.3.4.2 Principal Direction Division  294(2)
7.3.4.3 Conceptual Clustering  296(2)
7.3.4.4 Separating a Cluster  298(1)
7.3.5 Organizing an Up-Hierarchy: To Split or Not to Split  299(1)
7.3.6 A Straightforward Proof of the Equivalence between Bisecting K-Means and Ward Criteria  300(1)
7.3.7 Anomalous Pattern versus Splitting  301(2)
7.4 Data Recovery Models for Similarity Clustering  303(13)
7.4.1 Cut, Normalized Cut, and Spectral Clustering  303(4)
7.4.2 Similarity Clustering Induced by K-Means and Ward Criteria  307(1)
7.4.2.1 All Clusters at Once  308(1)
7.4.2.2 Hierarchical Clustering  309(2)
7.4.2.3 One-by-One Clustering  311(2)
7.4.3 Additive Clustering  313(2)
7.4.4 Agglomeration and Aggregation of Contingency Data  315(1)
7.5 Consensus and Ensemble Clustering  316(10)
7.5.1 Ensemble Clustering  316(5)
7.5.2 Combined Consensus Clustering  321(2)
7.5.3 Concordant Partition  323(1)
7.5.4 Muchnik's Consensus Partition Test  324(2)
7.5.5 Algorithms for Consensus Partition  326(1)
7.6 Overall Assessment  326(3)
References  329(12)
Index  341
Boris Mirkin is a professor of computer science at the University of London, UK.