Preface to the Second Edition  xi
Preface to the First Edition  xiii
Acknowledgments  xix
Examples  xxi
|
|
1 What Is Clustering  1
1.1.1.2 Primates and Human Origin  5
1.1.1.3 Gene Presence-Absence Profiles  6
1.1.1.4 Knowledge Structure: Algebraic Functions  8
1.1.2.1 Describing Iris Genera  10
1.1.3.1 Digits and Patterns of Confusion between Them  13
1.1.5 Visualization of Data Structure  22
1.1.5.1 One-Dimensional Data  22
1.1.5.2 One-Dimensional Data within Groups  23
1.1.5.3 Two-Dimensional Display  24
1.1.5.6 Visualization Using an Inherent Topology  27
1.2.1 Definition: Data and Cluster Structure  28
1.2.1.1 Data  28
1.2.1.2 Cluster Structure  29
1.2.2 Criteria for Obtaining a Good Cluster Structure  30
1.2.3 Three Types of Cluster Description  31
1.2.4 Stages of a Clustering Application  32
1.2.5 Clustering and Other Disciplines  33
1.2.6 Different Perspectives of Clustering  34
1.2.6.1 Classical Statistics Perspective  34
1.2.6.2 Machine-Learning Perspective  35
1.2.6.3 Data-Mining Perspective  35
1.2.6.4 Classification and Knowledge-Discovery Perspective  36

2 What Is Data  39
2.1 Feature Characteristics  41
2.1.1 Feature Scale Types  41
2.2.1 Two Quantitative Variables  49
2.2.2 Nominal and Quantitative Variables  51
2.2.3 Two Nominal Variables Cross Classified  52
2.2.4 Relation between the Correlation and Contingency Measures  58
2.2.5 Meaning of the Correlation  59
2.3 Feature Space and Data Scatter  62
2.3.2 Feature Space: Distance and Inner Product  63
2.4 Pre-Processing and Standardizing Mixed Data  66
2.5.2 Contingency and Redistribution Tables  74
2.5.3 Affinity and Kernel Data  77
2.5.5 Similarity Data Pre-Processing  80
2.5.5.1 Removal of Low Similarities: Thresholding  81
2.5.5.2 Subtraction of Background Noise  82
2.5.5.3 Laplace Transformation  83

3 K-Means Clustering and Related Approaches  87
3.1.2 Square Error Criterion  93
3.1.3 Incremental Versions of K-Means  96
3.2 Choice of K and Initialization of K-Means  98
3.2.1 Conventional Approaches to Initial Setting  98
3.2.1.1 Random Selection of Centroids  99
3.2.1.2 Expert-Driven Selection of Centroids  99
3.2.2 MaxMin for Producing Deviate Centroids  100
3.2.3 Anomalous Centroids with Anomalous Pattern  102
3.2.4 Anomalous Centroids with Method Build  104
3.2.5 Choosing the Number of Clusters at the Post-Processing Stage  106
3.2.5.1 Variance-Based Approach  106
3.2.5.2 Within-Cluster Cohesion versus Between-Cluster Separation  107
3.2.5.3 Combining Multiple Clusterings  108
3.2.5.4 Resampling Methods  108
3.2.5.5 Data Structure or Granularity Level?  109
3.3 Intelligent K-Means: Iterated Anomalous Pattern  110
3.4 Minkowski Metric K-Means and Feature Weighting  114
3.4.1 Minkowski Distance and Minkowski Centers  114
3.4.2 Feature Weighting at Minkowski Metric K-Means  116
3.5 Extensions of K-Means Clustering  120
3.5.1 Clustering Criteria and Implementation  120
3.5.2 Partitioning around Medoids  122
3.5.4 Regression-Wise Clustering  125
3.5.5 Mixture of Distributions and EM Algorithm  126
3.5.6 Kohonen Self-Organizing Maps  129

4 Least-Squares Hierarchical Clustering  133
4.1 Hierarchical Cluster Structures  134
4.2 Agglomeration: Ward Algorithm  137
4.3 Least-Squares Divisive Clustering  141
4.3.1 Ward Criterion and Distance  141
4.3.2 Bisecting K-Means: 2-Splitting  143
4.3.3 Splitting by Separation  144
4.3.4 Principal Direction Partitioning  147
4.3.5 Beating the Noise by Randomness  149
4.3.6 Gower's Controversy  151
4.4 Conceptual Clustering  152
4.5 Extensions of Ward Clustering  156
4.5.1 Agglomerative Clustering with Dissimilarity Data  156
4.5.2 Hierarchical Clustering for Contingency Data  156

5 Similarity Clustering: Uniform, Modularity, Additive, Spectral, Consensus, and Single Linkage  161
5.1 Summary Similarity Clustering  164
5.1.1 Summary Similarity Clusters at Genuine Similarity Data  165
5.1.2 Summary Similarity Criterion at Flat Network Data  167
5.1.3 Summary Similarity Clustering at Affinity Data  172
5.2 Normalized Cut and Spectral Clustering  174
5.3.1 Additive Cluster Model  179
5.3.2 One-by-One Additive Clustering Strategy  180
5.4.1 Ensemble and Combined Consensus Concepts  187
5.4.2 Experimental Verification of Least-Squares Consensus Methods  194
5.5 Single Linkage, Minimum Spanning Tree, and Connected Components  195

6 Validation and Interpretation  201
6.1 General: Internal and External Validity  203
6.2 Testing Internal Validity  204
6.2.1 Scoring Correspondence between Clusters and Data  204
6.2.1.1 Measures of Cluster Cohesion versus Isolation  204
6.2.1.2 Indexes Derived Using the Data Recovery Approach  205
6.2.1.3 Indexes Derived from Probabilistic Clustering Models  206
6.2.2 Resampling Data for Validation  206
6.2.3 Cross Validation of iK-Means Results  210
6.3 Interpretation Aids in the Data Recovery Perspective  213
6.3.1 Conventional Interpretation Aids  213
6.3.2 Contribution and Relative Contribution Tables  214
6.3.3 Cluster Representatives  221
6.3.4 Measures of Association from ScaD Tables  224
6.3.4.1 Quantitative Feature Case: Correlation Ratio  224
6.3.4.2 Categorical Feature Case: Chi-Squared and Other Contingency Coefficients  224
6.3.5 Interpretation Aids for Cluster Up-Hierarchies  226
6.4 Conceptual Description of Clusters  229
6.4.1 False Positives and Negatives  229
6.4.2 Describing a Cluster with Production Rules  230
6.4.3 Comprehensive Conjunctive Description of a Cluster  231
6.4.4 Describing a Partition with Classification Trees  234
6.5 Mapping Clusters to Knowledge  241
6.5.1 Mapping a Cluster to Category  241
6.5.2 Mapping between Partitions  243
6.5.2.1 Match-Based Similarity versus Quetelet Association  247
6.5.2.2 Average Distance in a Set of Partitions  248

7 Least-Squares Data Recovery Clustering Models  261
7.1 Statistics Modeling as Data Recovery  263
7.1.1 Data Recovery Equation  263
7.1.4 Principal Component Analysis  266
7.1.5 Correspondence Factor Analysis  271
7.1.6 Data Summarization versus Learning in Data Recovery  274
7.2 K-Means as a Data Recovery Method  275
7.2.1 Clustering Equation and Data Scatter Decomposition  275
7.2.2 Contributions of Clusters, Features and Entities  276
7.2.3 Correlation Ratio as Contribution  277
7.2.4 Partition Contingency Coefficients  278
7.2.5 Equivalent Reformulations of the Least-Squares Clustering Criterion  279
7.2.6 Principal Cluster Analysis: Anomalous Pattern Clustering Method  282
7.2.7 Weighting Variables in K-Means Model and Minkowski Metric  284
7.3 Data Recovery Models for Hierarchical Clustering  288
7.3.1 Data Recovery Models with Cluster Hierarchies  288
7.3.2 Covariances, Variances and Data Scatter Decomposed  289
7.3.3 Split Base Vectors and Matrix Equations for the Data Recovery Model  291
7.3.4 Divisive Partitioning: Four Splitting Algorithms  292
7.3.4.1 Bisecting K-Means, or 2-Splitting  293
7.3.4.2 Principal Direction Division  294
7.3.4.3 Conceptual Clustering  296
7.3.4.4 Separating a Cluster  298
7.3.5 Organizing an Up-Hierarchy: To Split or Not to Split  299
7.3.6 A Straightforward Proof of the Equivalence between Bisecting K-Means and Ward Criteria  300
7.3.7 Anomalous Pattern versus Splitting  301
7.4 Data Recovery Models for Similarity Clustering  303
7.4.1 Cut, Normalized Cut, and Spectral Clustering  303
7.4.2 Similarity Clustering Induced by K-Means and Ward Criteria  307
7.4.2.1 All Clusters at Once  308
7.4.2.2 Hierarchical Clustering  309
7.4.2.3 One-by-One Clustering  311
7.4.3 Additive Clustering  313
7.4.4 Agglomeration and Aggregation of Contingency Data  315
7.5 Consensus and Ensemble Clustering  316
7.5.1 Ensemble Clustering  316
7.5.2 Combined Consensus Clustering  321
7.5.3 Concordant Partition  323
7.5.4 Muchnik's Consensus Partition Test  324
7.5.5 Algorithms for Consensus Partition  326

References  329
Index  341