
Introduction to Clustering Large and High-Dimensional Data [Paperback]

Jacob Kogan (University of Maryland, Baltimore County)
  • Format: Paperback / softback, 222 pages, height x width x thickness: 229x153x15 mm, weight: 307 g, worked examples or exercises
  • Publication date: 13-Nov-2006
  • Publisher: Cambridge University Press
  • ISBN-10: 0521617936
  • ISBN-13: 9780521617932
Focuses on a few of the important clustering algorithms in the context of information retrieval.

There is a growing need for automated methods of partitioning data sets into groups, or clusters. As digital libraries and the World Wide Web continue to grow exponentially, the ability to find useful information increasingly depends on the indexing infrastructure or search engine. Clustering techniques can be used to discover natural groups in data sets and to identify abstract structures that might reside there, without any background knowledge of the characteristics of the data. Clustering has been used in a variety of areas, including computer vision, VLSI design, data mining, bioinformatics (gene expression analysis), and information retrieval, to name just a few. This book focuses on a few of the most important clustering algorithms, providing a detailed account of these major models in an information retrieval context. The beginning chapters introduce the classic algorithms in detail, while the later chapters describe clustering through divergences and present recent research for more advanced audiences.
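The classical batch k-means algorithm, which the book develops in Chapter 2, can be sketched in a few lines: alternate an assignment step (each point joins its nearest centroid under the quadratic, i.e. squared-Euclidean, distance) with an update step (each centroid moves to the mean of its cluster) until the partition stabilizes. The sketch below is a minimal NumPy illustration, not the book's own code; the function name and defaults are our own choices.

```python
# Minimal sketch of classical batch k-means with quadratic
# (squared-Euclidean) distance. Illustrative only.
import numpy as np

def batch_kmeans(X, k, n_iter=100, seed=0):
    """Alternate assignment and centroid-update steps until the
    partition of the rows of X into k clusters stabilizes."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)  # no assignment yet
    for _ in range(n_iter):
        # Assignment step: send each point to its nearest centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # partition is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its cluster.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids
```

Each iteration can only decrease the total quadratic objective, which is why the loop terminates at a locally optimal stable partition; stable partitions and the deficiencies of this batch scheme are among the topics the early chapters examine.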

Reviews

"...this book may serve as a useful reference for scientists and engineers who need to understand the concepts of clustering in general and/or to focus on text mining applications. It is also appropriate for students who are attending a course in pattern recognition, data mining, or classification and are interested in learning more about issues related to the k-means scheme for an undergraduate or master's thesis project. Last, it supplies very interesting material for instructors." Nicolas Loménie, IAPR Newsletter

Other information

Foreword by Michael W. Berry xi
Preface xiii
1 Introduction and motivation 1(8)
1.1 A way to embed ASCII documents into a finite-dimensional Euclidean space 3(2)
1.2 Clustering and this book 5(1)
1.3 Bibliographic notes 6(3)
2 Quadratic k-means algorithm 9(32)
2.1 Classical batch k-means algorithm 10(11)
2.1.1 Quadratic distance and centroids 12(1)
2.1.2 Batch k-means clustering algorithm 13(1)
2.1.3 Batch k-means: advantages and deficiencies 14(7)
2.2 Incremental algorithm 21(8)
2.2.1 Quadratic functions 21(4)
2.2.2 Incremental k-means algorithm 25(4)
2.3 Quadratic k-means: summary 29(8)
2.3.1 Numerical experiments with quadratic k-means 29(2)
2.3.2 Stable partitions 31(4)
2.3.3 Quadratic k-means 35(2)
2.4 Spectral relaxation 37(1)
2.5 Bibliographic notes 38(3)
3 BIRCH 41(10)
3.1 Balanced iterative reducing and clustering algorithm 41(3)
3.2 BIRCH-like k-means 44(5)
3.3 Bibliographic notes 49(2)
4 Spherical k-means algorithm 51(22)
4.1 Spherical batch k-means algorithm 51(6)
4.1.1 Spherical batch k-means: advantages and deficiencies 53(2)
4.1.2 Computational considerations 55(2)
4.2 Spherical two-cluster partition of one-dimensional data 57(7)
4.2.1 One-dimensional line vs. the unit circle 57(3)
4.2.2 Optimal two-cluster partition on the unit circle 60(4)
4.3 Spherical batch and incremental clustering algorithms 64(8)
4.3.1 First variation for spherical k-means 65(3)
4.3.2 Spherical incremental iterations – computational complexity 68(1)
4.3.3 The "ping-pong" algorithm 69(2)
4.3.4 Quadratic and spherical k-means 71(1)
4.4 Bibliographic notes 72(1)
5 Linear algebra techniques 73(18)
5.1 Two approximation problems 73(1)
5.2 Nearest line 74(3)
5.3 Principal directions divisive partitioning 77(10)
5.3.1 Principal direction divisive partitioning (PDDP) 77(3)
5.3.2 Spherical principal directions divisive partitioning (sPDDP) 80(2)
5.3.3 Clustering with PDDP and sPDDP 82(5)
5.4 Largest eigenvector 87(2)
5.4.1 Power method 88(1)
5.4.2 An application: hubs and authorities 88(1)
5.5 Bibliographic notes 89(2)
6 Information theoretic clustering 91(10)
6.1 Kullback–Leibler divergence 91(3)
6.2 k-means with Kullback–Leibler divergence 94(2)
6.3 Numerical experiments 96(2)
6.4 Distance between partitions 98(1)
6.5 Bibliographic notes 99(2)
7 Clustering with optimization techniques 101(24)
7.1 Optimization framework 102(1)
7.2 Smoothing k-means algorithm 103(6)
7.3 Convergence 109(5)
7.4 Numerical experiments 114(8)
7.5 Bibliographic notes 122(3)
8 k-means clustering with divergences 125(30)
8.1 Bregman distance 125(3)
8.2 φ-divergences 128(4)
8.3 Clustering with entropy-like distances 132(3)
8.4 BIRCH-type clustering with entropy-like distances 135(5)
8.5 Numerical experiments with (υ, μ) k-means 140(4)
8.6 Smoothing with entropy-like distances 144(2)
8.7 Numerical experiments with (υ, μ) smoka 146(6)
8.8 Bibliographic notes 152(3)
9 Assessment of clustering results 155(6)
9.1 Internal criteria 155(1)
9.2 External criteria 156(4)
9.3 Bibliographic notes 160(1)
10 Appendix: Optimization and linear algebra background 161(18)
10.1 Eigenvalues of a symmetric matrix 161(2)
10.2 Lagrange multipliers 163(1)
10.3 Elements of convex analysis 164(14)
10.3.1 Conjugate functions 166(3)
10.3.2 Asymptotic cones 169(4)
10.3.3 Asymptotic functions 173(3)
10.3.4 Smoothing 176(2)
10.4 Bibliographic notes 178(1)
11 Solutions to selected problems 179(10)
Bibliography 189(14)
Index 203


Jacob Kogan is an Associate Professor in the Department of Mathematics and Statistics at the University of Maryland, Baltimore County. Dr. Kogan received his PhD in Mathematics from the Weizmann Institute of Science and has held teaching and research positions at the University of Toronto and Purdue University. His research interests include text and data mining, optimization, calculus of variations, optimal control theory, and robust stability of control systems. Dr. Kogan is the author of Bifurcations of Extremals in Optimal Control and of Robust Stability and Convexity: An Introduction. Since 2001, he has also been affiliated with the Department of Computer Science and Electrical Engineering at UMBC. Dr. Kogan is a recipient of a 2004-2005 Fulbright Fellowship to Israel. Together with Charles Nicholas of UMBC and Marc Teboulle of Tel-Aviv University, he is co-editor of the volume Grouping Multidimensional Data: Recent Advances in Clustering.