E-book: Data Clustering in C++: An Object-Oriented Approach

  • Length: 520 pages
  • Publication date: 28-Mar-2011
  • Publisher: Chapman & Hall/CRC
  • Language: English
  • ISBN-13: 9781439862247
  • Format: PDF+DRM
  • Price: 80,59 €*
  • * The price is final; no further discounts apply.
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

Data clustering is a highly interdisciplinary field, the goal of which is to divide a set of objects into homogeneous groups such that objects in the same group are similar and objects in different groups are quite distinct. Thousands of theoretical papers and a number of books on data clustering have been published over the past 50 years. However, few books exist to teach people how to implement data clustering algorithms. This book was written for anyone who wants to implement or improve their data clustering algorithms.

Using object-oriented design and programming techniques, Data Clustering in C++ exploits the commonalities of all data clustering algorithms to create a flexible set of reusable classes that simplifies the implementation of any data clustering algorithm. Readers can follow the development of the base data clustering classes and several popular data clustering algorithms. Additional topics such as data pre-processing, data visualization, cluster visualization, and cluster interpretation are briefly covered.

This book is divided into three parts:

  • Data Clustering and C++ Preliminaries: A review of basic concepts of data clustering, the unified modeling language, object-oriented programming in C++, and design patterns
  • A C++ Data Clustering Framework: The development of data clustering base classes
  • Data Clustering Algorithms: The implementation of several popular data clustering algorithms

A key to learning a clustering algorithm is to implement it and experiment with it. Complete listings of classes, examples, unit test cases, and GNU configuration files are included in the appendices of this book as well as in the downloadable resources. The only requirements to compile the code are a modern C++ compiler and the Boost C++ libraries.
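To give a feel for the design idea described above (a shared base class for dissimilarity measures and another for algorithms), here is a minimal sketch. It is not taken from the book's library: the names Point, Distance, EuclideanDistance, Algorithm, NearestCenter, and clusterize are hypothetical and chosen only for illustration, and it uses C++11 std::shared_ptr rather than the Boost smart pointers the book relies on. The concrete algorithm shown is deliberately trivial: it assigns each point to the nearest of a fixed set of centers, which is a single assignment step of a center-based method and not a full k-means implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical record type: a point with continuous attributes only.
using Point = std::vector<double>;

// Hypothetical base class: every dissimilarity measure exposes one call.
class Distance {
public:
    virtual ~Distance() = default;
    virtual double operator()(const Point& a, const Point& b) const = 0;
};

// Euclidean distance between two points of equal dimension.
class EuclideanDistance : public Distance {
public:
    double operator()(const Point& a, const Point& b) const override {
        assert(a.size() == b.size());
        double sum = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) {
            const double d = a[i] - b[i];
            sum += d * d;
        }
        return std::sqrt(sum);
    }
};

// Hypothetical base class shared by all algorithms. The distance is
// injected, so any algorithm can be combined with any measure.
class Algorithm {
public:
    explicit Algorithm(std::shared_ptr<Distance> dist)
        : dist_(std::move(dist)) {}
    virtual ~Algorithm() = default;
    // Returns one cluster index per input point.
    virtual std::vector<std::size_t>
    clusterize(const std::vector<Point>& data) const = 0;

protected:
    std::shared_ptr<Distance> dist_;
};

// Illustrative algorithm: assign each point to the nearest fixed center.
class NearestCenter : public Algorithm {
public:
    NearestCenter(std::shared_ptr<Distance> dist, std::vector<Point> centers)
        : Algorithm(std::move(dist)), centers_(std::move(centers)) {
        assert(!centers_.empty());
    }

    std::vector<std::size_t>
    clusterize(const std::vector<Point>& data) const override {
        std::vector<std::size_t> labels;
        labels.reserve(data.size());
        for (const Point& p : data) {
            std::size_t best = 0;
            double bestDist = std::numeric_limits<double>::infinity();
            for (std::size_t k = 0; k < centers_.size(); ++k) {
                const double d = (*dist_)(p, centers_[k]);
                if (d < bestDist) {
                    bestDist = d;
                    best = k;
                }
            }
            labels.push_back(best);
        }
        return labels;
    }

private:
    std::vector<Point> centers_;
};
```

In this sketch, swapping in a different measure means writing one more subclass of Distance and passing it to the constructor; the algorithm class itself stays unchanged.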
List of Figures
List of Tables
Preface
I Data Clustering and C++ Preliminaries
1 Introduction to Data Clustering
1.1 Data Clustering
1.1.1 Clustering versus Classification
1.1.2 Definition of Clusters
1.2 Data Types
1.3 Dissimilarity and Similarity Measures
1.3.1 Measures for Continuous Data
1.3.2 Measures for Discrete Data
1.3.3 Measures for Mixed-Type Data
1.4 Hierarchical Clustering Algorithms
1.4.1 Agglomerative Hierarchical Algorithms
1.4.2 Divisive Hierarchical Algorithms
1.4.3 Other Hierarchical Algorithms
1.4.4 Dendrograms
1.5 Partitional Clustering Algorithms
1.5.1 Center-Based Clustering Algorithms
1.5.2 Search-Based Clustering Algorithms
1.5.3 Graph-Based Clustering Algorithms
1.5.4 Grid-Based Clustering Algorithms
1.5.5 Density-Based Clustering Algorithms
1.5.6 Model-Based Clustering Algorithms
1.5.7 Subspace Clustering Algorithms
1.5.8 Neural Network-Based Clustering Algorithms
1.5.9 Fuzzy Clustering Algorithms
1.6 Cluster Validity
1.7 Clustering Applications
1.8 Literature of Clustering Algorithms
1.8.1 Books on Data Clustering
1.8.2 Surveys on Data Clustering
1.9 Summary
2 The Unified Modeling Language
2.1 Package Diagrams
2.2 Class Diagrams
2.3 Use Case Diagrams
2.4 Activity Diagrams
2.5 Notes
2.6 Summary
3 Object-Oriented Programming and C++
3.1 Object-Oriented Programming
3.2 The C++ Programming Language
3.3 Encapsulation
3.4 Inheritance
3.5 Polymorphism
3.5.1 Dynamic Polymorphism
3.5.2 Static Polymorphism
3.6 Exception Handling
3.7 Summary
4 Design Patterns
4.1 Singleton
4.2 Composite
4.3 Prototype
4.4 Strategy
4.5 Template Method
4.6 Visitor
4.7 Summary
5 C++ Libraries and Tools
5.1 The Standard Template Library
5.1.1 Containers
5.1.2 Iterators
5.1.3 Algorithms
5.2 Boost C++ Libraries
5.2.1 Smart Pointers
5.2.2 Variant
5.2.3 Variant versus Any
5.2.4 Tokenizer
5.2.5 Unit Test Framework
5.3 GNU Build System
5.3.1 Autoconf
5.3.2 Automake
5.3.3 Libtool
5.3.4 Using GNU Autotools
5.4 Cygwin
5.5 Summary
II A C++ Data Clustering Framework
6 The Clustering Library
6.1 Directory Structure and Filenames
6.2 Specification Files
6.2.1 configure.ac
6.2.2 Makefile.am
6.3 Macros and typedef Declarations
6.4 Error Handling
6.5 Unit Testing
6.6 Compilation and Installation
6.7 Summary
7 Datasets
7.1 Attributes
7.1.1 The Attribute Value Class
7.1.2 The Base Attribute Information Class
7.1.3 The Continuous Attribute Information Class
7.1.4 The Discrete Attribute Information Class
7.2 Records
7.2.1 The Record Class
7.2.2 The Schema Class
7.3 Datasets
7.4 A Dataset Example
7.5 Summary
8 Clusters
8.1 Clusters
8.2 Partitional Clustering
8.3 Hierarchical Clustering
8.4 Summary
9 Dissimilarity Measures
9.1 The Distance Base Class
9.2 Minkowski Distance
9.3 Euclidean Distance
9.4 Simple Matching Distance
9.5 Mixed Distance
9.6 Mahalanobis Distance
9.7 Summary
10 Clustering Algorithms
10.1 Arguments
10.2 Results
10.3 Algorithms
10.4 A Dummy Clustering Algorithm
10.5 Summary
11 Utility Classes
11.1 The Container Class
11.2 The Double-Key Map Class
11.3 The Dataset Adapters
11.3.1 A CSV Dataset Reader
11.3.2 A Dataset Generator
11.3.3 A Dataset Normalizer
11.4 The Node Visitors
11.4.1 The Join Value Visitor
11.4.2 The Partition Creation Visitor
11.5 The Dendrogram Class
11.6 The Dendrogram Visitor
11.7 Summary
III Data Clustering Algorithms
12 Agglomerative Hierarchical Algorithms
12.1 Description of the Algorithm
12.2 Implementation
12.2.1 The Single Linkage Algorithm
12.2.2 The Complete Linkage Algorithm
12.2.3 The Group Average Algorithm
12.2.4 The Weighted Group Average Algorithm
12.2.5 The Centroid Algorithm
12.2.6 The Median Algorithm
12.2.7 Ward's Algorithm
12.3 Examples
12.3.1 The Single Linkage Algorithm
12.3.2 The Complete Linkage Algorithm
12.3.3 The Group Average Algorithm
12.3.4 The Weighted Group Average Algorithm
12.3.5 The Centroid Algorithm
12.3.6 The Median Algorithm
12.3.7 Ward's Algorithm
12.4 Summary
13 DIANA
13.1 Description of the Algorithm
13.2 Implementation
13.3 Examples
13.4 Summary
14 The k-means Algorithm
14.1 Description of the Algorithm
14.2 Implementation
14.3 Examples
14.4 Summary
15 The c-means Algorithm
15.1 Description of the Algorithm
15.2 Implementation
15.3 Examples
15.4 Summary
16 The k-prototypes Algorithm
16.1 Description of the Algorithm
16.2 Implementation
16.3 Examples
16.4 Summary
17 The Genetic k-modes Algorithm
17.1 Description of the Algorithm
17.2 Implementation
17.3 Examples
17.4 Summary
18 The FSC Algorithm
18.1 Description of the Algorithm
18.2 Implementation
18.3 Examples
18.4 Summary
19 The Gaussian Mixture Algorithm
19.1 Description of the Algorithm
19.2 Implementation
19.3 Examples
19.4 Summary
20 A Parallel k-means Algorithm
20.1 Message Passing Interface
20.2 Description of the Algorithm
20.3 Implementation
20.4 Examples
20.5 Summary
A Exercises and Projects
B Listings
B.1 Files in Folder ClusLib
B.1.1 Configuration File configure.ac
B.1.2 m4 Macro File acinclude.m4
B.1.3 Makefile
B.2 Files in Folder cl
B.2.1 Makefile
B.2.2 Macros and typedef Declarations
B.2.3 Class Error
B.3 Files in Folder cl/algorithms
B.3.1 Makefile
B.3.2 Class Algorithm
B.3.3 Class Average
B.3.4 Class Centroid
B.3.5 Class Cmean
B.3.6 Class Complete
B.3.7 Class Diana
B.3.8 Class FSC
B.3.9 Class GKmode
B.3.10 Class GMC
B.3.11 Class Kmean
B.3.12 Class Kprototype
B.3.13 Class LW
B.3.14 Class Median
B.3.15 Class Single
B.3.16 Class Ward
B.3.17 Class Weighted
B.4 Files in Folder cl/clusters
B.4.1 Makefile
B.4.2 Class CenterCluster
B.4.3 Class Cluster
B.4.4 Class HClustering
B.4.5 Class PClustering
B.4.6 Class SubspaceCluster
B.5 Files in Folder cl/datasets
B.5.1 Makefile
B.5.2 Class AttrValue
B.5.3 Class AttrInfo
B.5.4 Class CAttrInfo
B.5.5 Class DAttrInfo
B.5.6 Class Record
B.5.7 Class Schema
B.5.8 Class Dataset
B.6 Files in Folder cl/distances
B.6.1 Makefile
B.6.2 Class Distance
B.6.3 Class EuclideanDistance
B.6.4 Class MahalanobisDistance
B.6.5 Class MinkowskiDistance
B.6.6 Class MixedDistance
B.6.7 Class SimpleMatchingDistance
B.7 Files in Folder cl/patterns
B.7.1 Makefile
B.7.2 Class DendrogramVisitor
B.7.3 Class InternalNode
B.7.4 Class LeafNode
B.7.5 Class Node
B.7.6 Class NodeVisitor
B.7.7 Class JoinValueVisitor
B.7.8 Class PCVisitor
B.8 Files in Folder cl/utilities
B.8.1 Makefile
B.8.2 Class Container
B.8.3 Class DataAdapter
B.8.4 Class DatasetGenerator
B.8.5 Class DatasetNormalizer
B.8.6 Class DatasetReader
B.8.7 Class Dendrogram
B.8.8 Class nnMap
B.8.9 Matrix Functions
B.8.10 Null Types
B.9 Files in Folder examples
B.9.1 Makefile
B.9.2 Agglomerative Hierarchical Algorithms
B.9.3 A Divisive Hierarchical Algorithm
B.9.4 The k-means Algorithm
B.9.5 The c-means Algorithm
B.9.6 The k-prototypes Algorithm
B.9.7 The Genetic k-modes Algorithm
B.9.8 The FSC Algorithm
B.9.9 The Gaussian Mixture Clustering Algorithm
B.9.10 A Parallel k-means Algorithm
B.10 Files in Folder test-suite
B.10.1 Makefile
B.10.2 The Master Test Suite
B.10.3 Test of AttrInfo
B.10.4 Test of Dataset
B.10.5 Test of Distance
B.10.6 Test of nnMap
B.10.7 Test of Matrices
B.10.8 Test of Schema
C Software
C.1 An Introduction to Makefiles
C.1.1 Rules
C.1.2 Variables
C.2 Installing Boost
C.2.1 Boost for Windows
C.2.2 Boost for Cygwin or Linux
C.3 Installing Cygwin
C.4 Installing GMP
C.5 Installing MPICH2 and Boost MPI
Bibliography
Author Index
Subject Index
Guojun Gan, Manulife Financial, Toronto, Canada