E-book: Data Clustering in C++: An Object-Oriented Approach

3.50/5 (4 ratings on Goodreads)

Guojun Gan

Format: 520 pages
Publication date: 28-Mar-2011
Publisher: Chapman & Hall/CRC
Language: English
ISBN-13: 9781439862247


Format: PDF+DRM
Price: €80.59*
* The price is final, i.e., no further discounts apply.
This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

Copying (copy/paste): not allowed
Printing: not allowed
Usage: Digital Rights Management (DRM)

The publisher has issued this e-book in encrypted form, which means that you need to install special software to read it. You also need to create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

Required software
To read on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

To read on a PC or Mac, you need to install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

This e-book cannot be read on an Amazon Kindle.

Data clustering is a highly interdisciplinary field, the goal of which is to divide a set of objects into homogeneous groups such that objects in the same group are similar and objects in different groups are quite distinct. Thousands of theoretical papers and a number of books on data clustering have been published over the past 50 years. However, few books exist to teach people how to implement data clustering algorithms. This book was written for anyone who wants to implement or improve their data clustering algorithms.
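To make the stated goal concrete, here is a small illustration (not code from the book) that compares the average Euclidean distance between points sharing a cluster label with the average distance between points carrying different labels; for a good clustering the first value should be clearly smaller. The names `Point`, `euclidean`, and `averagePairDistance` are hypothetical, and the sketch assumes purely continuous data.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// A point is a vector of continuous attribute values (assumption for this sketch).
using Point = std::vector<double>;

// Euclidean distance between two points of equal dimension.
double euclidean(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Average distance over all pairs of points that share a label (within == true)
// or carry different labels (within == false). Returns 0 if no such pair exists.
double averagePairDistance(const std::vector<Point>& points,
                           const std::vector<int>& labels, bool within) {
    assert(points.size() == labels.size());
    double total = 0.0;
    std::size_t pairs = 0;
    for (std::size_t i = 0; i < points.size(); ++i) {
        for (std::size_t j = i + 1; j < points.size(); ++j) {
            if ((labels[i] == labels[j]) == within) {
                total += euclidean(points[i], points[j]);
                ++pairs;
            }
        }
    }
    return pairs == 0 ? 0.0 : total / static_cast<double>(pairs);
}
```

For the four points (0, 0), (0, 1), (10, 0), (10, 1) labeled {0, 0, 1, 1}, the within-cluster average is 1.0, while the between-cluster average is a little over 10, which is the "similar inside, distinct across" pattern the definition describes.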

Using object-oriented design and programming techniques, Data Clustering in C++ exploits the commonalities of all data clustering algorithms to create a flexible set of reusable classes that simplifies the implementation of any data clustering algorithm. Readers can follow the development of the base data clustering classes and several popular data clustering algorithms. Additional topics such as data pre-processing, data visualization, cluster visualization, and cluster interpretation are briefly covered.
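The idea of reusable classes shared by all algorithms can be illustrated with two of the design patterns listed in the table of contents, Strategy and Template Method. The following is a minimal sketch under stated assumptions: the class names `Distance`, `EuclideanDistance`, `ClusteringAlgorithm`, and `NearestSeedClustering` are hypothetical and are not taken from the book's ClusLib listings. The distance measure is a swappable strategy, and `run()` fixes the overall workflow while subclasses supply the individual steps.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <memory>
#include <utility>
#include <vector>

using Record = std::vector<double>;

// Strategy: a dissimilarity measure that algorithms can swap at run time.
class Distance {
public:
    virtual ~Distance() = default;
    virtual double operator()(const Record& a, const Record& b) const = 0;
};

class EuclideanDistance : public Distance {
public:
    double operator()(const Record& a, const Record& b) const override {
        assert(a.size() == b.size());
        double sum = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) {
            const double d = a[i] - b[i];
            sum += d * d;
        }
        return std::sqrt(sum);
    }
};

// Template Method: run() fixes the workflow; subclasses fill in the steps.
class ClusteringAlgorithm {
public:
    explicit ClusteringAlgorithm(std::shared_ptr<const Distance> dist)
        : dist_(std::move(dist)) {}
    virtual ~ClusteringAlgorithm() = default;

    std::vector<int> run(const std::vector<Record>& data) {
        labels_.assign(data.size(), -1);
        initialize(data);
        cluster(data);
        return labels_;
    }

protected:
    virtual void initialize(const std::vector<Record>& data) = 0;
    virtual void cluster(const std::vector<Record>& data) = 0;

    std::shared_ptr<const Distance> dist_;
    std::vector<int> labels_;  // cluster index assigned to each record
};

// Hypothetical concrete algorithm: assign every record to its nearest seed.
class NearestSeedClustering : public ClusteringAlgorithm {
public:
    NearestSeedClustering(std::shared_ptr<const Distance> dist,
                          std::vector<Record> seeds)
        : ClusteringAlgorithm(std::move(dist)), seeds_(std::move(seeds)) {}

protected:
    void initialize(const std::vector<Record>&) override {
        assert(!seeds_.empty());
    }
    void cluster(const std::vector<Record>& data) override {
        for (std::size_t i = 0; i < data.size(); ++i) {
            double best = std::numeric_limits<double>::max();
            for (std::size_t k = 0; k < seeds_.size(); ++k) {
                const double d = (*dist_)(data[i], seeds_[k]);
                if (d < best) {
                    best = d;
                    labels_[i] = static_cast<int>(k);
                }
            }
        }
    }

private:
    std::vector<Record> seeds_;
};
```

With seeds (0, 0) and (10, 10), the records (1, 1), (9, 9), and (0, 2) receive labels 0, 1, and 0. In this design, adding an algorithm means deriving from `ClusteringAlgorithm` and overriding `initialize()` and `cluster()`, while a new dissimilarity measure only needs a new `Distance` subclass.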

This book is divided into three parts:

Data Clustering and C++ Preliminaries: A review of basic concepts of data clustering, the unified modeling language, object-oriented programming in C++, and design patterns

A C++ Data Clustering Framework: The development of data clustering base classes

Data Clustering Algorithms: The implementation of several popular data clustering algorithms

A key to learning a clustering algorithm is to implement it and experiment with it. Complete listings of classes, examples, unit test cases, and GNU configuration files are included in the appendices of this book as well as in the downloadable resources. The only requirements to compile the code are a modern C++ compiler and the Boost C++ libraries.
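In that spirit, here is a minimal standard-C++ sketch of Lloyd-style k-means (one of the algorithms covered in Part III), started from caller-supplied centers. It is an illustration under simplifying assumptions (continuous data, squared Euclidean distance, no random initialization, empty clusters keep their previous center); it does not reproduce the book's `Kmean` class and does not need Boost. The names `Point`, `squaredDistance`, and `kmeans` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean distance; sufficient for nearest-center comparisons.
double squaredDistance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Lloyd-style k-means from caller-supplied centers. Alternates an assignment
// step and an update step until no label changes or maxIter is reached.
// Returns the label of each point; `centers` is updated in place.
std::vector<int> kmeans(const std::vector<Point>& data,
                        std::vector<Point>& centers, int maxIter = 100) {
    assert(!centers.empty());
    std::vector<int> labels(data.size(), -1);
    for (int iter = 0; iter < maxIter; ++iter) {
        bool changed = false;
        // Assignment step: each point goes to its nearest center.
        for (std::size_t i = 0; i < data.size(); ++i) {
            int best = 0;
            double bestDist = std::numeric_limits<double>::max();
            for (std::size_t k = 0; k < centers.size(); ++k) {
                const double d = squaredDistance(data[i], centers[k]);
                if (d < bestDist) {
                    bestDist = d;
                    best = static_cast<int>(k);
                }
            }
            if (labels[i] != best) {
                labels[i] = best;
                changed = true;
            }
        }
        if (!changed) break;
        // Update step: each center moves to the mean of its members.
        // A center with no members is left where it is.
        for (std::size_t k = 0; k < centers.size(); ++k) {
            Point sum(centers[k].size(), 0.0);
            std::size_t count = 0;
            for (std::size_t i = 0; i < data.size(); ++i) {
                if (labels[i] == static_cast<int>(k)) {
                    for (std::size_t j = 0; j < sum.size(); ++j) sum[j] += data[i][j];
                    ++count;
                }
            }
            if (count > 0) {
                for (std::size_t j = 0; j < sum.size(); ++j) {
                    centers[k][j] = sum[j] / static_cast<double>(count);
                }
            }
        }
    }
    return labels;
}
```

For the points (0, 0), (0, 1), (10, 10), (10, 11) with initial centers (0, 0) and (10, 10), the second assignment pass changes nothing, so the loop stops with labels {0, 0, 1, 1} and centers (0, 0.5) and (10, 10.5). Changing the initial centers or `maxIter` is an easy first experiment.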

Table of Contents

List of Figures
List of Tables
Preface
I Data Clustering and C++ Preliminaries
1 Introduction to Data Clustering
1.1 Data Clustering
1.1.1 Clustering versus Classification
1.1.2 Definition of Clusters
1.2 Data Types
1.3 Dissimilarity and Similarity Measures
1.3.1 Measures for Continuous Data
1.3.2 Measures for Discrete Data
1.3.3 Measures for Mixed-Type Data
1.4 Hierarchical Clustering Algorithms
1.4.1 Agglomerative Hierarchical Algorithms
1.4.2 Divisive Hierarchical Algorithms
1.4.3 Other Hierarchical Algorithms
1.4.4 Dendrograms
1.5 Partitional Clustering Algorithms
1.5.1 Center-Based Clustering Algorithms
1.5.2 Search-Based Clustering Algorithms
1.5.3 Graph-Based Clustering Algorithms
1.5.4 Grid-Based Clustering Algorithms
1.5.5 Density-Based Clustering Algorithms
1.5.6 Model-Based Clustering Algorithms
1.5.7 Subspace Clustering Algorithms
1.5.8 Neural Network-Based Clustering Algorithms
1.5.9 Fuzzy Clustering Algorithms
1.6 Cluster Validity
1.7 Clustering Applications
1.8 Literature of Clustering Algorithms
1.8.1 Books on Data Clustering
1.8.2 Surveys on Data Clustering
1.9 Summary
2 The Unified Modeling Language
2.1 Package Diagrams
2.2 Class Diagrams
2.3 Use Case Diagrams
2.4 Activity Diagrams
2.5 Notes
2.6 Summary
3 Object-Oriented Programming and C++
3.1 Object-Oriented Programming
3.2 The C++ Programming Language
3.3 Encapsulation
3.4 Inheritance
3.5 Polymorphism
3.5.1 Dynamic Polymorphism
3.5.2 Static Polymorphism
3.6 Exception Handling
3.7 Summary
4 Design Patterns
4.1 Singleton
4.2 Composite
4.3 Prototype
4.4 Strategy
4.5 Template Method
4.6 Visitor
4.7 Summary
5 C++ Libraries and Tools
5.1 The Standard Template Library
5.1.1 Containers
5.1.2 Iterators
5.1.3 Algorithms
5.2 Boost C++ Libraries
5.2.1 Smart Pointers
5.2.2 Variant
5.2.3 Variant versus Any
5.2.4 Tokenizer
5.2.5 Unit Test Framework
5.3 GNU Build System
5.3.1 Autoconf
5.3.2 Automake
5.3.3 Libtool
5.3.4 Using GNU Autotools
5.4 Cygwin
5.5 Summary
II A C++ Data Clustering Framework
6 The Clustering Library
6.1 Directory Structure and Filenames
6.2 Specification Files
6.2.1 configure.ac
6.2.2 Makefile.am
6.3 Macros and typedef Declarations
6.4 Error Handling
6.5 Unit Testing
6.6 Compilation and Installation
6.7 Summary
7 Datasets
7.1 Attributes
7.1.1 The Attribute Value Class
7.1.2 The Base Attribute Information Class
7.1.3 The Continuous Attribute Information Class
7.1.4 The Discrete Attribute Information Class
7.2 Records
7.2.1 The Record Class
7.2.2 The Schema Class
7.3 Datasets
7.4 A Dataset Example
7.5 Summary
8 Clusters
8.1 Clusters
8.2 Partitional Clustering
8.3 Hierarchical Clustering
8.4 Summary
9 Dissimilarity Measures
9.1 The Distance Base Class
9.2 Minkowski Distance
9.3 Euclidean Distance
9.4 Simple Matching Distance
9.5 Mixed Distance
9.6 Mahalanobis Distance
9.7 Summary
10 Clustering Algorithms
10.1 Arguments
10.2 Results
10.3 Algorithms
10.4 A Dummy Clustering Algorithm
10.5 Summary
11 Utility Classes
11.1 The Container Class
11.2 The Double-Key Map Class
11.3 The Dataset Adapters
11.3.1 A CSV Dataset Reader
11.3.2 A Dataset Generator
11.3.3 A Dataset Normalizer
11.4 The Node Visitors
11.4.1 The Join Value Visitor
11.4.2 The Partition Creation Visitor
11.5 The Dendrogram Class
11.6 The Dendrogram Visitor
11.7 Summary
III Data Clustering Algorithms
12 Agglomerative Hierarchical Algorithms
12.1 Description of the Algorithm
12.2 Implementation
12.2.1 The Single Linkage Algorithm
12.2.2 The Complete Linkage Algorithm
12.2.3 The Group Average Algorithm
12.2.4 The Weighted Group Average Algorithm
12.2.5 The Centroid Algorithm
12.2.6 The Median Algorithm
12.2.7 Ward's Algorithm
12.3 Examples
12.3.1 The Single Linkage Algorithm
12.3.2 The Complete Linkage Algorithm
12.3.3 The Group Average Algorithm
12.3.4 The Weighted Group Average Algorithm
12.3.5 The Centroid Algorithm
12.3.6 The Median Algorithm
12.3.7 Ward's Algorithm
12.4 Summary
13 DIANA
13.1 Description of the Algorithm
13.2 Implementation
13.3 Examples
13.4 Summary
14 The k-means Algorithm
14.1 Description of the Algorithm
14.2 Implementation
14.3 Examples
14.4 Summary
15 The c-means Algorithm
15.1 Description of the Algorithm
15.2 Implementation
15.3 Examples
15.4 Summary
16 The k-prototypes Algorithm
16.1 Description of the Algorithm
16.2 Implementation
16.3 Examples
16.4 Summary
17 The Genetic k-modes Algorithm
17.1 Description of the Algorithm
17.2 Implementation
17.3 Examples
17.4 Summary
18 The FSC Algorithm
18.1 Description of the Algorithm
18.2 Implementation
18.3 Examples
18.4 Summary
19 The Gaussian Mixture Algorithm
19.1 Description of the Algorithm
19.2 Implementation
19.3 Examples
19.4 Summary
20 A Parallel k-means Algorithm
20.1 Message Passing Interface
20.2 Description of the Algorithm
20.3 Implementation
20.4 Examples
20.5 Summary
A Exercises and Projects
B Listings
B.1 Files in Folder ClusLib
B.1.1 Configuration File configure.ac
B.1.2 m4 Macro File acinclude.m4
B.1.3 Makefile
B.2 Files in Folder cl
B.2.1 Makefile
B.2.2 Macros and typedef Declarations
B.2.3 Class Error
B.3 Files in Folder cl/algorithms
B.3.1 Makefile
B.3.2 Class Algorithm
B.3.3 Class Average
B.3.4 Class Centroid
B.3.5 Class Cmean
B.3.6 Class Complete
B.3.7 Class Diana
B.3.8 Class FSC
B.3.9 Class GKmode
B.3.10 Class GMC
B.3.11 Class Kmean
B.3.12 Class Kprototype
B.3.13 Class LW
B.3.14 Class Median
B.3.15 Class Single
B.3.16 Class Ward
B.3.17 Class Weighted
B.4 Files in Folder cl/clusters
B.4.1 Makefile
B.4.2 Class CenterCluster
B.4.3 Class Cluster
B.4.4 Class HClustering
B.4.5 Class PClustering
B.4.6 Class SubspaceCluster
B.5 Files in Folder cl/datasets
B.5.1 Makefile
B.5.2 Class AttrValue
B.5.3 Class AttrInfo
B.5.4 Class CAttrInfo
B.5.5 Class DAttrInfo
B.5.6 Class Record
B.5.7 Class Schema
B.5.8 Class Dataset
B.6 Files in Folder cl/distances
B.6.1 Makefile
B.6.2 Class Distance
B.6.3 Class EuclideanDistance
B.6.4 Class MahalanobisDistance
B.6.5 Class MinkowskiDistance
B.6.6 Class MixedDistance
B.6.7 Class SimpleMatchingDistance
B.7 Files in Folder cl/patterns
B.7.1 Makefile
B.7.2 Class DendrogramVisitor
B.7.3 Class InternalNode
B.7.4 Class LeafNode
B.7.5 Class Node
B.7.6 Class NodeVisitor
B.7.7 Class JoinValueVisitor
B.7.8 Class PCVisitor
B.8 Files in Folder cl/utilities
B.8.1 Makefile
B.8.2 Class Container
B.8.3 Class DataAdapter
B.8.4 Class DatasetGenerator
B.8.5 Class DatasetNormalizer
B.8.6 Class DatasetReader
B.8.7 Class Dendrogram
B.8.8 Class nnMap
B.8.9 Matrix Functions
B.8.10 Null Types
B.9 Files in Folder examples
B.9.1 Makefile
B.9.2 Agglomerative Hierarchical Algorithms
B.9.3 A Divisive Hierarchical Algorithm
B.9.4 The k-means Algorithm
B.9.5 The c-means Algorithm
B.9.6 The k-prototypes Algorithm
B.9.7 The Genetic k-modes Algorithm
B.9.8 The FSC Algorithm
B.9.9 The Gaussian Mixture Clustering Algorithm
B.9.10 A Parallel k-means Algorithm
B.10 Files in Folder test-suite
B.10.1 Makefile
B.10.2 The Master Test Suite
B.10.3 Test of AttrInfo
B.10.4 Test of Dataset
B.10.5 Test of Distance
B.10.6 Test of nnMap
B.10.7 Test of Matrices
B.10.8 Test of Schema
C Software
C.1 An Introduction to Makefiles
C.1.1 Rules
C.1.2 Variables
C.2 Installing Boost
C.2.1 Boost for Windows
C.2.2 Boost for Cygwin or Linux
C.3 Installing Cygwin
C.4 Installing GMP
C.5 Installing MPICH2 and Boost MPI
Bibliography
Author Index
Subject Index

Guojun Gan, Manulife Financial, Toronto, Canada


Permalink: https://www.kriso.ee/db/97814398622472e.html
