
E-book: Data Mining: Concepts and Techniques

(Professor, Department of Computer Science, University of Illinois at Urbana-Champaign, USA), (Simon Fraser University, Burnaby, Canada), (Associate Professor, Department of Computer Science, University of Illinois at Urbana-Champaign, USA)
  • Format: EPUB+DRM
  • Price: €76.43*
  • * the price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you need to install special software to read it. You also need to create an Adobe ID. More information here. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you need to install Adobe Digital Editions. (This is a free application designed specifically for reading e-books. It should not be confused with Adobe Reader, which is probably already installed on your computer.)

    This e-book cannot be read on an Amazon Kindle.

Data Mining: Concepts and Techniques, Fourth Edition provides the theories and methods for processing the data and information used in a wide range of applications. Specifically, it explains data mining and the tools used in discovering knowledge from collected data, a process known as knowledge discovery from data (KDD). The book focuses on the feasibility, usefulness, effectiveness, and scalability of techniques for large data sets. After introducing data mining, the authors explain the methods for getting to know, preprocessing, and warehousing data. They then present information about data warehouses, online analytical processing (OLAP), and data cube technology, followed by the methods for mining frequent patterns, associations, and correlations in large data sets.

The book details the methods for data classification and introduces the concepts and methods for data clustering. The remaining chapters discuss outlier detection and the trends, applications, and research frontiers in data mining. Readers ranging from computer science students to application developers, business professionals, and researchers seeking information on data mining will find this resource very helpful.

  • Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects
  • Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields
  • Provides a comprehensive, practical look at the concepts and techniques needed to get the most out of your data
Foreword xvii
Foreword to second edition xix
Preface xxi
Acknowledgments xxvii
About the authors xxix
Chapter 1 Introduction
1(22)
1.1 What is data mining?
1(1)
1.2 Data mining: an essential step in knowledge discovery
2(2)
1.3 Diversity of data types for data mining
4(1)
1.4 Mining various kinds of knowledge
5(7)
1.4.1 Multidimensional data summarization
6(1)
1.4.2 Mining frequent patterns, associations, and correlations
6(1)
1.4.3 Classification and regression for predictive analysis
7(2)
1.4.4 Cluster analysis
9(1)
1.4.5 Deep learning
9(1)
1.4.6 Outlier analysis
10(1)
1.4.7 Are all mining results interesting?
10(2)
1.5 Data mining: confluence of multiple disciplines
12(5)
1.5.1 Statistics and data mining
12(1)
1.5.2 Machine learning and data mining
13(2)
1.5.3 Database technology and data mining
15(1)
1.5.4 Data mining and data science
15(1)
1.5.5 Data mining and other disciplines
16(1)
1.6 Data mining and applications
17(2)
1.7 Data mining and society
19(1)
1.8 Summary
19(1)
1.9 Exercises
20(1)
1.10 Bibliographic notes
21(2)
Chapter 2 Data, measurements, and data preprocessing
23(62)
2.1 Data types
24(3)
2.1.1 Nominal attributes
24(1)
2.1.2 Binary attributes
25(1)
2.1.3 Ordinal attributes
25(1)
2.1.4 Numeric attributes
26(1)
2.1.5 Discrete vs. continuous attributes
27(1)
2.2 Statistics of data
27(16)
2.2.1 Measuring the central tendency
28(3)
2.2.2 Measuring the dispersion of data
31(3)
2.2.3 Covariance and correlation analysis
34(4)
2.2.4 Graphic displays of basic statistics of data
38(5)
2.3 Similarity and distance measures
43(12)
2.3.1 Data matrix vs. dissimilarity matrix
43(1)
2.3.2 Proximity measures for nominal attributes
44(2)
2.3.3 Proximity measures for binary attributes
46(2)
2.3.4 Dissimilarity of numeric data: Minkowski distance
48(1)
2.3.5 Proximity measures for ordinal attributes
49(1)
2.3.6 Dissimilarity for attributes of mixed types
50(2)
2.3.7 Cosine similarity
52(1)
2.3.8 Measuring similar distributions: the Kullback-Leibler divergence
53(2)
2.3.9 Capturing hidden semantics in similarity measures
55(1)
2.4 Data quality, data cleaning, and data integration
55(8)
2.4.1 Data quality measures
55(1)
2.4.2 Data cleaning
56(6)
2.4.3 Data integration
62(1)
2.5 Data transformation
63(8)
2.5.1 Normalization
64(1)
2.5.2 Discretization
65(3)
2.5.3 Data compression
68(2)
2.5.4 Sampling
70(1)
2.6 Dimensionality reduction
71(8)
2.6.1 Principal components analysis
71(1)
2.6.2 Attribute subset selection
72(2)
2.6.3 Nonlinear dimensionality reduction methods
74(5)
2.7 Summary
79(1)
2.8 Exercises
80(3)
2.9 Bibliographic notes
83(2)
Chapter 3 Data warehousing and online analytical processing
85(60)
3.1 Data warehouse
85(11)
3.1.1 Data warehouse: what and why?
85(3)
3.1.2 Architecture of data warehouses: enterprise data warehouses and data marts
88(5)
3.1.3 Data lakes
93(3)
3.2 Data warehouse modeling: schema and measures
96(10)
3.2.1 Data cube: a multidimensional data model
97(2)
3.2.2 Schemas for multidimensional data models: stars, snowflakes, and fact constellations
99(4)
3.2.3 Concept hierarchies
103(2)
3.2.4 Measures: categorization and computation
105(1)
3.3 OLAP operations
106(7)
3.3.1 Typical OLAP operations
106(2)
3.3.2 Indexing OLAP data: bitmap index and join index
108(3)
3.3.3 Storage implementation: column-based databases
111(2)
3.4 Data cube computation
113(7)
3.4.1 Terminology of data cube computation
113(2)
3.4.2 Data cube materialization: ideas
115(2)
3.4.3 OLAP server architectures: ROLAP vs. MOLAP vs. HOLAP
117(2)
3.4.4 General strategies for data cube computation
119(1)
3.5 Data cube computation methods
120(13)
3.5.1 Multiway array aggregation for full cube computation
121(4)
3.5.2 BUC: computing iceberg cubes from the apex cuboid downward
125(4)
3.5.3 Precomputing shell fragments for fast high-dimensional OLAP
129(3)
3.5.4 Efficient processing of OLAP queries using cuboids
132(1)
3.6 Summary
133(2)
3.7 Exercises
135(7)
3.8 Bibliographic notes
142(3)
Chapter 4 Pattern mining: basic concepts and methods
145(30)
4.1 Basic concepts
145(4)
4.1.1 Market basket analysis: a motivating example
145(2)
4.1.2 Frequent itemsets, closed itemsets, and association rules
147(2)
4.2 Frequent itemset mining methods
149(14)
4.2.1 Apriori algorithm: finding frequent itemsets by confined candidate generation
150(3)
4.2.2 Generating association rules from frequent itemsets
153(2)
4.2.3 Improving the efficiency of Apriori
155(2)
4.2.4 A pattern-growth approach for mining frequent itemsets
157(3)
4.2.5 Mining frequent itemsets using the vertical data format
160(2)
4.2.6 Mining closed and max patterns
162(1)
4.3 Which patterns are interesting?--Pattern evaluation methods
163(6)
4.3.1 Strong rules are not necessarily interesting
163(1)
4.3.2 From association analysis to correlation analysis
164(1)
4.3.3 A comparison of pattern evaluation measures
165(4)
4.4 Summary
169(1)
4.5 Exercises
170(3)
4.6 Bibliographic notes
173(2)
Chapter 5 Pattern mining: advanced methods
175(64)
5.1 Mining various kinds of patterns
175(12)
5.1.1 Mining multilevel associations
175(4)
5.1.2 Mining multidimensional associations
179(1)
5.1.3 Mining quantitative association rules
180(3)
5.1.4 Mining high-dimensional data
183(2)
5.1.5 Mining rare patterns and negative patterns
185(2)
5.2 Mining compressed or approximate patterns
187(4)
5.2.1 Mining compressed patterns by pattern clustering
187(2)
5.2.2 Extracting redundancy-aware top-k patterns
189(2)
5.3 Constraint-based pattern mining
191(7)
5.3.1 Pruning pattern space with pattern pruning constraints
193(3)
5.3.2 Pruning data space with data pruning constraints
196(1)
5.3.3 Mining space pruning with succinctness constraints
197(1)
5.4 Mining sequential patterns
198(13)
5.4.1 Sequential pattern mining: concepts and primitives
198(2)
5.4.2 Scalable methods for mining sequential patterns
200(10)
5.4.3 Constraint-based mining of sequential patterns
210(1)
5.5 Mining subgraph patterns
211(12)
5.5.1 Methods for mining frequent subgraphs
212(7)
5.5.2 Mining variant and constrained substructure patterns
219(4)
5.6 Pattern mining: application examples
223(9)
5.6.1 Phrase mining in massive text data
223(7)
5.6.2 Mining copy and paste bugs in software programs
230(2)
5.7 Summary
232(1)
5.8 Exercises
233(2)
5.9 Bibliographic notes
235(4)
Chapter 6 Classification: basic concepts and methods
239(68)
6.1 Basic concepts
239(4)
6.1.1 What is classification?
239(1)
6.1.2 General approach to classification
240(3)
6.2 Decision tree induction
243(16)
6.2.1 Decision tree induction
244(4)
6.2.2 Attribute selection measures
248(9)
6.2.3 Tree pruning
257(2)
6.3 Bayes classification methods
259(7)
6.3.1 Bayes' theorem
260(2)
6.3.2 Naive Bayesian classification
262(4)
6.4 Lazy learners (or learning from your neighbors)
266(3)
6.4.1 K-nearest-neighbor classifiers
266(3)
6.4.2 Case-based reasoning
269(1)
6.5 Linear classifiers
269(9)
6.5.1 Linear regression
270(2)
6.5.2 Perceptron: turning linear regression to classification
272(2)
6.5.3 Logistic regression
274(4)
6.6 Model evaluation and selection
278(12)
6.6.1 Metrics for evaluating classifier performance
278(5)
6.6.2 Holdout method and random subsampling
283(1)
6.6.3 Cross-validation
283(1)
6.6.4 Bootstrap
284(1)
6.6.5 Model selection using statistical tests of significance
285(1)
6.6.6 Comparing classifiers based on cost-benefit and ROC curves
286(4)
6.7 Techniques to improve classification accuracy
290(8)
6.7.1 Introducing ensemble methods
290(1)
6.7.2 Bagging
291(1)
6.7.3 Boosting
292(4)
6.7.4 Random forests
296(1)
6.7.5 Improving classification accuracy of class-imbalanced data
297(1)
6.8 Summary
298(1)
6.9 Exercises
299(3)
6.10 Bibliographic notes
302(5)
Chapter 7 Classification: advanced methods
307(72)
7.1 Feature selection and engineering
307(8)
7.1.1 Filter methods
308(3)
7.1.2 Wrapper methods
311(1)
7.1.3 Embedded methods
312(3)
7.2 Bayesian belief networks
315(3)
7.2.1 Concepts and mechanisms
315(2)
7.2.2 Training Bayesian belief networks
317(1)
7.3 Support vector machines
318(9)
7.3.1 Linear support vector machines
319(5)
7.3.2 Nonlinear support vector machines
324(3)
7.4 Rule-based and pattern-based classification
327(15)
7.4.1 Using IF-THEN rules for classification
328(2)
7.4.2 Rule extraction from a decision tree
330(1)
7.4.3 Rule induction using a sequential covering algorithm
331(4)
7.4.4 Associative classification
335(3)
7.4.5 Discriminative frequent pattern-based classification
338(4)
7.5 Classification with weak supervision
342(9)
7.5.1 Semisupervised classification
343(2)
7.5.2 Active learning
345(1)
7.5.3 Transfer learning
346(2)
7.5.4 Distant supervision
348(1)
7.5.5 Zero-shot learning
349(2)
7.6 Classification with rich data type
351(8)
7.6.1 Stream data classification
352(2)
7.6.2 Sequence classification
354(1)
7.6.3 Graph data classification
355(4)
7.7 Potpourri: other related techniques
359(10)
7.7.1 Multiclass classification
359(3)
7.7.2 Distance metric learning
362(2)
7.7.3 Interpretability of classification
364(3)
7.7.4 Genetic algorithms
367(1)
7.7.5 Reinforcement learning
367(2)
7.8 Summary
369(1)
7.9 Exercises
370(4)
7.10 Bibliographic notes
374(5)
Chapter 8 Cluster analysis: basic concepts and methods
379(52)
8.1 Cluster analysis
379(6)
8.1.1 What is cluster analysis?
380(1)
8.1.2 Requirements for cluster analysis
381(2)
8.1.3 Overview of basic clustering methods
383(2)
8.2 Partitioning methods
385(9)
8.2.1 k-Means: a centroid-based technique
386(2)
8.2.2 Variations of k-means
388(6)
8.3 Hierarchical methods
394(13)
8.3.1 Basic concepts of hierarchical clustering
394(3)
8.3.2 Agglomerative hierarchical clustering
397(3)
8.3.3 Divisive hierarchical clustering
400(2)
8.3.4 BIRCH: scalable hierarchical clustering using clustering feature trees
402(2)
8.3.5 Probabilistic hierarchical clustering
404(3)
8.4 Density-based and grid-based methods
407(10)
8.4.1 DBSCAN: density-based clustering based on connected regions with high density
408(3)
8.4.2 DENCLUE: clustering based on density distribution functions
411(3)
8.4.3 Grid-based methods
414(3)
8.5 Evaluation of clustering
417(8)
8.5.1 Assessing clustering tendency
417(2)
8.5.2 Determining the number of clusters
419(1)
8.5.3 Measuring clustering quality: extrinsic methods
420(4)
8.5.4 Intrinsic methods
424(1)
8.6 Summary
425(2)
8.7 Exercises
427(2)
8.8 Bibliographic notes
429(2)
Chapter 9 Cluster analysis: advanced methods
431(54)
9.1 Probabilistic model-based clustering
431(10)
9.1.1 Fuzzy clusters
433(2)
9.1.2 Probabilistic model-based clusters
435(3)
9.1.3 Expectation-maximization algorithm
438(3)
9.2 Clustering high-dimensional data
441(6)
9.2.1 Why is clustering high-dimensional data challenging?
441(4)
9.2.2 Axis-parallel subspace approaches
445(2)
9.2.3 Arbitrarily oriented subspace approaches
447(1)
9.3 Biclustering
447(7)
9.3.1 Why and where is biclustering useful?
448(2)
9.3.2 Types of biclusters
450(2)
9.3.3 Biclustering methods
452(1)
9.3.4 Enumerating all biclusters using MaPle
453(1)
9.4 Dimensionality reduction for clustering
454(9)
9.4.1 Linear dimensionality reduction methods for clustering
455(3)
9.4.2 Nonnegative matrix factorization (NMF)
458(2)
9.4.3 Spectral clustering
460(3)
9.5 Clustering graph and network data
463(12)
9.5.1 Applications and challenges
463(2)
9.5.2 Similarity measures
465(5)
9.5.3 Graph clustering methods
470(5)
9.6 Semisupervised clustering
475(4)
9.6.1 Semisupervised clustering on partially labeled data
475(1)
9.6.2 Semisupervised clustering on pairwise constraints
476(1)
9.6.3 Other types of background knowledge for semisupervised clustering
477(2)
9.7 Summary
479(1)
9.8 Exercises
480(2)
9.9 Bibliographic notes
482(3)
Chapter 10 Deep learning
485(72)
10.1 Basic concepts
485(15)
10.1.1 What is deep learning?
485(4)
10.1.2 Backpropagation algorithm
489(9)
10.1.3 Key challenges for training deep learning models
498(1)
10.1.4 Overview of deep learning architecture
499(1)
10.2 Improve training of deep learning models
500(17)
10.2.1 Responsive activation functions
500(1)
10.2.2 Adaptive learning rate
501(3)
10.2.3 Dropout
504(3)
10.2.4 Pretraining
507(2)
10.2.5 Cross-entropy
509(2)
10.2.6 Autoencoder: unsupervised deep learning
511(3)
10.2.7 Other techniques
514(3)
10.3 Convolutional neural networks
517(9)
10.3.1 Introducing convolution operation
517(2)
10.3.2 Multidimensional convolution
519(4)
10.3.3 Convolutional layer
523(3)
10.4 Recurrent neural networks
526(13)
10.4.1 Basic RNN models and applications
526(6)
10.4.2 Gated RNNs
532(4)
10.4.3 Other techniques for addressing long-term dependence
536(3)
10.5 Graph neural networks
539(8)
10.5.1 Basic concepts
540(1)
10.5.2 Graph convolutional networks
541(4)
10.5.3 Other types of GNNs
545(2)
10.6 Summary
547(1)
10.7 Exercises
548(4)
10.8 Bibliographic notes
552(5)
Chapter 11 Outlier detection
557(48)
11.1 Basic concepts
557(8)
11.1.1 What are outliers?
558(1)
11.1.2 Types of outliers
559(2)
11.1.3 Challenges of outlier detection
561(1)
11.1.4 An overview of outlier detection methods
562(3)
11.2 Statistical approaches
565(7)
11.2.1 Parametric methods
565(4)
11.2.2 Nonparametric methods
569(3)
11.3 Proximity-based approaches
572(4)
11.3.1 Distance-based outlier detection
572(1)
11.3.2 Density-based outlier detection
573(3)
11.4 Reconstruction-based approaches
576(9)
11.4.1 Matrix factorization-based methods for numerical data
577(5)
11.4.2 Pattern-based compression methods for categorical data
582(3)
11.5 Clustering- vs. classification-based approaches
585(5)
11.5.1 Clustering-based approaches
585(3)
11.5.2 Classification-based approaches
588(2)
11.6 Mining contextual and collective outliers
590(3)
11.6.1 Transforming contextual outlier detection to conventional outlier detection
591(1)
11.6.2 Modeling normal behavior with respect to contexts
591(1)
11.6.3 Mining collective outliers
592(1)
11.7 Outlier detection in high-dimensional data
593(7)
11.7.1 Extending conventional outlier detection
594(1)
11.7.2 Finding outliers in subspaces
595(1)
11.7.3 Outlier detection ensemble
596(1)
11.7.4 Taming high dimensionality by deep learning
597(2)
11.7.5 Modeling high-dimensional outliers
599(1)
11.8 Summary
600(1)
11.9 Exercises
601(1)
11.10 Bibliographic notes
602(3)
Chapter 12 Data mining trends and research frontiers
605(50)
12.1 Mining rich data types
605(12)
12.1.1 Mining text data
605(5)
12.1.2 Spatial-temporal data
610(2)
12.1.3 Graph and networks
612(5)
12.2 Data mining applications
617(12)
12.2.1 Data mining for sentiment and opinion
617(3)
12.2.2 Truth discovery and misinformation identification
620(3)
12.2.3 Information and disease propagation
623(3)
12.2.4 Productivity and team science
626(3)
12.3 Data mining methodologies and systems
629(13)
12.3.1 Structuring unstructured data for knowledge mining: a data-driven approach
629(3)
12.3.2 Data augmentation
632(3)
12.3.3 From correlation to causality
635(2)
12.3.4 Network as a context
637(3)
12.3.5 Auto-ML: methods and systems
640(2)
12.4 Data mining, people, and society
642(13)
12.4.1 Privacy-preserving data mining
642(4)
12.4.2 Human-algorithm interaction
646(2)
12.4.3 Mining beyond maximizing accuracy: fairness, interpretability, and robustness
648(4)
12.4.4 Data mining for social good
652(3)
APPENDIX A Mathematical background
655(26)
A.1 Probability and statistics
655(6)
A.1.1 PDF of typical distributions
655(1)
A.1.2 MLE and MAP
656(1)
A.1.3 Significance test
657(1)
A.1.4 Density estimation
658(1)
A.1.5 Bias-variance tradeoff
659(1)
A.1.6 Cross-validation and Jackknife
660(1)
A.2 Numerical optimization
661(7)
A.2.1 Gradient descent
661(1)
A.2.2 Variants of gradient descent
662(2)
A.2.3 Newton's method
664(2)
A.2.4 Coordinate descent
666(1)
A.2.5 Quadratic programming
666(2)
A.3 Matrix and linear algebra
668(5)
A.3.1 Linear system Ax = b
668(1)
A.3.2 Norms of vectors and matrices
669(1)
A.3.3 Matrix decompositions
669(2)
A.3.4 Subspace
671(1)
A.3.5 Orthogonality
672(1)
A.4 Concepts and tools from signal processing
673(5)
A.4.1 Entropy
673(1)
A.4.2 Kullback-Leibler divergence (KL-divergence)
674(1)
A.4.3 Mutual information
675(1)
A.4.4 Discrete Fourier transform (DFT) and fast Fourier transform (FFT)
676(2)
A.5 Bibliographic notes
678(3)
Bibliography 681(54)
Index 735
Jiawei Han is Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. Well known for his research in the areas of data mining and database systems, he has received many awards for his contributions to the field, including the 2004 ACM SIGKDD Innovations Award. He has served as Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data, and on the editorial boards of several journals, including IEEE Transactions on Knowledge and Data Engineering and Data Mining and Knowledge Discovery.

Jian Pei is currently a Canada Research Chair (Tier 1) in Big Data Science and a Professor in the School of Computing Science at Simon Fraser University. He is also an associate member of the Department of Statistics and Actuarial Science. He is a well-known leading researcher in the general areas of data science, big data, data mining, and database systems. His expertise is in developing effective and efficient data analysis techniques for novel data-intensive applications. He is recognized as a Fellow of the Association for Computing Machinery (ACM) "for his contributions to the foundation, methodology and applications of data mining" and as a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) "for his contributions to data mining and knowledge discovery". He is the Editor-in-Chief of IEEE Transactions on Knowledge and Data Engineering (TKDE), a director of the Special Interest Group on Knowledge Discovery in Data (SIGKDD) of the Association for Computing Machinery (ACM), and a general co-chair or program committee co-chair of many premier conferences.

Hanghang Tong, Ph.D., is currently an associate professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. Before that, he was an associate professor at the School of Computing, Informatics, and Decision Systems Engineering (CIDSE), Arizona State University. He received his M.Sc. and Ph.D. degrees from Carnegie Mellon University in 2008 and 2009, both in Machine Learning. His research interest is in large-scale data mining for graphs and multimedia. He has received several awards, including the SDM/IBM Early Career Data Mining Research award (2018), the NSF CAREER award (2017), the ICDM 10-Year Highest Impact Paper award (2015), four best paper awards (TUP'14, CIKM'12, SDM'08, ICDM'06), seven "bests of conference", one best demo honorable mention (SIGMOD'17), and one best demo candidate, second place (CIKM'17). He has published over 100 refereed articles. He is the Editor-in-Chief of SIGKDD Explorations (ACM), an action editor of Data Mining and Knowledge Discovery (Springer), and an associate editor of Knowledge and Information Systems (Springer) and the Neurocomputing Journal (Elsevier), and has served as a program committee member in multiple data mining, database, and artificial intelligence venues (e.g., SIGKDD, SIGMOD, AAAI, WWW, and CIKM).