
E-book: Data Preprocessing in Data Mining

  • Format: PDF+DRM
  • Price: 221,68 €*
  • * the price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you need to install special software to read it. You also need to create an Adobe ID. More information here. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on mobile devices (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you need to install Adobe Digital Editions. (This is a free application designed specifically for reading e-books. It should not be confused with Adobe Reader, which is probably already installed on your computer.)

    This e-book cannot be read on an Amazon Kindle.

Data Preprocessing in Data Mining addresses one of the most important issues within the well-known Knowledge Discovery from Data process. Data taken directly from the source will likely contain inconsistencies and errors and, most importantly, will not be ready for the data mining process. Furthermore, the growing volume of data in recent science, industry and business applications calls for more sophisticated tools to analyze it. Data preprocessing makes it possible to adapt the data to fulfill the input requirements of each data mining algorithm, turning an otherwise infeasible analysis into a feasible one. Data preprocessing also includes data reduction techniques, which aim to reduce the complexity of the data by detecting or removing irrelevant and noisy elements.
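As a concrete taste of the kind of preparation steps the book surveys (its chapters cover, among others, missing-value imputation and min-max normalization), here is a minimal sketch. The function names and sample data are invented for illustration and do not come from the book:

```python
def impute_mean(values):
    """Replace None (missing) entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def min_max_normalize(values):
    """Linearly rescale values into the [0, 1] interval."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

raw = [2.0, None, 6.0, 4.0]        # a single attribute with one missing value
clean = impute_mean(raw)           # -> [2.0, 4.0, 6.0, 4.0]
scaled = min_max_normalize(clean)  # -> [0.0, 0.5, 1.0, 0.5]
```

Real pipelines chain many such steps (cleaning, transformation, reduction) before the data ever reaches a mining algorithm, which is precisely the gap this book addresses.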

This book is intended to review the tasks that fill the gap between data acquisition from the source and the data mining process. It offers a comprehensive look from a practical point of view, covering basic concepts and surveying the techniques proposed in the specialized literature. Each chapter is a stand-alone guide to a particular data preprocessing topic, ranging from basic concepts and detailed descriptions of classical algorithms to an exhaustive catalog of recent developments. The in-depth technical descriptions make this book suitable for technical professionals, researchers, and senior undergraduate and graduate students in data science, computer science and engineering.

Reviews

From the book reviews:

"This book is a comprehensive collection of data preprocessing techniques used in data mining. Any readers who practice data mining will find it beneficial. This book is an excellent guideline in the topic of data preprocessing for data mining. It is suitable for both practitioners and researchers who would like to use datasets in their data mining projects." (Xiannong Meng, Computing Reviews, December 2014)

Contents

1 Introduction 1(18)
1.1 Data Mining and Knowledge Discovery 1(1)
1.2 Data Mining Methods 2(4)
1.3 Supervised Learning 6(1)
1.4 Unsupervised Learning 7(1)
1.4.1 Pattern Mining 8(1)
1.4.2 Outlier Detection 8(1)
1.5 Other Learning Paradigms 8(2)
1.5.1 Imbalanced Learning 8(1)
1.5.2 Multi-instance Learning 9(1)
1.5.3 Multi-label Classification 9(1)
1.5.4 Semi-supervised Learning 9(1)
1.5.5 Subgroup Discovery 9(1)
1.5.6 Transfer Learning 10(1)
1.5.7 Data Stream Learning 10(1)
1.6 Introduction to Data Preprocessing 10(9)
1.6.1 Data Preparation 11(2)
1.6.2 Data Reduction 13(3)
References 16(3)
2 Data Sets and Proper Statistical Analysis of Data Mining Techniques 19(20)
2.1 Data Sets and Partitions 19(6)
2.1.1 Data Set Partitioning 21(3)
2.1.2 Performance Measures 24(1)
2.2 Using Statistical Tests to Compare Methods 25(14)
2.2.1 Conditions for the Safe Use of Parametric Tests 26(1)
2.2.2 Normality Test over the Group of Data Sets and Algorithms 27(2)
2.2.3 Non-parametric Tests for Comparing Two Algorithms in Multiple Data Set Analysis 29(3)
2.2.4 Non-parametric Tests for Multiple Comparisons Among More than Two Algorithms 32(5)
References 37(2)
3 Data Preparation Basic Models 39(20)
3.1 Overview 39(1)
3.2 Data Integration 40(5)
3.2.1 Finding Redundant Attributes 41(2)
3.2.2 Detecting Tuple Duplication and Inconsistency 43(2)
3.3 Data Cleaning 45(1)
3.4 Data Normalization 46(2)
3.4.1 Min-Max Normalization 46(1)
3.4.2 Z-score Normalization 47(1)
3.4.3 Decimal Scaling Normalization 48(1)
3.5 Data Transformation 48(11)
3.5.1 Linear Transformations 49(1)
3.5.2 Quadratic Transformations 49(1)
3.5.3 Non-polynomial Approximations of Transformations 50(1)
3.5.4 Polynomial Approximations of Transformations 51(1)
3.5.5 Rank Transformations 52(1)
3.5.6 Box-Cox Transformations 53(1)
3.5.7 Spreading the Histogram 54(1)
3.5.8 Nominal to Binary Transformation 54(1)
3.5.9 Transformations via Data Reduction 55(1)
References 55(4)
4 Dealing with Missing Values 59(48)
4.1 Introduction 59(2)
4.2 Assumptions and Missing Data Mechanisms 61(2)
4.3 Simple Approaches to Missing Data 63(1)
4.4 Maximum Likelihood Imputation Methods 64(12)
4.4.1 Expectation-Maximization (EM) 65(3)
4.4.2 Multiple Imputation 68(4)
4.4.3 Bayesian Principal Component Analysis (BPCA) 72(4)
4.5 Imputation of Missing Values. Machine Learning Based Methods 76(14)
4.5.1 Imputation with K-Nearest Neighbor (KNNI) 76(1)
4.5.2 Weighted Imputation with K-Nearest Neighbour (WKNNI) 77(1)
4.5.3 K-means Clustering Imputation (KMI) 78(1)
4.5.4 Imputation with Fuzzy K-means Clustering (FKMI) 78(1)
4.5.5 Support Vector Machines Imputation (SVMI) 79(3)
4.5.6 Event Covering (EC) 82(4)
4.5.7 Singular Value Decomposition Imputation (SVDI) 86(1)
4.5.8 Local Least Squares Imputation (LLSI) 86(4)
4.5.9 Recent Machine Learning Approaches to Missing Values Imputation 90(1)
4.6 Experimental Comparative Analysis 90(17)
4.6.1 Effect of the Imputation Methods in the Attributes' Relationships 90(7)
4.6.2 Best Imputation Methods for Classification Methods 97(3)
4.6.3 Interesting Comments 100(1)
References 101(6)
5 Dealing with Noisy Data 107(40)
5.1 Identifying Noise 107(3)
5.2 Types of Noise Data: Class Noise and Attribute Noise 110(5)
5.2.1 Noise Introduction Mechanisms 111(3)
5.2.2 Simulating the Noise of Real-World Data Sets 114(1)
5.3 Noise Filtering at Data Level 115(3)
5.3.1 Ensemble Filter 116(1)
5.3.2 Cross-Validated Committees Filter 117(1)
5.3.3 Iterative-Partitioning Filter 117(1)
5.3.4 More Filtering Methods 118(1)
5.4 Robust Learners Against Noise 118(7)
5.4.1 Multiple Classifier Systems for Classification Tasks 120(3)
5.4.2 Addressing Multi-class Classification Problems by Decomposition 123(2)
5.5 Empirical Analysis of Noise Filters and Robust Strategies 125(22)
5.5.1 Noise Introduction 125(2)
5.5.2 Noise Filters for Class Noise 127(2)
5.5.3 Noise Filtering Efficacy Prediction by Data Complexity Measures 129(4)
5.5.4 Multiple Classifier Systems with Noise 133(3)
5.5.5 Analysis of the OVO Decomposition with Noise 136(4)
References 140(7)
6 Data Reduction 147(16)
6.1 Overview 147(1)
6.2 The Curse of Dimensionality 148(8)
6.2.1 Principal Components Analysis 149(2)
6.2.2 Factor Analysis 151(1)
6.2.3 Multidimensional Scaling 152(3)
6.2.4 Locally Linear Embedding 155(1)
6.3 Data Sampling 156(5)
6.3.1 Data Condensation 158(1)
6.3.2 Data Squashing 159(1)
6.3.3 Data Clustering 159(2)
6.4 Binning and Reduction of Cardinality 161(2)
References 162(1)
7 Feature Selection 163(32)
7.1 Overview 163(1)
7.2 Perspectives 164(12)
7.2.1 The Search of a Subset of Features 164(4)
7.2.2 Selection Criteria 168(5)
7.2.3 Filter, Wrapper and Embedded Feature Selection 173(3)
7.3 Aspects 176(4)
7.3.1 Output of Feature Selection 176(1)
7.3.2 Evaluation 177(2)
7.3.3 Drawbacks 179(1)
7.3.4 Using Decision Trees for Feature Selection 179(1)
7.4 Description of the Most Representative Feature Selection Methods 180(5)
7.4.1 Exhaustive Methods 181(1)
7.4.2 Heuristic Methods 182(1)
7.4.3 Nondeterministic Methods 182(2)
7.4.4 Feature Weighting Methods 184(1)
7.5 Related and Advanced Topics 185(5)
7.5.1 Leading and Recent Feature Selection Techniques 186(2)
7.5.2 Feature Extraction 188(1)
7.5.3 Feature Construction 189(1)
7.6 Experimental Comparative Analyses in Feature Selection 190(5)
References 191(4)
8 Instance Selection 195(50)
8.1 Introduction 195(2)
8.2 Training Set Selection Versus Prototype Selection 197(2)
8.3 Prototype Selection Taxonomy 199(7)
8.3.1 Common Properties in Prototype Selection Methods 199(3)
8.3.2 Prototype Selection Methods 202(1)
8.3.3 Taxonomy of Prototype Selection Methods 202(4)
8.4 Description of Methods 206(15)
8.4.1 Condensation Algorithms 206(4)
8.4.2 Edition Algorithms 210(2)
8.4.3 Hybrid Algorithms 212(9)
8.5 Related and Advanced Topics 221(3)
8.5.1 Prototype Generation 221(1)
8.5.2 Distance Metrics, Feature Weighting and Combinations with Feature Selection 221(1)
8.5.3 Hybridizations with Other Learning Methods and Ensembles 222(1)
8.5.4 Scaling-Up Approaches 223(1)
8.5.5 Data Complexity 223(1)
8.6 Experimental Comparative Analysis in Prototype Selection 224(21)
8.6.1 Analysis and Empirical Results on Small Size Data Sets 225(5)
8.6.2 Analysis and Empirical Results on Medium Size Data Sets 230(1)
8.6.3 Global View of the Obtained Results 231(2)
8.6.4 Visualization of Data Subsets: A Case Study Based on the Banana Data Set 233(3)
References 236(9)
9 Discretization 245(40)
9.1 Introduction 245(2)
9.2 Perspectives and Background 247(4)
9.2.1 Discretization Process 247(3)
9.2.2 Related and Advanced Work 250(1)
9.3 Properties and Taxonomy 251(14)
9.3.1 Common Properties 251(4)
9.3.2 Methods and Taxonomy 255(4)
9.3.3 Description of the Most Representative Discretization Methods 259(6)
9.4 Experimental Comparative Analysis 265(20)
9.4.1 Experimental Set up 265(3)
9.4.2 Analysis and Empirical Results 268(10)
References 278(7)
10 A Data Mining Software Package Including Data Preparation and Reduction: KEEL 285(30)
10.1 Data Mining Softwares and Toolboxes 285(2)
10.2 KEEL: Knowledge Extraction Based on Evolutionary Learning 287(7)
10.2.1 Main Features 288(1)
10.2.2 Data Management 289(2)
10.2.3 Design of Experiments: Off-Line Module 291(2)
10.2.4 Computer-Based Education: On-Line Module 293(1)
10.3 KEEL-Dataset 294(4)
10.3.1 Data Sets Web Pages 294(3)
10.3.2 Experimental Study Web Pages 297(1)
10.4 Integration of New Algorithms into the KEEL Tool 298(5)
10.4.1 Introduction to the KEEL Codification Features 298(5)
10.5 KEEL Statistical Tests 303(7)
10.5.1 Case Study 304(6)
10.6 Summarizing Comments 310(5)
References 311(4)
Index 315