
E-book: Data Preprocessing in Data Mining

  • Format: PDF+DRM
  • Price: 221,68 €*
  • * the price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you need to install special software to read it. You also need to create an Adobe ID. More information here. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on mobile devices (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you need to install Adobe Digital Editions. (This is a free application designed specifically for reading e-books. It should not be confused with Adobe Reader, which is probably already installed on your computer.)

    This e-book cannot be read on an Amazon Kindle.

Data Preprocessing in Data Mining addresses one of the most important issues within the well-known Knowledge Discovery from Data process. Data taken directly from the source will likely contain inconsistencies and errors and, most importantly, will not be ready for the data mining process. Furthermore, the growing volume of data in recent science, industry and business applications calls for more sophisticated tools to analyze it. Data preprocessing makes it possible to adapt the data to fulfill the input requirements of each data mining algorithm, turning an otherwise infeasible analysis into a feasible one. Data preprocessing also includes data reduction techniques, which aim to reduce the complexity of the data by detecting or removing irrelevant and noisy elements.
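As a concrete taste of the kind of preparation steps the book surveys (its chapters cover, among others, missing-value imputation and min-max normalization), here is a minimal sketch. The function names and sample data are invented for illustration and do not come from the book:

```python
def impute_mean(values):
    """Replace None (missing) entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def min_max_normalize(values):
    """Linearly rescale values into the [0, 1] interval."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

raw = [2.0, None, 6.0, 4.0]        # a single attribute with one missing value
clean = impute_mean(raw)           # -> [2.0, 4.0, 6.0, 4.0]
scaled = min_max_normalize(clean)  # -> [0.0, 0.5, 1.0, 0.5]
```

Real pipelines chain many such steps (cleaning, transformation, reduction) before the data ever reaches a mining algorithm, which is precisely the gap this book addresses.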

This book is intended to review the tasks that fill the gap between data acquisition from the source and the data mining process. It offers a comprehensive look from a practical point of view, covering basic concepts and surveying the techniques proposed in the specialized literature. Each chapter is a stand-alone guide to a particular data preprocessing topic, ranging from basic concepts and detailed descriptions of classical algorithms to an exhaustive catalog of recent developments. The in-depth technical descriptions make this book suitable for technical professionals, researchers, and senior undergraduate and graduate students in data science, computer science and engineering.

Reviews

From the book reviews:

"This book is a comprehensive collection of data preprocessing techniques used in data mining. Any readers who practice data mining will find it beneficial. This book is an excellent guideline in the topic of data preprocessing for data mining. It is suitable for both practitioners and researchers who would like to use datasets in their data mining projects." (Xiannong Meng, Computing Reviews, December 2014)

Contents

1 Introduction 1(18)
1.1 Data Mining and Knowledge Discovery 1(1)
1.2 Data Mining Methods 2(4)
1.3 Supervised Learning 6(1)
1.4 Unsupervised Learning 7(1)
1.4.1 Pattern Mining 8(1)
1.4.2 Outlier Detection 8(1)
1.5 Other Learning Paradigms 8(2)
1.5.1 Imbalanced Learning 8(1)
1.5.2 Multi-instance Learning 9(1)
1.5.3 Multi-label Classification 9(1)
1.5.4 Semi-supervised Learning 9(1)
1.5.5 Subgroup Discovery 9(1)
1.5.6 Transfer Learning 10(1)
1.5.7 Data Stream Learning 10(1)
1.6 Introduction to Data Preprocessing 10(9)
1.6.1 Data Preparation 11(2)
1.6.2 Data Reduction 13(3)
References 16(3)
2 Data Sets and Proper Statistical Analysis of Data Mining Techniques 19(20)
2.1 Data Sets and Partitions 19(6)
2.1.1 Data Set Partitioning 21(3)
2.1.2 Performance Measures 24(1)
2.2 Using Statistical Tests to Compare Methods 25(14)
2.2.1 Conditions for the Safe Use of Parametric Tests 26(1)
2.2.2 Normality Test over the Group of Data Sets and Algorithms 27(2)
2.2.3 Non-parametric Tests for Comparing Two Algorithms in Multiple Data Set Analysis 29(3)
2.2.4 Non-parametric Tests for Multiple Comparisons Among More than Two Algorithms 32(5)
References 37(2)
3 Data Preparation Basic Models 39(20)
3.1 Overview 39(1)
3.2 Data Integration 40(5)
3.2.1 Finding Redundant Attributes 41(2)
3.2.2 Detecting Tuple Duplication and Inconsistency 43(2)
3.3 Data Cleaning 45(1)
3.4 Data Normalization 46(2)
3.4.1 Min-Max Normalization 46(1)
3.4.2 Z-score Normalization 47(1)
3.4.3 Decimal Scaling Normalization 48(1)
3.5 Data Transformation 48(11)
3.5.1 Linear Transformations 49(1)
3.5.2 Quadratic Transformations 49(1)
3.5.3 Non-polynomial Approximations of Transformations 50(1)
3.5.4 Polynomial Approximations of Transformations 51(1)
3.5.5 Rank Transformations 52(1)
3.5.6 Box-Cox Transformations 53(1)
3.5.7 Spreading the Histogram 54(1)
3.5.8 Nominal to Binary Transformation 54(1)
3.5.9 Transformations via Data Reduction 55(1)
References 55(4)
4 Dealing with Missing Values 59(48)
4.1 Introduction 59(2)
4.2 Assumptions and Missing Data Mechanisms 61(2)
4.3 Simple Approaches to Missing Data 63(1)
4.4 Maximum Likelihood Imputation Methods 64(12)
4.4.1 Expectation-Maximization (EM) 65(3)
4.4.2 Multiple Imputation 68(4)
4.4.3 Bayesian Principal Component Analysis (BPCA) 72(4)
4.5 Imputation of Missing Values. Machine Learning Based Methods 76(14)
4.5.1 Imputation with K-Nearest Neighbor (KNNI) 76(1)
4.5.2 Weighted Imputation with K-Nearest Neighbour (WKNNI) 77(1)
4.5.3 K-means Clustering Imputation (KMI) 78(1)
4.5.4 Imputation with Fuzzy K-means Clustering (FKMI) 78(1)
4.5.5 Support Vector Machines Imputation (SVMI) 79(3)
4.5.6 Event Covering (EC) 82(4)
4.5.7 Singular Value Decomposition Imputation (SVDI) 86(1)
4.5.8 Local Least Squares Imputation (LLSI) 86(4)
4.5.9 Recent Machine Learning Approaches to Missing Values Imputation 90(1)
4.6 Experimental Comparative Analysis 90(17)
4.6.1 Effect of the Imputation Methods in the Attributes' Relationships 90(7)
4.6.2 Best Imputation Methods for Classification Methods 97(3)
4.6.3 Interesting Comments 100(1)
References 101(6)
5 Dealing with Noisy Data 107(40)
5.1 Identifying Noise 107(3)
5.2 Types of Noise Data: Class Noise and Attribute Noise 110(5)
5.2.1 Noise Introduction Mechanisms 111(3)
5.2.2 Simulating the Noise of Real-World Data Sets 114(1)
5.3 Noise Filtering at Data Level 115(3)
5.3.1 Ensemble Filter 116(1)
5.3.2 Cross-Validated Committees Filter 117(1)
5.3.3 Iterative-Partitioning Filter 117(1)
5.3.4 More Filtering Methods 118(1)
5.4 Robust Learners Against Noise 118(7)
5.4.1 Multiple Classifier Systems for Classification Tasks 120(3)
5.4.2 Addressing Multi-class Classification Problems by Decomposition 123(2)
5.5 Empirical Analysis of Noise Filters and Robust Strategies 125(22)
5.5.1 Noise Introduction 125(2)
5.5.2 Noise Filters for Class Noise 127(2)
5.5.3 Noise Filtering Efficacy Prediction by Data Complexity Measures 129(4)
5.5.4 Multiple Classifier Systems with Noise 133(3)
5.5.5 Analysis of the OVO Decomposition with Noise 136(4)
References 140(7)
6 Data Reduction 147(16)
6.1 Overview 147(1)
6.2 The Curse of Dimensionality 148(8)
6.2.1 Principal Components Analysis 149(2)
6.2.2 Factor Analysis 151(1)
6.2.3 Multidimensional Scaling 152(3)
6.2.4 Locally Linear Embedding 155(1)
6.3 Data Sampling 156(5)
6.3.1 Data Condensation 158(1)
6.3.2 Data Squashing 159(1)
6.3.3 Data Clustering 159(2)
6.4 Binning and Reduction of Cardinality 161(2)
References 162(1)
7 Feature Selection 163(32)
7.1 Overview 163(1)
7.2 Perspectives 164(12)
7.2.1 The Search of a Subset of Features 164(4)
7.2.2 Selection Criteria 168(5)
7.2.3 Filter, Wrapper and Embedded Feature Selection 173(3)
7.3 Aspects 176(4)
7.3.1 Output of Feature Selection 176(1)
7.3.2 Evaluation 177(2)
7.3.3 Drawbacks 179(1)
7.3.4 Using Decision Trees for Feature Selection 179(1)
7.4 Description of the Most Representative Feature Selection Methods 180(5)
7.4.1 Exhaustive Methods 181(1)
7.4.2 Heuristic Methods 182(1)
7.4.3 Nondeterministic Methods 182(2)
7.4.4 Feature Weighting Methods 184(1)
7.5 Related and Advanced Topics 185(5)
7.5.1 Leading and Recent Feature Selection Techniques 186(2)
7.5.2 Feature Extraction 188(1)
7.5.3 Feature Construction 189(1)
7.6 Experimental Comparative Analyses in Feature Selection 190(5)
References 191(4)
8 Instance Selection 195(50)
8.1 Introduction 195(2)
8.2 Training Set Selection Versus Prototype Selection 197(2)
8.3 Prototype Selection Taxonomy 199(7)
8.3.1 Common Properties in Prototype Selection Methods 199(3)
8.3.2 Prototype Selection Methods 202(1)
8.3.3 Taxonomy of Prototype Selection Methods 202(4)
8.4 Description of Methods 206(15)
8.4.1 Condensation Algorithms 206(4)
8.4.2 Edition Algorithms 210(2)
8.4.3 Hybrid Algorithms 212(9)
8.5 Related and Advanced Topics 221(3)
8.5.1 Prototype Generation 221(1)
8.5.2 Distance Metrics, Feature Weighting and Combinations with Feature Selection 221(1)
8.5.3 Hybridizations with Other Learning Methods and Ensembles 222(1)
8.5.4 Scaling-Up Approaches 223(1)
8.5.5 Data Complexity 223(1)
8.6 Experimental Comparative Analysis in Prototype Selection 224(21)
8.6.1 Analysis and Empirical Results on Small Size Data Sets 225(5)
8.6.2 Analysis and Empirical Results on Medium Size Data Sets 230(1)
8.6.3 Global View of the Obtained Results 231(2)
8.6.4 Visualization of Data Subsets: A Case Study Based on the Banana Data Set 233(3)
References 236(9)
9 Discretization 245(40)
9.1 Introduction 245(2)
9.2 Perspectives and Background 247(4)
9.2.1 Discretization Process 247(3)
9.2.2 Related and Advanced Work 250(1)
9.3 Properties and Taxonomy 251(14)
9.3.1 Common Properties 251(4)
9.3.2 Methods and Taxonomy 255(4)
9.3.3 Description of the Most Representative Discretization Methods 259(6)
9.4 Experimental Comparative Analysis 265(20)
9.4.1 Experimental Set up 265(3)
9.4.2 Analysis and Empirical Results 268(10)
References 278(7)
10 A Data Mining Software Package Including Data Preparation and Reduction: KEEL 285(30)
10.1 Data Mining Softwares and Toolboxes 285(2)
10.2 KEEL: Knowledge Extraction Based on Evolutionary Learning 287(7)
10.2.1 Main Features 288(1)
10.2.2 Data Management 289(2)
10.2.3 Design of Experiments: Off-Line Module 291(2)
10.2.4 Computer-Based Education: On-Line Module 293(1)
10.3 KEEL-Dataset 294(4)
10.3.1 Data Sets Web Pages 294(3)
10.3.2 Experimental Study Web Pages 297(1)
10.4 Integration of New Algorithms into the KEEL Tool 298(5)
10.4.1 Introduction to the KEEL Codification Features 298(5)
10.5 KEEL Statistical Tests 303(7)
10.5.1 Case Study 304(6)
10.6 Summarizing Comments 310(5)
References 311(4)
Index 315