Discovering Knowledge in Data: An Introduction to Data Mining, 2nd Edition [Hardcover]

Chantal D. Larose (Eastern Connecticut State University (ECSU)), Daniel T. Larose (Central Connecticut State University)
"This is a new edition of a highly praised, successful reference on data mining, now more important than ever due to the growth of the field and wide range of applications. This edition features new chapters on multivariate statistical analysis, coveringanalysis of variance and chi-square procedures; cost-benefit analyses; and time-series data analysis. There is also extensive coverage of the R statistical programming language. Graduate and advanced undergraduate students of computer science and statistics, managers/CEOs/CFOs, marketing executives, market researchers and analysts, sales analysts, and medical professionals will want this comprehensive reference"--

To help alleviate the shortage of trained and skilled data analysts in business, statisticians Larose and Larose explain the models and techniques used to uncover hidden nuggets of information, offer insight into how data mining algorithms really work, and allow readers to experience data mining on large data sets. In most chapters, a section provides the actual R code needed to obtain the results shown in the text, along with screenshots of some of the output in RStudio. The topics include data preprocessing, multivariate statistics, decision trees, Kohonen networks, and imputation of missing data. Annotation ©2014 Ringgold, Inc., Portland, OR (protoview.com)
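The book's R Zone sections are not reproduced in this listing, but as a rough illustration of the kind of base-R code they describe, here is a minimal sketch (not taken from the book; the data vector is hypothetical) of two of the Chapter 2 transformations, min-max normalization and z-score standardization:

    # Illustrative sketch only: hypothetical data, not an example from the book
    x <- c(12, 35, 7, 48, 23)

    # Min-max normalization: rescales x to the interval [0, 1]
    minmax <- (x - min(x)) / (max(x) - min(x))

    # Z-score standardization: centers x at 0 with standard deviation 1
    zscore <- (x - mean(x)) / sd(x)

    minmax
    zscore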

  • The second edition of a highly praised, successful reference on data mining, with thorough coverage of big data applications, predictive analytics, and statistical analysis.
  • Includes new chapters on Multivariate Statistics, Preparing to Model the Data, and Imputation of Missing Data, as well as an appendix on Data Summarization and Visualization
  • Offers extensive coverage of the R statistical programming language
  • Contains 280 end-of-chapter exercises
  • Includes a companion website with further resources for all readers, plus PowerPoint slides, a solutions manual, and suggested projects for instructors who adopt the book
Preface xi
Chapter 1 An Introduction To Data Mining 1(15)
1.1 What is Data Mining? 1(1)
1.2 Wanted: Data Miners 2(1)
1.3 The Need for Human Direction of Data Mining 3(1)
1.4 The Cross-Industry Standard Process for Data Mining 4(2)
1.4.1 CRISP-DM: The Six Phases 5(1)
1.5 Fallacies of Data Mining 6(2)
1.6 What Tasks Can Data Mining Accomplish? 8(8)
1.6.1 Description 8(1)
1.6.2 Estimation 8(2)
1.6.3 Prediction 10(1)
1.6.4 Classification 10(2)
1.6.5 Clustering 12(2)
1.6.6 Association 14(1)
References 14(1)
Exercises 15(1)
Chapter 2 Data Preprocessing 16(35)
2.1 Why do We Need to Preprocess the Data? 17(1)
2.2 Data Cleaning 17(2)
2.3 Handling Missing Data 19(3)
2.4 Identifying Misclassifications 22(1)
2.5 Graphical Methods for Identifying Outliers 22(1)
2.6 Measures of Center and Spread 23(3)
2.7 Data Transformation 26(1)
2.8 Min-Max Normalization 26(1)
2.9 Z-Score Standardization 27(1)
2.10 Decimal Scaling 28(1)
2.11 Transformations to Achieve Normality 28(7)
2.12 Numerical Methods for Identifying Outliers 35(1)
2.13 Flag Variables 36(1)
2.14 Transforming Categorical Variables into Numerical Variables 37(1)
2.15 Binning Numerical Variables 38(1)
2.16 Reclassifying Categorical Variables 39(1)
2.17 Adding an Index Field 39(1)
2.18 Removing Variables that are Not Useful 39(1)
2.19 Variables that Should Probably Not Be Removed 40(1)
2.20 Removal of Duplicate Records 41(1)
2.21 A Word About ID Fields 41(10)
The R Zone 42(6)
References 48(1)
Exercises 48(2)
Hands-On Analysis 50(1)
Chapter 3 Exploratory Data Analysis 51(40)
3.1 Hypothesis Testing Versus Exploratory Data Analysis 51(1)
3.2 Getting to Know the Data Set 52(3)
3.3 Exploring Categorical Variables 55(7)
3.4 Exploring Numeric Variables 62(7)
3.5 Exploring Multivariate Relationships 69(2)
3.6 Selecting Interesting Subsets of the Data for Further Investigation 71(1)
3.7 Using EDA to Uncover Anomalous Fields 71(1)
3.8 Binning Based on Predictive Value 72(2)
3.9 Deriving New Variables: Flag Variables 74(3)
3.10 Deriving New Variables: Numerical Variables 77(1)
3.11 Using EDA to Investigate Correlated Predictor Variables 77(3)
3.12 Summary 80(11)
The R Zone 82(6)
Reference 88(1)
Exercises 88(1)
Hands-On Analysis 89(2)
Chapter 4 Univariate Statistical Analysis 91(18)
4.1 Data Mining Tasks in Discovering Knowledge in Data 91(1)
4.2 Statistical Approaches to Estimation and Prediction 92(1)
4.3 Statistical Inference 93(1)
4.4 How Confident are We in Our Estimates? 94(1)
4.5 Confidence Interval Estimation of the Mean 95(2)
4.6 How to Reduce the Margin of Error 97(1)
4.7 Confidence Interval Estimation of the Proportion 98(1)
4.8 Hypothesis Testing for the Mean 99(2)
4.9 Assessing the Strength of Evidence Against the Null Hypothesis 101(1)
4.10 Using Confidence Intervals to Perform Hypothesis Tests 102(2)
4.11 Hypothesis Testing for the Proportion 104(5)
The R Zone 105(1)
Reference 106(1)
Exercises 106(3)
Chapter 5 Multivariate Statistics 109(29)
5.1 Two-Sample t-Test for Difference in Means 110(1)
5.2 Two-Sample Z-Test for Difference in Proportions 111(1)
5.3 Test for Homogeneity of Proportions 112(2)
5.4 Chi-Square Test for Goodness of Fit of Multinomial Data 114(1)
5.5 Analysis of Variance 115(3)
5.6 Regression Analysis 118(4)
5.7 Hypothesis Testing in Regression 122(1)
5.8 Measuring the Quality of a Regression Model 123(1)
5.9 Dangers of Extrapolation 123(2)
5.10 Confidence Intervals for the Mean Value of y Given x 125(1)
5.11 Prediction Intervals for a Randomly Chosen Value of y Given x 125(1)
5.12 Multiple Regression 126(1)
5.13 Verifying Model Assumptions 127(11)
The R Zone 131(4)
Reference 135(1)
Exercises 135(1)
Hands-On Analysis 136(2)
Chapter 6 Preparing To Model The Data 138(11)
6.1 Supervised Versus Unsupervised Methods 138(1)
6.2 Statistical Methodology and Data Mining Methodology 139(1)
6.3 Cross-Validation 139(2)
6.4 Overfitting 141(1)
6.5 Bias-Variance Trade-Off 142(2)
6.6 Balancing the Training Data Set 144(1)
6.7 Establishing Baseline Performance 145(4)
The R Zone 146(1)
Reference 147(1)
Exercises 147(2)
Chapter 7 K-Nearest Neighbor Algorithm 149(16)
7.1 Classification Task 149(1)
7.2 k-Nearest Neighbor Algorithm 150(3)
7.3 Distance Function 153(3)
7.4 Combination Function 156(2)
7.4.1 Simple Unweighted Voting 156(1)
7.4.2 Weighted Voting 156(2)
7.5 Quantifying Attribute Relevance: Stretching the Axes 158(1)
7.6 Database Considerations 158(1)
7.7 k-Nearest Neighbor Algorithm for Estimation and Prediction 159(1)
7.8 Choosing k 160(1)
7.9 Application of k-Nearest Neighbor Algorithm Using IBM/SPSS Modeler 160(5)
The R Zone 162(1)
Exercises 163(1)
Hands-On Analysis 164(1)
Chapter 8 Decision Trees 165(22)
8.1 What is a Decision Tree? 165(2)
8.2 Requirements for Using Decision Trees 167(1)
8.3 Classification and Regression Trees 168(6)
8.4 C4.5 Algorithm 174(5)
8.5 Decision Rules 179(1)
8.6 Comparison of the C5.0 and CART Algorithms Applied to Real Data 180(7)
The R Zone 183(1)
References 184(1)
Exercises 185(1)
Hands-On Analysis 185(2)
Chapter 9 Neural Networks 187(22)
9.1 Input and Output Encoding 188(2)
9.2 Neural Networks for Estimation and Prediction 190(1)
9.3 Simple Example of a Neural Network 191(2)
9.4 Sigmoid Activation Function 193(1)
9.5 Back-Propagation 194(4)
9.5.1 Gradient Descent Method 194(1)
9.5.2 Back-Propagation Rules 195(1)
9.5.3 Example of Back-Propagation 196(2)
9.6 Termination Criteria 198(1)
9.7 Learning Rate 198(1)
9.8 Momentum Term 199(2)
9.9 Sensitivity Analysis 201(1)
9.10 Application of Neural Network Modeling 202(7)
The R Zone 204(3)
References 207(1)
Exercises 207(1)
Hands-On Analysis 207(2)
Chapter 10 Hierarchical And K-Means Clustering 209(19)
10.1 The Clustering Task 209(3)
10.2 Hierarchical Clustering Methods 212(1)
10.3 Single-Linkage Clustering 213(1)
10.4 Complete-Linkage Clustering 214(1)
10.5 k-Means Clustering 215(1)
10.6 Example of k-Means Clustering at Work 216(3)
10.7 Behavior of MSB, MSE, and Pseudo-F as the k-Means Algorithm Proceeds 219(1)
10.8 Application of k-Means Clustering Using SAS Enterprise Miner 220(3)
10.9 Using Cluster Membership to Predict Churn 223(5)
The R Zone 224(2)
References 226(1)
Exercises 226(1)
Hands-On Analysis 226(2)
Chapter 11 Kohonen Networks 228(19)
11.1 Self-Organizing Maps 228(2)
11.2 Kohonen Networks 230(1)
11.2.1 Kohonen Networks Algorithm 231(1)
11.3 Example of a Kohonen Network Study 231(4)
11.4 Cluster Validity 235(1)
11.5 Application of Clustering Using Kohonen Networks 235(2)
11.6 Interpreting the Clusters 237(5)
11.6.1 Cluster Profiles 240(2)
11.7 Using Cluster Membership as Input to Downstream Data Mining Models 242(5)
The R Zone 243(2)
References 245(1)
Exercises 245(1)
Hands-On Analysis 245(2)
Chapter 12 Association Rules 247(19)
12.1 Affinity Analysis and Market Basket Analysis 247(2)
12.1.1 Data Representation for Market Basket Analysis 248(1)
12.2 Support, Confidence, Frequent Itemsets, and the A Priori Property 249(2)
12.3 How Does the A Priori Algorithm Work? 251(4)
12.3.1 Generating Frequent Itemsets 251(2)
12.3.2 Generating Association Rules 253(2)
12.4 Extension from Flag Data to General Categorical Data 255(1)
12.5 Information-Theoretic Approach: Generalized Rule Induction Method 256(2)
12.5.1 J-Measure 257(1)
12.6 Association Rules are Easy to do Badly 258(1)
12.7 How Can We Measure the Usefulness of Association Rules? 259(1)
12.8 Do Association Rules Represent Supervised or Unsupervised Learning? 260(1)
12.9 Local Patterns Versus Global Models 261(5)
The R Zone 262(1)
References 263(1)
Exercises 263(1)
Hands-On Analysis 264(2)
Chapter 13 Imputation Of Missing Data 266(11)
13.1 Need for Imputation of Missing Data 266(1)
13.2 Imputation of Missing Data: Continuous Variables 267(3)
13.3 Standard Error of the Imputation 270(1)
13.4 Imputation of Missing Data: Categorical Variables 271(1)
13.5 Handling Patterns in Missingness 272(5)
The R Zone 273(3)
Reference 276(1)
Exercises 276(1)
Hands-On Analysis 276(1)
Chapter 14 Model Evaluation Techniques 277(17)
14.1 Model Evaluation Techniques for the Description Task 278(1)
14.2 Model Evaluation Techniques for the Estimation and Prediction Tasks 278(2)
14.3 Model Evaluation Techniques for the Classification Task 280(1)
14.4 Error Rate, False Positives, and False Negatives 280(3)
14.5 Sensitivity and Specificity 283(1)
14.6 Misclassification Cost Adjustment to Reflect Real-World Concerns 284(1)
14.7 Decision Cost/Benefit Analysis 285(1)
14.8 Lift Charts and Gains Charts 286(3)
14.9 Interweaving Model Evaluation with Model Building 289(1)
14.10 Confluence of Results: Applying a Suite of Models 290(4)
The R Zone 291(1)
Reference 291(1)
Exercises 291(1)
Hands-On Analysis 291(3)
Appendix: Data Summarization And Visualization 294(15)
Index 309
Daniel T. Larose earned his PhD in Statistics at the University of Connecticut. He is Professor of Mathematical Sciences and Director of the Data Mining programs at Central Connecticut State University. His consulting clients have included Microsoft, Forbes Magazine, the CIT Group, KPMG International, Computer Associates, and Deloitte, Inc. This is Larose's fourth book for Wiley.

Chantal D. Larose is an Assistant Professor of Statistics & Data Science at Eastern Connecticut State University (ECSU).  She has co-authored three books on data science and predictive analytics.  She helped develop data science programs at ECSU and at SUNY New Paltz.  She received her PhD in Statistics from the University of Connecticut, Storrs in 2015 (dissertation title: Model-based Clustering of Incomplete Data).