Klienditugi: 7440010 (E-R 10-18)

Abi | Registreeri | Logi sisse

E-raamat: Data Mining with R: Learning with Case Studies, Second Edition

3.81/5 (86 hinnangut Goodreads-ist)

Luis Torgo (University of Porto, Portugal)

Formaat: 446 pages
Sari: Chapman & Hall/CRC Data Mining and Knowledge Discovery Series
Ilmumisaeg: 30-Nov-2016
Kirjastus: Chapman & Hall/CRC
Keel: eng
ISBN-13: 9781315399096

Teised raamatud teemal:

Formaat - PDF+DRM
Hind: 58,49 €*
* hind on lõplik, st. muud allahindlused enam ei rakendu
Lisa ostukorvi
Lisa soovinimekirja
See e-raamat on mõeldud ainult isiklikuks kasutamiseks. E-raamatuid ei saa tagastada.

Formaat: 446 pages
Sari: Chapman & Hall/CRC Data Mining and Knowledge Discovery Series
Ilmumisaeg: 30-Nov-2016
Kirjastus: Chapman & Hall/CRC
Keel: eng
ISBN-13: 9781315399096

Teised raamatud teemal:

DRM piirangud

Kopeerimine (copy/paste):

ei ole lubatud
Printimine:

ei ole lubatud
Kasutamine:

Digitaalõiguste kaitse (DRM)
Kirjastus on väljastanud selle e-raamatu krüpteeritud kujul, mis tähendab, et selle lugemiseks peate installeerima spetsiaalse tarkvara. Samuti peate looma endale Adobe ID Rohkem infot siin. E-raamatut saab lugeda 1 kasutaja ning alla laadida kuni 6'de seadmesse (kõik autoriseeritud sama Adobe ID-ga).

Vajalik tarkvara
Mobiilsetes seadmetes (telefon või tahvelarvuti) lugemiseks peate installeerima selle tasuta rakenduse: PocketBook Reader (iOS / Android)

PC või Mac seadmes lugemiseks peate installima Adobe Digital Editionsi (Seeon tasuta rakendus spetsiaalselt e-raamatute lugemiseks. Seda ei tohi segamini ajada Adober Reader'iga, mis tõenäoliselt on juba teie arvutisse installeeritud )

Seda e-raamatut ei saa lugeda Amazon Kindle's.

Data Mining with R: Learning with Case Studies, Second Edition uses practical examples to illustrate the power of R and data mining. Providing an extensive update to the best-selling first edition, this new edition is divided into two parts. The first part will feature introductory material, including a new chapter that provides an introduction to data mining, to complement the already existing introduction to R. The second part includes case studies, and the new edition strongly revises the R code of the case studies making it more up-to-date with recent packages that have emerged in R.

The book does not assume any prior knowledge about R. Readers who are new to R and data mining should be able to follow the case studies, and they are designed to be self-contained so the reader can start anywhere in the document.

The book is accompanied by a set of freely available R source files that can be obtained at the books web site. These files include all the code used in the case studies, and they facilitate the "do-it-yourself" approach followed in the book.

Designed for users of data analysis tools, as well as researchers and developers, the book should be useful for anyone interested in entering the "world" of R and data mining.

About the Author

Luķs Torgo is an associate professor in the Department of Computer Science at the University of Porto in Portugal. He teaches Data Mining in R in the NYU Stern School of Business MS in Business Analytics program. An active researcher in machine learning and data mining for more than 20 years, Dr. Torgo is also a researcher in the Laboratory of Artificial Intelligence and Data Analysis (LIAAD) of INESC Porto LA.

Preface

Acknowledgments

xiii

List of Figures

List of Tables

xix

1 Introduction

(4)

1.1 How to Read This Book

(1)

1.2 Reproducibility

(2)

I R and Data Mining

(186)

2 Introduction to R

(36)

2.1 Starting with R

(2)

2.2 Basic Interaction with the R Console

(1)

2.3 R Objects and Variables

(2)

2.4 R Functions

(4)

2.5 Vectors

(2)

2.6 Vectorization

(1)

2.7 Factors

(3)

2.8 Generating Sequences

(2)

2.9 Sub-Setting

(2)

2.10 Matrices and Arrays

(4)

2.11 Lists

(2)

2.12 Data Frames

(4)

2.13 Useful Extensions to Data Frames

(4)

2.14 Objects, Classes, and Methods

(1)

2.15 Managing Your Sessions

(2)

3 Introduction to Data Mining

(148)

3.1 A Bird's Eye View on Data Mining

(2)

3.2 Data Collection and Business Understanding

(8)

3.2.1 Data and Datasets

(1)

3.2.2 Importing Data into R

(1)

3.2.2.1 Text Files

(2)

3.2.2.2 Databases

(3)

3.2.2.3 Spreadsheets

(1)

3.2.2.4 Other Formats

(1)

3.3 Data Pre-Processing

(34)

3.3.1 Data Cleaning

(1)

3.3.1.1 Tidy Data

(3)

3.3.1.2 Handling Dates

(2)

3.3.1.3 String Processing

(2)

3.3.1.4 Dealing with Unknown Values

(2)

3.3.2 Transforming Variables

(1)

3.3.2.1 Handling Different Scales of Variables

(1)

3.3.2.2 Discretizing Variables

(2)

3.3.3 Creating Variables

(1)

3.3.3.1 Handling Case Dependencies

(9)

3.3.3.2 Handling Text Datasets

(4)

3.3.4 Dimensionality Reduction

(1)

3.3.4.1 Sampling Rows

(4)

3.3.4.2 Variable Selection

(5)

3.4 Modeling

(85)

3.4.1 Exploratory Data Analysis

(1)

3.4.1.1 Data Summarization

(9)

3.4.1.2 Data Visualization

(14)

3.4.2 Dependency Modeling using Association Rules

110

(9)

3.4.3 Clustering

119

(1)

3.4.3.1 Measures of Dissimilarity

119

(1)

3.4.3.2 Clustering Methods

120

(11)

3.4.4 Anomaly Detection

131

(1)

3.4.4.1 Univariate Outlier Detection Methods

132

(1)

3.4.4.2 Multi-Variate Outlier Detection Methods

133

(7)

3.4.5 Predictive Analytics

140

(1)

3.4.5.1 Evaluation Metrics

141

(4)

3.4.5.2 Tree-Based Models

145

(6)

3.4.5.3 Support Vector Machines

151

(7)

3.4.5.4 Artificial Neural Networks and Deep Learning

158

(7)

3.4.5.5 Model Ensembles

165

(7)

3.5 Evaluation

172

(10)

3.5.1 The Holdout and Random Subsampling

174

(3)

3.5.2 Cross Validation

177

(2)

3.5.3 Bootstrap Estimates

179

(2)

3.5.4 Recommended Procedures

181

(1)

3.6 Reporting and Deployment

182

(9)

3.6.1 Reporting Through Dynamic Documents

183

(3)

3.6.2 Deployment through Web Applications

186

(5)

II Case Studies

191

(192)

4 Predicting Algae Blooms

193

(48)

4.1 Problem Description and Objectives

193

(1)

4.2 Data Description

194

(1)

4.3 Loading the Data into R

194

(2)

4.4 Data Visualization and Summarization

196

(9)

4.5 Unknown Values

205

(9)

4.5.1 Removing the Observations with Unknown Values

205

(2)

4.5.2 Filling in the Unknowns with the Most Frequent Values

207

(1)

4.5.3 Filling in the Unknown Values by Exploring Correlations

208

(4)

4.5.4 Filling in the Unknown Values by Exploring Similarities between Cases

212

(2)

4.6 Obtaining Prediction Models

214

(11)

4.6.1 Multiple Linear Regression

215

(5)

4.6.2 Regression Trees

220

(5)

4.7 Model Evaluation and Selection

225

(12)

4.8 Predictions for the Seven Algae

237

(2)

4.9 Summary

239

(2)

5 Predicting Stock Market Returns

241

(54)

5.1 Problem Description and Objectives

241

(1)

5.2 The Available Data

242

(2)

5.2.1 Reading the Data from the CSV File

243

(1)

5.2.2 Getting the Data from the Web

243

(1)

5.3 Defining the Prediction Tasks

244

(10)

5.3.1 What to Predict?

244

(3)

5.3.2 Which Predictors?

247

(4)

5.3.3 The Prediction Tasks

251

(1)

5.3.4 Evaluation Criteria

252

(2)

5.4 The Prediction Models

254

(9)

5.4.1 How Will the Training Data Be Used?

254

(2)

5.4.2 The Modeling Tools

256

(1)

5.4.2.1 Artificial Neural Networks

256

(3)

5.4.2.2 Support Vector Machines

259

(1)

5.4.2.3 Multivariate Adaptive Regression Splines

260

(3)

5.5 From Predictions into Actions

263

(8)

5.5.1 How Will the Predictions Be Used?

263

(1)

5.5.2 Trading-Related Evaluation Criteria

264

(1)

5.5.3 Putting Everything Together: A Simulated Trader

265

(6)

5.6 Model Evaluation and Selection

271

(15)

5.6.1 Monte Carlo Estimates

271

(1)

5.6.2 Experimental Comparisons

272

(6)

5.6.3 Results Analysis

278

(8)

5.7 The Trading System

286

(6)

5.7.1 Evaluation of the Final Test Data

286

(5)

5.7.2 An Online Trading System

291

(1)

5.8 Summary

292

(3)

6 Detecting Fraudulent Transactions

295

(58)

6.1 Problem Description and Objectives

295

(1)

6.2 The Available Data

296

(17)

6.2.1 Loading the Data into R

296

(1)

6.2.2 Exploring the Dataset

297

(7)

6.2.3 Data Problems

304

(1)

6.2.3.1 Unknown Values

304

(5)

6.2.3.2 Few Transactions of Some Products

309

(4)

6.3 Defining the Data Mining Tasks

313

(10)

6.3.1 Different Approaches to the Problem

313

(1)

6.3.1.1 Unsupervised Techniques

313

(1)

6.3.1.2 Supervised Techniques

314

(1)

6.3.1.3 Semi-Supervised Techniques

315

(1)

6.3.2 Evaluation Criteria

316

(1)

6.3.2.1 Precision and Recall

316

(1)

6.3.2.2 Lift Charts and Precision/Recall Curves

317

(3)

6.3.2.3 Normalized Distance to Typical Price

320

(1)

6.3.3 Experimental Methodology

321

(2)

6.4 Obtaining Outlier Rankings

323

(27)

6.4.1 Unsupervised Approaches

323

(1)

6.4.1.1 The Modified Box Plot Rule

323

(4)

6.4.1.2 Local Outlier Factors (LOF)

327

(3)

6.4.1.3 Clustering-Based Outlier Rankings (ORh)

330

(2)

6.4.2 Supervised Approaches

332

(1)

6.4.2.1 The Class Imbalance Problem

333

(2)

6.4.2.2 Naive Bayes

335

(4)

6.4.2.3 AdaBoost

339

(5)

6.4.3 Semi-Supervised Approaches

344

(6)

6.5 Summary

350

(3)

7 Classifying Microarray Samples

353

(30)

7.1 Problem Description and Objectives

353

(1)

7.1.1 Brief Background on Microarray Experiments

353

(1)

7.1.2 The ALL Dataset

354

(1)

7.2 The Available Data

354

(5)

7.2.1 Exploring the Dataset

357

(2)

7.3 Gene (Feature) Selection

359

(9)

7.3.1 Simple Filters Based on Distribution Properties

360

(2)

7.3.2 ANOVA Filters

362

(2)

7.3.3 Filtering Using Random Forests

364

(3)

7.3.4 Filtering Using Feature Clustering Ensembles

367

(1)

7.4 Predicting Cytogenetic Abnormalities

368

(13)

7.4.1 Defining the Prediction Task

368

(1)

7.4.2 The Evaluation Metric

369

(1)

7.4.3 The Experimental Procedure

369

(1)

7.4.4 The Modeling Techniques

370

(3)

7.4.5 Comparing the Models

373

(8)

7.5 Summary

381

(2)

Bibliography

383

(12)

Subject Index

395

(4)

Index of Data Mining Topics

399

(2)

Index of R Functions

401

Luķs Torgo is an associate professor in the Department of Computer Science at the University of Porto in Portugal. He teaches Data Mining in R in the NYU Stern School of Business MS in Business Analytics program. An active researcher in machine learning and data mining for more than 20 years, Dr. Torgo is also a researcher in the Laboratory of Artificial Intelligence and Data Analysis (LIAAD) of INESC Porto LA.

Lisainfo e-raamatute kohta

Püsilink: https://www.kriso.ee/db/97813153990962e.html

Märksõnad:

E-raamat: Data Mining with R: Learning with Case Studies, Second Edition

DRM piirangud

Kopeerimine (copy/paste):

Printimine:

Kasutamine:

Konto & seaded

Otsing

Otsingu andmebaas

Filtreeri tulemusi

Teemad E-raamatute teemad

Vali ostukorv