Klienditugi: 7440010 (E-R 10-18)

E-raamat: Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro

3.56/5 (13 hinnangut Goodreads-ist)

Galit Shmueli (University of Maryland, College Park), Nitin R. Patel, Peter C. Bruce (Massachusetts Institute of Technology), Mia L. Stephens

Formaat: PDF+DRM
Ilmumisaeg: 09-May-2016
Kirjastus: John Wiley & Sons Inc
Keel: eng
ISBN-13: 9781118956625

Teised raamatud teemal:

Formaat - PDF+DRM
Hind: 134,49 €*
* hind on lõplik, st. muud allahindlused enam ei rakendu
Lisa ostukorvi
Lisa soovinimekirja
See e-raamat on mõeldud ainult isiklikuks kasutamiseks. E-raamatuid ei saa tagastada.

Formaat: PDF+DRM
Ilmumisaeg: 09-May-2016
Kirjastus: John Wiley & Sons Inc
Keel: eng
ISBN-13: 9781118956625

Teised raamatud teemal:

DRM piirangud

Kopeerimine (copy/paste):

ei ole lubatud
Printimine:

ei ole lubatud
Kasutamine:

Digitaalõiguste kaitse (DRM)
Kirjastus on väljastanud selle e-raamatu krüpteeritud kujul, mis tähendab, et selle lugemiseks peate installeerima spetsiaalse tarkvara. Samuti peate looma endale Adobe ID Rohkem infot siin. E-raamatut saab lugeda 1 kasutaja ning alla laadida kuni 6'de seadmesse (kõik autoriseeritud sama Adobe ID-ga).

Vajalik tarkvara
Mobiilsetes seadmetes (telefon või tahvelarvuti) lugemiseks peate installeerima selle tasuta rakenduse: PocketBook Reader (iOS / Android)

PC või Mac seadmes lugemiseks peate installima Adobe Digital Editionsi (Seeon tasuta rakendus spetsiaalselt e-raamatute lugemiseks. Seda ei tohi segamini ajada Adober Reader'iga, mis tõenäoliselt on juba teie arvutisse installeeritud )

Seda e-raamatut ei saa lugeda Amazon Kindle's.

Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro® presents an applied and interactive approach to data mining.

Featuring hands-on applications with JMP Pro®, a statistical package from the SAS Institute, the book uses engaging, real-world examples to build a theoretical and practical understanding of key data mining methods, especially predictive models for classification and prediction. Topics include data visualization, dimension reduction techniques, clustering, linear and logistic regression, classification and regression trees, discriminant analysis, naive Bayes, neural networks, uplift modeling, ensemble models, and time series forecasting.

Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro® also includes:

Detailed summaries that supply an outline of key topics at the beginning of each chapter End-of-chapter examples and exercises that allow readers to expand their comprehension of the presented material Data-rich case studies to illustrate various applications of data mining techniques A companion website with over two dozen data sets, exercises and case study solutions, and slides for instructors www.dataminingbook.com

Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro® is an excellent textbook for advanced undergraduate and graduate-level courses on data mining, predictive analytics, and business analytics. The book is also a one-of-a-kind resource for data scientists, analysts, researchers, and practitioners working with analytics in the fields of management, finance, marketing, information technology, healthcare, education, and any other data-rich field.

Foreword

xvii

Preface

xix

Acknowledgments

xxi

Part I Preliminaries

1 Introduction

(11)

1.1 What Is Business Analytics?

(2)

Who Uses Predictive Analytics?

(1)

1.2 What Is Data Mining?

(1)

1.3 Data Mining and Related Terms

(1)

1.4 Big Data

(1)

1.5 Data Science

(1)

1.6 Why Are There So Many Different Methods?

(1)

1.7 Terminology and Notation

(2)

1.8 Roadmap to This Book

(4)

Order of Topics

(4)

Using IMP Pro, Statistical Discovery Software from SAS

(3)

2 Overview of the Data Mining Process

(37)

2.1 Introduction

(1)

2.2 Core Ideas in Data Mining

(2)

Classification

(1)

Prediction

(1)

Association Rules and Recommendation Systems

(1)

Predictive Analytics

(1)

Data Reduction and Dimension Reduction

(1)

Data Exploration and Visualization

(1)

Supervised and Unsupervised Learning

(1)

2.3 The Steps in Data Mining

(2)

2.4 Preliminary Steps

(6)

Organization of Datasets

(1)

Sampling from a Database

(1)

Oversampling Rare Events in Classification Tasks

(1)

Preprocessing and Cleaning the Data

(5)

Changing Modeling Types in IMP

(5)

Standardizing Data in IMP

(1)

2.5 Predictive Power and Overfitting

(4)

Creation and Use of Data Partitions

(2)

Partitioning Data for Crossvalidation in JMP Pro

(1)

Overfitting

(2)

2.6 Building a Predictive Model with JMP Pro

(9)

Predicting Home Values in a Boston Neighborhood

(1)

Modeling Process

(10)

Setting the Random Seed in JMP

(4)

2.7 Using JMP Pro for Data Mining

(2)

2.8 Automating Data Mining Solutions

(4)

Data Mining Software Tools: the State of the Market by Herb Edelstein

(3)

Problems

(7)

Part II Data Exploration And Dimension Reduction

3 Data Visualization

(30)

3.1 Uses of Data Visualization

(1)

3.2 Data Examples

(2)

Example 1: Boston Housing Data

(1)

Example 2: Ridership on Amtrak Trains

(1)

3.3 Basic Charts: Bar Charts, Line Graphs, and Scatterplots

(7)

Using The JMP Graph Builder

(2)

Distribution Plots: Boxplots and Histograms

(3)

Tools for Data Visualization in JMP

(1)

Heatmaps (Color Maps and Cell Plots): Visualizing Correlations and Missing Values

(2)

3.4 Multidimensional Visualization

(12)

Adding Variables: Color, Size, Shape, Multiple Panels, and Animation

(3)

Manipulations: Rescaling, Aggregation and Hierarchies, Zooming, Filtering

(3)

Reference: Trend Lines and Labels

(1)

Adding Trendlines in the Graph Builder

(1)

Scaling Up: Large Datasets

(1)

Multivariate Plot: Parallel Coordinates Plot

(1)

Interactive Visualization

(1)

3.5 Specialized Visualizations

(4)

Visualizing Networked Data

(1)

Visualizing Hierarchical Data: More on Treemaps

(1)

Visualizing Geographical Data: Maps

(1)

3.6 Summary of Major Visualizations and Operations, According to Data Mining Goal

(2)

Prediction

(1)

Classification

(1)

Time Series Forecasting

(1)

Unsupervised Learning

(1)

Problems

(2)

4 Dimension Reduction

(24)

4.1 Introduction

(1)

4.2 Curse of Dimensionality

(1)

4.3 Practical Considerations

(1)

Example 1: House Prices in Boston

(1)

4.4 Data Summaries

(4)

Summary Statistics

(2)

Tabulating Data (Pivot Tables)

(2)

4.5 Correlation Analysis

(1)

4.6 Reducing the Number of Categories in Categorical Variables

(3)

4.7 Converting a Categorical Variable to a Continuous Variable

(1)

4.8 Principal Components Analysis

(10)

Example 2: Breakfast Cereals

(4)

Principal Components

(2)

Normalizing the Data

(3)

Using Principal Components for Classification and Prediction

100

(1)

4.9 Dimension Reduction Using Regression Models

100

(1)

4.10 Dimension Reduction Using Classification and Regression Trees

100

(1)

Problems

101

(4)

Part III Performance Evaluation

5 Evaluating Predictive Performance

105

(28)

5.1 Introduction

105

(1)

5.2 Evaluating Predictive Performance

106

(3)

Benchmark: The Average

106

(1)

Prediction Accuracy Measures

107

(1)

Comparing Training and Validation Performance

108

(1)

5.3 Judging Classifier Performance

109

(11)

Benchmark: The Naive Rule

109

(1)

Class Separation

109

(1)

The Classification Matrix

109

(2)

Using the Validation Data

111

(1)

Accuracy Measures

111

(1)

Propensities and Cutoff for Classification

112

(3)

Cutoff Values for Triage

112

(2)

Changing the Cutoff Values for a Confussion Matrix in JMP

114

(1)

Performance in Unequal Importance of Classes

115

(1)

False-Positive and False-Negative Rates

116

(1)

Asymmetric Misclassification Costs

116

(4)

Asymmetric Misclassification Costs in JMP

119

(1)

Generalization to More Than Two Classes

120

(1)

5.4 Judging Ranking Performance

120

(3)

Lift Curves

120

(2)

Beyond Two Classes

122

(1)

Lift Curves Incorporating Costs and Benefits

122

(1)

5.5 Oversampling

123

(6)

Oversampling the Training Set

126

(1)

Stratified Sampling and Oversampling in JMP

126

(1)

Evaluating Model Performance Using a Nonoversampled Validation Set

126

(1)

Evaluating Model Performance If Only Oversampled Validation Set Exists

127

(8)

Applying Sampling Weights in JMP

128

(1)

Problems

129

(4)

Part IV Prediction And Classification Methods

6 Multiple Linear Regression

133

(22)

6.1 Introduction

133

(1)

6.2 Explanatory versus Predictive Modeling

134

(1)

6.3 Estimating the Regression Equation and Prediction

135

(6)

Example: Predicting the Price of Used Toyota Corolla Automobiles

136

(5)

Coding of Categorical Variables in Regression

138

(2)

Additional Options for Regression Models in JMP

140

(1)

6.4 Variable Selection in Linear Regression

141

(9)

Reducing the Number of Predictors

141

(1)

How to Reduce the Number of Predictors

142

(1)

Manual Variable Selection

142

(1)

Automated Variable Selection

142

(13)

Coding of Categorical Variables in Stepwise Regression

143

(2)

Working with the All Possible Models Output

145

(2)

When Using a Stopping Algorithm in JMP

147

(2)

Other Regression Procedures in JMP Pro-Generalized Regression

149

(1)

Problems

150

(5)

7 k-Nearest Neighbors (k-NN)

155

(12)

7.1 The k-NN Classifier (Categorical Outcome)

155

(6)

Determining Neighbors

155

(1)

Classification Rule

156

(1)

Example: Riding Mowers

156

(1)

Choosing k

157

(2)

k Nearest Neighbors in JMP Pro

158

(1)

The Cutoff Value for Classification

159

(2)

k-NN Predictions and Prediction Formulas in JMP Pro

161

(1)

k-NN with More Than Two Classes

161

(1)

7.2 k-NN for a Numerical Response

161

(2)

Pandora

161

(2)

7.3 Advantages and Shortcomings of k-NN Algorithms

163

(1)

Problems

164

(3)

8 The Naive Bayes Classifier

167

(16)

8.1 Introduction

167

(2)

Naive Bayes Method

167

(1)

Cutoff Probability Method

168

(1)

Conditional Probability

168

(1)

Example 1: Predicting Fraudulent Financial Reporting

168

(1)

8.2 Applying the Full (Exact) Bayesian Classifier

169

(10)

Using the "Assign to the Most Probable Class" Method

169

(1)

Using the Cutoff Probability Method

169

(1)

Practical Difficulty with the Complete (Exact) Bayes Procedure

170

(1)

Solution: Naive Bayes

170

(2)

Example 2: Predicting Fraudulent Financial Reports, Two Predictors

172

(2)

Using the JMP Naive Bayes Add-in

174

(1)

Example 3: Predicting Delayed Flights

174

(5)

8.3 Advantages and Shortcomings of the Naive Bayes Classifier

179

(1)

Spam Filtering

179

(1)

Problems

180

(3)

9 Classification and Regression Trees

183

(28)

9.1 Introduction

183

(1)

9.2 Classification Trees

184

(3)

Recursive Partitioning

184

(1)

Example 1: Riding Mowers

185

(1)

Categorical Predictors

186

(1)

9.3 Growing a Tree

187

(5)

Growing a Tree Example

187

(1)

Classifying a New Observation

188

(4)

Fitting Classification Trees in JMP Pro

191

(1)

Growing a Tree with CART

192

(1)

9.4 Evaluating the Performance of a Classification Tree

192

(1)

Example 2: Acceptance of Personal Loan

192

(1)

9.5 Avoiding Overfitting

193

(3)

Stopping Tree Growth: CHAID

194

(1)

Growing a Full Tree and Pruning It Back

194

(2)

How JMP Limits Tree Size

196

(1)

9.6 Classification Rules from Trees

196

(2)

9.7 Classification Trees for More Than Two Classes

198

(1)

9.8 Regression Trees

199

(1)

Prediction

199

(1)

Evaluating Performance

200

(1)

9.9 Advantages and Weaknesses of a Tree

200

(4)

9.10 Improving Prediction: Multiple Trees

204

(3)

Fitting Ensemble Tree Models in JMP Pro

206

(1)

9.11 CART and Measures of Impurity

207

(1)

Problems

207

(4)

10 Logistic Regression

211

(34)

10.1 Introduction

211

(2)

Logistic Regression and Consumer Choice Theory

212

(1)

10.2 The Logistic Regression Model

213

(8)

Example: Acceptance of Personal Loan (Universal Bank)

214

(2)

Indicator (Dummy) Variables in JMP

216

(1)

Model with a Single Predictor

216

(2)

Fitting One Predictor Logistic Models in JMP

218

(1)

Estimating the Logistic Model from Data: Multiple Predictors

218

(3)

Fitting Logistic Models in JMP with More Than One Predictor

221

(1)

10.3 Evaluating Classification Performance

221

(2)

Variable Selection

222

(1)

10.4 Example of Complete Analysis: Predicting Delayed Flights

223

(11)

Data Preprocessing

225

(1)

Model Fitting, Estimation and Interpretation-A Simple Model

226

(1)

Model Fitting, Estimation and Interpretation-The Full Model

227

(2)

Model Performance

229

(1)

Variable Selection

230

(2)

Regrouping and Recoding Variables in JMP

232

(2)

10.5 Appendixes: Logistic Regression for Profiling

234

(7)

Appendix A: Why Linear Regression Is Problematic for a Categorical Response

234

(2)

Appendix B: Evaluating Explanatory Power

236

(2)

Appendix C: Logistic Regression for More Than Two Classes

238

(1)

Nominal Classes

238

(3)

Problems

241

(4)

11 Neural Nets

245

(23)

11.1 Introduction

245

(1)

11.2 Concept and Structure of a Neural Network

246

(1)

11.3 Fitting a Network to Data

246

(14)

Example 1: Tiny Dataset

246

(2)

Computing Output of Nodes

248

(3)

Preprocessing the Data

251

(1)

Activation Functions and Data Processing Features in JMP Pro

251

(1)

Training the Model

251

(3)

Fitting a Neural Network in JMP Pro

254

(2)

Using the Output for Prediction and Classification

256

(2)

Example 2: Classifying Accident Severity

258

(1)

Avoiding overfitting

259

(1)

11.4 User Input in JMP Pro

260

(4)

Unsupervised Feature Extraction and Deep Learning

263

(1)

11.5 Exploring the Relationship between Predictors and Response

264

(1)

Understanding Neural Models in JMP Pro

264

(1)

11.6 Advantages and Weaknesses of Neural Networks

264

(1)

Problems

265

(3)

12 Discriminant Analysis

268

(17)

12.1 Introduction

268

(2)

Example 1: Riding Mowers

269

(1)

Example 2: Personal Loan Acceptance (Universal Bank)

269

(1)

12.2 Distance of an Observation from a Class

270

(2)

12.3 From Distances to Propensities and Classifications

272

(3)

Linear Discriminant Analysis in JMP

275

(1)

12.4 Classification Performance of Discriminant Analysis

275

(2)

12.5 Prior Probabilities

277

(1)

12.6 Classifying More Than Two Classes

278

(2)

Example 3: Medical Dispatch to Accident Scenes

278

(1)

Using Categorical Predictors in Discriminant Analysis in JMP

279

(1)

12.7 Advantages and Weaknesses

280

(2)

Problems

282

(3)

13 Combining Methods: Ensembles and Uplift Modeling

285

(16)

13.1 Ensembles

285

(5)

Why Ensembles Can Improve Predictive Power

286

(1)

The Wisdom of Crowds

287

(1)

Simple Averaging

287

(1)

Bagging

288

(1)

Boosting

288

(1)

Creating Ensemble Models in JMP Pro

289

(1)

Advantages and Weaknesses of Ensembles

289

(1)

13.2 Uplift (Persuasion) Modeling

290

(5)

A-B Testing

290

(1)

Uplift

290

(1)

Gathering the Data

291

(1)

A Simple Model

292

(1)

Modeling Individual Uplift

293

(1)

Using the Results of an Uplift Model

294

(1)

Creating Uplift Models in JMP Pro

294

(1)

Using the Uplift Platform in JMP Pro

295

(1)

13.3 Summary

295

(2)

Problems

297

(4)

Part V Mining Relationships Among Records

14 Cluster Analysis

301

(34)

14.1 Introduction

301

(4)

Example: Public Utilities

302

(3)

14.2 Measuring Distance between Two Observations

305

(4)

Euclidean Distance

305

(1)

Normalizing Numerical Measurements

305

(1)

Other Distance Measures for Numerical Data

306

(2)

Distance Measures for Categorical Data

308

(1)

Distance Measures for Mixed Data

308

(1)

14.3 Measuring Distance between Two Clusters

309

(2)

Minimum Distance

309

(1)

Maximum Distance

309

(1)

Average Distance

309

(1)

Centroid Distance

309

(2)

14.4 Hierarchical (Agglomerative) Clustering

311

(9)

Hierarchical Clustering in JMP and JMP Pro

311

(1)

Hierarchical Agglomerative Clustering Algorithm

312

(1)

Single Linkage

312

(1)

Complete Linkage

313

(1)

Average Linkage

313

(1)

Centroid Linkage

313

(1)

Ward's Method

314

(1)

Dendrograms: Displaying Clustering Process and Results

314

(2)

Validating Clusters

316

(2)

Two-Way Clustering

318

(1)

Limitations of Hierarchical Clustering

319

(1)

14.5 Nonhierarchical Clustering: The k-Means Algorithm

320

(9)

k-Means Clustering Algorithm

321

(1)

Initial Partition into K Clusters

322

(15)

K-Means Clustering in JMP

322

(7)

Problems

329

(6)

Part VI Forecasting Time Series

15 Handling Time Series

335

(11)

15.1 Introduction

335

(1)

15.2 Descriptive versus Predictive Modeling

336

(1)

15.3 Popular Forecasting Methods in Business

337

(1)

Combining Methods

337

(1)

15.4 Time Series Components

337

(4)

Example: Ridership on Amtrak Trains

337

(4)

15.5 Data Partitioning and Performance Evaluation

341

(2)

Benchmark Performance: Naive Forecasts

342

(1)

Generating Future Forecasts

342

(4)

Partitioning Time Series Data in JMP and Validating Time Series Models

342

(1)

Problems

343

(3)

16 Regression-Based Forecasting

346

(31)

16.1 A Model with Trend

346

(7)

Linear Trend

346

(4)

Fitting a Model with Linear Trend in JMP

348

(2)

Creating Actual versus Predicted Plots and Residual Plots in JMP

350

(1)

Exponential Trend

350

(2)

Computing Forecast Errors for Exponential Trend Models

352

(1)

Polynomial Trend

352

(4)

Fitting a Polynomial Trend in JMP

353

(1)

16.2 A Model with Seasonality

353

(3)

16.3 A Model with Trend and Seasonality

356

(1)

16.4 Autocorrelation and ARIMA Models

356

(10)

Computing Autocorrelation

356

(4)

Improving Forecasts by Integrating Autocorrelation Information

360

(1)

Fitting AR (Autoregression) Models in the JMP Time Series Platform

361

(1)

Fitting AR Models to Residuals

361

(2)

Evaluating Predictability

363

(2)

Summary: Fitting Regression-Based Time Series Models in JMP

365

(1)

Problems

366

(11)

17 Smoothing Methods

377

(25)

17.1 Introduction

377

(1)

17.2 Moving Average

378

(4)

Centered Moving Average for Visualization

378

(1)

Trailing Moving Average for Forecasting

379

(3)

Computing a Trailing Moving Average Forecast in JMP

380

(2)

Choosing Window Width (w)

382

(1)

17.3 Simple Exponential Smoothing

382

(5)

Choosing Smoothing Parameter a

383

(3)

Fitting Simple Exponential Smoothing Models in JMP

384

(2)

Creating Plots for Actual versus Forecasted Series and Residuals Series Using the Graph Builder

386

(1)

Relation between Moving Average and Simple Exponential Smoothing

386

(1)

17.4 Advanced Exponential Smoothing

387

(3)

Series with a Trend

387

(1)

Series with a Trend and Seasonality

388

(2)

Problems

390

(12)

Part VII Cases

18 Cases

402

(29)

18.1 Charles Book Club

401

(8)

The Book Industry

401

(1)

Database Marketing at Charles

402

(1)

Data Mining Techniques

403

(2)

Assignment

405

(4)

18.2 German Credit

409

(1)

Background

409

(1)

Data

409

(1)

Assignment

409

(1)

18.3 Tayko Software Cataloger

410

(5)

Background

410

(3)

The Mailing Experiment

413

(1)

Data

413

(1)

Assignment

413

(2)

18.4 Political Persuasion

415

(4)

Background

415

(1)

Predictive Analytics Arrives in US Politics

415

(1)

Political Targeting

416

(1)

Uplift

416

(1)

Data

417

(1)

Assignment

417

(2)

18.5 Taxi Cancellations

419

(1)

Business Situation

419

(1)

Assignment

419

(1)

18.6 Segmenting Consumers of Bath Soap

420

(3)

Business Situation

420

(1)

Key Problems

421

(1)

Data

421

(1)

Measuring Brand Loyalty

421

(1)

Assignment

421

(2)

18.7 Direct-Mail Fundraising

423

(2)

Background

423

(1)

Data

424

(1)

Assignment

425

(1)

18.8 Predicting Bankruptcy

425

(3)

Predicting Corporate Bankruptcy

426

(2)

Assignment

428

(1)

18.9 Time Series Case: Forecasting Public Transportation Demand

428

(3)

Background

428

(1)

Problem Description

428

(1)

Available Data

428

(1)

Assignment Goal

429

(1)

Assignment

429

(1)

Tips and Suggested Steps

429

(2)

References

431

(2)

Data Files Used in the Book

433

(2)

Index

435

Galit Shmueli, PhD, is Distinguished Professor at National Tsing Hua University's Institute of Service Science. She has designed and instructed data mining courses since 2004 at University of Maryland, Statistics.com, Indian School of Business, and National Tsing Hua University, Taiwan. Professor Shmueli is known for her research and teaching in business analytics, with a focus on statistical and data mining methods in information systems and healthcare. She has authored over 70 journal articles, books, textbooks, and book chapters, including Data Mining for Business Analytics: Concepts, Techniques, and Applications with XLMiner®, Third Edition, also published by Wiley.

Peter C. Bruce is President and Founder of the Institute for Statistics Education at www.statistics.com He has written multiple journal articles and is the developer of Resampling Stats software. He is the author of Introductory Statistics and Analytics: A Resampling Perspective and co-author of Data Mining for Business Analytics: Concepts, Techniques, and Applications with XLMiner ®, Third Edition, both published by Wiley.

Mia Stephens is Academic Ambassador at JMP®, a division of SAS Institute. Prior to joining SAS, she was an adjunct professor of statistics at the University of New Hampshire and a founding member of the North Haven Group LLC, a statistical training and consulting company. She is the co-author of three other books, including Visual Six Sigma: Making Data Analysis Lean, Second Edition, also published by Wiley.

Nitin R. Patel, PhD, is Chairman and cofounder of Cytel, Inc., based in Cambridge, Massachusetts. A Fellow of the American Statistical Association, Dr. Patel has also served as a Visiting Professor at the Massachusetts Institute of Technology and at Harvard University. He is a Fellow of the Computer Society of India and was a professor at the Indian Institute of Management, Ahmedabad, for 15 years. He is co-author of Data Mining for Business Analytics: Concepts, Techniques, and Applications with XLMiner®, Third Edition, also published by Wiley.

Lisainfo e-raamatute kohta

Püsilink: https://www.kriso.ee/db/97811189566252e.html

Märksõnad:

E-raamat: Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro

DRM piirangud

Kopeerimine (copy/paste):

Printimine:

Kasutamine:

Konto & seaded

Otsing

Otsingu andmebaas

Filtreeri tulemusi

Teemad E-raamatute teemad

Vali ostukorv