Foreword | xix
Foreword | xxi
Preface to the Python Edition | xxiii
Acknowledgments | xxvii
Part I Preliminaries

Chapter 1 Introduction | 3 (12)
1.1 What Is Business Analytics? | 3 (2)
1.3 Data Mining and Related Terms | 5 (1)
1.6 Why Are There So Many Different Methods? | 8 (1)
1.7 Terminology and Notation | 9 (2)
1.8 Road Maps to This Book | 11 (4)
|
Chapter 2 Overview of the Data Mining Process | 15 (46)
2.2 Core Ideas in Data Mining | 16 (3)
    Association Rules and Recommendation Systems | 16 (1)
    Data Reduction and Dimension Reduction | 17 (1)
    Data Exploration and Visualization | 17 (1)
    Supervised and Unsupervised Learning | 18 (1)
2.3 The Steps in Data Mining | 19 (2)
    Predicting Home Values in the West Roxbury Neighborhood | 21 (1)
    Loading and Looking at the Data in Python | 22 (3)
    Oversampling Rare Events in Classification Tasks | 26 (1)
    Preprocessing and Cleaning the Data | 27 (7)
2.5 Predictive Power and Overfitting | 34 (6)
    Creation and Use of Data Partitions | 36 (4)
2.6 Building a Predictive Model | 40 (4)
2.7 Using Python for Data Mining on a Local Machine | 44 (1)
2.8 Automating Data Mining Solutions | 45 (2)
2.9 Ethical Practice in Data Mining | 47 (9)
    Data Mining Software: The State of the Market (by Herb Edelstein) | 52 (4)
Part II Data Exploration and Dimension Reduction

Chapter 3 Data Visualization | 61 (38)
    Example 1: Boston Housing Data | 64 (1)
    Example 2: Ridership on Amtrak Trains | 65 (1)
3.3 Basic Charts: Bar Charts, Line Graphs, and Scatter Plots | 65 (9)
    Distribution Plots: Boxplots and Histograms | 68 (3)
    Heatmaps: Visualizing Correlations and Missing Values | 71 (3)
3.4 Multidimensional Visualization | 74 (14)
    Adding Variables: Color, Size, Shape, Multiple Panels, and Animation | 74 (3)
    Manipulations: Rescaling, Aggregation and Hierarchies, Zooming, Filtering | 77 (4)
    Reference: Trend Lines and Labels | 81 (1)
    Scaling Up to Large Datasets | 82 (1)
    Multivariate Plot: Parallel Coordinates Plot | 83 (1)
    Interactive Visualization | 83 (5)
3.5 Specialized Visualizations | 88 (5)
    Visualizing Networked Data | 88 (2)
    Visualizing Hierarchical Data: Treemaps | 90 (1)
    Visualizing Geographical Data: Map Charts | 91 (2)
3.6 Summary: Major Visualizations and Operations, by Data Mining Goal | 93 (4)
Chapter 4 Dimension Reduction | 99 (26)
4.2 Curse of Dimensionality | 100 (1)
4.3 Practical Considerations | 100 (2)
    Example 1: House Prices in Boston | 101 (1)
    Aggregation and Pivot Tables | 104 (1)
4.6 Reducing the Number of Categories in Categorical Variables | 106 (2)
4.7 Converting a Categorical Variable to a Numerical Variable | 108 (1)
4.8 Principal Components Analysis | 108 (11)
    Example 2: Breakfast Cereals | 109 (5)
    Using Principal Components for Classification and Prediction | 117 (2)
4.9 Dimension Reduction Using Regression Models | 119 (1)
4.10 Dimension Reduction Using Classification and Regression Trees | 119 (1)
Part III Performance Evaluation

Chapter 5 Evaluating Predictive Performance | 125 (36)
5.2 Evaluating Predictive Performance | 126 (5)
    Naive Benchmark: The Average | 127 (1)
    Prediction Accuracy Measures | 127 (1)
    Comparing Training and Validation Performance | 128 (1)
    Cumulative Gains and Lift Charts | 128 (3)
5.3 Judging Classifier Performance | 131 (13)
    Benchmark: The Naive Rule | 132 (1)
    The Confusion (Classification) Matrix | 133 (1)
    Using the Validation Data | 134 (1)
    Propensities and Cutoff for Classification | 136 (2)
    Performance in Case of Unequal Importance of Classes | 138 (2)
    Asymmetric Misclassification Costs | 140 (4)
    Generalization to More Than Two Classes | 144 (1)
5.4 Judging Ranking Performance | 144 (5)
    Gains and Lift Charts for Binary Data | 144 (3)
    Gains and Lift Charts Incorporating Costs and Benefits | 148 (1)
    Cumulative Gains as a Function of Cutoff | 148 (1)
    Oversampling the Training Set | 152 (1)
    Evaluating Model Performance Using a Non-oversampled Validation Set | 152 (1)
    Evaluating Model Performance if Only Oversampled Validation Set Exists | 152 (3)
Part IV Prediction and Classification Methods

Chapter 6 Multiple Linear Regression | 161 (24)
6.2 Explanatory vs. Predictive Modeling | 162 (2)
6.3 Estimating the Regression Equation and Prediction | 164 (5)
    Example: Predicting the Price of Used Toyota Corolla Cars | 165 (4)
6.4 Variable Selection in Linear Regression | 169 (11)
    Reducing the Number of Predictors | 169 (1)
    How to Reduce the Number of Predictors | 170 (6)
    Regularization (Shrinkage Models) | 176 (3)
Appendix: Using Statsmodels | 179 (1)
|
Chapter 7 k-Nearest Neighbors (k-NN) | 185 (14)
7.1 The k-NN Classifier (Categorical Outcome) | 185 (8)
    k-NN with More Than Two Classes | 192 (1)
    Converting Categorical Variables to Binary Dummies | 193 (1)
7.2 k-NN for a Numerical Outcome | 193 (2)
7.3 Advantages and Shortcomings of k-NN Algorithms | 195 (2)
|
Chapter 8 The Naive Bayes Classifier | 199 (18)
    Cutoff Probability Method | 200 (1)
    Example 1: Predicting Fraudulent Financial Reporting | 201 (1)
8.2 Applying the Full (Exact) Bayesian Classifier | 201 (9)
    Using the "Assign to the Most Probable Class" Method | 202 (1)
    Using the Cutoff Probability Method | 202 (1)
    Practical Difficulty with the Complete (Exact) Bayes Procedure | 202 (1)
    The Naive Bayes Assumption of Conditional Independence | 204 (1)
    Using the Cutoff Probability Method | 204 (1)
    Example 2: Predicting Fraudulent Financial Reports, Two Predictors | 205 (1)
    Example 3: Predicting Delayed Flights | 206 (4)
8.3 Advantages and Shortcomings of the Naive Bayes Classifier | 210 (4)
|
Chapter 9 Classification and Regression Trees | 217 (34)
9.3 Evaluating the Performance of a Classification Tree | 228 (4)
    Example 2: Acceptance of Personal Loan | 228 (2)
    Sensitivity Analysis Using Cross Validation | 230 (2)
    Fine-tuning Tree Parameters | 234 (2)
    Other Methods for Limiting Tree Size | 236 (2)
9.5 Classification Rules from Trees | 238 (1)
9.6 Classification Trees for More Than Two Classes | 239 (1)
9.8 Improving Prediction: Random Forests and Boosted Trees | 243 (3)
9.9 Advantages and Weaknesses of a Tree | 246 (2)
|
Chapter 10 Logistic Regression | 251 (32)
10.2 The Logistic Regression Model | 253 (2)
10.3 Example: Acceptance of Personal Loan | 255 (6)
    Model with a Single Predictor | 255 (2)
    Estimating the Logistic Model from Data: Computing Parameter Estimates | 257 (2)
    Interpreting Results in Terms of Odds (for a Profiling Goal) | 259 (2)
10.4 Evaluating Classification Performance | 261 (3)
10.5 Logistic Regression for Multi-class Classification | 264 (5)
    Comparing Ordinal and Nominal Models | 267 (2)
10.6 Example of Complete Analysis: Predicting Delayed Flights | 269 (11)
Appendix: Using Statsmodels | 278 (2)
|
|
Chapter 11 Neural Nets | 283 (26)
11.2 Concept and Structure of a Neural Network | 284 (1)
11.3 Fitting a Network to Data | 285 (12)
    Computing Output of Nodes | 286 (3)
    Example 2: Classifying Accident Severity | 292 (3)
    Using the Output for Prediction and Classification | 297 (1)
11.5 Exploring the Relationship Between Predictors and Outcome | 299 (1)
    Convolutional Neural Networks (CNNs) | 300 (1)
11.7 Advantages and Weaknesses of Neural Networks | 305 (1)
|
Chapter 12 Discriminant Analysis | 309 (18)
    Example 2: Personal Loan Acceptance | 310 (1)
12.2 Distance of a Record from a Class | 311 (3)
12.3 Fisher's Linear Classification Functions | 314 (3)
12.4 Classification Performance of Discriminant Analysis | 317 (1)
12.6 Unequal Misclassification Costs | 319 (1)
12.7 Classifying More Than Two Classes | 319 (3)
    Example 3: Medical Dispatch to Accident Scenes | 319 (3)
12.8 Advantages and Weaknesses | 322 (2)
|
Chapter 13 Combining Methods: Ensembles and Uplift Modeling | 327 (18)
    Why Ensembles Can Improve Predictive Power | 329 (1)
    Bagging and Boosting in Python | 332 (1)
    Advantages and Weaknesses of Ensembles | 332 (2)
13.2 Uplift (Persuasion) Modeling | 334 (6)
    Modeling Individual Uplift | 337 (1)
    Computing Uplift with Python | 338 (1)
    Using the Results of an Uplift Model | 339 (1)
Part V Mining Relationships Among Records

Chapter 14 Association Rules and Collaborative Filtering | 345 (30)
    Discovering Association Rules in Transaction Databases | 346 (2)
    Example 1: Synthetic Data on Purchases of Phone Faceplates | 348 (1)
    Generating Candidate Rules | 348 (1)
    The Process of Rule Selection | 353 (1)
    Example 2: Rules for Similar Book Purchases | 357 (1)
14.2 Collaborative Filtering | 357 (11)
    Example 3: Netflix Prize Contest | 360 (1)
    User-Based Collaborative Filtering: "People Like You" | 361 (2)
    Item-Based Collaborative Filtering | 363 (1)
    Advantages and Weaknesses of Collaborative Filtering | 364 (2)
    Collaborative Filtering vs. Association Rules | 366 (2)
|
Chapter 15 Cluster Analysis | 375 (32)
    Example: Public Utilities | 377 (2)
15.2 Measuring Distance Between Two Records | 379 (6)
    Normalizing Numerical Measurements | 380 (1)
    Other Distance Measures for Numerical Data | 381 (2)
    Distance Measures for Categorical Data | 383 (1)
    Distance Measures for Mixed Data | 384 (1)
15.3 Measuring Distance Between Two Clusters | 385 (2)
15.4 Hierarchical (Agglomerative) Clustering | 387 (8)
    Dendrograms: Displaying Clustering Process and Results | 390 (1)
    Limitations of Hierarchical Clustering | 393 (2)
15.5 Non-Hierarchical Clustering: The k-Means Algorithm | 395 (6)
    Choosing the Number of Clusters (k) | 396 (5)
Part VI Forecasting Time Series

Chapter 16 Handling Time Series | 407 (16)
16.2 Descriptive vs. Predictive Modeling | 409 (1)
16.3 Popular Forecasting Methods in Business | 409 (1)
16.4 Time Series Components | 410 (5)
    Example: Ridership on Amtrak Trains | 411 (4)
16.5 Data-Partitioning and Performance Evaluation | 415 (4)
    Benchmark Performance: Naive Forecasts | 415 (1)
    Generating Future Forecasts | 416 (3)
|
Chapter 17 Regression-Based Forecasting | 423 (28)
17.2 A Model with Seasonality | 429 (3)
17.3 A Model with Trend and Seasonality | 432 (1)
17.4 Autocorrelation and ARIMA Models | 433 (9)
    Computing Autocorrelation | 434 (2)
    Improving Forecasts by Integrating Autocorrelation Information | 436 (4)
    Evaluating Predictability | 440 (2)
|
Chapter 18 Smoothing Methods | 451 (22)
    Centered Moving Average for Visualization | 452 (1)
    Trailing Moving Average for Forecasting | 453 (2)
    Choosing Window Width (w) | 455 (2)
18.3 Simple Exponential Smoothing | 457 (3)
    Choosing Smoothing Parameter α | 458 (2)
    Relation Between Moving Average and Simple Exponential Smoothing | 460 (1)
18.4 Advanced Exponential Smoothing | 460 (4)
    Series with a Trend and Seasonality | 461 (1)
    Series with Seasonality (No Trend) | 462 (2)
Part VII Data Analytics

Chapter 19 Social Network Analytics | 473 (22)
19.2 Directed vs. Undirected Networks | 475 (1)
19.3 Visualizing and Analyzing Networks | 476 (4)
    Using Network Data in Classification and Prediction | 479 (1)
19.4 Social Data Metrics and Taxonomy | 480 (5)
    Node-Level Centrality Metrics | 480 (1)
19.5 Using Network Metrics in Prediction and Classification | 485 (6)
19.6 Collecting Social Network Data with Python | 491 (1)
19.7 Advantages and Disadvantages | 491 (3)
|
|
Chapter 20 Text Mining | 495 (20)
20.2 The Tabular Representation of Text: Term-Document Matrix and "Bag-of-Words" | 496 (1)
20.3 Bag-of-Words vs. Meaning Extraction at Document Level | 497 (1)
20.4 Preprocessing the Text | 498 (8)
    Presence/Absence vs. Frequency | 501 (1)
    Term Frequency-Inverse Document Frequency (TF-IDF) | 502 (3)
    From Terms to Concepts: Latent Semantic Indexing | 505 (1)
20.5 Implementing Data Mining Methods | 506 (1)
20.6 Example: Online Discussions on Autos and Electronics | 506 (4)
    Importing and Labeling the Records | 507 (1)
    Text Preprocessing in Python | 508 (1)
    Producing a Concept Matrix | 508 (1)
    Fitting a Predictive Model | 508 (1)
Part VIII Cases

Chapter 21 Cases | 515 (34)
    Database Marketing at Charles Book Club | 516 (2)
21.3 Tayko Software Cataloger | 527 (4)
21.4 Political Persuasion | 531 (4)
    Predictive Analytics Arrives in US Politics | 531 (1)
21.6 Segmenting Consumers of Bath Soap | 537 (4)
21.7 Direct-Mail Fundraising | 541 (3)
21.8 Catalog Cross-Selling | 544 (2)
21.9 Time Series Case: Forecasting Public Transportation Demand | 546 (3)
References | 549 (2)
Data Files Used in the Book | 551 (4)
Python Utilities Functions | 555 (10)
Index | 565