
E-book: Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro

(University of Maryland, College Park), (Massachusetts Institute of Technology)
  • Format: PDF+DRM
  • Publication date: 09-May-2016
  • Publisher: John Wiley & Sons Inc
  • Language: English
  • ISBN-13: 9781118956625
  • Price: 134.49 €*
  • * The price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means you must install special software to read it. You will also need to create an Adobe ID. More information here. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, install Adobe Digital Editions (this is a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro® presents an applied and interactive approach to data mining.

Featuring hands-on applications with JMP Pro®, a statistical package from the SAS Institute, the book uses engaging, real-world examples to build a theoretical and practical understanding of key data mining methods, especially predictive models for classification and prediction. Topics include data visualization, dimension reduction techniques, clustering, linear and logistic regression, classification and regression trees, discriminant analysis, naive Bayes, neural networks, uplift modeling, ensemble models, and time series forecasting.

Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro® also includes:

  • Detailed summaries that supply an outline of key topics at the beginning of each chapter
  • End-of-chapter examples and exercises that allow readers to expand their comprehension of the presented material
  • Data-rich case studies to illustrate various applications of data mining techniques
  • A companion website (www.dataminingbook.com) with over two dozen data sets, exercises and case study solutions, and slides for instructors

Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro® is an excellent textbook for advanced undergraduate and graduate-level courses on data mining, predictive analytics, and business analytics. The book is also a one-of-a-kind resource for data scientists, analysts, researchers, and practitioners working with analytics in the fields of management, finance, marketing, information technology, healthcare, education, and any other data-rich field.
Foreword xvii
Preface xix
Acknowledgments xxi
Part I Preliminaries
1 Introduction
3(11)
1.1 What Is Business Analytics?
3(2)
Who Uses Predictive Analytics?
4(1)
1.2 What Is Data Mining?
5(1)
1.3 Data Mining and Related Terms
5(1)
1.4 Big Data
6(1)
1.5 Data Science
7(1)
1.6 Why Are There So Many Different Methods?
7(1)
1.7 Terminology and Notation
8(2)
1.8 Roadmap to This Book
10(4)
Order of Topics
11(4)
Using JMP Pro, Statistical Discovery Software from SAS
11(3)
2 Overview of the Data Mining Process
14(37)
2.1 Introduction
14(1)
2.2 Core Ideas in Data Mining
15(2)
Classification
15(1)
Prediction
15(1)
Association Rules and Recommendation Systems
15(1)
Predictive Analytics
16(1)
Data Reduction and Dimension Reduction
16(1)
Data Exploration and Visualization
16(1)
Supervised and Unsupervised Learning
16(1)
2.3 The Steps in Data Mining
17(2)
2.4 Preliminary Steps
19(6)
Organization of Datasets
19(1)
Sampling from a Database
19(1)
Oversampling Rare Events in Classification Tasks
19(1)
Preprocessing and Cleaning the Data
20(5)
Changing Modeling Types in JMP
20(5)
Standardizing Data in JMP
25(1)
2.5 Predictive Power and Overfitting
25(4)
Creation and Use of Data Partitions
25(2)
Partitioning Data for Crossvalidation in JMP Pro
27(1)
Overfitting
27(2)
2.6 Building a Predictive Model with JMP Pro
29(9)
Predicting Home Values in a Boston Neighborhood
29(1)
Modeling Process
30(10)
Setting the Random Seed in JMP
34(4)
2.7 Using JMP Pro for Data Mining
38(2)
2.8 Automating Data Mining Solutions
40(4)
Data Mining Software Tools: The State of the Market by Herb Edelstein
41(3)
Problems
44(7)
Part II Data Exploration And Dimension Reduction
3 Data Visualization
51(30)
3.1 Uses of Data Visualization
51(1)
3.2 Data Examples
52(2)
Example 1: Boston Housing Data
53(1)
Example 2: Ridership on Amtrak Trains
53(1)
3.3 Basic Charts: Bar Charts, Line Graphs, and Scatterplots
54(7)
Using The JMP Graph Builder
54(2)
Distribution Plots: Boxplots and Histograms
56(3)
Tools for Data Visualization in JMP
59(1)
Heatmaps (Color Maps and Cell Plots): Visualizing Correlations and Missing Values
59(2)
3.4 Multidimensional Visualization
61(12)
Adding Variables: Color, Size, Shape, Multiple Panels, and Animation
62(3)
Manipulations: Rescaling, Aggregation and Hierarchies, Zooming, Filtering
65(3)
Reference: Trend Lines and Labels
68(1)
Adding Trendlines in the Graph Builder
69(1)
Scaling Up: Large Datasets
70(1)
Multivariate Plot: Parallel Coordinates Plot
71(1)
Interactive Visualization
72(1)
3.5 Specialized Visualizations
73(4)
Visualizing Networked Data
74(1)
Visualizing Hierarchical Data: More on Treemaps
75(1)
Visualizing Geographical Data: Maps
76(1)
3.6 Summary of Major Visualizations and Operations, According to Data Mining Goal
77(2)
Prediction
77(1)
Classification
78(1)
Time Series Forecasting
78(1)
Unsupervised Learning
79(1)
Problems
79(2)
4 Dimension Reduction
81(24)
4.1 Introduction
81(1)
4.2 Curse of Dimensionality
82(1)
4.3 Practical Considerations
82(1)
Example 1: House Prices in Boston
82(1)
4.4 Data Summaries
83(4)
Summary Statistics
83(2)
Tabulating Data (Pivot Tables)
85(2)
4.5 Correlation Analysis
87(1)
4.6 Reducing the Number of Categories in Categorical Variables
87(3)
4.7 Converting a Categorical Variable to a Continuous Variable
90(1)
4.8 Principal Components Analysis
90(10)
Example 2: Breakfast Cereals
91(4)
Principal Components
95(2)
Normalizing the Data
97(3)
Using Principal Components for Classification and Prediction
100(1)
4.9 Dimension Reduction Using Regression Models
100(1)
4.10 Dimension Reduction Using Classification and Regression Trees
100(1)
Problems
101(4)
Part III Performance Evaluation
5 Evaluating Predictive Performance
105(28)
5.1 Introduction
105(1)
5.2 Evaluating Predictive Performance
106(3)
Benchmark: The Average
106(1)
Prediction Accuracy Measures
107(1)
Comparing Training and Validation Performance
108(1)
5.3 Judging Classifier Performance
109(11)
Benchmark: The Naive Rule
109(1)
Class Separation
109(1)
The Classification Matrix
109(2)
Using the Validation Data
111(1)
Accuracy Measures
111(1)
Propensities and Cutoff for Classification
112(3)
Cutoff Values for Triage
112(2)
Changing the Cutoff Values for a Confusion Matrix in JMP
114(1)
Performance in Unequal Importance of Classes
115(1)
False-Positive and False-Negative Rates
116(1)
Asymmetric Misclassification Costs
116(4)
Asymmetric Misclassification Costs in JMP
119(1)
Generalization to More Than Two Classes
120(1)
5.4 Judging Ranking Performance
120(3)
Lift Curves
120(2)
Beyond Two Classes
122(1)
Lift Curves Incorporating Costs and Benefits
122(1)
5.5 Oversampling
123(6)
Oversampling the Training Set
126(1)
Stratified Sampling and Oversampling in JMP
126(1)
Evaluating Model Performance Using a Nonoversampled Validation Set
126(1)
Evaluating Model Performance If Only Oversampled Validation Set Exists
127(8)
Applying Sampling Weights in JMP
128(1)
Problems
129(4)
Part IV Prediction And Classification Methods
6 Multiple Linear Regression
133(22)
6.1 Introduction
133(1)
6.2 Explanatory versus Predictive Modeling
134(1)
6.3 Estimating the Regression Equation and Prediction
135(6)
Example: Predicting the Price of Used Toyota Corolla Automobiles
136(5)
Coding of Categorical Variables in Regression
138(2)
Additional Options for Regression Models in JMP
140(1)
6.4 Variable Selection in Linear Regression
141(9)
Reducing the Number of Predictors
141(1)
How to Reduce the Number of Predictors
142(1)
Manual Variable Selection
142(1)
Automated Variable Selection
142(13)
Coding of Categorical Variables in Stepwise Regression
143(2)
Working with the All Possible Models Output
145(2)
When Using a Stopping Algorithm in JMP
147(2)
Other Regression Procedures in JMP Pro: Generalized Regression
149(1)
Problems
150(5)
7 k-Nearest Neighbors (k-NN)
155(12)
7.1 The k-NN Classifier (Categorical Outcome)
155(6)
Determining Neighbors
155(1)
Classification Rule
156(1)
Example: Riding Mowers
156(1)
Choosing k
157(2)
k Nearest Neighbors in JMP Pro
158(1)
The Cutoff Value for Classification
159(2)
k-NN Predictions and Prediction Formulas in JMP Pro
161(1)
k-NN with More Than Two Classes
161(1)
7.2 k-NN for a Numerical Response
161(2)
Pandora
161(2)
7.3 Advantages and Shortcomings of k-NN Algorithms
163(1)
Problems
164(3)
8 The Naive Bayes Classifier
167(16)
8.1 Introduction
167(2)
Naive Bayes Method
167(1)
Cutoff Probability Method
168(1)
Conditional Probability
168(1)
Example 1: Predicting Fraudulent Financial Reporting
168(1)
8.2 Applying the Full (Exact) Bayesian Classifier
169(10)
Using the "Assign to the Most Probable Class" Method
169(1)
Using the Cutoff Probability Method
169(1)
Practical Difficulty with the Complete (Exact) Bayes Procedure
170(1)
Solution: Naive Bayes
170(2)
Example 2: Predicting Fraudulent Financial Reports, Two Predictors
172(2)
Using the JMP Naive Bayes Add-in
174(1)
Example 3: Predicting Delayed Flights
174(5)
8.3 Advantages and Shortcomings of the Naive Bayes Classifier
179(1)
Spam Filtering
179(1)
Problems
180(3)
9 Classification and Regression Trees
183(28)
9.1 Introduction
183(1)
9.2 Classification Trees
184(3)
Recursive Partitioning
184(1)
Example 1: Riding Mowers
185(1)
Categorical Predictors
186(1)
9.3 Growing a Tree
187(5)
Growing a Tree Example
187(1)
Classifying a New Observation
188(4)
Fitting Classification Trees in JMP Pro
191(1)
Growing a Tree with CART
192(1)
9.4 Evaluating the Performance of a Classification Tree
192(1)
Example 2: Acceptance of Personal Loan
192(1)
9.5 Avoiding Overfitting
193(3)
Stopping Tree Growth: CHAID
194(1)
Growing a Full Tree and Pruning It Back
194(2)
How JMP Limits Tree Size
196(1)
9.6 Classification Rules from Trees
196(2)
9.7 Classification Trees for More Than Two Classes
198(1)
9.8 Regression Trees
199(1)
Prediction
199(1)
Evaluating Performance
200(1)
9.9 Advantages and Weaknesses of a Tree
200(4)
9.10 Improving Prediction: Multiple Trees
204(3)
Fitting Ensemble Tree Models in JMP Pro
206(1)
9.11 CART and Measures of Impurity
207(1)
Problems
207(4)
10 Logistic Regression
211(34)
10.1 Introduction
211(2)
Logistic Regression and Consumer Choice Theory
212(1)
10.2 The Logistic Regression Model
213(8)
Example: Acceptance of Personal Loan (Universal Bank)
214(2)
Indicator (Dummy) Variables in JMP
216(1)
Model with a Single Predictor
216(2)
Fitting One Predictor Logistic Models in JMP
218(1)
Estimating the Logistic Model from Data: Multiple Predictors
218(3)
Fitting Logistic Models in JMP with More Than One Predictor
221(1)
10.3 Evaluating Classification Performance
221(2)
Variable Selection
222(1)
10.4 Example of Complete Analysis: Predicting Delayed Flights
223(11)
Data Preprocessing
225(1)
Model Fitting, Estimation and Interpretation: A Simple Model
226(1)
Model Fitting, Estimation and Interpretation: The Full Model
227(2)
Model Performance
229(1)
Variable Selection
230(2)
Regrouping and Recoding Variables in JMP
232(2)
10.5 Appendixes: Logistic Regression for Profiling
234(7)
Appendix A: Why Linear Regression Is Problematic for a Categorical Response
234(2)
Appendix B: Evaluating Explanatory Power
236(2)
Appendix C: Logistic Regression for More Than Two Classes
238(1)
Nominal Classes
238(3)
Problems
241(4)
11 Neural Nets
245(23)
11.1 Introduction
245(1)
11.2 Concept and Structure of a Neural Network
246(1)
11.3 Fitting a Network to Data
246(14)
Example 1: Tiny Dataset
246(2)
Computing Output of Nodes
248(3)
Preprocessing the Data
251(1)
Activation Functions and Data Processing Features in JMP Pro
251(1)
Training the Model
251(3)
Fitting a Neural Network in JMP Pro
254(2)
Using the Output for Prediction and Classification
256(2)
Example 2: Classifying Accident Severity
258(1)
Avoiding Overfitting
259(1)
11.4 User Input in JMP Pro
260(4)
Unsupervised Feature Extraction and Deep Learning
263(1)
11.5 Exploring the Relationship between Predictors and Response
264(1)
Understanding Neural Models in JMP Pro
264(1)
11.6 Advantages and Weaknesses of Neural Networks
264(1)
Problems
265(3)
12 Discriminant Analysis
268(17)
12.1 Introduction
268(2)
Example 1: Riding Mowers
269(1)
Example 2: Personal Loan Acceptance (Universal Bank)
269(1)
12.2 Distance of an Observation from a Class
270(2)
12.3 From Distances to Propensities and Classifications
272(3)
Linear Discriminant Analysis in JMP
275(1)
12.4 Classification Performance of Discriminant Analysis
275(2)
12.5 Prior Probabilities
277(1)
12.6 Classifying More Than Two Classes
278(2)
Example 3: Medical Dispatch to Accident Scenes
278(1)
Using Categorical Predictors in Discriminant Analysis in JMP
279(1)
12.7 Advantages and Weaknesses
280(2)
Problems
282(3)
13 Combining Methods: Ensembles and Uplift Modeling
285(16)
13.1 Ensembles
285(5)
Why Ensembles Can Improve Predictive Power
286(1)
The Wisdom of Crowds
287(1)
Simple Averaging
287(1)
Bagging
288(1)
Boosting
288(1)
Creating Ensemble Models in JMP Pro
289(1)
Advantages and Weaknesses of Ensembles
289(1)
13.2 Uplift (Persuasion) Modeling
290(5)
A-B Testing
290(1)
Uplift
290(1)
Gathering the Data
291(1)
A Simple Model
292(1)
Modeling Individual Uplift
293(1)
Using the Results of an Uplift Model
294(1)
Creating Uplift Models in JMP Pro
294(1)
Using the Uplift Platform in JMP Pro
295(1)
13.3 Summary
295(2)
Problems
297(4)
Part V Mining Relationships Among Records
14 Cluster Analysis
301(34)
14.1 Introduction
301(4)
Example: Public Utilities
302(3)
14.2 Measuring Distance between Two Observations
305(4)
Euclidean Distance
305(1)
Normalizing Numerical Measurements
305(1)
Other Distance Measures for Numerical Data
306(2)
Distance Measures for Categorical Data
308(1)
Distance Measures for Mixed Data
308(1)
14.3 Measuring Distance between Two Clusters
309(2)
Minimum Distance
309(1)
Maximum Distance
309(1)
Average Distance
309(1)
Centroid Distance
309(2)
14.4 Hierarchical (Agglomerative) Clustering
311(9)
Hierarchical Clustering in JMP and JMP Pro
311(1)
Hierarchical Agglomerative Clustering Algorithm
312(1)
Single Linkage
312(1)
Complete Linkage
313(1)
Average Linkage
313(1)
Centroid Linkage
313(1)
Ward's Method
314(1)
Dendrograms: Displaying Clustering Process and Results
314(2)
Validating Clusters
316(2)
Two-Way Clustering
318(1)
Limitations of Hierarchical Clustering
319(1)
14.5 Nonhierarchical Clustering: The k-Means Algorithm
320(9)
k-Means Clustering Algorithm
321(1)
Initial Partition into K Clusters
322(15)
K-Means Clustering in JMP
322(7)
Problems
329(6)
Part VI Forecasting Time Series
15 Handling Time Series
335(11)
15.1 Introduction
335(1)
15.2 Descriptive versus Predictive Modeling
336(1)
15.3 Popular Forecasting Methods in Business
337(1)
Combining Methods
337(1)
15.4 Time Series Components
337(4)
Example: Ridership on Amtrak Trains
337(4)
15.5 Data Partitioning and Performance Evaluation
341(2)
Benchmark Performance: Naive Forecasts
342(1)
Generating Future Forecasts
342(4)
Partitioning Time Series Data in JMP and Validating Time Series Models
342(1)
Problems
343(3)
16 Regression-Based Forecasting
346(31)
16.1 A Model with Trend
346(7)
Linear Trend
346(4)
Fitting a Model with Linear Trend in JMP
348(2)
Creating Actual versus Predicted Plots and Residual Plots in JMP
350(1)
Exponential Trend
350(2)
Computing Forecast Errors for Exponential Trend Models
352(1)
Polynomial Trend
352(4)
Fitting a Polynomial Trend in JMP
353(1)
16.2 A Model with Seasonality
353(3)
16.3 A Model with Trend and Seasonality
356(1)
16.4 Autocorrelation and ARIMA Models
356(10)
Computing Autocorrelation
356(4)
Improving Forecasts by Integrating Autocorrelation Information
360(1)
Fitting AR (Autoregression) Models in the JMP Time Series Platform
361(1)
Fitting AR Models to Residuals
361(2)
Evaluating Predictability
363(2)
Summary: Fitting Regression-Based Time Series Models in JMP
365(1)
Problems
366(11)
17 Smoothing Methods
377(25)
17.1 Introduction
377(1)
17.2 Moving Average
378(4)
Centered Moving Average for Visualization
378(1)
Trailing Moving Average for Forecasting
379(3)
Computing a Trailing Moving Average Forecast in JMP
380(2)
Choosing Window Width (w)
382(1)
17.3 Simple Exponential Smoothing
382(5)
Choosing Smoothing Parameter α
383(3)
Fitting Simple Exponential Smoothing Models in JMP
384(2)
Creating Plots for Actual versus Forecasted Series and Residuals Series Using the Graph Builder
386(1)
Relation between Moving Average and Simple Exponential Smoothing
386(1)
17.4 Advanced Exponential Smoothing
387(3)
Series with a Trend
387(1)
Series with a Trend and Seasonality
388(2)
Problems
390(12)
Part VII Cases
18 Cases
402(29)
18.1 Charles Book Club
401(8)
The Book Industry
401(1)
Database Marketing at Charles
402(1)
Data Mining Techniques
403(2)
Assignment
405(4)
18.2 German Credit
409(1)
Background
409(1)
Data
409(1)
Assignment
409(1)
18.3 Tayko Software Cataloger
410(5)
Background
410(3)
The Mailing Experiment
413(1)
Data
413(1)
Assignment
413(2)
18.4 Political Persuasion
415(4)
Background
415(1)
Predictive Analytics Arrives in US Politics
415(1)
Political Targeting
416(1)
Uplift
416(1)
Data
417(1)
Assignment
417(2)
18.5 Taxi Cancellations
419(1)
Business Situation
419(1)
Assignment
419(1)
18.6 Segmenting Consumers of Bath Soap
420(3)
Business Situation
420(1)
Key Problems
421(1)
Data
421(1)
Measuring Brand Loyalty
421(1)
Assignment
421(2)
18.7 Direct-Mail Fundraising
423(2)
Background
423(1)
Data
424(1)
Assignment
425(1)
18.8 Predicting Bankruptcy
425(3)
Predicting Corporate Bankruptcy
426(2)
Assignment
428(1)
18.9 Time Series Case: Forecasting Public Transportation Demand
428(3)
Background
428(1)
Problem Description
428(1)
Available Data
428(1)
Assignment Goal
429(1)
Assignment
429(1)
Tips and Suggested Steps
429(2)
References 431(2)
Data Files Used in the Book 433(2)
Index 435
Galit Shmueli, PhD, is Distinguished Professor at National Tsing Hua University's Institute of Service Science. She has designed and instructed data mining courses since 2004 at University of Maryland, Statistics.com, Indian School of Business, and National Tsing Hua University, Taiwan. Professor Shmueli is known for her research and teaching in business analytics, with a focus on statistical and data mining methods in information systems and healthcare. She has authored over 70 journal articles, books, textbooks, and book chapters, including Data Mining for Business Analytics: Concepts, Techniques, and Applications with XLMiner®, Third Edition, also published by Wiley.

Peter C. Bruce is President and Founder of the Institute for Statistics Education at www.statistics.com. He has written multiple journal articles and is the developer of Resampling Stats software. He is the author of Introductory Statistics and Analytics: A Resampling Perspective and co-author of Data Mining for Business Analytics: Concepts, Techniques, and Applications with XLMiner®, Third Edition, both published by Wiley.

Mia Stephens is Academic Ambassador at JMP®, a division of SAS Institute. Prior to joining SAS, she was an adjunct professor of statistics at the University of New Hampshire and a founding member of the North Haven Group LLC, a statistical training and consulting company. She is the co-author of three other books, including Visual Six Sigma: Making Data Analysis Lean, Second Edition, also published by Wiley.

Nitin R. Patel, PhD, is Chairman and cofounder of Cytel, Inc., based in Cambridge, Massachusetts. A Fellow of the American Statistical Association, Dr. Patel has also served as a Visiting Professor at the Massachusetts Institute of Technology and at Harvard University. He is a Fellow of the Computer Society of India and was a professor at the Indian Institute of Management, Ahmedabad, for 15 years. He is co-author of Data Mining for Business Analytics: Concepts, Techniques, and Applications with XLMiner®, Third Edition, also published by Wiley.