E-book: Data Science Using Python and R [Wiley Online]

Chantal D. Larose (Eastern Connecticut State University (ECSU)), Daniel T. Larose (Central Connecticut State University)
  • Wiley Online
  • Price: 121.54 €*
  * price that provides access for an unlimited number of simultaneous users for an unlimited period

Learn data science by doing data science! 

Data Science Using Python and R will get you plugged into the world’s two most widespread open-source platforms for data science: Python and R.

Data science is hot. Bloomberg called data scientist “the hottest job in America.” Python and R are the top two open-source data science tools in the world. In Data Science Using Python and R, you will learn step-by-step how to produce hands-on solutions to real-world business problems, using state-of-the-art techniques. 

Data Science Using Python and R is written for the general reader with no previous analytics or programming experience. An entire chapter is dedicated to learning the basics of Python and R. Then, each chapter presents step-by-step instructions and walkthroughs for solving data science problems using Python and R.

Those with analytics experience will appreciate having a one-stop shop for learning how to do data science using Python and R. Topics covered include data preparation, exploratory data analysis, preparing to model the data, decision trees, model evaluation, misclassification costs, naïve Bayes classification, neural networks, clustering, regression modeling, dimension reduction, and association rules mining.

Exciting newer topics, such as random forests and generalized linear models, are also included. The book emphasizes data-driven error costs to enhance profitability and to avoid the common pitfalls that can cost a company millions of dollars.

Data Science Using Python and R provides exercises at the end of every chapter, totaling over 500 exercises in the book. Readers will therefore have plenty of opportunity to test their newfound data science skills and expertise. In the Hands-on Analysis exercises, readers are challenged to solve interesting business problems using real-world data sets.

Preface xi
About The Authors xv
Acknowledgements xvii
Chapter 1 Introduction To Data Science 1(8)
1.1 Why Data Science? 1(1)
1.2 What is Data Science? 1(1)
1.3 The Data Science Methodology 2(3)
1.4 Data Science Tasks 5(3)
1.4.1 Description 6(1)
1.4.2 Estimation 6(1)
1.4.3 Classification 6(1)
1.4.4 Clustering 7(1)
1.4.5 Prediction 7(1)
1.4.6 Association 7(1)
Exercises 8(1)
Chapter 2 The Basics Of Python And R 9(20)
2.1 Downloading Python 9(1)
2.2 Basics of Coding in Python 9(8)
2.2.1 Using Comments in Python 9(1)
2.2.2 Executing Commands in Python 10(1)
2.2.3 Importing Packages in Python 11(1)
2.2.4 Getting Data into Python 12(1)
2.2.5 Saving Output in Python 13(1)
2.2.6 Accessing Records and Variables in Python 14(1)
2.2.7 Setting Up Graphics in Python 15(2)
2.3 Downloading R and RStudio 17(2)
2.4 Basics of Coding in R 19(7)
2.4.1 Using Comments in R 19(1)
2.4.2 Executing Commands in R 20(1)
2.4.3 Importing Packages in R 20(1)
2.4.4 Getting Data into R 21(2)
2.4.5 Saving Output in R 23(1)
2.4.6 Accessing Records and Variables in R 24(2)
References 26(1)
Exercises 26(3)
Chapter 3 Data Preparation 29(18)
3.1 The Bank Marketing Data Set 29(1)
3.2 The Problem Understanding Phase 29(2)
3.2.1 Clearly Enunciate the Project Objectives 29(1)
3.2.2 Translate These Objectives into a Data Science Problem 30(1)
3.3 Data Preparation Phase 31(1)
3.4 Adding an Index Field 31(2)
3.4.1 How to Add an Index Field Using Python 31(1)
3.4.2 How to Add an Index Field Using R 32(1)
3.5 Changing Misleading Field Values 33(3)
3.5.1 How to Change Misleading Field Values Using Python 34(1)
3.5.2 How to Change Misleading Field Values Using R 34(2)
3.6 Reexpression of Categorical Data as Numeric 36(3)
3.6.1 How to Reexpress Categorical Field Values Using Python 36(2)
3.6.2 How to Reexpress Categorical Field Values Using R 38(1)
3.7 Standardizing the Numeric Fields 39(1)
3.7.1 How to Standardize Numeric Fields Using Python 40(1)
3.7.2 How to Standardize Numeric Fields Using R 40(1)
3.8 Identifying Outliers 40(3)
3.8.1 How to Identify Outliers Using Python 41(1)
3.8.2 How to Identify Outliers Using R 42(1)
References 43(1)
Exercises 44(3)
Chapter 4 Exploratory Data Analysis 47(22)
4.1 EDA Versus HT 47(1)
4.2 Bar Graphs with Response Overlay 47(4)
4.2.1 How to Construct a Bar Graph with Overlay Using Python 49(1)
4.2.2 How to Construct a Bar Graph with Overlay Using R 50(1)
4.3 Contingency Tables 51(2)
4.3.1 How to Construct Contingency Tables Using Python 52(1)
4.3.2 How to Construct Contingency Tables Using R 53(1)
4.4 Histograms with Response Overlay 53(5)
4.4.1 How to Construct Histograms with Overlay Using Python 55(3)
4.4.2 How to Construct Histograms with Overlay Using R 58(1)
4.5 Binning Based on Predictive Value 58(5)
4.5.1 How to Perform Binning Based on Predictive Value Using Python 59(3)
4.5.2 How to Perform Binning Based on Predictive Value Using R 62(1)
References 63(1)
Exercises 63(6)
Chapter 5 Preparing To Model The Data 69(12)
5.1 The Story So Far 69(1)
5.2 Partitioning the Data 69(3)
5.2.1 How to Partition the Data in Python 70(1)
5.2.2 How to Partition the Data in R 71(1)
5.3 Validating your Partition 72(1)
5.4 Balancing the Training Data Set 73(4)
5.4.1 How to Balance the Training Data Set in Python 74(1)
5.4.2 How to Balance the Training Data Set in R 75(2)
5.5 Establishing Baseline Model Performance 77(1)
References 78(1)
Exercises 78(3)
Chapter 6 Decision Trees 81(16)
6.1 Introduction to Decision Trees 81(2)
6.2 Classification and Regression Trees 83(5)
6.2.1 How to Build CART Decision Trees Using Python 84(2)
6.2.2 How to Build CART Decision Trees Using R 86(2)
6.3 The C5.0 Algorithm for Building Decision Trees 88(3)
6.3.1 How to Build C5.0 Decision Trees Using Python 89(1)
6.3.2 How to Build C5.0 Decision Trees Using R 90(1)
6.4 Random Forests 91(2)
6.4.1 How to Build Random Forests in Python 92(1)
6.4.2 How to Build Random Forests in R 92(1)
References 93(1)
Exercises 93(4)
Chapter 7 Model Evaluation 97(16)
7.1 Introduction to Model Evaluation 97(1)
7.2 Classification Evaluation Measures 97(2)
7.3 Sensitivity and Specificity 99(1)
7.4 Precision, Recall, and Fβ Scores 99(1)
7.5 Method for Model Evaluation 100(1)
7.6 An Application of Model Evaluation 100(4)
7.6.1 How to Perform Model Evaluation Using R 103(1)
7.7 Accounting for Unequal Error Costs 104(2)
7.7.1 Accounting for Unequal Error Costs Using R 105(1)
7.8 Comparing Models with and without Unequal Error Costs 106(1)
7.9 Data-Driven Error Costs 107(2)
Exercises 109(4)
Chapter 8 Naive Bayes Classification 113(16)
8.1 Introduction to Naive Bayes 113(1)
8.2 Bayes Theorem 113(1)
8.3 Maximum a Posteriori Hypothesis 114(1)
8.4 Class Conditional Independence 114(1)
8.5 Application of Naive Bayes Classification 115(10)
8.5.1 Naive Bayes in Python 121(2)
8.5.2 Naive Bayes in R 123(2)
References 125(1)
Exercises 126(3)
Chapter 9 Neural Networks 129(12)
9.1 Introduction to Neural Networks 129(1)
9.2 The Neural Network Structure 129(2)
9.3 Connection Weights and the Combination Function 131(2)
9.4 The Sigmoid Activation Function 133(1)
9.5 Backpropagation 134(1)
9.6 An Application of a Neural Network Model 134(2)
9.7 Interpreting the Weights in a Neural Network Model 136(1)
9.8 How to Use Neural Networks in R 137(1)
References 138(1)
Exercises 138(3)
Chapter 10 Clustering 141(10)
10.1 What is Clustering? 141(1)
10.2 Introduction to the K-Means Clustering Algorithm 142(1)
10.3 An Application of K-Means Clustering 143(1)
10.4 Cluster Validation 144(1)
10.5 How to Perform K-Means Clustering Using Python 145(2)
10.6 How to Perform K-Means Clustering Using R 147(2)
Exercises 149(2)
Chapter 11 Regression Modeling 151(16)
11.1 The Estimation Task 151(1)
11.2 Descriptive Regression Modeling 151(1)
11.3 An Application of Multiple Regression Modeling 152(2)
11.4 How to Perform Multiple Regression Modeling Using Python 154(2)
11.5 How to Perform Multiple Regression Modeling Using R 156(1)
11.6 Model Evaluation for Estimation 157(4)
11.6.1 How to Perform Estimation Model Evaluation Using Python 159(1)
11.6.2 How to Perform Estimation Model Evaluation Using R 160(1)
11.7 Stepwise Regression 161(1)
11.7.1 How to Perform Stepwise Regression Using R 162(1)
11.8 Baseline Models for Regression 162(1)
References 163(1)
Exercises 164(3)
Chapter 12 Dimension Reduction 167(20)
12.1 The Need for Dimension Reduction 167(1)
12.2 Multicollinearity 168(3)
12.3 Identifying Multicollinearity Using Variance Inflation Factors 171(4)
12.3.1 How to Identify Multicollinearity Using Python 172(1)
12.3.2 How to Identify Multicollinearity in R 173(2)
12.4 Principal Components Analysis 175(1)
12.5 An Application of Principal Components Analysis 175(1)
12.6 How Many Components Should We Extract? 176(2)
12.6.1 The Eigenvalue Criterion 176(1)
12.6.2 The Proportion of Variance Explained Criterion 177(1)
12.7 Performing PCA with k = 4 178(1)
12.8 Validation of the Principal Components 178(1)
12.9 How to Perform Principal Components Analysis Using Python 179(2)
12.10 How to Perform Principal Components Analysis Using R 181(2)
12.11 When is Multicollinearity Not a Problem? 183(1)
References 184(1)
Exercises 184(3)
Chapter 13 Generalized Linear Models 187(12)
13.1 An Overview of General Linear Models 187(1)
13.2 Linear Regression as a General Linear Model 188(1)
13.3 Logistic Regression as a General Linear Model 188(1)
13.4 An Application of Logistic Regression Modeling 189(3)
13.4.1 How to Perform Logistic Regression Using Python 190(1)
13.4.2 How to Perform Logistic Regression Using R 191(1)
13.5 Poisson Regression 192(1)
13.6 An Application of Poisson Regression Modeling 192(3)
13.6.1 How to Perform Poisson Regression Using Python 193(1)
13.6.2 How to Perform Poisson Regression Using R 194(1)
Reference 195(1)
Exercises 195(4)
Chapter 14 Association Rules 199(16)
14.1 Introduction to Association Rules 199(1)
14.2 A Simple Example of Association Rule Mining 200(1)
14.3 Support, Confidence, and Lift 200(2)
14.4 Mining Association Rules 202(5)
14.4.1 How to Mine Association Rules Using R 203(4)
14.5 Confirming Our Metrics 207(1)
14.6 The Confidence Difference Criterion 208(1)
14.6.1 How to Apply the Confidence Difference Criterion Using R 208(1)
14.7 The Confidence Quotient Criterion 209(2)
14.7.1 How to Apply the Confidence Quotient Criterion Using R 210(1)
References 211(1)
Exercises 211(4)
Appendix Data Summarization And Visualization 215(16)
Part 1: Summarization 1: Building Blocks of Data Analysis 215(2)
Part 2: Visualization: Graphs and Tables for Summarizing and Organizing Data 217(5)
Part 3: Summarization 2: Measures of Center, Variability, and Position 222(3)
Part 4: Summarization and Visualization of Bivariate Relationships 225(6)
Index 231
CHANTAL D. LAROSE, PHD, is an Assistant Professor of Statistics & Data Science at Eastern Connecticut State University (ECSU). She has co-authored three books on data science and predictive analytics and helped develop data science programs at ECSU and SUNY New Paltz. Her PhD dissertation, Model-Based Clustering of Incomplete Data, tackles the persistent problem of trying to do data science with incomplete data.

DANIEL T. LAROSE, PHD, is a Professor of Data Science and Statistics and Director of the Data Science programs at Central Connecticut State University. He has published many books on data science, data mining, predictive analytics, and statistics. His consulting clients include The Economist, Forbes Magazine, the CIT Group, and Microsoft.