Preface | xi
About the Authors | xv
Acknowledgements | xvii
Chapter 1 Introduction to Data Science | 1
1.2 What is Data Science? | 1
1.3 The Data Science Methodology | 2
Chapter 2 The Basics of Python and R | 9
2.2 Basics of Coding in Python | 9
2.2.1 Using Comments in Python | 9
2.2.2 Executing Commands in Python | 10
2.2.3 Importing Packages in Python | 11
2.2.4 Getting Data into Python | 12
2.2.5 Saving Output in Python | 13
2.2.6 Accessing Records and Variables in Python | 14
2.2.7 Setting Up Graphics in Python | 15
2.3 Downloading R and RStudio | 17
2.4 Basics of Coding in R | 19
2.4.1 Using Comments in R | 19
2.4.2 Executing Commands in R | 20
2.4.3 Importing Packages in R | 20
2.4.4 Getting Data into R | 21
2.4.5 Saving Output in R | 23
2.4.6 Accessing Records and Variables in R | 24
2.4.7 Setting Up Graphics in R | 26
Chapter 3 Data Preparation | 29
3.1 The Bank Marketing Data Set | 29
3.2 The Problem Understanding Phase | 29
3.2.1 Clearly Enunciate the Project Objectives | 29
3.2.2 Translate These Objectives into a Data Science Problem | 30
3.3 Data Preparation Phase | 31
3.4 Adding an Index Field | 31
3.4.1 How to Add an Index Field Using Python | 31
3.4.2 How to Add an Index Field Using R | 32
3.5 Changing Misleading Field Values | 33
3.5.1 How to Change Misleading Field Values Using Python | 34
3.5.2 How to Change Misleading Field Values Using R | 34
3.6 Reexpression of Categorical Data as Numeric | 36
3.6.1 How to Reexpress Categorical Field Values Using Python | 36
3.6.2 How to Reexpress Categorical Field Values Using R | 38
3.7 Standardizing the Numeric Fields | 39
3.7.1 How to Standardize Numeric Fields Using Python | 40
3.7.2 How to Standardize Numeric Fields Using R | 40
3.8 Identifying Outliers | 40
3.8.1 How to Identify Outliers Using Python | 41
3.8.2 How to Identify Outliers Using R | 42
Chapter 4 Exploratory Data Analysis | 47
4.2 Bar Graphs with Response Overlay | 47
4.2.1 How to Construct a Bar Graph with Overlay Using Python | 49
4.2.2 How to Construct a Bar Graph with Overlay Using R | 50
4.3 Contingency Tables | 51
4.3.1 How to Construct Contingency Tables Using Python | 52
4.3.2 How to Construct Contingency Tables Using R | 53
4.4 Histograms with Response Overlay | 53
4.4.1 How to Construct Histograms with Overlay Using Python | 55
4.4.2 How to Construct Histograms with Overlay Using R | 58
4.5 Binning Based on Predictive Value | 58
4.5.1 How to Perform Binning Based on Predictive Value Using Python | 59
4.5.2 How to Perform Binning Based on Predictive Value Using R | 62
Chapter 5 Preparing to Model the Data | 69
5.2 Partitioning the Data | 69
5.2.1 How to Partition the Data in Python | 70
5.2.2 How to Partition the Data in R | 71
5.3 Validating Your Partition | 72
5.4 Balancing the Training Data Set | 73
5.4.1 How to Balance the Training Data Set in Python | 74
5.4.2 How to Balance the Training Data Set in R | 75
5.5 Establishing Baseline Model Performance | 77
Chapter 6 Decision Trees | 81
6.1 Introduction to Decision Trees | 81
6.2 Classification and Regression Trees | 83
6.2.1 How to Build CART Decision Trees Using Python | 84
6.2.2 How to Build CART Decision Trees Using R | 86
6.3 The C5.0 Algorithm for Building Decision Trees | 88
6.3.1 How to Build C5.0 Decision Trees Using Python | 89
6.3.2 How to Build C5.0 Decision Trees Using R | 90
6.4 Random Forests | 91
6.4.1 How to Build Random Forests in Python | 92
6.4.2 How to Build Random Forests in R | 92
Chapter 7 Model Evaluation | 97
7.1 Introduction to Model Evaluation | 97
7.2 Classification Evaluation Measures | 97
7.3 Sensitivity and Specificity | 99
7.4 Precision, Recall, and Fβ Scores | 99
7.5 Method for Model Evaluation | 100
7.6 An Application of Model Evaluation | 100
7.6.1 How to Perform Model Evaluation Using R | 103
7.7 Accounting for Unequal Error Costs | 104
7.7.1 Accounting for Unequal Error Costs Using R | 105
7.8 Comparing Models with and without Unequal Error Costs | 106
7.9 Data-Driven Error Costs | 107
Chapter 8 Naive Bayes Classification | 113
8.1 Introduction to Naive Bayes | 113
8.3 Maximum a Posteriori Hypothesis | 114
8.4 Class Conditional Independence | 114
8.5 Application of Naive Bayes Classification | 115
8.5.1 Naive Bayes in Python | 121
8.5.2 Naive Bayes in R | 123
Chapter 9 Neural Networks | 129
9.1 Introduction to Neural Networks | 129
9.2 The Neural Network Structure | 129
9.3 Connection Weights and the Combination Function | 131
9.4 The Sigmoid Activation Function | 133
9.6 An Application of a Neural Network Model | 134
9.7 Interpreting the Weights in a Neural Network Model | 136
9.8 How to Use Neural Networks in R | 137
Chapter 10 Clustering | 141
10.2 Introduction to the K-Means Clustering Algorithm | 142
10.3 An Application of K-Means Clustering | 143
10.5 How to Perform K-Means Clustering Using Python | 145
10.6 How to Perform K-Means Clustering Using R | 147
Chapter 11 Regression Modeling | 151
11.2 Descriptive Regression Modeling | 151
11.3 An Application of Multiple Regression Modeling | 152
11.4 How to Perform Multiple Regression Modeling Using Python | 154
11.5 How to Perform Multiple Regression Modeling Using R | 156
11.6 Model Evaluation for Estimation | 157
11.6.1 How to Perform Estimation Model Evaluation Using Python | 159
11.6.2 How to Perform Estimation Model Evaluation Using R | 160
11.7 Stepwise Regression | 161
11.7.1 How to Perform Stepwise Regression Using R | 162
11.8 Baseline Models for Regression | 162
Chapter 12 Dimension Reduction | 167
12.1 The Need for Dimension Reduction | 167
12.3 Identifying Multicollinearity Using Variance Inflation Factors | 171
12.3.1 How to Identify Multicollinearity Using Python | 172
12.3.2 How to Identify Multicollinearity in R | 173
12.4 Principal Components Analysis | 175
12.5 An Application of Principal Components Analysis | 175
12.6 How Many Components Should We Extract? | 176
12.6.1 The Eigenvalue Criterion | 176
12.6.2 The Proportion of Variance Explained Criterion | 177
12.7 Performing PCA with k = 4 | 178
12.8 Validation of the Principal Components | 178
12.9 How to Perform Principal Components Analysis Using Python | 179
12.10 How to Perform Principal Components Analysis Using R | 181
12.11 When is Multicollinearity Not a Problem? | 183
Chapter 13 Generalized Linear Models | 187
13.1 An Overview of General Linear Models | 187
13.2 Linear Regression as a General Linear Model | 188
13.3 Logistic Regression as a General Linear Model | 188
13.4 An Application of Logistic Regression Modeling | 189
13.4.1 How to Perform Logistic Regression Using Python | 190
13.4.2 How to Perform Logistic Regression Using R | 191
13.5 Poisson Regression | 192
13.6 An Application of Poisson Regression Modeling | 192
13.6.1 How to Perform Poisson Regression Using Python | 193
13.6.2 How to Perform Poisson Regression Using R | 194
Chapter 14 Association Rules | 199
14.1 Introduction to Association Rules | 199
14.2 A Simple Example of Association Rule Mining | 200
14.3 Support, Confidence, and Lift | 200
14.4 Mining Association Rules | 202
14.4.1 How to Mine Association Rules Using R | 203
14.5 Confirming Our Metrics | 207
14.6 The Confidence Difference Criterion | 208
14.6.1 How to Apply the Confidence Difference Criterion Using R | 208
14.7 The Confidence Quotient Criterion | 209
14.7.1 How to Apply the Confidence Quotient Criterion Using R | 210
Appendix Data Summarization and Visualization | 215
Part 1: Summarization 1: Building Blocks of Data Analysis | 215
Part 2: Visualization: Graphs and Tables for Summarizing and Organizing Data | 217
Part 3: Summarization 2: Measures of Center, Variability, and Position | 222
Part 4: Summarization and Visualization of Bivariate Relationships | 225
Index | 231