About the Authors | xix
About the Technical Reviewer | xxi
Acknowledgments | xxiii
|
Chapter 1 Introduction to Machine Learning and R | 1 | (30)
1.1 Understanding the Evolution | 2 | (4)
1.1.1 Statistical Learning | 2 | (1)
1.1.2 Machine Learning (ML) | 3 | (1)
1.1.3 Artificial Intelligence (AI) | 3 | (1)
| 4 | (1)
| 5 | (1)
1.2 Probability and Statistics | 6 | (12)
1.2.1 Counting and Probability Definition | 7 | (2)
1.2.2 Events and Relationships | 9 | (3)
1.2.3 Randomness, Probability, and Distributions | 12 | (1)
1.2.4 Confidence Interval and Hypothesis Testing | 13 | (5)
1.3 Getting Started with R | 18 | (8)
1.3.1 Basic Building Blocks | 18 | (1)
1.3.2 Data Structures in R | 19 | (2)
| 21 | (2)
1.3.4 Functions and Apply Family | 23 | (3)
1.4 Machine Learning Process Flow | 26 | (2)
| 26 | (1)
| 26 | (1)
| 27 | (1)
| 27 | (1)
| 28 | (1)
| 28 | (1)
| 28 | (3)
|
Chapter 2 Data Preparation and Exploration | 31 | (36)
2.1 Planning the Gathering of Data | 32 | (9)
| 32 | (1)
| 33 | (7)
| 40 | (1)
2.2 Initial Data Analysis (IDA) | 41 | (10)
2.2.1 Discerning a First Look | 41 | (2)
2.2.2 Organizing Multiple Sources of Data into One | 43 | (3)
| 46 | (3)
2.2.4 Supplementing with More Information | 49 | (1)
| 50 | (1)
2.3 Exploratory Data Analysis | 51 | (10)
| 52 | (3)
| 55 | (6)
2.4 Case Study: Credit Card Fraud | 61 | (4)
| 61 | (1)
2.4.2 Data Transformation | 62 | (1)
| 63 | (2)
| 65 | (1)
| 65 | (2)
|
Chapter 3 Sampling and Resampling Techniques | 67 | (62)
3.1 Introduction to Sampling | 68 | (1)
| 69 | (4)
| 69 | (1)
3.2.2 Sampling Distribution | 70 | (1)
3.2.3 Population Mean and Variance | 70 | (1)
3.2.4 Sample Mean and Variance | 70 | (1)
3.2.5 Pooled Mean and Variance | 70 | (1)
| 71 | (1)
| 71 | (1)
| 72 | (1)
| 72 | (1)
3.2.10 Sampling Without Replacement (SWOR) | 72 | (1)
3.2.11 Sampling with Replacement (SWR) | 72 | (1)
3.3 Credit Card Fraud: Population Statistics | 73 | (5)
| 73 | (1)
| 74 | (1)
3.3.3 Population Variance | 74 | (1)
3.3.4 Pooled Mean and Variance | 75 | (3)
3.4 Business Implications of Sampling | 78 | (1)
3.4.1 Features of Sampling | 79 | (1)
3.4.2 Shortcomings of Sampling | 79 | (1)
3.5 Probability and Non-Probability Sampling | 79 | (2)
3.5.1 Types of Non-Probability Sampling | 80 | (1)
3.6 Statistical Theory on Sampling Distributions | 81 | (8)
3.6.1 Law of Large Numbers (LLN) | 81 | (4)
3.6.2 Central Limit Theorem | 85 | (4)
3.7 Probability Sampling Techniques | 89 | (35)
3.7.1 Population Statistics | 89 | (4)
3.7.2 Simple Random Sampling | 93 | (7)
3.7.3 Systematic Random Sampling | 100 | (4)
3.7.4 Stratified Random Sampling | 104 | (7)
| 111 | (6)
| 117 | (7)
3.8 Monte Carlo Method: Acceptance-Rejection Method | 124 | (2)
3.9 A Qualitative Account of Computational Savings by Sampling | 126 | (1)
| 127 | (2)
|
Chapter 4 Data Visualization in R | 129 | (52)
4.1 Introduction to the ggplot2 Package | 130 | (2)
4.2 World Development Indicators | 132 | (1)
| 132 | (6)
4.4 Stacked Column Charts | 138 | (6)
| 144 | (1)
| 145 | (3)
4.7 Histograms and Density Plots | 148 | (4)
| 152 | (2)
| 154 | (2)
| 156 | (2)
| 158 | (4)
| 162 | (3)
| 165 | (2)
| 167 | (2)
| 169 | (1)
| 170 | (2)
| 172 | (2)
| 174 | (4)
| 178 | (1)
| 179 | (2)
|
Chapter 5 Feature Engineering | 181 | (38)
5.1 Introduction to Feature Engineering | 182 | (3)
| 184 | (1)
| 184 | (1)
| 184 | (1)
5.2 Understanding the Working Data | 185 | (6)
| 186 | (1)
5.2.2 Properties of Dependent Variable | 186 | (3)
5.2.3 Features Availability: Continuous or Categorical | 189 | (2)
5.2.4 Setting Up Data Assumptions | 191 | (1)
| 191 | (4)
5.4 Variable Subset Selection | 195 | (15)
| 195 | (4)
| 199 | (7)
| 206 | (4)
5.5 Dimensionality Reduction | 210 | (5)
5.6 Feature Engineering Checklist | 215 | (2)
| 217 | (1)
| 217 | (2)
|
Chapter 6 Machine Learning Theory and Practices | 219 | (206)
6.1 Machine Learning Types | 222 | (2)
6.1.1 Supervised Learning | 222 | (1)
6.1.2 Unsupervised Learning | 223 | (1)
6.1.3 Semi-Supervised Learning | 223 | (1)
6.1.4 Reinforcement Learning | 223 | (1)
6.2 Groups of Machine Learning Algorithms | 224 | (5)
| 229 | (4)
| 229 | (1)
6.3.2 Purchase Preference | 230 | (1)
6.3.3 Twitter Feeds and Article | 231 | (1)
| 231 | (1)
| 232 | (1)
| 232 | (1)
| 233 | (2)
| 235 | (55)
| 238 | (3)
6.5.2 Simple Linear Regression | 241 | (3)
6.5.3 Multiple Linear Regression | 244 | (3)
6.5.4 Model Diagnostics: Linear Regression | 247 | (14)
6.5.5 Polynomial Regression | 261 | (4)
6.5.6 Logistic Regression | 265 | (1)
6.5.7 Logit Transformation | 266 | (1)
| 267 | (8)
6.5.9 Model Diagnostics: Logistic Regression | 275 | (10)
6.5.10 Multinomial Logistic Regression | 285 | (4)
6.5.11 Generalized Linear Models | 289 | (1)
| 290 | (1)
6.6 Support Vector Machine (SVM) | 290 | (7)
| 292 | (1)
6.6.2 Binary SVM Classifier | 293 | (2)
| 295 | (2)
| 297 | (1)
| 297 | (33)
6.7.1 Types of Decision Trees | 298 | (2)
| 300 | (2)
6.7.3 Decision Tree Learning Methods | 302 | (19)
| 321 | (8)
| 329 | (1)
6.8 The Naive Bayes Method | 330 | (7)
6.8.1 Conditional Probability | 330 | (1)
| 330 | (1)
| 331 | (1)
6.8.4 Posterior Probability | 331 | (1)
6.8.5 Likelihood and Marginal Likelihood | 331 | (1)
6.8.6 Naive Bayes Methods | 332 | (5)
| 337 | (1)
| 337 | (17)
6.9.1 Introduction to Clustering | 338 | (1)
6.9.2 Clustering Algorithms | 339 | (12)
6.9.3 Internal Evaluation | 351 | (2)
6.9.4 External Evaluation | 353 | (1)
| 354 | (1)
6.10 Association Rule Mining | 354 | (18)
6.10.1 Introduction to Association Concepts | 355 | (2)
6.10.2 Rule-Mining Algorithms | 357 | (7)
6.10.3 Recommendation Algorithms | 364 | (8)
| 372 | (1)
6.11 Artificial Neural Networks | 372 | (24)
6.11.1 Human Cognitive Learning | 372 | (2)
| 374 | (3)
| 377 | (1)
6.11.4 Neural Network Architecture | 377 | (2)
6.11.5 Supervised versus Unsupervised Neural Nets | 379 | (1)
6.11.6 Neural Network Learning Algorithms | 380 | (2)
6.11.7 Feed-Forward Back-Propagation | 382 | (7)
| 389 | (7)
| 396 | (1)
6.12 Text-Mining Approaches | 396 | (21)
6.12.1 Introduction to Text Mining | 397 | (1)
6.12.2 Text Summarization | 398 | (2)
| 400 | (2)
6.12.4 Part-of-Speech (POS) Tagging | 402 | (4)
| 406 | (1)
6.12.6 Text Analysis: Microsoft Cognitive Services | 407 | (10)
| 417 | (1)
6.13 Online Machine Learning Algorithms | 417 | (5)
6.13.1 Fuzzy C-Means Clustering | 419 | (3)
| 422 | (1)
6.14 Model Building Checklist | 422 | (1)
| 423 | (1)
| 423 | (2)
|
Chapter 7 Machine Learning Model Evaluation | 425 | (40)
| 426 | (4)
| 426 | (2)
7.1.2 Purchase Preference | 428 | (2)
7.2 Introduction to Model Performance and Evaluation | 430 | (1)
7.3 Objectives of Model Performance Evaluation | 431 | (1)
7.4 Population Stability Index | 432 | (5)
7.5 Model Evaluation for Continuous Output | 437 | (8)
7.5.1 Mean Absolute Error | 439 | (2)
7.5.2 Root Mean Square Error | 441 | (1)
| 442 | (3)
7.6 Model Evaluation for Discrete Output | 445 | (10)
7.6.1 Classification Matrix | 446 | (5)
7.6.2 Sensitivity and Specificity | 451 | (1)
7.6.3 Area Under ROC Curve | 452 | (3)
7.7 Probabilistic Techniques | 455 | (4)
7.7.1 K-Fold Cross Validation | 456 | (2)
| 458 | (1)
7.8 The Kappa Error Metric | 459 | (4)
| 463 | (1)
| 464 | (1)
|
Chapter 8 Model Performance Improvement | 465 | (54)
8.1 Machine Learning and Statistical Modeling | 466 | (2)
8.2 Overview of the Caret Package | 468 | (2)
8.3 Introduction to Hyper-Parameters | 470 | (4)
8.4 Hyper-Parameter Optimization | 474 | (14)
| 475 | (2)
| 477 | (2)
8.4.3 Automatic Grid Search | 479 | (2)
| 481 | (2)
| 483 | (2)
| 485 | (3)
8.5 The Bias and Variance Tradeoff | 488 | (5)
8.5.1 Bagging or Bootstrap Aggregation | 492 | (1)
| 493 | (1)
8.6 Introduction to Ensemble Learning | 493 | (5)
| 494 | (1)
8.6.2 Advanced Methods in Ensemble Learning | 495 | (3)
8.7 Ensemble Techniques Illustration in R | 498 | (13)
| 498 | (2)
8.7.2 Gradient Boosting with a Decision Tree | 500 | (5)
8.7.3 Blending KNN and Rpart | 505 | (1)
8.7.4 Stacking Using caretEnsemble | 506 | (5)
8.8 Advanced Topic: Bayesian Optimization of Machine Learning Models | 511 | (5)
| 516 | (1)
| 517 | (2)
|
Chapter 9 Scalable Machine Learning and Related Technologies | 519 | (36)
9.1 Distributed Processing and Storage | 520 | (6)
9.1.1 Google File System (GFS) | 520 | (2)
| 522 | (1)
9.1.3 Parallel Execution in R | 523 | (3)
| 526 | (15)
| 527 | (4)
| 531 | (4)
| 535 | (3)
| 538 | (2)
| 540 | (1)
9.3 Machine Learning in R with Spark | 541 | (5)
9.3.1 Setting the Environment Variable | 542 | (1)
9.3.2 Initializing the Spark Session | 542 | (1)
9.3.3 Loading Data and Running the Pre-Process | 542 | (1)
9.3.4 Creating SparkDataFrame | 543 | (1)
9.3.5 Building the ML Model | 544 | (1)
9.3.6 Predicting the Test Data | 545 | (1)
9.3.7 Stopping the SparkR Session | 546 | (1)
9.4 Machine Learning in R with H2O | 546 | (7)
9.4.1 Installation of Packages | 547 | (1)
9.4.2 Initialization of H2O Clusters | 547 | (1)
9.4.3 Deep Learning Demo in R with H2O | 548 | (5)
| 553 | (1)
| 554 | (1)

Index | 555