
Just Enough R!: An Interactive Approach to Machine Learning and Analytics [Hardback]

  • Format: Hardback, 346 pages, height x width: 254x178 mm, weight: 1760 g, 33 tables, black and white; 72 illustrations, black and white
  • Publication date: 08-Jun-2020
  • Publisher: Chapman & Hall/CRC
  • ISBN-10: 0367443201
  • ISBN-13: 9780367443207
Just Enough R! An Interactive Approach to Machine Learning and Analytics presents just enough of the R language, machine learning algorithms, statistical methodology, and analytics for the reader to learn how to find interesting structure in data. The approach might be called "seeing then doing," as it first gives step-by-step explanations, using simple, understandable examples, of how the various machine learning algorithms work independently of any programming language. This is followed by detailed scripts written in R that apply the algorithms to solve nontrivial problems with real data. The script code is provided, allowing the reader to execute the scripts as they study the explanations given in the text.

Features

  • Gets you quickly using R as a problem-solving tool
  • Uses RStudio's integrated development environment
  • Shows how to interface R with SQLite (see the sketch following this list)
  • Includes examples using R's Rattle graphical user interface
  • Requires no prior knowledge of R, machine learning, or computer programming
  • Offers over 50 scripts written in R, including several problem-solving templates that, with slight modification, can be used again and again
  • Covers the most popular machine learning techniques, including ensemble-based methods and logistic regression
  • Includes end-of-chapter exercises, many of which can be solved by modifying existing scripts
  • Includes datasets from several areas, including business, health and medicine, and science
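As a taste of the scripting style the book teaches, the following is a minimal sketch of interfacing R with SQLite through the DBI and RSQLite packages. The in-memory database, the table name mtcars_tbl, and the sample query are illustrative assumptions for this listing, not scripts from the book itself.

    # Minimal sketch: querying SQLite from R with the DBI and RSQLite packages.
    # The in-memory database and the example table/query are illustrative
    # assumptions; the book's own scripts may differ.
    # install.packages(c("DBI", "RSQLite"))   # one-time setup, if needed
    library(DBI)

    # Connect to an in-memory SQLite database (pass a file path such as
    # "mydata.sqlite" instead of ":memory:" for a persistent database).
    con <- dbConnect(RSQLite::SQLite(), ":memory:")

    # Copy a built-in R data frame into the database as a table.
    dbWriteTable(con, "mtcars_tbl", mtcars)

    # Run ordinary SQL and pull the result back as an R data frame.
    heavy_cars <- dbGetQuery(con, "SELECT * FROM mtcars_tbl WHERE wt > 3.5")
    head(heavy_cars)

    # Release the connection when finished.
    dbDisconnect(con)

The Rattle interface mentioned above is started in much the same spirit: install the rattle package, then call library(rattle) followed by rattle() at the console.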

About the Author

Richard J. Roiger is a professor emeritus at Minnesota State University, Mankato, where he taught and performed research in the Computer and Information Science Department for over 30 years.
Table of Contents

Preface xiii
Acknowledgment xv
Author xvii
Chapter 1 Introduction to Machine Learning 1(24)
1.1 Machine Learning, Statistical Analysis, and Data Science 2(1)
1.2 Machine Learning: A First Example 3(3)
1.2.1 Attribute-Value Format 3(1)
1.2.2 A Decision Tree for Diagnosing Illness 4(2)
1.3 Machine Learning Strategies 6(6)
1.3.1 Classification 7(1)
1.3.2 Estimation 7(1)
1.3.3 Prediction 8(3)
1.3.4 Unsupervised Clustering 11(1)
1.3.5 Market Basket Analysis 12(1)
1.4 Evaluating Performance 12(5)
1.4.1 Evaluating Supervised Models 12(1)
1.4.2 Two-Class Error Analysis 13(1)
1.4.3 Evaluating Numeric Output 14(1)
1.4.4 Comparing Models by Measuring Lift 15(2)
1.4.5 Unsupervised Model Evaluation 17(1)
1.5 Ethical Issues 17(1)
1.6 Chapter Summary 18(1)
1.7 Key Terms 18(2)
Exercises 20(5)
Chapter 2 Introduction to R 25(16)
2.1 Introducing R and RStudio 25(3)
2.1.1 Features of R 26(1)
2.1.2 Installing R 26(2)
2.1.3 Installing RStudio 28(1)
2.2 Navigating RStudio 28(10)
2.2.1 The Console 28(2)
2.2.2 The Source Panel 30(2)
2.2.3 The Global Environment 32(5)
2.2.4 Packages 37(1)
2.3 Where's the Data? 38(1)
2.4 Obtaining Help and Additional Information 38(1)
2.5 Summary 39(1)
Exercises 39(2)
Chapter 3 Data Structures and Manipulation 41(20)
3.1 Data Types 41(3)
3.1.1 Character Data and Factors 42(2)
3.2 Single-Mode Data Structures 44(3)
3.2.1 Vectors 44(2)
3.2.2 Matrices and Arrays 46(1)
3.3 Multimode Data Structures 47(3)
3.3.1 Lists 47(1)
3.3.2 Data Frames 48(2)
3.4 Writing Your Own Functions 50(8)
3.4.1 Writing a Simple Function 50(2)
3.4.2 Conditional Statements 52(1)
3.4.3 Iteration 53(4)
3.4.4 Recursive Programming 57(1)
3.5 Summary 58(1)
3.6 Key Terms 58(1)
Exercises 59(2)
Chapter 4 Preparing the Data 61(18)
4.1 A Process Model for Knowledge Discovery 61(1)
4.2 Creating a Target Dataset 62(4)
4.2.1 Interfacing R with the Relational Model 64(2)
4.2.2 Additional Sources for Target Data 66(1)
4.3 Data Preprocessing 66(4)
4.3.1 Noisy Data 66(1)
4.3.2 Preprocessing with R 67(2)
4.3.3 Detecting Outliers 69(1)
4.3.4 Missing Data 69(1)
4.4 Data Transformation 70(5)
4.4.1 Data Normalization 70(2)
4.4.2 Data Type Conversion 72(1)
4.4.3 Attribute and Instance Selection 72(2)
4.4.4 Creating Training and Test Set Data 74(1)
4.4.5 Cross Validation and Bootstrapping 74(1)
4.4.6 Large-Sized Data 75(1)
4.5 Chapter Summary 75(1)
4.6 Key Terms 76(1)
Exercises 77(2)
Chapter 5 Supervised Statistical Techniques 79(48)
5.1 Simple Linear Regression 79(6)
5.2 Multiple Linear Regression 85(14)
5.2.1 Multiple Linear Regression: An Example 85(3)
5.2.2 Evaluating Numeric Output 88(1)
5.2.3 Training/Test Set Evaluation 89(2)
5.2.4 Using Cross Validation 91(2)
5.2.5 Linear Regression with Categorical Data 93(6)
5.3 Logistic Regression 99(10)
5.3.1 Transforming the Linear Regression Model 100(1)
5.3.2 The Logistic Regression Model 100(1)
5.3.3 Logistic Regression with R 101(3)
5.3.4 Creating a Confusion Matrix 104(1)
5.3.5 Receiver Operating Characteristics (ROC) Curves 104(4)
5.3.6 The Area under an ROC Curve 108(1)
5.4 Naive Bayes Classifier 109(11)
5.4.1 Bayes Classifier: An Example 109(3)
5.4.2 Zero-Valued Attribute Counts 112(1)
5.4.3 Missing Data 112(1)
5.4.4 Numeric Data 113(2)
5.4.5 Experimenting with Naive Bayes 115(5)
5.5 Chapter Summary 120(1)
5.6 Key Terms 120(2)
Exercises 122(5)
Chapter 6 Tree-Based Methods 127(34)
6.1 A Decision Tree Algorithm 127(6)
6.1.1 An Algorithm for Building Decision Trees 128(1)
6.1.2 C4.5 Attribute Selection 128(5)
6.1.3 Other Methods for Building Decision Trees 133(1)
6.2 Building Decision Trees: C5.0 133(4)
6.2.1 A Decision Tree for Credit Card Promotions 134(1)
6.2.2 Data for Simulating Customer Churn 135(1)
6.2.3 Predicting Customer Churn with C5.0 136(1)
6.3 Building Decision Trees: rpart 137(10)
6.3.1 An rpart Decision Tree for Credit Card Promotions 139(2)
6.3.2 Train and Test rpart: Churn Data 141(2)
6.3.3 Cross Validation rpart: Churn Data 143(4)
6.4 Building Decision Trees: J48 147(2)
6.5 Ensemble Techniques for Improving Performance 149(5)
6.5.1 Bagging 149(1)
6.5.2 Boosting 150(1)
6.5.3 Boosting: An Example with C5.0 150(1)
6.5.4 Random Forests 151(3)
6.6 Regression Trees 154(2)
6.7 Chapter Summary 156(1)
6.8 Key Terms 157(1)
Exercises 157(4)
Chapter 7 Rule-Based Techniques 161(28)
7.1 From Trees to Rules 161(4)
7.1.1 The Spam Email Dataset 162(1)
7.1.2 Spam Email Classification: C5.0 163(2)
7.2 A Basic Covering Rule Algorithm 165(4)
7.2.1 Generating Covering Rules with JRip 166(3)
7.3 Generating Association Rules 169(8)
7.3.1 Confidence and Support 169(1)
7.3.2 Mining Association Rules: An Example 170(3)
7.3.3 General Considerations 173(1)
7.3.4 RWeka's Apriori Function 173(4)
7.4 Shake, Rattle, and Roll 177(7)
7.5 Chapter Summary 184(1)
7.6 Key Terms 184(1)
Exercises 185(4)
Chapter 8 Neural Networks 189(50)
8.1 Feed-Forward Neural Networks 190(4)
8.1.1 Neural Network Input Format 190(2)
8.1.2 Neural Network Output Format 192(1)
8.1.3 The Sigmoid Evaluation Function 193(1)
8.2 Neural Network Training: A Conceptual View 194(2)
8.2.1 Supervised Learning with Feed-Forward Networks 194(1)
8.2.2 Unsupervised Clustering with Self-Organizing Maps 195(1)
8.3 Neural Network Explanation 196(1)
8.4 General Considerations 197(1)
8.4.1 Strengths 197(1)
8.4.2 Weaknesses 198(1)
8.5 Neural Network Training: A Detailed View 198(5)
8.5.1 The Backpropagation Algorithm: An Example 198(4)
8.5.2 Kohonen Self-Organizing Maps: An Example 202(1)
8.6 Building Neural Networks with R 203(20)
8.6.1 The Exclusive-OR Function 204(2)
8.6.2 Modeling Exclusive-OR with MLP: Numeric Output 206(4)
8.6.3 Modeling Exclusive-OR with MLP: Categorical Output 210(2)
8.6.4 Modeling Exclusive-OR with neuralnet: Numeric Output 212(2)
8.6.5 Modeling Exclusive-OR with neuralnet: Categorical Output 214(2)
8.6.6 Classifying Satellite Image Data 216(4)
8.6.7 Testing for Diabetes 220(3)
8.7 Neural Net Clustering for Attribute Evaluation 223(4)
8.8 Time Series Analysis 227(5)
8.8.1 Stock Market Analytics 227(1)
8.8.2 Time Series Analysis: An Example 228(1)
8.8.3 The Target Data 229(1)
8.8.4 Modeling the Time Series 230(2)
8.8.5 General Considerations 232(1)
8.9 Chapter Summary 232(1)
8.10 Key Terms 233(1)
Exercises 234(5)
Chapter 9 Formal Evaluation Techniques 239(18)
9.1 What Should Be Evaluated? 240(1)
9.2 Tools for Evaluation 241(6)
9.2.1 Single-Valued Summary Statistics 242(1)
9.2.2 The Normal Distribution 242(2)
9.2.3 Normal Distributions and Sample Means 244(1)
9.2.4 A Classical Model for Hypothesis Testing 245(2)
9.3 Computing Test Set Confidence Intervals 247(2)
9.4 Comparing Supervised Models 249(4)
9.4.1 Comparing the Performance of Two Models 251(1)
9.4.2 Comparing the Performance of Two or More Models 252(1)
9.5 Confidence Intervals for Numeric Output 253(1)
9.6 Chapter Summary 253(1)
9.7 Key Terms 254(1)
Exercises 255(2)
Chapter 10 Support Vector Machines 257(22)
10.1 Linearly Separable Classes 259(5)
10.2 The Nonlinear Case 264(1)
10.3 Experimenting with Linearly Separable Data 265(2)
10.4 Microarray Data Mining 267(2)
10.4.1 DNA and Gene Expression 267(1)
10.4.2 Preprocessing Microarray Data: Attribute Selection 268(1)
10.4.3 Microarray Data Mining: Issues 269(1)
10.5 A Microarray Application 269(5)
10.5.1 Establishing a Benchmark 270(1)
10.5.2 Attribute Elimination 271(3)
10.6 Chapter Summary 274(1)
10.7 Key Terms 275(1)
Exercises 275(4)
Chapter 11 Unsupervised Clustering Techniques 279(32)
11.1 The K-Means Algorithm 280(4)
11.1.1 An Example Using K-Means 280(3)
11.1.2 General Considerations 283(1)
11.2 Agglomerative Clustering 284(3)
11.2.1 Agglomerative Clustering: An Example 284(2)
11.2.2 General Considerations 286(1)
11.3 Conceptual Clustering 287(4)
11.3.1 Measuring Category Utility 287(1)
11.3.2 Conceptual Clustering: An Example 288(2)
11.3.3 General Considerations 290(1)
11.4 Expectation Maximization 291(1)
11.5 Unsupervised Clustering with R 292(13)
11.5.1 Supervised Learning for Cluster Evaluation 292(2)
11.5.2 Unsupervised Clustering for Attribute Evaluation 294(3)
11.5.3 Agglomerative Clustering: A Simple Example 297(1)
11.5.4 Agglomerative Clustering of Gamma-Ray Burst Data 298(3)
11.5.5 Agglomerative Clustering of Cardiology Patient Data 301(2)
11.5.6 Agglomerative Clustering of Credit Screening Data 303(2)
11.6 Chapter Summary 305(1)
11.7 Key Terms 306(1)
Exercises 307(4)
Chapter 12 A Case Study in Predicting Treatment Outcome 311(10)
12.1 Goal Identification 313(1)
12.2 A Measure of Treatment Success 314(1)
12.3 Target Data Creation 315(1)
12.4 Data Preprocessing 316(1)
12.5 Data Transformation 316(1)
12.6 Data Mining 316(2)
12.6.1 Two-Class Experiments 316(2)
12.7 Interpretation and Evaluation 318(1)
12.7.1 Should Patients' Torsos Rotate? 318(1)
12.8 Taking Action 319(1)
12.9 Chapter Summary 319(2)
Bibliography 321(6)
Appendix A Supplementary Materials And More Datasets 327(2)
Appendix B Statistics For Performance Evaluation 329(6)
Subject Index 335(6)
Index of R Functions 341(2)
Script Index 343
Richard J. Roiger is a professor emeritus at Minnesota State University, Mankato, where he taught and performed research in the Computer & Information Science Department for 27 years. Dr. Roiger's Ph.D. is in Computer & Information Sciences from the University of Minnesota. He continues to serve as a part-time faculty member, teaching courses in data mining, artificial intelligence, and research methods. Richard enjoys interacting with his grandchildren, traveling, writing, and pursuing his musical talents.