E-book: Just Enough R!: An Interactive Approach to Machine Learning and Analytics [Taylor & Francis e-book]

  • Format: 346 pages; 33 tables, black and white; 72 illustrations, black and white
  • Publication date: 08-Jun-2020
  • Publisher: Chapman & Hall/CRC
  • ISBN-13: 9781003006695
  • Taylor & Francis e-book
  • Price: 207,73 €*
  • * A price that grants an unlimited number of concurrent users access for an unlimited period
  • Regular price: 296,75 €
  • You save 30%

Just Enough R! An Interactive Approach to Machine Learning and Analytics presents just enough of the R language, machine learning algorithms, statistical methodology, and analytics for the reader to learn how to find interesting structure in data. The approach might be called "seeing then doing," as it first gives step-by-step explanations, using simple, understandable examples, of how the various machine learning algorithms work independently of any programming language. This is followed by detailed scripts written in R that apply the algorithms to solve nontrivial problems with real data. The script code is provided, allowing the reader to execute the scripts as they study the explanations given in the text.

Features

  • Gets you quickly using R as a problem-solving tool
  • Uses RStudio’s integrated development environment
  • Shows how to interface R with SQLite (a brief sketch follows this list)
  • Includes examples using R’s Rattle graphical user interface
  • Requires no prior knowledge of R, machine learning, or computer programming
  • Offers over 50 scripts written in R, including several problem-solving templates that, with slight modification, can be used again and again
  • Covers the most popular machine learning techniques, including ensemble-based methods and logistic regression
  • Includes end-of-chapter exercises, many of which can be solved by modifying existing scripts
  • Includes datasets from several areas, including business, health and medicine, and science
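
To give a feel for the book's hands-on style, here is a minimal sketch of interfacing R with SQLite. It uses the DBI and RSQLite packages and a throwaway in-memory database; these particular packages and the iris data are illustrative assumptions, not necessarily the exact setup the book uses.

    # A minimal sketch: querying a SQLite database from R.
    # Assumes DBI and RSQLite are installed:
    #   install.packages(c("DBI", "RSQLite"))
    library(DBI)
    library(RSQLite)

    # Open a throwaway in-memory database; pass a file path
    # (e.g., "mydata.db") instead to use a persistent database.
    con <- dbConnect(RSQLite::SQLite(), ":memory:")

    # Copy R's built-in iris data frame into a SQLite table.
    dbWriteTable(con, "iris", iris)

    # Run ordinary SQL and get the result back as a data frame.
    counts <- dbGetQuery(con, "SELECT Species, COUNT(*) AS n
                               FROM iris GROUP BY Species")
    print(counts)

    dbDisconnect(con)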

About the Author

Richard J. Roiger is a professor emeritus at Minnesota State University, Mankato, where he taught and performed research in the Computer and Information Science Department for over 30 years.

Contents

    Preface xiii
    Acknowledgment xv
    Author xvii
    Chapter 1 Introduction to Machine Learning
    1(24)
    1.1 Machine Learning, Statistical Analysis, And Data Science
    2(1)
    1.2 Machine Learning: A First Example
    3(3)
    1.2.1 Attribute-Value Format
    3(1)
    1.2.2 A Decision Tree for Diagnosing Illness
    4(2)
    1.3 Machine Learning Strategies
    6(6)
    1.3.1 Classification
    7(1)
    1.3.2 Estimation
    7(1)
    1.3.3 Prediction
    8(3)
    1.3.4 Unsupervised Clustering
    11(1)
    1.3.5 Market Basket Analysis
    12(1)
    1.4 Evaluating Performance
    12(5)
    1.4.1 Evaluating Supervised Models
    12(1)
    1.4.2 Two-Class Error Analysis
    13(1)
    1.4.3 Evaluating Numeric Output
    14(1)
    1.4.4 Comparing Models by Measuring Lift
    15(2)
    1.4.5 Unsupervised Model Evaluation
    17(1)
    1.5 Ethical Issues
    17(1)
    1.6 Chapter Summary
    18(1)
    1.7 Key Terms
    18(2)
    Exercises
    20(5)
    Chapter 2 Introduction to R
    25(16)
    2.1 Introducing R And RStudio
    25(3)
    2.1.1 Features of R
    26(1)
    2.1.2 Installing R
    26(2)
    2.1.3 Installing RStudio
    28(1)
    2.2 Navigating RStudio
    28(10)
    2.2.1 The Console
    28(2)
    2.2.2 The Source Panel
    30(2)
    2.2.3 The Global Environment
    32(5)
    2.2.4 Packages
    37(1)
    2.3 Where's The Data?
    38(1)
    2.4 Obtaining Help And Additional Information
    38(1)
    2.5 Summary
    39(1)
    Exercises
    39(2)
    Chapter 3 Data Structures and Manipulation
    41(20)
    3.1 Data Types
    41(3)
    3.1.1 Character Data and Factors
    42(2)
    3.2 Single-Mode Data Structures
    44(3)
    3.2.1 Vectors
    44(2)
    3.2.2 Matrices and Arrays
    46(1)
    3.3 Multimode Data Structures
    47(3)
    3.3.1 Lists
    47(1)
    3.3.2 Data Frames
    48(2)
    3.4 Writing Your Own Functions
    50(8)
    3.4.1 Writing a Simple Function
    50(2)
    3.4.2 Conditional Statements
    52(1)
    3.4.3 Iteration
    53(4)
    3.4.4 Recursive Programming
    57(1)
    3.5 Summary
    58(1)
    3.6 Key Terms
    58(1)
    Exercises
    59(2)
    Chapter 4 Preparing the Data
    61(18)
    4.1 A Process Model For Knowledge Discovery
    61(1)
    4.2 Creating A Target Dataset
    62(4)
    4.2.1 Interfacing R with the Relational Model
    64(2)
    4.2.2 Additional Sources for Target Data
    66(1)
    4.3 Data Preprocessing
    66(4)
    4.3.1 Noisy Data
    66(1)
    4.3.2 Preprocessing with R
    67(2)
    4.3.3 Detecting Outliers
    69(1)
    4.3.4 Missing Data
    69(1)
    4.4 Data Transformation
    70(5)
    4.4.1 Data Normalization
    70(2)
    4.4.2 Data Type Conversion
    72(1)
    4.4.3 Attribute and Instance Selection
    72(2)
    4.4.4 Creating Training and Test Set Data
    74(1)
    4.4.5 Cross Validation and Bootstrapping
    74(1)
    4.4.6 Large-Sized Data
    75(1)
    4.5 Chapter Summary
    75(1)
    4.6 Key Terms
    76(1)
    Exercises
    77(2)
    Chapter 5 Supervised Statistical Techniques
    79(48)
    5.1 Simple Linear Regression
    79(6)
    5.2 Multiple Linear Regression
    85(14)
    5.2.1 Multiple Linear Regression: An Example
    85(3)
    5.2.2 Evaluating Numeric Output
    88(1)
    5.2.3 Training/Test Set Evaluation
    89(2)
    5.2.4 Using Cross Validation
    91(2)
    5.2.5 Linear Regression with Categorical Data
    93(6)
    5.3 Logistic Regression
    99(10)
    5.3.1 Transforming the Linear Regression Model
    100(1)
    5.3.2 The Logistic Regression Model
    100(1)
    5.3.3 Logistic Regression with R
    101(3)
    5.3.4 Creating a Confusion Matrix
    104(1)
    5.3.5 Receiver Operating Characteristics (ROC) Curves
    104(4)
    5.3.6 The Area under an ROC Curve
    108(1)
    5.4 Naive Bayes Classifier
    109(4)
    5.4.1 Bayes Classifier: An Example
    109(3)
    5.4.2 Zero-Valued Attribute Counts
    112(1)
    5.4.3 Missing Data
    112(1)
    5.4.4 Numeric Data
    113(7)
    5.4.5 Experimenting with Naive Bayes
    115(5)
    5.5 Chapter Summary
    120(1)
    5.6 Key Terms
    120(2)
    Exercises
    122(5)
    Chapter 6 Tree-Based Methods
    127(34)
    6.1 A Decision Tree Algorithm
    127(6)
    6.1.1 An Algorithm for Building Decision Trees
    128(1)
    6.1.2 C4.5 Attribute Selection
    128(5)
    6.1.3 Other Methods for Building Decision Trees
    133(1)
    6.2 Building Decision Trees: C5.0
    133(4)
    6.2.1 A Decision Tree for Credit Card Promotions
    134(1)
    6.2.2 Data for Simulating Customer Churn
    135(1)
    6.2.3 Predicting Customer Churn with C5.0
    136(1)
    6.3 Building Decision Trees: rpart
    137(10)
    6.3.1 An rpart Decision Tree for Credit Card Promotions
    139(2)
    6.3.2 Train and Test rpart: Churn Data
    141(2)
    6.3.3 Cross Validation rpart: Churn Data
    143(4)
    6.4 Building Decision Trees: J48
    147(2)
    6.5 Ensemble Techniques For Improving Performance
    149(5)
    6.5.1 Bagging
    149(1)
    6.5.2 Boosting
    150(1)
    6.5.3 Boosting: An Example with C5.0
    150(1)
    6.5.4 Random Forests
    151(3)
    6.6 Regression Trees
    154(2)
    6.7 Chapter Summary
    156(1)
    6.8 Key Terms
    157(1)
    Exercises
    157(4)
    Chapter 7 Rule-Based Techniques
    161(28)
    7.1 From Trees To Rules
    161(4)
    7.1.1 The Spam Email Dataset
    162(1)
    7.1.2 Spam Email Classification: C5.0
    163(2)
    7.2 A Basic Covering Rule Algorithm
    165(4)
    7.2.1 Generating Covering Rules with JRip
    166(3)
    7.3 Generating Association Rules
    169(8)
    7.3.1 Confidence and Support
    169(1)
    7.3.2 Mining Association Rules: An Example
    170(3)
    7.3.3 General Considerations
    173(1)
    7.3.4 RWeka's Apriori Function
    173(4)
    7.4 Shake, Rattle, And Roll
    177(7)
    7.5 Chapter Summary
    184(1)
    7.6 Key Terms
    184(1)
    Exercises
    185(4)
    Chapter 8 Neural Networks
    189(50)
    8.1 Feed-Forward Neural Networks
    190(4)
    8.1.1 Neural Network Input Format
    190(2)
    8.1.2 Neural Network Output Format
    192(1)
    8.1.3 The Sigmoid Evaluation Function
    193(1)
    8.2 Neural Network Training: A Conceptual View
    194(2)
    8.2.1 Supervised Learning with Feed-Forward Networks
    194(1)
    8.2.2 Unsupervised Clustering with Self-Organizing Maps
    195(1)
    8.3 Neural Network Explanation
    196(1)
    8.4 General Considerations
    197(1)
    8.4.1 Strengths
    197(1)
    8.4.2 Weaknesses
    198(1)
    8.5 Neural Network Training: A Detailed View
    198(5)
    8.5.1 The Backpropagation Algorithm: An Example
    198(4)
    8.5.2 Kohonen Self-Organizing Maps: An Example
    202(1)
    8.6 Building Neural Networks With R
    203(20)
    8.6.1 The Exclusive-OR Function
    204(2)
    8.6.2 Modeling Exclusive-OR with MLP: Numeric Output
    206(4)
    8.6.3 Modeling Exclusive-OR with MLP: Categorical Output
    210(2)
    8.6.4 Modeling Exclusive-OR with neuralnet: Numeric Output
    212(2)
    8.6.5 Modeling Exclusive-OR with neuralnet: Categorical Output
    214(2)
    8.6.6 Classifying Satellite Image Data
    216(4)
    8.6.7 Testing for Diabetes
    220(3)
    8.7 Neural Net Clustering For Attribute Evaluation
    223(4)
    8.8 Time Series Analysis
    227(5)
    8.8.1 Stock Market Analytics
    227(1)
    8.8.2 Time Series Analysis: An Example
    228(1)
    8.8.3 The Target Data
    229(1)
    8.8.4 Modeling the Time Series
    230(2)
    8.8.5 General Considerations
    232(1)
    8.9 Chapter Summary
    232(1)
    8.10 Key Terms
    233(1)
    Exercises
    234(5)
    Chapter 9 Formal Evaluation Techniques
    239(18)
    9.1 What Should Be Evaluated?
    240(1)
    9.2 Tools For Evaluation
    241(6)
    9.2.1 Single-Valued Summary Statistics
    242(1)
    9.2.2 The Normal Distribution
    242(2)
    9.2.3 Normal Distributions and Sample Means
    244(1)
    9.2.4 A Classical Model for Hypothesis Testing
    245(2)
    9.3 Computing Test Set Confidence Intervals
    247(2)
    9.4 Comparing Supervised Models
    249(4)
    9.4.1 Comparing the Performance of Two Models
    251(1)
    9.4.2 Comparing the Performance of Two or More Models
    252(1)
    9.5 Confidence Intervals For Numeric Output
    253(1)
    9.6 Chapter Summary
    253(1)
    9.7 Key Terms
    254(1)
    Exercises
    255(2)
    Chapter 10 Support Vector Machines
    257(22)
    10.1 Linearly Separable Classes
    259(5)
    10.2 The Nonlinear Case
    264(1)
    10.3 Experimenting With Linearly Separable Data
    265(2)
    10.4 Microarray Data Mining
    267(2)
    10.4.1 DNA and Gene Expression
    267(1)
    10.4.2 Preprocessing Microarray Data: Attribute Selection
    268(1)
    10.4.3 Microarray Data Mining: Issues
    269(1)
    10.5 A Microarray Application
    269(5)
    10.5.1 Establishing a Benchmark
    270(1)
    10.5.2 Attribute Elimination
    271(3)
    10.6 Chapter Summary
    274(1)
    10.7 Key Terms
    275(1)
    Exercises
    275(4)
    Chapter 11 Unsupervised Clustering Techniques
    279(32)
    11.1 The K-Means Algorithm
    280(4)
    11.1.1 An Example Using K-Means
    280(3)
    11.1.2 General Considerations
    283(1)
    11.2 Agglomerative Clustering
    284(3)
    11.2.1 Agglomerative Clustering: An Example
    284(2)
    11.2.2 General Considerations
    286(1)
    11.3 Conceptual Clustering
    287(4)
    11.3.1 Measuring Category Utility
    287(1)
    11.3.2 Conceptual Clustering: An Example
    288(2)
    11.3.3 General Considerations
    290(1)
    11.4 Expectation Maximization
    291(1)
    11.5 Unsupervised Clustering With R
    292(13)
    11.5.1 Supervised Learning for Cluster Evaluation
    292(2)
    11.5.2 Unsupervised Clustering for Attribute Evaluation
    294(3)
    11.5.3 Agglomerative Clustering: A Simple Example
    297(1)
    11.5.4 Agglomerative Clustering of Gamma-Ray Burst Data
    298(3)
    11.5.5 Agglomerative Clustering of Cardiology Patient Data
    301(2)
    11.5.6 Agglomerative Clustering of Credit Screening Data
    303(2)
    11.6 Chapter Summary
    305(1)
    11.7 Key Terms
    306(1)
    Exercises
    307(4)
    Chapter 12 A Case Study in Predicting Treatment Outcome
    311(10)
    12.1 Goal Identification
    313(1)
    12.2 A Measure Of Treatment Success
    314(1)
    12.3 Target Data Creation
    315(1)
    12.4 Data Preprocessing
    316(1)
    12.5 Data Transformation
    316(1)
    12.6 Data Mining
    316(2)
    12.6.1 Two-Class Experiments
    316(2)
    12.7 Interpretation And Evaluation
    318(1)
    12.7.1 Should Patients' Torsos Rotate?
    318(1)
    12.8 Taking Action
    319(1)
    12.9 Chapter Summary
    319(2)
    Bibliography 321(6)
    Appendix A Supplementary Materials And More Datasets 327(2)
    Appendix B Statistics For Performance Evaluation 329(6)
    Subject Index 335(6)
    Index Of R Functions 341(2)
    Script Index 343
Richard J. Roiger is a professor emeritus at Minnesota State University, Mankato, where he taught and performed research in the Computer & Information Science Department for 27 years. Dr. Roiger's Ph.D. is in Computer & Information Sciences from the University of Minnesota. He continues to serve as a part-time faculty member teaching courses in data mining, artificial intelligence, and research methods. Richard enjoys interacting with his grandchildren, traveling, writing, and pursuing his musical talents.