E-book: Just Enough R!: An Interactive Approach to Machine Learning and Analytics [Taylor & Francis e-book]

  • Format: 346 pages; 33 tables, black and white; 72 illustrations, black and white
  • Publication date: 08-Jun-2020
  • Publisher: Chapman & Hall/CRC
  • ISBN-13: 9781003006695
  • Taylor & Francis e-book
  • Price: 207,73 €*
  • * A price that grants an unlimited number of concurrent users access for an unlimited period
  • Regular price: 296,75 €
  • You save 30%

Just Enough R! An Interactive Approach to Machine Learning and Analytics presents just enough of the R language, machine learning algorithms, statistical methodology, and analytics for the reader to learn how to find interesting structure in data. The approach might be called "seeing then doing," as it first gives step-by-step explanations, using simple, understandable examples, of how the various machine learning algorithms work independently of any programming language. This is followed by detailed scripts written in R that apply the algorithms to solve nontrivial problems with real data. The script code is provided, allowing the reader to execute the scripts as they study the explanations given in the text.

Features

  • Gets you quickly using R as a problem-solving tool
  • Uses RStudio’s integrated development environment
  • Shows how to interface R with SQLite (a brief sketch follows this list)
  • Includes examples using R’s Rattle graphical user interface
  • Requires no prior knowledge of R, machine learning, or computer programming
  • Offers over 50 scripts written in R, including several problem-solving templates that, with slight modification, can be used again and again
  • Covers the most popular machine learning techniques, including ensemble-based methods and logistic regression
  • Includes end-of-chapter exercises, many of which can be solved by modifying existing scripts
  • Includes datasets from several areas, including business, health and medicine, and science
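
To give a feel for the book's hands-on style, here is a minimal sketch of interfacing R with SQLite. It uses the DBI and RSQLite packages and a throwaway in-memory database; these particular packages and the iris data are illustrative assumptions, not necessarily the exact setup the book uses.

    # A minimal sketch: querying a SQLite database from R.
    # Assumes DBI and RSQLite are installed:
    #   install.packages(c("DBI", "RSQLite"))
    library(DBI)
    library(RSQLite)

    # Open a throwaway in-memory database; pass a file path
    # (e.g., "mydata.db") instead to use a persistent database.
    con <- dbConnect(RSQLite::SQLite(), ":memory:")

    # Copy R's built-in iris data frame into a SQLite table.
    dbWriteTable(con, "iris", iris)

    # Run ordinary SQL and get the result back as a data frame.
    counts <- dbGetQuery(con, "SELECT Species, COUNT(*) AS n
                               FROM iris GROUP BY Species")
    print(counts)

    dbDisconnect(con)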

About the Author

Richard J. Roiger is a professor emeritus at Minnesota State University, Mankato, where he taught and performed research in the Computer and Information Science Department for over 30 years.

Contents

    Preface xiii
    Acknowledgment xv
    Author xvii
    Chapter 1 Introduction to Machine Learning
    1(24)
    1.1 Machine Learning, Statistical Analysis, And Data Science
    2(1)
    1.2 Machine Learning: A First Example
    3(3)
    1.2.1 Attribute-Value Format
    3(1)
    1.2.2 A Decision Tree for Diagnosing Illness
    4(2)
    1.3 Machine Learning Strategies
    6(6)
    1.3.1 Classification
    7(1)
    1.3.2 Estimation
    7(1)
    1.3.3 Prediction
    8(3)
    1.3.4 Unsupervised Clustering
    11(1)
    1.3.5 Market Basket Analysis
    12(1)
    1.4 Evaluating Performance
    12(5)
    1.4.1 Evaluating Supervised Models
    12(1)
    1.4.2 Two-Class Error Analysis
    13(1)
    1.4.3 Evaluating Numeric Output
    14(1)
    1.4.4 Comparing Models by Measuring Lift
    15(2)
    1.4.5 Unsupervised Model Evaluation
    17(1)
    1.5 Ethical Issues
    17(1)
    1.6 Chapter Summary
    18(1)
    1.7 Key Terms
    18(2)
    Exercises
    20(5)
    Chapter 2 Introduction to R
    25(16)
    2.1 Introducing R And RStudio
    25(3)
    2.1.1 Features of R
    26(1)
    2.1.2 Installing R
    26(2)
    2.1.3 Installing RStudio
    28(1)
    2.2 Navigating RStudio
    28(10)
    2.2.1 The Console
    28(2)
    2.2.2 The Source Panel
    30(2)
    2.2.3 The Global Environment
    32(5)
    2.2.4 Packages
    37(1)
    2.3 Where's The Data?
    38(1)
    2.4 Obtaining Help And Additional Information
    38(1)
    2.5 Summary
    39(1)
    Exercises
    39(2)
    Chapter 3 Data Structures and Manipulation
    41(20)
    3.1 Data Types
    41(3)
    3.1.1 Character Data and Factors
    42(2)
    3.2 Single-Mode Data Structures
    44(3)
    3.2.1 Vectors
    44(2)
    3.2.2 Matrices and Arrays
    46(1)
    3.3 Multimode Data Structures
    47(3)
    3.3.1 Lists
    47(1)
    3.3.2 Data Frames
    48(2)
    3.4 Writing Your Own Functions
    50(8)
    3.4.1 Writing a Simple Function
    50(2)
    3.4.2 Conditional Statements
    52(1)
    3.4.3 Iteration
    53(4)
    3.4.4 Recursive Programming
    57(1)
    3.5 Summary
    58(1)
    3.6 Key Terms
    58(1)
    Exercises
    59(2)
    Chapter 4 Preparing the Data
    61(18)
    4.1 A Process Model For Knowledge Discovery
    61(1)
    4.2 Creating A Target Dataset
    62(4)
    4.2.1 Interfacing R with the Relational Model
    64(2)
    4.2.2 Additional Sources for Target Data
    66(1)
    4.3 Data Preprocessing
    66(4)
    4.3.1 Noisy Data
    66(1)
    4.3.2 Preprocessing with R
    67(2)
    4.3.3 Detecting Outliers
    69(1)
    4.3.4 Missing Data
    69(1)
    4.4 Data Transformation
    70(5)
    4.4.1 Data Normalization
    70(2)
    4.4.2 Data Type Conversion
    72(1)
    4.4.3 Attribute and Instance Selection
    72(2)
    4.4.4 Creating Training and Test Set Data
    74(1)
    4.4.5 Cross Validation and Bootstrapping
    74(1)
    4.4.6 Large-Sized Data
    75(1)
    4.5 Chapter Summary
    75(1)
    4.6 Key Terms
    76(1)
    Exercises
    77(2)
    Chapter 5 Supervised Statistical Techniques
    79(48)
    5.1 Simple Linear Regression
    79(6)
    5.2 Multiple Linear Regression
    85(14)
    5.2.1 Multiple Linear Regression: An Example
    85(3)
    5.2.2 Evaluating Numeric Output
    88(1)
    5.2.3 Training/Test Set Evaluation
    89(2)
    5.2.4 Using Cross Validation
    91(2)
    5.2.5 Linear Regression with Categorical Data
    93(6)
    5.3 Logistic Regression
    99(10)
    5.3.1 Transforming the Linear Regression Model
    100(1)
    5.3.2 The Logistic Regression Model
    100(1)
    5.3.3 Logistic Regression with R
    101(3)
    5.3.4 Creating a Confusion Matrix
    104(1)
    5.3.5 Receiver Operating Characteristics (ROC) Curves
    104(4)
    5.3.6 The Area under an ROC Curve
    108(1)
    5.4 Naive Bayes Classifier
    109(4)
    5.4.1 Bayes Classifier: An Example
    109(3)
    5.4.2 Zero-Valued Attribute Counts
    112(1)
    5.4.3 Missing Data
    112(1)
    5.4.4 Numeric Data
    113(7)
    5.4.5 Experimenting with Naive Bayes
    115(5)
    5.5 Chapter Summary
    120(1)
    5.6 Key Terms
    120(2)
    Exercises
    122(5)
    Chapter 6 Tree-Based Methods
    127(34)
    6.1 A Decision Tree Algorithm
    127(6)
    6.1.1 An Algorithm for Building Decision Trees
    128(1)
    6.1.2 C4.5 Attribute Selection
    128(5)
    6.1.3 Other Methods for Building Decision Trees
    133(1)
    6.2 Building Decision Trees: C5.0
    133(4)
    6.2.1 A Decision Tree for Credit Card Promotions
    134(1)
    6.2.2 Data for Simulating Customer Churn
    135(1)
    6.2.3 Predicting Customer Churn with C5.0
    136(1)
    6.3 Building Decision Trees: rpart
    137(10)
    6.3.1 An rpart Decision Tree for Credit Card Promotions
    139(2)
    6.3.2 Train and Test rpart: Churn Data
    141(2)
    6.3.3 Cross Validation rpart: Churn Data
    143(4)
    6.4 Building Decision Trees: J48
    147(2)
    6.5 Ensemble Techniques For Improving Performance
    149(5)
    6.5.1 Bagging
    149(1)
    6.5.2 Boosting
    150(1)
    6.5.3 Boosting: An Example with C5.0
    150(1)
    6.5.4 Random Forests
    151(3)
    6.6 Regression Trees
    154(2)
    6.7 Chapter Summary
    156(1)
    6.8 Key Terms
    157(1)
    Exercises
    157(4)
    Chapter 7 Rule-Based Techniques
    161(28)
    7.1 From Trees To Rules
    161(4)
    7.1.1 The Spam Email Dataset
    162(1)
    7.1.2 Spam Email Classification: C5.0
    163(2)
    7.2 A Basic Covering Rule Algorithm
    165(4)
    7.2.1 Generating Covering Rules with JRip
    166(3)
    7.3 Generating Association Rules
    169(8)
    7.3.1 Confidence and Support
    169(1)
    7.3.2 Mining Association Rules: An Example
    170(3)
    7.3.3 General Considerations
    173(1)
    7.3.4 RWeka's Apriori Function
    173(4)
    7.4 Shake, Rattle, And Roll
    177(7)
    7.5 Chapter Summary
    184(1)
    7.6 Key Terms
    184(1)
    Exercises
    185(4)
    Chapter 8 Neural Networks
    189(50)
    8.1 Feed-Forward Neural Networks
    190(4)
    8.1.1 Neural Network Input Format
    190(2)
    8.1.2 Neural Network Output Format
    192(1)
    8.1.3 The Sigmoid Evaluation Function
    193(1)
    8.2 Neural Network Training: A Conceptual View
    194(2)
    8.2.1 Supervised Learning with Feed-Forward Networks
    194(1)
    8.2.2 Unsupervised Clustering with Self-Organizing Maps
    195(1)
    8.3 Neural Network Explanation
    196(1)
    8.4 General Considerations
    197(1)
    8.4.1 Strengths
    197(1)
    8.4.2 Weaknesses
    198(1)
    8.5 Neural Network Training: A Detailed View
    198(5)
    8.5.1 The Backpropagation Algorithm: An Example
    198(4)
    8.5.2 Kohonen Self-Organizing Maps: An Example
    202(1)
    8.6 Building Neural Networks With R
    203(20)
    8.6.1 The Exclusive-OR Function
    204(2)
    8.6.2 Modeling Exclusive-OR with MLP: Numeric Output
    206(4)
    8.6.3 Modeling Exclusive-OR with MLP: Categorical Output
    210(2)
    8.6.4 Modeling Exclusive-OR with neuralnet: Numeric Output
    212(2)
    8.6.5 Modeling Exclusive-OR with neuralnet: Categorical Output
    214(2)
    8.6.6 Classifying Satellite Image Data
    216(4)
    8.6.7 Testing for Diabetes
    220(3)
    8.7 Neural Net Clustering For Attribute Evaluation
    223(4)
    8.8 Time Series Analysis
    227(5)
    8.8.1 Stock Market Analytics
    227(1)
    8.8.2 Time Series Analysis: An Example
    228(1)
    8.8.3 The Target Data
    229(1)
    8.8.4 Modeling the Time Series
    230(2)
    8.8.5 General Considerations
    232(1)
    8.9 Chapter Summary
    232(1)
    8.10 Key Terms
    233(1)
    Exercises
    234(5)
    Chapter 9 Formal Evaluation Techniques
    239(18)
    9.1 What Should Be Evaluated?
    240(1)
    9.2 Tools For Evaluation
    241(6)
    9.2.1 Single-Valued Summary Statistics
    242(1)
    9.2.2 The Normal Distribution
    242(2)
    9.2.3 Normal Distributions and Sample Means
    244(1)
    9.2.4 A Classical Model for Hypothesis Testing
    245(2)
    9.3 Computing Test Set Confidence Intervals
    247(2)
    9.4 Comparing Supervised Models
    249(4)
    9.4.1 Comparing the Performance of Two Models
    251(1)
    9.4.2 Comparing the Performance of Two or More Models
    252(1)
    9.5 Confidence Intervals For Numeric Output
    253(1)
    9.6 Chapter Summary
    253(1)
    9.7 Key Terms
    254(1)
    Exercises
    255(2)
    Chapter 10 Support Vector Machines
    257(22)
    10.1 Linearly Separable Classes
    259(5)
    10.2 The Nonlinear Case
    264(1)
    10.3 Experimenting With Linearly Separable Data
    265(2)
    10.4 Microarray Data Mining
    267(2)
    10.4.1 DNA and Gene Expression
    267(1)
    10.4.2 Preprocessing Microarray Data: Attribute Selection
    268(1)
    10.4.3 Microarray Data Mining: Issues
    269(1)
    10.5 A Microarray Application
    269(5)
    10.5.1 Establishing a Benchmark
    270(1)
    10.5.2 Attribute Elimination
    271(3)
    10.6 Chapter Summary
    274(1)
    10.7 Key Terms
    275(1)
    Exercises
    275(4)
    Chapter 11 Unsupervised Clustering Techniques
    279(32)
    11.1 The K-Means Algorithm
    280(4)
    11.1.1 An Example Using K-Means
    280(3)
    11.1.2 General Considerations
    283(1)
    11.2 Agglomerative Clustering
    284(3)
    11.2.1 Agglomerative Clustering: An Example
    284(2)
    11.2.2 General Considerations
    286(1)
    11.3 Conceptual Clustering
    287(4)
    11.3.1 Measuring Category Utility
    287(1)
    11.3.2 Conceptual Clustering: An Example
    288(2)
    11.3.3 General Considerations
    290(1)
    11.4 Expectation Maximization
    291(1)
    11.5 Unsupervised Clustering With R
    292(13)
    11.5.1 Supervised Learning for Cluster Evaluation
    292(2)
    11.5.2 Unsupervised Clustering for Attribute Evaluation
    294(3)
    11.5.3 Agglomerative Clustering: A Simple Example
    297(1)
    11.5.4 Agglomerative Clustering of Gamma-Ray Burst Data
    298(3)
    11.5.5 Agglomerative Clustering of Cardiology Patient Data
    301(2)
    11.5.6 Agglomerative Clustering of Credit Screening Data
    303(2)
    11.6 Chapter Summary
    305(1)
    11.7 Key Terms
    306(1)
    Exercises
    307(4)
    Chapter 12 A Case Study in Predicting Treatment Outcome
    311(10)
    12.1 Goal Identification
    313(1)
    12.2 A Measure Of Treatment Success
    314(1)
    12.3 Target Data Creation
    315(1)
    12.4 Data Preprocessing
    316(1)
    12.5 Data Transformation
    316(1)
    12.6 Data Mining
    316(2)
    12.6.1 Two-Class Experiments
    316(2)
    12.7 Interpretation And Evaluation
    318(1)
    12.7.1 Should Patients' Torsos Rotate?
    318(1)
    12.8 Taking Action
    319(1)
    12.9 Chapter Summary
    319(2)
    Bibliography 321(6)
    Appendix A Supplementary Materials And More Datasets 327(2)
    Appendix B Statistics For Performance Evaluation 329(6)
    Subject Index 335(6)
    Index Of R Functions 341(2)
    Script Index 343
Richard J. Roiger is a professor emeritus at Minnesota State University, Mankato, where he taught and performed research in the Computer & Information Science Department for 27 years. Dr. Roiger's Ph.D. is in Computer & Information Sciences from the University of Minnesota. He continues to serve as a part-time faculty member teaching courses in data mining, artificial intelligence, and research methods. Richard enjoys interacting with his grandchildren, traveling, writing, and pursuing his musical talents.