Machine Learning Using R, 1st ed. [Paperback]

5.00/5 (2 ratings from Goodreads)

Karthik Ramasubramanian, Abhishek Singh

Format: Paperback / softback, 566 pages, height x width: 235x155 mm, weight: 8832 g, 155 Illustrations, color; 54 Illustrations, black and white; XXIII, 566 p. 209 illus., 155 illus. in color., 1 Paperback / softback
Publication date: 24-Dec-2016
Publisher: APress
ISBN-10: 1484223330
ISBN-13: 9781484223338

Other books on the topic:

Machine learning
Programming & scripting languages: general
Program concepts / learning to program

Paperback
Price: €58.59*
* We will send you an offer for a used copy, whose price may differ from the price shown on the website.
This book is out of print, but we will send you an offer for a used copy.

Permalink: https://www.kriso.ee/db/9781484223338.html

This book is inspired by the Machine Learning Model Building Process Flow, which gives the reader the ability to understand an ML algorithm and apply the entire process of building an ML model from raw data.

This new paradigm of teaching machine learning will bring about a radical change in perception for many of those who think the subject is difficult to learn. Though theory sometimes looks difficult, especially when heavy mathematics is involved, the seamless flow from the theoretical aspects to example-driven learning provided in this book makes it easy to connect the dots.

For every machine learning algorithm covered in this book, a 3-D approach of theory, case study, and practice is given. Where appropriate, the mathematics is explained through visualization in R.

All practical demonstrations are explored in R, a powerful programming language and software environment for statistical computing and graphics. The various packages and methods available in R are used to explain the topics. Finally, readers will learn about some of the latest technological advancements in building scalable machine learning models with Big Data.

Who This Book Is For:

Data scientists, data science professionals, and researchers in academia who want to understand the nuances of machine learning approaches and algorithms, along with ways to see them in practice using R. The book will also benefit readers who want to understand the technology behind implementing scalable machine learning models using Apache Hadoop, Hive, Pig, and Spark.

What you will learn:

1. ML model building process flow

2. Theoretical aspects of Machine Learning

3. Industry-based case studies

4. Example-based understanding of ML algorithms using R

5. Building ML models using Apache Hadoop and Spark

Reviews

This is a fantastic and commendable effort by the authors to write a comprehensive book on machine learning. They have taken special care to provide complete R software code while discussing machine learning concepts and use cases. While there are plenty of resources on the Internet about machine learning, this book will serve as a single-source reference for both theoretical and practical machine learning leveraging R. (Computing Reviews, October, 2017)

Table of Contents

About the Authors
About the Technical Reviewer
Acknowledgments

Chapter 1 Introduction to Machine Learning and R
1.1 Understanding the Evolution
1.1.1 Statistical Learning
1.1.2 Machine Learning (ML)
1.1.3 Artificial Intelligence (AI)
1.1.4 Data Mining
1.1.5 Data Science
1.2 Probability and Statistics
1.2.1 Counting and Probability Definition
1.2.2 Events and Relationships
1.2.3 Randomness, Probability, and Distributions
1.2.4 Confidence Interval and Hypothesis Testing
1.3 Getting Started with R
1.3.1 Basic Building Blocks
1.3.2 Data Structures in R
1.3.3 Subsetting
1.3.4 Functions and Apply Family
1.4 Machine Learning Process Flow
1.4.1 Plan
1.4.2 Explore
1.4.3 Build
1.4.4 Evaluate
1.5 Other Technologies
1.6 Summary
1.7 References

Chapter 2 Data Preparation and Exploration
2.1 Planning the Gathering of Data
2.1.1 Variable Types
2.1.2 Data Formats
2.1.3 Data Sources
2.2 Initial Data Analysis (IDA)
2.2.1 Discerning a First Look
2.2.2 Organizing Multiple Sources of Data into One
2.2.3 Cleaning the Data
2.2.4 Supplementing with More Information
2.2.5 Reshaping
2.3 Exploratory Data Analysis
2.3.1 Summary Statistics
2.3.2 Moment
2.4 Case Study: Credit Card Fraud
2.4.1 Data Import
2.4.2 Data Transformation
2.4.3 Data Exploration
2.5 Summary
2.6 References

Chapter 3 Sampling and Resampling Techniques
3.1 Introduction to Sampling
3.2 Sampling Terminology
3.2.1 Sample
3.2.2 Sampling Distribution
3.2.3 Population Mean and Variance
3.2.4 Sample Mean and Variance
3.2.5 Pooled Mean and Variance
3.2.6 Sample Point
3.2.7 Sampling Error
3.2.8 Sampling Fraction
3.2.9 Sampling Bias
3.2.10 Sampling Without Replacement (SWOR)
3.2.11 Sampling with Replacement (SWR)
3.3 Credit Card Fraud: Population Statistics
3.3.1 Data Description
3.3.2 Population Mean
3.3.3 Population Variance
3.3.4 Pooled Mean and Variance
3.4 Business Implications of Sampling
3.4.1 Features of Sampling
3.4.2 Shortcomings of Sampling
3.5 Probability and Non-Probability Sampling
3.5.1 Types of Non-Probability Sampling
3.6 Statistical Theory on Sampling Distributions
3.6.1 Law of Large Numbers (LLN)
3.6.2 Central Limit Theorem
3.7 Probability Sampling Techniques
3.7.1 Population Statistics
3.7.2 Simple Random Sampling
3.7.3 Systematic Random Sampling
3.7.4 Stratified Random Sampling
3.7.5 Cluster Sampling
3.7.6 Bootstrap Sampling
3.8 Monte Carlo Method: Acceptance-Rejection Method
3.9 A Qualitative Account of Computational Savings by Sampling
3.10 Summary

Chapter 4 Data Visualization in R
4.1 Introduction to the ggplot2 Package
4.2 World Development Indicators
4.3 Line Chart
4.4 Stacked Column Charts
4.5 Scatterplots
4.6 Boxplots
4.7 Histograms and Density Plots
4.8 Pie Charts
4.9 Correlation Plots
4.10 Heatmaps
4.11 Bubble Charts
4.12 Waterfall Charts
4.13 Dendrogram
4.14 Wordclouds
4.15 Sankey Plots
4.16 Time Series Graphs
4.17 Cohort Diagrams
4.18 Spatial Maps
4.19 Summary
4.20 References

Chapter 5 Feature Engineering
5.1 Introduction to Feature Engineering
5.1.1 Filter Methods
5.1.2 Wrapper Methods
5.1.3 Embedded Methods
5.2 Understanding the Working Data
5.2.1 Data Summary
5.2.2 Properties of Dependent Variable
5.2.3 Features Availability: Continuous or Categorical
5.2.4 Setting Up Data Assumptions
5.3 Feature Ranking
5.4 Variable Subset Selection
5.4.1 Filter Method
5.4.2 Wrapper Methods
5.4.3 Embedded Methods
5.5 Dimensionality Reduction
5.6 Feature Engineering Checklist
5.7 Summary
5.8 References

Chapter 6 Machine Learning Theory and Practices
6.1 Machine Learning Types
6.1.1 Supervised Learning
6.1.2 Unsupervised Learning
6.1.3 Semi-Supervised Learning
6.1.4 Reinforcement Learning
6.2 Groups of Machine Learning Algorithms
6.3 Real-World Datasets
6.3.1 House Sale Prices
6.3.2 Purchase Preference
6.3.3 Twitter Feeds and Article
6.3.4 Breast Cancer
6.3.5 Market Basket
6.3.6 Amazon Food Review
6.4 Regression Analysis
6.5 Correlation Analysis
6.5.1 Linear Regression
6.5.2 Simple Linear Regression
6.5.3 Multiple Linear Regression
6.5.4 Model Diagnostics: Linear Regression
6.5.5 Polynomial Regression
6.5.6 Logistic Regression
6.5.7 Logit Transformation
6.5.8 Odds Ratio
6.5.9 Model Diagnostics: Logistic Regression
6.5.10 Multinomial Logistic Regression
6.5.11 Generalized Linear Models
6.5.12 Conclusion
6.6 Support Vector Machine (SVM)
6.6.1 Linear SVM
6.6.2 Binary SVM Classifier
6.6.3 Multi-Class SVM
6.6.4 Conclusion
6.7 Decision Trees
6.7.1 Types of Decision Trees
6.7.2 Decision Measures
6.7.3 Decision Tree Learning Methods
6.7.4 Ensemble Trees
6.7.5 Conclusion
6.8 The Naive Bayes Method
6.8.1 Conditional Probability
6.8.2 Bayes Theorem
6.8.3 Prior Probability
6.8.4 Posterior Probability
6.8.5 Likelihood and Marginal Likelihood
6.8.6 Naive Bayes Methods
6.8.7 Conclusion
6.9 Cluster Analysis
6.9.1 Introduction to Clustering
6.9.2 Clustering Algorithms
6.9.3 Internal Evaluation
6.9.4 External Evaluation
6.9.5 Conclusion
6.10 Association Rule Mining
6.10.1 Introduction to Association Concepts
6.10.2 Rule-Mining Algorithms
6.10.3 Recommendation Algorithms
6.10.4 Conclusion
6.11 Artificial Neural Networks
6.11.1 Human Cognitive Learning
6.11.2 Perceptron
6.11.3 Sigmoid Neuron
6.11.4 Neural Network Architecture
6.11.5 Supervised versus Unsupervised Neural Nets
6.11.6 Neural Network Learning Algorithms
6.11.7 Feed-Forward Back-Propagation
6.11.8 Deep Learning
6.11.9 Conclusion
6.12 Text-Mining Approaches
6.12.1 Introduction to Text Mining
6.12.2 Text Summarization
6.12.3 TF-IDF
6.12.4 Part-of-Speech (POS) Tagging
6.12.5 Word Cloud
6.12.6 Text Analysis: Microsoft Cognitive Services
6.12.7 Conclusion
6.13 Online Machine Learning Algorithms
6.13.1 Fuzzy C-Means Clustering
6.13.2 Conclusion
6.14 Model Building Checklist
6.15 Summary
6.16 References

Chapter 7 Machine Learning Model Evaluation
7.1 Dataset
7.1.1 House Sale Prices
7.1.2 Purchase Preference
7.2 Introduction to Model Performance and Evaluation
7.3 Objectives of Model Performance Evaluation
7.4 Population Stability Index
7.5 Model Evaluation for Continuous Output
7.5.1 Mean Absolute Error
7.5.2 Root Mean Square Error
7.5.3 R-Square
7.6 Model Evaluation for Discrete Output
7.6.1 Classification Matrix
7.6.2 Sensitivity and Specificity
7.6.3 Area Under ROC Curve
7.7 Probabilistic Techniques
7.7.1 K-Fold Cross Validation
7.7.2 Bootstrap Sampling
7.8 The Kappa Error Metric
7.9 Summary
7.10 References

Chapter 8 Model Performance Improvement
8.1 Machine Learning and Statistical Modeling
8.2 Overview of the Caret Package
8.3 Introduction to Hyper-Parameters
8.4 Hyper-Parameter Optimization
8.4.1 Manual Search
8.4.2 Manual Grid Search
8.4.3 Automatic Grid Search
8.4.4 Optimal Search
8.4.5 Random Search
8.4.6 Custom Searching
8.5 The Bias and Variance Tradeoff
8.5.1 Bagging or Bootstrap Aggregation
8.5.2 Boosting
8.6 Introduction to Ensemble Learning
8.6.1 Voting Ensembles
8.6.2 Advanced Methods in Ensemble Learning
8.7 Ensemble Techniques Illustration in R
8.7.1 Bagging Trees
8.7.2 Gradient Boosting with a Decision Tree
8.7.3 Blending KNN and Rpart
8.7.4 Stacking Using caretEnsemble
8.8 Advanced Topic: Bayesian Optimization of Machine Learning Models
8.9 Summary
8.10 References

Chapter 9 Scalable Machine Learning and Related Technologies
9.1 Distributed Processing and Storage
9.1.1 Google File System (GFS)
9.1.2 MapReduce
9.1.3 Parallel Execution in R
9.2 The Hadoop Ecosystem
9.2.1 MapReduce
9.2.2 Hive
9.2.3 Apache Pig
9.2.4 HBase
9.2.5 Spark
9.3 Machine Learning in R with Spark
9.3.1 Setting the Environment Variable
9.3.2 Initializing the Spark Session
9.3.3 Loading Data and Running the Pre-Process
9.3.4 Creating SparkDataFrame
9.3.5 Building the ML Model
9.3.6 Predicting the Test Data
9.3.7 Stopping the SparkR Session
9.4 Machine Learning in R with H2O
9.4.1 Installation of Packages
9.4.2 Initialization of H2O Clusters
9.4.3 Deep Learning Demo in R with H2O
9.5 Summary
9.6 References

Index

Karthik Ramasubramanian works for Hike Messenger, one of the largest and fastest-growing technology unicorns in India. In his 7 years of research and industry experience, he has worked on cross-industry data science problems in retail, e-commerce, and technology, developing and prototyping data-driven solutions. In his previous role at Snapdeal, one of the largest e-commerce retailers in India, he led core statistical modelling initiatives for customer growth and pricing analytics. Prior to Snapdeal, he was part of the central database team, managing the data warehouses for global business applications of Reckitt Benckiser (RB). He has a Master's in Theoretical Computer Science from PSG College of Technology, Anna University, and is a certified big data professional. He is passionate about teaching and mentoring future data scientists through various online and public forums. He also enjoys writing poems in his spare time and is an avid traveler.

Abhishek Singh is based in Ireland as a Data Scientist in the Advanced Data Science team at Prudential Financial Inc. He has 5 years of professional and academic experience in the data science field. At Deloitte Advisory, he led risk analytics initiatives for top US banks, covering their regulatory risk, credit risk, and balance sheet modelling requirements. In his current role, he works on scalable machine learning algorithms for the Individual Life Insurance branch of Prudential. He was also a trainer at Deloitte Professional University and in development initiatives for professionals in the areas of statistics, economics, financial risk, and data science tools (SAS and R). Abhishek holds a B.Tech. in Mathematics and Computing from the Indian Institute of Technology, Guwahati, and an MBA from the Indian Institute of Management, Bangalore. He speaks at public events on data science and is working with leading universities to bring data science skills to graduates. He also holds a Post Graduate Diploma in Cyber Law from NALSAR University. He enjoys cooking and photography in his free time.
