Machine Learning Using R 1st ed. [Paperback]

  • Format: Paperback / softback, XXIII + 566 pages, height x width: 235x155 mm, weight: 8832 g, 209 illustrations (155 in color, 54 black and white)
  • Publication date: 24-Dec-2016
  • Publisher: Apress
  • ISBN-10: 1484223330
  • ISBN-13: 9781484223338
  • Paperback
  • Price: 58,59 €*
  • * We will send you an offer for a used copy; its price may differ from the price shown on the website.
  • This book is out of print, but we will send you an offer for a used copy.

This book is inspired by the Machine Learning Model Building Process Flow, which gives the reader the ability to understand an ML algorithm and apply the entire process of building an ML model from raw data.

This new paradigm of teaching machine learning will bring about a radical change in perception for many of those who think this subject is difficult to learn. Though the theory sometimes looks difficult, especially when heavy mathematics is involved, the seamless flow from theoretical aspects to example-driven learning provided in this book makes it easy to connect the dots.

For every machine learning algorithm covered in this book, a 3-D approach of theory, case study, and practice is given. And where appropriate, the mathematics is explained through visualization in R.

All practical demonstrations will be explored in R, a powerful programming language and software environment for statistical computing and graphics. The various packages and methods available in R will be used to explain the topics. In the end, readers will learn some of the latest technological advancements in building a scalable machine learning model with Big Data.
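As a flavour of the example-driven style the listing describes (this snippet is not an excerpt from the book; the dataset and model choice are illustrative only), a simple linear regression of the kind covered in the book's regression chapter can be fit in a few lines of base R:

```r
# Illustrative only: fit a simple linear regression in base R.
data(mtcars)                        # dataset bundled with base R
fit <- lm(mpg ~ wt, data = mtcars)  # fuel efficiency modelled on car weight
coef(fit)                           # intercept and slope
summary(fit)$r.squared              # share of variance explained
```

The same `lm` object then feeds directly into the diagnostics and visualization steps (e.g. `plot(fit)`), mirroring the plan-explore-build-evaluate flow the book teaches.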


Who This Book Is For:

Data scientists, data science professionals, and researchers in academia who want to understand the nuances of machine learning approaches/algorithms along with ways to see them in practice using R. The book will also benefit readers who want to understand the technology behind implementing a scalable machine learning model using Apache Hadoop, Hive, Pig, and Spark.

What you will learn: 

1. The ML model building process flow
2. Theoretical aspects of machine learning
3. Industry-based case studies
4. Example-based understanding of ML algorithms using R
5. Building ML models using Apache Hadoop and Spark

Reviews

This is a fantastic and commendable effort by the authors to write a comprehensive book on machine learning. They have taken special care to provide complete R software code while discussing machine learning concepts and use cases. While there are plenty of resources on the Internet about machine learning, this book will serve as a single-source reference for both theoretical and practical machine learning leveraging R. (Computing Reviews, October, 2017)

About the Authors xix
About the Technical Reviewer xxi
Acknowledgments xxiii
Chapter 1 Introduction to Machine Learning and R
1(30)
1.1 Understanding the Evolution
2(4)
1.1.1 Statistical Learning
2(1)
1.1.2 Machine Learning (ML)
3(1)
1.1.3 Artificial Intelligence (AI)
3(1)
1.1.4 Data Mining
4(1)
1.1.5 Data Science
5(1)
1.2 Probability and Statistics
6(12)
1.2.1 Counting and Probability Definition
7(2)
1.2.2 Events and Relationships
9(3)
1.2.3 Randomness, Probability, and Distributions
12(1)
1.2.4 Confidence Interval and Hypothesis Testing
13(5)
1.3 Getting Started with R
18(8)
1.3.1 Basic Building Blocks
18(1)
1.3.2 Data Structures in R
19(2)
1.3.3 Subsetting
21(2)
1.3.4 Functions and Apply Family
23(3)
1.4 Machine Learning Process Flow
26(2)
1.4.1 Plan
26(1)
1.4.2 Explore
26(1)
1.4.3 Build
27(1)
1.4.4 Evaluate
27(1)
1.5 Other Technologies
28(1)
1.6 Summary
28(1)
1.7 References
28(3)
Chapter 2 Data Preparation and Exploration
31(36)
2.1 Planning the Gathering of Data
32(9)
2.1.1 Variable Types
32(1)
2.1.2 Data Formats
33(7)
2.1.3 Data Sources
40(1)
2.2 Initial Data Analysis (IDA)
41(10)
2.2.1 Discerning a First Look
41(2)
2.2.2 Organizing Multiple Sources of Data into One
43(3)
2.2.3 Cleaning the Data
46(3)
2.2.4 Supplementing with More Information
49(1)
2.2.5 Reshaping
50(1)
2.3 Exploratory Data Analysis
51(10)
2.3.1 Summary Statistics
52(3)
2.3.2 Moment
55(6)
2.4 Case Study: Credit Card Fraud
61(4)
2.4.1 Data Import
61(1)
2.4.2 Data Transformation
62(1)
2.4.3 Data Exploration
63(2)
2.5 Summary
65(1)
2.6 References
65(2)
Chapter 3 Sampling and Resampling Techniques
67(62)
3.1 Introduction to Sampling
68(1)
3.2 Sampling Terminology
69(4)
3.2.1 Sample
69(1)
3.2.2 Sampling Distribution
70(1)
3.2.3 Population Mean and Variance
70(1)
3.2.4 Sample Mean and Variance
70(1)
3.2.5 Pooled Mean and Variance
70(1)
3.2.6 Sample Point
71(1)
3.2.7 Sampling Error
71(1)
3.2.8 Sampling Fraction
72(1)
3.2.9 Sampling Bias
72(1)
3.2.10 Sampling Without Replacement (SWOR)
72(1)
3.2.11 Sampling with Replacement (SWR)
72(1)
3.3 Credit Card Fraud: Population Statistics
73(5)
3.3.1 Data Description
73(1)
3.3.2 Population Mean
74(1)
3.3.3 Population Variance
74(1)
3.3.4 Pooled Mean and Variance
75(3)
3.4 Business Implications of Sampling
78(1)
3.4.1 Features of Sampling
79(1)
3.4.2 Shortcomings of Sampling
79(1)
3.5 Probability and Non-Probability Sampling
79(2)
3.5.1 Types of Non-Probability Sampling
80(1)
3.6 Statistical Theory on Sampling Distributions
81(8)
3.6.1 Law of Large Numbers: LLN
81(4)
3.6.2 Central Limit Theorem
85(4)
3.7 Probability Sampling Techniques
89(35)
3.7.1 Population Statistics
89(4)
3.7.2 Simple Random Sampling
93(7)
3.7.3 Systematic Random Sampling
100(4)
3.7.4 Stratified Random Sampling
104(7)
3.7.5 Cluster Sampling
111(6)
3.7.6 Bootstrap Sampling
117(7)
3.8 Monte Carlo Method: Acceptance-Rejection Method
124(2)
3.9 A Qualitative Account of Computational Savings by Sampling
126(1)
3.10 Summary
127(2)
Chapter 4 Data Visualization in R
129(52)
4.1 Introduction to the ggplot2 Package
130(2)
4.2 World Development Indicators
132(1)
4.3 Line Chart
132(6)
4.4 Stacked Column Charts
138(6)
4.5 Scatterplots
144(1)
4.6 Boxplots
145(3)
4.7 Histograms and Density Plots
148(4)
4.8 Pie Charts
152(2)
4.9 Correlation Plots
154(2)
4.10 HeatMaps
156(2)
4.11 Bubble Charts
158(4)
4.12 Waterfall Charts
162(3)
4.13 Dendrograms
165(2)
4.14 Wordclouds
167(2)
4.15 Sankey Plots
169(1)
4.16 Time Series Graphs
170(2)
4.17 Cohort Diagrams
172(2)
4.18 Spatial Maps
174(4)
4.19 Summary
178(1)
4.20 References
179(2)
Chapter 5 Feature Engineering
181(38)
5.1 Introduction to Feature Engineering
182(3)
5.1.1 Filter Methods
184(1)
5.1.2 Wrapper Methods
184(1)
5.1.3 Embedded Methods
184(1)
5.2 Understanding the Working Data
185(6)
5.2.1 Data Summary
186(1)
5.2.2 Properties of Dependent Variable
186(3)
5.2.3 Features Availability: Continuous or Categorical
189(2)
5.2.4 Setting Up Data Assumptions
191(1)
5.3 Feature Ranking
191(4)
5.4 Variable Subset Selection
195(15)
5.4.1 Filter Method
195(4)
5.4.2 Wrapper Methods
199(7)
5.4.3 Embedded Methods
206(4)
5.5 Dimensionality Reduction
210(5)
5.6 Feature Engineering Checklist
215(2)
5.7 Summary
217(1)
5.8 References
217(2)
Chapter 6 Machine Learning Theory and Practices
219(206)
6.1 Machine Learning Types
222(2)
6.1.1 Supervised Learning
222(1)
6.1.2 Unsupervised Learning
223(1)
6.1.3 Semi-Supervised Learning
223(1)
6.1.4 Reinforcement Learning
223(1)
6.2 Groups of Machine Learning Algorithms
224(5)
6.3 Real-World Datasets
229(4)
6.3.1 House Sale Prices
229(1)
6.3.2 Purchase Preference
230(1)
6.3.3 Twitter Feeds and Article
231(1)
6.3.4 Breast Cancer
231(1)
6.3.5 Market Basket
232(1)
6.3.6 Amazon Food Review
232(1)
6.4 Regression Analysis
233(2)
6.5 Correlation Analysis
235(55)
6.5.1 Linear Regression
238(3)
6.5.2 Simple Linear Regression
241(3)
6.5.3 Multiple Linear Regression
244(3)
6.5.4 Model Diagnostics: Linear Regression
247(14)
6.5.5 Polynomial Regression
261(4)
6.5.6 Logistic Regression
265(1)
6.5.7 Logit Transformation
266(1)
6.5.8 Odds Ratio
267(8)
6.5.9 Model Diagnostics: Logistic Regression
275(10)
6.5.10 Multinomial Logistic Regression
285(4)
6.5.11 Generalized Linear Models
289(1)
6.5.12 Conclusion
290(1)
6.6 Support Vector Machine SVM
290(7)
6.6.1 Linear SVM
292(1)
6.6.2 Binary SVM Classifier
293(2)
6.6.3 Multi-Class SVM
295(2)
6.6.4 Conclusion
297(1)
6.7 Decision Trees
297(33)
6.7.1 Types of Decision Trees
298(2)
6.7.2 Decision Measures
300(2)
6.7.3 Decision Tree Learning Methods
302(19)
6.7.4 Ensemble Trees
321(8)
6.7.5 Conclusion
329(1)
6.8 The Naive Bayes Method
330(7)
6.8.1 Conditional Probability
330(1)
6.8.2 Bayes Theorem
330(1)
6.8.3 Prior Probability
331(1)
6.8.4 Posterior Probability
331(1)
6.8.5 Likelihood and Marginal Likelihood
331(1)
6.8.6 Naive Bayes Methods
332(5)
6.8.7 Conclusion
337(1)
6.9 Cluster Analysis
337(17)
6.9.1 Introduction to Clustering
338(1)
6.9.2 Clustering Algorithms
339(12)
6.9.3 Internal Evaluation
351(2)
6.9.4 External Evaluation
353(1)
6.9.5 Conclusion
354(1)
6.10 Association Rule Mining
354(18)
6.10.1 Introduction to Association Concepts
355(2)
6.10.2 Rule-Mining Algorithms
357(7)
6.10.3 Recommendation Algorithms
364(8)
6.10.4 Conclusion
372(1)
6.11 Artificial Neural Networks
372(24)
6.11.1 Human Cognitive Learning
372(2)
6.11.2 Perceptron
374(3)
6.11.3 Sigmoid Neuron
377(1)
6.11.4 Neural Network Architecture
377(2)
6.11.5 Supervised versus Unsupervised Neural Nets
379(1)
6.11.6 Neural Network Learning Algorithms
380(2)
6.11.7 Feed-Forward Back-Propagation
382(7)
6.11.8 Deep Learning
389(7)
6.11.9 Conclusion
396(1)
6.12 Text-Mining Approaches
396(21)
6.12.1 Introduction to Text Mining
397(1)
6.12.2 Text Summarization
398(2)
6.12.3 TF-IDF
400(2)
6.12.4 Part-of-Speech (POS) Tagging
402(4)
6.12.5 Word Cloud
406(1)
6.12.6 Text Analysis: Microsoft Cognitive Services
407(10)
6.12.7 Conclusion
417(1)
6.13 Online Machine Learning Algorithms
417(5)
6.13.1 Fuzzy C-Means Clustering
419(3)
6.13.2 Conclusion
422(1)
6.14 Model Building Checklist
422(1)
6.15 Summary
423(1)
6.16 References
423(2)
Chapter 7 Machine Learning Model Evaluation
425(40)
7.1 Dataset
426(4)
7.1.1 House Sale Prices
426(2)
7.1.2 Purchase Preference
428(2)
7.2 Introduction to Model Performance and Evaluation
430(1)
7.3 Objectives of Model Performance Evaluation
431(1)
7.4 Population Stability Index
432(5)
7.5 Model Evaluation for Continuous Output
437(8)
7.5.1 Mean Absolute Error
439(2)
7.5.2 Root Mean Square Error
441(1)
7.5.3 R-Square
442(3)
7.6 Model Evaluation for Discrete Output
445(10)
7.6.1 Classification Matrix
446(5)
7.6.2 Sensitivity and Specificity
451(1)
7.6.3 Area Under ROC Curve
452(3)
7.7 Probabilistic Techniques
455(4)
7.7.1 K-Fold Cross Validation
456(2)
7.7.2 Bootstrap Sampling
458(1)
7.8 The Kappa Error Metric
459(4)
7.9 Summary
463(1)
7.10 References
464(1)
Chapter 8 Model Performance Improvement
465(54)
8.1 Machine Learning and Statistical Modeling
466(2)
8.2 Overview of the Caret Package
468(2)
8.3 Introduction to Hyper-Parameters
470(4)
8.4 Hyper-Parameter Optimization
474(14)
8.4.1 Manual Search
475(2)
8.4.2 Manual Grid Search
477(2)
8.4.3 Automatic Grid Search
479(2)
8.4.4 Optimal Search
481(2)
8.4.5 Random Search
483(2)
8.4.6 Custom Searching
485(3)
8.5 The Bias and Variance Tradeoff
488(5)
8.5.1 Bagging or Bootstrap Aggregation
492(1)
8.5.2 Boosting
493(1)
8.6 Introduction to Ensemble Learning
493(5)
8.6.1 Voting Ensembles
494(1)
8.6.2 Advanced Methods in Ensemble Learning
495(3)
8.7 Ensemble Techniques Illustration in R
498(13)
8.7.1 Bagging Trees
498(2)
8.7.2 Gradient Boosting with a Decision Tree
500(5)
8.7.3 Blending KNN and Rpart
505(1)
8.7.4 Stacking Using caretEnsemble
506(5)
8.8 Advanced Topic: Bayesian Optimization of Machine Learning Models
511(5)
8.9 Summary
516(1)
8.10 References
517(2)
Chapter 9 Scalable Machine Learning and Related Technologies
519(36)
9.1 Distributed Processing and Storage
520(6)
9.1.1 Google File System (GFS)
520(2)
9.1.2 MapReduce
522(1)
9.1.3 Parallel Execution in R
523(3)
9.2 The Hadoop Ecosystem
526(15)
9.2.1 MapReduce
527(4)
9.2.2 Hive
531(4)
9.2.3 Apache Pig
535(3)
9.2.4 HBase
538(2)
9.2.5 Spark
540(1)
9.3 Machine Learning in R with Spark
541(5)
9.3.1 Setting the Environment Variable
542(1)
9.3.2 Initializing the Spark Session
542(1)
9.3.3 Loading Data and Running the Pre-Process
542(1)
9.3.4 Creating SparkDataFrame
543(1)
9.3.5 Building the ML Model
544(1)
9.3.6 Predicting the Test Data
545(1)
9.3.7 Stopping the SparkR Session
546(1)
9.4 Machine Learning in R with H2O
546(7)
9.4.1 Installation of Packages
547(1)
9.4.2 Initialization of H2O Clusters
547(1)
9.4.3 Deep Learning Demo in R with H2O
548(5)
9.5 Summary
553(1)
9.6 References
554(1)
Index 555
Karthik Ramasubramanian works for one of the largest and fastest-growing technology unicorns in India, Hike Messenger. In his seven years of research and industry experience, he has worked on cross-industry data science problems in retail, e-commerce, and technology, developing and prototyping data-driven solutions. In his previous role at Snapdeal, one of the largest e-commerce retailers in India, he led core statistical modelling initiatives for customer growth and pricing analytics. Prior to Snapdeal, he was part of the central database team, managing the data warehouses for the global business applications of Reckitt Benckiser (RB). He holds a Master's in Theoretical Computer Science from PSG College of Technology, Anna University, and is a certified big data professional. He is passionate about teaching and mentoring future data scientists through different online and public forums. He also enjoys writing poems in his spare time and is an avid traveler.



Abhishek Singh is based in Ireland as a Data Scientist on the Advanced Data Science team at Prudential Financial Inc. He has five years of professional and academic experience in the data science field. At Deloitte Advisory, he led risk analytics initiatives for top US banks in their regulatory risk, credit risk, and balance sheet modelling requirements. In his current role, he works on scalable machine learning algorithms for the Individual Life Insurance branch of Prudential. He was also a trainer at Deloitte Professional University, leading development initiatives for professionals in the areas of statistics, economics, financial risk, and data science tools (SAS and R). Abhishek holds a B.Tech. in Mathematics and Computing from the Indian Institute of Technology, Guwahati, and an MBA from the Indian Institute of Management, Bangalore. He speaks at public events on data science and works with leading universities to bring data science skills to graduates. He also holds a Post Graduate Diploma in Cyber Law from NALSAR University. He enjoys cooking and photography in his free hours.