
Pro Machine Learning Algorithms: A Hands-On Approach to Implementing Algorithms in Python and R 1st ed. [Paperback]

  • Format: Paperback / softback, 372 pages, height x width: 254x178 mm, weight: 7409 g, 359 illustrations, black and white; XXI, 372 p. 359 illus., 1 Paperback / softback
  • Publication date: 01-Jul-2018
  • Publisher: Apress
  • ISBN-10: 1484235630
  • ISBN-13: 9781484235638
  • Paperback
  • Price: 62,59 €*
  • * the price is final, i.e. no further discounts apply
  • Regular price: 73,64 €
  • You save 15%
  • Delivery from the publisher takes approximately 2-4 weeks
Bridge the gap between a high-level understanding of how an algorithm works and knowing the nuts and bolts needed to tune your models. This book will give you the confidence and skills needed to develop all the major machine learning models. In Pro Machine Learning Algorithms, you will first develop each algorithm in Excel so that you gain a practical understanding of all the levers that can be tuned in a model, before implementing the models in Python/R.

You will cover all the major algorithms of supervised and unsupervised learning, including linear/logistic regression, k-means clustering, PCA, recommender systems, decision trees, random forests, GBM, and neural networks. You will also be exposed to the latest in deep learning through CNNs, RNNs, and word2vec for text mining. You will learn not only the algorithms but also the concepts of feature engineering needed to maximize the performance of a model. The theory is paired with case studies, such as sentiment classification, fraud detection, recommender systems, and image recognition, so that you get the best of both theory and practice for the vast majority of the machine learning algorithms used in industry. Along the way, you will also be exposed to running machine learning models on all the major cloud service providers.
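By way of illustration of the from-scratch approach described above (this sketch is not taken from the book; the function name and toy data are hypothetical), the closed-form least-squares fit behind simple linear regression, the first algorithm covered, can be written in a few lines of Python:

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Closed-form least-squares estimates for y = a + b*x.

    Illustrative sketch mirroring the 'build it by hand first'
    approach the book advocates (e.g. in Excel) before using
    library implementations in Python/R.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Slope: covariance of x and y divided by variance of x.
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    # Intercept: chosen so the line passes through the means.
    a = y.mean() - b * x.mean()
    return a, b

# Perfectly linear toy data generated from y = 1 + 2*x.
a, b = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # intercept 1.0, slope 2.0
```

Seeing the slope and intercept emerge from these two formulas makes it easier to interpret the coefficient tables that R's `lm` or Python's statsmodels produce later.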

You are expected to have only minimal knowledge of statistics and software programming, and by the end of this book you should be able to work on a machine learning project with confidence.

What You Will Learn
  • Get an in-depth understanding of all the major machine learning and deep learning algorithms 
  • Fully appreciate the pitfalls to avoid while building models
  • Implement machine learning algorithms in the cloud 
  • Follow a hands-on approach through case studies for each algorithm
  • Gain the tricks of ensemble learning to build more accurate models
  • Discover the basics of programming in R/Python and the Keras framework for deep learning
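In the same hands-on spirit, here is a minimal from-scratch sketch of k-means clustering (one of the unsupervised algorithms the book covers). This is an illustrative implementation of Lloyd's algorithm with hypothetical names and toy data, not the book's own code:

```python
import numpy as np

def kmeans(points, k, iters=10, seed=0):
    """Minimal Lloyd's-algorithm sketch: alternate between assigning
    points to their nearest center and recomputing each center as the
    mean of its assigned points."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Initialize centers as k distinct points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Distance of every point to every center, shape (n, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its cluster; keep the old
        # center if a cluster happens to be empty.
        centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return centers, labels

# Two obvious clusters around (0, 0.5) and (10, 10.5).
pts = [[0, 0], [0, 1], [10, 10], [10, 11]]
centers, labels = kmeans(pts, k=2)
print(sorted(tuple(c) for c in centers.tolist()))
# converges to centers (0.0, 0.5) and (10.0, 10.5)
```

With the mechanics visible like this, the "levers" the book discusses for k-means, such as the choice of k and the initialization, become concrete parameters rather than black-box settings.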
Who This Book Is For

Business analysts and IT professionals who want to transition into data science roles, and data scientists who want to solidify their knowledge of machine learning.



About the Author xv
About the Technical Reviewer xvii
Acknowledgments xix
Introduction xxi
Chapter 1 Basics of Machine Learning 1(16)
Regression and Classification 1(10)
Training and Testing Data 2(1)
The Need for Validation Dataset 3(2)
Measures of Accuracy 5(2)
AUC Value and ROC Curve 7(4)
Unsupervised Learning 11(1)
Typical Approach Towards Building a Model 12(3)
Where Is the Data Fetched From? 12(1)
Which Data Needs to Be Fetched? 12(1)
Pre-processing the Data 13(1)
Feature Interaction 14(1)
Feature Generation 14(1)
Building the Models 14(1)
Productionalizing the Models 14(1)
Build, Deploy, Test, and Iterate 15(1)
Summary 15(2)
Chapter 2 Linear Regression 17(32)
Introducing Linear Regression 17(1)
Variables: Dependent and Independent 18(1)
Correlation 18(1)
Causation 18(1)
Simple vs. Multivariate Linear Regression 18(1)
Formalizing Simple Linear Regression 19(1)
The Bias Term 19(1)
The Slope 20(1)
Solving a Simple Linear Regression 20(3)
More General Way of Solving a Simple Linear Regression 23(2)
Minimizing the Overall Sum of Squared Error 23(1)
Solving the Formula 24(1)
Working Details of Simple Linear Regression 25(5)
Complicating Simple Linear Regression a Little 26(3)
Arriving at Optimal Coefficient Values 29(1)
Introducing Root Mean Squared Error 29(1)
Running a Simple Linear Regression in R 30(6)
Residuals 31(1)
Coefficients 32(2)
SSE of Residuals (Residual Deviance) 34(1)
Null Deviance 34(1)
R Squared 34(1)
F-statistic 35(1)
Running a Simple Linear Regression in Python 36(1)
Common Pitfalls of Simple Linear Regression 37(1)
Multivariate Linear Regression 38(7)
Working Details of Multivariate Linear Regression 40(1)
Multivariate Linear Regression in R 41(1)
Multivariate Linear Regression in Python 42(1)
Issue of Having a Non-significant Variable in the Model 42(1)
Issue of Multicollinearity 43(1)
Mathematical Intuition of Multicollinearity 43(1)
Further Points to Consider in Multivariate Linear Regression 44(1)
Assumptions of Linear Regression 45(2)
Summary 47(2)
Chapter 3 Logistic Regression 49(22)
Why Does Linear Regression Fail for Discrete Outcomes? 49(2)
A More General Solution: Sigmoid Curve 51(8)
Formalizing the Sigmoid Curve (Sigmoid Activation) 52(1)
From Sigmoid Curve to Logistic Regression 53(1)
Interpreting the Logistic Regression 53(1)
Working Details of Logistic Regression 54(2)
Estimating Error 56(1)
Least Squares Method and Assumption of Linearity 57(2)
Running a Logistic Regression in R 59(2)
Running a Logistic Regression in Python 61(1)
Identifying the Measure of Interest 61(7)
Common Pitfalls 68(1)
Time Between Prediction and the Event Happening 69(1)
Outliers in Independent Variables 69(1)
Summary 69(2)
Chapter 4 Decision Tree 71(34)
Components of a Decision Tree 73(1)
Classification Decision Tree When There Are Multiple Discrete Independent Variables 74(11)
Information Gain 75(1)
Calculating Uncertainty: Entropy 75(1)
Calculating Information Gain 76(1)
Uncertainty in the Original Dataset 76(1)
Measuring the Improvement in Uncertainty 77(2)
Which Distinct Values Go to the Left and Right Nodes 79(5)
When Does the Splitting Process Stop? 84(1)
Classification Decision Tree for Continuous Independent Variables 85(3)
Classification Decision Tree When There Are Multiple Independent Variables 88(5)
Classification Decision Tree When There Are Continuous and Discrete Independent Variables 93(1)
What If the Response Variable Is Continuous? 94(5)
Continuous Dependent Variable and Multiple Continuous Independent Variables 95(2)
Continuous Dependent Variable and Discrete Independent Variable 97(1)
Continuous Dependent Variable and Discrete, Continuous Independent Variables 98(1)
Implementing a Decision Tree in R 99(1)
Implementing a Decision Tree in Python 99(1)
Common Techniques in Tree Building 100(1)
Visualizing a Tree Build 101(1)
Impact of Outliers on Decision Trees 102(1)
Summary 103(2)
Chapter 5 Random Forest 105(12)
A Random Forest Scenario 105(3)
Bagging 107(1)
Working Details of a Random Forest 107(1)
Implementing a Random Forest in R 108(8)
Parameters to Tune in a Random Forest 112(2)
Variation of AUC by Depth of Tree 114(2)
Implementing a Random Forest in Python 116(1)
Summary 116(1)
Chapter 6 Gradient Boosting Machine 117(18)
Gradient Boosting Machine 117(1)
Working Details of GBM 118(5)
Shrinkage 123(3)
AdaBoost 126(6)
Theory of AdaBoost 126(1)
Working Details of AdaBoost 127(5)
Additional Functionality for GBM 132(1)
Implementing GBM in Python 132(1)
Implementing GBM in R 133(1)
Summary 134(1)
Chapter 7 Artificial Neural Network 135(32)
Structure of a Neural Network 136(2)
Working Details of Training a Neural Network 138(14)
Forward Propagation 138(3)
Applying the Activation Function 141(5)
Back Propagation 146(1)
Working Out Back Propagation 146(2)
Stochastic Gradient Descent 148(1)
Diving Deep into Gradient Descent 148(4)
Why Have a Learning Rate? 152(1)
Batch Training 152(3)
The Concept of Softmax 153(2)
Different Loss Optimization Functions 155(2)
Scaling a Dataset 156(1)
Implementing Neural Network in Python 157(3)
Avoiding Over-fitting Using Regularization 160(2)
Assigning Weightage to Regularization Term 162(1)
Implementing Neural Network in R 163(2)
Summary 165(2)
Chapter 8 Word2vec 167(12)
Hand-Building a Word Vector 168(5)
Methods of Building a Word Vector 173(1)
Issues to Watch For in a Word2vec Model 174(1)
Frequent Words 174(1)
Negative Sampling 175(1)
Implementing Word2vec in Python 175(3)
Summary 178(1)
Chapter 9 Convolutional Neural Network 179(38)
The Problem with Traditional NN 180(7)
Scenario 1 183(1)
Scenario 2 184(1)
Scenario 3 185(1)
Scenario 4 186(1)
Understanding the Convolutional in CNN 187(3)
From Convolution to Activation 189(1)
From Convolution Activation to Pooling 189(1)
How Do Convolution and Pooling Help? 190(1)
Creating CNNs with Code 190(4)
Working Details of CNN 194(9)
Deep Diving into Convolutions/Kernels 203(2)
From Convolution and Pooling to Flattening: Fully Connected Layer 205(1)
From One Fully Connected Layer to Another 206(1)
From Fully Connected Layer to Output Layer 206(1)
Connecting the Dots: Feed Forward Network 206(1)
Other Details of CNN 207(2)
Backward Propagation in CNN 209(1)
Putting It All Together 210(2)
Data Augmentation 212(2)
Implementing CNN in R 214(1)
Summary 215(2)
Chapter 10 Recurrent Neural Network 217(42)
Understanding the Architecture 218(1)
Interpreting an RNN 219(1)
Working Details of RNN 220(7)
Time Step 1 224(1)
Time Step 2 224(1)
Time Step 3 225(2)
Implementing RNN: SimpleRNN 227(7)
Compiling a Model 228(2)
Verifying the Output of RNN 230(4)
Implementing RNN: Text Generation 234(4)
Embedding Layer in RNN 238(5)
Issues with Traditional RNN 243(2)
The Problem of Vanishing Gradient 244(1)
The Problem of Exploding Gradients 245(1)
LSTM 245(2)
Implementing Basic LSTM in Keras 247(8)
Implementing LSTM for Sentiment Classification 255(1)
Implementing RNN in R 256(1)
Summary 257(2)
Chapter 11 Clustering 259(24)
Intuition of Clustering 259(5)
Building Store Clusters for Performance Comparison 260(1)
Ideal Clustering 261(1)
Striking a Balance Between No Clustering and Too Much Clustering: K-means Clustering 262(2)
The Process of Clustering 264(4)
Working Details of K-means Clustering Algorithm 268(6)
Applying the K-means Algorithm on a Dataset 269(2)
Properties of the K-means Clustering Algorithm 271(3)
Implementing K-means Clustering in R 274(1)
Implementing K-means Clustering in Python 275(1)
Significance of the Major Metrics 276(1)
Identifying the Optimal K 276(2)
Top-Down vs. Bottom-Up Clustering 278(2)
Hierarchical Clustering 278(2)
Major Drawback of Hierarchical Clustering 280(1)
Industry Use-Case of K-means Clustering 280(1)
Summary 281(2)
Chapter 12 Principal Component Analysis 283(16)
Intuition of PCA 283(3)
Working Details of PCA 286(5)
Scaling Data in PCA 291(1)
Extending PCA to Multiple Variables 291(3)
Implementing PCA in R 294(1)
Implementing PCA in Python 295(1)
Applying PCA to MNIST 296(1)
Summary 297(2)
Chapter 13 Recommender Systems 299(28)
Understanding k-nearest Neighbors 300(2)
Working Details of User-Based Collaborative Filtering 302(10)
Euclidean Distance 303(3)
Cosine Similarity 306(5)
Issues with UBCF 311(1)
Item-Based Collaborative Filtering 312(1)
Implementing Collaborative Filtering in R 313(1)
Implementing Collaborative Filtering in Python 314(1)
Working Details of Matrix Factorization 315(6)
Implementing Matrix Factorization in Python 321(3)
Implementing Matrix Factorization in R 324(1)
Summary 325(2)
Chapter 14 Implementing Algorithms in the Cloud 327(18)
Google Cloud Platform 327(4)
Microsoft Azure Cloud Platform 331(2)
Amazon Web Services 333(7)
Transferring Files to the Cloud Instance 340(2)
Running Instance Jupyter Notebooks from Your Local Machine 342(1)
Installing R on the Instance 343(1)
Summary 344(1)
Appendix: Basics of Excel, R, and Python 345(20)
Basics of Excel 345(2)
Basics of R 347(9)
Downloading R 348(1)
Installing and Configuring RStudio 348(1)
Getting Started with RStudio 349(7)
Basics of Python 356(9)
Downloading and Installing Python 356(2)
Basic Operations in Python 358(2)
Numpy 360(1)
Number Generation Using Numpy 361(1)
Slicing and Indexing 362(1)
Pandas 363(1)
Indexing and Slicing Using Pandas 363(1)
Summarizing Data 364(1)
Index 365
V Kishore Ayyadevara currently leads retail analytics consulting at a start-up. He received his MBA from IIM Calcutta and subsequently worked for American Express in risk management and in Amazon's supply chain analytics teams. He is passionate about leveraging data to make informed decisions, faster and more accurately. Kishore's interests include identifying business problems that can be solved using data, simplifying the complexity within data science, and applying data science to achieve quantifiable business results.