Preface  vii

1 Introduction  1
  Why Machine Learning?  1
    Problems Machine Learning Can Solve  2
    Knowing Your Task and Knowing Your Data  4
  Why Python?  5
  scikit-learn  5
    Installing scikit-learn  6
  Essential Libraries and Tools  7
    Jupyter Notebook  7
    NumPy  7
    SciPy  8
    matplotlib  9
    pandas  10
    mglearn  11
  Python 2 Versus Python 3  12
  Versions Used in this Book  12
  A First Application: Classifying Iris Species  13
    Meet the Data  14
    Measuring Success: Training and Testing Data  17
    First Things First: Look at Your Data  19
    Building Your First Model: k-Nearest Neighbors  20
    Making Predictions  22
    Evaluating the Model  22
  Summary and Outlook  23
|
|
2 Supervised Learning  25
  Classification and Regression  25
  Generalization, Overfitting, and Underfitting  26
    Relation of Model Complexity to Dataset Size  29
  Supervised Machine Learning Algorithms  29
    Some Sample Datasets  30
    k-Nearest Neighbors  35
    Linear Models  45
    Naive Bayes Classifiers  68
    Decision Trees  70
    Ensembles of Decision Trees  83
    Kernelized Support Vector Machines  92
    Neural Networks (Deep Learning)  104
  Uncertainty Estimates from Classifiers  119
    The Decision Function  120
    Predicting Probabilities  122
    Uncertainty in Multiclass Classification  124
  Summary and Outlook  127
|
3 Unsupervised Learning and Preprocessing  131
  Types of Unsupervised Learning  131
  Challenges in Unsupervised Learning  132
  Preprocessing and Scaling  132
    Different Kinds of Preprocessing  133
    Applying Data Transformations  134
    Scaling Training and Test Data the Same Way  136
    The Effect of Preprocessing on Supervised Learning  138
  Dimensionality Reduction, Feature Extraction, and Manifold Learning  140
    Principal Component Analysis (PCA)  140
    Non-Negative Matrix Factorization (NMF)  156
    Manifold Learning with t-SNE  163
  Clustering  168
    k-Means Clustering  168
    Agglomerative Clustering  182
    DBSCAN  187
    Comparing and Evaluating Clustering Algorithms  191
    Summary of Clustering Methods  207
  Summary and Outlook  208
|
4 Representing Data and Engineering Features  211
  Categorical Variables  212
    One-Hot-Encoding (Dummy Variables)  213
    Numbers Can Encode Categoricals  218
  Binning, Discretization, Linear Models, and Trees  220
  Interactions and Polynomials  224
  Univariate Nonlinear Transformations  232
  Automatic Feature Selection  236
    Univariate Statistics  236
    Model-Based Feature Selection  238
    Iterative Feature Selection  240
  Utilizing Expert Knowledge  242
  Summary and Outlook  250
|
5 Model Evaluation and Improvement  251
  Cross-Validation  252
    Cross-Validation in scikit-learn  253
    Benefits of Cross-Validation  254
    Stratified k-Fold Cross-Validation and Other Strategies  254
  Grid Search  260
    Simple Grid Search  261
    The Danger of Overfitting the Parameters and the Validation Set  261
    Grid Search with Cross-Validation  263
  Evaluation Metrics and Scoring  275
    Keep the End Goal in Mind  275
    Metrics for Binary Classification  276
    Metrics for Multiclass Classification  296
    Regression Metrics  299
    Using Evaluation Metrics in Model Selection  300
  Summary and Outlook  302
|
6 Algorithm Chains and Pipelines  305
  Parameter Selection with Preprocessing  306
  Building Pipelines  308
    Using Pipelines in Grid Searches  309
  The General Pipeline Interface  312
    Convenient Pipeline Creation with make_pipeline  313
    Accessing Step Attributes  314
    Accessing Attributes in a Grid-Searched Pipeline  315
  Grid-Searching Preprocessing Steps and Model Parameters  317
  Grid-Searching Which Model To Use  319
  Summary and Outlook  320
|
|
7 Working with Text Data  323
  Types of Data Represented as Strings  323
  Example Application: Sentiment Analysis of Movie Reviews  325
  Representing Text Data as a Bag of Words  327
    Applying Bag-of-Words to a Toy Dataset  329
    Bag-of-Words for Movie Reviews  330
  Stopwords  334
  Rescaling the Data with tf-idf  336
  Investigating Model Coefficients  338
  Bag-of-Words with More Than One Word (n-Grams)  339
  Advanced Tokenization, Stemming, and Lemmatization  344
  Topic Modeling and Document Clustering  347
    Latent Dirichlet Allocation  348
  Summary and Outlook  355
|
|
8 Wrapping Up  357
  Approaching a Machine Learning Problem  357
    Humans in the Loop  358
  From Prototype to Production  359
  Testing Production Systems  359
  Building Your Own Estimator  360
  Where to Go from Here  361
    Theory  361
    Other Machine Learning Frameworks and Packages  362
    Ranking, Recommender Systems, and Other Kinds of Learning  363
    Probabilistic Modeling, Inference, and Probabilistic Programming  363
    Neural Networks  364
    Scaling to Larger Datasets  364
    Honing Your Skills  365
  Conclusion  366

Index  367