Preface |
|
xi | |
Automated Machine Learning (AutoML) |
|
xii | |
A Note to Instructors |
|
xii | |
Acknowledgments |
|
xiii | |
Book Outline |
|
xiii | |
Dataset Download |
|
xvii | |
Copyrights |
|
xvii | |
|
SECTION I WHY USE AUTOMATED MACHINE LEARNING? |
|
|
|
1 What Is Machine Learning? |
|
|
3 | (8) |
|
|
3 | (1) |
|
1.2 Machine Learning Is Everywhere |
|
|
4 | (2) |
|
1.3 What Is Machine Learning? |
|
|
6 | (2) |
|
1.4 Data for Machine Learning |
|
|
8 | (2) |
|
|
10 | (1) |
|
2 Automating Machine Learning |
|
|
11 | (14) |
|
2.1 What Is Automated Machine Learning? |
|
|
12 | (2) |
|
2.2 What Automated Machine Learning Is Not |
|
|
14 | (1) |
|
2.3 Available Tools and Platforms |
|
|
15 | (2) |
|
2.4 Eight Criteria for AutoML Excellence |
|
|
17 | (3) |
|
2.5 How Do the Fundamental Principles of Machine Learning and Artificial Intelligence Transfer to AutoML? A Point-by-Point Evaluation |
|
|
20 | (1) |
|
|
21 | (4) |
|
SECTION II DEFINING PROJECT OBJECTIVES |
|
|
|
3 Specify Business Problem |
|
|
25 | (6) |
|
3.1 Why Start with a Business Problem? |
|
|
25 | (1) |
|
|
26 | (3) |
|
|
29 | (2) |
|
4 Acquire Subject Matter Expertise |
|
|
31 | (2) |
|
4.1 Importance of Subject Matter Expertise |
|
|
31 | (1) |
|
|
32 | (1) |
|
5 Define Prediction Target |
|
|
33 | (4) |
|
5.1 What Is a Prediction Target? |
|
|
33 | (2) |
|
5.2 How Is the Target Important for Machine Learning? |
|
|
35 | (1) |
|
|
36 | (1) |
|
6 Decide On Unit Of Analysis |
|
|
37 | (3) |
|
6.1 What Is a Unit of Analysis? |
|
|
37 | (1) |
|
6.2 How to Determine Unit of Analysis |
|
|
38 | (1) |
|
|
39 | (1) |
|
7 Success, Risk, And Continuation |
|
|
40 | (11) |
|
7.1 Identify Success Criteria |
|
|
40 | (1) |
|
|
41 | (3) |
|
7.3 Decide Whether to Continue |
|
|
44 | (1) |
|
|
45 | (6) |
|
SECTION III ACQUIRE AND INTEGRATE DATA |
|
|
|
8 Accessing And Storing Data |
|
|
51 | (8) |
|
8.1 Track Down Relevant Data |
|
|
51 | (3) |
|
8.2 Examine Data and Remove Columns |
|
|
54 | (1) |
|
|
55 | (3) |
|
|
58 | (1) |
|
|
59 | (11) |
|
|
60 | (9) |
|
|
69 | (1) |
|
|
70 | (10) |
|
10.1 Splitting and Extracting New Columns |
|
|
70 | (8) |
|
10.1.1 IF-THEN Statements and One-hot Encoding |
|
|
70 | (2) |
|
10.1.2 Regular Expressions (RegEx) |
|
|
72 | (6) |
|
|
78 | (1) |
|
|
79 | (1) |
|
|
80 | (8) |
|
|
80 | (4) |
|
|
84 | (3) |
|
|
87 | (1) |
|
12 Data Reduction And Splitting |
|
|
88 | (9) |
|
|
88 | (3) |
|
|
91 | (1) |
|
|
92 | (2) |
|
|
94 | (3) |
|
|
|
|
97 | (6) |
|
|
97 | (5) |
|
|
102 | (1) |
|
14 Feature Understanding And Selection |
|
|
103 | (11) |
|
14.1 Descriptive Statistics |
|
|
103 | (4) |
|
|
107 | (3) |
|
14.3 Evaluations of Feature Content |
|
|
110 | (2) |
|
|
112 | (1) |
|
|
113 | (1) |
|
15 Build Candidate Models |
|
|
114 | (20) |
|
15.1 Starting the Process |
|
|
114 | (2) |
|
|
116 | (5) |
|
15.3 Starting the Analytical Process |
|
|
121 | (6) |
|
15.4 Model Selection Process |
|
|
127 | (6) |
|
15.4.1 Tournament Round 1: 32% Sample |
|
|
128 | (3) |
|
15.4.2 Tournament Round 2: 64% Sample |
|
|
131 | (1) |
|
15.4.3 Tournament Round 3: Cross Validation |
|
|
131 | (1) |
|
15.4.4 Tournament Round 4: Blending |
|
|
132 | (1) |
|
|
133 | (1) |
|
16 Understanding The Process |
|
|
134 | (23) |
|
16.1 Learning Curves and Speed |
|
|
134 | (4) |
|
|
138 | (1) |
|
|
139 | (15) |
|
16.3.1 Numeric Data Cleansing (Imputation) |
|
|
140 | (2) |
|
|
142 | (1) |
|
|
143 | (4) |
|
16.3.4 Ordinal Encoding |
|
|
147 | (2) |
|
16.3.5 Matrix of Word-gram Occurrences |
|
|
149 | (2) |
|
|
151 | (3) |
|
16.4 Hyperparameter Optimization (Advanced Content) |
|
|
154 | (2) |
|
|
156 | (1) |
|
17 Evaluate Model Performance |
|
|
157 | (23) |
|
|
157 | (2) |
|
17.2 A Sample Algorithm and Model |
|
|
159 | (5) |
|
|
164 | (12) |
|
17.4 Using the Lift Chart and Profit Curve for Business Decisions |
|
|
176 | (3) |
|
|
179 | (1) |
|
|
180 | (11) |
|
|
180 | (5) |
|
18.2 Prioritizing Modeling Criteria and Selecting a Model |
|
|
185 | (2) |
|
|
187 | (4) |
|
SECTION V INTERPRET AND COMMUNICATE |
|
|
|
|
191 | (15) |
|
19.1 Feature Impacts on Target |
|
|
191 | (1) |
|
19.2 The Overall Impact of Features on the Target without Consideration of Other Features |
|
|
192 | (1) |
|
19.3 The Overall Impact of a Feature Adjusted for the Impact of Other Features |
|
|
193 | (1) |
|
19.4 The Directional Impact of Features on Target |
|
|
194 | (1) |
|
19.5 The Partial Impact of Features on Target |
|
|
195 | (3) |
|
19.6 The Power of Language |
|
|
198 | (3) |
|
|
201 | (2) |
|
19.8 Prediction Explanations |
|
|
203 | (2) |
|
|
205 | (1) |
|
20 Communicate Model Insights |
|
|
206 | (15) |
|
|
207 | (2) |
|
20.2 Business Problem First |
|
|
209 | (1) |
|
20.3 Pre-processing and Model Quality Metrics |
|
|
210 | (3) |
|
20.4 Areas Where the Model Struggles |
|
|
213 | (1) |
|
20.5 Most Predictive Features |
|
|
214 | (1) |
|
20.6 Not All Features Are Created Equal |
|
|
214 | (3) |
|
20.7 Recommended Business Actions |
|
|
217 | (1) |
|
|
218 | (3) |
|
SECTION VI IMPLEMENT, DOCUMENT AND MAINTAIN |
|
|
|
21 Set Up Prediction System |
|
|
221 | (7) |
|
|
221 | (1) |
|
21.2 Choose Deployment Strategy |
|
|
222 | (5) |
|
|
227 | (1) |
|
22 Document Modeling Process For Reproducibility |
|
|
228 | (2) |
|
|
228 | (1) |
|
|
229 | (1) |
|
23 Create Model Monitoring And Maintenance Plan |
|
|
230 | (3) |
|
|
230 | (1) |
|
|
230 | (2) |
|
|
232 | (1) |
|
24 Seven Types Of Target Leakage In Machine Learning And An Exercise |
|
|
233 | (7) |
|
24.1 Types of Target Leakage |
|
|
233 | (3) |
|
24.2 A Hands-on Exercise in Detecting Target Leakage |
|
|
236 | (3) |
|
|
239 | (1) |
|
|
240 | (19) |
|
25.1 An Example of Time-Aware Modeling |
|
|
240 | (18) |
|
|
240 | (1) |
|
|
241 | (1) |
|
25.1.3 Initialize Analysis |
|
|
241 | (1) |
|
25.1.4 Time-Aware Modeling Background |
|
|
241 | (3) |
|
|
244 | (3) |
|
25.1.6 Model Building and Residuals |
|
|
247 | (1) |
|
|
247 | (2) |
|
25.1.8 Selecting and Examining a Model |
|
|
249 | (4) |
|
25.1.9 A Small Detour into Residuals |
|
|
253 | (3) |
|
|
256 | (1) |
|
25.1.11 Learning about Avocado Price Drivers |
|
|
256 | (2) |
|
|
258 | (1) |
|
|
259 | (18) |
|
26.1 The Assumptions of Time-Series Machine Learning |
|
|
259 | (1) |
|
26.2 A Hands-on Exercise in Time-Series Analysis |
|
|
260 | (15) |
|
|
260 | (2) |
|
|
262 | (1) |
|
26.2.3 Specify Time Unit and Generate Features |
|
|
262 | (6) |
|
26.2.3 Examine Candidate Models |
|
|
268 | (2) |
|
26.2.4 Digging into the Preferred Model |
|
|
270 | (3) |
|
|
273 | (2) |
|
|
275 | (2) |
|
|
277 | (31) |
|
A.1 Diabetes Patients Readmissions |
|
|
277 | (3) |
|
|
277 | (1) |
|
|
277 | (1) |
|
|
277 | (3) |
|
|
280 | (1) |
|
|
280 | (1) |
|
|
280 | (3) |
|
|
280 | (1) |
|
|
281 | (1) |
|
|
281 | (2) |
|
|
283 | (1) |
|
|
283 | (4) |
|
|
283 | (1) |
|
|
284 | (1) |
|
|
284 | (3) |
|
|
287 | (1) |
|
|
287 | (2) |
|
|
287 | (1) |
|
|
287 | (1) |
|
|
287 | (1) |
|
|
288 | (1) |
|
|
289 | (1) |
|
A.5 Student Grades Portuguese |
|
|
289 | (4) |
|
|
289 | (1) |
|
|
289 | (1) |
|
|
289 | (1) |
|
|
290 | (3) |
|
|
293 | (1) |
|
|
293 | (7) |
|
|
293 | (1) |
|
|
294 | (1) |
|
|
294 | (6) |
|
|
300 | (1) |
|
A.7 College Starting Salaries |
|
|
300 | (1) |
|
|
300 | (1) |
|
|
300 | (1) |
|
|
300 | (1) |
|
|
301 | (1) |
|
|
301 | (1) |
|
|
301 | (4) |
|
|
301 | (1) |
|
|
302 | (1) |
|
|
302 | (2) |
|
|
304 | (1) |
|
|
305 | (1) |
|
A.9 Avocadopocalypse Now? |
|
|
305 | (3) |
|
|
305 | (1) |
|
|
306 | (1) |
|
|
306 | (1) |
|
|
307 | (1) |
|
|
307 | (1) |
|
Appendix B Optimization and Sorting Measures |
|
|
308 | (3) |
|
Appendix C More on Cross Validation |
|
|
311 | (4) |
References |
|
315 | (4) |
Index |
|
319 | |