Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications [Paperback]

4.47/5 (1,691 ratings on Goodreads)
  • Format: Paperback / softback, 360 pages, height x width: 233x178 mm
  • Publication date: 31 May 2022
  • Publisher: O'Reilly Media
  • ISBN-10: 1098107969
  • ISBN-13: 9781098107963

Machine learning systems are both complex and unique. Complex because they consist of many different components and involve many different stakeholders. Unique because they're data dependent, with data varying wildly from one use case to the next. In this book, you'll learn a holistic approach to designing ML systems that are reliable, scalable, maintainable, and adaptive to changing environments and business requirements.

Author Chip Huyen, co-founder of Claypot AI, considers each design decision (such as how to process and create training data, which features to use, how often to retrain models, and what to monitor) in the context of how it can help your system as a whole achieve its objectives. The iterative framework in this book uses actual case studies backed by ample references.

This book will help you tackle scenarios such as:

  • Engineering data and choosing the right metrics to solve a business problem
  • Automating the process for continually developing, evaluating, deploying, and updating models
  • Developing a monitoring system to quickly detect and address issues your models might encounter in production
  • Architecting an ML platform that serves across use cases
  • Developing responsible ML systems

Preface ix
1 Overview of Machine Learning Systems 1
  When to Use Machine Learning 3
  Machine Learning Use Cases 9
  Understanding Machine Learning Systems 12
  Machine Learning in Research Versus in Production 12
  Machine Learning Systems Versus Traditional Software 22
  Summary 23
2 Introduction to Machine Learning Systems Design 25
  Business and ML Objectives 26
  Requirements for ML Systems 29
  Reliability 29
  Scalability 30
  Maintainability 31
  Adaptability 31
  Iterative Process 32
  Framing ML Problems 35
  Types of ML Tasks 36
  Objective Functions 40
  Mind Versus Data 43
  Summary 46
3 Data Engineering Fundamentals 49
  Data Sources 50
  Data Formats 53
  JSON 54
  Row-Major Versus Column-Major Format 54
  Text Versus Binary Format 57
  Data Models 58
  Relational Model 59
  NoSQL 63
  Structured Versus Unstructured Data 66
  Data Storage Engines and Processing 67
  Transactional and Analytical Processing 67
  ETL: Extract, Transform, and Load 70
  Modes of Dataflow 72
  Data Passing Through Databases 72
  Data Passing Through Services 73
  Data Passing Through Real-Time Transport 74
  Batch Processing Versus Stream Processing 78
  Summary 79
4 Training Data 81
  Sampling 82
  Nonprobability Sampling 83
  Simple Random Sampling 84
  Stratified Sampling 84
  Weighted Sampling 85
  Reservoir Sampling 86
  Importance Sampling 87
  Labeling 88
  Hand Labels 88
  Natural Labels 91
  Handling the Lack of Labels 94
  Class Imbalance 102
  Challenges of Class Imbalance 103
  Handling Class Imbalance 105
  Data Augmentation 113
  Simple Label-Preserving Transformations 114
  Perturbation 114
  Data Synthesis 116
  Summary 118
5 Feature Engineering 119
  Learned Features Versus Engineered Features 120
  Common Feature Engineering Operations 123
  Handling Missing Values 123
  Scaling 126
  Discretization 128
  Encoding Categorical Features 129
  Feature Crossing 132
  Discrete and Continuous Positional Embeddings 133
  Data Leakage 135
  Common Causes for Data Leakage 137
  Detecting Data Leakage 140
  Engineering Good Features 141
  Feature Importance 142
  Feature Generalization 144
  Summary 146
6 Model Development and Offline Evaluation 149
  Model Development and Training 150
  Evaluating ML Models 150
  Ensembles 156
  Experiment Tracking and Versioning 162
  Distributed Training 168
  AutoML 172
  Model Offline Evaluation 178
  Baselines 179
  Evaluation Methods 181
  Summary 188
7 Model Deployment and Prediction Service 191
  Machine Learning Deployment Myths 194
  Myth 1: You Only Deploy One or Two ML Models at a Time 194
  Myth 2: If We Don't Do Anything, Model Performance Remains the Same 195
  Myth 3: You Won't Need to Update Your Models as Much 196
  Myth 4: Most ML Engineers Don't Need to Worry About Scale 196
  Batch Prediction Versus Online Prediction 197
  From Batch Prediction to Online Prediction 201
  Unifying Batch Pipeline and Streaming Pipeline 203
  Model Compression 206
  Low-Rank Factorization 206
  Knowledge Distillation 208
  Pruning 208
  Quantization 209
  ML on the Cloud and on the Edge 212
  Compiling and Optimizing Models for Edge Devices 214
  ML in Browsers 222
  Summary 223
8 Data Distribution Shifts and Monitoring 225
  Causes of ML System Failures 226
  Software System Failures 227
  ML-Specific Failures 229
  Data Distribution Shifts 237
  Types of Data Distribution Shifts 237
  General Data Distribution Shifts 241
  Detecting Data Distribution Shifts 242
  Addressing Data Distribution Shifts 248
  Monitoring and Observability 250
  ML-Specific Metrics 251
  Monitoring Toolbox 256
  Observability 259
  Summary 261
9 Continual Learning and Test in Production 263
  Continual Learning 264
  Stateless Retraining Versus Stateful Training 265
  Why Continual Learning? 268
  Continual Learning Challenges 270
  Four Stages of Continual Learning 274
  How Often to Update Your Models 279
  Test in Production 281
  Shadow Deployment 282
  A/B Testing 283
  Canary Release 285
  Interleaving Experiments 285
  Bandits 287
  Summary 291
10 Infrastructure and Tooling for MLOps 293
  Storage and Compute 297
  Public Cloud Versus Private Data Centers 300
  Development Environment 302
  Dev Environment Setup 303
  Standardizing Dev Environments 306
  From Dev to Prod: Containers 308
  Resource Management 311
  Cron, Schedulers, and Orchestrators 311
  Data Science Workflow Management 314
  ML Platform 319
  Model Deployment 320
  Model Store 321
  Feature Store 325
  Build Versus Buy 327
  Summary 329
11 The Human Side of Machine Learning 331
  User Experience 331
  Ensuring User Experience Consistency 332
  Combatting "Mostly Correct" Predictions 332
  Smooth Failing 334
  Team Structure 334
  Cross-functional Teams Collaboration 335
  End-to-End Data Scientists 335
  Responsible AI 339
  Irresponsible AI: Case Studies 341
  A Framework for Responsible AI 347
  Summary 353
Epilogue 355
Index 357