
Data Management in Machine Learning Systems [Paperback]

  • Format: Paperback / softback, 173 pages, height x width: 235x191 mm, weight: 333 g
  • Series: Synthesis Lectures on Data Management
  • Publication date: 25-Feb-2019
  • Publisher: Morgan & Claypool Publishers
  • ISBN-10: 1681734966
  • ISBN-13: 9781681734965

Large-scale data analytics using machine learning (ML) underpins many modern data-driven applications. ML systems provide means of specifying and executing these ML workloads in an efficient and scalable manner. Data management is at the heart of many ML systems due to data-driven application characteristics, data-centric workload characteristics, and system architectures inspired by classical data management techniques.

In this book, we follow this data-centric view of ML systems and aim to provide a comprehensive overview of data management in ML systems for the end-to-end data science or ML lifecycle. We review multiple interconnected lines of work: (1) ML support in database (DB) systems, (2) DB-inspired ML systems, and (3) ML lifecycle systems. Covered topics include: in-database analytics via query generation and user-defined functions, factorized and statistical-relational learning; optimizing compilers for ML workloads; execution strategies and hardware accelerators; data access methods such as compression, partitioning, and indexing; resource elasticity and cloud markets; as well as systems for data preparation for ML, model selection, model management, model debugging, and model serving. Given the rapidly evolving field, we strive for a balance among an up-to-date survey of ML systems, an overview of the underlying concepts and techniques, and pointers to open research questions. Hence, this book might serve as a starting point for both systems researchers and developers.

Preface
Acknowledgments
1 Introduction
1.1 Overview of ML Lifecycle and ML Users
1.2 Motivation
1.3 Outline and Scope
2 ML Through Database Queries and UDFs
2.1 Linear Algebra
2.2 Iterative Algorithms
2.3 Sampling-Based Methods
2.4 Discussion
2.5 Summary
3 Multi-Table ML and Deep Systems Integration
3.1 Learning over Joins
3.2 Statistical Relational Learning and Non-IID Models
3.3 Deeper Integration and Specialized DBMSs
3.4 Summary
4 Rewrites and Optimization
4.1 Optimization Scope
4.2 Logical Rewrites and Planning
4.3 Physical Rewrites and Operators
4.4 Automatic Operator Fusion
4.5 Runtime Adaptation
4.6 Summary
5 Execution Strategies
5.1 Data-Parallel Execution
5.2 Task-Parallel Execution
5.3 Parameter Servers (Model-Parallel Execution)
5.4 Hybrid Execution Strategies
5.5 Accelerators (GPUs, FPGAs, ASICs)
5.6 Summary
6 Data Access Methods
6.1 Caching and Buffer Pool Management
6.2 Compression
6.3 NUMA-Aware Partitioning and Replication
6.4 Index Structures
6.5 Summary
7 Resource Heterogeneity and Elasticity
7.1 Provisioning, Configuration, and Scheduling
7.2 Handling Failures
7.3 Working with Markets of Transient Resources
7.4 Summary
8 Systems for ML Lifecycle Tasks
8.1 Data Sourcing and Cleaning for ML
8.2 Feature Engineering and Deep Learning
8.3 Model Selection and Model Management
8.4 Interaction, Visualization, Debugging, and Inspection
8.5 Model Deployment and Serving
8.6 Benchmarking ML Systems
8.7 Summary
9 Conclusions
Bibliography
Authors' Biographies