
Presto: The Definitive Guide: SQL at Any Scale, On Any Storage, In Any Environment [Paperback]

  • Format: Paperback / softback, 350 pages, height x width: 233x178 mm
  • Publication date: 17-Apr-2020
  • Publisher: O'Reilly Media
  • ISBN-10: 149204427X
  • ISBN-13: 9781492044277

Perform fast interactive analytics against different data sources using the Presto high-performance, distributed SQL query engine. With this practical guide, you'll learn how to conduct analytics on data where it lives, whether it's Hive, Cassandra, a relational database, or a proprietary data store. Analysts, software engineers, and production engineers will learn how to manage, use, and even develop with Presto.

Initially developed by Facebook, open source Presto is now used by Netflix, Airbnb, LinkedIn, Twitter, Uber, and many other companies. Matt Fuller, Manfred Moser, and Martin Traverso show you how a single Presto query can combine data from multiple sources to allow for analytics across your entire organization.

  • Get started: Explore Presto's use cases and learn about tools that will help you connect to Presto and query data
  • Go deeper: Learn Presto's internal workings, including how to connect to and query data sources with support for SQL statements, operators, functions, and more
  • Put Presto in production: Secure Presto, monitor workloads, tune queries, and connect more applications; learn how other organizations apply Presto

Foreword
Preface
Part I Getting Started with Presto
1 Introducing Presto
The Problems with Big Data
Presto to the Rescue
Designed for Performance and Scale
SQL-on-Anything
Separation of Data Storage and Query Compute Resources
Presto Use Cases
One SQL Analytics Access Point
Access Point to Data Warehouse and Source Systems
Provide SQL-Based Access to Anything
Federated Queries
Semantic Layer for a Virtual Data Warehouse
Data Lake Query Engine
SQL Conversions and ETL
Better Insights Due to Faster Response Times
Big Data, Machine Learning, and Artificial Intelligence
Other Use Cases
Presto Resources
Website
Documentation
Community Chat
Source Code, License, and Version
Contributing
Book Repository
Iris Data Set
Flight Data Set
A Brief History of Presto
Conclusion
2 Installing and Configuring Presto
Trying Presto with the Docker Container
Installing from Archive File
Java Virtual Machine
Python
Installation
Configuration
Adding a Data Source
Running Presto
Conclusion
3 Using Presto
Presto Command-Line Interface
Getting Started
Pagination
History
Additional Diagnostics
Executing Queries
Output Formats
Ignoring Errors
Presto JDBC Driver
Downloading and Registering the Driver
Establishing a Connection to Presto
Presto and ODBC
Client Libraries
Presto Web UI
SQL with Presto
Concepts
First Examples
Conclusion
Part II Diving Deeper into Presto
4 Presto Architecture
Coordinator and Workers in a Cluster
Coordinator
Discovery Service
Workers
Connector-Based Architecture
Catalogs, Schemas, and Tables
Query Execution Model
Query Planning
Parsing and Analysis
Initial Query Planning
Optimization Rules
Predicate Pushdown
Cross Join Elimination
TopN
Partial Aggregations
Implementation Rules
Lateral Join Decorrelation
Semi-Join (IN) Decorrelation
Cost-Based Optimizer
The Cost Concept
Cost of the Join
Table Statistics
Filter Statistics
Table Statistics for Partitioned Tables
Join Enumeration
Broadcast Versus Distributed Joins
Working with Table Statistics
Presto ANALYZE
Gathering Statistics When Writing to Disk
Hive ANALYZE
Displaying Table Statistics
Conclusion
5 Production-Ready Deployment
Configuration Details
Server Configuration
Logging
Node Configuration
JVM Configuration
Launcher
Cluster Installation
RPM Installation
Installation Directory Structure
Configuration
Uninstall Presto
Installation in the Cloud
Cluster Sizing Considerations
Conclusion
6 Connectors
Configuration
RDBMS Connector Example: PostgreSQL
Query Pushdown
Parallelism and Concurrency
Other RDBMS Connectors
Security
Presto TPC-H and TPC-DS Connectors
Hive Connector for Distributed Storage Data Sources
Apache Hadoop and Hive
Hive Connector
Hive-Style Table Format
Managed and External Tables
Partitioned Data
Loading Data
File Formats and Compression
MinIO Example
Non-Relational Data Sources
Presto JMX Connector
Black Hole Connector
Memory Connector
Other Connectors
Conclusion
7 Advanced Connector Examples
Connecting to HBase with Phoenix
Key-Value Store Connector Example: Accumulo
Using the Presto Accumulo Connector
Predicate Pushdown in Accumulo
Apache Cassandra Connector
Streaming System Connector Example: Kafka
Document Store Connector Example: Elasticsearch
Overview
Configuration and Usage
Query Processing
Full-Text Search
Summary
Query Federation in Presto
Extract, Transform, Load and Federated Queries
Conclusion
8 Using SQL in Presto
Presto Statements
Presto System Tables
Catalogs
Schemas
Information Schema
Tables
Table and Column Properties
Copying an Existing Table
Creating a New Table from Query Results
Modifying a Table
Deleting a Table
Table Limitations from Connectors
Views
Session Information and Configuration
Data Types
Collection Data Types
Temporal Data Types
Type Casting
SELECT Statement Basics
WHERE Clause
GROUP BY and HAVING Clauses
ORDER BY and LIMIT Clauses
JOIN Statements
UNION, INTERSECT, and EXCEPT Clauses
Grouping Operations
WITH Clause
Subqueries
Scalar Subquery
EXISTS Subquery
Quantified Subquery
Deleting Data from a Table
Conclusion
9 Advanced SQL
Functions and Operators Introduction
Scalar Functions and Operators
Boolean Operators
Logical Operators
Range Selection with the BETWEEN Statement
Value Detection with IS (NOT) NULL
Mathematical Functions and Operators
Trigonometric Functions
Constant and Random Functions
String Functions and Operators
Strings and Maps
Unicode
Regular Expressions
Unnesting Complex Data Types
JSON Functions
Date and Time Functions and Operators
Histograms
Aggregate Functions
Map Aggregate Functions
Approximate Aggregate Functions
Window Functions
Lambda Expressions
Geospatial Functions
Prepared Statements
Conclusion
Part III Presto in Real-World Uses
10 Security
Authentication
Password and LDAP Authentication
Authorization
System Access Control
Connector Access Control
Encryption
Encrypting Presto Client-to-Coordinator Communication
Creating Java Keystores and Java Truststores
Encrypting Communication Within the Presto Cluster
Certificate Authority Versus Self-Signed Certificates
Certificate Authentication
Kerberos
Prerequisites
Kerberos Client Authentication
Cluster Internal Kerberos
Data Source Access and Configuration for Security
Kerberos Authentication with the Hive Connector
Hive Metastore Thrift Service Authentication
HDFS Authentication
Cluster Separation
Conclusion
11 Integrating Presto with Other Tools
Queries, Visualizations, and More with Apache Superset
Performance Improvements with RubiX
Workflows with Apache Airflow
Embedded Presto Example: Amazon Athena
Starburst Enterprise Presto
Other Integration Examples
Custom Integrations
Conclusion
12 Presto in Production
Monitoring with the Presto Web UI
Cluster-Level Details
Query List
Query Details View
Tuning Presto SQL Queries
Memory Management
Task Concurrency
Worker Scheduling
Scheduling Splits per Task and per Node
Local Scheduling
Network Data Exchange
Concurrency
Buffer Sizes
Tuning Java Virtual Machine
Resource Groups
Resource Group Definition
Scheduling Policy
Selector Rules Definition
Conclusion
13 Real-World Examples
Deployment and Runtime Platforms
Cluster Sizing
Hadoop/Hive Migration Use Case
Other Data Sources
Users and Traffic
Conclusion
14 Conclusion
Index

Matt Fuller is a cofounder at Starburst, the Presto Company. Prior to founding Starburst, Matt was a director of engineering at Teradata, where he worked to build the new Center for Hadoop division within the company. As a major part of this, Matt worked to bring Presto to the enterprise market. Matt has managed a team contributing to the open source Presto project since 2015 and led the internal Presto product roadmap. Starburst was later formed from this team at Teradata.

Before Teradata, Matt was an early engineer at Vertica, where he co-built the query optimizer. Matt has also published at the Very Large Data Bases (VLDB) conference and holds US patents in the database management systems space.

Manfred Moser is a community advocate, writer, trainer, and software engineer at Starburst. Manfred has a long history of developing and advocating open source software. He is an Apache Maven committer, wrote the Hudson book among others, and continues to be active in the open source community and his projects. He is a seasoned trainer and conference presenter for CI/CD, Cloud Native, Agile, and other software development tools and processes, having trained well over 20,000 developers for companies including Walmart Labs, Sonatype, and Telus.

His database background includes designing databases and related applications in the RDBMS space and working as a business intelligence consultant wrangling thousands of lines of SQL by hand. He is glad he can use Presto now, and is spreading the word about how great Presto is.

Martin Traverso is the cofounder of the Presto Software Foundation and CTO at Starburst. Prior to Starburst, Martin worked as a software engineer at Facebook, where he saw the need for fast interactive SQL analytics. Martin and three other engineers worked to create what became Presto. Martin led the Presto development team; in the spring of 2013 Presto was rolled out into production, and it was made open source in the fall of 2013. Since then, Presto has gained wide adoption both inside and outside Facebook.

Prior to Facebook, Martin was an architect at Proofpoint and Ning, where he led development and architecture design of numerous complex enterprise and social network applications.