Presto: The Definitive Guide: SQL at Any Scale, On Any Storage, In Any Environment [Paperback]

4.29/5 (48 ratings from Goodreads)

Martin Traverso, Matt Fuller

Format: Paperback / softback, 350 pages, height x width: 233x178 mm
Publication date: 17-Apr-2020
Publisher: O'Reilly Media
ISBN-10: 149204427X
ISBN-13: 9781492044277

Other books on the subject:

Database programming

Paperback
Price: 85,59 €*
* We will send you an offer for a used copy, whose price may differ from the price shown on the website.
This book is out of print, but we will send you an offer for a used copy.

Permalink: https://www.kriso.ee/db/9781492044277.html

Perform fast interactive analytics against different data sources using the Presto high-performance, distributed SQL query engine. With this practical guide, you'll learn how to conduct analytics on data where it lives, whether it's Hive, Cassandra, a relational database, or a proprietary data store. Analysts, software engineers, and production engineers will learn how to manage, use, and even develop with Presto.

Initially developed by Facebook, open source Presto is now used by Netflix, Airbnb, LinkedIn, Twitter, Uber, and many other companies. Matt Fuller, Manfred Moser, and Martin Traverso show you how a single Presto query can combine data from multiple sources to allow for analytics across your entire organization.

Get started: Explore Presto's use cases and learn about tools that will help you connect to Presto and query data
Go deeper: Learn Presto's internal workings, including how to connect to and query data sources with support for SQL statements, operators, functions, and more
Put Presto in production: Secure Presto, monitor workloads, tune queries, and connect more applications; learn how other organizations apply Presto

Foreword

Preface

Part I Getting Started with Presto

1 Introducing Presto
The Problems with Big Data
Presto to the Rescue
Designed for Performance and Scale
SQL-on-Anything
Separation of Data Storage and Query Compute Resources
Presto Use Cases
One SQL Analytics Access Point
Access Point to Data Warehouse and Source Systems
Provide SQL-Based Access to Anything
Federated Queries
Semantic Layer for a Virtual Data Warehouse
Data Lake Query Engine
SQL Conversions and ETL
Better Insights Due to Faster Response Times
Big Data, Machine Learning, and Artificial Intelligence
Other Use Cases
Presto Resources
Website
Documentation
Community Chat
Source Code, License, and Version
Contributing
Book Repository
Iris Data Set
Flight Data Set
A Brief History of Presto
Conclusion

2 Installing and Configuring Presto
Trying Presto with the Docker Container
Installing from Archive File
Java Virtual Machine
Python
Installation
Configuration
Adding a Data Source
Running Presto
Conclusion

3 Using Presto
Presto Command-Line Interface
Getting Started
Pagination
History
Additional Diagnostics
Executing Queries
Output Formats
Ignoring Errors
Presto JDBC Driver
Downloading and Registering the Driver
Establishing a Connection to Presto
Presto and ODBC
Client Libraries
Presto Web UI
SQL with Presto
Concepts
First Examples
Conclusion

Part II Diving Deeper into Presto

4 Presto Architecture
Coordinator and Workers in a Cluster
Coordinator
Discovery Service
Workers
Connector-Based Architecture
Catalogs, Schemas, and Tables
Query Execution Model
Query Planning
Parsing and Analysis
Initial Query Planning
Optimization Rules
Predicate Pushdown
Cross Join Elimination
TopN
Partial Aggregations
Implementation Rules
Lateral Join Decorrelation
Semi-Join (IN) Decorrelation
Cost-Based Optimizer
The Cost Concept
Cost of the Join
Table Statistics
Filter Statistics
Table Statistics for Partitioned Tables
Join Enumeration
Broadcast Versus Distributed Joins
Working with Table Statistics
Presto ANALYZE
Gathering Statistics When Writing to Disk
Hive ANALYZE
Displaying Table Statistics
Conclusion

5 Production-Ready Deployment
Configuration Details
Server Configuration
Logging
Node Configuration
JVM Configuration
Launcher
Cluster Installation
RPM Installation
Installation Directory Structure
Configuration
Uninstall Presto
Installation in the Cloud
Cluster Sizing Considerations
Conclusion

6 Connectors
Configuration
RDBMS Connector Example: PostgreSQL
Query Pushdown
Parallelism and Concurrency
Other RDBMS Connectors
Security
Presto TPC-H and TPC-DS Connectors
Hive Connector for Distributed Storage Data Sources
Apache Hadoop and Hive
Hive Connector
Hive-Style Table Format
Managed and External Tables
Partitioned Data
Loading Data
File Formats and Compression
MinIO Example
Non-Relational Data Sources
Presto JMX Connector
Black Hole Connector
Memory Connector
Other Connectors
Conclusion

7 Advanced Connector Examples
Connecting to HBase with Phoenix
Key-Value Store Connector Example: Accumulo
Using the Presto Accumulo Connector
Predicate Pushdown in Accumulo
Apache Cassandra Connector
Streaming System Connector Example: Kafka
Document Store Connector Example: Elasticsearch
Overview
Configuration and Usage
Query Processing
Full-Text Search
Summary
Query Federation in Presto
Extract, Transform, Load and Federated Queries
Conclusion

8 Using SQL in Presto
Presto Statements
Presto System Tables
Catalogs
Schemas
Information Schema
Tables
Table and Column Properties
Copying an Existing Table
Creating a New Table from Query Results
Modifying a Table
Deleting a Table
Table Limitations from Connectors
Views
Session Information and Configuration
Data Types
Collection Data Types
Temporal Data Types
Type Casting
SELECT Statement Basics
WHERE Clause
GROUP BY and HAVING Clauses
ORDER BY and LIMIT Clauses
JOIN Statements
UNION, INTERSECT, and EXCEPT Clauses
Grouping Operations
WITH Clause
Subqueries
Scalar Subquery
EXISTS Subquery
Quantified Subquery
Deleting Data from a Table
Conclusion

9 Advanced SQL
Functions and Operators Introduction
Scalar Functions and Operators
Boolean Operators
Logical Operators
Range Selection with the BETWEEN Statement
Value Detection with IS (NOT) NULL
Mathematical Functions and Operators
Trigonometric Functions
Constant and Random Functions
String Functions and Operators
Strings and Maps
Unicode
Regular Expressions
Unnesting Complex Data Types
JSON Functions
Date and Time Functions and Operators
Histograms
Aggregate Functions
Map Aggregate Functions
Approximate Aggregate Functions
Window Functions
Lambda Expressions
Geospatial Functions
Prepared Statements
Conclusion

Part III Presto in Real-World Uses

10 Security
Authentication
Password and LDAP Authentication
Authorization
System Access Control
Connector Access Control
Encryption
Encrypting Presto Client-to-Coordinator Communication
Creating Java Keystores and Java Truststores
Encrypting Communication Within the Presto Cluster
Certificate Authority Versus Self-Signed Certificates
Certificate Authentication
Kerberos
Prerequisites
Kerberos Client Authentication
Cluster Internal Kerberos
Data Source Access and Configuration for Security
Kerberos Authentication with the Hive Connector
Hive Metastore Thrift Service Authentication
HDFS Authentication
Cluster Separation
Conclusion

11 Integrating Presto with Other Tools
Queries, Visualizations, and More with Apache Superset
Performance Improvements with RubiX
Workflows with Apache Airflow
Embedded Presto Example: Amazon Athena
Starburst Enterprise Presto
Other Integration Examples
Custom Integrations
Conclusion

12 Presto in Production
Monitoring with the Presto Web UI
Cluster-Level Details
Query List
Query Details View
Tuning Presto SQL Queries
Memory Management
Task Concurrency
Worker Scheduling
Scheduling Splits per Task and per Node
Local Scheduling
Network Data Exchange
Concurrency
Buffer Sizes
Tuning Java Virtual Machine
Resource Groups
Resource Group Definition
Scheduling Policy
Selector Rules Definition
Conclusion

13 Real-World Examples
Deployment and Runtime Platforms
Cluster Sizing
Hadoop/Hive Migration Use Case
Other Data Sources
Users and Traffic
Conclusion

14 Conclusion

Index

Matt Fuller is a cofounder at Starburst, the Presto Company. Prior to founding Starburst, Matt was a director of engineering at Teradata, where he worked to build the new Center for Hadoop division within the company. As a major part of this, Matt worked to bring Presto to the enterprise market. Matt has managed a team contributing to the open source Presto project since 2015 and led the internal Presto product roadmap. Starburst was later formed from this team at Teradata.

Before Teradata, Matt was an early engineer at Vertica, where he co-built the query optimizer. Matt is also a Very Large Databases (VLDB) published author and has US patents in the database management systems space.

Manfred Moser is a community advocate, writer, trainer, and software engineer at Starburst. Manfred has a long history of developing and advocating open source software. He is an Apache Maven committer, wrote the Hudson book and others, and continues to be active in the open source community and his projects. He is a seasoned trainer and conference presenter for CI/CD, Cloud Native, Agile, and other software development tools and processes, having trained well over 20,000 developers for companies including Walmart Labs, Sonatype, and Telus.

His database background includes designing databases and related applications in the RDBMS space and working as a business intelligence consultant, wrangling thousands of lines of SQL by hand. He is glad he can use Presto now, and is spreading the word about how great Presto is.

Martin Traverso is the cofounder of the Presto Software Foundation and CTO at Starburst. Prior to Starburst, Martin worked as a software engineer at Facebook, where he saw the need for fast interactive SQL analytics. Martin and three other engineers worked to create what became Presto. Martin led the Presto development team; Presto was rolled out into production in the spring of 2013 and made open source in the fall of 2013. Since then, Presto has gained wide adoption both inside and outside Facebook.

Prior to Facebook, Martin was an architect at Proofpoint and Ning, where he led development and architecture design of numerous complex enterprise and social network applications.
