Designing Cloud Data Platforms [Paperback]

4.37/5 (132 ratings on Goodreads)

Lynda Partner, Danil Zburivsky

Format: Paperback / softback, 336 pages, height x width x thickness: 235x187x24 mm, weight: 600 g
Publication date: 11-Jun-2021
Publisher: Manning Publications
ISBN-10: 1617296449
ISBN-13: 9781617296444

Other books on this topic:

Computing & information technology (currently in stock: 25 titles)

Paperback
Price: €67.59
Delivery from the publisher takes approximately 2-4 weeks.
Free delivery

Permalink: https://www.kriso.ee/db/9781617296444.html

Summary
Centralized data warehouses, long the de facto standard for housing data for analytics, are rapidly giving way to multi-faceted cloud data platforms. Companies that embrace modern cloud data platforms gain an integrated view of their business using all of their data, and can take advantage of advanced analytic practices to drive predictions and as-yet-unimagined data services. Designing Cloud Data Platforms is a hands-on guide to envisioning and designing a modern, scalable data platform that takes full advantage of the flexibility of the cloud. As you read, you'll learn the core components of a cloud data platform design, along with the role of key technologies like Spark and Kafka Streams. You'll also explore setting up processes to manage cloud-based data and keep it secure, and using advanced analytic and BI tools to analyze it.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the technology
Well-designed pipelines, storage systems, and APIs eliminate the complicated scaling and maintenance required with on-prem data centers. Once you learn the patterns for designing cloud data platforms, you'll maximize performance no matter which cloud vendor you use.

About the book
In Designing Cloud Data Platforms, Danil Zburivsky and Lynda Partner reveal a six-layer approach that increases flexibility and reduces costs. Discover patterns for ingesting data from a variety of sources, then learn to harness pre-built services provided by cloud vendors.
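To make the idea of a layered platform more concrete, here is a minimal, hypothetical sketch of how data might flow through separate layers. It assumes the layer names listed under section 3.1 of the table of contents (ingestion, storage, processing, technical metadata, serving, and orchestration); fast (streaming) storage and the ETL overlay are left out for brevity. Every function, the record shape, and the "id" deduplication key are illustrative assumptions, not code from the book.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical sketch of a layered pipeline. Layer names follow section 3.1 of the
# table of contents; every function, field, and the "id" key are illustrative assumptions.

Record = dict[str, Any]


@dataclass
class Storage:
    """Storage layer (slow storage only): a raw landing area and a processed area."""
    landing: list[Record] = field(default_factory=list)
    processed: list[Record] = field(default_factory=list)


@dataclass
class MetadataLayer:
    """Technical metadata layer: keeps a record count for each step that ran."""
    runs: list[dict[str, Any]] = field(default_factory=list)

    def log(self, layer: str, count: int) -> None:
        self.runs.append({"layer": layer, "records": count})


def ingest(source: list[Record], storage: Storage) -> int:
    """Ingestion layer: copy source records unchanged into the landing area."""
    storage.landing.extend(dict(rec) for rec in source)
    return len(source)


def process(storage: Storage) -> int:
    """Processing layer: deduplicate landed records by their "id" field."""
    seen: set[Any] = set()
    deduped: list[Record] = []
    for rec in storage.landing:
        if rec["id"] not in seen:
            seen.add(rec["id"])
            deduped.append(rec)
    storage.processed = deduped
    return len(deduped)


def serve(storage: Storage) -> dict[Any, Record]:
    """Serving layer: expose processed records to consumers, keyed by id."""
    return {rec["id"]: rec for rec in storage.processed}


def orchestrate(source: list[Record]) -> tuple[dict[Any, Record], MetadataLayer]:
    """Orchestration layer: run each step in order and log it to the metadata layer."""
    storage, meta = Storage(), MetadataLayer()
    meta.log("ingestion", ingest(source, storage))
    meta.log("processing", process(storage))
    served = serve(storage)
    meta.log("serving", len(served))
    return served, meta


if __name__ == "__main__":
    served, meta = orchestrate([{"id": 1, "v": "a"}, {"id": 1, "v": "a"}, {"id": 2, "v": "b"}])
    print(served)
    print(meta.runs)
```

Because each layer sits behind its own function, one step (for example, processing) could be swapped for a different engine without touching ingestion or serving. That separation is one plausible reading of the flexibility the description attributes to a layered design.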

What's inside
    Best practices for structured and unstructured data sets
    Cloud-ready machine learning tools
    Metadata and real-time analytics
    Defensive architecture, access, and security

About the reader
For data professionals familiar with the basics of cloud computing and with Hadoop or Spark.

About the author
Danil Zburivsky has over 10 years of experience designing and supporting large-scale data infrastructure for enterprises across the globe. Lynda Partner is the VP of Analytics-as-a-Service at Pythian, and has been on the business side of data for over 20 years.

Table of Contents
Preface
Acknowledgments
About this book
About the authors
About the cover illustration
1 Introducing the data platform
    1.1 The trends behind the change from data warehouses to data platforms
    1.2 Data warehouses struggle with data variety, volume, and velocity
        Variety
        Volume
        Velocity
        All the V's at once
    1.3 Data lakes to the rescue?
    1.4 Along came the cloud
    1.5 Cloud, data lakes, and data warehouses: The emergence of cloud data platforms
    1.6 Building blocks of a cloud data platform
        Ingestion layer
        Storage layer
        Processing layer
        Serving layer
    1.7 How the cloud data platform deals with the three V's
        Variety
        Volume
        Velocity
        Two more V's
    1.8 Common use cases
2 Why a data platform and not just a data warehouse
    2.1 Cloud data platforms and cloud data warehouses: The practical aspects
        A closer look at the data sources
        An example cloud data warehouse-only architecture
        An example cloud data platform architecture
    2.2 Ingesting data
        Ingesting data directly into Azure Synapse
        Ingesting data into an Azure data platform
        Managing changes in upstream data sources
    2.3 Processing data
        Processing data in the warehouse
        Processing data in the data platform
    2.4 Accessing data
    2.5 Cloud cost considerations
    2.6 Exercise answers
3 Getting bigger and leveraging the Big 3: Amazon, Microsoft Azure, and Google
    3.1 Cloud data platform layered architecture
        Data ingestion layer
        Fast and slow storage
        Processing layer
        Technical metadata layer
        The serving layer and data consumers
        Orchestration and ETL overlay layers
    3.2 The importance of layers in a data platform architecture
    3.3 Mapping cloud data platform layers to specific tools
        AWS
        Google Cloud
        Azure
    3.4 Open source and commercial alternatives
        Batch data ingestion
        Streaming data ingestion and real-time analytics
        Orchestration layer
    3.5 Exercise answers
4 Getting data into the platform
    4.1 Databases, files, APIs, and streams
        Relational databases
        Files
        SaaS data via API
        Streams
    4.2 Ingesting data from relational databases
        Ingesting data from RDBMSs using a SQL interface
        Full-table ingestion
        Incremental table ingestion
        Change data capture (CDC)
        CDC vendors overview
        Datatype conversion
        Ingesting data from NoSQL databases
        Capturing important metadata for RDBMS or NoSQL ingestion pipelines
    4.3 Ingesting data from files
        Tracking ingested files
        Capturing file ingestion metadata
    4.4 Ingesting data from streams
        Differences between batch and streaming ingestion
        Capturing streaming pipeline metadata
    4.5 Ingesting data from SaaS applications
        No standard approach to API design
        No standard way to deal with full vs. incremental data exports
        Resulting data is typically highly nested JSON
    4.6 Network and security considerations for data ingestion into the cloud
        Connecting other networks to your cloud data platform
    4.7 Exercise answers
5 Organizing and processing data
    5.1 Processing as a separate layer in the data platform
    5.2 Data processing stages
    5.3 Organizing your cloud storage
        Cloud storage containers and folders
    5.4 Common data processing steps
        File format conversion
        Data deduplication
        Data quality checks
    5.5 Configurable pipelines
    5.6 Exercise answers
6 Real-time data processing and analytics
    6.1 Real-time ingestion vs. real-time processing
    6.2 Use cases for real-time data processing
        Retail use case: Real-time ingestion
        Online gaming use case: Real-time ingestion and real-time processing
        Summary of real-time ingestion vs. real-time processing
    6.3 When should you use real-time ingestion and/or real-time processing?
    6.4 Organizing data for real-time use
        The anatomy of fast storage
        How does fast storage scale?
        Organizing data in the real-time storage
    6.5 Common data transformations in real time
        Causes of duplicates in real-time systems
        Deduplicating data in real-time systems
        Converting message formats in real-time pipelines
        Real-time data quality checks
        Combining batch and real-time data
    6.6 Cloud services for real-time data processing
        AWS real-time processing services
        Google Cloud real-time processing services
        Azure real-time processing services
    6.7 Exercise answers
7 Metadata layer architecture
    7.1 What we mean by metadata
        Business metadata
        Data platform internal metadata or "pipeline metadata"
    7.2 Taking advantage of pipeline metadata
    7.3 Metadata model
        Metadata domains
    7.4 Metadata layer implementation options
        Metadata layer as a collection of configuration files
        Metadata database
        Metadata API
    7.5 Overview of existing solutions
        Cloud metadata services
        Open source metadata layer implementations
    7.6 Exercise answers
8 Schema management
    8.1 Why schema management
        Schema changes in a traditional data warehouse architecture
        Schema-on-read approach
    8.2 Schema-management approaches
        Schema as a contract
        Schema management in the data platform
        Monitoring schema changes
    8.3 Schema Registry implementation
        Apache Avro schemas
        Existing Schema Registry implementations
        Schema Registry as part of a metadata layer
    8.4 Schema evolution scenarios
        Schema compatibility rules
        Schema evolution and data transformation pipelines
    8.5 Schema evolution and data warehouses
        Schema-management features of cloud data warehouses
    8.6 Exercise answers
9 Data access and security
    9.1 Different types of data consumers
    9.2 Cloud data warehouses
        AWS Redshift
        Azure Synapse
        Google BigQuery
        Choosing the right data warehouse
    9.3 Application data access
        Cloud relational databases
        Cloud key/value data stores
        Full-text search services
        In-memory cache
    9.4 Machine learning on the data platform
        Machine learning model lifecycle on a cloud data platform
        ML cloud collaboration tools
    9.5 Business intelligence and reporting tools
        Traditional BI tools and cloud data platform integration
        Using Excel as a BI tool
        BI tools that are external to the cloud provider
    9.6 Data security
        Users, groups, and roles
        Credentials and configuration management
        Data encryption
        Network boundaries
    9.7 Exercise answers
10 Fueling business value with data platforms
    10.1 Why you need a data strategy
    10.2 The analytics maturity journey
        SEE: Getting insights from data
        PREDICT: Using data to predict what to do
        DO: Making your analytics actionable
        CREATE: Going beyond analytics into products
    10.3 The data platform: The engine that powers analytics maturity
    10.4 Platform project stoppers
        Time does indeed kill
        User adoption
        User trust and the need for data governance
        Operating in a platform silo
        The dollar dance
Index
