E-book: Beginning Apache Spark 3: With DataFrame, Spark SQL, Structured Streaming, and Spark Machine Learning Library

3.50/5 (8 ratings from Goodreads)

Hien Luu

Format: EPUB+DRM
Publication date: 22-Oct-2021
Publisher: Apress
Language: English
ISBN-13: 9781484273838

Price: 67.91 €*
* The price is final; no further discounts apply.
This e-book is for personal use only. E-books cannot be returned.

DRM restrictions

Copying (copy/paste): not allowed

Printing: not allowed

Usage:

Digital rights management (DRM)
The publisher has issued this e-book in encrypted form, which means you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

Required software
To read on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

To read on a PC or Mac, you need to install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

This e-book cannot be read on an Amazon Kindle.

Take a journey toward discovering, learning, and using Apache Spark 3.0. In this book, you will gain expertise on the powerful and efficient distributed data processing engine inside of Apache Spark; its user-friendly, comprehensive, and flexible programming model for processing data in batch and streaming; and the scalable machine learning algorithms and practical utilities to build machine learning applications.

Beginning Apache Spark 3 begins by introducing Spark's core concepts and architecture, the Spark unified stack, and the different ways of interacting with Apache Spark. Next, it offers an overview of Spark SQL before moving on to its advanced features. It covers tips and techniques for dealing with performance issues, followed by an overview of the structured streaming processing engine. It concludes with a demonstration of how to develop machine learning applications using Spark MLlib and how to manage the machine learning development lifecycle. The book is packed with practical examples and code snippets to help you master concepts and features immediately after they are covered in each section.

After reading this book, you will have the knowledge required to build your own big data pipelines, applications, and machine learning applications.

What You Will Learn

Master the Spark unified data analytics engine and its various components
Work in tandem to provide a scalable, fault tolerant and performant data processing engine
Leverage the user-friendly and flexible programming model to perform simple to complex data analytics using dataframe and Spark SQL
Develop machine learning applications using Spark MLlib
Manage the machine learning development lifecycle using MLflow

Who This Book Is For

Data scientists, data engineers and software developers.

Table of Contents

About the Author
About the Technical Reviewers
Acknowledgments
Introduction

Chapter 1 Introduction to Apache Spark
  Overview
  History
  Spark Core Concepts and Architecture
  Spark Cluster and Resource Management System
  Spark Applications
  Spark Drivers and Executors
  Spark Unified Stack
  Apache Spark 3.0
  Adaptive Query Execution Framework
  Dynamic Partition Pruning (DPP)
  Accelerator-aware Scheduler
  Apache Spark Applications
  Spark Example Applications
  Apache Spark Ecosystem
  Delta Lake
  Koalas
  MLflow
  Summary

Chapter 2 Working with Apache Spark
  Downloading and Installation
  Downloading Spark
  Installing Spark
  Having Fun with the Spark Scala Shell
  Useful Spark Scala Shell Command and Tips
  Basic Interactions with Scala and Spark
  Introduction to Collaborative Notebooks
  Create a Cluster
  Create a Folder
  Create a Notebook
  Setting up Spark Source Code
  Summary

Chapter 3 Spark SQL: Foundation
  Understanding RDD
  Introduction to the DataFrame API
  Creating a DataFrame
  Creating a DataFrame from RDD
  Creating a DataFrame from a Range of Numbers
  Creating a DataFrame from Data Sources
  Working with Structured Operations
  Introduction to Datasets
  Creating Datasets
  Working with Datasets
  Using SQL in Spark SQL
  Running SQL in Spark
  Writing Data Out to Storage Systems
  The Trio: DataFrame, Dataset, and SQL
  DataFrame Persistence
  Summary

Chapter 4 Spark SQL: Advanced
  Aggregations
  Aggregation Functions
  Aggregation with Grouping
  Aggregation with Pivoting
  Joins
  Join Expression and Join Types
  Working with Joins
  Dealing with Duplicate Column Names
  Overview of Join Implementation
  Functions
  Working with Built-in Functions
  Working with User-Defined Functions (UDFs)
  Advanced Analytics Functions
  Aggregation with Rollups and Cubes
  Rollups
  Cubes
  Exploring Catalyst Optimizer
  Logical Plan
  Physical Plan
  Catalyst in Action
  Project Tungsten
  Summary

Chapter 5 Optimizing Spark Applications
  Common Performance Issues
  Spark Configurations
  Spark Memory Management
  Leverage In-Memory Computation
  When to Persist and Cache Data
  Persistence and Caching APIs
  Persistence and Caching Example
  Understanding Spark Joins
  Broadcast Hash Join
  Shuffle Sort Merge Join
  Adaptive Query Execution
  Dynamically Coalescing Shuffle Partitions
  Dynamically Switching Join Strategies
  Dynamically Optimizing Skew Joins
  Summary

Chapter 6 Spark Streaming
  Stream Processing
  Concepts
  Stream Processing Engine Landscape
  Spark Streaming Overview
  Spark DStream
  Spark Structured Streaming
  Overview
  Core Concepts
  Structured Streaming Applications
  Streaming DataFrame Operations
  Working with Data Sources
  Working with Data Sinks
  Output Modes
  Triggers
  Summary

Chapter 7 Advanced Spark Streaming
  Event Time
  Fixed Window Aggregation over an Event Time
  Sliding Window Aggregation over Event Time
  Aggregation State
  Watermarking: Limit State and Handle Late Data
  Arbitrary Stateful Processing
  Arbitrary Stateful Processing with Structured Streaming
  Handling State Timeouts
  Arbitrary State Processing in Action
  Handling Duplicate Data
  Fault Tolerance
  Streaming Application Code Change
  Spark Runtime Change
  Streaming Query Metrics and Monitoring
  Streaming Query Metrics
  Monitoring Streaming Queries via Callback
  Monitoring Streaming Queries via Visualization UI
  Streaming Query Summary Information
  Streaming Query Detailed Statistics Information
  Troubleshooting Streaming Query
  Summary

Chapter 8 Machine Learning with Spark
  Machine Learning Overview
  Machine Learning Terminologies
  Machine Learning Types
  Machine Learning Development Process
  Spark Machine Learning Library
  Machine Learning Pipelines
  Machine Learning Tasks in Action
  Classification
  Regression
  Recommendation
  Deep Learning Pipeline
  Summary

Chapter 9 Managing the Machine Learning Life Cycle
  The Rise of MLOps
  MLOps Overview
  MLflow Overview
  MLflow Components
  MLflow in Action
  Model Deployment and Prediction
  Summary

Index

Hien Luu has extensive experience in designing and building big data applications and machine learning infrastructure. He is particularly passionate about the intersection between big data and machine learning. Hien enjoys working with open source software and has contributed to Apache Pig and Azkaban. Teaching is also one of his passions, and he serves as an instructor at the UCSC Silicon Valley Extension school teaching Apache Spark. He has given presentations at various conferences such as Data+AI Summit, MLOps World, QCon SF, QCon London, Hadoop Summit, and JavaOne.

Permalink: https://www.kriso.ee/db/97814842738386e.html
