E-book: Data Science on AWS

3.94/5 (84 ratings by Goodreads)

Chris Fregly, Antje Barth

Format: 524 pages
Pub. Date: 07-Apr-2021
Publisher: O'Reilly Media
Language: eng
ISBN-13: 9781492079347

Other books in subject:

Machine learning

Format - EPUB+DRM
Price: 56,15 €*
* The price is final, i.e., no additional discounts will apply.
This ebook is for personal use only. E-Books are non-refundable.

DRM restrictions

Copying (copy/paste): not allowed
Printing: not allowed
Usage:

Digital Rights Management (DRM)
The publisher has supplied this book in encrypted form, which means that you need to install free software to unlock and read it. To read this e-book, you have to create an Adobe ID. The e-book can be read and downloaded on up to 6 devices (single user with the same Adobe ID).

Required software
To read this ebook on a mobile device (phone or tablet) you'll need to install this free app: PocketBook Reader (iOS / Android)

To download and read this eBook on a PC or Mac you need Adobe Digital Editions (This is a free app specially developed for eBooks. It's not the same as Adobe Reader, which you probably already have on your computer.)

You can't read this ebook with Amazon Kindle

With this practical book, AI and machine learning practitioners will learn how to successfully build and deploy data science projects on Amazon Web Services. The Amazon AI and machine learning stack unifies data science, data engineering, and application development to help level up your skills. This guide shows you how to build and run pipelines in the cloud, then integrate the results into applications in minutes instead of days. Throughout the book, authors Chris Fregly and Antje Barth demonstrate how to reduce cost and improve performance.

Apply the Amazon AI and ML stack to real-world use cases for natural language processing, computer vision, fraud detection, conversational devices, and more
Use automated machine learning to implement a specific subset of use cases with SageMaker Autopilot
Dive deep into the complete model development lifecycle for a BERT-based NLP use case including data ingestion, analysis, model training, and deployment
Tie everything together into a repeatable machine learning operations pipeline
Explore real-time ML, anomaly detection, and streaming analytics on data streams with Amazon Kinesis and Managed Streaming for Apache Kafka
Learn security best practices for data science projects and workflows including identity and access management, authentication, authorization, and more

Preface

1 Introduction to Data Science on AWS (28)
Benefits of Cloud Computing (3)
Data Science Pipelines and Workflows (3)
MLOps Best Practices (3)
Amazon AI Services and AutoML with Amazon SageMaker (3)
Data Ingestion, Exploration, and Preparation in AWS (5)
Model Training and Tuning with Amazon SageMaker (3)
Model Deployment with Amazon SageMaker and AWS Lambda Functions (1)
Streaming Analytics and Machine Learning on AWS (2)
AWS Infrastructure and Custom-Built Hardware (3)
Reduce Cost with Tags, Budgets, and Alerts (1)
Summary (3)

2 Data Science Use Cases (46)
Innovation Across Every Industry (1)
Personalized Product Recommendations (6)
Detect Inappropriate Videos with Amazon Rekognition (2)
Demand Forecasting (4)
Identify Fake Accounts with Amazon Fraud Detector (1)
Enable Privacy-Leak Detection with Amazon Macie (1)
Conversational Devices and Voice Assistants (1)
Text Analysis and Natural Language Processing (5)
Cognitive Search and Natural Language Understanding (1)
Intelligent Customer Support Centers (1)
Industrial AI Services and Predictive Maintenance (1)
Home Automation with AWS IoT and Amazon SageMaker (1)
Extract Medical Information from Healthcare Documents (1)
Self-Optimizing and Intelligent Cloud Infrastructure (1)
Cognitive and Predictive Business Intelligence (4)
Educating the Next Generation of AI and ML Developers (5)
Program Nature's Operating System with Quantum Computing (5)
Increase Performance and Reduce Cost (3)
Summary (2)

3 Automated Machine Learning (22)
Automated Machine Learning with SageMaker Autopilot (2)
Track Experiments with SageMaker Autopilot (1)
Train and Deploy a Text Classifier with SageMaker Autopilot (13)
Automated Machine Learning with Amazon Comprehend (4)
Summary (2)

4 Ingest Data into the Cloud (30)
Data Lakes (7)
Query the Amazon S3 Data Lake with Amazon Athena (4)
Continuously Ingest New Data with AWS Glue Crawler (2)
Build a Lake House with Amazon Redshift Spectrum (7)
Choose Between Amazon Athena and Amazon Redshift (1)
Reduce Cost and Increase Performance (7)
Summary (1)

5 Explore the Dataset (46)
Tools for Exploring Data in AWS (1)
Visualize Our Data Lake with SageMaker Studio (13)
Query Our Data Warehouse (8)
Create Dashboards with Amazon QuickSight (1)
Detect Data-Quality Issues with Amazon SageMaker and Apache Spark (8)
Detect Bias in Our Dataset (7)
Detect Different Types of Drift with SageMaker Clarify (2)
Analyze Our Data with AWS Glue DataBrew (2)
Reduce Cost and Increase Performance (2)
Summary (1)

6 Prepare the Dataset for Model Training (34)
Perform Feature Selection and Engineering (14)
Scale Feature Engineering with SageMaker Processing Jobs (7)
Share Features Through SageMaker Feature Store (4)
Ingest and Transform Data with SageMaker Data Wrangler (1)
Track Artifact and Experiment Lineage with Amazon SageMaker (5)
Ingest and Transform Data with AWS Glue DataBrew (2)
Summary (1)

7 Train Your First Model (70)
Understand the SageMaker Infrastructure (5)
Deploy a Pre-Trained BERT Model with SageMaker JumpStart (2)
Develop a SageMaker Model (2)
A Brief History of Natural Language Processing (3)
BERT Transformer Architecture (2)
Training BERT from Scratch (2)
Fine Tune a Pre-Trained BERT Model (3)
Create the Training Script (6)
Launch the Training Script from a SageMaker Notebook (7)
Evaluate Models (6)
Debug and Profile Model Training with SageMaker Debugger (4)
Interpret and Explain Model Predictions (6)
Detect Model Bias and Explain Predictions (4)
More Training Options for BERT (9)
Reduce Cost and Increase Performance (6)
Summary (3)

8 Train and Optimize Models at Scale (24)
Automatically Find the Best Model Hyper-Parameters (7)
Use Warm Start for Additional SageMaker Hyper-Parameter Tuning Jobs (4)
Scale Out with SageMaker Distributed Training (8)
Reduce Cost and Increase Performance (4)
Summary (1)

9 Deploy Models to Production (68)
Choose Real-Time or Batch Predictions (1)
Real-Time Predictions with SageMaker Endpoints (8)
Auto-Scale SageMaker Endpoints Using Amazon CloudWatch (5)
Strategies to Deploy New and Updated Models (4)
Testing and Comparing New Models (12)
Monitor Model Performance and Detect Drift (4)
Monitor Data Quality of Deployed SageMaker Endpoints (6)
Monitor Model Quality of Deployed SageMaker Endpoints (4)
Monitor Bias Drift of Deployed SageMaker Endpoints (3)
Monitor Feature Attribution Drift of Deployed SageMaker Endpoints (3)
Perform Batch Predictions with SageMaker Batch Transform (5)
AWS Lambda Functions and Amazon API Gateway (1)
Optimize and Manage Models at the Edge (1)
Deploy a PyTorch Model with TorchServe (3)
TensorFlow-BERT Inference with AWS Deep Java Library (2)
Reduce Cost and Increase Performance (5)
Summary (2)

10 Pipelines and MLOps (40)
Machine Learning Operations (2)
Software Pipelines (1)
Machine Learning Pipelines (4)
Pipeline Orchestration with SageMaker Pipelines (11)
Automation with SageMaker Pipelines (5)
More Pipeline Options (9)
Human-in-the-Loop Workflows (6)
Reduce Cost and Improve Performance (1)
Summary (2)

11 Streaming Analytics and Machine Learning (34)
Online Learning Versus Offline Learning (1)
Streaming Applications (1)
Windowed Queries on Streaming Data (4)
Streaming Analytics and Machine Learning on AWS (2)
Classify Real-Time Product Reviews with Amazon Kinesis, AWS Lambda, and Amazon SageMaker (1)
Implement Streaming Data Ingest Using Amazon Kinesis Data Firehose (4)
Summarize Real-Time Product Reviews with Streaming Analytics (2)
Setting Up Amazon Kinesis Data Analytics (8)
Amazon Kinesis Data Analytics Applications (7)
Classify Product Reviews with Apache Kafka, AWS Lambda, and Amazon SageMaker (1)
Reduce Cost and Improve Performance (2)
Summary (1)

12 Secure Data Science on AWS (44)
Shared Responsibility Model Between AWS and Customers (1)
Applying AWS Identity and Access Management (8)
Isolating Compute and Network Environments (3)
Securing Amazon S3 Data Access (8)
Encryption at Rest (4)
Encryption in Transit (2)
Securing SageMaker Notebook Instances (2)
Securing SageMaker Studio (2)
Securing SageMaker Jobs and Models (4)
Securing AWS Lake Formation (1)
Securing Database Credentials with AWS Secrets Manager (1)
Governance (3)
Auditability (2)
Reduce Cost and Improve Performance (2)
Summary (2)

Index

Chris Fregly is a Developer Advocate for AI and Machine Learning at AWS, based in San Francisco, California. He is also the founder of the Advanced Spark, TensorFlow, and KubeFlow Meetup Series based in San Francisco. Chris regularly speaks at AI and Machine Learning conferences across the world, including the O'Reilly AI, Strata, and Velocity Conferences. Previously, Chris was Founder at PipelineAI, where he worked with many AI-first startups and enterprises to continuously deploy ML/AI Pipelines using Apache Spark ML, Kubernetes, TensorFlow, Kubeflow, Amazon EKS, and Amazon SageMaker. He is also the author of the O'Reilly Online Training Series "High Performance TensorFlow in Production with GPUs".

Antje Barth is a Developer Advocate for AI and Machine Learning at AWS, based in Dusseldorf, Germany. She is also co-founder of the Dusseldorf chapter of Women in Big Data Meetup. Antje frequently speaks at AI and Machine Learning conferences and meetups around the world, including the O'Reilly AI and Strata conferences. Besides ML/AI, Antje is passionate about helping developers leverage Big Data, container, and Kubernetes platforms in the context of AI and Machine Learning. Prior to joining AWS, Antje worked in technical evangelist and solutions engineering roles at MapR and Cisco.

Permanent link: https://www.kriso.ee/db/97814920793476e.html
