
E-book: Data Science on AWS

3.94/5 (84 ratings by Goodreads)
  • Format: 524 pages
  • Pub. Date: 07-Apr-2021
  • Publisher: O'Reilly Media
  • Language: English
  • ISBN-13: 9781492079347
  • Format - EPUB+DRM
  • Price: 56,15 €*
  • * the price is final i.e. no additional discount will apply
  • This ebook is for personal use only. E-Books are non-refundable.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has supplied this book in encrypted form, which means that you need to install free software in order to unlock and read it. To read this e-book, you have to create an Adobe ID. The e-book can be read and downloaded on up to 6 devices (single user with the same Adobe ID).

    Required software
    To read this e-book on a mobile device (phone or tablet), you'll need to install this free app: PocketBook Reader (iOS / Android).

    To download and read this e-book on a PC or Mac, you need Adobe Digital Editions (this is a free app specially developed for e-books; it's not the same as Adobe Reader, which you probably already have on your computer).

    You can't read this e-book with Amazon Kindle.

With this practical book, AI and machine learning practitioners will learn how to successfully build and deploy data science projects on Amazon Web Services. The Amazon AI and machine learning stack unifies data science, data engineering, and application development to help level up your skills. This guide shows you how to build and run pipelines in the cloud, then integrate the results into applications in minutes instead of days. Throughout the book, authors Chris Fregly and Antje Barth demonstrate how to reduce cost and improve performance.

  • Apply the Amazon AI and ML stack to real-world use cases for natural language processing, computer vision, fraud detection, conversational devices, and more
  • Use automated machine learning to implement a specific subset of use cases with SageMaker Autopilot
  • Dive deep into the complete model development lifecycle for a BERT-based NLP use case including data ingestion, analysis, model training, and deployment
  • Tie everything together into a repeatable machine learning operations pipeline
  • Explore real-time ML, anomaly detection, and streaming analytics on data streams with Amazon Kinesis and Managed Streaming for Apache Kafka
  • Learn security best practices for data science projects and workflows including identity and access management, authentication, authorization, and more

Table of contents:
Preface
1 Introduction to Data Science on AWS
  • Benefits of Cloud Computing
  • Data Science Pipelines and Workflows
  • MLOps Best Practices
  • Amazon AI Services and AutoML with Amazon SageMaker
  • Data Ingestion, Exploration, and Preparation in AWS
  • Model Training and Tuning with Amazon SageMaker
  • Model Deployment with Amazon SageMaker and AWS Lambda Functions
  • Streaming Analytics and Machine Learning on AWS
  • AWS Infrastructure and Custom-Built Hardware
  • Reduce Cost with Tags, Budgets, and Alerts
  • Summary
2 Data Science Use Cases
  • Innovation Across Every Industry
  • Personalized Product Recommendations
  • Detect Inappropriate Videos with Amazon Rekognition
  • Demand Forecasting
  • Identify Fake Accounts with Amazon Fraud Detector
  • Enable Privacy-Leak Detection with Amazon Macie
  • Conversational Devices and Voice Assistants
  • Text Analysis and Natural Language Processing
  • Cognitive Search and Natural Language Understanding
  • Intelligent Customer Support Centers
  • Industrial AI Services and Predictive Maintenance
  • Home Automation with AWS IoT and Amazon SageMaker
  • Extract Medical Information from Healthcare Documents
  • Self-Optimizing and Intelligent Cloud Infrastructure
  • Cognitive and Predictive Business Intelligence
  • Educating the Next Generation of AI and ML Developers
  • Program Nature's Operating System with Quantum Computing
  • Increase Performance and Reduce Cost
  • Summary
3 Automated Machine Learning
  • Automated Machine Learning with SageMaker Autopilot
  • Track Experiments with SageMaker Autopilot
  • Train and Deploy a Text Classifier with SageMaker Autopilot
  • Automated Machine Learning with Amazon Comprehend
  • Summary
4 Ingest Data into the Cloud
  • Data Lakes
  • Query the Amazon S3 Data Lake with Amazon Athena
  • Continuously Ingest New Data with AWS Glue Crawler
  • Build a Lake House with Amazon Redshift Spectrum
  • Choose Between Amazon Athena and Amazon Redshift
  • Reduce Cost and Increase Performance
  • Summary
5 Explore the Dataset
  • Tools for Exploring Data in AWS
  • Visualize Our Data Lake with SageMaker Studio
  • Query Our Data Warehouse
  • Create Dashboards with Amazon QuickSight
  • Detect Data-Quality Issues with Amazon SageMaker and Apache Spark
  • Detect Bias in Our Dataset
  • Detect Different Types of Drift with SageMaker Clarify
  • Analyze Our Data with AWS Glue DataBrew
  • Reduce Cost and Increase Performance
  • Summary
6 Prepare the Dataset for Model Training
  • Perform Feature Selection and Engineering
  • Scale Feature Engineering with SageMaker Processing Jobs
  • Share Features Through SageMaker Feature Store
  • Ingest and Transform Data with SageMaker Data Wrangler
  • Track Artifact and Experiment Lineage with Amazon SageMaker
  • Ingest and Transform Data with AWS Glue DataBrew
  • Summary
7 Train Your First Model
  • Understand the SageMaker Infrastructure
  • Deploy a Pre-Trained BERT Model with SageMaker JumpStart
  • Develop a SageMaker Model
  • A Brief History of Natural Language Processing
  • BERT Transformer Architecture
  • Training BERT from Scratch
  • Fine Tune a Pre-Trained BERT Model
  • Create the Training Script
  • Launch the Training Script from a SageMaker Notebook
  • Evaluate Models
  • Debug and Profile Model Training with SageMaker Debugger
  • Interpret and Explain Model Predictions
  • Detect Model Bias and Explain Predictions
  • More Training Options for BERT
  • Reduce Cost and Increase Performance
  • Summary
8 Train and Optimize Models at Scale
  • Automatically Find the Best Model Hyper-Parameters
  • Use Warm Start for Additional SageMaker Hyper-Parameter Tuning Jobs
  • Scale Out with SageMaker Distributed Training
  • Reduce Cost and Increase Performance
  • Summary
9 Deploy Models to Production
  • Choose Real-Time or Batch Predictions
  • Real-Time Predictions with SageMaker Endpoints
  • Auto-Scale SageMaker Endpoints Using Amazon CloudWatch
  • Strategies to Deploy New and Updated Models
  • Testing and Comparing New Models
  • Monitor Model Performance and Detect Drift
  • Monitor Data Quality of Deployed SageMaker Endpoints
  • Monitor Model Quality of Deployed SageMaker Endpoints
  • Monitor Bias Drift of Deployed SageMaker Endpoints
  • Monitor Feature Attribution Drift of Deployed SageMaker Endpoints
  • Perform Batch Predictions with SageMaker Batch Transform
  • AWS Lambda Functions and Amazon API Gateway
  • Optimize and Manage Models at the Edge
  • Deploy a PyTorch Model with TorchServe
  • TensorFlow-BERT Inference with AWS Deep Java Library
  • Reduce Cost and Increase Performance
  • Summary
10 Pipelines and MLOps
  • Machine Learning Operations
  • Software Pipelines
  • Machine Learning Pipelines
  • Pipeline Orchestration with SageMaker Pipelines
  • Automation with SageMaker Pipelines
  • More Pipeline Options
  • Human-in-the-Loop Workflows
  • Reduce Cost and Improve Performance
  • Summary
11 Streaming Analytics and Machine Learning
  • Online Learning Versus Offline Learning
  • Streaming Applications
  • Windowed Queries on Streaming Data
  • Streaming Analytics and Machine Learning on AWS
  • Classify Real-Time Product Reviews with Amazon Kinesis, AWS Lambda, and Amazon SageMaker
  • Implement Streaming Data Ingest Using Amazon Kinesis Data Firehose
  • Summarize Real-Time Product Reviews with Streaming Analytics
  • Setting Up Amazon Kinesis Data Analytics
  • Amazon Kinesis Data Analytics Applications
  • Classify Product Reviews with Apache Kafka, AWS Lambda, and Amazon SageMaker
  • Reduce Cost and Improve Performance
  • Summary
12 Secure Data Science on AWS
  • Shared Responsibility Model Between AWS and Customers
  • Applying AWS Identity and Access Management
  • Isolating Compute and Network Environments
  • Securing Amazon S3 Data Access
  • Encryption at Rest
  • Encryption in Transit
  • Securing SageMaker Notebook Instances
  • Securing SageMaker Studio
  • Securing SageMaker Jobs and Models
  • Securing AWS Lake Formation
  • Securing Database Credentials with AWS Secrets Manager
  • Governance
  • Auditability
  • Reduce Cost and Improve Performance
  • Summary
Index

Chris Fregly is a Developer Advocate for AI and Machine Learning at AWS, based in San Francisco, California. He is also the founder of the Advanced Spark, TensorFlow, and KubeFlow Meetup Series based in San Francisco. Chris regularly speaks at AI and machine learning conferences around the world, including the O'Reilly AI, Strata, and Velocity conferences. Previously, Chris was the founder of PipelineAI, where he worked with many AI-first startups and enterprises to continuously deploy ML/AI pipelines using Apache Spark ML, Kubernetes, TensorFlow, Kubeflow, Amazon EKS, and Amazon SageMaker. He is also the author of the O'Reilly Online Training Series "High Performance TensorFlow in Production with GPUs".

Antje Barth is a Developer Advocate for AI and Machine Learning at AWS, based in Dusseldorf, Germany. She is also a co-founder of the Dusseldorf chapter of the Women in Big Data Meetup. Antje frequently speaks at AI and machine learning conferences and meetups around the world, including the O'Reilly AI and Strata conferences. Besides ML/AI, Antje is passionate about helping developers leverage big data, container, and Kubernetes platforms in the context of AI and machine learning. Prior to joining AWS, Antje worked in technical evangelist and solutions engineering roles at MapR and Cisco.