E-book: Official Google Cloud Certified Professional Data Engineer Study Guide

  • Format: 352 pages
  • Publication date: 18-May-2020
  • Publisher: Sybex Inc., U.S.
  • Language: English
  • ISBN-13: 9781119618447
  • Format: PDF+DRM
  • Price: 46,31 €*
  • * The price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    To read the e-book, you need to create an Adobe ID and install Adobe Digital Editions on your computer. The e-book can be downloaded to and read on up to 6 devices.
    The e-book cannot be read on an Amazon Kindle. The other e-readers offered in our e-shop can read e-books protected with an Adobe ID.

The proven Study Guide that prepares you for this new Google Cloud exam

The Google Cloud Certified Professional Data Engineer Study Guide provides everything you need to prepare for this important exam and master the skills necessary to land that coveted Google Cloud Professional Data Engineer certification. The book begins with a pre-book assessment quiz to evaluate what you know before you start, each chapter features exam objectives and review questions, and the online learning environment includes additional complete practice tests.

Written by Dan Sullivan, a popular and experienced online course author for machine learning, big data, and cloud topics, the Google Cloud Certified Professional Data Engineer Study Guide is your ace in the hole for deploying and managing analytics and machine learning applications.

  • Build and operationalize storage systems, pipelines, and compute infrastructure
  • Understand machine learning models and learn how to select pre-built models
  • Monitor and troubleshoot machine learning models
  • Design analytics and machine learning applications that are secure, scalable, and highly available

This exam guide is designed to help you develop an in-depth understanding of data engineering and machine learning on Google Cloud Platform.

Introduction
Assessment Test

Chapter 1: Selecting Appropriate Storage Technologies
    From Business Requirements to Storage Systems; Ingest; Store; Process and Analyze; Explore and Visualize; Technical Aspects of Data: Volume, Velocity, Variation, Access, and Security; Volume; Velocity; Variation in Structure; Data Access Patterns; Security Requirements; Types of Structure: Structured, Semi-Structured, and Unstructured; Structured: Transactional vs. Analytical; Semi-Structured: Fully Indexed vs. Row Key Access; Unstructured Data; Google's Storage Decision Tree; Schema Design Considerations; Relational Database Design; NoSQL Database Design; Exam Essentials; Review Questions

Chapter 2: Building and Operationalizing Storage Systems
    Cloud SQL; Configuring Cloud SQL; Improving Read Performance with Read Replicas; Importing and Exporting Data; Cloud Spanner; Configuring Cloud Spanner; Replication in Cloud Spanner; Database Design Considerations; Importing and Exporting Data; Cloud Bigtable; Configuring Bigtable; Database Design Considerations; Importing and Exporting; Cloud Firestore; Cloud Firestore Data Model; Indexing and Querying; Importing and Exporting; BigQuery; BigQuery Datasets; Loading and Exporting Data; Clustering, Partitioning, and Sharding Tables; Streaming Inserts; Monitoring and Logging in BigQuery; BigQuery Cost Considerations; Tips for Optimizing BigQuery; Cloud Memorystore; Cloud Storage; Organizing Objects in a Namespace; Storage Tiers; Cloud Storage Use Cases; Data Retention and Lifecycle Management; Unmanaged Databases; Exam Essentials; Review Questions

Chapter 3: Designing Data Pipelines
    Overview of Data Pipelines; Data Pipeline Stages; Types of Data Pipelines; GCP Pipeline Components; Cloud Pub/Sub; Cloud Dataflow; Cloud Dataproc; Cloud Composer; Migrating Hadoop and Spark to GCP; Exam Essentials; Review Questions

Chapter 4: Designing a Data Processing Solution
    Designing Infrastructure; Choosing Infrastructure; Availability, Reliability, and Scalability of Infrastructure; Hybrid Cloud and Edge Computing; Designing for Distributed Processing; Distributed Processing: Messaging; Distributed Processing: Services; Migrating a Data Warehouse; Assessing the Current State of a Data Warehouse; Designing the Future State of a Data Warehouse; Migrating Data, Jobs, and Access Controls; Validating the Data Warehouse; Exam Essentials; Review Questions

Chapter 5: Building and Operationalizing Processing Infrastructure
    Provisioning and Adjusting Processing Resources; Provisioning and Adjusting Compute Engine; Provisioning and Adjusting Kubernetes Engine; Provisioning and Adjusting Cloud Bigtable; Provisioning and Adjusting Cloud Dataproc; Configuring Managed Serverless Processing Services; Monitoring Processing Resources; Stackdriver Monitoring; Stackdriver Logging; Stackdriver Trace; Exam Essentials; Review Questions

Chapter 6: Designing for Security and Compliance
    Identity and Access Management with Cloud IAM; Predefined Roles; Custom Roles; Using Roles with Service Accounts; Access Control with Policies; Using IAM with Storage and Processing Services; Cloud Storage and IAM; Cloud Bigtable and IAM; BigQuery and IAM; Cloud Dataflow and IAM; Data Security; Encryption; Key Management; Ensuring Privacy with the Data Loss Prevention API; Detecting Sensitive Data; Running Data Loss Prevention Jobs; Inspection Best Practices; Legal Compliance; Health Insurance Portability and Accountability Act (HIPAA); Children's Online Privacy Protection Act; FedRAMP; General Data Protection Regulation; Exam Essentials; Review Questions

Chapter 7: Designing Databases for Reliability, Scalability, and Availability
    Designing Cloud Bigtable Databases for Scalability and Reliability; Data Modeling with Cloud Bigtable; Designing Row-keys; Designing for Time Series; Use Replication for Availability and Scalability; Designing Cloud Spanner Databases for Scalability and Reliability; Relational Database Features; Interleaved Tables; Primary Keys and Hotspots; Database Splits; Secondary Indexes; Query Best Practices; Designing BigQuery Databases for Data Warehousing; Schema Design for Data Warehousing; Clustered and Partitioned Tables; Querying Data in BigQuery; External Data Access; BigQuery ML; Exam Essentials; Review Questions

Chapter 8: Understanding Data Operations for Flexibility and Portability
    Cataloging and Discovery with Data Catalog; Searching in Data Catalog; Tagging in Data Catalog; Data Preprocessing with Dataprep; Cleansing Data; Discovering Data; Enriching Data; Importing and Exporting Data; Structuring and Validating Data; Visualizing with Data Studio; Connecting to Data Sources; Visualizing Data; Sharing Data; Exploring Data with Cloud Datalab; Jupyter Notebooks; Managing Cloud Datalab Instances; Adding Libraries to Cloud Datalab Instances; Orchestrating Workflows with Cloud Composer; Airflow Environments; Creating DAGs; Airflow Logs; Exam Essentials; Review Questions

Chapter 9: Deploying Machine Learning Pipelines
    Structure of ML Pipelines; Data Ingestion; Data Preparation; Data Segregation; Model Training; Model Evaluation; Model Deployment; Model Monitoring; GCP Options for Deploying Machine Learning Pipeline; Cloud AutoML; BigQuery ML; Kubeflow; Spark Machine Learning; Exam Essentials; Review Questions

Chapter 10: Choosing Training and Serving Infrastructure
    Hardware Accelerators; Graphics Processing Units; Tensor Processing Units; Choosing Between CPUs, GPUs, and TPUs; Distributed and Single Machine Infrastructure; Single Machine Model Training; Distributed Model Training; Serving Models; Edge Computing with GCP; Edge Computing Overview; Edge Computing Components and Processes; Edge TPU; Cloud IoT; Exam Essentials; Review Questions

Chapter 11: Measuring, Monitoring, and Troubleshooting Machine Learning Models
    Three Types of Machine Learning Algorithms; Supervised Learning; Unsupervised Learning; Anomaly Detection; Reinforcement Learning; Deep Learning; Engineering Machine Learning Models; Model Training and Evaluation; Operationalizing ML Models; Common Sources of Error in Machine Learning Models; Data Quality; Unbalanced Training Sets; Types of Bias; Exam Essentials; Review Questions

Chapter 12: Leveraging Prebuilt Models as a Service
    Sight; Vision AI; Video AI; Conversation; Dialogflow; Cloud Text-to-Speech API; Cloud Speech-to-Text API; Language; Translation; Natural Language; Structured Data; Recommendations AI API; Cloud Inference API; Exam Essentials; Review Questions

Appendix: Answers to Review Questions
  • Chapter 1: Selecting Appropriate Storage Technologies
  • Chapter 2: Building and Operationalizing Storage Systems
  • Chapter 3: Designing Data Pipelines
  • Chapter 4: Designing a Data Processing Solution
  • Chapter 5: Building and Operationalizing Processing Infrastructure
  • Chapter 6: Designing for Security and Compliance
  • Chapter 7: Designing Databases for Reliability, Scalability, and Availability
  • Chapter 8: Understanding Data Operations for Flexibility and Portability
  • Chapter 9: Deploying Machine Learning Pipelines
  • Chapter 10: Choosing Training and Serving Infrastructure
  • Chapter 11: Measuring, Monitoring, and Troubleshooting Machine Learning Models
  • Chapter 12: Leveraging Prebuilt Models as a Service

Index

DAN SULLIVAN is a software architect specializing in data architecture, machine learning, and cloud computing. He is a Google Cloud Certified Professional Data Engineer, Professional Architect, and Associate Cloud Engineer, and the author of six books and numerous articles. He is an instructor with LinkedIn Learning and Udemy for Business.