Introduction
Assessment Test
Chapter 1 Selecting Appropriate Storage Technologies
From Business Requirements to Storage Systems
Technical Aspects of Data: Volume, Velocity, Variation, Access, and Security
Types of Structure: Structured, Semi-Structured, and Unstructured
Structured: Transactional vs. Analytical
Semi-Structured: Fully Indexed vs. Row Key Access
Google's Storage Decision Tree
Schema Design Considerations
Relational Database Design
Chapter 2 Building and Operationalizing Storage Systems
Improving Read Performance with Read Replicas
Importing and Exporting Data
Configuring Cloud Spanner
Replication in Cloud Spanner
Database Design Considerations
Importing and Exporting Data
Database Design Considerations
Cloud Firestore Data Model
Loading and Exporting Data
Clustering, Partitioning, and Sharding Tables
Monitoring and Logging in BigQuery
BigQuery Cost Considerations
Tips for Optimizing BigQuery
Organizing Objects in a Namespace
Data Retention and Lifecycle Management
Chapter 3 Designing Data Pipelines
Overview of Data Pipelines
Migrating Hadoop and Spark to GCP
Chapter 4 Designing a Data Processing Solution
Availability, Reliability, and Scalability of Infrastructure
Hybrid Cloud and Edge Computing
Designing for Distributed Processing
Distributed Processing: Messaging
Distributed Processing: Services
Migrating a Data Warehouse
Assessing the Current State of a Data Warehouse
Designing the Future State of a Data Warehouse
Migrating Data, Jobs, and Access Controls
Validating the Data Warehouse
Chapter 5 Building and Operationalizing Processing Infrastructure
Provisioning and Adjusting Processing Resources
Provisioning and Adjusting Compute Engine
Provisioning and Adjusting Kubernetes Engine
Provisioning and Adjusting Cloud Bigtable
Provisioning and Adjusting Cloud Dataproc
Configuring Managed Serverless Processing Services
Monitoring Processing Resources
Chapter 6 Designing for Security and Compliance
Identity and Access Management with Cloud IAM
Using Roles with Service Accounts
Access Control with Policies
Using IAM with Storage and Processing Services
Ensuring Privacy with the Data Loss Prevention API
Running Data Loss Prevention Jobs
Inspection Best Practices
Health Insurance Portability and Accountability Act (HIPAA)
Children's Online Privacy Protection Act
General Data Protection Regulation
Chapter 7 Designing Databases for Reliability, Scalability, and Availability
Designing Cloud Bigtable Databases for Scalability and Reliability
Data Modeling with Cloud Bigtable
Designing for Time Series
Use Replication for Availability and Scalability
Designing Cloud Spanner Databases for Scalability and Reliability
Relational Database Features
Primary Keys and Hotspots
Designing BigQuery Databases for Data Warehousing
Schema Design for Data Warehousing
Clustered and Partitioned Tables
Querying Data in BigQuery
Chapter 8 Understanding Data Operations for Flexibility and Portability
Cataloging and Discovery with Data Catalog
Searching in Data Catalog
Data Preprocessing with Dataprep
Importing and Exporting Data
Structuring and Validating Data
Visualizing with Data Studio
Connecting to Data Sources
Exploring Data with Cloud Datalab
Managing Cloud Datalab Instances
Adding Libraries to Cloud Datalab Instances
Orchestrating Workflows with Cloud Composer
Chapter 9 Deploying Machine Learning Pipelines
Structure of ML Pipelines
GCP Options for Deploying Machine Learning Pipeline
Chapter 10 Choosing Training and Serving Infrastructure
Graphics Processing Units
Choosing Between CPUs, GPUs, and TPUs
Distributed and Single Machine Infrastructure
Single Machine Model Training
Distributed Model Training
Edge Computing Components and Processes
Chapter 11 Measuring, Monitoring, and Troubleshooting Machine Learning Models
Three Types of Machine Learning Algorithms
Engineering Machine Learning Models
Model Training and Evaluation
Operationalizing ML Models
Common Sources of Error in Machine Learning Models
Chapter 12 Leveraging Prebuilt Models as a Service
Appendix Answers to Review Questions
Chapter 1 Selecting Appropriate Storage Technologies
Chapter 2 Building and Operationalizing Storage Systems
Chapter 3 Designing Data Pipelines
Chapter 4 Designing a Data Processing Solution
Chapter 5 Building and Operationalizing Processing Infrastructure
Chapter 6 Designing for Security and Compliance
Chapter 7 Designing Databases for Reliability, Scalability, and Availability
Chapter 8 Understanding Data Operations for Flexibility and Portability
Chapter 9 Deploying Machine Learning Pipelines
Chapter 10 Choosing Training and Serving Infrastructure
Chapter 11 Measuring, Monitoring, and Troubleshooting Machine Learning Models
Chapter 12 Leveraging Prebuilt Models as a Service
Index