Preface | vii | |

1 Essentials of Cloud Architecture | 1 | (36) |
| 1 | (4) |
Considerations for the Cloud | 2 | (2) |
Three Benefits of the Cloud | 4 | (1) |
| 5 | (4) |
| 5 | (1) |
| 6 | (2) |
| 8 | (1) |
Azure Environment Organization | 9 | (3) |
| 12 | (2) |
Welcome to the Azure Portal | 14 | (10) |
Setting Up a Resource Group | 14 | (4) |
| 18 | (4) |
| 22 | (2) |
Basics of the Bioinformatics Workflow | 24 | (13) |
| 24 | (2) |
| 26 | (3) |
| 29 | (1) |
| 30 | (1) |
| 30 | (7) |

2 Organizing Genomics Data with Data Lakes | 37 | (32) |
Organizing Your Genomics Data | 38 | (5) |
Going for Bronze, Silver, and Gold | 38 | (2) |
Letting Your Bioinformatics Workflow Dictate Your Data Lake Organization | 40 | (2) |
Planning for -omics and Non-omics Data Together | 42 | (1) |
Creating a Data Lake with Azure Storage | 43 | (6) |
Blob Storage Versus Data Lake Storage | 48 | (1) |
Balancing Costs Versus Performance in Data Storage | 49 | (9) |
The Goldilocks Method of Storage Tiers | 49 | (1) |
| 50 | (8) |
Managing Access Inside the Lake | 58 | (5) |
Role-Based Access Control | 59 | (2) |
| 61 | (2) |
Azure Open Datasets for Genomics | 63 | (6) |

3 Querying Variant Data in SQL | 69 | (44) |
Building a Genomics Data Warehouse | 71 | (6) |
| 71 | (1) |
Data Warehouse Architecture for Genomics | 72 | (5) |
| 77 | (19) |
Creating an Azure Synapse Analytics Workspace | 77 | (4) |
Registering Services in Subscriptions | 81 | (3) |
Getting to Work in the Synapse Workspace | 84 | (3) |
| 87 | (2) |
| 89 | (4) |
Did Someone Say "Pool Party"? | 93 | (3) |
Connecting to More Data Sources | 96 | (4) |
| 100 | (8) |
Creating a Database in Azure SQL DB | 100 | (8) |
Relaxing at Your Genomics Data Lakehouse | 108 | (5) |
| 109 | (4) |

4 Orchestrating Data Movement and Transformation | 113 | (36) |
Creating Your Data Factory | 114 | (5) |
Getting Started with Data Movement | 119 | (30) |
Getting Data into Your Data Lake Using the Copy Data Tool | 119 | (2) |
Linking to NCBI's FTP Server | 121 | (9) |
Transforming Data Using Data Flows | 130 | (17) |
Building and Triggering Pipelines for Automation | 147 | (2) |

5 Azure Databricks (and Apache Spark) | 149 | (46) |
Introduction to Apache Spark and Databricks | 149 | (4) |
Setting Up an Azure Databricks Workspace | 153 | (12) |
Connecting Databricks to Your Data Lake | 162 | (3) |
Processing Variant Data with the Glow Package | 165 | (5) |
| 168 | (2) |
Automating Variant Data Processing | 170 | (17) |
Orchestrating a Databricks Notebook from Data Factory | 173 | (12) |
A Brief Interlude About Distributed File Formats | 185 | (2) |
Using Other Tools in Databricks | 187 | (8) |
Single-Node Bioinformatics Tools | 188 | (1) |
| 189 | (1) |
| 190 | (5) |

| 195 | (32) |
How to Scale Machine Learning Tasks | 195 | (2) |
Creating an Azure Machine Learning Workspace | 197 | (3) |
Training a Drug Sensitivity Model | 200 | (13) |
Creating a Compute Instance in Azure Machine Learning Studio | 201 | (2) |
| 203 | (6) |
Experimenting with Cluster-Based Training | 209 | (4) |
Automating Model Training with AutoML | 213 | (5) |
Explainable Machine Learning | 216 | (2) |
Using Azure Machine Learning Not for Machine Learning | 218 | (9) |
Performing Alignment in a Notebook | 218 | (1) |
Custom Docker Images for Bioinformatics | 219 | (8) |

7 High-Performance Computing and Other Compute Services | 227 | (38) |
Bring Your Own Pipeline (BYOP) | 228 | (2) |
| 228 | (2) |
| 230 | (8) |
Scaling Workloads with Cromwell | 231 | (7) |
| 238 | (20) |
Setting Up CycleCloud Clusters | 239 | (19) |
| 258 | (7) |
Alignment and Variant Calling with the msgen Package | 258 | (7) |

8 Deployment, Security, Compliance, and Potpourri | 265 | (38) |
Automating the Deployment of Cloud Resources | 265 | (10) |
| 266 | (1) |
Lifting Your Deployment with ARMs and Biceps | 266 | (9) |
| 275 | (5) |
| 275 | (3) |
Role-Based Access Controls and Access-Control Lists | 278 | (2) |
| 280 | (6) |
HIPAA, HITECH, and HITRUST | 281 | (3) |
| 284 | (2) |
| 286 | (7) |
| 286 | (2) |
Retail Pricing Versus Enterprise Agreements | 288 | (1) |
| 289 | (4) |
| 293 | (4) |
Please, Sir, Can I Have Some More (vCPUs)? | 295 | (2) |
| 297 | (6) |

Conclusion | 303 | (4) |
Index | 307 | |