E-book: Genomics in the Azure Cloud

  • Format: 330 pages
  • Publication date: 14-Nov-2022
  • Publisher: O'Reilly Media
  • Language: English
  • ISBN-13: 9781098139018
  • Format: PDF+DRM
  • Price: 56,15 €*
  • * The price is final; no further discounts apply
  • This e-book is for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books; not to be confused with Adobe Reader, which is probably already installed on your computer)

    This e-book cannot be read on an Amazon Kindle.

This practical guide bridges the gap between general cloud computing architecture in Microsoft Azure and scientific computing for bioinformatics and genomics. You'll get a solid understanding of the architecture patterns and services that are offered in Azure and how they might be used in your bioinformatics practice. You'll get code examples that you can reuse for your specific needs. And you'll get plenty of concrete examples to illustrate how a given service is used in a bioinformatics context.

You'll also get valuable advice on how to:

  • Use enterprise platform services to easily scale your bioinformatics workloads
  • Organize, query, and analyze genomic data at scale
  • Build a genomics data lake and accompanying data warehouse
  • Use Azure Machine Learning to scale your model training, track model performance, and deploy winning models
  • Orchestrate and automate processing pipelines using Azure Data Factory and Databricks
  • Cloudify your organization's existing bioinformatics pipelines by moving your workflows to Azure high-performance compute services
  • And more

Preface
1 Essentials of Cloud Architecture
    Cloud Horsepower
        Considerations for the Cloud
        Three Benefits of the Cloud
    Types of Cloud Services
        Infrastructure Services
        Platform Services
        Software Services
    Azure Environment Organization
    Getting an Azure Account
    Welcome to the Azure Portal
        Setting Up a Resource Group
        Creating Resources
        Free Services
    Basics of the Bioinformatics Workflow
        Primary Analysis
        Secondary Analysis
        Tertiary Analysis
        Other Analyses
        Other File Formats
2 Organizing Genomics Data with Data Lakes
    Organizing Your Genomics Data
        Going for Bronze, Silver, and Gold
        Letting Your Bioinformatics Workflow Dictate Your Data Lake Organization
        Planning for -omics and Non-omics Data Together
    Creating a Data Lake with Azure Storage
        Blob Storage Versus Data Lake Storage
    Balancing Costs Versus Performance in Data Storage
        The Goldilocks Method of Storage Tiers
        Genomics Data Lifecycle
    Managing Access Inside the Lake
        Role-Based Access Control
        Access-Control Lists
    Azure Open Datasets for Genomics
3 Querying Variant Data in SQL
    Building a Genomics Data Warehouse
        Example: Lab Results
        Data Warehouse Architecture for Genomics
    Azure Synapse Analytics
        Creating an Azure Synapse Analytics Workspace
        Registering Services in Subscriptions
        Getting to Work in the Synapse Workspace
        Using Open Row Sets
        Creating External Tables
        Did Someone Say "Pool Party"?
    Connecting to More Data Sources
    Azure SQL DB
        Creating a Database in Azure SQL DB
    Relaxing at Your Genomics Data Lakehouse
        Efficient File Formats
4 Orchestrating Data Movement and Transformation
    Creating Your Data Factory
    Getting Started with Data Movement
        Getting Data into Your Data Lake Using the Copy Data Tool
        Linking to NCBI's FTP Server
        Transforming Data Using Data Flows
        Building and Triggering Pipelines for Automation
5 Azure Databricks (and Apache Spark)
    Introduction to Apache Spark and Databricks
    Setting Up an Azure Databricks Workspace
        Connecting Databricks to Your Data Lake
    Processing Variant Data with the Glow Package
        Exploring DataFrames
    Automating Variant Data Processing
        Orchestrating a Databricks Notebook from Data Factory
        A Brief Interlude About Distributed File Formats
    Using Other Tools in Databricks
        Single-Node Bioinformatics Tools
        Koalas
        Hail
6 Azure Machine Learning
    How to Scale Machine Learning Tasks
    Creating an Azure Machine Learning Workspace
    Training a Drug Sensitivity Model
        Creating a Compute Instance in Azure Machine Learning Studio
        Datastores and Datasets
        Experimenting with Cluster-Based Training
    Automating Model Training with AutoML
        Explainable Machine Learning
    Using Azure Machine Learning Not for Machine Learning
        Performing Alignment in a Notebook
        Custom Docker Images for Bioinformatics
7 High-Performance Computing and Other Compute Services
    Bring Your Own Pipeline (BYOP)
        Why Azure for HPC?
    Azure Batch
        Scaling Workloads with Cromwell
    Azure CycleCloud
        Setting Up CycleCloud Clusters
    Microsoft Genomics
        Alignment and Variant Calling with the msgen Package
8 Deployment, Security, Compliance, and Potpourri
    Automating the Deployment of Cloud Resources
        Dev, Staging, and Prod
        Lifting Your Deployment with ARMs and Biceps
    Security Planning
        Azure Active Directory
        Role-Based Access Controls and Access-Control Lists
    Compliance
        HIPAA, HITECH, and HITRUST
        Azure Blueprints
    Cost Considerations
        Azure Pricing Calculator
        Retail Pricing Versus Enterprise Agreements
        Budgeting Examples
    Quota Problems
        Please, Sir, Can I Have Some More (vCPUs)?
    Getting General Support
Conclusion
Index
Dr. Colby T. Ford is a professional AI cloud architect, data scientist, and computational biologist who uses machine learning and distributed computing to solve problems in the fields of infectious diseases and human genomics. For the last 8+ years, he has been consulting for companies across industries, leading the conversation for digital transformation using artificial intelligence and cloud computing. He currently serves as the Principal of Life Sciences at BlueGranite, a top-tier Microsoft partner, and focuses on building cloud-based bioinformatics solutions in the Azure cloud. In academia, his research includes the use of large-scale machine learning architecture in the study of infectious disease genomics and rare human diseases. In addition to his consulting and academic career, Dr. Ford is a co-founder of a digital health startup that focuses on the use of wearable devices to help study neurological disorders. Drawing on an interdisciplinary education and parallel experience in industry and academia, Dr. Ford approaches genomics research problems by blending cutting-edge technologies previously used only in industry with methods previously seen only in academia.