E-book: Computational Genomics with R

4.00/5 (6 ratings from Goodreads)

Altuna Akalin

Format: 462 pages
Series: Chapman & Hall/CRC Computational Biology Series
Publication date: 16-Dec-2020
Publisher: Chapman & Hall/CRC
Language: English
ISBN-13: 9780429532764

Format: EPUB+DRM
Price: 57,19 €*
* The price is final; no further discounts apply.

This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

Copying (copy/paste): not allowed

Printing: not allowed

Usage:

Digital rights management (DRM)
The publisher has issued this e-book in encrypted form, so you need to install special software to read it. You also need to create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

Required software
To read on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

To read on a PC or Mac, you need to install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

This e-book cannot be read on an Amazon Kindle.

Computational Genomics with R provides a starting point for beginners in genomic data analysis and also guides more advanced practitioners to sophisticated data analysis techniques in genomics. The book covers topics from R programming, to machine learning and statistics, to the latest genomic data analysis techniques. The text provides accessible information and explanations, always with the genomics context in the background. It also contains practical and well-documented examples in R, so readers can analyze their own data by reusing the code presented. Because the field of computational genomics is interdisciplinary, it requires different starting points for people with different backgrounds. For example, a biologist might skip sections on basic genome biology and start with R programming, whereas a computer scientist might want to start with genome biology.

After reading:

You will have the basics of R and be able to dive right into specialized uses of R for computational genomics such as using Bioconductor packages.
You will be familiar with statistics, supervised and unsupervised learning techniques that are important in data modeling, and exploratory analysis of high-dimensional data.
You will understand genomic intervals and operations on them that are used for tasks such as aligned read counting and genomic feature annotation.
You will know the basics of processing and quality checking high-throughput sequencing data.
You will be able to do sequence analysis, such as calculating GC content for parts of a genome or finding transcription factor binding sites.
You will know about visualization techniques used in genomics, such as heatmaps, meta-gene plots, and genomic track visualization.
You will be familiar with analysis of different high-throughput sequencing data sets, such as RNA-seq, ChIP-seq, and BS-seq.
You will know basic techniques for integrating and interpreting multi-omics datasets.

Altuna Akalin is a group leader and head of the Bioinformatics and Omics Data Science Platform at the Berlin Institute of Medical Systems Biology, Max Delbrück Center, Berlin. He has been developing computational methods for analyzing and integrating large-scale genomics data sets since 2002. He has published an extensive body of work in this area. The framework for this book grew out of the yearly computational genomics courses he has been organizing and teaching since 2015.

Reviews

'This book provides a basic overview of computational tools developed in R for carrying out data analyses in genomics. It can be a valuable companion for anyone who wants to utilise the computational tools developed within the Bioconductor and R environments for education and research. This book's main target audience are students of computational biology to get a first look at the diversity of machine learning methods. The book will also serve well biomedical researchers needing a guide to packages that can help them with the analysis of data that they encounter in their work.'

- Krzysztof Podgórski, International Statistical Review (2021) doi: 10.1111/insr.12453

Preface

About the Authors

1 Introduction to Genomics
1.1 Genes, DNA and central dogma
1.1.1 What is a genome?
1.1.2 What is a gene?
1.1.3 How are genes controlled? Transcriptional and posttranscriptional regulation
1.1.4 What does a gene look like?
1.2 Elements of gene regulation
1.2.1 Transcriptional regulation
1.2.2 Post-transcriptional regulation
1.3 Shaping the genome: DNA mutation
1.4 High-throughput experimental methods in genomics
1.4.1 The general idea behind high-throughput techniques
1.4.2 High-throughput sequencing
1.5 Visualization and data repositories for genomics

2 Introduction to R for Genomic Data Analysis
2.1 Steps of (genomic) data analysis
2.1.1 Data collection
2.1.2 Data quality check and cleaning
2.1.3 Data processing
2.1.4 Exploratory data analysis and modeling
2.1.5 Visualization and reporting
2.1.6 Why use R for genomics?
2.2 Getting started with R
2.2.1 Installing packages
2.2.2 Installing packages in custom locations
2.2.3 Getting help on functions and packages
2.3 Computations in R
2.4 Data structures
2.4.1 Vectors
2.4.2 Matrices
2.4.3 Data frames
2.4.4 Lists
2.4.5 Factors
2.5 Data types
2.6 Reading and writing data
2.6.1 Reading large files
2.7 Plotting in R with base graphics
2.7.1 Combining multiple plots
2.7.2 Saving plots
2.8 Plotting in R with ggplot2
2.8.1 Combining multiple plots
2.8.2 ggplot2 and tidyverse
2.9 Functions and control structures (for, if/else, etc.)
2.9.1 User-defined functions
2.9.2 Loops and looping structures in R
2.10 Exercises
2.10.1 Computations in R
2.10.2 Data structures in R
2.10.3 Reading in and writing data out in R
2.10.4 Plotting in R
2.10.5 Functions and control structures (for, if/else, etc.)

3 Statistics for Genomics
3.1 How to summarize collection of data points: The idea behind statistical distributions
3.1.1 Describing the central tendency: Mean and median
3.1.2 Describing the spread: Measurements of variation
3.1.3 Precision of estimates: Confidence intervals
3.2 How to test for differences between samples
3.2.1 Randomization-based testing for difference of the means
3.2.2 Using t-test for difference of the means between two samples
3.2.3 Multiple testing correction
3.2.4 Moderated t-tests: Using information from multiple comparisons
3.3 Relationship between variables: Linear models and correlation
3.3.1 How to fit a line
3.3.2 How to estimate the error of the coefficients
3.3.3 Accuracy of the model
3.3.4 Regression with categorical variables
3.3.5 Regression pitfalls
3.4 Exercises
3.4.1 How to summarize collection of data points: The idea behind statistical distributions
3.4.2 How to test for differences in samples
3.4.3 Relationship between variables: Linear models and correlation

4 Exploratory Data Analysis with Unsupervised Machine Learning
4.1 Clustering: Grouping samples based on their similarity
4.1.1 Distance metrics
4.1.2 Hierarchical clustering
4.1.3 K-means clustering
4.1.4 How to choose "k", the number of clusters
4.2 Dimensionality reduction techniques: Visualizing complex data sets in 2D
4.2.1 Principal component analysis
4.2.2 Other matrix factorization methods for dimensionality reduction
4.2.3 Multi-dimensional scaling
4.2.4 t-Distributed Stochastic Neighbor Embedding (t-SNE)
4.3 Exercises
4.3.1 Clustering
4.3.2 Dimension reduction

5 Predictive Modeling with Supervised Machine Learning
5.1 How are machine learning models fit?
5.1.1 Machine learning vs. statistics
5.2 Steps in supervised machine learning
5.3 Use case: Disease subtype from genomics data
5.4 Data preprocessing
5.4.1 Data transformation
5.4.2 Filtering data and scaling
5.4.3 Dealing with missing values
5.5 Splitting the data
5.5.1 Holdout test dataset
5.5.2 Cross-validation
5.5.3 Bootstrap resampling
5.6 Predicting the subtype with k-nearest neighbors
5.7 Assessing the performance of our model
5.7.1 Receiver Operating Characteristic (ROC) curves
5.8 Model tuning and avoiding overfitting
5.8.1 Model complexity and bias variance trade-off
5.8.2 Data split strategies for model tuning and testing
5.9 Variable importance
5.10 How to deal with class imbalance
5.10.1 Sampling for class balance
5.10.2 Altering case weights
5.10.3 Selecting different classification score cutoffs
5.11 Dealing with correlated predictors
5.12 Trees and forests: Random forests in action
5.12.1 Decision trees
5.12.2 Trees to forests
5.12.3 Variable importance
5.13 Logistic regression and regularization
5.13.1 Regularization in order to avoid overfitting
5.13.2 Variable importance
5.14 Other supervised algorithms
5.14.1 Gradient boosting
5.14.2 Support Vector Machines (SVM)
5.14.3 Neural networks and deep versions of it
5.14.4 Ensemble learning
5.15 Predicting continuous variables: Regression with machine learning
5.15.1 Use case: Predicting age from DNA methylation
5.15.2 Reading and processing the data
5.15.3 Running random forest regression
5.16 Exercises
5.16.1 Classification
5.16.2 Regression

6 Operations on Genomic Intervals and Genome Arithmetic
6.1 Operations on genomic intervals with GenomicRanges package
6.1.1 How to create and manipulate a GRanges object
6.1.2 Getting genomic regions into R as GRanges objects
6.1.3 Finding regions that do/do not overlap with another set of regions
6.2 Dealing with mapped high-throughput sequencing reads
6.2.1 Counting mapped reads for a set of regions
6.3 Dealing with continuous scores over the genome
6.3.1 Extracting subsections of Rle and RleList objects
6.4 Genomic intervals with more information: SummarizedExperiment class
6.4.1 Create a SummarizedExperiment object
6.4.2 Subset and manipulate the SummarizedExperiment object
6.5 Visualizing and summarizing genomic intervals
6.5.1 Visualizing intervals on a locus of interest
6.5.2 Summaries of genomic intervals on multiple loci
6.5.3 Making karyograms and circos plots
6.6 Exercises
6.6.1 Operations on genomic intervals with the GenomicRanges package
6.6.2 Dealing with mapped high-throughput sequencing reads
6.6.3 Dealing with contiguous scores over the genome
6.6.4 Visualizing and summarizing genomic intervals

7 Quality Check, Processing and Alignment of High-throughput Sequencing Reads
7.1 FASTA and FASTQ formats
7.2 Quality check on sequencing reads
7.2.1 Sequence quality per base/cycle
7.2.2 Sequence content per base/cycle
7.2.3 Read frequency plot
7.2.4 Other quality metrics and QC tools
7.3 Filtering and trimming reads
7.4 Mapping/aligning reads to the genome
7.5 Further processing of aligned reads
7.6 Exercises

8 RNA-seq Analysis
8.1 What is gene expression?
8.2 Methods to detect gene expression
8.3 Gene expression analysis using high-throughput sequencing technologies
8.3.1 Processing raw data
8.3.2 Alignment
8.3.3 Quantification
8.3.4 Within sample normalization of the read counts
8.3.5 Computing different normalization schemes in R
8.3.6 Exploratory analysis of the read count table
8.3.7 Differential expression analysis
8.3.8 Functional enrichment analysis
8.3.9 Accounting for additional sources of variation
8.4 Other applications of RNA-seq
8.5 Exercises
8.5.1 Exploring the count tables
8.5.2 Differential expression analysis
8.5.3 Functional enrichment analysis
8.5.4 Removing unwanted variation from the expression data

9 ChIP-seq analysis
9.1 Regulatory protein-DNA interactions
9.2 Measuring protein-DNA interactions with ChIP-seq
9.3 Factors that affect ChIP-seq experiment and analysis quality
9.3.1 Antibody specificity
9.3.2 Sequencing depth
9.3.3 PCR duplication
9.3.4 Biological replicates
9.3.5 Control experiments
9.3.6 Using tagged proteins
9.4 Pre-processing ChIP data
9.4.1 Mapping of ChIP-seq data
9.5 ChIP quality control
9.5.1 The data
9.5.2 Sample clustering
9.5.3 Visualization in the genome browser
9.5.4 Plus and minus strand cross-correlation
9.5.5 GC bias quantification
9.5.6 Sequence read genomic distribution
9.6 Peak calling
9.6.1 Types of ChIP-seq experiments
9.6.2 Peak calling: Sharp peaks
9.6.3 Peak calling: Broad regions
9.6.4 Peak quality control
9.6.5 Peak annotation
9.7 Motif discovery
9.7.1 Motif comparison
9.8 What to do next?
9.9 Exercises
9.9.1 Quality control

10 DNA methylation analysis using bisulfite sequencing data
10.1 What is DNA methylation?
10.1.1 How DNA methylation is set?
10.1.2 How to measure DNA methylation with bisulfite sequencing
10.2 Analyzing DNA methylation data
10.3 Processing raw data and getting data into R
10.4 Data filtering and exploratory analysis
10.4.1 Reading methylation call files
10.4.2 Further quality check
10.4.3 Merging samples into a single table
10.4.4 Filtering CpGs
10.4.5 Clustering samples
10.4.6 Principal component analysis
10.5 Extracting interesting regions: Differential methylation and segmentation
10.5.1 Differential methylation
10.5.2 Methylation segmentation
10.5.3 Working with large files
10.6 Annotation of DMRs/DMCs and segments
10.6.1 Further annotation with genes or gene sets
10.7 Other R packages that can be used for methylation analysis
10.8 Exercises
10.8.1 Differential methylation
10.8.2 Methylome segmentation

11 Multi-omics Analysis
11.1 Use case: Multi-omics data from colorectal cancer
11.2 Latent variable models for multi-omics integration
11.3 Matrix factorization methods for unsupervised multi-omics data integration
11.3.1 Multiple factor analysis
11.3.2 Joint non-negative matrix factorization
11.3.3 iCluster
11.4 Clustering using latent factors
11.4.1 One-hot clustering
11.4.2 K-means clustering
11.5 Biological interpretation of latent factors
11.5.1 Inspection of feature weights in loading vectors
11.5.2 Making sense of factors using enrichment analysis
11.5.3 Interpretation using additional covariates
11.6 Exercises
11.6.1 Matrix factorization methods
11.6.2 Clustering using latent factors
11.6.3 Biological interpretation of latent factors

Bibliography

Index

Dr. Altuna Akalin is a bioinformatics scientist and the head of Bioinformatics and Omics Data Science Platform at the Berlin Institute of Medical Systems Biology, Max Delbrück Center in Berlin. He has been developing computational methods for analyzing and integrating large-scale genomics data sets since 2002. His interest is in using machine learning and statistics to uncover patterns related to important biological variables such as disease state and type. He has lived in the USA, Norway, Turkey, Japan, and Switzerland in order to pursue research work and education related to computational genomics.

Permalink: https://www.kriso.ee/db/97804295327646e.html
