E-book: Multivariate Data Integration Using R: Methods and Applications with the mixOmics Package [Taylor & Francis e-book]

Kim-Anh Lê Cao, Zoe Marie Welham

Format: 298 pages, 16 Tables, black and white; 121 Line drawings, color; 121 Illustrations, color
Series: Chapman & Hall/CRC Computational Biology Series
Publication date: 09-Nov-2021
Publisher: Chapman & Hall/CRC
ISBN-13: 9781003026860

Taylor & Francis e-book
Price: 133,87 €*
* price that grants access to an unlimited number of simultaneous users for an unlimited time
Regular price: 191,24 €
You save 30%

Book homepage: https://www.taylorfrancis.com/books/9781003026860

Large biological data, which are often noisy and high-dimensional, have become increasingly prevalent in biology and medicine. There is a real need for good training in statistics, from data exploration through to analysis and interpretation. This book provides an overview of statistical and dimension reduction methods for high-throughput biological data, with a specific focus on data integration. It starts with some biological background and the key concepts underlying the multivariate methods, then covers an array of methods implemented using the mixOmics package in R.

Features:

Provides a broad and accessible overview of methods for multi-omics data integration

Covers a wide range of multivariate methods, each designed to answer specific biological questions

Includes comprehensive visualisation techniques to aid in data interpretation

Includes many worked examples and case studies using real data

Includes reproducible R code for each multivariate method, using the mixOmics package

The book is suitable for researchers from a wide range of scientific disciplines wishing to apply these methods to obtain new and deeper insights into biological mechanisms and biomedical problems. The suite of tools introduced in this book will enable students and scientists to work at the interface between, and provide critical collaborative expertise to, biologists, bioinformaticians, statisticians and clinicians.

Preface

Authors

I Modern biology and multivariate analysis

(44)

1 Multi-omics and biological systems

(8)

1.1 Statistical approaches for reductionist or holistic analyses

(1)

1.2 Multi-omics and multivariate analyses

(1)

1.2.1 More than a 'scale up' of univariate analyses

(1)

1.2.2 More than a fishing expedition

(1)

1.3 Shifting the analysis paradigm

(1)

1.4 Challenges with high-throughput data

(2)

1.4.1 Overfitting

(1)

1.4.2 Multi-collinearity and ill-posed problems

(1)

1.4.3 Zero values and missing values

(1)

1.5 Challenges with multi-omics integration

(1)

1.5.1 Data heterogeneity

(1)

1.5.2 Data size

(1)

1.5.3 Platforms

(1)

1.5.4 Expectations for analysis

(1)

1.5.5 Variety of analytical frameworks

(1)

1.6 Summary

(2)

2 The cycle of analysis

(8)

2.1 The Problem guides the analysis

(1)

2.2 Plan in advance

(2)

2.2.1 What affects statistical power?

(1)

2.2.2 Sample size

(1)

2.2.3 Identify covariates and confounders

(1)

2.2.4 Identify batch effects

(1)

2.3 Data cleaning and pre-processing

(1)

2.3.1 Normalisation

(1)

2.3.2 Filtering

(1)

2.3.3 Missing values

(1)

2.4 Analysis: Choose the right approach

(3)

2.4.1 Descriptive statistics

(1)

2.4.2 Exploratory statistics

(1)

2.4.3 Inferential statistics

(1)

2.4.4 Univariate or multivariate modelling?

(1)

2.4.5 Prediction

(1)

2.5 Conclusion and start the cycle again

(1)

2.6 Summary

(1)

3 Key multivariate concepts and dimension reduction in mixOmics

(10)

3.1 Measures of dispersion and association

(4)

3.1.1 Random variables and biological variation

(1)

3.1.2 Variance

(1)

3.1.3 Covariance

(1)

3.1.4 Correlation

(1)

3.1.5 Covariance and correlation in mixOmics context

(1)

3.1.6 R examples

(1)

3.2 Dimension reduction

(2)

3.2.1 Matrix factorisation

(1)

3.2.2 Factorisation with components and loading vectors

(1)

3.2.3 Data visualisation using components

(1)

3.3 Variable selection

(2)

3.3.1 Ridge penalty

(1)

3.3.2 Lasso penalty

(1)

3.3.3 Elastic net

(1)

3.3.4 Visualisation of the selected variables

(1)

3.4 Summary

(2)

4 Choose the right method for the right question in mixOmics

(9)

4.1 Types of analyses and methods

(4)

4.1.1 Single or multiple omics analysis?

(1)

4.1.2 N- or P-integration?

(1)

4.1.3 Unsupervised or supervised analyses?

(1)

4.1.4 Repeated measures analyses

(1)

4.1.5 Compositional data

(1)

4.2 Types of data

(1)

4.2.1 Classical omics

(1)

4.2.2 Microbiome data: A special case

(1)

4.2.3 Genotype data: A special case

(1)

4.2.4 Clinical variables that are categorical: A special case

(1)

4.3 Types of biological questions

(3)

4.3.1 A PCA type of question (one data set, unsupervised)

(1)

4.3.2 A PLS type of question (two data sets, regression or unsupervised)

(1)

4.3.3 A CCA type of question (two data sets, unsupervised)

(1)

4.3.4 A PLS-DA type of question (one data set, classification)

(1)

4.3.5 A multiblock PLS type of question (more than two data sets, supervised or unsupervised)

(1)

4.3.6 An N-integration type of question (several data sets, supervised)

(1)

4.3.7 A P-integration type of question (several studies of the same omic type, supervised or unsupervised)

(1)

4.4 Exemplar data sets in mixOmics

(1)

4.5 Summary

(1)

4.A Appendix: Data transformations in mixOmics

(3)

4.A.1 Multilevel decomposition

(1)

4.A.2 Mixed-effect model context

(1)

4.A.3 Split-up variation

(1)

4.A.4 Example of multilevel decomposition in mixOmics

(1)

4.B Centered log ratio transformation

(1)

4.C Creating dummy variables

(3)

II mixOmics under the hood

(48)

5 Projection to latent structures

(12)

5.1 PCA as a projection algorithm

(3)

5.1.1 Overview

(1)

5.1.2 Calculating the components

(1)

5.1.3 Meaning of the loading vectors

(1)

5.1.4 Example using the linnerud data in mixOmics

(1)

5.2 Singular Value Decomposition (SVD)

(4)

5.2.1 SVD algorithm

(2)

5.2.2 Example in R

(2)

5.2.3 Matrix approximation

(1)

5.3 Non-linear Iterative Partial Least Squares (NIPALS)

(3)

5.3.1 NIPALS pseudo algorithm

(1)

5.3.2 Local regressions

(1)

5.3.3 Deflation

(1)

5.3.4 Missing values

(1)

5.4 Other matrix factorisation methods in mixOmics

(1)

5.5 Summary

(2)

6 Visualisation for data integration

(20)

6.1 Sample plots using components

(6)

6.1.1 Example with PCA and plotIndiv

(1)

6.1.2 Sample plot for the integration of two or more data sets

(3)

6.1.3 Representing paired coordinates using plotArrow

(2)

6.2 Variable plots using components and loading vectors

(10)

6.2.1 Loading plots

(1)

6.2.2 Correlation circle plots

(3)

6.2.3 Biplots

(1)

6.2.4 Relevance networks

(3)

6.2.5 Clustered Image Maps (CIM)

(1)

6.2.6 Circos plots

(1)

6.3 Summary

(1)

6.A Appendix: Similarity matrix in relevance networks and CIM

(3)

6.A.1 Pairwise variable associations for CCA

(1)

6.A.2 Pairwise variable associations for PLS

(1)

6.A.3 Constructing relevance networks and displaying CIM

(2)

7 Performance assessment in multivariate analyses

(14)

7.1 Main parameters to choose

(1)

7.2 Performance assessment

(2)

7.2.1 Training and testing: If we were rich

(1)

7.2.2 Cross-validation: When we are poor

(1)

7.3 Performance measures

(4)

7.3.1 Evaluation measures for regression

(1)

7.3.2 Evaluation measures for classification

(1)

7.3.3 Details of the tuning process

(3)

7.4 Final model assessment

(1)

7.4.1 Assessment of the performance

(1)

7.4.2 Assessment of the signature

(1)

7.5 Prediction

(3)

7.5.1 Prediction of a continuous response

(1)

7.5.2 Prediction of a categorical response

(2)

7.5.3 Prediction is related to the number of components

(1)

7.6 Summary and roadmap of analysis

(3)

III mixOmics in action

(190)

8 mixOmics: Get started

(14)

8.1 Prepare the data

(7)

8.1.1 Normalisation

(1)

8.1.2 Filtering variables

(1)

8.1.3 Centering and scaling the data

(4)

8.1.4 Managing missing values

(1)

8.1.5 Managing batch effects

(1)

8.1.6 Data format

(1)

8.2 Get ready with the software

(1)

8.2.1 R installation

(1)

8.2.2 Pre-requisites

(1)

8.2.3 mixOmics download

(1)

8.2.4 Load the package

(1)

8.3 Coding practices

(1)

8.3.1 Set the working directory

(1)

8.3.2 Good coding practices

(1)

8.4 Upload data

(2)

8.4.1 Data sets

(1)

8.4.2 Dependent variables

(1)

8.4.3 Set up the outcome for supervised classification analyses

(1)

8.4.4 Check data upload

(1)

8.5 Structure of the following chapters

(3)

9 Principal Component Analysis (PCA)

(28)

9.1 Why use PCA?

(1)

9.1.1 Biological questions

(1)

9.1.2 Statistical point of view

(1)

9.2 Principle

(2)

9.2.1 PCA

(1)

9.2.2 Sparse PCA

(1)

9.3 Input arguments

(1)

9.3.1 Center or scale the data?

(1)

9.3.2 Number of components (choice of dimensions)

(1)

9.3.3 Number of variables to select in sPCA

(1)

9.4 Key outputs

(1)

9.5 Case study: Multidrug

(15)

9.5.1 Load the data

(1)

9.5.2 Quick start

(1)

9.5.3 Example: PCA

(5)

9.5.4 Example: Sparse PCA

(4)

9.5.5 Example: Missing values imputation

(4)

9.6 To go further

(2)

9.6.1 Additional processing steps

(1)

9.6.2 Independent component analysis

(1)

9.6.3 Incorporating biological information

(1)

9.7 FAQ

(1)

9.8 Summary

(1)

9.A Appendix: Non-linear Iterative Partial Least Squares

(1)

9.A.1 Solving PCA with NIPALS

(1)

9.A.2 Estimating missing values with NIPALS

(1)

9.B Appendix: sparse PCA

(4)

9.B.1 sparse PCA-SVD

(1)

9.B.2 sPCA pseudo algorithm

(1)

9.B.3 Other sPCA methods

(3)

10 Projection to Latent Structure (PLS)

(40)

10.1 Why use PLS?

(1)

10.1.1 Biological questions

(1)

10.1.2 Statistical point of view

(1)

10.2 Principle

(4)

10.2.1 Univariate PLS1 and multivariate PLS2

(1)

10.2.2 PLS deflation modes

(2)

10.2.3 sparse PLS

(1)

10.3 Input arguments and tuning

(2)

10.3.1 The deflation mode

(1)

10.3.2 The number of dimensions

(1)

10.3.3 Number of variables to select

(1)

10.4 Key outputs

(1)

10.4.1 Graphical outputs

(1)

10.4.2 Numerical outputs

(1)

10.5 Case study: Liver toxicity

(18)

10.5.1 Load the data

(1)

10.5.2 Quick start

(1)

10.5.3 Example: PLS1 regression

(5)

10.5.4 Example: PLS2 regression

(11)

10.6 Take a detour: PLS2 regression for prediction

(2)

10.7 To go further

(2)

10.7.1 Orthogonal projections to latent structures

(1)

10.7.2 Redundancy analysis

(1)

10.7.3 Group PLS

(1)

10.7.4 PLS path modelling

(1)

10.7.5 Other sPLS variants

(1)

10.8 FAQ

(1)

10.9 Summary

(1)

10.A Appendix: PLS algorithm

(2)

10.A.1 PLS Pseudo algorithm

(1)

10.A.2 Convergence of the PLS iterative algorithm

(1)

10.A.3 PLS-SVD method

(1)

10.B Appendix: sparse PLS

(1)

10.B.1 sparse PLS-SVD

(1)

10.B.2 sparse PLS pseudo algorithm

(1)

10.C Appendix: Tuning the number of components

(5)

10.C.1 In PLS1

(3)

10.C.2 In PLS2

(2)

11 Canonical Correlation Analysis (CCA)

(24)

11.1 Why use CCA?

(1)

11.1.1 Biological question

(1)

11.1.2 Statistical point of view

(1)

11.2 Principle

(1)

11.2.1 CCA

(1)

11.2.2 rCCA

(1)

11.3 Input arguments and tuning

(1)

11.3.1 CCA

(1)

11.3.2 rCCA

(1)

11.4 Key outputs

(1)

11.4.1 Graphical outputs

(1)

11.4.2 Numerical outputs

(1)

11.5 Case study: Nutrimouse

(12)

11.5.1 Load the data

(1)

11.5.2 Quick start

(1)

11.5.3 Example: CCA

(1)

11.5.4 Example: rCCA

(9)

11.6 To go further

(1)

11.7 FAQ

(1)

11.8 Summary

(1)

11.A Appendix: CCA and variants

(5)

11.A.1 Solving classical CCA

(1)

11.A.2 Regularised CCA

(4)

12 PLS-Discriminant Analysis (PLS-DA)

(32)

12.1 Why use PLS-DA?

(1)

12.1.1 Biological question

(1)

12.1.2 Statistical point of view

(1)

12.2 Principle

(2)

12.2.1 PLS-DA

(1)

12.2.2 sparse PLS-DA

(1)

12.3 Input arguments and tuning

(2)

12.3.1 PLS-DA

(1)

12.3.2 sPLS-DA

(1)

12.3.3 Framework to manage overfitting

(1)

12.4 Key outputs

(1)

12.4.1 Numerical outputs

(1)

12.4.2 Graphical outputs

(1)

12.5 Case study: SRBCT

(19)

12.5.1 Load the data

(1)

12.5.2 Quick start

(1)

12.5.3 Example: PLS-DA

(5)

12.5.4 Example: sPLS-DA

(9)

12.5.5 Take a detour: Prediction

(2)

12.5.6 AUROC outputs complement performance evaluation

(1)

12.6 To go further

(2)

12.6.1 Microbiome

(1)

12.6.2 Multilevel

(1)

12.6.3 Other related methods and packages

(1)

12.7 FAQ

(1)

12.8 Summary

(1)

12.A Appendix: Prediction in PLS-DA

(4)

12.A.1 Prediction distances

(2)

12.A.2 Background area

(2)

13 N-data integration

(28)

13.1 Why use N-integration methods?

(1)

13.1.1 Biological question

(1)

13.1.2 Statistical point of view and analytical challenges

(1)

13.2 Principle

(3)

13.2.1 Multiblock sPLS-DA

(2)

13.2.2 Prediction in multiblock sPLS-DA

(1)

13.3 Input arguments and tuning

(1)

13.4 Key outputs

(1)

13.4.1 Graphical outputs

(1)

13.4.2 Numerical outputs

(1)

13.5 Case Study: breast.TCGA

(16)

13.5.1 Load the data

(1)

13.5.2 Quick start

(1)

13.5.3 Parameter choice

(3)

13.5.4 Final model

(1)

13.5.5 Sample plots

(2)

13.5.6 Variable plots

(4)

13.5.7 Model performance and prediction

(4)

13.6 To go further

(2)

13.6.1 Additional data transformation for special cases

(1)

13.6.2 Other N-integration frameworks in mixOmics

(1)

13.6.3 Supervised classification analyses: concatenation and ensemble methods

(1)

13.6.4 Unsupervised analyses: JIVE and MOFA

(1)

13.7 FAQ

(1)

13.8 Additional resources

(1)

13.9 Summary

(1)

13.A Appendix: Generalised CCA and variants

(3)

13.A.1 regularised GCCA

(1)

13.A.2 sparse GCCA

(1)

13.A.3 sparse multiblock sPLS-DA

(1)

14 P-data integration

(22)

14.1 Why use P-integration methods?

(1)

14.1.1 Biological question

(1)

14.1.2 Statistical point of view

(1)

14.2 Principle

(2)

14.2.1 Motivation

(1)

14.2.2 Multi-group sPLS-DA

(1)

14.3 Input arguments and tuning

(1)

14.3.1 Data input checks

(1)

14.3.2 Number of components

(1)

14.3.3 Number of variables to select per component

(1)

14.4 Key outputs

(1)

14.4.1 Graphical outputs

(1)

14.4.2 Numerical outputs

(1)

14.5 Case Study: stemcells

(14)

14.5.1 Load the data

(1)

14.5.2 Quick start

(1)

14.5.3 Example: MINT PLS-DA

(3)

14.5.4 Example: MINT sPLS-DA

(6)

14.5.5 Take a detour

(3)

14.6 Examples of application

(1)

14.6.1 16S rRNA gene data

(1)

14.6.2 Single cell transcriptomics

(1)

14.7 To go further

(1)

14.8 Summary

(3)

Glossary of terms

(2)

Key publications

(2)

Bibliography

(12)

Index

Dr Kim-Anh Lê Cao develops novel methods, software and tools to interpret big biological data and answer research questions efficiently. She is committed to statistical education to instill best analytical practice, has taught numerous statistical workshops for biologists, and leads collaborative projects in medicine, fundamental biology and microbiology. Dr Kim-Anh Lê Cao has a mathematical engineering background and graduated with a PhD in Statistics from the Université de Toulouse, France. She then moved to Australia, first as a biostatistician consultant at QFAB Bioinformatics, then as a research group leader at the biomedical University of Queensland Diamantina Institute. She is currently Associate Professor in Statistical Genomics at the University of Melbourne. In 2019, Kim-Anh received the Australian Academy of Science's Moran Medal for her contributions to Applied Statistics in multidisciplinary collaborations. She has been part of leadership programs for women in STEMM, including the international Homeward Bound, which culminated in a trip to Antarctica, and Superstars of STEM from Science & Technology Australia.

Zoe Welham completed a BSc in molecular biology and during this time developed a keen interest in the analysis of big data. She completed a Masters of Bioinformatics with a focus on the statistical integration of different omics data in bowel cancer. She is currently a PhD candidate at the Kolling Institute in Sydney where she is furthering her research into bowel cancer with a focus on integrating microbiome data with other omics to characterise early bowel polyps. Her research interests include bioinformatics and biostatistics for many areas of biology and disseminating that information to the general public through reader-friendly writing.

Permalink: https://www.kriso.ee/db/9781003026860_pe.html
