Preface |
|
xv | |
Authors |
|
xxi | |
I Modern biology and multivariate analysis |
|
1 | (44) |
|
1 Multi-omics and biological systems |
|
|
3 | (8) |
|
1.1 Statistical approaches for reductionist or holistic analyses |
|
|
3 | (1) |
|
1.2 Multi-omics and multivariate analyses |
|
|
4 | (1) |
|
1.2.1 More than a 'scale up' of univariate analyses |
|
|
5 | (1) |
|
1.2.2 More than a fishing expedition |
|
|
5 | (1) |
|
1.3 Shifting the analysis paradigm |
|
|
5 | (1) |
|
1.4 Challenges with high-throughput data |
|
|
6 | (2) |
|
|
7 | (1) |
|
1.4.2 Multi-collinearity and ill-posed problems |
|
|
7 | (1) |
|
1.4.3 Zero values and missing values |
|
|
7 | (1) |
|
1.5 Challenges with multi-omics integration |
|
|
8 | (1) |
|
|
8 | (1) |
|
|
8 | (1) |
|
|
8 | (1) |
|
1.5.4 Expectations for analysis |
|
|
8 | (1) |
|
1.5.5 Variety of analytical frameworks |
|
|
9 | (1) |
|
|
9 | (2) |
|
|
11 | (8) |
|
2.1 The problem guides the analysis |
|
|
11 | (1) |
|
|
12 | (2) |
|
2.2.1 What affects statistical power? |
|
|
12 | (1) |
|
|
12 | (1) |
|
2.2.3 Identify covariates and confounders |
|
|
13 | (1) |
|
2.2.4 Identify batch effects |
|
|
13 | (1) |
|
2.3 Data cleaning and pre-processing |
|
|
14 | (1) |
|
|
14 | (1) |
|
|
15 | (1) |
|
|
15 | (1) |
|
2.4 Analysis: Choose the right approach |
|
|
15 | (3) |
|
2.4.1 Descriptive statistics |
|
|
15 | (1) |
|
2.4.2 Exploratory statistics |
|
|
15 | (1) |
|
2.4.3 Inferential statistics |
|
|
16 | (1) |
|
2.4.4 Univariate or multivariate modelling? |
|
|
16 | (1) |
|
|
17 | (1) |
|
2.5 Conclusion and start the cycle again |
|
|
18 | (1) |
|
|
18 | (1) |
|
3 Key multivariate concepts and dimension reduction in mixOmics |
|
|
19 | (10) |
|
3.1 Measures of dispersion and association |
|
|
19 | (4) |
|
3.1.1 Random variables and biological variation |
|
|
19 | (1) |
|
|
20 | (1) |
|
|
20 | (1) |
|
|
21 | (1) |
|
3.1.5 Covariance and correlation in mixOmics context |
|
|
22 | (1) |
|
|
22 | (1) |
|
|
23 | (2) |
|
3.2.1 Matrix factorisation |
|
|
23 | (1) |
|
3.2.2 Factorisation with components and loading vectors |
|
|
24 | (1) |
|
3.2.3 Data visualisation using components |
|
|
24 | (1) |
|
|
25 | (2) |
|
|
26 | (1) |
|
|
26 | (1) |
|
|
26 | (1) |
|
3.3.4 Visualisation of the selected variables |
|
|
26 | (1) |
|
|
27 | (2) |
|
4 Choose the right method for the right question in mixOmics |
|
|
29 | (9) |
|
4.1 Types of analyses and methods |
|
|
29 | (4) |
|
4.1.1 Single or multiple omics analysis? |
|
|
29 | (1) |
|
4.1.2 N- or P-integration? |
|
|
30 | (1) |
|
4.1.3 Unsupervised or supervised analyses? |
|
|
31 | (1) |
|
4.1.4 Repeated measures analyses |
|
|
32 | (1) |
|
|
32 | (1) |
|
|
33 | (1) |
|
|
33 | (1) |
|
4.2.2 Microbiome data: A special case |
|
|
33 | (1) |
|
4.2.3 Genotype data: A special case |
|
|
33 | (1) |
|
4.2.4 Clinical variables that are categorical: A special case |
|
|
33 | (1) |
|
4.3 Types of biological questions |
|
|
34 | (3) |
|
4.3.1 A PCA type of question (one data set, unsupervised) |
|
|
34 | (1) |
|
4.3.2 A PLS type of question (two data sets, regression or unsupervised) |
|
|
34 | (1) |
|
4.3.3 A CCA type of question (two data sets, unsupervised) |
|
|
35 | (1) |
|
4.3.4 A PLS-DA type of question (one data set, classification) |
|
|
35 | (1) |
|
4.3.5 A multiblock PLS type of question (more than two data sets, supervised or unsupervised) |
|
|
36 | (1) |
|
4.3.6 An N-integration type of question (several data sets, supervised) |
|
|
36 | (1) |
|
4.3.7 A P-integration type of question (several studies of the same omic type, supervised or unsupervised) |
|
|
37 | (1) |
|
4.4 Exemplar data sets in mixOmics |
|
|
37 | (1) |
|
|
37 | (1) |
|
4.A Appendix: Data transformations in mixOmics |
|
|
38 | (3) |
|
4.A.1 Multilevel decomposition |
|
|
38 | (1) |
|
4.A.2 Mixed-effect model context |
|
|
39 | (1) |
|
|
39 | (1) |
|
4.A.4 Example of multilevel decomposition in mixOmics |
|
|
40 | (1) |
|
4.B Centered log ratio transformation |
|
|
41 | (1) |
|
4.C Creating dummy variables |
|
|
42 | (3) |
II mixOmics under the hood |
|
45 | (48) |
|
5 Projection to latent structures |
|
|
47 | (12) |
|
5.1 PCA as a projection algorithm |
|
|
47 | (3) |
|
|
47 | (1) |
|
5.1.2 Calculating the components |
|
|
48 | (1) |
|
5.1.3 Meaning of the loading vectors |
|
|
49 | (1) |
|
5.1.4 Example using the linnerud data in mixOmics |
|
|
49 | (1) |
|
5.2 Singular Value Decomposition (SVD) |
|
|
50 | (4) |
|
|
50 | (2) |
|
|
52 | (2) |
|
5.2.3 Matrix approximation |
|
|
54 | (1) |
|
5.3 Non-linear Iterative Partial Least Squares (NIPALS) |
|
|
54 | (3) |
|
5.3.1 NIPALS pseudo algorithm |
|
|
55 | (1) |
|
|
55 | (1) |
|
|
56 | (1) |
|
|
57 | (1) |
|
5.4 Other matrix factorisation methods in mixOmics |
|
|
57 | (1) |
|
|
57 | (2) |
|
6 Visualisation for data integration |
|
|
59 | (20) |
|
6.1 Sample plots using components |
|
|
59 | (6) |
|
6.1.1 Example with PCA and plotIndiv |
|
|
59 | (1) |
|
6.1.2 Sample plot for the integration of two or more data sets |
|
|
60 | (3) |
|
6.1.3 Representing paired coordinates using plotArrow |
|
|
63 | (2) |
|
6.2 Variable plots using components and loading vectors |
|
|
65 | (10) |
|
|
65 | (1) |
|
6.2.2 Correlation circle plots |
|
|
66 | (3) |
|
|
69 | (1) |
|
|
70 | (3) |
|
6.2.5 Clustered Image Maps (CIM) |
|
|
73 | (1) |
|
|
74 | (1) |
|
|
75 | (1) |
|
6.A Appendix: Similarity matrix in relevance networks and CIM |
|
|
76 | (3) |
|
6.A.1 Pairwise variable associations for CCA |
|
|
76 | (1) |
|
6.A.2 Pairwise variable associations for PLS |
|
|
76 | (1) |
|
6.A.3 Constructing relevance networks and displaying CIM |
|
|
77 | (2) |
|
7 Performance assessment in multivariate analyses |
|
|
79 | (14) |
|
7.1 Main parameters to choose |
|
|
79 | (1) |
|
7.2 Performance assessment |
|
|
80 | (2) |
|
7.2.1 Training and testing: If we were rich |
|
|
80 | (1) |
|
7.2.2 Cross-validation: When we are poor |
|
|
81 | (1) |
|
|
82 | (4) |
|
7.3.1 Evaluation measures for regression |
|
|
82 | (1) |
|
7.3.2 Evaluation measures for classification |
|
|
83 | (1) |
|
7.3.3 Details of the tuning process |
|
|
83 | (3) |
|
7.4 Final model assessment |
|
|
86 | (1) |
|
7.4.1 Assessment of the performance |
|
|
86 | (1) |
|
7.4.2 Assessment of the signature |
|
|
86 | (1) |
|
|
87 | (3) |
|
7.5.1 Prediction of a continuous response |
|
|
87 | (1) |
|
7.5.2 Prediction of a categorical response |
|
|
88 | (2) |
|
7.5.3 Prediction is related to the number of components |
|
|
90 | (1) |
|
7.6 Summary and roadmap of analysis |
|
|
90 | (3) |
III mixOmics in action |
|
93 | (190) |
|
|
95 | (14) |
|
|
95 | (7) |
|
|
95 | (1) |
|
8.1.2 Filtering variables |
|
|
96 | (1) |
|
8.1.3 Centering and scaling the data |
|
|
96 | (4) |
|
8.1.4 Managing missing values |
|
|
100 | (1) |
|
8.1.5 Managing batch effects |
|
|
101 | (1) |
|
|
101 | (1) |
|
8.2 Get ready with the software |
|
|
102 | (1) |
|
|
102 | (1) |
|
|
102 | (1) |
|
|
102 | (1) |
|
|
103 | (1) |
|
|
103 | (1) |
|
8.3.1 Set the working directory |
|
|
103 | (1) |
|
8.3.2 Good coding practices |
|
|
104 | (1) |
|
|
104 | (2) |
|
|
104 | (1) |
|
8.4.2 Dependent variables |
|
|
104 | (1) |
|
8.4.3 Set up the outcome for supervised classification analyses |
|
|
105 | (1) |
|
|
106 | (1) |
|
8.5 Structure of the following chapters |
|
|
106 | (3) |
|
9 Principal Component Analysis (PCA) |
|
|
109 | (28) |
|
|
109 | (1) |
|
9.1.1 Biological questions |
|
|
109 | (1) |
|
9.1.2 Statistical point of view |
|
|
109 | (1) |
|
|
110 | (2) |
|
|
110 | (1) |
|
|
111 | (1) |
|
|
112 | (1) |
|
9.3.1 Center or scale the data? |
|
|
112 | (1) |
|
9.3.2 Number of components (choice of dimensions) |
|
|
112 | (1) |
|
9.3.3 Number of variables to select in sPCA |
|
|
113 | (1) |
|
|
113 | (1) |
|
9.5 Case study: Multidrug |
|
|
114 | (15) |
|
|
114 | (1) |
|
|
115 | (1) |
|
|
116 | (5) |
|
9.5.4 Example: Sparse PCA |
|
|
121 | (4) |
|
9.5.5 Example: Missing values imputation |
|
|
125 | (4) |
|
|
129 | (2) |
|
9.6.1 Additional processing steps |
|
|
129 | (1) |
|
9.6.2 Independent component analysis |
|
|
129 | (1) |
|
9.6.3 Incorporating biological information |
|
|
130 | (1) |
|
|
131 | (1) |
|
|
132 | (1) |
|
9.A Appendix: Non-linear Iterative Partial Least Squares |
|
|
132 | (1) |
|
9.A.1 Solving PCA with NIPALS |
|
|
132 | (1) |
|
9.A.2 Estimating missing values with NIPALS |
|
|
132 | (1) |
|
|
133 | (4) |
|
|
133 | (1) |
|
9.B.2 sPCA pseudo algorithm |
|
|
134 | (1) |
|
|
134 | (3) |
|
10 Projection to Latent Structure (PLS) |
|
|
137 | (40) |
|
|
137 | (1) |
|
10.1.1 Biological questions |
|
|
137 | (1) |
|
10.1.2 Statistical point of view |
|
|
137 | (1) |
|
|
138 | (4) |
|
10.2.1 Univariate PLS1 and multivariate PLS2 |
|
|
139 | (1) |
|
10.2.2 PLS deflation modes |
|
|
140 | (2) |
|
|
142 | (1) |
|
10.3 Input arguments and tuning |
|
|
142 | (2) |
|
10.3.1 The deflation mode |
|
|
142 | (1) |
|
10.3.2 The number of dimensions |
|
|
143 | (1) |
|
10.3.3 Number of variables to select |
|
|
143 | (1) |
|
|
144 | (1) |
|
|
144 | (1) |
|
|
144 | (1) |
|
10.5 Case study: Liver toxicity |
|
|
145 | (18) |
|
|
146 | (1) |
|
|
146 | (1) |
|
10.5.3 Example: PLS1 regression |
|
|
147 | (5) |
|
10.5.4 Example: PLS2 regression |
|
|
152 | (11) |
|
10.6 Take a detour: PLS2 regression for prediction |
|
|
163 | (2) |
|
|
165 | (2) |
|
10.7.1 Orthogonal projections to latent structures |
|
|
165 | (1) |
|
10.7.2 Redundancy analysis |
|
|
166 | (1) |
|
|
166 | (1) |
|
10.7.4 PLS path modelling |
|
|
166 | (1) |
|
10.7.5 Other sPLS variants |
|
|
167 | (1) |
|
|
167 | (1) |
|
|
168 | (1) |
|
10.A Appendix: PLS algorithm |
|
|
169 | (2) |
|
10.A.1 PLS Pseudo algorithm |
|
|
169 | (1) |
|
10.A.2 Convergence of the PLS iterative algorithm |
|
|
170 | (1) |
|
|
170 | (1) |
|
10.B Appendix: sparse PLS |
|
|
171 | (1) |
|
|
171 | (1) |
|
10.B.2 sparse PLS pseudo algorithm |
|
|
171 | (1) |
|
10.C Appendix: Tuning the number of components |
|
|
172 | (5) |
|
|
172 | (3) |
|
|
175 | (2) |
|
11 Canonical Correlation Analysis (CCA) |
|
|
177 | (24) |
|
|
177 | (1) |
|
11.1.1 Biological question |
|
|
177 | (1) |
|
11.1.2 Statistical point of view |
|
|
177 | (1) |
|
|
178 | (1) |
|
|
178 | (1) |
|
|
179 | (1) |
|
11.3 Input arguments and tuning |
|
|
179 | (1) |
|
|
179 | (1) |
|
|
180 | (1) |
|
|
180 | (1) |
|
|
180 | (1) |
|
|
181 | (1) |
|
11.5 Case study: Nutrimouse |
|
|
181 | (12) |
|
|
182 | (1) |
|
|
182 | (1) |
|
|
183 | (1) |
|
|
184 | (9) |
|
|
193 | (1) |
|
|
194 | (1) |
|
|
195 | (1) |
|
11.A Appendix: CCA and variants |
|
|
196 | (5) |
|
11.A.1 Solving classical CCA |
|
|
196 | (1) |
|
|
197 | (4) |
|
12 PLS-Discriminant Analysis (PLS-DA) |
|
|
201 | (32) |
|
|
201 | (1) |
|
12.1.1 Biological question |
|
|
201 | (1) |
|
12.1.2 Statistical point of view |
|
|
201 | (1) |
|
|
202 | (2) |
|
|
203 | (1) |
|
|
204 | (1) |
|
12.3 Input arguments and tuning |
|
|
204 | (2) |
|
|
204 | (1) |
|
|
205 | (1) |
|
12.3.3 Framework to manage overfitting |
|
|
205 | (1) |
|
|
206 | (1) |
|
|
207 | (1) |
|
|
207 | (1) |
|
|
207 | (19) |
|
|
208 | (1) |
|
|
208 | (1) |
|
|
209 | (5) |
|
|
214 | (9) |
|
12.5.5 Take a detour: Prediction |
|
|
223 | (2) |
|
12.5.6 AUROC outputs complement performance evaluation |
|
|
225 | (1) |
|
|
226 | (2) |
|
|
226 | (1) |
|
|
227 | (1) |
|
12.6.3 Other related methods and packages |
|
|
228 | (1) |
|
|
228 | (1) |
|
|
229 | (1) |
|
12.A Appendix: Prediction in PLS-DA |
|
|
229 | (4) |
|
12.A.1 Prediction distances |
|
|
229 | (2) |
|
|
231 | (2) |
|
|
233 | (28) |
|
13.1 Why use N-integration methods? |
|
|
233 | (1) |
|
13.1.1 Biological question |
|
|
233 | (1) |
|
13.1.2 Statistical point of view and analytical challenges |
|
|
234 | (1) |
|
|
234 | (3) |
|
13.2.1 Multiblock sPLS-DA |
|
|
234 | (2) |
|
13.2.2 Prediction in multiblock sPLS-DA |
|
|
236 | (1) |
|
13.3 Input arguments and tuning |
|
|
237 | (1) |
|
|
238 | (1) |
|
|
238 | (1) |
|
|
238 | (1) |
|
13.5 Case Study: breast.TCGA |
|
|
239 | (16) |
|
|
239 | (1) |
|
|
240 | (1) |
|
|
241 | (3) |
|
|
244 | (1) |
|
|
245 | (2) |
|
|
247 | (4) |
|
13.5.7 Model performance and prediction |
|
|
251 | (4) |
|
|
255 | (2) |
|
13.6.1 Additional data transformation for special cases |
|
|
255 | (1) |
|
13.6.2 Other N-integration frameworks in mixOmics |
|
|
255 | (1) |
|
13.6.3 Supervised classification analyses: concatenation and ensemble methods |
|
|
256 | (1) |
|
13.6.4 Unsupervised analyses: JIVE and MOFA |
|
|
256 | (1) |
|
|
257 | (1) |
|
13.8 Additional resources |
|
|
258 | (1) |
|
|
258 | (1) |
|
13.A Appendix: Generalised CCA and variants |
|
|
258 | (3) |
|
|
258 | (1) |
|
|
259 | (1) |
|
13.A.3 sparse multiblock sPLS-DA |
|
|
260 | (1) |
|
|
261 | (22) |
|
14.1 Why use P-integration methods? |
|
|
261 | (1) |
|
14.1.1 Biological question |
|
|
261 | (1) |
|
14.1.2 Statistical point of view |
|
|
261 | (1) |
|
|
262 | (2) |
|
|
262 | (1) |
|
14.2.2 Multi-group sPLS-DA |
|
|
263 | (1) |
|
14.3 Input arguments and tuning |
|
|
264 | (1) |
|
|
264 | (1) |
|
14.3.2 Number of components |
|
|
265 | (1) |
|
14.3.3 Number of variables to select per component |
|
|
265 | (1) |
|
|
265 | (1) |
|
|
265 | (1) |
|
|
266 | (1) |
|
14.5 Case Study: stemcells |
|
|
266 | (14) |
|
|
266 | (1) |
|
|
267 | (1) |
|
14.5.3 Example: MINT PLS-DA |
|
|
268 | (3) |
|
14.5.4 Example: MINT sPLS-DA |
|
|
271 | (6) |
|
|
277 | (3) |
|
14.6 Examples of application |
|
|
280 | (1) |
|
14.6.1 16S rRNA gene data |
|
|
280 | (1) |
|
14.6.2 Single cell transcriptomics |
|
|
280 | (1) |
|
|
280 | (1) |
|
|
280 | (3) |
Glossary of terms |
|
283 | (2) |
Key publications |
|
285 | (2) |
Bibliography |
|
287 | (12) |
Index |
|
299 | |