Preface | vii |
| 1 | (11)
1.1 The responsive gene discovery problem | 2 | (1)
1.2 The peptide function discovery problem | 3 | (1)
1.3 The molecular interaction discovery problem | 4 | (1)
1.4 The spectral molecular discovery problem | 5 | (1)
1.5 The whole-genome pattern discovery problem | 6 | (1)
1.6 The global optimisation pattern discovery problem | 7 | (1)
| 8 | (4)
2 Responsive Gene Discovery | 12 | (65)
2.1 A biological question --- essential gene discovery | 13 | (2)
| 15 | (22)
2.2.1 The histogram approach | 16 | (6)
2.2.2 The parametric approach | 22 | (4)
2.2.3 The non-parametric approach | 26 | (1)
2.2.3.1 The kernel method | 26 | (2)
2.2.3.2 The K-nearest neighbour approach | 28 | (2)
2.2.4 The semi-parametric approach | 30 | (1)
2.2.4.1 The Gaussian mixture | 30 | (4)
2.2.4.2 The Gamma mixture | 34 | (2)
2.2.5 The multivariate density estimation | 36 | (1)
| 37 | (28)
2.3.1 The hierarchical cluster analysis algorithm | 39 | (8)
2.3.2 The K-means cluster analysis algorithm | 47 | (11)
2.3.3 The fuzzy C-means cluster analysis algorithm | 58 | (2)
2.3.4 The mixture model cluster analysis algorithm | 60 | (4)
2.3.5 The other clustering algorithms | 64 | (1)
2.4 The gene essentiality pattern discovery problem | 65 | (10)
| 65 | (2)
2.4.2 The properties of the transposon statistics | 67 | (4)
2.4.3 Gene essentiality pattern discovery using univariate models | 71 | (1)
2.4.4 Gene essentiality pattern discovery using multivariate models | 72 | (1)
2.4.4.1 The multi-statistics multivariate model | 73 | (1)
2.4.4.2 The multi-replicate multivariate model | 74 | (1)
| 75 | (2)
3 Protease Cleavage Pattern Discovery | 77 | (83)
3.1 A biology question --- protease cleavage | 78 | (1)
3.2 The linear discriminant analysis algorithm | 79 | (14)
3.2.1 The definition and working principle of LDA | 80 | (1)
3.2.2 The projection direction optimisation | 81 | (3)
3.2.3 The formulation of LDA | 84 | (3)
3.2.4 Making decision using the Bayes rule for a LDA model | 87 | (2)
3.2.5 The R function for LDA | 89 | (4)
3.3 The other analytic discriminant analysis algorithms | 93 | (5)
3.3.1 The quadratic discriminant analysis algorithm | 93 | (2)
3.3.2 The Naive Bayes algorithm | 95 | (1)
3.3.3 The logistic regression algorithm | 95 | (2)
3.3.4 The Bayesian linear discriminant analysis | 97 | (1)
3.4 Evaluation and generalisation of a supervised machine learning model | 98 | (11)
| 98 | (3)
3.4.2 Receiver operating characteristic analysis | 101 | (5)
| 106 | (3)
| 109 | (5)
| 114 | (44)
3.6.1 Multi-layer perceptron | 115 | (1)
3.6.1.1 The structure of MLP | 115 | (2)
3.6.1.2 The learning mechanism of MLP | 117 | (2)
3.6.1.3 From SLP (LDA) to MLP | 119 | (1)
3.6.1.4 The R packages for MLP | 120 | (2)
3.6.2 Radial basis function neural network | 122 | (2)
3.6.3 The bio-basis function neural network algorithm | 124 | (2)
3.6.3.1 The bio-basis function neural network algorithm | 126 | (2)
3.6.3.2 The Bayesian BBFNN algorithm | 128 | (3)
3.6.3.3 The orthogonal kernel machine | 131 | (1)
3.6.4 The support vector machine algorithm | 132 | (5)
3.6.5 The relevance vector machine algorithm | 137 | (2)
3.6.6 Deep neural network | 139 | (2)
| 141 | (1)
3.6.7.1 The working principle of inductive learning | 142 | (3)
3.6.7.2 The purity measurements | 145 | (2)
3.6.7.3 The classification and regression tree algorithm | 147 | (3)
3.6.7.4 The C50 algorithm | 150 | (2)
3.6.7.5 Seeds classification | 152 | (1)
3.6.7.6 Factor Xa protease cleavage data classification | 153 | (1)
3.6.8 The random forest algorithm | 154 | (4)
| 158 | (2)
4 Genetic-Epigenetic Interplay Discovery | 160 | (49)
4.1 A biological question --- the genetic-epigenetic interplay pattern discovery problem | 161 | (1)
| 162 | (5)
4.3 The ordinary linear regression analysis algorithm | 167 | (12)
4.3.1 The least squared error approach | 167 | (3)
4.3.2 Assess the fitness of a regression model | 170 | (3)
4.3.3 The significance analysis of regression coefficients | 173 | (2)
4.3.4 The regression model confidence bands | 175 | (1)
4.3.5 R function for ordinary linear regression analysis | 175 | (4)
4.4 The generalised additive model algorithm | 179 | (4)
4.5 The Bayesian linear regression algorithm | 183 | (1)
4.6 The constrained regression analysis algorithms | 184 | (7)
4.6.1 The ridge linear regression algorithm | 185 | (2)
4.6.2 The Lasso linear regression algorithm | 187 | (3)
4.6.3 The elastic net linear regression algorithm | 190 | (1)
4.7 Ranking variables using the vip package | 191 | (1)
4.8 The nonlinear regression analysis algorithms | 192 | (3)
4.9 Epigenetic-genetic interplay pattern discovery | 195 | (12)
4.9.1 Methylation site to gene --- the M2E models | 197 | (6)
4.9.2 Gene to methylation site association --- E2M models | 203 | (4)
| 207 | (2)
5 Spectral Pattern Discovery | 209 | (36)
| 210 | (1)
5.2 Introduction of baseline estimation approaches | 210 | (2)
5.3 The Whittaker-Henderson algorithm | 212 | (6)
| 218 | (2)
5.5 The adaptive iterative reweighted penalised least square smoother | 220 | (1)
5.6 The asymmetric least square smoother | 221 | (3)
5.7 The Bayesian Whittaker-Henderson algorithm | 224 | (11)
5.7.1 The working principle of BWH | 224 | (4)
5.7.2 The smoothing of the extracted peak spectrum | 228 | (1)
5.7.3 The generation of the merged and unique peaks | 229 | (3)
5.7.4 The fitness of a BWH model | 232 | (1)
5.7.5 Aligning peaks for replicated spectra | 232 | (3)
5.8 Analyse the milk spectra data | 235 | (3)
5.9 Analyse the bacterial and macrophage data | 238 | (5)
| 243 | (2)
6 Gene Expression Pattern Discovery | 245 | (66)
6.1 Differentially expressed genes | 245 | (6)
6.1.1 The biological significance | 246 | (2)
6.1.2 The statistical significance | 248 | (1)
6.1.3 The Type I and Type II errors | 249 | (2)
6.2 Microarray gene expression analysis | 251 | (10)
| 252 | (4)
6.2.2 The visualisation of the discovered DEGs using the MA plot | 256 | (1)
6.2.3 The visualisation of the discovered DEGs using the volcano plot | 257 | (3)
6.2.4 How to discover DEGs using the limma package | 260 | (1)
6.3 DEG discovery for RNA-seq sequencing count data | 261 | (7)
6.3.1 Discover DEGs for sequencing count data using DESeq2 | 262 | (3)
6.3.2 Discover DEGs for sequencing count data using edgeR | 265 | (3)
6.4 Discover differentially expressed genes when outliers are present | 268 | (20)
6.4.1 Example of heterogeneous gene expression | 268 | (3)
| 271 | (1)
| 272 | (1)
| 272 | (1)
| 272 | (1)
| 273 | (1)
| 273 | (1)
6.4.8 Discover DEGs when outlier genes are present --- simulated data | 273 | (9)
6.4.9 Discover heterogenous DEGs for a cancer data set | 282 | (6)
6.5 Gene expression bimodality pattern discovery | 288 | (14)
6.5.1 The likelihood ratio test approach | 289 | (1)
6.5.2 The bimodality index test approach | 290 | (1)
6.5.3 The gap maximisation test approach | 291 | (7)
6.5.4 Simulated data analysis | 298 | (2)
6.5.5 Letrozole data analysis | 300 | (2)
6.6 Dual-scale Gaussian model for small replicate data DEG discovery | 302 | (8)
6.6.1 The dual-scale Gaussian model | 302 | (1)
6.6.1.1 The working principle of DSG | 302 | (4)
6.6.1.2 DSG for simulated data DEG discovery | 306 | (2)
6.6.2 A real data set study | 308 | (2)
| 310 | (1)
7 Whole Genome Pattern Discovery | 311 | (53)
7.1 The SARS-CoV-2 pandemic | 311 | (1)
| 312 | (26)
7.2.1 The issues of sequence alignment | 314 | (1)
7.2.1.1 The three evolution events | 315 | (1)
7.2.1.2 The alignment gap | 316 | (1)
7.2.1.3 The alignment strategy | 316 | (1)
7.2.1.4 The alignment statistic | 317 | (1)
7.2.2 The Sellers algorithm | 317 | (1)
7.2.2.1 The forward propagation stage | 318 | (7)
7.2.2.2 The backward propagation stage | 325 | (1)
7.2.3 The Needleman-Wunsch algorithm | 326 | (1)
7.2.3.1 The initialisation stage | 327 | (1)
7.2.3.2 The forward propagation stage | 327 | (2)
7.2.3.3 The backward propagation stage | 329 | (2)
7.2.3.4 The R library for the Needleman-Wunsch algorithm | 331 | (1)
7.2.4 The Smith-Waterman algorithm | 332 | (1)
7.2.4.1 The alignment metric and moving directions | 332 | (1)
7.2.4.2 The initialisation | 333 | (1)
7.2.4.3 The forward propagation | 334 | (2)
7.2.4.4 The backward propagation stage | 336 | (1)
7.2.4.5 The R library for the Smith-Waterman algorithm | 337 | (1)
7.3 Alignment-based multiple sequence comparison | 338 | (5)
7.4 Alignment-free multiple sequence comparison | 343 | (11)
7.4.1 The k-mers approach | 345 | (7)
7.4.2 The alignment-based approach versus the alignment-free approach for sequence comparison | 352 | (1)
7.4.2.1 The speed comparison | 352 | (1)
7.4.2.2 The accuracy comparison | 353 | (1)
7.4.2.3 The pattern discovery power | 354 | (1)
| 354 | (2)
7.6 Whole genome pattern discovery for SARS-CoV-2 | 356 | (7)
7.6.1 Genomics distribution of sequences | 357 | (2)
7.6.2 Discrimination between countries based on genomics pattern | 359 | (2)
7.6.3 Genomics pattern evolving with time | 361 | (2)
| 363 | (1)
8 Optimised Peptide Pattern Discovery | 364 | (31)
8.1 A biological question --- protease cleavage pattern discovery | 365 | (1)
| 366 | (2)
| 368 | (19)
8.3.1 The genetic algorithm | 369 | (3)
8.3.2 The genetic programming algorithm | 372 | (1)
8.3.2.1 The reverse Polish notation | 372 | (3)
8.3.2.2 The GP breeding rules | 375 | (2)
| 377 | (1)
8.3.2.4 The dual-chromosome crossover | 378 | (3)
8.3.2.5 Single-chromosome crossover | 381 | (2)
8.3.2.6 The training of a GP model | 383 | (4)
8.4 Factor Xa protease residue interplay | 387 | (7)
| 394 | (1)
| 395 | (8)
9.1 Neural networks and deep learning | 395 | (3)
9.2 Optimisation with evolutionary computation | 398 | (2)
9.3 Quantum computing for biological pattern analysis | 400 | (1)
9.4 Next-generation sequencing data quality | 401 | (1)
9.5 SARS-CoV-2 protease cleavage pattern discovery | 402 | (1)
References | 403 | (38)
Index | 441 |