
E-book: Biological Pattern Discovery With R: Machine Learning Approaches

(Univ of Exeter, UK)
  • Format: 464 pages
  • Publication date: 17-Sep-2021
  • Publisher: World Scientific Publishing Co Pte Ltd
  • Language: English
  • ISBN-13: 9789811240133
  • Format: EPUB+DRM
  • Price: €58.38*
  • * The price is final, i.e. no further discounts apply.
  • This e-book is for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you need to install special software to read it. You also need to create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorised with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you need to install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

This book provides research directions for new and junior researchers who intend to use machine learning approaches for biological pattern discovery. It draws on the author's experience in several research projects carried out in collaboration with biologists worldwide. Each chapter addresses an individual biological pattern discovery problem, introducing and comparing the research methodologies and the machine learning algorithms that can be applied to it. Importantly, each chapter is written to help readers move smoothly from theoretical knowledge to practical implementation; for this reason, every subject is worked through in the R programming environment. The author hopes that this book will inspire new and junior researchers to take an interest in biological pattern discovery using machine learning algorithms.

Preface
1 Introduction
1.1 The responsive gene discovery problem
1.2 The peptide function discovery problem
1.3 The molecular interaction discovery problem
1.4 The spectral molecular discovery problem
1.5 The whole-genome pattern discovery problem
1.6 The global optimisation pattern discovery problem
1.7 The chapters
2 Responsive Gene Discovery
2.1 A biological question --- essential gene discovery
2.2 Density estimation
2.2.1 The histogram approach
2.2.2 The parametric approach
2.2.3 The non-parametric approach
2.2.3.1 The kernel method
2.2.3.2 The K-nearest neighbour approach
2.2.4 The semi-parametric approach
2.2.4.1 The Gaussian mixture
2.2.4.2 The Gamma mixture
2.2.5 The multivariate density estimation
2.3 Cluster analysis
2.3.1 The hierarchical cluster analysis algorithm
2.3.2 The K-means cluster analysis algorithm
2.3.3 The fuzzy C-means cluster analysis algorithm
2.3.4 The mixture model cluster analysis algorithm
2.3.5 The other clustering algorithms
2.4 The gene essentiality pattern discovery problem
2.4.1 The data
2.4.2 The properties of the transposon statistics
2.4.3 Gene essentiality pattern discovery using univariate models
2.4.4 Gene essentiality pattern discovery using multivariate models
2.4.4.1 The multi-statistics multivariate model
2.4.4.2 The multi-replicate multivariate model
Summary
3 Protease Cleavage Pattern Discovery
3.1 A biology question --- protease cleavage
3.2 The linear discriminant analysis algorithm
3.2.1 The definition and working principle of LDA
3.2.2 The projection direction optimisation
3.2.3 The formulation of LDA
3.2.4 Making decision using the Bayes rule for a LDA model
3.2.5 The R function for LDA
3.3 The other analytic discriminant analysis algorithms
3.3.1 The quadratic discriminant analysis algorithm
3.3.2 The Naive Bayes algorithm
3.3.3 The logistic regression algorithm
3.3.4 The Bayesian linear discriminant analysis
3.4 Evaluation and generalisation of a supervised machine learning model
3.4.1 Confusion matrix
3.4.2 Receiver operating characteristic analysis
3.4.3 Generalisation
3.5 Example
3.6 Nonlinear algorithms
3.6.1 Multi-layer perceptron
3.6.1.1 The structure of MLP
3.6.1.2 The learning mechanism of MLP
3.6.1.3 From SLP (LDA) to MLP
3.6.1.4 The R packages for MLP
3.6.2 Radial basis function neural network
3.6.3 The bio-basis function neural network algorithm
3.6.3.1 The bio-basis function neural network algorithm
3.6.3.2 The Bayesian BBFNN algorithm
3.6.3.3 The orthogonal kernel machine
3.6.4 The support vector machine algorithm
3.6.5 The relevance vector machine algorithm
3.6.6 Deep neural network
3.6.7 Inductive learning
3.6.7.1 The working principle of inductive learning
3.6.7.2 The purity measurements
3.6.7.3 The classification and regression tree algorithm
3.6.7.4 The C50 algorithm
3.6.7.5 Seeds classification
3.6.7.6 Factor Xa protease cleavage data classification
3.6.8 The random forest algorithm
Summary
4 Genetic-Epigenetic Interplay Discovery
4.1 A biological question --- the genetic-epigenetic interplay pattern discovery problem
4.2 Regression analysis
4.3 The ordinary linear regression analysis algorithm
4.3.1 The least squared error approach
4.3.2 Assess the fitness of a regression model
4.3.3 The significance analysis of regression coefficients
4.3.4 The regression model confidence bands
4.3.5 R function for ordinary linear regression analysis
4.4 The generalised additive model algorithm
4.5 The Bayesian linear regression algorithm
4.6 The constrained regression analysis algorithms
4.6.1 The ridge linear regression algorithm
4.6.2 The Lasso linear regression algorithm
4.6.3 The elastic net linear regression algorithm
4.7 Ranking variables using the vip package
4.8 The nonlinear regression analysis algorithms
4.9 Epigenetic-genetic interplay pattern discovery
4.9.1 Methylation site to gene --- the M2E models
4.9.2 Gene to methylation site association --- E2M models
Summary
5 Spectral Pattern Discovery
5.1 A biology question
5.2 Introduction of baseline estimation approaches
5.3 The Whittaker-Henderson algorithm
5.4 The spline smoother
5.5 The adaptive iterative reweighted penalised least square smoother
5.6 The asymmetric least square smoother
5.7 The Bayesian Whittaker-Henderson algorithm
5.7.1 The working principle of BWH
5.7.2 The smoothing of the extracted peak spectrum
5.7.3 The generation of the merged and unique peaks
5.7.4 The fitness of a BWH model
5.7.5 Aligning peaks for replicated spectra
5.8 Analyse the milk spectra data
5.9 Analyse the bacterial and macrophage data
Summary
6 Gene Expression Pattern Discovery
6.1 Differentially expressed genes
6.1.1 The biological significance
6.1.2 The statistical significance
6.1.3 The Type I and Type II errors
6.2 Microarray gene expression analysis
6.2.1 The limma package
6.2.2 The visualisation of the discovered DEGs using the MA plot
6.2.3 The visualisation of the discovered DEGs using the volcano plot
6.2.4 How to discover DEGs using the limma package
6.3 DEG discovery for RNA-seq sequencing count data
6.3.1 Discover DEGs for sequencing count data using DESeq2
6.3.2 Discover DEGs for sequencing count data using edgeR
6.4 Discover differentially expressed genes when outliers are present
6.4.1 Example of heterogeneous gene expression
6.4.2 COPA
6.4.3 OS
6.4.4 ORT
6.4.5 MOST
6.4.6 LSOSS
6.4.7 DOG
6.4.8 Discover DEGs when outlier genes are present --- simulated data
6.4.9 Discover heterogenous DEGs for a cancer data set
6.5 Gene expression bimodality pattern discovery
6.5.1 The likelihood ratio test approach
6.5.2 The bimodality index test approach
6.5.3 The gap maximisation test approach
6.5.4 Simulated data analysis
6.5.5 Letrozole data analysis
6.6 Dual-scale Gaussian model for small replicate data DEG discovery
6.6.1 The dual-scale Gaussian model
6.6.1.1 The working principle of DSG
6.6.1.2 DSG for simulated data DEG discovery
6.6.2 A real data set study
Summary
7 Whole Genome Pattern Discovery
7.1 The SARS-CoV-2 pandemic
7.2 Sequence alignment
7.2.1 The issues of sequence alignment
7.2.1.1 The three evolution events
7.2.1.2 The alignment gap
7.2.1.3 The alignment strategy
7.2.1.4 The alignment statistic
7.2.2 The Sellers algorithm
7.2.2.1 The forward propagation stage
7.2.2.2 The backward propagation stage
7.2.3 The Needleman-Wunsch algorithm
7.2.3.1 The initialisation stage
7.2.3.2 The forward propagation stage
7.2.3.3 The backward propagation stage
7.2.3.4 The R library for the Needleman-Wunsch algorithm
7.2.4 The Smith-Waterman algorithm
7.2.4.1 The alignment metric and moving directions
7.2.4.2 The initialisation
7.2.4.3 The forward propagation
7.2.4.4 The backward propagation stage
7.2.4.5 The R library for the Smith-Waterman algorithm
7.3 Alignment-based multiple sequence comparison
7.4 Alignment-free multiple sequence comparison
7.4.1 The k-mers approach
7.4.2 The alignment-based approach versus the alignment-free approach for sequence comparison
7.4.2.1 The speed comparison
7.4.2.2 The accuracy comparison
7.4.2.3 The pattern discovery power
7.5 K-mer machine
7.6 Whole genome pattern discovery for SARS-CoV-2
7.6.1 Genomics distribution of sequences
7.6.2 Discrimination between countries based on genomics pattern
7.6.3 Genomics pattern evolving with time
Summary
8 Optimised Peptide Pattern Discovery
8.1 A biological question --- protease cleavage pattern discovery
8.2 Introduction
8.3 Genetic programming
8.3.1 The genetic algorithm
8.3.2 The genetic programming algorithm
8.3.2.1 The reverse Polish notation
8.3.2.2 The GP breeding rules
8.3.2.3 Mutation
8.3.2.4 The dual-chromosome crossover
8.3.2.5 Single-chromosome crossover
8.3.2.6 The training of a GP model
8.4 Factor Xa protease residue interplay
Summary
9 Advanced Subjects
9.1 Neural networks and deep learning
9.2 Optimisation with evolutionary computation
9.3 Quantum computing for biological pattern analysis
9.4 Next-generation sequencing data quality
9.5 SARS-CoV-2 protease cleavage pattern discovery
References
Index