E-book: Biological Pattern Discovery With R: Machine Learning Approaches

Zheng Rong Yang (University of Exeter, UK)

Format: 464 pages
Publication date: 17-Sep-2021
Publisher: World Scientific Publishing Co Pte Ltd
Language: English
ISBN-13: 9789811240133

Other books on the subject:

Medical bioinformatics

Format: EPUB+DRM
Price: 58.38 €*
* the price is final, i.e. no further discounts apply
This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

Copying (copy/paste): not allowed
Printing: not allowed
Usage: protected by digital rights management (DRM)

The publisher has supplied this e-book in encrypted form, which means that you must install special software to read it. You also need to create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorised with the same Adobe ID).

Required software
To read on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

To read on a PC or Mac, you need to install Adobe Digital Editions (a free application designed specifically for reading e-books; not to be confused with Adobe Reader, which is probably already installed on your computer).

This e-book cannot be read on an Amazon Kindle.

This book sets out research directions for new or junior researchers who plan to use machine learning approaches for biological pattern discovery. It draws on the author's experience from several research projects carried out in collaboration with biologists worldwide. The chapters are organised around individual biological pattern discovery problems. For each subject, the research methodologies and the machine learning algorithms that can be employed are introduced and compared. Importantly, each chapter was written with the aim of helping readers move smoothly from theory to practical implementation, so the R programming environment is used for each subject. The author hopes that this book will inspire new or junior researchers' interest in biological pattern discovery using machine learning algorithms.

Preface

1 Introduction
1.1 The responsive gene discovery problem
1.2 The peptide function discovery problem
1.3 The molecular interaction discovery problem
1.4 The spectral molecular discovery problem
1.5 The whole-genome pattern discovery problem
1.6 The global optimisation pattern discovery problem
1.7 The chapters

2 Responsive Gene Discovery
2.1 A biological question --- essential gene discovery
2.2 Density estimation
2.2.1 The histogram approach
2.2.2 The parametric approach
2.2.3 The non-parametric approach
2.2.3.1 The kernel method
2.2.3.2 The K-nearest neighbour approach
2.2.4 The semi-parametric approach
2.2.4.1 The Gaussian mixture
2.2.4.2 The Gamma mixture
2.2.5 The multivariate density estimation
2.3 Cluster analysis
2.3.1 The hierarchical cluster analysis algorithm
2.3.2 The K-means cluster analysis algorithm
2.3.3 The fuzzy C-means cluster analysis algorithm
2.3.4 The mixture model cluster analysis algorithm
2.3.5 The other clustering algorithms
2.4 The gene essentiality pattern discovery problem
2.4.1 The data
2.4.2 The properties of the transposon statistics
2.4.3 Gene essentiality pattern discovery using univariate models
2.4.4 Gene essentiality pattern discovery using multivariate models
2.4.4.1 The multi-statistics multivariate model
2.4.4.2 The multi-replicate multivariate model
Summary

3 Protease Cleavage Pattern Discovery
3.1 A biology question --- protease cleavage
3.2 The linear discriminant analysis algorithm
3.2.1 The definition and working principle of LDA
3.2.2 The projection direction optimisation
3.2.3 The formulation of LDA
3.2.4 Making decision using the Bayes rule for a LDA model
3.2.5 The R function for LDA
3.3 The other analytic discriminant analysis algorithms
3.3.1 The quadratic discriminant analysis algorithm
3.3.2 The Naive Bayes algorithm
3.3.3 The logistic regression algorithm
3.3.4 The Bayesian linear discriminant analysis
3.4 Evaluation and generalisation of a supervised machine learning model
3.4.1 Confusion matrix
3.4.2 Receiver operating characteristic analysis
3.4.3 Generalisation
3.5 Example
3.6 Nonlinear algorithms
3.6.1 Multi-layer perceptron
3.6.1.1 The structure of MLP
3.6.1.2 The learning mechanism of MLP
3.6.1.3 From SLP (LDA) to MLP
3.6.1.4 The R packages for MLP
3.6.2 Radial basis function neural network
3.6.3 The bio-basis function neural network algorithm
3.6.3.1 The bio-basis function neural network algorithm
3.6.3.2 The Bayesian BBFNN algorithm
3.6.3.3 The orthogonal kernel machine
3.6.4 The support vector machine algorithm
3.6.5 The relevance vector machine algorithm
3.6.6 Deep neural network
3.6.7 Inductive learning
3.6.7.1 The working principle of inductive learning
3.6.7.2 The purity measurements
3.6.7.3 The classification and regression tree algorithm
3.6.7.4 The C50 algorithm
3.6.7.5 Seeds classification
3.6.7.6 Factor Xa protease cleavage data classification
3.6.8 The random forest algorithm
Summary

4 Genetic-Epigenetic Interplay Discovery
4.1 A biological question --- the genetic-epigenetic interplay pattern discovery problem
4.2 Regression analysis
4.3 The ordinary linear regression analysis algorithm
4.3.1 The least squared error approach
4.3.2 Assess the fitness of a regression model
4.3.3 The significance analysis of regression coefficients
4.3.4 The regression model confidence bands
4.3.5 R function for ordinary linear regression analysis
4.4 The generalised additive model algorithm
4.5 The Bayesian linear regression algorithm
4.6 The constrained regression analysis algorithms
4.6.1 The ridge linear regression algorithm
4.6.2 The Lasso linear regression algorithm
4.6.3 The elastic net linear regression algorithm
4.7 Ranking variables using the vip package
4.8 The nonlinear regression analysis algorithms
4.9 Epigenetic-genetic interplay pattern discovery
4.9.1 Methylation site to gene --- the M2E models
4.9.2 Gene to methylation site association --- E2M models
Summary

5 Spectral Pattern Discovery
5.1 A biology question
5.2 Introduction of baseline estimation approaches
5.3 The Whittaker-Henderson algorithm
5.4 The spline smoother
5.5 The adaptive iterative reweighted penalised least square smoother
5.6 The asymmetric least square smoother
5.7 The Bayesian Whittaker-Henderson algorithm
5.7.1 The working principle of BWH
5.7.2 The smoothing of the extracted peak spectrum
5.7.3 The generation of the merged and unique peaks
5.7.4 The fitness of a BWH model
5.7.5 Aligning peaks for replicated spectra
5.8 Analyse the milk spectra data
5.9 Analyse the bacterial and macrophage data
Summary

6 Gene Expression Pattern Discovery
6.1 Differentially expressed genes
6.1.1 The biological significance
6.1.2 The statistical significance
6.1.3 The Type I and Type II errors
6.2 Microarray gene expression analysis
6.2.1 The limma package
6.2.2 The visualisation of the discovered DEGs using the MA plot
6.2.3 The visualisation of the discovered DEGs using the volcano plot
6.2.4 How to discover DEGs using the limma package
6.3 DEG discovery for RNA-seq sequencing count data
6.3.1 Discover DEGs for sequencing count data using DESeq2
6.3.2 Discover DEGs for sequencing count data using edgeR
6.4 Discover differentially expressed genes when outliers are present
6.4.1 Example of heterogeneous gene expression
6.4.2 COPA
6.4.3 OS
6.4.4 ORT
6.4.5 MOST
6.4.6 LSOSS
6.4.7 DOG
6.4.8 Discover DEGs when outlier genes are present --- simulated data
6.4.9 Discover heterogenous DEGs for a cancer data set
6.5 Gene expression bimodality pattern discovery
6.5.1 The likelihood ratio test approach
6.5.2 The bimodality index test approach
6.5.3 The gap maximisation test approach
6.5.4 Simulated data analysis
6.5.5 Letrozole data analysis
6.6 Dual-scale Gaussian model for small replicate data DEG discovery
6.6.1 The dual-scale Gaussian model
6.6.1.1 The working principle of DSG
6.6.1.2 DSG for simulated data DEG discovery
6.6.2 A real data set study
Summary

7 Whole Genome Pattern Discovery
7.1 The SARS-CoV-2 pandemic
7.2 Sequence alignment
7.2.1 The issues of sequence alignment
7.2.1.1 The three evolution events
7.2.1.2 The alignment gap
7.2.1.3 The alignment strategy
7.2.1.4 The alignment statistic
7.2.2 The Sellers algorithm
7.2.2.1 The forward propagation stage
7.2.2.2 The backward propagation stage
7.2.3 The Needleman-Wunsch algorithm
7.2.3.1 The initialisation stage
7.2.3.2 The forward propagation stage
7.2.3.3 The backward propagation stage
7.2.3.4 The R library for the Needleman-Wunsch algorithm
7.2.4 The Smith-Waterman algorithm
7.2.4.1 The alignment metric and moving directions
7.2.4.2 The initialisation
7.2.4.3 The forward propagation
7.2.4.4 The backward propagation stage
7.2.4.5 The R library for the Smith-Waterman algorithm
7.3 Alignment-based multiple sequence comparison
7.4 Alignment-free multiple sequence comparison
7.4.1 The k-mers approach
7.4.2 The alignment-based approach versus the alignment-free approach for sequence comparison
7.4.2.1 The speed comparison
7.4.2.2 The accuracy comparison
7.4.2.3 The pattern discovery power
7.5 K-mer machine
7.6 Whole genome pattern discovery for SARS-CoV-2
7.6.1 Genomics distribution of sequences
7.6.2 Discrimination between countries based on genomics pattern
7.6.3 Genomics pattern evolving with time
Summary

8 Optimised Peptide Pattern Discovery
8.1 A biological question --- protease cleavage pattern discovery
8.2 Introduction
8.3 Genetic programming
8.3.1 The genetic algorithm
8.3.2 The genetic programming algorithm
8.3.2.1 The reverse Polish notation
8.3.2.2 The GP breeding rules
8.3.2.3 Mutation
8.3.2.4 The dual-chromosome crossover
8.3.2.5 Single-chromosome crossover
8.3.2.6 The training of a GP model
8.4 Factor Xa protease residue interplay
Summary

9 Advanced Subjects
9.1 Neural networks and deep learning
9.2 Optimisation with evolutionary computation
9.3 Quantum computing for biological pattern analysis
9.4 Next-generation sequencing data quality
9.5 SARS-CoV-2 protease cleavage pattern discovery

References
Index

Permalink: https://www.kriso.ee/db/97898112401336e.html