Klienditugi: 7440010 (E-R 10-18)

Algorithms in Computational Molecular Biology: Techniques, Approaches and Applications [Kõva köide]

Edited by Mourad Elloumi (University of Tunis El Manar, Tunisia), Edited by Albert Y. Zomaya (University of Western Australia), Series edited by Yi Pan (Department of Computer Science, Georgia State University)

Formaat: Hardback, 1080 pages, kõrgus x laius x paksus: 243x165x64 mm, kaal: 1692 g
Sari: Wiley Series in Bioinformatics
Ilmumisaeg: 18-Feb-2011
Kirjastus: John Wiley & Sons Inc
ISBN-10: 0470505192
ISBN-13: 9780470505199

Teised raamatud teemal:

Biology, life sciences - (Hetkel poes: 30 nimetust)
Applied mathematics - (Hetkel poes: 1 nimetust)

Kõva köide
Hind: 216,03 €
Raamatu kohalejõudmiseks kirjastusest kulub orienteeruvalt 2-4 nädalat
Kogus:
- - 1
  - 2
  - 3
  - 4
  - 5
  - 6
  - 7
  - 8
  - 9
  - 10
Lisa ostukorvi
Tasuta tarne
Tellimisaeg 2-4 nädalat
Lisa soovinimekirja
Raamatukogudele

Formaat: Hardback, 1080 pages, kõrgus x laius x paksus: 243x165x64 mm, kaal: 1692 g
Sari: Wiley Series in Bioinformatics
Ilmumisaeg: 18-Feb-2011
Kirjastus: John Wiley & Sons Inc
ISBN-10: 0470505192
ISBN-13: 9780470505199

Teised raamatud teemal:

Biology, life sciences - (Hetkel poes: 30 nimetust)
Applied mathematics - (Hetkel poes: 1 nimetust)

Püsilink: https://www.kriso.ee/db/9780470505199.html

This book represents the most comprehensive and up-to-date collection of information on the topic of computational molecular biology. Bringing the most recent research into the forefront of discussion, Algorithms in Computational Molecular Biology studies the most important and useful algorithms currently being used in the field, and provides related problems. It also succeeds where other titles have failed, in offering a wide range of information from the introductory fundamentals right up to the latest, most advanced levels of study.

Preface

xxxi

Contributors

xxxiii

I STRINGS PROCESSING AND APPLICATION TO BIOLOGICAL SEQUENCES

(190)

1 String data Structures for Computational Molecular Biology

(24)

Christos Makris

Evangelos Theodoridis

1.1 Introduction

(3)

1.2 Main String Indexing Data Structures

(6)

1.2.1 Suffix Trees

(2)

1.2.2 Suffix Arrays

(4)

1.3 Index Structures for Weighted Strings

(2)

1.4 Index Structures for Indeterminate Strings

(3)

1.5 String Data Structures in Memory Hierarchies

(3)

1.6 Conclusions

(1)

References

(7)

2 Efficient Restricted-Case Algorithms for Problems in Computational Biology

(24)

Patricia A. Evans

H. Todd Wareham

2.1 The Need for Special Cases

(1)

2.2 Assessing Efficient Solvability Options for General Problems and Special Cases

(2)

2.3 String and Sequence Problems

(1)

2.4 Shortest Common Superstring

(5)

2.4.1 Solving the General Problem

(2)

2.4.2 Special Case: SCSt for Short Strings Over Small Alphabets

(1)

2.4.3 Discussion

(1)

2.5 Longest Common Subsequence

(5)

2.5.1 Solving the General Problem

(2)

2.5.2 Special Case: LCS of Similar Sequences

(1)

2.5.3 Special Case: LCS Under Symbol-Occurrence Restrictions

(1)

2.5.4 Discussion

(1)

2.6 Common Approximate Substring

(5)

2.6.1 Solving the General Problem

(2)

2.6.2 Special Case: Common Approximate String

(1)

2.6.3 Discussion

(1)

2.7 Conclusion

(1)

References

(4)

3 Finite Automata in Pattern Matching

(22)

Jan Holub

3.1 Introduction

(2)

3.1.1 Preliminaries

(1)

3.2 Direct Use of DFA in Stringology

(7)

3.2.1 Forward Automata

(3)

3.2.2 Degenerate Strings

(1)

3.2.3 Indexing Automata

(2)

3.2.4 Filtering Automata

(1)

3.2.5 Backward Automata

(1)

3.2.6 Automata with Fail Function

(1)

3.3 NFA Simulation

(6)

3.3.1 Basic Simulation Method

(1)

3.3.2 Bit Parallelism

(2)

3.3.3 Dynamic Programming

(3)

3.3.4 Basic Simulation Method with Deterministic State Cache

(1)

3.4 Finite Automaton as Model of Computation

(1)

3.5 Finite Automata Composition

(1)

3.6 Summary

(2)

References

(4)

4 New Developments in Processing of Degenerate Sequences

(18)

Pavlos Antoniou

Costas S. Iliopoulos

4.1 Introduction

(1)

4.1.1 Degenerate Primer Design Problem

(1)

4.2 Background

(2)

4.3 Basic Definitions

(3)

4.4 Repetitive Structures in Degenerate Strings

(5)

4.4.1 Using the Masking Technique

(1)

4.4.2 Computing the Smallest Cover of the Degenerate String x

(2)

4.4.3 Computing Maximal Local Covers of x

(3)

4.4.4 Computing All Covers of x

(1)

4.4.5 Computing the Seeds of x

(1)

4.5 Conservative String Covering in Degenerate Strings

(4)

4.5.1 Finding Constrained Pattern p in Degenerate String T

(1)

4.5.2 Computing λ-Conservative Covers of Degenerate Strings

(1)

4.5.3 Computing λ-Conservative Seeds of Degenerate Strings

(1)

4.6 Conclusion

(1)

References

(2)

5 Exact Search Algorithms for Biological Sequences

(22)

Eric Rivals

Leena Salmela

Jorma Tarhio

5.1 Introduction

(2)

5.2 Single Pattern Matching Algorithms

(4)

5.2.1 Algorithms for DNA Sequences

(2)

5.2.2 Algorithms for Amino Acids

(1)

5.3 Algorithms for Multiple Patterns

(6)

5.3.1 Trie-Based Algorithms

(3)

5.3.2 Filtering Algorithms

100

(3)

5.3.3 Other Algorithms

103

(1)

5.4 Application of Exact Set Pattern Matching for Read Mapping

103

(4)

5.4.1 MPSCAN: An Efficient Exact Set Pattern Matching Tool for DNA/RNA Sequences

103

(1)

5.4.2 Other Solutions for Mapping Reads

104

(1)

5.4.3 Comparison of Mapping Solutions

105

(2)

5.5 Conclusions

107

(1)

References

108

(5)

6 Algorithmic Aspects of Arc-Annotated Sequences

113

(16)

Guillaume Blin

Maxime Crochemore

Stephane Vialette

6.1 Introduction

113

(1)

6.2 Preliminaries

114

(3)

6.2.1 Arc-Annotated Sequences

114

(1)

6.2.2 Hierarchy

114

(1)

6.2.3 Refined Hierarchy

115

(1)

6.2.4 Alignment

115

(1)

6.2.5 Edit Operations

116

(1)

6.3 Longest Arc-Preserving Common Subsequence

117

(3)

6.3.1 Definition

117

(1)

6.3.2 Classical Complexity

118

(1)

6.3.3 Parameterized Complexity

119

(1)

6.3.4 Approximability

120

(1)

6.4 Arc-Preserving Subsequence

120

(2)

6.4.1 Definition

120

(1)

6.4.2 Classical Complexity

121

(1)

6.4.3 Classical Complexity for the Refined Hierarchy

121

(1)

6.4.4 Open Problems

122

(1)

6.5 Maximum Arc-Preserving Common Subsequence

122

(1)

6.5.1 Definition

122

(1)

6.5.2 Classical Complexity

123

(1)

6.5.3 Open Problems

123

(1)

6.6 Edit Distance

123

(2)

6.6.1 Definition

123

(1)

6.6.2 Classical Complexity

123

(2)

6.6.3 Approximability

125

(1)

6.6.4 Open Problems

125

(1)

References

125

(4)

7 Algorithmic Issues in Dna Barcoding Problems

129

(14)

Bhaskar DasGupta

Ming-Yang Kao

Ion Mandoiu

7.1 Introduction

129

(1)

7.2 Test Set Problems: A General Framework for Several Barcoding Problems

130

(2)

7.3 A Synopsis of Biological Applications of Barcoding

132

(1)

7.4 Survey of Algorithmic Techniques on Barcoding

133

(2)

7.4.1 Integer Programming

134

(1)

7.4.2 Lagrangian Relaxation and Simulated Annealing

134

(1)

7.4.3 Provably Asymptotically Optimal Results

134

(1)

7.5 Information Content Approach

135

(1)

7.6 Set-Covering Approach

136

(3)

7.6.1 Set-Covering Implementation in More Detail

137

(2)

7.7 Experimental Results and Software Availability

139

(1)

7.7.1 Randomly Generated Instances

139

(1)

7.7.2 Real Data

140

(1)

7.7.3 Software Availability

140

(1)

7.8 Concluding Remarks

140

(1)

References

141

(2)

8 Recent Advances in Weighted Dna Sequences

143

(28)

Manolis Christodoulakis

Costas S. Iliopoulos

8.1 Introduction

143

(3)

8.2 Preliminaries

146

(2)

8.2.1 Strings

146

(1)

8.2.2 Weighted Sequences

147

(1)

8.3 Indexing

148

(4)

8.3.1 Weighted Suffix Tree

148

(3)

8.3.2 Property Suffix Tree

151

(1)

8.4 Pattern Matching

152

(5)

8.4.1 Pattern Matching Using the Weighted Suffix Tree

152

(1)

8.4.2 Pattern Matching Using Match Counts

153

(1)

8.4.3 Pattern Matching with Gaps

154

(2)

8.4.4 Pattern Matching with Swaps

156

(1)

8.5 Approximate Pattern Matching

157

(3)

8.5.1 Hamming Distance

157

(3)

8.6 Repetitions, Covers, and Tandem Repeats

160

(4)

8.6.1 Finding Simple Repetitions with the Weighted Suffix Tree

161

(1)

8.6.2 Fixed-Length Simple Repetitions

161

(2)

8.6.3 Fixed-Length Strict Repetitions

163

(1)

8.6.4 Fixed-Length Tandem Repeats

163

(1)

8.6.5 Identifying Covers

164

(1)

8.7 Motif Discovery

164

(2)

8.7.1 Approximate Motifs in a Single Weighted Sequence

164

(1)

8.7.2 Approximate Common Motifs in a Set of Weighted Sequences

165

(1)

8.8 Conclusions

166

(1)

References

167

(4)

9 Dna Computing for Subgraph Isomorphism Problem and Related Problems

171

(20)

Sun-Yuan Hsieh

Chao-Wen Huang

Hsin-Hung Chou

9.1 Introduction

171

(1)

9.2 Definitions of Subgraph Isomorphism Problem and Related Problems

172

(2)

9.3 DNA Computing Models

174

(1)

9.3.1 The Stickers

174

(1)

9.3.2 The Adleman-Lipton Model

175

(1)

9.4 The Sticker-based Solution Space

175

(4)

9.4.1 Using Stickers for Generating the Permutation Set

176

(1)

9.4.2 Using Stickers for Generating the Solution Space

177

(2)

9.5 Algorithms for Solving Problems

179

(8)

9.5.1 Solving the Subgraph Isomorphism Problem

179

(4)

9.5.2 Solving the Graph Isomorphism Problem

183

(1)

9.5.3 Solving the Maximum Common Subgraph Problem

184

(3)

9.6 Experimental Data

187

(1)

9.7 Conclusion

188

(1)

References

188

(3)

II ANALYSIS OF BIOLOGICAL SEQUENCES

191

(192)

10 Graphs in Bioinformatics

193

(28)

Elsa Chacko

Shoba Ranganathan

10.1 Graph theory---Origin

193

(14)

10.1.1 What is a Graph?

193

(1)

10.1.2 Types of Graphs

194

(6)

10.1.3 Well-Known Graph Problems and Algorithms

200

(7)

10.2 Graphs and the Biological World

207

(9)

10.2.1 Alternative Splicing and Graphs

207

(1)

10.2.2 Evolutionary Tree Construction

208

(1)

10.2.3 Tracking the Temporal Variation of Biological Systems

209

(1)

10.2.4 Identifying Protein Domains by Clustering Sequence Alignments

210

(1)

10.2.5 Clustering Gene Expression Data

211

(1)

10.2.6 Protein Structural Domain Decomposition

212

(1)

10.2.7 Optimal Design of Thermally Stable Proteins

212

(2)

10.2.8 The Sequencing by Hybridization (SBH) Problem

214

(1)

10.2.9 Predicting Interactions in Protein Networks by Completing Defective Cliques

215

(1)

10.3 Conclusion

216

(1)

References

216

(5)

11 A Flexible Data Store for Managing Bioinformatics Data

221

(20)

Bassam A. Alqaralleh

Chen Wang

Bing Bing Zhou

Albert Y. Zomaya

11.1 Introduction

221

(2)

11.1.1 Background

222

(1)

11.1.2 Scalability Challenges

222

(1)

11.2 Data Model and System Overview

223

(4)

11.3 Replication and Load Balancing

227

(3)

11.3.1 Replicating an Index Node

228

(1)

11.3.2 Answering Range Queries with Replicas

229

(1)

11.4 Evaluation

230

(7)

11.4.1 Point Query Processing Performance

230

(3)

11.4.2 Range Query Processing Performance

233

(2)

11.4.3 Growth of the Replicas of an Indexing Node

235

(2)

11.5 Related Work

237

(1)

11.6 Summary

237

(1)

References

238

(3)

12 Algorithms for the Alignment of Biological Sequences

241

(20)

Ahmed Mokaddem

Mourad Elloumi

12.1 Introduction

241

(1)

12.2 Alignment Algorithms

242

(9)

12.2.1 Pairwise Alignment Algorithms

242

(3)

12.2.2 Multiple Alignment Algorithms

245

(6)

12.3 Score Functions

251

(1)

12.4 Benchmarks

252

(3)

12.5 Conclusion

255

(1)

Acknowledgments

255

(1)

References

255

(6)

13 Algorithms for Local Structural Alignment and Structural Motif Identification

261

(16)

Sanguthevar Rajasekaran

Vamsi Kundeti

Martin Schiller

13.1 Introduction

261

(1)

13.2 Problem Definition of Local Structural Alignment

262

(1)

13.3 Variable-Length Alignment Fragment Pair (VLAFP) Algorithm

263

(3)

13.3.1 Alignment Fragment Pairs

263

(1)

13.3.2 Finding the Optimal Local Alignments Based on the VLAFP Cost Function

264

(2)

13.4 Structural Alignment based on Center of Gravity: SACG

266

(4)

13.4.1 Description of Protein Structure in PDB Format

266

(1)

13.4.2 Related Work

267

(1)

13.4.3 Center-of-Gravity-Based Algorithm

267

(2)

13.4.4 Extending Theorem 13.1 for Atomic Coordinates in Protein Structure

269

(1)

13.4.5 Building VCOST(i,j,q) Function Based on Center of Gravity

270

(1)

13.5 Searching Structural Motifs

270

(3)

13.6 Using SACG Algorithm for Classification of New Protein Structures

273

(1)

13.7 Experimental Results

273

(1)

13.8 Accuracy Results

273

(1)

13.9 Conclusion

274

(1)

Acknowledgments

275

(1)

References

276

(1)

14 Evolution of the Clustal Family of Multiple Sequence Alignment Programs

277

(22)

Mohamed Radhouene Aniba

Julie Thompson

14.1 Introduction

277

(1)

14.2 Clustal-ClustalV

278

(6)

14.2.1 Pairwise Similarity Scores

279

(1)

14.2.2 Guide Tree

280

(2)

14.2.3 Progressive Multiple Alignment

282

(1)

14.2.4 An Efficient Dynamic Programming Algorithm

282

(2)

14.2.5 Profile Alignments

284

(1)

14.3 ClustalW

284

(5)

14.3.1 Optimal Pairwise Alignments

284

(1)

14.3.2 More Accurate Guide Tree

284

(1)

14.3.3 Improved Progressive Alignment

285

(4)

14.4 ClustalX

289

(3)

14.4.1 Alignment Quality Analysis

290

(2)

14.5 ClustalW and ClustalX 2.0

292

(1)

14.6 DbClustal

293

(2)

14.6.1 Anchored Global Alignment

294

(1)

14.7 Perspectives

295

(1)

References

296

(3)

15 Filters and Seeds Approaches for Fast Homology Searches in Large Datasets

299

(22)

Nadia Pisanti

Mathieu Giraud

Pierre Peterlongo

15.1 Introduction

299

(2)

15.1.1 Homologies and Large Datasets

299

(1)

15.1.2 Filter Preprocessing or Heuristics

300

(1)

15.1.3 Contents

300

(1)

15.2 Methods Framework

301

(2)

15.2.1 Strings and Repeats

301

(1)

15.2.2 Filters---Fundamental Concepts

301

(2)

15.3 Lossless filters

303

(6)

15.3.1 History of Lossless Filters

303

(1)

15.3.2 Quasar and swift---Filtering Repeats with Edit Distance

304

(1)

15.3.3 Nimbus---Filtering Multiple Repeats with Hamming Distance

305

(3)

15.3.4 tuiuiu---Filtering Multiple Repeats with Edit Distance

308

(1)

15.4 Lossy Seed-Based Filters

309

(6)

15.4.1 Seed-Based Heuristics

310

(1)

15.4.2 Advanced Seeds

311

(1)

15.4.3 Latencies and Neighborhood Indexing

311

(2)

15.4.4 Seed-Based Heuristics Implementations

313

(2)

15.5 Conclusion

315

(1)

15.6 Acknowledgments

315

(1)

References

315

(6)

16 Novel Combinatorial and Information-Theoretic Alignment-Free Distances for Biological DATA MINING

321

(40)

Chiara Epifanio

Alessandra Gabriele

Raffaele Giancarlo

Marinella Sciortino

16.1 Introduction

321

(2)

16.2 Information-Theoretic Alignment-Free Methods

323

(8)

16.2.1 Fundamental Information Measures, Statistical Dependency, and Similarity of Sequences

324

(1)

16.2.2 Methods Based on Relative Entropy and Empirical Probability Distributions

325

(4)

16.2.3 A Method Based on Statistical Dependency, via Mutual Information

329

(2)

16.3 Combinatorial Alignment-Free Methods

331

(5)

16.3.1 The Average Common Substring Distance

332

(1)

16.3.2 A Method Based on the EBWT Transform

333

(1)

16.3.3 N-Local Decoding

334

(2)

16.4 Alignment-Free Compositional Methods

336

(4)

16.4.1 The k-String Composition Approach

337

(1)

16.4.2 Complete Composition Vector

338

(1)

16.4.3 Fast Algorithms to Compute Composition Vectors

339

(1)

16.5 Alignment-Free Exact Word Matches Methods

340

(4)

16.5.1 D2 and its Distributional Regimes

340

(2)

16.5.2 An Extension to Mismatches and the Choice of the Optimal Word Size

342

(1)

16.5.3 The Transformation of D2 into a Method Assessing the Statistical Significance of Sequence Similarity

343

(1)

16.6 Domains of Biological Application

344

(5)

16.6.1 Phylogeny: Information Theoretic and Combinatorial Methods

345

(1)

16.6.2 Phylogeny: Compositional Methods

346

(1)

16.6.3 CIS Regulatory Modules

347

(1)

16.6.4 DNA Sequence Dependencies

348

(1)

16.7 Datasets and Software for Experimental Algorithmics

349

(5)

16.7.1 Datasets

350

(3)

16.7.2 Software

353

(1)

16.8 Conclusions

354

(1)

References

355

(6)

17 In Silico Methods for the Analysis of Metabolites and Drug Molecules

361

(22)

Varun Khanna

Shoba Ranganathan

17.1 Introduction

361

(2)

17.1.1 Chemoinformatics and "Drug-Likeness"

361

(2)

17.2 Molecular Descriptors

363

(4)

17.2.1 One-Dimensional (1-D) Descriptors

363

(1)

17.2.2 Two-Dimensional (2-D) Descriptors

364

(2)

17.2.3 Three-Dimensional (3-D) Descriptors

366

(1)

17.3 Databases

367

(3)

17.3.1 PubChem

367

(2)

17.3.2 Chemical Entities of Biological Interest (ChEBI)

369

(1)

17.3.3 ChemBank

369

(1)

17.3.4 ChemlDplus

369

(1)

17.3.5 ChemDB

369

(1)

17.4 Methods and Data Analysis Algorithms

370

(6)

17.4.1 Simple Count Methods

370

(1)

17.4.2 Enhanced Simple Count Methods, Using Structural Features

371

(1)

17.4.3 ML Methods

372

(4)

17.5 Conclusions

376

(1)

Acknowledgments

377

(1)

References

377

(6)

III MOTIF FINDING AND STRUCTURE PREDICTION

383

(164)

18 Motif Finding Algorithms in Biological Sequences

385

(12)

Tarek El Falah

Mourad Elloumi

Thierry Lecroq

18.1 Introduction

385

(2)

18.3 The Planted (l, d)-Motif Problem

387

(4)

18.3.1 Formulation

387

(1)

18.3.2 Algorithms

387

(4)

18.4 The Extended (l, d)-Motif Problem

391

(1)

18.4.1 Formulation

391

(1)

18.4.2 Algorithms

391

(1)

18.5 The Edited Motif Problem

392

(1)

18.5.1 Formulation

392

(1)

18.5.2 Algorithms

393

(1)

18.6 The Simple Motif Problem

393

(2)

18.6.1 Formulation

393

(1)

18.6.2 Algorithms

394

(1)

18.7 Conclusion

395

(1)

References

396

(1)

19 Computational Characterization of Regulatory Regions

397

(28)

Enrique Blanco

19.1 The Genome Regulatory Landscape

397

(3)

19.2 Qualitative Models of Regulatory Signals

400

(1)

19.3 Quantitative Models of Regulatory Signals

401

(2)

19.4 Detection of Dependencies in Sequences

403

(2)

19.5 Repositories of Regulatory Information

405

(1)

19.6 Using Predictive Models to Annotate Sequences

406

(2)

19.7 Comparative Genomics Characterization

408

(2)

19.8 Sequence Comparisons

410

(2)

19.9 Combining Motifs and Alignments

412

(2)

19.10 Experimental Validation

414

(3)

19.11 Summary

417

(1)

References

417

(8)

20 Algorithmic Issues in the Analysis of Chip-Seq Data

425

(24)

Federico Zambelli

Giulio Pavesi

20.1 Introduction

425

(4)

20.2 Mapping Sequences on the Genome

429

(5)

20.3 Identifying Significantly Enriched Regions

434

(4)

20.3.1 ChIP-Seq Approaches to the Identification of DNA Structure Modifications

437

(1)

20.4 Deriving Actual Transcription Factor Binding Sites

438

(6)

20.5 Conclusions

444

(1)

References

444

(5)

21 Approaches and Methods for Operon Prediction Based on Machine Learning Techniques

449

(30)

Yan Wang

You Zhou

Chunguang Zhou

Shuqin Wang

Wei Du

Chen Zhang

Yanchun Liang

21.1 Introduction

449

(2)

21.2 Datasets, Features, and Preprocesses for Operon Prediction

451

(9)

21.2.1 Operon Datasets

451

(3)

21.2.2 Features

454

(5)

21.2.3 Preprocess Methods

459

(1)

21.3 Machine Learning Prediction Methods for Operon Prediction

460

(14)

21.3.1 Hidden Markov Model

461

(1)

21.3.2 Linkage Clustering

462

(2)

21.3.3 Bayesian Classifier

464

(3)

21.3.4 Bayesian Network

467

(1)

21.3.5 Support Vector Machine

468

(2)

21.3.6 Artificial Neural Network

470

(1)

21.3.7 Genetic Algorithms

471

(1)

21.3.8 Several Combinations

472

(2)

21.4 Conclusions

474

(1)

21.5 Acknowledgments

475

(1)

References

475

(4)

22 Protein Function Prediction with Data-Mining Techniques

479

(22)

Xing-Ming Zhao

Luonan Chen

22.1 Introduction

479

(1)

22.2 Protein Annotation Based on Sequence

480

(4)

22.2.1 Protein Sequence Classification

480

(3)

22.2.2 Protein Subcellular Localization Prediction

483

(1)

22.3 Protein Annotation Based on Protein Structure

484

(1)

22.4 Protein Function Prediction Based on Gene-Expression Data

485

(1)

22.5 Protein Function Prediction Based on Protein Interactome Map

486

(3)

22.5.1 Protein Function Prediction Based on Local Topology Structure of Interaction Map

486

(2)

22.5.2 Protein Function Prediction Based on Global Topology of Interaction Map

488

(1)

22.6 Protein Function Prediction Based on Data Integration

489

(2)

22.7 Conclusions and Perspectives

491

(2)

References

493

(8)

23 Protein Domain Boundary Prediction

501

(20)

Paul D. Yoo

Bing Bing Zhou

Albert Y. Zomaya

23.1 Introduction

501

(2)

23.2 Profiling Technique

503

(7)

23.2.1 Nonlocal Interaction and Vanishing Gradient Problem

506

(1)

23.2.2 Hierarchical Mixture of Experts

506

(2)

23.2.3 Overall Modular Kernel Architecture

508

(2)

23.3 Results

510

(2)

23.4 Discussion

512

(3)

23.4.1 Nonlocal Interactions in Amino Acids

512

(1)

23.4.2 Secondary Structure Information

513

(1)

23.4.3 Hydrophobicity and Profiles

514

(1)

23.4.4 Domain Assignment Is More Accurate for Proteins with Fewer Domains

514

(1)

23.5 Conclusions

515

(1)

References

515

(6)

24 An Introduction to rna Structure and Pseudoknot Prediction

521

(26)

Jana Sperschneider

Amitava Datta

24.1 Introduction

521

(1)

24.2 RNA Secondary Structure Prediction

522

(12)

24.2.1 Minimum Free Energy Model

524

(2)

24.2.2 Prediction of Minimum Free Energy Structure

526

(4)

24.2.3 Partition Function Calculation

530

(3)

24.2.4 Base Pair Probabilities

533

(1)

24.3 RNA Pseudoknots

534

(9)

24.3.1 Biological Relevance

536

(1)

24.3.2 RNA Pseudoknot Prediction

537

(1)

24.3.3 Dynamic Programming

538

(3)

24.3.4 Heuristic Approaches

541

(1)

24.3.5 Pseudoknot Detection

542

(1)

24.3.6 Overview

542

(1)

24.4 Conclusions

543

(1)

References

544

(3)

IV PHYLOGENY RECONSTRUCTION

547

(76)

25 Phylogenetic Search Algorithms for Maximum Likelihood

549

(30)

Alexandros Stamatakis

25.1 Introduction

549

(3)

25.1.1 Phylogenetic Inference

550

(2)

25.2 Computing the Likelihood

552

(3)

25.3 Accelerating the PLF by Algorithmic Means

555

(3)

25.3.1 Reuse of Values Across Probability Vectors

555

(2)

25.3.2 Gappy Alignments and Pointer Meshes

557

(1)

25.4 Alignment Shapes

558

(1)

25.5 General Search Heuristics

559

(7)

25.5.1 Lazy Evaluation Strategies

563

(1)

25.5.2 Further Heuristics

564

(1)

25.5.3 Rapid Bootstrapping

565

(1)

25.6 Computing the Robinson Foulds Distance

566

(2)

25.7 Convergence Criteria

568

(4)

25.7.1 Asymptotic Stopping

569

(3)

25.8 Future Directions

572

(1)

References

573

(6)

26 Heuristic Methods for Phylogenetic Reconstruction with Maximum Parsimony

579

(20)

Adrien Goeffon

Jean-Michel Richer

Jin-Kao Hao

26.1 Introduction

579

(1)

26.2 Definitions and Formal Background

580

(1)

26.2.1 Parsimony and Maximum Parsimony

580

(1)

26.3 Methods

581

(13)

26.3.1 Combinatorial Optimization

581

(1)

26.3.2 Exact Approach

582

(1)

26.3.3 Local Search Methods

582

(6)

26.3.4 Evolutionary Metaheuristics and Genetic Algorithms

588

(2)

26.3.5 Memetic Methods

590

(2)

26.3.6 Problem-Specific Improvements

592

(2)

26.4 Conclusion

594

(1)

References

595

(4)

27 Maximum Entropy Method for Composition Vector Method

599

(24)

Raymond H.-F. Chan

Roger W. Wang

Jeff C.-F. Wong

27.1 Introduction

599

(2)

27.2 Models and Entropy Optimization

601

(13)

27.2.1 Definitions

601

(2)

27.2.2 Denoising Formulas

603

(8)

27.2.3 Distance Measure

611

(2)

27.2.4 Phylogenetic Tree Construction

613

(1)

27.3 Application and Dicussion

614

(5)

27.3.1 Example 1

614

(1)

27.3.2 Example 2

614

(1)

27.3.3 Example 3

615

(2)

27.3.4 Example 4

617

(2)

27.4 Concluding Remarks

619

(1)

References

619

(4)

V MICROARRAY DATA ANALYSIS

623

(100)

28 Microarray Gene Expression Data Analysis

625

(26)

Alan Wee-Chung Liew

Xiangchao Gan

28.1 Introduction

625

(1)

28.2 DNA Microarray Technology and Experiment

626

(1)

28.3 Image Analysis and Expression Data Extraction

627

(3)

28.3.1 Image Preprocessing

628

(1)

28.3.2 Block Segmentation

628

(1)

28.3.3 Automatic Gridding

628

(1)

28.3.4 Spot Extraction

628

(2)

28.4 Data Processing

630

(1)

28.4.1 Background Correction

630

(1)

28.4.2 Normalization

630

(1)

28.4.3 Data Filtering

631

(1)

28.5 Missing Value Imputation

631

(3)

28.6 Temporal Gene Expression Profile Analysis

634

(6)

28.7 Cyclic Gene Expression Profiles Detection

640

(7)

28.7.1 SSA-AR Spectral Estimation

643

(1)

28.7.2 Spectral Estimation by Signal Reconstruction

644

(2)

28.7.3 Statistical Hypothesis Testing for Periodic Profile Detection

646

(1)

28.8 Summary

647

(1)

Acknowledgments

648

(1)

References

649

(2)

29 Biclustering of Microarray Data

651

(14)

Wassim Ayadi

Mourad Elloumi

29.1 Introduction

651

(1)

29.2 Types of Biclusters

652

(1)

29.3 Groups of Biclusters

653

(1)

29.4 Evaluation Functions

654

(2)

29.5 Systematic and Stochastic Biclustering Algorithms

656

(3)

29.6 Biological Validation

659

(2)

29.7 Conclusion

661

(1)

References

661

(4)

30 Computational Models for Condition-Specific Gene and Pathway Inference

665

(26)

Yu-Qing Qiu

Shihua Zhang

Xiang-Sun Zhang

Luonan Chen

30.1 Introduction

665

(1)

30.2 Condition-Specific Pathway Identification

666

(15)

30.2.1 Gene Set Analysis

667

(4)

30.2.2 Condition-Specific Pathway Inference

671

(10)

30.3 Disease Gene Prioritization and Genetic Pathway Detection

681

(3)

30.4 Module Networks

684

(1)

30.5 Summary

685

(1)

Acknowledgments

685

(1)

References

685

(6)

31 Heterogeneity of Differential Expression in Cancer Studies: Algorithms and Methods

691

(32)

Radha Krishna Murthy Karuturi

31.1 Introduction

691

(1)

31.2 Notations

692

(2)

31.3 Differential Mean of Expression

694

(5)

31.3.1 Single Factor Differential Expression

695

(2)

31.3.2 Multifactor Differential Expression

697

(1)

31.3.3 Empirical Bayes Extension

698

(1)

31.4 Differential Variability of Expression

699

(2)

31.4.1 F-Test for Two-Group Differential Variability Analysis

699

(1)

31.4.2 Bartlett's and Levene's Tests for Multigroup Differential Variability Analysis

700

(1)

31.5 Differential Expression in Compendium of Tumors

701

(4)

31.5.1 Gaussian Mixture Model (GMM) for Finite Levels of Expression

701

(2)

31.5.2 Outlier Detection Strategy

703

(1)

31.5.3 Kurtosis Excess

704

(1)

31.6 Differential Expression by Chromosomal Aberrations: The Local Properties

705

(6)

31.6.1 Wavelet Variance Scanning (WAVES) for Single-Sample Analysis

708

(1)

31.6.2 Local Singular Value Decomposition (LSVD) for Compendium of Tumors

709

(1)

31.6.3 Locally Adaptive Statistical Procedure (LAP) for Compendium of Tumors with Control Samples

710

(1)

31.7 Differential Expression in Gene Interactome

711

(3)

31.7.1 Friendly Neighbors Algorithm: A Multiplicative Interactome

711

(1)

31.7.2 GeneRank: A Contributing Interactome

712

(1)

31.7.3 Top Scoring Pairs (TSP): A Differential Interactome

713

(1)

31.8 Differential Coexpression: Global MultiDimensional Interactome

714

(6)

31.8.1 Kostka and Spang's Differential Coexpression Algorithm

715

(3)

31.8.2 Differential Expression Linked Differential Coexpression

718

(1)

31.8.3 Differential Friendly Neighbors (DiffFNs)

718

(2)

Acknowledgments

720

(1)

References

720

(3)

VI ANALYSIS OF GENOMES

723

(142)

32 Comparative Genomics: Algorithms and Applications

725

(24)

Xiao Yang

Srinivas Aluru

32.1 Introduction

725

(2)

32.2 Notations

727

(1)

32.3 Ortholog Assignment

727

(7)

32.3.1 Sequence Similarity-Based Method

729

(2)

32.3.2 Phylogeny-Based Method

731

(1)

32.3.3 Rearrangement-Based Method

732

(2)

32.4 Gene Cluster and Synteny Detection

734

(9)

32.4.1 Synteny Detection

736

(3)

32.4.2 Gene Cluster Detection

739

(4)

32.5 Conclusions

743

(1)

References

743

(6)

33 Advances in Genome Rearrangement Algorithms

749

(24)

Masud Hasan

M. Sohel Rahman

33.1 Introduction

749

(3)

33.2 Preliminaries

752

(1)

33.3 Sorting by Reversals

753

(6)

33.3.1 Approaches to Approximation Algorithms

754

(3)

33.3.2 Signed Permutations

757

(2)

33.4 Sorting by Transpositions

759

(2)

33.4.1 Approximation Results

760

(1)

33.4.2 Improved Running Time and Simpler Algorithms

761

(1)

33.5 Other Operations

761

(2)

33.5.1 Sorting by Prefix Reversals

761

(1)

33.5.2 Sorting by Prefix Transpositions

762

(1)

33.5.3 Sorting by Block Interchange

762

(1)

33.5.4 Short Swap and Fixed-Length Reversals

763

(1)

33.6 Sorting by More Than One Operation

763

(2)

33.6.1 Unified Operation: Doule Cut and Join

764

(1)

33.7 Future Research Directions

765

(1)

33.8 Notes on Software

766

(1)

References

767

(6)

34 Computing Genomic Distances: An Algorithmic Viewpoint

773

(26)

Guillaume Fertin

Irena Rusu

34.1 Introduction

773

(2)

34.1.1 What this
Chapter is About

773

(1)

34.1.2 Definitions and Notations

774

(1)

34.1.3 Organization of the
Chapter

775

(1)

34.2 Interval-Based Criteria

775

(10)

34.2.1 Brief Introduction

775

(1)

34.2.2 The Context and the Problems

776

(2)

34.2.3 Common Intervals in Permutations and the Commuting Generators Strategy

778

(4)

34.2.4 Conserved Intervals in Permutations and the Bound-and-Drop Strategy

782

(1)

34.2.5 Common Intervals in Strings and the Element Plotting Strategy

783

(2)

34.2.6 Variants

785

(1)

34.3 Character-Based Criteria

785

(10)

34.3.1 Introduction and Definition of the Problems

785

(2)

34.3.2 An Approximation Algorithm for BAL-FMB

787

(4)

34.3.3 An Exact Algorithm for UNBAL-FMB

791

(4)

34.3.4 Other Results and Open Problems

795

(1)

34.4 Conclusion

795

(1)

References

796

(3)

35 Wavelet Algorithms for Dna Analysis

799

(44)

Carlo Cattani

35.1 Introduction

799

(3)

35.2 DNA Representation

802

(10)

35.2.1 Preliminary Remarks on DNA

802

(1)

35.2.2 Indicator Function

803

(3)

35.2.3 Representation

806

(1)

35.2.4 Representation Models

807

(1)

35.2.5 Constraints on the Representation in R2

808

(2)

35.2.6 Complex Representation

810

(1)

35.2.7 DNA Walks

810

(2)

35.3 Statistical Correlations in DNA

812

(6)

35.3.1 Long-Range Correlation

812

(2)

35.3.2 Power Spectrum

814

(3)

35.3.3 Complexity

817

(1)

35.4 Wavelet Analysis

818

(5)

35.4.1 Haar Wavelet Basis

819

(1)

35.4.2 Haar Series

819

(2)

35.4.3 Discrete Haar Wavelet Transform

821

(2)

35.5 Haar Wavelet Coefficients and Statistical Parameters

823

(3)

35.6 Algorithm of the Short Haar Discrete Wavelet Transform

826

(2)

35.7 Clusters of Wavelet Coefficients

828

(10)

35.7.1 Cluster Analysis of the Wavelet Coefficients of the Complex DNA Representation

830

(4)

35.7.2 Cluster Analysis of the Wavelet Coefficients of DNA Walks

834

(4)

35.8 Conclusion

838

(1)

References

839

(4)

36 Haplotype Inference Models and Algorithms

843

(22)

Ling-Yun Wu

36.1 Introduction

843

(1)

36.2 Problem Statement and Notations

844

(2)

36.3 Combinatorial Methods

846

(5)

36.3.1 Clark's Inference Rule

846

(2)

36.3.2 Pure Parsimony Model

848

(1)

36.3.3 Phylogeny Methods

849

(2)

36.4 Statistical Methods

851

(2)

36.4.1 Maximum Likelihood Methods

851

(1)

36.4.2 Bayesian Methods

852

(1)

36.4.3 Markov Chain Methods

852

(1)

36.5 Pedigree Methods

853

(3)

36.5.1 Minimum Recombinant Haplotype Configurations

854

(1)

36.5.2 Zero Recombinant Haplotype Configurations

854

(1)

36.5.3 Statistical Methods

855

(1)

36.6 Evaluation

856

(2)

36.6.1 Evaluation Measurements

856

(1)

36.6.2 Comparisons

857

(1)

36.6.3 Datasets

857

(1)

36.7 Discussion

858

(1)

References

859

(6)

VII ANALYSIS OF BIOLOGICAL NETWORKS

865

(142)

37 Untangling Biological Networks Using Bioinformatics

867

(26)

Gaurav Kumar

Adrian P. Cootes

Shoba Ranganathan

37.1 Introduction

867

(11)

37.1.1 Predicting Biological Processes: A Major Challenge to Understanding Biology

867

(1)

37.1.2 Historical Perspective and Mathematical Preliminaries of Networks

868

(2)

37.1.3 Structural Properties of Biological Networks

870

(3)

37.1.4 Local Topology of Biological Networks: Functional Motifs, Modules, and Communities

873

(5)

37.2 Types of Biological Networks

878

(6)

37.2.1 Protein-Protein Interaction Networks

878

(1)

37.2.2 Metabolic Networks

879

(2)

37.2.3 Transcriptional Networks

881

(2)

37.2.4 Other Biological Networks

883

(1)

37.3 Network Dynamic, Evolution and Disease

884

(3)

37.3.1 Biological Network Dynamic and Evolution

884

(2)

37.3.2 Biological Networks and Disease

886

(1)

37.4 Future Challenges and Scope

887

(1)

Acknowledgments

887

(1)

References

888

(5)

38 Probabilistic Approaches for Investigating Biological Networks

893

(22)

Jeremie Bourdon

Damien Eveillard

38.1 Probabilistic Models for Biological Networks

894

(8)

38.1.1 Boolean Networks

895

(5)

38.1.2 Probabilistic Boolean Networks: A Natural Extension

900

(1)

38.1.3 Inferring Probabilistic Models from Experiments

901

(1)

38.2 Interpretation and Quantitative Analysis of Probabilistic Models

902

(9)

38.2.1 Dynamical Analysis and Temporal Properties

902

(3)

38.2.2 Impact of Update Strategies for Analyzing Probabilistic Boolean Networks

905

(1)

38.2.3 Simulations of a Probabilistic Boolean Network

906

(5)

38.3 Conclusion

911

(1)

Acknowledgments

911

(1)

References

911

(4)

39 Modeling and Analysis of Biological Networks with Model Checking

915

(26)

Dragan Bosnacki

Peter A.J. Hilbers

Ronny S. Mans

Erik P. de Vink

39.1 Introduction

915

(1)

39.2 Preliminaries

916

(3)

39.2.1 Model Checking

916

(1)

39.2.2 SPIN and Promela

917

(1)

39.2.3 LTL

918

(1)

39.3 Analyzing Genetic Networks with Model Checking

919

(6)

39.3.1 Boolean Regulatory Networks

919

(1)

39.3.2 A Case Study

919

(2)

39.3.3 Translating Boolean Regulatory Graphs into Promela

921

(1)

39.3.4 Some Results

922

(2)

39.3.5 Concluding Remarks

924

(1)

39.3.6 Related Work and Bibliographic Notes

924

(1)

39.4 Probabilistic Model Checking for Biological Systems

925

(11)

39.4.1 Motivation and Background

926

(1)

39.4.2 A Kinetic Model of mRNA Translation

927

(1)

39.4.3 Probabilistic Model Checking

928

(1)

39.4.4 The Prism Model

929

(4)

39.4.5 Insertion Errors

933

(1)

39.4.6 Concluding Remarks

934

(1)

39.4.7 Related Work and Bibliographic Notes

935

(1)

References

936

(5)

40 Reverse Engineering of Molecular Networks from a Common Combinatorial Approach

941

(14)

Bhaskar DasGupta

Paola Vera-Licona

Eduardo Sontag

40.1 Introduction

941

(1)

40.2 Reverse-Engineering of Biological Networks

942

(4)

40.2.1 Evaluation of the Performance of Reverse-Engineering Methods

945

(1)

40.3 Classical Combinatorial Algorithms: A Case Study

946

(5)

40.3.1 Benchmarking RE Combinatorial-Based Methods

947

(3)

40.3.2 Software Availability

950

(1)

40.4 Concluding Remarks

951

(1)

Acknowledgments

951

(1)

References

951

(4)

41 Unsupervised Learning for gene Regulation Network Inference from Expression Data: A Review

955

(24)

Mohamed Elati

Celine Rouveirol

41.1 Introduction

955

(1)

41.2 Gene Networks: Definition and Properties

956

(2)

41.3 Gene Expression: Data and Analysis

958

(1)

41.4 Network Inference as an Unsupervised Learning Problem

959

(1)

41.5 Correlation-Based Methods

959

(2)

41.6 Probabilistic Graphical Models

961

(2)

41.7 Constraint-Based Data Mining

963

(6)

41.7.1 Multiple Usages of Extracted Patterns

965

(1)

41.7.2 Mining Gene Regulation from Transcriptome Datasets

966

(3)

41.8 Validation

969

(4)

41.8.1 Statistical Validation of Network Inference

970

(2)

41.8.2 Biological Validation

972

(1)

41.9 Conclusion and Perspectives

973

(1)

References

974

(5)

42 Approaches to Construction and Analysis of Microrna-Mediated Networks

979

(28)

Ilana Lichtenstein

Albert Zomaya

Jennifer Gamble

Mathew Vadas

42.1 Introduction

979

(3)

42.1.1 miRNA-mediated Genetic Regulatory Networks

979

(2)

42.1.2 The Four Levels of Regulation in GRNs

981

(1)

42.1.3 Overview of Sections

982

(1)

42.2 Fundamental Component Interaction Research: Predicting miRNA Genes, Regulators, and Targets

982

(6)

42.2.1 Prediction of Novel miRNA Genes

983

(1)

42.2.2 Prediction of miRNA Targets

984

(1)

42.2.3 Prediction of miRNA Transcript Elements and Transcriptional Regulation

984

(4)

42.3 Identifying miRNA-mediated Networks

988

(5)

42.3.1 Forward Engineering---Construction of Multinode Components in miRNA-mediated Networks Using Paired Interaction Information

988

(1)

42.3.2 Reverse Engineering---Inference of MicroRNA Modules Using Top-Down Approaches

988

(5)

42.4 Global and Local Architecture Analysis in miRNA-Containing Networks

993

(8)

42.4.1 Global Architecture Properties of miRNA-mediated Post-transcriptional Networks

993

(1)

42.4.2 Local Architecture Properties of miRNA-mediated Post-transcriptional Networks

994

(7)

42.5 Conclusion

1001

(1)

References

1001

(6)

Index

1007

Mourad Elloumi, PhD, is Associate Professor in Computer Science, Faculty of Economic Sciences and Management of Tunis (Tunisia), and member of the Unit of Technologies of Information and Communication (UTIC). He is the author/coauthor of more than forty publications in international journals and conferences. Professor Elloumi was the guest editor of a special issue on biological knowledge discovery and data mining in Knowledge-Based Systems and the coeditor of the proceedings of two international conferences.

Albert Y. Zomaya, PhD, is the Chair Professor of High Performance Computing and Networking in the School of Information Technologies at The University of Sydney (Australia). He is the author/coauthor of eight books and more than 400 publications in technical journals and conferences, and the editor of eight books and eight conference volumes. Professor Zomaya is currently an associate editor for twenty journals, the Founding Editor of the Wiley Series on Parallel and Distributed Computing, and a Founding Coeditor of the Wiley Series in Bioinformatics.

Algorithms in Computational Molecular Biology: Techniques, Approaches and Applications [Kõva köide]

Konto & seaded

Otsing

Otsingu andmebaas

Filtreeri tulemusi

Teemad Ingliskeelsed raamatud

Vali ostukorv