Muutke küpsiste eelistusi

Algorithms in Computational Molecular Biology: Techniques, Approaches and Applications [Kõva köide]

Edited by (University of Tunis El Manar, Tunisia), Edited by (University of Western Australia), Series edited by (Department of Computer Science, Georgia State University)
  • Formaat: Hardback, 1080 pages, kõrgus x laius x paksus: 243x165x64 mm, kaal: 1692 g
  • Sari: Wiley Series in Bioinformatics
  • Ilmumisaeg: 18-Feb-2011
  • Kirjastus: John Wiley & Sons Inc
  • ISBN-10: 0470505192
  • ISBN-13: 9780470505199
Teised raamatud teemal:
  • Formaat: Hardback, 1080 pages, kõrgus x laius x paksus: 243x165x64 mm, kaal: 1692 g
  • Sari: Wiley Series in Bioinformatics
  • Ilmumisaeg: 18-Feb-2011
  • Kirjastus: John Wiley & Sons Inc
  • ISBN-10: 0470505192
  • ISBN-13: 9780470505199
Teised raamatud teemal:
This book represents the most comprehensive and up-to-date collection of information on the topic of computational molecular biology.  Bringing the most recent research into the forefront of discussion, Algorithms in Computational Molecular Biology studies the most important and useful algorithms currently being used in the field, and provides related problems.  It also succeeds where other titles have failed, in offering a wide range of information from the introductory fundamentals right up to the latest, most advanced levels of study. 
Preface xxxi
Contributors xxxiii
I STRINGS PROCESSING AND APPLICATION TO BIOLOGICAL SEQUENCES
1(190)
1 String data Structures for Computational Molecular Biology
3(24)
Christos Makris
Evangelos Theodoridis
1.1 Introduction
3(3)
1.2 Main String Indexing Data Structures
6(6)
1.2.1 Suffix Trees
6(2)
1.2.2 Suffix Arrays
8(4)
1.3 Index Structures for Weighted Strings
12(2)
1.4 Index Structures for Indeterminate Strings
14(3)
1.5 String Data Structures in Memory Hierarchies
17(3)
1.6 Conclusions
20(1)
References
20(7)
2 Efficient Restricted-Case Algorithms for Problems in Computational Biology
27(24)
Patricia A. Evans
H. Todd Wareham
2.1 The Need for Special Cases
27(1)
2.2 Assessing Efficient Solvability Options for General Problems and Special Cases
28(2)
2.3 String and Sequence Problems
30(1)
2.4 Shortest Common Superstring
31(5)
2.4.1 Solving the General Problem
32(2)
2.4.2 Special Case: SCSt for Short Strings Over Small Alphabets
34(1)
2.4.3 Discussion
35(1)
2.5 Longest Common Subsequence
36(5)
2.5.1 Solving the General Problem
37(2)
2.5.2 Special Case: LCS of Similar Sequences
39(1)
2.5.3 Special Case: LCS Under Symbol-Occurrence Restrictions
39(1)
2.5.4 Discussion
40(1)
2.6 Common Approximate Substring
41(5)
2.6.1 Solving the General Problem
42(2)
2.6.2 Special Case: Common Approximate String
44(1)
2.6.3 Discussion
45(1)
2.7 Conclusion
46(1)
References
47(4)
3 Finite Automata in Pattern Matching
51(22)
Jan Holub
3.1 Introduction
51(2)
3.1.1 Preliminaries
52(1)
3.2 Direct Use of DFA in Stringology
53(7)
3.2.1 Forward Automata
53(3)
3.2.2 Degenerate Strings
56(1)
3.2.3 Indexing Automata
57(2)
3.2.4 Filtering Automata
59(1)
3.2.5 Backward Automata
59(1)
3.2.6 Automata with Fail Function
60(1)
3.3 NFA Simulation
60(6)
3.3.1 Basic Simulation Method
61(1)
3.3.2 Bit Parallelism
61(2)
3.3.3 Dynamic Programming
63(3)
3.3.4 Basic Simulation Method with Deterministic State Cache
66(1)
3.4 Finite Automaton as Model of Computation
66(1)
3.5 Finite Automata Composition
67(1)
3.6 Summary
67(2)
References
69(4)
4 New Developments in Processing of Degenerate Sequences
73(18)
Pavlos Antoniou
Costas S. Iliopoulos
4.1 Introduction
73(1)
4.1.1 Degenerate Primer Design Problem
74(1)
4.2 Background
74(2)
4.3 Basic Definitions
76(3)
4.4 Repetitive Structures in Degenerate Strings
79(5)
4.4.1 Using the Masking Technique
79(1)
4.4.2 Computing the Smallest Cover of the Degenerate String x
79(2)
4.4.3 Computing Maximal Local Covers of x
81(3)
4.4.4 Computing All Covers of x
84(1)
4.4.5 Computing the Seeds of x
84(1)
4.5 Conservative String Covering in Degenerate Strings
84(4)
4.5.1 Finding Constrained Pattern p in Degenerate String T
85(1)
4.5.2 Computing λ-Conservative Covers of Degenerate Strings
86(1)
4.5.3 Computing λ-Conservative Seeds of Degenerate Strings
87(1)
4.6 Conclusion
88(1)
References
89(2)
5 Exact Search Algorithms for Biological Sequences
91(22)
Eric Rivals
Leena Salmela
Jorma Tarhio
5.1 Introduction
91(2)
5.2 Single Pattern Matching Algorithms
93(4)
5.2.1 Algorithms for DNA Sequences
94(2)
5.2.2 Algorithms for Amino Acids
96(1)
5.3 Algorithms for Multiple Patterns
97(6)
5.3.1 Trie-Based Algorithms
97(3)
5.3.2 Filtering Algorithms
100(3)
5.3.3 Other Algorithms
103(1)
5.4 Application of Exact Set Pattern Matching for Read Mapping
103(4)
5.4.1 MPSCAN: An Efficient Exact Set Pattern Matching Tool for DNA/RNA Sequences
103(1)
5.4.2 Other Solutions for Mapping Reads
104(1)
5.4.3 Comparison of Mapping Solutions
105(2)
5.5 Conclusions
107(1)
References
108(5)
6 Algorithmic Aspects of Arc-Annotated Sequences
113(16)
Guillaume Blin
Maxime Crochemore
Stephane Vialette
6.1 Introduction
113(1)
6.2 Preliminaries
114(3)
6.2.1 Arc-Annotated Sequences
114(1)
6.2.2 Hierarchy
114(1)
6.2.3 Refined Hierarchy
115(1)
6.2.4 Alignment
115(1)
6.2.5 Edit Operations
116(1)
6.3 Longest Arc-Preserving Common Subsequence
117(3)
6.3.1 Definition
117(1)
6.3.2 Classical Complexity
118(1)
6.3.3 Parameterized Complexity
119(1)
6.3.4 Approximability
120(1)
6.4 Arc-Preserving Subsequence
120(2)
6.4.1 Definition
120(1)
6.4.2 Classical Complexity
121(1)
6.4.3 Classical Complexity for the Refined Hierarchy
121(1)
6.4.4 Open Problems
122(1)
6.5 Maximum Arc-Preserving Common Subsequence
122(1)
6.5.1 Definition
122(1)
6.5.2 Classical Complexity
123(1)
6.5.3 Open Problems
123(1)
6.6 Edit Distance
123(2)
6.6.1 Definition
123(1)
6.6.2 Classical Complexity
123(2)
6.6.3 Approximability
125(1)
6.6.4 Open Problems
125(1)
References
125(4)
7 Algorithmic Issues in Dna Barcoding Problems
129(14)
Bhaskar DasGupta
Ming-Yang Kao
Ion Mandoiu
7.1 Introduction
129(1)
7.2 Test Set Problems: A General Framework for Several Barcoding Problems
130(2)
7.3 A Synopsis of Biological Applications of Barcoding
132(1)
7.4 Survey of Algorithmic Techniques on Barcoding
133(2)
7.4.1 Integer Programming
134(1)
7.4.2 Lagrangian Relaxation and Simulated Annealing
134(1)
7.4.3 Provably Asymptotically Optimal Results
134(1)
7.5 Information Content Approach
135(1)
7.6 Set-Covering Approach
136(3)
7.6.1 Set-Covering Implementation in More Detail
137(2)
7.7 Experimental Results and Software Availability
139(1)
7.7.1 Randomly Generated Instances
139(1)
7.7.2 Real Data
140(1)
7.7.3 Software Availability
140(1)
7.8 Concluding Remarks
140(1)
References
141(2)
8 Recent Advances in Weighted Dna Sequences
143(28)
Manolis Christodoulakis
Costas S. Iliopoulos
8.1 Introduction
143(3)
8.2 Preliminaries
146(2)
8.2.1 Strings
146(1)
8.2.2 Weighted Sequences
147(1)
8.3 Indexing
148(4)
8.3.1 Weighted Suffix Tree
148(3)
8.3.2 Property Suffix Tree
151(1)
8.4 Pattern Matching
152(5)
8.4.1 Pattern Matching Using the Weighted Suffix Tree
152(1)
8.4.2 Pattern Matching Using Match Counts
153(1)
8.4.3 Pattern Matching with Gaps
154(2)
8.4.4 Pattern Matching with Swaps
156(1)
8.5 Approximate Pattern Matching
157(3)
8.5.1 Hamming Distance
157(3)
8.6 Repetitions, Covers, and Tandem Repeats
160(4)
8.6.1 Finding Simple Repetitions with the Weighted Suffix Tree
161(1)
8.6.2 Fixed-Length Simple Repetitions
161(2)
8.6.3 Fixed-Length Strict Repetitions
163(1)
8.6.4 Fixed-Length Tandem Repeats
163(1)
8.6.5 Identifying Covers
164(1)
8.7 Motif Discovery
164(2)
8.7.1 Approximate Motifs in a Single Weighted Sequence
164(1)
8.7.2 Approximate Common Motifs in a Set of Weighted Sequences
165(1)
8.8 Conclusions
166(1)
References
167(4)
9 Dna Computing for Subgraph Isomorphism Problem and Related Problems
171(20)
Sun-Yuan Hsieh
Chao-Wen Huang
Hsin-Hung Chou
9.1 Introduction
171(1)
9.2 Definitions of Subgraph Isomorphism Problem and Related Problems
172(2)
9.3 DNA Computing Models
174(1)
9.3.1 The Stickers
174(1)
9.3.2 The Adleman-Lipton Model
175(1)
9.4 The Sticker-based Solution Space
175(4)
9.4.1 Using Stickers for Generating the Permutation Set
176(1)
9.4.2 Using Stickers for Generating the Solution Space
177(2)
9.5 Algorithms for Solving Problems
179(8)
9.5.1 Solving the Subgraph Isomorphism Problem
179(4)
9.5.2 Solving the Graph Isomorphism Problem
183(1)
9.5.3 Solving the Maximum Common Subgraph Problem
184(3)
9.6 Experimental Data
187(1)
9.7 Conclusion
188(1)
References
188(3)
II ANALYSIS OF BIOLOGICAL SEQUENCES
191(192)
10 Graphs in Bioinformatics
193(28)
Elsa Chacko
Shoba Ranganathan
10.1 Graph theory---Origin
193(14)
10.1.1 What is a Graph?
193(1)
10.1.2 Types of Graphs
194(6)
10.1.3 Well-Known Graph Problems and Algorithms
200(7)
10.2 Graphs and the Biological World
207(9)
10.2.1 Alternative Splicing and Graphs
207(1)
10.2.2 Evolutionary Tree Construction
208(1)
10.2.3 Tracking the Temporal Variation of Biological Systems
209(1)
10.2.4 Identifying Protein Domains by Clustering Sequence Alignments
210(1)
10.2.5 Clustering Gene Expression Data
211(1)
10.2.6 Protein Structural Domain Decomposition
212(1)
10.2.7 Optimal Design of Thermally Stable Proteins
212(2)
10.2.8 The Sequencing by Hybridization (SBH) Problem
214(1)
10.2.9 Predicting Interactions in Protein Networks by Completing Defective Cliques
215(1)
10.3 Conclusion
216(1)
References
216(5)
11 A Flexible Data Store for Managing Bioinformatics Data
221(20)
Bassam A. Alqaralleh
Chen Wang
Bing Bing Zhou
Albert Y. Zomaya
11.1 Introduction
221(2)
11.1.1 Background
222(1)
11.1.2 Scalability Challenges
222(1)
11.2 Data Model and System Overview
223(4)
11.3 Replication and Load Balancing
227(3)
11.3.1 Replicating an Index Node
228(1)
11.3.2 Answering Range Queries with Replicas
229(1)
11.4 Evaluation
230(7)
11.4.1 Point Query Processing Performance
230(3)
11.4.2 Range Query Processing Performance
233(2)
11.4.3 Growth of the Replicas of an Indexing Node
235(2)
11.5 Related Work
237(1)
11.6 Summary
237(1)
References
238(3)
12 Algorithms for the Alignment of Biological Sequences
241(20)
Ahmed Mokaddem
Mourad Elloumi
12.1 Introduction
241(1)
12.2 Alignment Algorithms
242(9)
12.2.1 Pairwise Alignment Algorithms
242(3)
12.2.2 Multiple Alignment Algorithms
245(6)
12.3 Score Functions
251(1)
12.4 Benchmarks
252(3)
12.5 Conclusion
255(1)
Acknowledgments
255(1)
References
255(6)
13 Algorithms for Local Structural Alignment and Structural Motif Identification
261(16)
Sanguthevar Rajasekaran
Vamsi Kundeti
Martin Schiller
13.1 Introduction
261(1)
13.2 Problem Definition of Local Structural Alignment
262(1)
13.3 Variable-Length Alignment Fragment Pair (VLAFP) Algorithm
263(3)
13.3.1 Alignment Fragment Pairs
263(1)
13.3.2 Finding the Optimal Local Alignments Based on the VLAFP Cost Function
264(2)
13.4 Structural Alignment based on Center of Gravity: SACG
266(4)
13.4.1 Description of Protein Structure in PDB Format
266(1)
13.4.2 Related Work
267(1)
13.4.3 Center-of-Gravity-Based Algorithm
267(2)
13.4.4 Extending Theorem 13.1 for Atomic Coordinates in Protein Structure
269(1)
13.4.5 Building VCOST(i,j,q) Function Based on Center of Gravity
270(1)
13.5 Searching Structural Motifs
270(3)
13.6 Using SACG Algorithm for Classification of New Protein Structures
273(1)
13.7 Experimental Results
273(1)
13.8 Accuracy Results
273(1)
13.9 Conclusion
274(1)
Acknowledgments
275(1)
References
276(1)
14 Evolution of the Clustal Family of Multiple Sequence Alignment Programs
277(22)
Mohamed Radhouene Aniba
Julie Thompson
14.1 Introduction
277(1)
14.2 Clustal-ClustalV
278(6)
14.2.1 Pairwise Similarity Scores
279(1)
14.2.2 Guide Tree
280(2)
14.2.3 Progressive Multiple Alignment
282(1)
14.2.4 An Efficient Dynamic Programming Algorithm
282(2)
14.2.5 Profile Alignments
284(1)
14.3 ClustalW
284(5)
14.3.1 Optimal Pairwise Alignments
284(1)
14.3.2 More Accurate Guide Tree
284(1)
14.3.3 Improved Progressive Alignment
285(4)
14.4 ClustalX
289(3)
14.4.1 Alignment Quality Analysis
290(2)
14.5 ClustalW and ClustalX 2.0
292(1)
14.6 DbClustal
293(2)
14.6.1 Anchored Global Alignment
294(1)
14.7 Perspectives
295(1)
References
296(3)
15 Filters and Seeds Approaches for Fast Homology Searches in Large Datasets
299(22)
Nadia Pisanti
Mathieu Giraud
Pierre Peterlongo
15.1 Introduction
299(2)
15.1.1 Homologies and Large Datasets
299(1)
15.1.2 Filter Preprocessing or Heuristics
300(1)
15.1.3 Contents
300(1)
15.2 Methods Framework
301(2)
15.2.1 Strings and Repeats
301(1)
15.2.2 Filters---Fundamental Concepts
301(2)
15.3 Lossless filters
303(6)
15.3.1 History of Lossless Filters
303(1)
15.3.2 Quasar and swift---Filtering Repeats with Edit Distance
304(1)
15.3.3 Nimbus---Filtering Multiple Repeats with Hamming Distance
305(3)
15.3.4 tuiuiu---Filtering Multiple Repeats with Edit Distance
308(1)
15.4 Lossy Seed-Based Filters
309(6)
15.4.1 Seed-Based Heuristics
310(1)
15.4.2 Advanced Seeds
311(1)
15.4.3 Latencies and Neighborhood Indexing
311(2)
15.4.4 Seed-Based Heuristics Implementations
313(2)
15.5 Conclusion
315(1)
15.6 Acknowledgments
315(1)
References
315(6)
16 Novel Combinatorial and Information-Theoretic Alignment-Free Distances for Biological DATA MINING
321(40)
Chiara Epifanio
Alessandra Gabriele
Raffaele Giancarlo
Marinella Sciortino
16.1 Introduction
321(2)
16.2 Information-Theoretic Alignment-Free Methods
323(8)
16.2.1 Fundamental Information Measures, Statistical Dependency, and Similarity of Sequences
324(1)
16.2.2 Methods Based on Relative Entropy and Empirical Probability Distributions
325(4)
16.2.3 A Method Based on Statistical Dependency, via Mutual Information
329(2)
16.3 Combinatorial Alignment-Free Methods
331(5)
16.3.1 The Average Common Substring Distance
332(1)
16.3.2 A Method Based on the EBWT Transform
333(1)
16.3.3 N-Local Decoding
334(2)
16.4 Alignment-Free Compositional Methods
336(4)
16.4.1 The k-String Composition Approach
337(1)
16.4.2 Complete Composition Vector
338(1)
16.4.3 Fast Algorithms to Compute Composition Vectors
339(1)
16.5 Alignment-Free Exact Word Matches Methods
340(4)
16.5.1 D2 and its Distributional Regimes
340(2)
16.5.2 An Extension to Mismatches and the Choice of the Optimal Word Size
342(1)
16.5.3 The Transformation of D2 into a Method Assessing the Statistical Significance of Sequence Similarity
343(1)
16.6 Domains of Biological Application
344(5)
16.6.1 Phylogeny: Information Theoretic and Combinatorial Methods
345(1)
16.6.2 Phylogeny: Compositional Methods
346(1)
16.6.3 CIS Regulatory Modules
347(1)
16.6.4 DNA Sequence Dependencies
348(1)
16.7 Datasets and Software for Experimental Algorithmics
349(5)
16.7.1 Datasets
350(3)
16.7.2 Software
353(1)
16.8 Conclusions
354(1)
References
355(6)
17 In Silico Methods for the Analysis of Metabolites and Drug Molecules
361(22)
Varun Khanna
Shoba Ranganathan
17.1 Introduction
361(2)
17.1.1 Chemoinformatics and "Drug-Likeness"
361(2)
17.2 Molecular Descriptors
363(4)
17.2.1 One-Dimensional (1-D) Descriptors
363(1)
17.2.2 Two-Dimensional (2-D) Descriptors
364(2)
17.2.3 Three-Dimensional (3-D) Descriptors
366(1)
17.3 Databases
367(3)
17.3.1 PubChem
367(2)
17.3.2 Chemical Entities of Biological Interest (ChEBI)
369(1)
17.3.3 ChemBank
369(1)
17.3.4 ChemlDplus
369(1)
17.3.5 ChemDB
369(1)
17.4 Methods and Data Analysis Algorithms
370(6)
17.4.1 Simple Count Methods
370(1)
17.4.2 Enhanced Simple Count Methods, Using Structural Features
371(1)
17.4.3 ML Methods
372(4)
17.5 Conclusions
376(1)
Acknowledgments
377(1)
References
377(6)
III MOTIF FINDING AND STRUCTURE PREDICTION
383(164)
18 Motif Finding Algorithms in Biological Sequences
385(12)
Tarek El Falah
Mourad Elloumi
Thierry Lecroq
18.1 Introduction
385(2)
18.3 The Planted (l, d)-Motif Problem
387(4)
18.3.1 Formulation
387(1)
18.3.2 Algorithms
387(4)
18.4 The Extended (l, d)-Motif Problem
391(1)
18.4.1 Formulation
391(1)
18.4.2 Algorithms
391(1)
18.5 The Edited Motif Problem
392(1)
18.5.1 Formulation
392(1)
18.5.2 Algorithms
393(1)
18.6 The Simple Motif Problem
393(2)
18.6.1 Formulation
393(1)
18.6.2 Algorithms
394(1)
18.7 Conclusion
395(1)
References
396(1)
19 Computational Characterization of Regulatory Regions
397(28)
Enrique Blanco
19.1 The Genome Regulatory Landscape
397(3)
19.2 Qualitative Models of Regulatory Signals
400(1)
19.3 Quantitative Models of Regulatory Signals
401(2)
19.4 Detection of Dependencies in Sequences
403(2)
19.5 Repositories of Regulatory Information
405(1)
19.6 Using Predictive Models to Annotate Sequences
406(2)
19.7 Comparative Genomics Characterization
408(2)
19.8 Sequence Comparisons
410(2)
19.9 Combining Motifs and Alignments
412(2)
19.10 Experimental Validation
414(3)
19.11 Summary
417(1)
References
417(8)
20 Algorithmic Issues in the Analysis of Chip-Seq Data
425(24)
Federico Zambelli
Giulio Pavesi
20.1 Introduction
425(4)
20.2 Mapping Sequences on the Genome
429(5)
20.3 Identifying Significantly Enriched Regions
434(4)
20.3.1 ChIP-Seq Approaches to the Identification of DNA Structure Modifications
437(1)
20.4 Deriving Actual Transcription Factor Binding Sites
438(6)
20.5 Conclusions
444(1)
References
444(5)
21 Approaches and Methods for Operon Prediction Based on Machine Learning Techniques
449(30)
Yan Wang
You Zhou
Chunguang Zhou
Shuqin Wang
Wei Du
Chen Zhang
Yanchun Liang
21.1 Introduction
449(2)
21.2 Datasets, Features, and Preprocesses for Operon Prediction
451(9)
21.2.1 Operon Datasets
451(3)
21.2.2 Features
454(5)
21.2.3 Preprocess Methods
459(1)
21.3 Machine Learning Prediction Methods for Operon Prediction
460(14)
21.3.1 Hidden Markov Model
461(1)
21.3.2 Linkage Clustering
462(2)
21.3.3 Bayesian Classifier
464(3)
21.3.4 Bayesian Network
467(1)
21.3.5 Support Vector Machine
468(2)
21.3.6 Artificial Neural Network
470(1)
21.3.7 Genetic Algorithms
471(1)
21.3.8 Several Combinations
472(2)
21.4 Conclusions
474(1)
21.5 Acknowledgments
475(1)
References
475(4)
22 Protein Function Prediction with Data-Mining Techniques
479(22)
Xing-Ming Zhao
Luonan Chen
22.1 Introduction
479(1)
22.2 Protein Annotation Based on Sequence
480(4)
22.2.1 Protein Sequence Classification
480(3)
22.2.2 Protein Subcellular Localization Prediction
483(1)
22.3 Protein Annotation Based on Protein Structure
484(1)
22.4 Protein Function Prediction Based on Gene-Expression Data
485(1)
22.5 Protein Function Prediction Based on Protein Interactome Map
486(3)
22.5.1 Protein Function Prediction Based on Local Topology Structure of Interaction Map
486(2)
22.5.2 Protein Function Prediction Based on Global Topology of Interaction Map
488(1)
22.6 Protein Function Prediction Based on Data Integration
489(2)
22.7 Conclusions and Perspectives
491(2)
References
493(8)
23 Protein Domain Boundary Prediction
501(20)
Paul D. Yoo
Bing Bing Zhou
Albert Y. Zomaya
23.1 Introduction
501(2)
23.2 Profiling Technique
503(7)
23.2.1 Nonlocal Interaction and Vanishing Gradient Problem
506(1)
23.2.2 Hierarchical Mixture of Experts
506(2)
23.2.3 Overall Modular Kernel Architecture
508(2)
23.3 Results
510(2)
23.4 Discussion
512(3)
23.4.1 Nonlocal Interactions in Amino Acids
512(1)
23.4.2 Secondary Structure Information
513(1)
23.4.3 Hydrophobicity and Profiles
514(1)
23.4.4 Domain Assignment Is More Accurate for Proteins with Fewer Domains
514(1)
23.5 Conclusions
515(1)
References
515(6)
24 An Introduction to rna Structure and Pseudoknot Prediction
521(26)
Jana Sperschneider
Amitava Datta
24.1 Introduction
521(1)
24.2 RNA Secondary Structure Prediction
522(12)
24.2.1 Minimum Free Energy Model
524(2)
24.2.2 Prediction of Minimum Free Energy Structure
526(4)
24.2.3 Partition Function Calculation
530(3)
24.2.4 Base Pair Probabilities
533(1)
24.3 RNA Pseudoknots
534(9)
24.3.1 Biological Relevance
536(1)
24.3.2 RNA Pseudoknot Prediction
537(1)
24.3.3 Dynamic Programming
538(3)
24.3.4 Heuristic Approaches
541(1)
24.3.5 Pseudoknot Detection
542(1)
24.3.6 Overview
542(1)
24.4 Conclusions
543(1)
References
544(3)
IV PHYLOGENY RECONSTRUCTION
547(76)
25 Phylogenetic Search Algorithms for Maximum Likelihood
549(30)
Alexandros Stamatakis
25.1 Introduction
549(3)
25.1.1 Phylogenetic Inference
550(2)
25.2 Computing the Likelihood
552(3)
25.3 Accelerating the PLF by Algorithmic Means
555(3)
25.3.1 Reuse of Values Across Probability Vectors
555(2)
25.3.2 Gappy Alignments and Pointer Meshes
557(1)
25.4 Alignment Shapes
558(1)
25.5 General Search Heuristics
559(7)
25.5.1 Lazy Evaluation Strategies
563(1)
25.5.2 Further Heuristics
564(1)
25.5.3 Rapid Bootstrapping
565(1)
25.6 Computing the Robinson Foulds Distance
566(2)
25.7 Convergence Criteria
568(4)
25.7.1 Asymptotic Stopping
569(3)
25.8 Future Directions
572(1)
References
573(6)
26 Heuristic Methods for Phylogenetic Reconstruction with Maximum Parsimony
579(20)
Adrien Goeffon
Jean-Michel Richer
Jin-Kao Hao
26.1 Introduction
579(1)
26.2 Definitions and Formal Background
580(1)
26.2.1 Parsimony and Maximum Parsimony
580(1)
26.3 Methods
581(13)
26.3.1 Combinatorial Optimization
581(1)
26.3.2 Exact Approach
582(1)
26.3.3 Local Search Methods
582(6)
26.3.4 Evolutionary Metaheuristics and Genetic Algorithms
588(2)
26.3.5 Memetic Methods
590(2)
26.3.6 Problem-Specific Improvements
592(2)
26.4 Conclusion
594(1)
References
595(4)
27 Maximum Entropy Method for Composition Vector Method
599(24)
Raymond H.-F. Chan
Roger W. Wang
Jeff C.-F. Wong
27.1 Introduction
599(2)
27.2 Models and Entropy Optimization
601(13)
27.2.1 Definitions
601(2)
27.2.2 Denoising Formulas
603(8)
27.2.3 Distance Measure
611(2)
27.2.4 Phylogenetic Tree Construction
613(1)
27.3 Application and Dicussion
614(5)
27.3.1 Example 1
614(1)
27.3.2 Example 2
614(1)
27.3.3 Example 3
615(2)
27.3.4 Example 4
617(2)
27.4 Concluding Remarks
619(1)
References
619(4)
V MICROARRAY DATA ANALYSIS
623(100)
28 Microarray Gene Expression Data Analysis
625(26)
Alan Wee-Chung Liew
Xiangchao Gan
28.1 Introduction
625(1)
28.2 DNA Microarray Technology and Experiment
626(1)
28.3 Image Analysis and Expression Data Extraction
627(3)
28.3.1 Image Preprocessing
628(1)
28.3.2 Block Segmentation
628(1)
28.3.3 Automatic Gridding
628(1)
28.3.4 Spot Extraction
628(2)
28.4 Data Processing
630(1)
28.4.1 Background Correction
630(1)
28.4.2 Normalization
630(1)
28.4.3 Data Filtering
631(1)
28.5 Missing Value Imputation
631(3)
28.6 Temporal Gene Expression Profile Analysis
634(6)
28.7 Cyclic Gene Expression Profiles Detection
640(7)
28.7.1 SSA-AR Spectral Estimation
643(1)
28.7.2 Spectral Estimation by Signal Reconstruction
644(2)
28.7.3 Statistical Hypothesis Testing for Periodic Profile Detection
646(1)
28.8 Summary
647(1)
Acknowledgments
648(1)
References
649(2)
29 Biclustering of Microarray Data
651(14)
Wassim Ayadi
Mourad Elloumi
29.1 Introduction
651(1)
29.2 Types of Biclusters
652(1)
29.3 Groups of Biclusters
653(1)
29.4 Evaluation Functions
654(2)
29.5 Systematic and Stochastic Biclustering Algorithms
656(3)
29.6 Biological Validation
659(2)
29.7 Conclusion
661(1)
References
661(4)
30 Computational Models for Condition-Specific Gene and Pathway Inference
665(26)
Yu-Qing Qiu
Shihua Zhang
Xiang-Sun Zhang
Luonan Chen
30.1 Introduction
665(1)
30.2 Condition-Specific Pathway Identification
666(15)
30.2.1 Gene Set Analysis
667(4)
30.2.2 Condition-Specific Pathway Inference
671(10)
30.3 Disease Gene Prioritization and Genetic Pathway Detection
681(3)
30.4 Module Networks
684(1)
30.5 Summary
685(1)
Acknowledgments
685(1)
References
685(6)
31 Heterogeneity of Differential Expression in Cancer Studies: Algorithms and Methods
691(32)
Radha Krishna Murthy Karuturi
31.1 Introduction
691(1)
31.2 Notations
692(2)
31.3 Differential Mean of Expression
694(5)
31.3.1 Single Factor Differential Expression
695(2)
31.3.2 Multifactor Differential Expression
697(1)
31.3.3 Empirical Bayes Extension
698(1)
31.4 Differential Variability of Expression
699(2)
31.4.1 F-Test for Two-Group Differential Variability Analysis
699(1)
31.4.2 Bartlett's and Levene's Tests for Multigroup Differential Variability Analysis
700(1)
31.5 Differential Expression in Compendium of Tumors
701(4)
31.5.1 Gaussian Mixture Model (GMM) for Finite Levels of Expression
701(2)
31.5.2 Outlier Detection Strategy
703(1)
31.5.3 Kurtosis Excess
704(1)
31.6 Differential Expression by Chromosomal Aberrations: The Local Properties
705(6)
31.6.1 Wavelet Variance Scanning (WAVES) for Single-Sample Analysis
708(1)
31.6.2 Local Singular Value Decomposition (LSVD) for Compendium of Tumors
709(1)
31.6.3 Locally Adaptive Statistical Procedure (LAP) for Compendium of Tumors with Control Samples
710(1)
31.7 Differential Expression in Gene Interactome
711(3)
31.7.1 Friendly Neighbors Algorithm: A Multiplicative Interactome
711(1)
31.7.2 GeneRank: A Contributing Interactome
712(1)
31.7.3 Top Scoring Pairs (TSP): A Differential Interactome
713(1)
31.8 Differential Coexpression: Global MultiDimensional Interactome
714(6)
31.8.1 Kostka and Spang's Differential Coexpression Algorithm
715(3)
31.8.2 Differential Expression Linked Differential Coexpression
718(1)
31.8.3 Differential Friendly Neighbors (DiffFNs)
718(2)
Acknowledgments
720(1)
References
720(3)
VI ANALYSIS OF GENOMES
723(142)
32 Comparative Genomics: Algorithms and Applications
725(24)
Xiao Yang
Srinivas Aluru
32.1 Introduction
725(2)
32.2 Notations
727(1)
32.3 Ortholog Assignment
727(7)
32.3.1 Sequence Similarity-Based Method
729(2)
32.3.2 Phylogeny-Based Method
731(1)
32.3.3 Rearrangement-Based Method
732(2)
32.4 Gene Cluster and Synteny Detection
734(9)
32.4.1 Synteny Detection
736(3)
32.4.2 Gene Cluster Detection
739(4)
32.5 Conclusions
743(1)
References
743(6)
33 Advances in Genome Rearrangement Algorithms
749(24)
Masud Hasan
M. Sohel Rahman
33.1 Introduction
749(3)
33.2 Preliminaries
752(1)
33.3 Sorting by Reversals
753(6)
33.3.1 Approaches to Approximation Algorithms
754(3)
33.3.2 Signed Permutations
757(2)
33.4 Sorting by Transpositions
759(2)
33.4.1 Approximation Results
760(1)
33.4.2 Improved Running Time and Simpler Algorithms
761(1)
33.5 Other Operations
761(2)
33.5.1 Sorting by Prefix Reversals
761(1)
33.5.2 Sorting by Prefix Transpositions
762(1)
33.5.3 Sorting by Block Interchange
762(1)
33.5.4 Short Swap and Fixed-Length Reversals
763(1)
33.6 Sorting by More Than One Operation
763(2)
33.6.1 Unified Operation: Doule Cut and Join
764(1)
33.7 Future Research Directions
765(1)
33.8 Notes on Software
766(1)
References
767(6)
34 Computing Genomic Distances: An Algorithmic Viewpoint
773(26)
Guillaume Fertin
Irena Rusu
34.1 Introduction
773(2)
34.1.1 What this
Chapter is About
773(1)
34.1.2 Definitions and Notations
774(1)
34.1.3 Organization of the
Chapter
775(1)
34.2 Interval-Based Criteria
775(10)
34.2.1 Brief Introduction
775(1)
34.2.2 The Context and the Problems
776(2)
34.2.3 Common Intervals in Permutations and the Commuting Generators Strategy
778(4)
34.2.4 Conserved Intervals in Permutations and the Bound-and-Drop Strategy
782(1)
34.2.5 Common Intervals in Strings and the Element Plotting Strategy
783(2)
34.2.6 Variants
785(1)
34.3 Character-Based Criteria
785(10)
34.3.1 Introduction and Definition of the Problems
785(2)
34.3.2 An Approximation Algorithm for BAL-FMB
787(4)
34.3.3 An Exact Algorithm for UNBAL-FMB
791(4)
34.3.4 Other Results and Open Problems
795(1)
34.4 Conclusion
795(1)
References
796(3)
35 Wavelet Algorithms for Dna Analysis
799(44)
Carlo Cattani
35.1 Introduction
799(3)
35.2 DNA Representation
802(10)
35.2.1 Preliminary Remarks on DNA
802(1)
35.2.2 Indicator Function
803(3)
35.2.3 Representation
806(1)
35.2.4 Representation Models
807(1)
35.2.5 Constraints on the Representation in R2
808(2)
35.2.6 Complex Representation
810(1)
35.2.7 DNA Walks
810(2)
35.3 Statistical Correlations in DNA
812(6)
35.3.1 Long-Range Correlation
812(2)
35.3.2 Power Spectrum
814(3)
35.3.3 Complexity
817(1)
35.4 Wavelet Analysis
818(5)
35.4.1 Haar Wavelet Basis
819(1)
35.4.2 Haar Series
819(2)
35.4.3 Discrete Haar Wavelet Transform
821(2)
35.5 Haar Wavelet Coefficients and Statistical Parameters
823(3)
35.6 Algorithm of the Short Haar Discrete Wavelet Transform
826(2)
35.7 Clusters of Wavelet Coefficients
828(10)
35.7.1 Cluster Analysis of the Wavelet Coefficients of the Complex DNA Representation
830(4)
35.7.2 Cluster Analysis of the Wavelet Coefficients of DNA Walks
834(4)
35.8 Conclusion
838(1)
References
839(4)
36 Haplotype Inference Models and Algorithms
843(22)
Ling-Yun Wu
36.1 Introduction
843(1)
36.2 Problem Statement and Notations
844(2)
36.3 Combinatorial Methods
846(5)
36.3.1 Clark's Inference Rule
846(2)
36.3.2 Pure Parsimony Model
848(1)
36.3.3 Phylogeny Methods
849(2)
36.4 Statistical Methods
851(2)
36.4.1 Maximum Likelihood Methods
851(1)
36.4.2 Bayesian Methods
852(1)
36.4.3 Markov Chain Methods
852(1)
36.5 Pedigree Methods
853(3)
36.5.1 Minimum Recombinant Haplotype Configurations
854(1)
36.5.2 Zero Recombinant Haplotype Configurations
854(1)
36.5.3 Statistical Methods
855(1)
36.6 Evaluation
856(2)
36.6.1 Evaluation Measurements
856(1)
36.6.2 Comparisons
857(1)
36.6.3 Datasets
857(1)
36.7 Discussion
858(1)
References
859(6)
VII ANALYSIS OF BIOLOGICAL NETWORKS
865(142)
37 Untangling Biological Networks Using Bioinformatics
867(26)
Gaurav Kumar
Adrian P. Cootes
Shoba Ranganathan
37.1 Introduction
867(11)
37.1.1 Predicting Biological Processes: A Major Challenge to Understanding Biology
867(1)
37.1.2 Historical Perspective and Mathematical Preliminaries of Networks
868(2)
37.1.3 Structural Properties of Biological Networks
870(3)
37.1.4 Local Topology of Biological Networks: Functional Motifs, Modules, and Communities
873(5)
37.2 Types of Biological Networks
878(6)
37.2.1 Protein-Protein Interaction Networks
878(1)
37.2.2 Metabolic Networks
879(2)
37.2.3 Transcriptional Networks
881(2)
37.2.4 Other Biological Networks
883(1)
37.3 Network Dynamic, Evolution and Disease
884(3)
37.3.1 Biological Network Dynamic and Evolution
884(2)
37.3.2 Biological Networks and Disease
886(1)
37.4 Future Challenges and Scope
887(1)
Acknowledgments
887(1)
References
888(5)
38 Probabilistic Approaches for Investigating Biological Networks
893(22)
Jeremie Bourdon
Damien Eveillard
38.1 Probabilistic Models for Biological Networks
894(8)
38.1.1 Boolean Networks
895(5)
38.1.2 Probabilistic Boolean Networks: A Natural Extension
900(1)
38.1.3 Inferring Probabilistic Models from Experiments
901(1)
38.2 Interpretation and Quantitative Analysis of Probabilistic Models
902(9)
38.2.1 Dynamical Analysis and Temporal Properties
902(3)
38.2.2 Impact of Update Strategies for Analyzing Probabilistic Boolean Networks
905(1)
38.2.3 Simulations of a Probabilistic Boolean Network
906(5)
38.3 Conclusion
911(1)
Acknowledgments
911(1)
References
911(4)
39 Modeling and Analysis of Biological Networks with Model Checking
915(26)
Dragan Bosnacki
Peter A.J. Hilbers
Ronny S. Mans
Erik P. de Vink
39.1 Introduction
915(1)
39.2 Preliminaries
916(3)
39.2.1 Model Checking
916(1)
39.2.2 SPIN and Promela
917(1)
39.2.3 LTL
918(1)
39.3 Analyzing Genetic Networks with Model Checking
919(6)
39.3.1 Boolean Regulatory Networks
919(1)
39.3.2 A Case Study
919(2)
39.3.3 Translating Boolean Regulatory Graphs into Promela
921(1)
39.3.4 Some Results
922(2)
39.3.5 Concluding Remarks
924(1)
39.3.6 Related Work and Bibliographic Notes
924(1)
39.4 Probabilistic Model Checking for Biological Systems
925(11)
39.4.1 Motivation and Background
926(1)
39.4.2 A Kinetic Model of mRNA Translation
927(1)
39.4.3 Probabilistic Model Checking
928(1)
39.4.4 The Prism Model
929(4)
39.4.5 Insertion Errors
933(1)
39.4.6 Concluding Remarks
934(1)
39.4.7 Related Work and Bibliographic Notes
935(1)
References
936(5)
40 Reverse Engineering of Molecular Networks from a Common Combinatorial Approach
941(14)
Bhaskar DasGupta
Paola Vera-Licona
Eduardo Sontag
40.1 Introduction
941(1)
40.2 Reverse-Engineering of Biological Networks
942(4)
40.2.1 Evaluation of the Performance of Reverse-Engineering Methods
945(1)
40.3 Classical Combinatorial Algorithms: A Case Study
946(5)
40.3.1 Benchmarking RE Combinatorial-Based Methods
947(3)
40.3.2 Software Availability
950(1)
40.4 Concluding Remarks
951(1)
Acknowledgments
951(1)
References
951(4)
41 Unsupervised Learning for gene Regulation Network Inference from Expression Data: A Review
955(24)
Mohamed Elati
Celine Rouveirol
41.1 Introduction
955(1)
41.2 Gene Networks: Definition and Properties
956(2)
41.3 Gene Expression: Data and Analysis
958(1)
41.4 Network Inference as an Unsupervised Learning Problem
959(1)
41.5 Correlation-Based Methods
959(2)
41.6 Probabilistic Graphical Models
961(2)
41.7 Constraint-Based Data Mining
963(6)
41.7.1 Multiple Usages of Extracted Patterns
965(1)
41.7.2 Mining Gene Regulation from Transcriptome Datasets
966(3)
41.8 Validation
969(4)
41.8.1 Statistical Validation of Network Inference
970(2)
41.8.2 Biological Validation
972(1)
41.9 Conclusion and Perspectives
973(1)
References
974(5)
42 Approaches to Construction and Analysis of Microrna-Mediated Networks
979(28)
Ilana Lichtenstein
Albert Zomaya
Jennifer Gamble
Mathew Vadas
42.1 Introduction
979(3)
42.1.1 miRNA-mediated Genetic Regulatory Networks
979(2)
42.1.2 The Four Levels of Regulation in GRNs
981(1)
42.1.3 Overview of Sections
982(1)
42.2 Fundamental Component Interaction Research: Predicting miRNA Genes, Regulators, and Targets
982(6)
42.2.1 Prediction of Novel miRNA Genes
983(1)
42.2.2 Prediction of miRNA Targets
984(1)
42.2.3 Prediction of miRNA Transcript Elements and Transcriptional Regulation
984(4)
42.3 Identifying miRNA-mediated Networks
988(5)
42.3.1 Forward Engineering---Construction of Multinode Components in miRNA-mediated Networks Using Paired Interaction Information
988(1)
42.3.2 Reverse Engineering---Inference of MicroRNA Modules Using Top-Down Approaches
988(5)
42.4 Global and Local Architecture Analysis in miRNA-Containing Networks
993(8)
42.4.1 Global Architecture Properties of miRNA-mediated Post-transcriptional Networks
993(1)
42.4.2 Local Architecture Properties of miRNA-mediated Post-transcriptional Networks
994(7)
42.5 Conclusion
1001(1)
References
1001(6)
Index 1007
Mourad Elloumi, PhD, is Associate Professor in Computer Science, Faculty of Economic Sciences and Management of Tunis (Tunisia), and member of the Unit of Technologies of Information and Communication (UTIC). He is the author/coauthor of more than forty publications in international journals and conferences. Professor Elloumi was the guest editor of a special issue on biological knowledge discovery and data mining in Knowledge-Based Systems and the coeditor of the proceedings of two international conferences.

Albert Y. Zomaya, PhD, is the Chair Professor of High Performance Computing and Networking in the School of Information Technologies at The University of Sydney (Australia). He is the author/coauthor of eight books and more than 400 publications in technical journals and conferences, and the editor of eight books and eight conference volumes. Professor Zomaya is currently an associate editor for twenty journals, the Founding Editor of the Wiley Series on Parallel and Distributed Computing, and a Founding Coeditor of the Wiley Series in Bioinformatics.