Preface |
|
xxxi | |
Contributors |
|
xxxiii | |
|
I STRINGS PROCESSING AND APPLICATION TO BIOLOGICAL SEQUENCES |
|
|
1 | (190) |
|
1 String data Structures for Computational Molecular Biology |
|
|
3 | (24) |
|
|
|
|
3 | (3) |
|
1.2 Main String Indexing Data Structures |
|
|
6 | (6) |
|
|
6 | (2) |
|
|
8 | (4) |
|
1.3 Index Structures for Weighted Strings |
|
|
12 | (2) |
|
1.4 Index Structures for Indeterminate Strings |
|
|
14 | (3) |
|
1.5 String Data Structures in Memory Hierarchies |
|
|
17 | (3) |
|
|
20 | (1) |
|
|
20 | (7) |
|
2 Efficient Restricted-Case Algorithms for Problems in Computational Biology |
|
|
27 | (24) |
|
|
|
2.1 The Need for Special Cases |
|
|
27 | (1) |
|
2.2 Assessing Efficient Solvability Options for General Problems and Special Cases |
|
|
28 | (2) |
|
2.3 String and Sequence Problems |
|
|
30 | (1) |
|
2.4 Shortest Common Superstring |
|
|
31 | (5) |
|
2.4.1 Solving the General Problem |
|
|
32 | (2) |
|
2.4.2 Special Case: SCSt for Short Strings Over Small Alphabets |
|
|
34 | (1) |
|
|
35 | (1) |
|
2.5 Longest Common Subsequence |
|
|
36 | (5) |
|
2.5.1 Solving the General Problem |
|
|
37 | (2) |
|
2.5.2 Special Case: LCS of Similar Sequences |
|
|
39 | (1) |
|
2.5.3 Special Case: LCS Under Symbol-Occurrence Restrictions |
|
|
39 | (1) |
|
|
40 | (1) |
|
2.6 Common Approximate Substring |
|
|
41 | (5) |
|
2.6.1 Solving the General Problem |
|
|
42 | (2) |
|
2.6.2 Special Case: Common Approximate String |
|
|
44 | (1) |
|
|
45 | (1) |
|
|
46 | (1) |
|
|
47 | (4) |
|
3 Finite Automata in Pattern Matching |
|
|
51 | (22) |
|
|
|
51 | (2) |
|
|
52 | (1) |
|
3.2 Direct Use of DFA in Stringology |
|
|
53 | (7) |
|
|
53 | (3) |
|
|
56 | (1) |
|
|
57 | (2) |
|
|
59 | (1) |
|
|
59 | (1) |
|
3.2.6 Automata with Fail Function |
|
|
60 | (1) |
|
|
60 | (6) |
|
3.3.1 Basic Simulation Method |
|
|
61 | (1) |
|
|
61 | (2) |
|
3.3.3 Dynamic Programming |
|
|
63 | (3) |
|
3.3.4 Basic Simulation Method with Deterministic State Cache |
|
|
66 | (1) |
|
3.4 Finite Automaton as Model of Computation |
|
|
66 | (1) |
|
3.5 Finite Automata Composition |
|
|
67 | (1) |
|
|
67 | (2) |
|
|
69 | (4) |
|
4 New Developments in Processing of Degenerate Sequences |
|
|
73 | (18) |
|
|
|
|
73 | (1) |
|
4.1.1 Degenerate Primer Design Problem |
|
|
74 | (1) |
|
|
74 | (2) |
|
|
76 | (3) |
|
4.4 Repetitive Structures in Degenerate Strings |
|
|
79 | (5) |
|
4.4.1 Using the Masking Technique |
|
|
79 | (1) |
|
4.4.2 Computing the Smallest Cover of the Degenerate String x |
|
|
79 | (2) |
|
4.4.3 Computing Maximal Local Covers of x |
|
|
81 | (3) |
|
4.4.4 Computing All Covers of x |
|
|
84 | (1) |
|
4.4.5 Computing the Seeds of x |
|
|
84 | (1) |
|
4.5 Conservative String Covering in Degenerate Strings |
|
|
84 | (4) |
|
4.5.1 Finding Constrained Pattern p in Degenerate String T |
|
|
85 | (1) |
|
4.5.2 Computing λ-Conservative Covers of Degenerate Strings |
|
|
86 | (1) |
|
4.5.3 Computing λ-Conservative Seeds of Degenerate Strings |
|
|
87 | (1) |
|
|
88 | (1) |
|
|
89 | (2) |
|
5 Exact Search Algorithms for Biological Sequences |
|
|
91 | (22) |
|
|
|
|
|
91 | (2) |
|
5.2 Single Pattern Matching Algorithms |
|
|
93 | (4) |
|
5.2.1 Algorithms for DNA Sequences |
|
|
94 | (2) |
|
5.2.2 Algorithms for Amino Acids |
|
|
96 | (1) |
|
5.3 Algorithms for Multiple Patterns |
|
|
97 | (6) |
|
5.3.1 Trie-Based Algorithms |
|
|
97 | (3) |
|
5.3.2 Filtering Algorithms |
|
|
100 | (3) |
|
|
103 | (1) |
|
5.4 Application of Exact Set Pattern Matching for Read Mapping |
|
|
103 | (4) |
|
5.4.1 MPSCAN: An Efficient Exact Set Pattern Matching Tool for DNA/RNA Sequences |
|
|
103 | (1) |
|
5.4.2 Other Solutions for Mapping Reads |
|
|
104 | (1) |
|
5.4.3 Comparison of Mapping Solutions |
|
|
105 | (2) |
|
|
107 | (1) |
|
|
108 | (5) |
|
6 Algorithmic Aspects of Arc-Annotated Sequences |
|
|
113 | (16) |
|
|
|
|
|
113 | (1) |
|
|
114 | (3) |
|
6.2.1 Arc-Annotated Sequences |
|
|
114 | (1) |
|
|
114 | (1) |
|
|
115 | (1) |
|
|
115 | (1) |
|
|
116 | (1) |
|
6.3 Longest Arc-Preserving Common Subsequence |
|
|
117 | (3) |
|
|
117 | (1) |
|
6.3.2 Classical Complexity |
|
|
118 | (1) |
|
6.3.3 Parameterized Complexity |
|
|
119 | (1) |
|
|
120 | (1) |
|
6.4 Arc-Preserving Subsequence |
|
|
120 | (2) |
|
|
120 | (1) |
|
6.4.2 Classical Complexity |
|
|
121 | (1) |
|
6.4.3 Classical Complexity for the Refined Hierarchy |
|
|
121 | (1) |
|
|
122 | (1) |
|
6.5 Maximum Arc-Preserving Common Subsequence |
|
|
122 | (1) |
|
|
122 | (1) |
|
6.5.2 Classical Complexity |
|
|
123 | (1) |
|
|
123 | (1) |
|
|
123 | (2) |
|
|
123 | (1) |
|
6.6.2 Classical Complexity |
|
|
123 | (2) |
|
|
125 | (1) |
|
|
125 | (1) |
|
|
125 | (4) |
|
7 Algorithmic Issues in Dna Barcoding Problems |
|
|
129 | (14) |
|
|
|
|
|
129 | (1) |
|
7.2 Test Set Problems: A General Framework for Several Barcoding Problems |
|
|
130 | (2) |
|
7.3 A Synopsis of Biological Applications of Barcoding |
|
|
132 | (1) |
|
7.4 Survey of Algorithmic Techniques on Barcoding |
|
|
133 | (2) |
|
7.4.1 Integer Programming |
|
|
134 | (1) |
|
7.4.2 Lagrangian Relaxation and Simulated Annealing |
|
|
134 | (1) |
|
7.4.3 Provably Asymptotically Optimal Results |
|
|
134 | (1) |
|
7.5 Information Content Approach |
|
|
135 | (1) |
|
7.6 Set-Covering Approach |
|
|
136 | (3) |
|
7.6.1 Set-Covering Implementation in More Detail |
|
|
137 | (2) |
|
7.7 Experimental Results and Software Availability |
|
|
139 | (1) |
|
7.7.1 Randomly Generated Instances |
|
|
139 | (1) |
|
|
140 | (1) |
|
7.7.3 Software Availability |
|
|
140 | (1) |
|
|
140 | (1) |
|
|
141 | (2) |
|
8 Recent Advances in Weighted Dna Sequences |
|
|
143 | (28) |
|
|
|
|
143 | (3) |
|
|
146 | (2) |
|
|
146 | (1) |
|
|
147 | (1) |
|
|
148 | (4) |
|
8.3.1 Weighted Suffix Tree |
|
|
148 | (3) |
|
8.3.2 Property Suffix Tree |
|
|
151 | (1) |
|
|
152 | (5) |
|
8.4.1 Pattern Matching Using the Weighted Suffix Tree |
|
|
152 | (1) |
|
8.4.2 Pattern Matching Using Match Counts |
|
|
153 | (1) |
|
8.4.3 Pattern Matching with Gaps |
|
|
154 | (2) |
|
8.4.4 Pattern Matching with Swaps |
|
|
156 | (1) |
|
8.5 Approximate Pattern Matching |
|
|
157 | (3) |
|
|
157 | (3) |
|
8.6 Repetitions, Covers, and Tandem Repeats |
|
|
160 | (4) |
|
8.6.1 Finding Simple Repetitions with the Weighted Suffix Tree |
|
|
161 | (1) |
|
8.6.2 Fixed-Length Simple Repetitions |
|
|
161 | (2) |
|
8.6.3 Fixed-Length Strict Repetitions |
|
|
163 | (1) |
|
8.6.4 Fixed-Length Tandem Repeats |
|
|
163 | (1) |
|
|
164 | (1) |
|
|
164 | (2) |
|
8.7.1 Approximate Motifs in a Single Weighted Sequence |
|
|
164 | (1) |
|
8.7.2 Approximate Common Motifs in a Set of Weighted Sequences |
|
|
165 | (1) |
|
|
166 | (1) |
|
|
167 | (4) |
|
9 Dna Computing for Subgraph Isomorphism Problem and Related Problems |
|
|
171 | (20) |
|
|
|
|
|
171 | (1) |
|
9.2 Definitions of Subgraph Isomorphism Problem and Related Problems |
|
|
172 | (2) |
|
|
174 | (1) |
|
|
174 | (1) |
|
9.3.2 The Adleman-Lipton Model |
|
|
175 | (1) |
|
9.4 The Sticker-based Solution Space |
|
|
175 | (4) |
|
9.4.1 Using Stickers for Generating the Permutation Set |
|
|
176 | (1) |
|
9.4.2 Using Stickers for Generating the Solution Space |
|
|
177 | (2) |
|
9.5 Algorithms for Solving Problems |
|
|
179 | (8) |
|
9.5.1 Solving the Subgraph Isomorphism Problem |
|
|
179 | (4) |
|
9.5.2 Solving the Graph Isomorphism Problem |
|
|
183 | (1) |
|
9.5.3 Solving the Maximum Common Subgraph Problem |
|
|
184 | (3) |
|
|
187 | (1) |
|
|
188 | (1) |
|
|
188 | (3) |
|
II ANALYSIS OF BIOLOGICAL SEQUENCES |
|
|
191 | (192) |
|
10 Graphs in Bioinformatics |
|
|
193 | (28) |
|
|
|
10.1 Graph theory---Origin |
|
|
193 | (14) |
|
|
193 | (1) |
|
|
194 | (6) |
|
10.1.3 Well-Known Graph Problems and Algorithms |
|
|
200 | (7) |
|
10.2 Graphs and the Biological World |
|
|
207 | (9) |
|
10.2.1 Alternative Splicing and Graphs |
|
|
207 | (1) |
|
10.2.2 Evolutionary Tree Construction |
|
|
208 | (1) |
|
10.2.3 Tracking the Temporal Variation of Biological Systems |
|
|
209 | (1) |
|
10.2.4 Identifying Protein Domains by Clustering Sequence Alignments |
|
|
210 | (1) |
|
10.2.5 Clustering Gene Expression Data |
|
|
211 | (1) |
|
10.2.6 Protein Structural Domain Decomposition |
|
|
212 | (1) |
|
10.2.7 Optimal Design of Thermally Stable Proteins |
|
|
212 | (2) |
|
10.2.8 The Sequencing by Hybridization (SBH) Problem |
|
|
214 | (1) |
|
10.2.9 Predicting Interactions in Protein Networks by Completing Defective Cliques |
|
|
215 | (1) |
|
|
216 | (1) |
|
|
216 | (5) |
|
11 A Flexible Data Store for Managing Bioinformatics Data |
|
|
221 | (20) |
|
|
|
|
|
|
221 | (2) |
|
|
222 | (1) |
|
11.1.2 Scalability Challenges |
|
|
222 | (1) |
|
11.2 Data Model and System Overview |
|
|
223 | (4) |
|
11.3 Replication and Load Balancing |
|
|
227 | (3) |
|
11.3.1 Replicating an Index Node |
|
|
228 | (1) |
|
11.3.2 Answering Range Queries with Replicas |
|
|
229 | (1) |
|
|
230 | (7) |
|
11.4.1 Point Query Processing Performance |
|
|
230 | (3) |
|
11.4.2 Range Query Processing Performance |
|
|
233 | (2) |
|
11.4.3 Growth of the Replicas of an Indexing Node |
|
|
235 | (2) |
|
|
237 | (1) |
|
|
237 | (1) |
|
|
238 | (3) |
|
12 Algorithms for the Alignment of Biological Sequences |
|
|
241 | (20) |
|
|
|
|
241 | (1) |
|
12.2 Alignment Algorithms |
|
|
242 | (9) |
|
12.2.1 Pairwise Alignment Algorithms |
|
|
242 | (3) |
|
12.2.2 Multiple Alignment Algorithms |
|
|
245 | (6) |
|
|
251 | (1) |
|
|
252 | (3) |
|
|
255 | (1) |
|
|
255 | (1) |
|
|
255 | (6) |
|
13 Algorithms for Local Structural Alignment and Structural Motif Identification |
|
|
261 | (16) |
|
|
|
|
|
261 | (1) |
|
13.2 Problem Definition of Local Structural Alignment |
|
|
262 | (1) |
|
13.3 Variable-Length Alignment Fragment Pair (VLAFP) Algorithm |
|
|
263 | (3) |
|
13.3.1 Alignment Fragment Pairs |
|
|
263 | (1) |
|
13.3.2 Finding the Optimal Local Alignments Based on the VLAFP Cost Function |
|
|
264 | (2) |
|
13.4 Structural Alignment based on Center of Gravity: SACG |
|
|
266 | (4) |
|
13.4.1 Description of Protein Structure in PDB Format |
|
|
266 | (1) |
|
|
267 | (1) |
|
13.4.3 Center-of-Gravity-Based Algorithm |
|
|
267 | (2) |
|
13.4.4 Extending Theorem 13.1 for Atomic Coordinates in Protein Structure |
|
|
269 | (1) |
|
13.4.5 Building VCOST(i,j,q) Function Based on Center of Gravity |
|
|
270 | (1) |
|
13.5 Searching Structural Motifs |
|
|
270 | (3) |
|
13.6 Using SACG Algorithm for Classification of New Protein Structures |
|
|
273 | (1) |
|
13.7 Experimental Results |
|
|
273 | (1) |
|
|
273 | (1) |
|
|
274 | (1) |
|
|
275 | (1) |
|
|
276 | (1) |
|
14 Evolution of the Clustal Family of Multiple Sequence Alignment Programs |
|
|
277 | (22) |
|
|
|
|
277 | (1) |
|
|
278 | (6) |
|
14.2.1 Pairwise Similarity Scores |
|
|
279 | (1) |
|
|
280 | (2) |
|
14.2.3 Progressive Multiple Alignment |
|
|
282 | (1) |
|
14.2.4 An Efficient Dynamic Programming Algorithm |
|
|
282 | (2) |
|
14.2.5 Profile Alignments |
|
|
284 | (1) |
|
|
284 | (5) |
|
14.3.1 Optimal Pairwise Alignments |
|
|
284 | (1) |
|
14.3.2 More Accurate Guide Tree |
|
|
284 | (1) |
|
14.3.3 Improved Progressive Alignment |
|
|
285 | (4) |
|
|
289 | (3) |
|
14.4.1 Alignment Quality Analysis |
|
|
290 | (2) |
|
14.5 ClustalW and ClustalX 2.0 |
|
|
292 | (1) |
|
|
293 | (2) |
|
14.6.1 Anchored Global Alignment |
|
|
294 | (1) |
|
|
295 | (1) |
|
|
296 | (3) |
|
15 Filters and Seeds Approaches for Fast Homology Searches in Large Datasets |
|
|
299 | (22) |
|
|
|
|
|
299 | (2) |
|
15.1.1 Homologies and Large Datasets |
|
|
299 | (1) |
|
15.1.2 Filter Preprocessing or Heuristics |
|
|
300 | (1) |
|
|
300 | (1) |
|
|
301 | (2) |
|
15.2.1 Strings and Repeats |
|
|
301 | (1) |
|
15.2.2 Filters---Fundamental Concepts |
|
|
301 | (2) |
|
|
303 | (6) |
|
15.3.1 History of Lossless Filters |
|
|
303 | (1) |
|
15.3.2 Quasar and swift---Filtering Repeats with Edit Distance |
|
|
304 | (1) |
|
15.3.3 Nimbus---Filtering Multiple Repeats with Hamming Distance |
|
|
305 | (3) |
|
15.3.4 tuiuiu---Filtering Multiple Repeats with Edit Distance |
|
|
308 | (1) |
|
15.4 Lossy Seed-Based Filters |
|
|
309 | (6) |
|
15.4.1 Seed-Based Heuristics |
|
|
310 | (1) |
|
|
311 | (1) |
|
15.4.3 Latencies and Neighborhood Indexing |
|
|
311 | (2) |
|
15.4.4 Seed-Based Heuristics Implementations |
|
|
313 | (2) |
|
|
315 | (1) |
|
|
315 | (1) |
|
|
315 | (6) |
|
16 Novel Combinatorial and Information-Theoretic Alignment-Free Distances for Biological DATA MINING |
|
|
321 | (40) |
|
|
|
|
|
|
321 | (2) |
|
16.2 Information-Theoretic Alignment-Free Methods |
|
|
323 | (8) |
|
16.2.1 Fundamental Information Measures, Statistical Dependency, and Similarity of Sequences |
|
|
324 | (1) |
|
16.2.2 Methods Based on Relative Entropy and Empirical Probability Distributions |
|
|
325 | (4) |
|
16.2.3 A Method Based on Statistical Dependency, via Mutual Information |
|
|
329 | (2) |
|
16.3 Combinatorial Alignment-Free Methods |
|
|
331 | (5) |
|
16.3.1 The Average Common Substring Distance |
|
|
332 | (1) |
|
16.3.2 A Method Based on the EBWT Transform |
|
|
333 | (1) |
|
|
334 | (2) |
|
16.4 Alignment-Free Compositional Methods |
|
|
336 | (4) |
|
16.4.1 The k-String Composition Approach |
|
|
337 | (1) |
|
16.4.2 Complete Composition Vector |
|
|
338 | (1) |
|
16.4.3 Fast Algorithms to Compute Composition Vectors |
|
|
339 | (1) |
|
16.5 Alignment-Free Exact Word Matches Methods |
|
|
340 | (4) |
|
16.5.1 D2 and its Distributional Regimes |
|
|
340 | (2) |
|
16.5.2 An Extension to Mismatches and the Choice of the Optimal Word Size |
|
|
342 | (1) |
|
16.5.3 The Transformation of D2 into a Method Assessing the Statistical Significance of Sequence Similarity |
|
|
343 | (1) |
|
16.6 Domains of Biological Application |
|
|
344 | (5) |
|
16.6.1 Phylogeny: Information Theoretic and Combinatorial Methods |
|
|
345 | (1) |
|
16.6.2 Phylogeny: Compositional Methods |
|
|
346 | (1) |
|
16.6.3 CIS Regulatory Modules |
|
|
347 | (1) |
|
16.6.4 DNA Sequence Dependencies |
|
|
348 | (1) |
|
16.7 Datasets and Software for Experimental Algorithmics |
|
|
349 | (5) |
|
|
350 | (3) |
|
|
353 | (1) |
|
|
354 | (1) |
|
|
355 | (6) |
|
17 In Silico Methods for the Analysis of Metabolites and Drug Molecules |
|
|
361 | (22) |
|
|
|
|
361 | (2) |
|
17.1.1 Chemoinformatics and "Drug-Likeness" |
|
|
361 | (2) |
|
17.2 Molecular Descriptors |
|
|
363 | (4) |
|
17.2.1 One-Dimensional (1-D) Descriptors |
|
|
363 | (1) |
|
17.2.2 Two-Dimensional (2-D) Descriptors |
|
|
364 | (2) |
|
17.2.3 Three-Dimensional (3-D) Descriptors |
|
|
366 | (1) |
|
|
367 | (3) |
|
|
367 | (2) |
|
17.3.2 Chemical Entities of Biological Interest (ChEBI) |
|
|
369 | (1) |
|
|
369 | (1) |
|
|
369 | (1) |
|
|
369 | (1) |
|
17.4 Methods and Data Analysis Algorithms |
|
|
370 | (6) |
|
17.4.1 Simple Count Methods |
|
|
370 | (1) |
|
17.4.2 Enhanced Simple Count Methods, Using Structural Features |
|
|
371 | (1) |
|
|
372 | (4) |
|
|
376 | (1) |
|
|
377 | (1) |
|
|
377 | (6) |
|
III MOTIF FINDING AND STRUCTURE PREDICTION |
|
|
383 | (164) |
|
18 Motif Finding Algorithms in Biological Sequences |
|
|
385 | (12) |
|
|
|
|
|
385 | (2) |
|
18.3 The Planted (l, d)-Motif Problem |
|
|
387 | (4) |
|
|
387 | (1) |
|
|
387 | (4) |
|
18.4 The Extended (l, d)-Motif Problem |
|
|
391 | (1) |
|
|
391 | (1) |
|
|
391 | (1) |
|
18.5 The Edited Motif Problem |
|
|
392 | (1) |
|
|
392 | (1) |
|
|
393 | (1) |
|
18.6 The Simple Motif Problem |
|
|
393 | (2) |
|
|
393 | (1) |
|
|
394 | (1) |
|
|
395 | (1) |
|
|
396 | (1) |
|
19 Computational Characterization of Regulatory Regions |
|
|
397 | (28) |
|
|
19.1 The Genome Regulatory Landscape |
|
|
397 | (3) |
|
19.2 Qualitative Models of Regulatory Signals |
|
|
400 | (1) |
|
19.3 Quantitative Models of Regulatory Signals |
|
|
401 | (2) |
|
19.4 Detection of Dependencies in Sequences |
|
|
403 | (2) |
|
19.5 Repositories of Regulatory Information |
|
|
405 | (1) |
|
19.6 Using Predictive Models to Annotate Sequences |
|
|
406 | (2) |
|
19.7 Comparative Genomics Characterization |
|
|
408 | (2) |
|
19.8 Sequence Comparisons |
|
|
410 | (2) |
|
19.9 Combining Motifs and Alignments |
|
|
412 | (2) |
|
19.10 Experimental Validation |
|
|
414 | (3) |
|
|
417 | (1) |
|
|
417 | (8) |
|
20 Algorithmic Issues in the Analysis of Chip-Seq Data |
|
|
425 | (24) |
|
|
|
|
425 | (4) |
|
20.2 Mapping Sequences on the Genome |
|
|
429 | (5) |
|
20.3 Identifying Significantly Enriched Regions |
|
|
434 | (4) |
|
20.3.1 ChIP-Seq Approaches to the Identification of DNA Structure Modifications |
|
|
437 | (1) |
|
20.4 Deriving Actual Transcription Factor Binding Sites |
|
|
438 | (6) |
|
|
444 | (1) |
|
|
444 | (5) |
|
21 Approaches and Methods for Operon Prediction Based on Machine Learning Techniques |
|
|
449 | (30) |
|
|
|
|
|
|
|
|
|
449 | (2) |
|
21.2 Datasets, Features, and Preprocesses for Operon Prediction |
|
|
451 | (9) |
|
|
451 | (3) |
|
|
454 | (5) |
|
21.2.3 Preprocess Methods |
|
|
459 | (1) |
|
21.3 Machine Learning Prediction Methods for Operon Prediction |
|
|
460 | (14) |
|
21.3.1 Hidden Markov Model |
|
|
461 | (1) |
|
21.3.2 Linkage Clustering |
|
|
462 | (2) |
|
21.3.3 Bayesian Classifier |
|
|
464 | (3) |
|
|
467 | (1) |
|
21.3.5 Support Vector Machine |
|
|
468 | (2) |
|
21.3.6 Artificial Neural Network |
|
|
470 | (1) |
|
21.3.7 Genetic Algorithms |
|
|
471 | (1) |
|
21.3.8 Several Combinations |
|
|
472 | (2) |
|
|
474 | (1) |
|
|
475 | (1) |
|
|
475 | (4) |
|
22 Protein Function Prediction with Data-Mining Techniques |
|
|
479 | (22) |
|
|
|
|
479 | (1) |
|
22.2 Protein Annotation Based on Sequence |
|
|
480 | (4) |
|
22.2.1 Protein Sequence Classification |
|
|
480 | (3) |
|
22.2.2 Protein Subcellular Localization Prediction |
|
|
483 | (1) |
|
22.3 Protein Annotation Based on Protein Structure |
|
|
484 | (1) |
|
22.4 Protein Function Prediction Based on Gene-Expression Data |
|
|
485 | (1) |
|
22.5 Protein Function Prediction Based on Protein Interactome Map |
|
|
486 | (3) |
|
22.5.1 Protein Function Prediction Based on Local Topology Structure of Interaction Map |
|
|
486 | (2) |
|
22.5.2 Protein Function Prediction Based on Global Topology of Interaction Map |
|
|
488 | (1) |
|
22.6 Protein Function Prediction Based on Data Integration |
|
|
489 | (2) |
|
22.7 Conclusions and Perspectives |
|
|
491 | (2) |
|
|
493 | (8) |
|
23 Protein Domain Boundary Prediction |
|
|
501 | (20) |
|
|
|
|
|
501 | (2) |
|
|
503 | (7) |
|
23.2.1 Nonlocal Interaction and Vanishing Gradient Problem |
|
|
506 | (1) |
|
23.2.2 Hierarchical Mixture of Experts |
|
|
506 | (2) |
|
23.2.3 Overall Modular Kernel Architecture |
|
|
508 | (2) |
|
|
510 | (2) |
|
|
512 | (3) |
|
23.4.1 Nonlocal Interactions in Amino Acids |
|
|
512 | (1) |
|
23.4.2 Secondary Structure Information |
|
|
513 | (1) |
|
23.4.3 Hydrophobicity and Profiles |
|
|
514 | (1) |
|
23.4.4 Domain Assignment Is More Accurate for Proteins with Fewer Domains |
|
|
514 | (1) |
|
|
515 | (1) |
|
|
515 | (6) |
|
24 An Introduction to rna Structure and Pseudoknot Prediction |
|
|
521 | (26) |
|
|
|
|
521 | (1) |
|
24.2 RNA Secondary Structure Prediction |
|
|
522 | (12) |
|
24.2.1 Minimum Free Energy Model |
|
|
524 | (2) |
|
24.2.2 Prediction of Minimum Free Energy Structure |
|
|
526 | (4) |
|
24.2.3 Partition Function Calculation |
|
|
530 | (3) |
|
24.2.4 Base Pair Probabilities |
|
|
533 | (1) |
|
|
534 | (9) |
|
24.3.1 Biological Relevance |
|
|
536 | (1) |
|
24.3.2 RNA Pseudoknot Prediction |
|
|
537 | (1) |
|
24.3.3 Dynamic Programming |
|
|
538 | (3) |
|
24.3.4 Heuristic Approaches |
|
|
541 | (1) |
|
24.3.5 Pseudoknot Detection |
|
|
542 | (1) |
|
|
542 | (1) |
|
|
543 | (1) |
|
|
544 | (3) |
|
IV PHYLOGENY RECONSTRUCTION |
|
|
547 | (76) |
|
25 Phylogenetic Search Algorithms for Maximum Likelihood |
|
|
549 | (30) |
|
|
|
549 | (3) |
|
25.1.1 Phylogenetic Inference |
|
|
550 | (2) |
|
25.2 Computing the Likelihood |
|
|
552 | (3) |
|
25.3 Accelerating the PLF by Algorithmic Means |
|
|
555 | (3) |
|
25.3.1 Reuse of Values Across Probability Vectors |
|
|
555 | (2) |
|
25.3.2 Gappy Alignments and Pointer Meshes |
|
|
557 | (1) |
|
|
558 | (1) |
|
25.5 General Search Heuristics |
|
|
559 | (7) |
|
25.5.1 Lazy Evaluation Strategies |
|
|
563 | (1) |
|
25.5.2 Further Heuristics |
|
|
564 | (1) |
|
25.5.3 Rapid Bootstrapping |
|
|
565 | (1) |
|
25.6 Computing the Robinson Foulds Distance |
|
|
566 | (2) |
|
25.7 Convergence Criteria |
|
|
568 | (4) |
|
25.7.1 Asymptotic Stopping |
|
|
569 | (3) |
|
|
572 | (1) |
|
|
573 | (6) |
|
26 Heuristic Methods for Phylogenetic Reconstruction with Maximum Parsimony |
|
|
579 | (20) |
|
|
|
|
|
579 | (1) |
|
26.2 Definitions and Formal Background |
|
|
580 | (1) |
|
26.2.1 Parsimony and Maximum Parsimony |
|
|
580 | (1) |
|
|
581 | (13) |
|
26.3.1 Combinatorial Optimization |
|
|
581 | (1) |
|
|
582 | (1) |
|
26.3.3 Local Search Methods |
|
|
582 | (6) |
|
26.3.4 Evolutionary Metaheuristics and Genetic Algorithms |
|
|
588 | (2) |
|
|
590 | (2) |
|
26.3.6 Problem-Specific Improvements |
|
|
592 | (2) |
|
|
594 | (1) |
|
|
595 | (4) |
|
27 Maximum Entropy Method for Composition Vector Method |
|
|
599 | (24) |
|
|
|
|
|
599 | (2) |
|
27.2 Models and Entropy Optimization |
|
|
601 | (13) |
|
|
601 | (2) |
|
27.2.2 Denoising Formulas |
|
|
603 | (8) |
|
|
611 | (2) |
|
27.2.4 Phylogenetic Tree Construction |
|
|
613 | (1) |
|
27.3 Application and Dicussion |
|
|
614 | (5) |
|
|
614 | (1) |
|
|
614 | (1) |
|
|
615 | (2) |
|
|
617 | (2) |
|
|
619 | (1) |
|
|
619 | (4) |
|
V MICROARRAY DATA ANALYSIS |
|
|
623 | (100) |
|
28 Microarray Gene Expression Data Analysis |
|
|
625 | (26) |
|
|
|
|
625 | (1) |
|
28.2 DNA Microarray Technology and Experiment |
|
|
626 | (1) |
|
28.3 Image Analysis and Expression Data Extraction |
|
|
627 | (3) |
|
28.3.1 Image Preprocessing |
|
|
628 | (1) |
|
28.3.2 Block Segmentation |
|
|
628 | (1) |
|
28.3.3 Automatic Gridding |
|
|
628 | (1) |
|
|
628 | (2) |
|
|
630 | (1) |
|
28.4.1 Background Correction |
|
|
630 | (1) |
|
|
630 | (1) |
|
|
631 | (1) |
|
28.5 Missing Value Imputation |
|
|
631 | (3) |
|
28.6 Temporal Gene Expression Profile Analysis |
|
|
634 | (6) |
|
28.7 Cyclic Gene Expression Profiles Detection |
|
|
640 | (7) |
|
28.7.1 SSA-AR Spectral Estimation |
|
|
643 | (1) |
|
28.7.2 Spectral Estimation by Signal Reconstruction |
|
|
644 | (2) |
|
28.7.3 Statistical Hypothesis Testing for Periodic Profile Detection |
|
|
646 | (1) |
|
|
647 | (1) |
|
|
648 | (1) |
|
|
649 | (2) |
|
29 Biclustering of Microarray Data |
|
|
651 | (14) |
|
|
|
|
651 | (1) |
|
|
652 | (1) |
|
29.3 Groups of Biclusters |
|
|
653 | (1) |
|
29.4 Evaluation Functions |
|
|
654 | (2) |
|
29.5 Systematic and Stochastic Biclustering Algorithms |
|
|
656 | (3) |
|
29.6 Biological Validation |
|
|
659 | (2) |
|
|
661 | (1) |
|
|
661 | (4) |
|
30 Computational Models for Condition-Specific Gene and Pathway Inference |
|
|
665 | (26) |
|
|
|
|
|
|
665 | (1) |
|
30.2 Condition-Specific Pathway Identification |
|
|
666 | (15) |
|
|
667 | (4) |
|
30.2.2 Condition-Specific Pathway Inference |
|
|
671 | (10) |
|
30.3 Disease Gene Prioritization and Genetic Pathway Detection |
|
|
681 | (3) |
|
|
684 | (1) |
|
|
685 | (1) |
|
|
685 | (1) |
|
|
685 | (6) |
|
31 Heterogeneity of Differential Expression in Cancer Studies: Algorithms and Methods |
|
|
691 | (32) |
|
Radha Krishna Murthy Karuturi |
|
|
|
691 | (1) |
|
|
692 | (2) |
|
31.3 Differential Mean of Expression |
|
|
694 | (5) |
|
31.3.1 Single Factor Differential Expression |
|
|
695 | (2) |
|
31.3.2 Multifactor Differential Expression |
|
|
697 | (1) |
|
31.3.3 Empirical Bayes Extension |
|
|
698 | (1) |
|
31.4 Differential Variability of Expression |
|
|
699 | (2) |
|
31.4.1 F-Test for Two-Group Differential Variability Analysis |
|
|
699 | (1) |
|
31.4.2 Bartlett's and Levene's Tests for Multigroup Differential Variability Analysis |
|
|
700 | (1) |
|
31.5 Differential Expression in Compendium of Tumors |
|
|
701 | (4) |
|
31.5.1 Gaussian Mixture Model (GMM) for Finite Levels of Expression |
|
|
701 | (2) |
|
31.5.2 Outlier Detection Strategy |
|
|
703 | (1) |
|
|
704 | (1) |
|
31.6 Differential Expression by Chromosomal Aberrations: The Local Properties |
|
|
705 | (6) |
|
31.6.1 Wavelet Variance Scanning (WAVES) for Single-Sample Analysis |
|
|
708 | (1) |
|
31.6.2 Local Singular Value Decomposition (LSVD) for Compendium of Tumors |
|
|
709 | (1) |
|
31.6.3 Locally Adaptive Statistical Procedure (LAP) for Compendium of Tumors with Control Samples |
|
|
710 | (1) |
|
31.7 Differential Expression in Gene Interactome |
|
|
711 | (3) |
|
31.7.1 Friendly Neighbors Algorithm: A Multiplicative Interactome |
|
|
711 | (1) |
|
31.7.2 GeneRank: A Contributing Interactome |
|
|
712 | (1) |
|
31.7.3 Top Scoring Pairs (TSP): A Differential Interactome |
|
|
713 | (1) |
|
31.8 Differential Coexpression: Global MultiDimensional Interactome |
|
|
714 | (6) |
|
31.8.1 Kostka and Spang's Differential Coexpression Algorithm |
|
|
715 | (3) |
|
31.8.2 Differential Expression Linked Differential Coexpression |
|
|
718 | (1) |
|
31.8.3 Differential Friendly Neighbors (DiffFNs) |
|
|
718 | (2) |
|
|
720 | (1) |
|
|
720 | (3) |
|
|
723 | (142) |
|
32 Comparative Genomics: Algorithms and Applications |
|
|
725 | (24) |
|
|
|
|
725 | (2) |
|
|
727 | (1) |
|
|
727 | (7) |
|
32.3.1 Sequence Similarity-Based Method |
|
|
729 | (2) |
|
32.3.2 Phylogeny-Based Method |
|
|
731 | (1) |
|
32.3.3 Rearrangement-Based Method |
|
|
732 | (2) |
|
32.4 Gene Cluster and Synteny Detection |
|
|
734 | (9) |
|
|
736 | (3) |
|
32.4.2 Gene Cluster Detection |
|
|
739 | (4) |
|
|
743 | (1) |
|
|
743 | (6) |
|
33 Advances in Genome Rearrangement Algorithms |
|
|
749 | (24) |
|
|
|
|
749 | (3) |
|
|
752 | (1) |
|
33.3 Sorting by Reversals |
|
|
753 | (6) |
|
33.3.1 Approaches to Approximation Algorithms |
|
|
754 | (3) |
|
33.3.2 Signed Permutations |
|
|
757 | (2) |
|
33.4 Sorting by Transpositions |
|
|
759 | (2) |
|
33.4.1 Approximation Results |
|
|
760 | (1) |
|
33.4.2 Improved Running Time and Simpler Algorithms |
|
|
761 | (1) |
|
|
761 | (2) |
|
33.5.1 Sorting by Prefix Reversals |
|
|
761 | (1) |
|
33.5.2 Sorting by Prefix Transpositions |
|
|
762 | (1) |
|
33.5.3 Sorting by Block Interchange |
|
|
762 | (1) |
|
33.5.4 Short Swap and Fixed-Length Reversals |
|
|
763 | (1) |
|
33.6 Sorting by More Than One Operation |
|
|
763 | (2) |
|
33.6.1 Unified Operation: Doule Cut and Join |
|
|
764 | (1) |
|
33.7 Future Research Directions |
|
|
765 | (1) |
|
|
766 | (1) |
|
|
767 | (6) |
|
34 Computing Genomic Distances: An Algorithmic Viewpoint |
|
|
773 | (26) |
|
|
|
|
773 | (2) |
|
34.1.1 What this Chapter is About |
|
|
773 | (1) |
|
34.1.2 Definitions and Notations |
|
|
774 | (1) |
|
34.1.3 Organization of the Chapter |
|
|
775 | (1) |
|
34.2 Interval-Based Criteria |
|
|
775 | (10) |
|
34.2.1 Brief Introduction |
|
|
775 | (1) |
|
34.2.2 The Context and the Problems |
|
|
776 | (2) |
|
34.2.3 Common Intervals in Permutations and the Commuting Generators Strategy |
|
|
778 | (4) |
|
34.2.4 Conserved Intervals in Permutations and the Bound-and-Drop Strategy |
|
|
782 | (1) |
|
34.2.5 Common Intervals in Strings and the Element Plotting Strategy |
|
|
783 | (2) |
|
|
785 | (1) |
|
34.3 Character-Based Criteria |
|
|
785 | (10) |
|
34.3.1 Introduction and Definition of the Problems |
|
|
785 | (2) |
|
34.3.2 An Approximation Algorithm for BAL-FMB |
|
|
787 | (4) |
|
34.3.3 An Exact Algorithm for UNBAL-FMB |
|
|
791 | (4) |
|
34.3.4 Other Results and Open Problems |
|
|
795 | (1) |
|
|
795 | (1) |
|
|
796 | (3) |
|
35 Wavelet Algorithms for Dna Analysis |
|
|
799 | (44) |
|
|
|
799 | (3) |
|
|
802 | (10) |
|
35.2.1 Preliminary Remarks on DNA |
|
|
802 | (1) |
|
35.2.2 Indicator Function |
|
|
803 | (3) |
|
|
806 | (1) |
|
35.2.4 Representation Models |
|
|
807 | (1) |
|
35.2.5 Constraints on the Representation in R2 |
|
|
808 | (2) |
|
35.2.6 Complex Representation |
|
|
810 | (1) |
|
|
810 | (2) |
|
35.3 Statistical Correlations in DNA |
|
|
812 | (6) |
|
35.3.1 Long-Range Correlation |
|
|
812 | (2) |
|
|
814 | (3) |
|
|
817 | (1) |
|
|
818 | (5) |
|
35.4.1 Haar Wavelet Basis |
|
|
819 | (1) |
|
|
819 | (2) |
|
35.4.3 Discrete Haar Wavelet Transform |
|
|
821 | (2) |
|
35.5 Haar Wavelet Coefficients and Statistical Parameters |
|
|
823 | (3) |
|
35.6 Algorithm of the Short Haar Discrete Wavelet Transform |
|
|
826 | (2) |
|
35.7 Clusters of Wavelet Coefficients |
|
|
828 | (10) |
|
35.7.1 Cluster Analysis of the Wavelet Coefficients of the Complex DNA Representation |
|
|
830 | (4) |
|
35.7.2 Cluster Analysis of the Wavelet Coefficients of DNA Walks |
|
|
834 | (4) |
|
|
838 | (1) |
|
|
839 | (4) |
|
36 Haplotype Inference Models and Algorithms |
|
|
843 | (22) |
|
|
|
843 | (1) |
|
36.2 Problem Statement and Notations |
|
|
844 | (2) |
|
36.3 Combinatorial Methods |
|
|
846 | (5) |
|
36.3.1 Clark's Inference Rule |
|
|
846 | (2) |
|
36.3.2 Pure Parsimony Model |
|
|
848 | (1) |
|
|
849 | (2) |
|
|
851 | (2) |
|
36.4.1 Maximum Likelihood Methods |
|
|
851 | (1) |
|
|
852 | (1) |
|
36.4.3 Markov Chain Methods |
|
|
852 | (1) |
|
|
853 | (3) |
|
36.5.1 Minimum Recombinant Haplotype Configurations |
|
|
854 | (1) |
|
36.5.2 Zero Recombinant Haplotype Configurations |
|
|
854 | (1) |
|
36.5.3 Statistical Methods |
|
|
855 | (1) |
|
|
856 | (2) |
|
36.6.1 Evaluation Measurements |
|
|
856 | (1) |
|
|
857 | (1) |
|
|
857 | (1) |
|
|
858 | (1) |
|
|
859 | (6) |
|
VII ANALYSIS OF BIOLOGICAL NETWORKS |
|
|
865 | (142) |
|
37 Untangling Biological Networks Using Bioinformatics |
|
|
867 | (26) |
|
|
|
|
|
867 | (11) |
|
37.1.1 Predicting Biological Processes: A Major Challenge to Understanding Biology |
|
|
867 | (1) |
|
37.1.2 Historical Perspective and Mathematical Preliminaries of Networks |
|
|
868 | (2) |
|
37.1.3 Structural Properties of Biological Networks |
|
|
870 | (3) |
|
37.1.4 Local Topology of Biological Networks: Functional Motifs, Modules, and Communities |
|
|
873 | (5) |
|
37.2 Types of Biological Networks |
|
|
878 | (6) |
|
37.2.1 Protein-Protein Interaction Networks |
|
|
878 | (1) |
|
37.2.2 Metabolic Networks |
|
|
879 | (2) |
|
37.2.3 Transcriptional Networks |
|
|
881 | (2) |
|
37.2.4 Other Biological Networks |
|
|
883 | (1) |
|
37.3 Network Dynamic, Evolution and Disease |
|
|
884 | (3) |
|
37.3.1 Biological Network Dynamic and Evolution |
|
|
884 | (2) |
|
37.3.2 Biological Networks and Disease |
|
|
886 | (1) |
|
37.4 Future Challenges and Scope |
|
|
887 | (1) |
|
|
887 | (1) |
|
|
888 | (5) |
|
38 Probabilistic Approaches for Investigating Biological Networks |
|
|
893 | (22) |
|
|
|
38.1 Probabilistic Models for Biological Networks |
|
|
894 | (8) |
|
|
895 | (5) |
|
38.1.2 Probabilistic Boolean Networks: A Natural Extension |
|
|
900 | (1) |
|
38.1.3 Inferring Probabilistic Models from Experiments |
|
|
901 | (1) |
|
38.2 Interpretation and Quantitative Analysis of Probabilistic Models |
|
|
902 | (9) |
|
38.2.1 Dynamical Analysis and Temporal Properties |
|
|
902 | (3) |
|
38.2.2 Impact of Update Strategies for Analyzing Probabilistic Boolean Networks |
|
|
905 | (1) |
|
38.2.3 Simulations of a Probabilistic Boolean Network |
|
|
906 | (5) |
|
|
911 | (1) |
|
|
911 | (1) |
|
|
911 | (4) |
|
39 Modeling and Analysis of Biological Networks with Model Checking |
|
|
915 | (26) |
|
|
|
|
|
|
915 | (1) |
|
|
916 | (3) |
|
|
916 | (1) |
|
|
917 | (1) |
|
|
918 | (1) |
|
39.3 Analyzing Genetic Networks with Model Checking |
|
|
919 | (6) |
|
39.3.1 Boolean Regulatory Networks |
|
|
919 | (1) |
|
|
919 | (2) |
|
39.3.3 Translating Boolean Regulatory Graphs into Promela |
|
|
921 | (1) |
|
|
922 | (2) |
|
39.3.5 Concluding Remarks |
|
|
924 | (1) |
|
39.3.6 Related Work and Bibliographic Notes |
|
|
924 | (1) |
|
39.4 Probabilistic Model Checking for Biological Systems |
|
|
925 | (11) |
|
39.4.1 Motivation and Background |
|
|
926 | (1) |
|
39.4.2 A Kinetic Model of mRNA Translation |
|
|
927 | (1) |
|
39.4.3 Probabilistic Model Checking |
|
|
928 | (1) |
|
|
929 | (4) |
|
|
933 | (1) |
|
39.4.6 Concluding Remarks |
|
|
934 | (1) |
|
39.4.7 Related Work and Bibliographic Notes |
|
|
935 | (1) |
|
|
936 | (5) |
|
40 Reverse Engineering of Molecular Networks from a Common Combinatorial Approach |
|
|
941 | (14) |
|
|
|
|
|
941 | (1) |
|
40.2 Reverse-Engineering of Biological Networks |
|
|
942 | (4) |
|
40.2.1 Evaluation of the Performance of Reverse-Engineering Methods |
|
|
945 | (1) |
|
40.3 Classical Combinatorial Algorithms: A Case Study |
|
|
946 | (5) |
|
40.3.1 Benchmarking RE Combinatorial-Based Methods |
|
|
947 | (3) |
|
40.3.2 Software Availability |
|
|
950 | (1) |
|
|
951 | (1) |
|
|
951 | (1) |
|
|
951 | (4) |
|
41 Unsupervised Learning for gene Regulation Network Inference from Expression Data: A Review |
|
|
955 | (24) |
|
|
|
|
955 | (1) |
|
41.2 Gene Networks: Definition and Properties |
|
|
956 | (2) |
|
41.3 Gene Expression: Data and Analysis |
|
|
958 | (1) |
|
41.4 Network Inference as an Unsupervised Learning Problem |
|
|
959 | (1) |
|
41.5 Correlation-Based Methods |
|
|
959 | (2) |
|
41.6 Probabilistic Graphical Models |
|
|
961 | (2) |
|
41.7 Constraint-Based Data Mining |
|
|
963 | (6) |
|
41.7.1 Multiple Usages of Extracted Patterns |
|
|
965 | (1) |
|
41.7.2 Mining Gene Regulation from Transcriptome Datasets |
|
|
966 | (3) |
|
|
969 | (4) |
|
41.8.1 Statistical Validation of Network Inference |
|
|
970 | (2) |
|
41.8.2 Biological Validation |
|
|
972 | (1) |
|
41.9 Conclusion and Perspectives |
|
|
973 | (1) |
|
|
974 | (5) |
|
42 Approaches to Construction and Analysis of Microrna-Mediated Networks |
|
|
979 | (28) |
|
|
|
|
|
|
979 | (3) |
|
42.1.1 miRNA-mediated Genetic Regulatory Networks |
|
|
979 | (2) |
|
42.1.2 The Four Levels of Regulation in GRNs |
|
|
981 | (1) |
|
42.1.3 Overview of Sections |
|
|
982 | (1) |
|
42.2 Fundamental Component Interaction Research: Predicting miRNA Genes, Regulators, and Targets |
|
|
982 | (6) |
|
42.2.1 Prediction of Novel miRNA Genes |
|
|
983 | (1) |
|
42.2.2 Prediction of miRNA Targets |
|
|
984 | (1) |
|
42.2.3 Prediction of miRNA Transcript Elements and Transcriptional Regulation |
|
|
984 | (4) |
|
42.3 Identifying miRNA-mediated Networks |
|
|
988 | (5) |
|
42.3.1 Forward Engineering---Construction of Multinode Components in miRNA-mediated Networks Using Paired Interaction Information |
|
|
988 | (1) |
|
42.3.2 Reverse Engineering---Inference of MicroRNA Modules Using Top-Down Approaches |
|
|
988 | (5) |
|
42.4 Global and Local Architecture Analysis in miRNA-Containing Networks |
|
|
993 | (8) |
|
42.4.1 Global Architecture Properties of miRNA-mediated Post-transcriptional Networks |
|
|
993 | (1) |
|
42.4.2 Local Architecture Properties of miRNA-mediated Post-transcriptional Networks |
|
|
994 | (7) |
|
|
1001 | (1) |
|
|
1001 | (6) |
Index |
|
1007 | |