Probabilistic Graphical Models for Genetics, Genomics, and Postgenomics [Hardback]

Edited by Christine Sinoquet (Associate Professor in Computer Science, University of Nantes) and Raphaël Mourad (Post-doctoral Research Fellow, Department of Human Genetics, University of Chicago)
  • Format: Hardback, 478 pages, height x width x thickness: 249 x 202 x 25 mm, weight: 1162 g, 99 b/w and 10 colour illustrations
  • Publication date: 18-Sep-2014
  • Publisher: Oxford University Press
  • ISBN-10: 0198709021
  • ISBN-13: 9780198709022
Bioinformaticians and geneticists now face a wealth of high-throughput data, typically characterized by uncertainty, high dimensionality, and great complexity.

Such 'omics' data yield insights only if they are first represented by flexible and scalable models, before any further analysis. At the interface between statistics and machine learning, probabilistic graphical models (PGMs) provide a powerful formalism for discovering complex networks of relations.

These models can also incorporate prior biological information. Network reconstruction from gene expression data is perhaps the most emblematic area of research in which PGMs have been successfully applied. However, these models have also sparked renewed interest in genetics in the broad sense, in particular in association genetics, causality discovery, prediction of outcomes, detection of copy number variations, and epigenetics. This book provides an overview of the applications of PGMs to genetics, genomics, and postgenomics in response to this growing interest.

Interdisciplinarity, a salient feature of bioinformatics, reaches its limit when intricate cooperation between domain specialists is required. Currently, few people specialize in designing advanced methods based on probabilistic graphical models for postgenomics or genetics. This book demystifies such models so that their perceived difficulty no longer hinders their use. It relies on fifteen illustrations that show the mechanisms behind the models.

Probabilistic Graphical Models for Genetics, Genomics and Postgenomics covers six main themes:
(1) Gene network inference
(2) Causality discovery
(3) Association genetics
(4) Epigenetics
(5) Detection of copy number variations
(6) Prediction of outcomes from high-dimensional genomic data

Written by leading international experts, this collection brings together the most advanced work at the crossroads of probabilistic graphical models and genetics, genomics, and postgenomics. The self-contained chapters provide an informed account of the pros and cons of applying these powerful techniques.
Abbreviations
xix
List of Contributors
xxiii
Part I INTRODUCTION
1 Probabilistic Graphical Models for Next-generation Genomics and Genetics
3(27)
Christine Sinoquet
1.1 Fine-grained Description of Living Systems
4(2)
1.1.1 DNA and the Genome
4(1)
1.1.2 Genes and Proteins
5(1)
1.1.3 Phenotype and Genotype
5(1)
1.1.4 Molecular Biology, Genetics, Genomics, and Postgenomics
6(1)
1.2 Higher Description Levels of Living Systems
6(10)
1.2.1 Complexity in Cells
7(2)
1.2.2 Genetics, Epigenetics, and Copy Number Polymorphism
9(2)
1.2.3 Epigenetics with Additional Prior Knowledge on the Genome
11(1)
1.2.4 Transcriptomics
11(2)
1.2.5 Transcriptomics with Prior Biological Knowledge
13(1)
1.2.6 Integrating Data from Several Levels
13(3)
1.2.7 Recapitulation
16(1)
1.3 An Era of High-throughput Genomic Technologies
16(7)
1.3.1 Genotyping
16(3)
1.3.2 Copy Number Polymorphism
19(1)
1.3.3 DNA Methylation Measurements
19(1)
1.3.4 Gene Expression Data
20(1)
1.3.5 Quantitative Trait Loci
21(2)
1.3.6 The Challenge of Handling Omics Data
23(1)
1.4 Probabilistic Graphical Models to Infer Novel Knowledge from Omics Data
23(7)
1.4.1 Gene Network Inference
24(1)
1.4.2 Causality Discovery
24(2)
1.4.3 Association Genetics
26(1)
1.4.4 Epigenetics
26(1)
1.4.5 Detection of Copy Number Variations
26(1)
1.4.6 Prediction of Outcomes from High-dimensional Genomic Data
26(4)
2 Essentials to Understand Probabilistic Graphical Models: A Tutorial about Inference and Learning
30(55)
Christine Sinoquet
2.1 Introduction
32(1)
2.2 Reminders
32(6)
2.3 Various Classes of Probabilistic Graphical Models
38(8)
2.3.1 Markov Chains and Hidden Markov Models
38(1)
2.3.2 Markov Random Fields
39(2)
2.3.3 Variants around the Concept of Markov random field
41(1)
2.3.4 Bayesian networks
41(4)
2.3.5 Unifying Model and Model Extension
45(1)
2.4 Probabilistic Inference
46(11)
2.4.1 Exact Inference
46(5)
2.4.2 Approximate Inference
51(6)
2.5 Learning Bayesian networks
57(12)
2.5.1 Parameter Learning
58(3)
2.5.2 Structure Learning
61(8)
2.6 Learning Markov random fields
69(6)
2.6.1 Parameter Learning
69(3)
2.6.2 Structure Learning
72(3)
2.7 Causal Networks
75(2)
2.8 List of General Monographs and Focused Chapter Books
77(8)
Part II GENE EXPRESSION
3 Graphical Models and Multivariate Analysis of Microarray Data
85(20)
Harri Kiiveri
3.1 Introduction
85(2)
3.2 The Model
87(1)
3.3 Model Fitting
88(4)
3.3.1 Maximum Likelihood Estimation when the Zero Pattern is Known
89(1)
3.3.2 Determining the Pattern of Zeroes in the Inverse Covariance Matrix
90(2)
3.4 Hypothesis Testing
92(4)
3.4.1 Null Distributions by Permutation
92(1)
3.4.2 A Multivariate Test Statistic
93(1)
3.4.3 Partitioning of the Test Statistic
94(1)
3.4.4 Testing Strategies
95(1)
3.5 Example
96(3)
3.6 Discussion and Conclusions
99(6)
4 Comparison of Mixture Bayesian and Mixture Regression Approaches to Infer Gene Networks
105(16)
Sandra L. Rodriguez-Zas
Bruce R. Southey
4.1 Introduction
106(1)
4.2 Methods
107(5)
4.2.1 Mixture Bayesian Network
107(1)
4.2.2 Mixture Regression Approach
108(2)
4.2.3 Data
110(2)
4.3 Results
112(4)
4.3.1 Comparison of Mixtures
112(1)
4.3.2 Mixture Modeling of Changes in Gene Relationships
112(2)
4.3.3 Interpretation of Mixtures
114(2)
4.3.4 Inference of Large Networks
116(1)
4.4 Conclusions
116(5)
5 Network Inference in Breast Cancer with Gaussian Graphical Models and Extensions
121(28)
Marine Jeanmougin
Camille Charbonnier
Mickael Guedj
Julien Chiquet
5.1 Introduction
122(1)
5.2 Modeling of Gene Networks by Gaussian Graphical Networks
123(11)
5.2.1 Simple Gaussian graphical network
123(4)
5.2.2 Extensions Motivated by Regulatory Network Modeling
127(7)
5.3 Application to Estrogen Receptor Status in Breast Cancer
134(7)
5.3.1 Context
134(1)
5.3.2 Biological Prior Definition
135(4)
5.3.3 Network Inference from Biological Prior: Application and Interpretation
139(2)
5.4 Conclusions and Discussion
141(8)
Part III CAUSALITY DISCOVERY
6 Utilizing Genotypic Information as a Prior for Learning Gene Networks
149(16)
Kyle Chipman
Ambuj Singh
6.1 Introduction
149(2)
6.2 Methods
151(10)
6.2.1 eQTL Data sets
151(1)
6.2.2 LCMS Method for Learning a Prior Matrix of Causal Relationships
151(3)
6.2.3 Bayesian Network Structure Learning
154(1)
6.2.4 Integrating the Prior Matrix
155(1)
6.2.5 Stochastic Causal Tree Method
156(5)
6.3 Conclusion
161(4)
7 Bayesian Causal Phenotype Network Incorporating Genetic Variation and Biological Knowledge
165(31)
Jee Young Moon
Elias Chaibub Neto
Xinwei Deng
Brian S. Yandell
7.1 Introduction
166(1)
7.2 Joint Inference of Causal Phenotype Network and Causal QTLs
167(7)
7.2.1 Standard Bayesian Network Model
168(1)
7.2.2 HCGR Model
169(1)
7.2.3 Systems Genetics and Causal Inference
170(2)
7.2.4 QTL Mapping Conditional on Phenotype Network Structure
172(1)
7.2.5 Joint Inference of Phenotype Network and Causal QTLs
173(1)
7.3 Causal Phenotype Network Incorporating Biological Knowledge
174(9)
7.3.1 Model
175(3)
7.3.2 Sketch of MCMC
178(2)
7.3.3 Summary of Encoding of Biological Knowledge
180(3)
7.4 Simulations
183(2)
7.5 Analysis of Yeast Cell-Cycle Genes
185(3)
7.6 Conclusion
188(8)
8 Structural Equation Models for Studying Causal Phenotype Networks in Quantitative Genetics
196(21)
Guilherme J. M. Rosa
Bruno D. Valente
8.1 Introduction
196(1)
8.2 Classical Linear Mixed-effects Models in Quantitative Genetics
197(5)
8.3 Mixed-effects Structural Equation Models
202(2)
8.4 Data-driven Search for Phenotypic Causal Relationships
204(3)
8.4.1 General Overview
204(2)
8.4.2 Search Algorithms
206(1)
8.5 Inferring Causal Structures in Genetics Applications
207(3)
8.5.1 Genotypic information as Instrumental Variable
207(1)
8.5.2 Accounting for Polygenic Confounding Effects
208(2)
8.6 Concluding Remarks
210(7)
Part IV GENETIC ASSOCIATION STUDIES
9 Modeling Linkage Disequilibrium and Performing Association Studies through Probabilistic Graphical Models: a Visiting Tour of Recent Advances
217(30)
Christine Sinoquet
Raphael Mourad
9.1 Introduction
218(1)
9.2 Modeling Linkage Disequilibrium
219(9)
9.2.1 General Panorama
221(1)
9.2.2 Decomposable Markov Random Fields
221(2)
9.2.3 Bayesian Network-based Approaches without Latent Variables
223(1)
9.2.4 Bayesian Network-based Approaches with Latent Variables
224(2)
9.2.5 Recapitulation
226(2)
9.3 Single-SNP Approaches for Genome-wide Association Studies
228(9)
9.3.1 Integration of Confounding Factors
228(2)
9.3.2 GWAS Multilocus Approach
230(5)
9.3.3 Strengths and Limitations
235(2)
9.4 Identifying Epistasis at the Genome Scale
237(4)
9.4.1 Bayesian Network-based Approaches
237(2)
9.4.2 Markov Blanket-based Method
239(1)
9.4.3 Recapitulation
240(1)
9.5 Discussion
241(1)
9.6 Perspectives
242(5)
10 Modeling Linkage Disequilibrium with Decomposable Graphical Models
247(22)
Haley J. Abel
Alun Thomas
10.1 Introduction
248(1)
10.2 Methods
249(9)
10.2.1 Decomposable Graphical Models
249(2)
10.2.2 Estimating Decomposable Graphical Models
251(3)
10.2.3 Application to Diploid Data by Phase Imputation
254(2)
10.2.4 Estimation on the Genome-Wide Scale
256(2)
10.3 Applications
258(7)
10.3.1 Phasing
258(2)
10.3.2 Unconditional Simulation
260(1)
10.3.3 Phenotypes and Covariates
261(2)
10.3.4 Admixture Mapping
263(2)
10.4 Application to Sequence Data
265(4)
11 Scoring, Searching and Evaluating Bayesian Network Models of Gene-phenotype Association
269(25)
Xia Jiang
Shyam Visweswaran
Richard E. Neapolitan
11.1 Introduction
270(1)
11.2 Background
270(2)
11.2.1 Epistasis
270(1)
11.2.2 Genome-wide association studies
271(1)
11.3 A Bayesian Network Model
272(1)
11.4 Scoring Candidate Models
273(5)
11.4.1 Bayesian Network Scoring Criteria
273(2)
11.4.2 Experiments
275(3)
11.5 Searching over the Space of Models
278(2)
11.5.1 Experiments
280(1)
11.6 Determining Whether a Model is Sufficiently Noteworthy
280(10)
11.6.1 The Bayesian Network Posterior Probability (BNPP)
282(3)
11.6.2 Prior Probabilities
285(2)
11.6.3 Experiments
287(3)
11.7 Discussion and Further Research
290(4)
12 Graphical Modeling of Biological Pathways in Genome-wide Association Studies
294(24)
Min Chen
Judy Cho
Hongyu Zhao
12.1 Introduction
295(1)
12.2 MRF Modeling of Gene Pathways
296(4)
12.3 A Bayesian Framework
300(12)
12.3.1 Prior Specification and Likelihood Function
300(2)
12.3.2 Posterior Distribution
302(2)
12.3.3 Making Inference Based on the Posterior Distribution
304(1)
12.3.4 Numerical Studies
305(4)
12.3.5 Real Data Example: Crohn's Disease Data
309(3)
12.4 Discussion
312(6)
13 Bayesian, Systems-based, Multilevel Analysis of Associations for Complex Phenotypes: from Interpretation to Decision
318(45)
Peter Antal
Andras Millinghoffer
Gabor Hullam
Gergely Hajos
Peter Sarkozy
Andras Gezsi
Csaba Szalai
Andras Falus
13.1 Introduction
319(1)
13.2 Bayesian network-based Concepts of Association and Relevance
320(8)
13.2.1 Association and Strong Relevance
320(2)
13.2.2 Stable Distributions, Markov Blankets and Markov Boundaries
322(1)
13.2.3 Further relevance types
323(3)
13.2.4 Necessary Subsets and Sufficient Supersets in Strong Relevance
326(1)
13.2.5 Relevance for Multiple Targets
327(1)
13.3 A Bayesian View of Relevance for Complex Phenotypes
328(16)
13.3.1 Estimating the Posteriors of Complex Features
330(2)
13.3.2 Sufficiency of the Data for Full Multivariate Analysis
332(1)
13.3.3 Rate of Learning: Effect of Feature and Model Complexity
333(3)
13.3.4 Bayesian network-based Bayesian Multilevel Analysis of Relevance
336(3)
13.3.5 Posteriors for Multiple Target Variables
339(1)
13.3.6 Subtypes of Strong and Weak Relevance
340(2)
13.3.7 Interaction-redundancy Scores Based on Posteriors of Strong Relevance
342(2)
13.4 Bayes Optimal Decisions about Multivariate Relevance
344(6)
13.4.1 Optimal Decision about Univariate Relevance
344(1)
13.4.2 Optimal Bayesian Decision to Control FDR
345(3)
13.4.3 General Bayes Optimal Decision about Multivariate Relevance
348(2)
13.5 Knowledge Fusion: Relevance of Genes and Annotations
350(2)
13.6 Conclusion
352(11)
Part V EPIGENETICS
14 Bayesian Networks in the Study of Genome-wide DNA Methylation
363(24)
Meromit Singer
Lior Pachter
14.1 Introduction to Epigenetics
364(1)
14.2 Next-generation Sequencing and DNA Methylation
365(5)
14.2.1 Assaying Genome-wide DNA Methylation
366(2)
14.2.2 The methyl-Seq Method
368(2)
14.3 A Bayesian network for methyl-Seq Analysis
370(5)
14.3.1 Notation
371(1)
14.3.2 A Generative Model
371(1)
14.3.3 Parameter Learning and Inference of Posterior Probabilities
372(3)
14.4 Genomic Structure as a Prior on Methylation Status
375(4)
14.5 Application: Methyltyping the Human Neutrophil
379(2)
14.5.1 Unmethylated Clusters
379(2)
14.6 Conclusions
381(6)
15 Latent Variable Models for Analyzing DNA Methylation
387(22)
E. Andres Houseman
15.1 Introduction
388(2)
15.2 Latent Variable Methods for DNA Methylation in Low-dimensional Settings
390(6)
15.2.1 Discrete Latent Variables
391(1)
15.2.2 Continuous Latent Variables
392(4)
15.3 Latent Variable Methods for DNA Methylation in High-dimensional Settings
396(5)
15.3.1 Model-based Clustering: Recursively Partitioned Mixture Models
396(3)
15.3.2 Semi-Supervised Recursively Partitioned Mixture Models
399(2)
15.4 Conclusion
401(8)
Part VI DETECTION OF COPY NUMBER VARIATIONS
16 Detection of Copy Number Variations from Array Comparative Genomic Hybridization Data Using Linear-chain Conditional Random Field Models
409(22)
Xiaolin Yin
Jing Li
16.1 Introduction
410(1)
16.2 aCGH Data and Analysis
411(2)
16.2.1 aCGH Data
411(1)
16.2.2 Existing Algorithms
412(1)
16.3 Linear-chain CRF Model for aCGH Data
413(8)
16.3.1 Feature Functions
415(2)
16.3.2 Parameter Estimation
417(4)
16.3.3 Evaluation Methods
421(1)
16.4 Experimental Results
421(4)
16.4.1 A Real Example
421(3)
16.4.2 Simulated Data
424(1)
16.5 Conclusion
425(6)
Part VII PREDICTION OF OUTCOMES FROM HIGH-DIMENSIONAL GENOMIC DATA
17 Prediction of Clinical Outcomes from Genome-wide Data
431(16)
Shyam Visweswaran
17.1 Introduction
431(1)
17.2 Challenges with Genome-wide Data
432(1)
17.3 Background
433(2)
17.3.1 The Naive Bayes Model
433(1)
17.3.2 Bayesian Model Averaging
434(1)
17.3.3 Alzheimer's Disease
434(1)
17.4 The Model-Averaged Naive Bayes (MANB) Algorithm
435(3)
17.4.1 Overview of the MANB Algorithm
435(1)
17.4.2 Details of the MANB Algorithm
436(2)
17.5 Evaluation Protocol
438(1)
17.5.1 Data set
438(1)
17.5.2 Protocol
438(1)
17.6 Results
439(1)
17.7 Conclusion
440(7)
Index
447
Christine Sinoquet is an Associate Professor in Computer Science at the University of Nantes, France, where she works in bioinformatics and computational biology at the Computer Science Institute of Nantes-Atlantic. She holds an M.Sc. in Computer Science from the University of Rennes 1 and received her Ph.D. in Computer Science from the same institution. During her Ph.D. at the Inria centre in Rennes, she specialized in bioinformatics. She initiated two Master's degree programs in bioinformatics (at the University of Clermont, France, and at Nantes) and has headed the Nantes program since 2005. Her research has covered various topics, including data correction prior to molecular phylogeny inference, motif discovery in biological sequences, comparative genomics, and imputation of missing genotypic data. Her current research interests are the algorithmic and machine learning aspects of complex data analysis in the biomedical field.

Raphaël Mourad received his PhD from the University of Nantes in September 2011. His first postdoc (2011-2012) was in the Lang Li lab, Center for Computational Biology and Bioinformatics, Indiana University-Purdue University Indianapolis (IUPUI), where he notably worked on the genome-wide analysis of chromatin interactions. His second postdoc (2012-2013) was in the Carole Ober and Dan Nicolae laboratories, Department of Human Genetics, University of Chicago, where he worked on whole-genome sequencing data in asthma. In November 2013, he began a third postdoc at LIRMM in Montpellier, France, working on the bioinformatics of HIV.