Preface |
|
xiii | |
|
|
1 | (150) |
|
1 Introduction: Fundamental Concepts and the Human Genome |
|
|
3 | (30) |
|
|
3 | (1) |
|
|
3 | (6) |
|
1.1.1 Motivation and aim of this book |
|
|
3 | (3) |
|
1.1.2 Overview of topics covered in this book |
|
|
6 | (2) |
|
1.1.3 What are DNA, the genome, a gene, and a chromosome? |
|
|
8 | (1) |
|
1.2 Mendel's laws, sexual reproduction, and genetic recombination |
|
|
9 | (3) |
|
1.3 Genetic polymorphisms |
|
|
12 | (3) |
|
1.3.1 Alleles, single-nucleotide polymorphisms (SNPs), and minor allele frequency (MAF) |
|
|
12 | (1) |
|
1.3.2 Monogenic, polygenic, and omnigenic effects |
|
|
13 | (2) |
|
1.4 From genes to protein and the central dogma of molecular biology |
|
|
15 | (5) |
|
1.4.1 From genes to protein: Genes, amino acids, nucleotides, and proteins |
|
|
15 | (3) |
|
1.4.2 The central dogma of molecular biology: Transcription and translation |
|
|
18 | (2) |
|
1.5 Homozygous and heterozygous alleles, dominant and recessive traits |
|
|
20 | (2) |
|
|
22 | (6) |
|
1.6.1 Defining heritability: Broad- and narrow-sense heritability |
|
|
22 | (1) |
|
1.6.2 Common misconceptions about heritability |
|
|
23 | (1) |
|
1.6.3 Twin, SNP, and GWAS heritability |
|
|
24 | (3) |
|
1.6.4 Missing and hidden heritability |
|
|
27 | (1) |
|
|
28 | (5) |
|
|
28 | (1) |
|
Further reading and resources |
|
|
29 | (1) |
|
|
30 | (3) |
|
2 A Statistical Primer for Genetic Data Analysis |
|
|
33 | (22) |
|
|
33 | (1) |
|
|
33 | (1) |
|
2.2 Basic statistical concepts |
|
|
34 | (4) |
|
2.2.1 Mean, standard deviation, and variance |
|
|
34 | (2) |
|
2.2.2 Covariance and the variance-covariance matrix |
|
|
36 | (2) |
|
|
38 | (2) |
|
|
38 | (1) |
|
2.3.2 The null and alternative hypothesis and significance thresholds |
|
|
39 | (1) |
|
2.4 Correlation, causation, and multivariate causal models |
|
|
40 | (7) |
|
2.4.1 Correlation versus causation |
|
|
40 | (2) |
|
2.4.2 Multivariate causal models |
|
|
42 | (5) |
|
2.5 Fixed-effects models, random-effects models, and mixed models |
|
|
47 | (1) |
|
2.6 Replication of results and overfitting |
|
|
48 | (1) |
|
|
49 | (6) |
|
|
50 | (2) |
|
|
52 | (1) |
|
Software for mixed-model analyses |
|
|
52 | (1) |
|
|
52 | (2) |
|
|
54 | (1) |
|
3 A Primer in Human Evolution |
|
|
55 | (22) |
|
|
55 | (1) |
|
|
55 | (1) |
|
3.2 Human dispersal out of Africa |
|
|
56 | (2) |
|
3.3 Population structure and stratification |
|
|
58 | (5) |
|
3.3.1 Population structure, genetic admixture, and Principal Component Analysis (PCA) |
|
|
58 | (1) |
|
3.3.2 Common misnomers of population structure: Ancestry is not race |
|
|
59 | (1) |
|
3.3.3 Genetic scores cannot be transferred across ancestry groups |
|
|
59 | (2) |
|
3.3.4 How genes mirror geography |
|
|
61 | (2) |
|
3.4 Human evolution, selection, and adaptation |
|
|
63 | (6) |
|
3.4.1 Evolution, fitness, and natural selection |
|
|
63 | (5) |
|
|
68 | (1) |
|
3.5 The Hardy-Weinberg equilibrium |
|
|
69 | (2) |
|
3.5.1 Assumptions of the HWE |
|
|
69 | (1) |
|
3.5.2 Understanding the notation of the HWE |
|
|
70 | (1) |
|
3.6 Linkage disequilibrium and haplotype blocks |
|
|
71 | (2) |
|
|
73 | (4) |
|
|
73 | (1) |
|
Further reading and resources |
|
|
74 | (1) |
|
|
74 | (3) |
|
4 Genome-Wide Association Studies |
|
|
77 | (24) |
|
|
77 | (1) |
|
4.1 Introduction and background |
|
|
77 | (2) |
|
4.2 GWAS research design and meta-analysis |
|
|
79 | (4) |
|
4.2.1 GWAS research design |
|
|
79 | (2) |
|
|
81 | (1) |
|
|
82 | (1) |
|
4.3 Statistical inference, methods, and heterogeneity |
|
|
83 | (7) |
|
4.3.1 Nature of the phenotype |
|
|
83 | (1) |
|
4.3.2 P-values and Z-scores |
|
|
83 | (1) |
|
4.3.3 Correcting for multiple testing in a GWAS |
|
|
84 | (1) |
|
|
85 | (2) |
|
4.3.5 Evaluating dichotomous versus quantitative traits |
|
|
87 | (1) |
|
4.3.6 Fixed-effects versus random-effects models |
|
|
88 | (1) |
|
4.3.7 Weighting, false discovery rate (FDR), and imputation |
|
|
89 | (1) |
|
4.3.8 Sources of heterogeneity |
|
|
89 | (1) |
|
4.4 Quality control (QC) of genetic data |
|
|
90 | (1) |
|
4.5 The NHGRI-EBI GWAS Catalog |
|
|
91 | (6) |
|
4.5.1 What is the NHGRI-EBI GWAS Catalog? |
|
|
91 | (1) |
|
4.5.2 A brief history of the GWAS |
|
|
91 | (2) |
|
4.5.3 Lack of diversity in GWASs |
|
|
93 | (4) |
|
4.6 Conclusion and future directions |
|
|
97 | (4) |
|
|
98 | (1) |
|
|
98 | (1) |
|
|
99 | (2) |
|
5 Introduction to Polygenic Scores and Genetic Architecture |
|
|
101 | (28) |
|
|
101 | (1) |
|
|
101 | (6) |
|
5.1.1 What is a polygenic score? |
|
|
105 | (1) |
|
5.1.2 The origins of polygenic scores |
|
|
105 | (2) |
|
5.2 Construction of polygenic scores |
|
|
107 | (1) |
|
5.2.1 Large sample sizes required in GWAS discovery |
|
|
108 | (1) |
|
5.2.2 Selection of SNPs to include |
|
|
108 | (1) |
|
5.3 Validation and prediction of polygenic scores |
|
|
108 | (5) |
|
5.3.1 Independent target sample |
|
|
109 | (1) |
|
5.3.2 Similar ancestry in target sample |
|
|
110 | (1) |
|
5.3.3 Relatedness, population stratification, and differential bias |
|
|
110 | (1) |
|
5.3.4 Variance explained only by common genetic markers missing rare variants |
|
|
111 | (1) |
|
5.3.5 Missing and hidden heritability in prediction of phenotypes from genetic markers (SNPs) |
|
|
111 | (1) |
|
5.3.6 Trade-off between prediction and understanding biological mechanisms |
|
|
112 | (1) |
|
5.4 Shared genetic architecture of phenotypes |
|
|
113 | (6) |
|
5.4.1 Predicting other phenotypes |
|
|
113 | (1) |
|
5.4.2 Phenotypic and genetic correlation |
|
|
114 | (1) |
|
|
115 | (4) |
|
5.4.4 Multitrait analysis |
|
|
119 | (1) |
|
5.5 Causal modeling with polygenic scores |
|
|
119 | (4) |
|
5.5.1 Genetic confounding |
|
|
119 | (1) |
|
5.5.2 Mendelian Randomization |
|
|
120 | (1) |
|
5.5.3 Controlling for confounders |
|
|
120 | (2) |
|
5.5.4 Gene-environment interaction and heterogeneity |
|
|
122 | (1) |
|
|
123 | (6) |
|
|
124 | (1) |
|
|
124 | (1) |
|
|
125 | (4) |
|
6 Gene-Environment Interplay |
|
|
129 | (22) |
|
|
129 | (1) |
|
6.1 Introduction: What is gene-environment (GxE) interplay? |
|
|
129 | (1) |
|
6.2 Defining the environment in GxE research |
|
|
130 | (3) |
|
6.2.1 Nature and scope of E: Multilevel, multidomain, and multitemporal |
|
|
131 | (1) |
|
6.2.2 Interdependence of environmental risk factors |
|
|
132 | (1) |
|
6.3 A brief history of GxE research |
|
|
133 | (3) |
|
|
133 | (1) |
|
6.3.2 Candidate gene cGxE approaches |
|
|
134 | (1) |
|
6.3.3 Genome-wide polygenic score GxE approaches |
|
|
135 | (1) |
|
6.4 Conceptual GxE models |
|
|
136 | (7) |
|
6.4.1 Diathesis-stress, vulnerability, or contextual triggering model |
|
|
136 | (1) |
|
6.4.2 Bioecological or social compensation model |
|
|
137 | (2) |
|
6.4.3 Differential susceptibility model |
|
|
139 | (1) |
|
6.4.4 Social control or social push model |
|
|
140 | (1) |
|
6.4.5 Research designs to study GxE |
|
|
140 | (3) |
|
6.5 Gene-environment correlation (rGE) |
|
|
143 | (3) |
|
6.5.1 Passive gene-environment correlation (rGE) |
|
|
144 | (1) |
|
6.5.2 Evocative (or reactive) rGE |
|
|
145 | (1) |
|
|
145 | (1) |
|
6.5.4 Why are models of rGE important? |
|
|
145 | (1) |
|
6.5.5 Research designs to study rGE |
|
|
146 | (1) |
|
6.6 Conclusion and future directions |
|
|
146 | (5) |
|
6.6.1 Why haven't many GxEs been identified? |
|
|
146 | (1) |
|
|
147 | (1) |
|
|
147 | (1) |
|
|
147 | (4) |
|
II Working with Genetic Data |
|
|
151 | (124) |
|
7 Genetic Data and Analytical Challenges |
|
|
153 | (30) |
|
|
153 | (1) |
|
|
153 | (1) |
|
7.2 Genotyping and sequencing array |
|
|
154 | (6) |
|
7.2.1 Genotyping and sequencing technologies |
|
|
154 | (1) |
|
7.2.2 Linkage disequilibrium and imputation |
|
|
155 | (3) |
|
7.2.3 Limitations of genotyping arrays and next-generation sequencing |
|
|
158 | (1) |
|
7.2.4 Drop in costs per genome |
|
|
159 | (1) |
|
7.3 Overview of human genetic data for analysis |
|
|
160 | (5) |
|
7.3.1 Prominently used genetic data |
|
|
161 | (2) |
|
7.3.2 Sources that archive and distribute data |
|
|
163 | (1) |
|
7.3.3 Obtaining GWAS summary statistics |
|
|
164 | (1) |
|
7.4 Different formats in genomics data |
|
|
165 | (6) |
|
7.4.1 Genomics data is big data |
|
|
165 | (1) |
|
7.4.2 PLINK software and genotype formats |
|
|
166 | (4) |
|
|
170 | (1) |
|
7.5 Genetic formats for imputed data |
|
|
171 | (4) |
|
|
171 | (1) |
|
7.5.2 Oxford file formats |
|
|
172 | (2) |
|
7.5.3 The variant call format (VCF) |
|
|
174 | (1) |
|
7.6 Data used in this book |
|
|
175 | (1) |
|
7.7 Data transfer, storage, size, and computing power |
|
|
176 | (3) |
|
|
176 | (1) |
|
7.7.2 Data sharing, transfer across borders, and cloud storage |
|
|
177 | (1) |
|
7.7.3 Size of data and computational power |
|
|
178 | (1) |
|
|
179 | (4) |
|
|
179 | (1) |
|
Further reading and resources |
|
|
179 | (1) |
|
|
180 | (3) |
|
8 Working with Genetic Data, Part I: Data Management, Descriptive Statistics, and Quality Control |
|
|
183 | (34) |
|
|
183 | (1) |
|
8.1 Introduction: Working with genetic data |
|
|
183 | (1) |
|
8.2 Getting started with PLINK |
|
|
184 | (9) |
|
|
184 | (2) |
|
8.2.2 Calling PLINK and the PLINK command line |
|
|
186 | (2) |
|
8.2.3 Running scripts in terminal |
|
|
188 | (1) |
|
8.2.4 Opening PLINK files |
|
|
189 | (1) |
|
8.2.5 Recode binary files to create new readable dataset with .ped and .map files |
|
|
189 | (2) |
|
8.2.6 Import data from other formats |
|
|
191 | (2) |
|
|
193 | (6) |
|
8.3.1 Select individuals and markers |
|
|
193 | (3) |
|
8.3.2 Merge different genetic files and attach a phenotype |
|
|
196 | (3) |
|
8.4 Descriptive statistics |
|
|
199 | (3) |
|
|
199 | (1) |
|
|
200 | (2) |
|
8.5 Quality control of genetic data |
|
|
202 | (9) |
|
|
203 | (3) |
|
|
206 | (3) |
|
8.5.3 Genome-wide association meta-analysis QC |
|
|
209 | (2) |
|
|
211 | (6) |
|
|
214 | (1) |
|
Further reading and resources |
|
|
214 | (1) |
|
|
214 | (3) |
|
9 Working with Genetic Data, Part II: Association Analysis, Population Stratification, and Genetic Relatedness |
|
|
217 | (26) |
|
|
217 | (1) |
|
|
217 | (1) |
|
9.1.1 Aim of this chapter |
|
|
217 | (1) |
|
9.1.2 Data and computer programs used in this chapter |
|
|
218 | (1) |
|
|
218 | (5) |
|
9.3 Linkage disequilibrium |
|
|
223 | (3) |
|
9.4 Population stratification |
|
|
226 | (10) |
|
|
236 | (2) |
|
9.6 Relatedness matrix and heritability with GCTA |
|
|
238 | (2) |
|
|
240 | (3) |
|
|
241 | (1) |
|
Further reading and resources |
|
|
241 | (1) |
|
|
241 | (2) |
|
10 An Applied Guide to Creating and Validating Polygenic Scores |
|
|
243 | (32) |
|
|
243 | (1) |
|
|
243 | (2) |
|
10.1.1 Creating a polygenic score |
|
|
243 | (1) |
|
10.1.2 Data used in this chapter |
|
|
244 | (1) |
|
10.2 How to construct a score with selected variants (monogenic) |
|
|
245 | (2) |
|
10.3 Pruning and thresholding method |
|
|
247 | (4) |
|
10.4 How to calculate a polygenic score using PRSice 2.0 |
|
|
251 | (9) |
|
|
260 | (7) |
|
10.6 LDpred: Accounting for LD in polygenic score calculations |
|
|
267 | (5) |
|
10.6.1 Introduction and three steps |
|
|
267 | (5) |
|
|
272 | (3) |
|
|
273 | (1) |
|
Further reading and resources |
|
|
273 | (1) |
|
|
274 | (1) |
|
III Applications and Advanced Topics |
|
|
275 | (106) |
|
11 Polygenic Score and Gene-Environment Interaction (GxE) Applications |
|
|
277 | (38) |
|
|
277 | (1) |
|
|
277 | (1) |
|
11.2 Polygenic score applications: (Cross-trait) prediction and confounding |
|
|
278 | (21) |
|
11.2.1 Out-of-sample prediction |
|
|
278 | (10) |
|
11.2.2 Cross-trait prediction and genetic covariation |
|
|
288 | (7) |
|
11.2.3 Genetic confounding |
|
|
295 | (4) |
|
11.3 Gene-environment interaction |
|
|
299 | (9) |
|
11.3.1 Application: BMI x birth cohort |
|
|
300 | (8) |
|
11.4 Challenges in gene-environment interaction research |
|
|
308 | (2) |
|
11.5 Conclusion and future directions |
|
|
310 | (5) |
|
|
311 | (1) |
|
|
311 | (1) |
|
|
311 | (4) |
|
12 Applying Genome-Wide Association Results |
|
|
315 | (24) |
|
|
315 | (1) |
|
|
315 | (1) |
|
12.2 Plotting association results |
|
|
316 | (8) |
|
|
316 | (4) |
|
12.2.2 Regional association plots |
|
|
320 | (1) |
|
12.2.3 Quantile-Quantile plots and the λ statistic |
|
|
320 | (4) |
|
12.3 Estimating heritability from summary statistics |
|
|
324 | (4) |
|
12.4 Estimating genetic correlations from summary statistics |
|
|
328 | (5) |
|
12.5 MTAG: Multi-Trait Analysis of Genome-wide association summary statistics |
|
|
333 | (3) |
|
|
336 | (3) |
|
|
336 | (1) |
|
Further reading and resources |
|
|
336 | (1) |
|
|
337 | (2) |
|
13 Mendelian Randomization and Instrumental Variables |
|
|
339 | (20) |
|
|
339 | (1) |
|
|
339 | (2) |
|
13.2 Randomized control trials and causality |
|
|
341 | (1) |
|
13.3 Mendelian Randomization |
|
|
341 | (2) |
|
13.4 Instrumental variables and Mendelian Randomization |
|
|
343 | (6) |
|
13.4.1 The IV model in an MR framework |
|
|
343 | (4) |
|
13.4.2 Violation of statistical assumptions of the IV approach |
|
|
347 | (2) |
|
13.5 Extensions of standard MR |
|
|
349 | (3) |
|
13.5.1 Using multiple markers as independent instruments |
|
|
351 | (1) |
|
13.5.2 Using polygenic scores as IVs |
|
|
351 | (1) |
|
13.5.3 Bidirectional MR analyses |
|
|
352 | (1) |
|
|
352 | (3) |
|
13.6.1 Consequences of alcohol consumption |
|
|
352 | (1) |
|
13.6.2 Body mass index and mortality |
|
|
353 | (1) |
|
13.6.3 Causes of dementia and Alzheimer's disease |
|
|
354 | (1) |
|
|
355 | (4) |
|
|
355 | (1) |
|
|
356 | (1) |
|
|
356 | (3) |
|
14 Ethical Issues in Genomics Research |
|
|
359 | (18) |
|
|
359 | (1) |
|
|
359 | (2) |
|
14.2 Genetics is not destiny: Genetic determinism |
|
|
361 | (2) |
|
14.2.1 Variation in traits and ability to use individual PGSs as predictors |
|
|
361 | (1) |
|
14.2.2 Heritability and missing heritability |
|
|
362 | (1) |
|
14.3 Clinical use of PGSs |
|
|
363 | (4) |
|
14.3.1 Genetics and family history |
|
|
363 | (1) |
|
14.3.2 Genetic scores for screening, intervention, and life planning |
|
|
364 | (1) |
|
|
365 | (1) |
|
14.3.4 Public understanding of genetic information and information risks |
|
|
366 | (1) |
|
14.4 Lack of diversity in genomics |
|
|
367 | (1) |
|
14.4.1 Lack of diversity in GWASs |
|
|
367 | (1) |
|
14.4.2 European ancestry bias related to PGS construction |
|
|
367 | (1) |
|
14.5 Privacy, consent, legal issues, insurance, and General Data Protection Regulation |
|
|
367 | (5) |
|
14.5.1 Privacy in the age of public genetics: Solving crimes and finding people |
|
|
367 | (1) |
|
14.5.2 The changing nature of informed consent in genomic research |
|
|
368 | (1) |
|
14.5.3 Insurance and genetics |
|
|
369 | (1) |
|
|
370 | (2) |
|
14.6 Conclusion and future directions |
|
|
372 | (5) |
|
Further reading and resources |
|
|
373 | (1) |
|
|
373 | (4) |
|
15 Conclusions and Future Directions |
|
|
377 | (4) |
|
15.1 Summary and reflection |
|
|
377 | (1) |
|
|
377 | (4) |
|
|
380 | (1) |
|
Appendix 1 Software Used in This Book |
|
|
381 | (8) |
|
|
381 | (1) |
|
|
381 | (1) |
|
|
382 | (1) |
|
|
382 | (1) |
|
|
382 | (1) |
|
|
383 | (2) |
|
A1.6.1 How to switch from Python 3 to Python 2 |
|
|
384 | (1) |
|
A1.6.2 Installing packages in Python |
|
|
385 | (1) |
|
|
385 | (1) |
|
|
386 | (1) |
|
|
386 | (1) |
|
|
387 | (1) |
|
A1.11 Using Windows for this book |
|
|
388 | (1) |
|
|
388 | (1) |
|
Appendix 2 Data Used in This Book |
|
|
389 | (10) |
|
|
389 | (1) |
|
A2.2 Description of simulated data |
|
|
389 | (2) |
|
A2.3 Health and Retirement Study |
|
|
391 | (4) |
|
A2.4 Data used by chapter |
|
|
395 | (4) |
|
|
397 | (2) |
Glossary |
|
399 | (6) |
Notes |
|
405 | (4) |
Index |
|
409 | |