Introduction to Statistical Genetic Data Analysis [Paperback]

4.67/5 (12 ratings from Goodreads)

Felix C. Tropf (École Nationale de la Statistique et de L'administration Économique (ENSAE)), Nicola Barban (University of Essex), Melinda C. Mills (University of Oxford)

Format: Paperback / softback, 432 pages, height x width x thickness: 229x178x25 mm, 72 b&w illus.; 144 Illustrations
Series: The MIT Press
Publication date: 18-Feb-2020
Publisher: MIT Press
ISBN-10: 0262538385
ISBN-13: 9780262538381

Other books on the subject:

Genetics (non-medical) - (Currently in store: 3 titles)
Biogeography
Genetic engineering
Dynamics & statics
Molecular biology

Paperback
Price: 54,10 €
Delivery of the book from the publisher takes approximately 2-4 weeks

Permalink: https://www.kriso.ee/db/9780262538381.html

A comprehensive introduction to modern applied statistical genetic data analysis, accessible to those without a background in molecular biology or genetics.

Human genetic research is now relevant beyond biology, epidemiology, and the medical sciences, with applications in such fields as psychology, psychiatry, statistics, demography, sociology, and economics. With advances in computing power, the availability of data, and new techniques, it is now possible to integrate large-scale molecular genetic information into research across a broad range of topics. This book offers the first comprehensive introduction to modern applied statistical genetic data analysis that covers theory, data preparation, and analysis of molecular genetic data, with hands-on computer exercises. It is accessible to students and researchers in any empirically oriented medical, biological, or social science discipline; a background in molecular biology or genetics is not required.

The book first provides foundations for statistical genetic data analysis, including a survey of fundamental concepts, primers on statistics and human evolution, and an introduction to polygenic scores. It then covers the practicalities of working with genetic data, discussing such topics as analytical challenges and data management. Finally, the book presents applications and advanced topics, including polygenic score and gene-environment interaction applications, Mendelian Randomization and instrumental variables, and ethical issues. The software and data used in the book are freely available and can be found on the book's website.

Preface

I Foundations

1 Introduction: Fundamental Concepts and the Human Genome

Objectives

1.1 Introduction

1.1.1 Motivation and aim of this book

1.1.2 Overview of topics covered in this book

1.1.3 What are DNA, the genome, a gene, and a chromosome?

1.2 Mendel's laws, sexual reproduction, and genetic recombination

1.3 Genetic polymorphisms

1.3.1 Alleles, single-nucleotide polymorphisms (SNPs), and minor allele frequency (MAF)

1.3.2 Monogenic, polygenic, and omnigenic effects

1.4 From genes to protein and the central dogma of molecular biology

1.4.1 From genes to protein: Genes, amino acids, nucleotides, and proteins

1.4.2 The central dogma of molecular biology: Transcription and translation

1.5 Homozygous and heterozygous alleles, dominant and recessive traits

1.6 Heritability

1.6.1 Defining heritability: Broad- and narrow-sense heritability

1.6.2 Common misconceptions about heritability

1.6.3 Twin, SNP, and GWAS heritability

1.6.4 Missing and hidden heritability

1.7 Conclusion

Exercises

Further reading and resources

References

2 A Statistical Primer for Genetic Data Analysis

Objectives

2.1 Introduction

2.2 Basic statistical concepts

2.2.1 Mean, standard deviation, and variance

2.2.2 Covariance and the variance-covariance matrix

2.3 Statistical models

2.3.1 Regression models

2.3.2 The null and alternative hypothesis and significance thresholds

2.4 Correlation, causation, and multivariate causal models

2.4.1 Correlation versus causation

2.4.2 Multivariate causal models

2.5 Fixed-effects models, random-effects models, and mixed models

2.6 Replication of results and overfitting

2.7 Conclusion

Exercises

Further reading

Software for mixed-model analyses

Appendix

References

3 A Primer in Human Evolution

Objectives

3.1 Introduction

3.2 Human dispersal out of Africa

3.3 Population structure and stratification

3.3.1 Population structure, genetic admixture, and Principal Component Analysis (PCA)

3.3.2 Common misnomers of population structure: Ancestry is not race

3.3.3 Genetic scores cannot be transferred across ancestry groups

3.3.4 How genes mirror geography

3.4 Human evolution, selection, and adaptation

3.4.1 Evolution, fitness, and natural selection

3.4.2 Genetic drift

3.5 The Hardy-Weinberg equilibrium

3.5.1 Assumptions of the HWE

3.5.2 Understanding the notation of the HWE

3.6 Linkage disequilibrium and haplotype blocks

3.7 Conclusion

Exercises

Further reading and resources

References

4 Genome-Wide Association Studies

Objectives

4.1 Introduction and background

4.2 GWAS research design and meta-analysis

4.2.1 GWAS research design

4.2.2 Data analysis plan

4.2.3 Meta-analysis

4.3 Statistical inference, methods, and heterogeneity

4.3.1 Nature of the phenotype

4.3.2 P-values and Z-scores

4.3.3 Correcting for multiple testing in a GWAS

4.3.4 Manhattan plots

4.3.5 Evaluating dichotomous versus quantitative traits

4.3.6 Fixed-effects versus random-effects models

4.3.7 Weighting, false discovery rate (FDR), and imputation

4.3.8 Sources of heterogeneity

4.4 Quality control (QC) of genetic data

4.5 The NHGRI-EBI GWAS Catalog

4.5.1 What is the NHGRI-EBI GWAS Catalog?

4.5.2 A brief history of the GWAS

4.5.3 Lack of diversity in GWASs

4.6 Conclusion and future directions

Exercises

Further reading

References

5 Introduction to Polygenic Scores and Genetic Architecture

Objectives

5.1 Introduction

5.1.1 What is a polygenic score?

5.1.2 The origins of polygenic scores

5.2 Construction of polygenic scores

5.2.1 Large sample sizes required in GWAS discovery

5.2.2 Selection of SNPs to include

5.3 Validation and prediction of polygenic scores

5.3.1 Independent target sample

5.3.2 Similar ancestry in target sample

5.3.3 Relatedness, population stratification, and differential bias

5.3.4 Variance explained only by common genetic markers missing rare variants

5.3.5 Missing and hidden heritability in prediction of phenotypes from genetic markers (SNPs)

5.3.6 Trade-off between prediction and understanding biological mechanisms

5.4 Shared genetic architecture of phenotypes

5.4.1 Predicting other phenotypes

5.4.2 Phenotypic and genetic correlation

5.4.3 Pleiotropy

5.4.4 Multitrait analysis

5.5 Causal modeling with polygenic scores

5.5.1 Genetic confounding

5.5.2 Mendelian Randomization

5.5.3 Controlling for confounders

5.5.4 Gene-environment interaction and heterogeneity

5.6 Conclusion

Exercises

Further reading

References

6 Gene-Environment Interplay

Objectives

6.1 Introduction: What is gene-environment (GxE) interplay?

6.2 Defining the environment in GxE research

6.2.1 Nature and scope of E: Multilevel, multidomain, and multitemporal

6.2.2 Interdependence of environmental risk factors

6.3 A brief history of GxE research

6.3.1 Classic approaches

6.3.2 Candidate gene cGxE approaches

6.3.3 Genome-wide polygenic score GxE approaches

6.4 Conceptual GxE models

6.4.1 Diathesis-stress, vulnerability, or contextual triggering model

6.4.2 Bioecological or social compensation model

6.4.3 Differential susceptibility model

6.4.4 Social control or social push model

6.4.5 Research designs to study GxE

6.5 Gene-environment correlation (rGE)

6.5.1 Passive gene-environment correlation (rGE)

6.5.2 Evocative (or reactive) rGE

6.5.3 Active rGE

6.5.4 Why are models of rGE important?

6.5.5 Research designs to study rGE

6.6 Conclusion and future directions

6.6.1 Why haven't many GxEs been identified?

Exercises

Further reading

References

II Working with Genetic Data

7 Genetic Data and Analytical Challenges

Objectives

7.1 Introduction

7.2 Genotyping and sequencing array

7.2.1 Genotyping and sequencing technologies

7.2.2 Linkage disequilibrium and imputation

7.2.3 Limitations of genotyping arrays and next-generation sequencing

7.2.4 Drop in costs per genome

7.3 Overview of human genetic data for analysis

7.3.1 Prominently used genetic data

7.3.2 Sources that archive and distribute data

7.3.3 Obtaining GWAS summary statistics

7.4 Different formats in genomics data

7.4.1 Genomics data is big data

7.4.2 PLINK software and genotype formats

7.4.3 PLINK binary files

7.5 Genetic formats for imputed data

7.5.1 PLINK 2.0

7.5.2 Oxford file formats

7.5.3 The variant call format (VCF)

7.6 Data used in this book

7.7 Data transfer, storage, size, and computing power

7.7.1 Data storage

7.7.2 Data sharing, transfer across borders, and cloud storage

7.7.3 Size of data and computational power

7.8 Conclusion

Exercises

Further reading and resources

References

8 Working with Genetic Data, Part I: Data Management, Descriptive Statistics, and Quality Control

Objectives

8.1 Introduction: Working with genetic data

8.2 Getting started with PLINK

8.2.1 The command line

8.2.2 Calling PLINK and the PLINK command line

8.2.3 Running scripts in terminal

8.2.4 Opening PLINK files

8.2.5 Recode binary files to create new readable dataset with .ped and .map files

8.2.6 Import data from other formats

8.3 Data management

8.3.1 Select individuals and markers

8.3.2 Merge different genetic files and attaching a phenotype

8.4 Descriptive statistics

8.4.1 Allele frequency

8.4.2 Missing values

8.5 Quality control of genetic data

8.5.1 Per-individual QC

8.5.2 Per-marker QC

8.5.3 Genome-wide association meta-analysis QC

8.6 Conclusion

Exercises

Further reading and resources

References

9 Working with Genetic Data, Part II: Association Analysis, Population Stratification, and Genetic Relatedness

Objectives

9.1 Introduction

9.1.1 Aim of this chapter

9.1.2 Data and computer programs used in this chapter

9.2 Association analysis

9.3 Linkage disequilibrium

9.4 Population stratification

9.5 Genetic relatedness

9.6 Relatedness matrix and heritability with GCTA

9.7 Conclusion

Exercises

Further reading and resources

References

10 An Applied Guide to Creating and Validating Polygenic Scores

Objectives

10.1 Introduction

10.1.1 Creating a polygenic score

10.1.2 Data used in this chapter

10.2 How to construct a score with selected variants (monogenic)

10.3 Pruning and thresholding method

10.4 How to calculate a polygenic score using PRSice 2.0

10.5 Validating the PGS

10.6 LDpred: Accounting for LD in polygenic score calculations

10.6.1 Introduction and three steps

10.7 Conclusion

Exercises

Further reading and resources

References

III Applications and Advanced Topics

11 Polygenic Score and Gene-Environment Interaction (GxE) Applications

Objectives

11.1 Introduction

11.2 Polygenic score applications: (Cross-trait) prediction and confounding

11.2.1 Out-of-sample prediction

11.2.2 Cross-trait prediction and genetic covariation

11.2.3 Genetic confounding

11.3 Gene-environment interaction

11.3.1 Application: BMI x birth cohort

11.4 Challenges in gene-environment interaction research

11.5 Conclusion and future directions

Exercises

Further reading

References

12 Applying Genome-Wide Association Results

Objectives

12.1 Introduction

12.2 Plotting association results

12.2.1 Manhattan plots

12.2.2 Regional association plots

12.2.3 Quantile-Quantile plots and the λ statistic

12.3 Estimating heritability from summary statistics

12.4 Estimating genetic correlations from summary statistics

12.5 MTAG: Multi-Trait Analysis of Genome-wide association summary statistics

12.6 Conclusion

Exercises

Further reading and resources

References

13 Mendelian Randomization and Instrumental Variables

Objectives

13.1 Introduction

13.2 Randomized control trials and causality

13.3 Mendelian Randomization

13.4 Instrumental variables and Mendelian Randomization

13.4.1 The IV model in an MR framework

13.4.2 Violation of statistical assumptions of the IV approach

13.5 Extensions of standard MR

13.5.1 Using multiple markers as independent instruments

13.5.2 Using polygenic scores as IVs

13.5.3 Bidirectional MR analyses

13.6 Applications of MR

13.6.1 Consequences of alcohol consumption

13.6.2 Body mass index and mortality

13.6.3 Causes of dementia and Alzheimer's disease

13.7 Conclusion

Exercises

Further reading

References

14 Ethical Issues in Genomics Research

Objectives

14.1 Introduction

14.2 Genetics is not destiny: Genetic determinism

14.2.1 Variation in traits and ability to use individual PGSs as predictors

14.2.2 Heritability and missing heritability

14.3 Clinical use of PGSs

14.3.1 Genetics and family history

14.3.2 Genetic scores for screening, intervention, and life planning

14.3.3 Pharmacogenetics

14.3.4 Public understanding of genetic information and information risks

14.4 Lack of diversity in genomics

14.4.1 Lack of diversity in GWASs

14.4.2 European ancestry bias related to PGS construction

14.5 Privacy, consent, legal issues, insurance, and General Data Protection Regulation

14.5.1 Privacy in the age of public genetics: Solving crimes and finding people

14.5.2 The changing nature of informed consent in genomic research

14.5.3 Insurance and genetics

14.5.4 GDPR and genetics

14.6 Conclusion and future directions

Further reading and resources

References

15 Conclusions and Future Directions

15.1 Summary and reflection

15.2 Future directions

References

Appendix 1 Software Used in This Book

A1.1 Introduction

A1.2 RStudio and R

A1.3 PLINK

A1.4 GCTA

A1.5 PRSice

A1.6 Python

A1.6.1 How to switch from Python 3 to Python 2

A1.6.2 Installing packages in Python

A1.7 Git

A1.8 LDpred

A1.9 LDSC

A1.10 MTAG

A1.11 Using Windows for this book

References

Appendix 2 Data Used in This Book

A2.1 Introduction

A2.2 Description of simulated data

A2.3 Health and Retirement Study

A2.4 Data used by chapter

References

Glossary

Notes

Index
