E-book: Algorithms for Next-Generation Sequencing

Wing-Kin Sung (National University of Singapore)

Format: 364 pages
Series: Chapman & Hall/CRC Computational Biology Series
Publication date: 18-May-2017
Publisher: CRC Press Inc
Language: English
ISBN-13: 9781498752985

Format: EPUB+DRM
Price: 58,49 €*
* The price is final, i.e. no further discounts apply.
This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

Copying (copy/paste): not allowed
Printing: not allowed
Usage:

Digital rights management (DRM)
The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by one user and downloaded to up to six devices (all authorised with the same Adobe ID).

Required software
To read on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

To read on a PC or Mac, you need to install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

This e-book cannot be read on an Amazon Kindle.

Advances in sequencing technology have allowed scientists to study the human genome in greater depth and on a larger scale than ever before, generating as many as hundreds of millions of short reads in the course of a few days. But what are the best ways to deal with this flood of data?

Algorithms for Next-Generation Sequencing is an invaluable tool for students and researchers in bioinformatics and computational biology and for biologists seeking to process and manage the data generated by next-generation sequencing. It can serve as a textbook or a self-study resource. In addition to offering an in-depth description of the algorithms for processing sequencing data, it also presents useful case studies describing the applications of this technology.

Reviews

"With every advance in sequencing technology, existing string algorithms are yet again at their limits and need to be developed further. As a result, a book like this one is sorely needed. It lays out the concepts and approaches both for practical data analysis and for developing algorithms powerful enough to deal with the deluge of sequence data." Martin Vingron, Max Planck Institute for Molecular Genetics

Preface

1 Introduction
1.1 DNA, RNA, protein and cells
1.2 Sequencing technologies
1.3 First-generation sequencing
1.4 Second-generation sequencing
1.4.1 Template preparation
1.4.2 Base calling
1.4.3 Polymerase-mediated methods based on reversible terminator nucleotides
1.4.4 Polymerase-mediated methods based on unmodified nucleotides
1.4.5 Ligase-mediated method
1.5 Third-generation sequencing
1.5.1 Single-molecule real-time sequencing
1.5.2 Nanopore sequencing method
1.5.3 Direct imaging of DNA using electron microscopy
1.6 Comparison of the three generations of sequencing
1.7 Applications of sequencing
1.8 Summary and further reading
1.9 Exercises

2 NGS file formats
2.1 Introduction
2.2 Raw data files: fasta and fastq
2.3 Alignment files: SAM and BAM
2.3.1 FLAG
2.3.2 CIGAR string
2.4 Bed format
2.5 Variant Call Format (VCF)
2.6 Format for representing density data
2.7 Exercises

3 Related algorithms and data structures
3.1 Introduction
3.2 Recursion and dynamic programming
3.2.1 Key searching problem
3.2.2 Edit-distance problem
3.3 Parameter estimation
3.3.1 Maximum likelihood
3.3.2 Unobserved variable and EM algorithm
3.4 Hash data structures
3.4.1 Maintain an associative array by simple hashing
3.4.2 Maintain a set using a Bloom filter
3.4.3 Maintain a multiset using a counting Bloom filter
3.4.4 Estimating the similarity of two sets using minHash
3.5 Full-text index
3.5.1 Suffix trie and suffix tree
3.5.2 Suffix array
3.5.3 FM-index
3.5.3.1 Inverting the BWT B to the original text T
3.5.3.2 Simulate a suffix array using the FM-index
3.5.3.3 Pattern matching
3.5.4 Simulate a suffix trie using the FM-index
3.5.5 Bi-directional BWT
3.6 Data compression techniques
3.6.1 Data compression and entropy
3.6.2 Unary, gamma, and delta coding
3.6.3 Golomb code
3.6.4 Huffman coding
3.6.5 Arithmetic code
3.6.6 Order-k Markov Chain
3.6.7 Run-length encoding
3.7 Exercises

4 NGS read mapping
4.1 Introduction
4.2 Overview of the read mapping problem
4.2.1 Mapping reads with no quality score
4.2.2 Mapping reads with a quality score
4.2.3 Brute-force solution
4.2.4 Mapping quality
4.2.5 Challenges
4.3 Align reads allowing a small number of mismatches
4.3.1 Mismatch seed hashing approach
4.3.2 Read hashing with a spaced seed
4.3.3 Reference hashing approach
4.3.4 Suffix trie-based approaches
4.3.4.1 Estimating the lower bound of the number of mismatches
4.3.4.2 Divide and conquer with the enhanced pigeonhole principle
4.3.4.3 Aligning a set of reads together
4.3.4.4 Speed up utilizing the quality score
4.4 Aligning reads allowing a small number of mismatches and indels
4.4.1 q-mer approach
4.4.2 Computing alignment using a suffix trie
4.4.2.1 Computing the edit distance using a suffix trie
4.4.2.2 Local alignment using a suffix trie
4.5 Aligning reads in general
4.5.1 Seed-and-extension approach
4.5.1.1 BWA-SW
4.5.1.2 Bowtie 2
4.5.1.3 Bat-Align
4.5.1.4 Cushaw2
4.5.1.5 BWA-MEM
4.5.1.6 LAST
4.5.2 Filter-based approach
4.6 Paired-end alignment
4.7 Further reading
4.8 Exercises

5 Genome assembly
5.1 Introduction
5.2 Whole genome shotgun sequencing
5.2.1 Whole genome sequencing
5.2.2 Mate-pair sequencing
5.3 De novo genome assembly for short reads
5.3.1 Read error correction
5.3.1.1 Spectral alignment problem (SAP)
5.3.1.2 k-mer counting
5.3.2 Base-by-base extension approach
5.3.3 De Bruijn graph approach
5.3.3.1 De Bruijn assembler (no sequencing error)
5.3.3.2 De Bruijn assembler (with sequencing errors)
5.3.3.3 How to select k
5.3.3.4 Additional issues of the de Bruijn graph approach
5.3.4 Scaffolding
5.3.5 Gap filling
5.4 Genome assembly for long reads
5.4.1 Assemble long reads assuming long reads have a low sequencing error rate
5.4.2 Hybrid approach
5.4.2.1 Use mate-pair reads and long reads to improve the assembly from short reads
5.4.2.2 Use short reads to correct errors in long reads
5.4.3 Long read approach
5.4.3.1 MinHash for all-versus-all pairwise alignment
5.4.3.2 Computing consensus using Falcon Sense
5.4.3.3 Quiver consensus algorithm
5.5 How to evaluate the goodness of an assembly
5.6 Discussion and further reading
5.7 Exercises

6 Single nucleotide variation (SNV) calling
6.1 Introduction
6.1.1 What are SNVs and small indels?
6.1.2 Somatic and germline mutations
6.2 Determine variations by resequencing
6.2.1 Exome/Targeted sequencing
6.2.2 Detection of somatic and germline variations
6.3 Single locus SNV calling
6.3.1 Identifying SNVs by counting alleles
6.3.2 Identify SNVs by binomial distribution
6.3.3 Identify SNVs by Poisson-binomial distribution
6.3.4 Identifying SNVs by the Bayesian approach
6.4 Single locus somatic SNV calling
6.4.1 Identify somatic SNVs by the Fisher exact test
6.4.2 Identify somatic SNVs by verifying that the SNVs appear in the tumor only
6.4.2.1 Identify SNVs in the tumor sample by posterior odds ratio
6.4.2.2 Verify if an SNV is somatic by the posterior odds ratio
6.5 General pipeline for calling SNVs
6.6 Local realignment
6.7 Duplicate read marking
6.8 Base quality score recalibration
6.9 Rule-based filtering
6.10 Computational methods to identify small indels
6.10.1 Split-read approach
6.10.2 Span distribution-based clustering approach
6.10.3 Local assembly approach
6.11 Correctness of existing SNV and indel callers
6.12 Further reading
6.13 Exercises

7 Structural variation calling
7.1 Introduction
7.2 Formation of SVs
7.3 Clinical effects of structural variations
7.4 Methods for determining structural variations
7.5 CNV calling
7.5.1 Computing the raw read count
7.5.2 Normalize the read counts
7.5.3 Segmentation
7.6 SV calling pipeline
7.6.1 Insert size estimation
7.7 Classifying the paired-end read alignments
7.8 Identifying candidate SVs from paired-end reads
7.8.1 Clustering approach
7.8.1.1 Clique-finding approach
7.8.1.2 Confidence interval overlapping approach
7.8.1.3 Set cover approach
7.8.1.4 Performance of the clustering approach
7.8.2 Split-mapping approach
7.8.3 Assembly approach
7.8.4 Hybrid approach
7.9 Verify the SVs
7.10 Further reading
7.11 Exercises

8 RNA-seq
8.1 Introduction
8.2 High-throughput methods to study the transcriptome
8.3 Application of RNA-seq
8.4 Computational Problems of RNA-seq
8.5 RNA-seq read mapping
8.5.1 Features used in RNA-seq read mapping
8.5.1.1 Transcript model
8.5.1.2 Splice junction signals
8.5.2 Exon-first approach
8.5.3 Seed-and-extend approach
8.6 Construction of isoforms
8.7 Estimating expression level of each transcript
8.7.1 Estimating transcript abundances when every read maps to exactly one transcript
8.7.2 Estimating transcript abundances when a read maps to multiple isoforms
8.7.3 Estimating gene abundance
8.8 Summary and further reading
8.9 Exercises

9 Peak calling methods
9.1 Introduction
9.2 Techniques that generate density-based datasets
9.2.1 Protein DNA interaction
9.2.2 Epigenetics of our genome
9.2.3 Open chromatin
9.3 Peak calling methods
9.3.1 Model fragment length
9.3.2 Modeling noise using a control library
9.3.3 Noise in the sample library
9.3.4 Determination if a peak is significant
9.3.5 Unannotated high copy number regions
9.3.6 Constructing a signal profile by Kernel methods
9.4 Sequencing depth of the ChIP-seq libraries
9.5 Further reading
9.6 Exercises

10 Data compression techniques used in NGS files
10.1 Introduction
10.2 Strategies for compressing fasta/fastq files
10.3 Techniques to compress identifiers
10.4 Techniques to compress DNA bases
10.4.1 Statistical-based approach
10.4.2 BWT-based approach
10.4.3 Reference-based approach
10.4.4 Assembly-based approach
10.5 Quality score compression methods
10.5.1 Lossless compression
10.5.2 Lossy compression
10.6 Compression of other NGS data
10.7 Exercises

References

Index


Permalink: https://www.kriso.ee/db/97814987529856e.html
