E-book: Algorithms for Next-Generation Sequencing

(National University of Singapore)
  • Format - EPUB+DRM
  • Price: 58.49 €*
  • * the price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on mobile devices (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books. It should not be confused with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

Advances in sequencing technology have allowed scientists to study the human genome in greater depth and on a larger scale than ever before, with as many as hundreds of millions of short reads generated in the course of a few days. But what are the best ways to deal with this flood of data?

Algorithms for Next-Generation Sequencing is an invaluable tool for students and researchers in bioinformatics and computational biology and for biologists seeking to process and manage the data generated by next-generation sequencing. It can also serve as a textbook or a self-study resource. In addition to offering an in-depth description of the algorithms for processing sequencing data, it presents useful case studies describing the applications of this technology.

Reviews

"With every advance in sequencing technology, existing string algorithms are yet again at their limits and need to be developed further. As a result, a book like this one is sorely needed. It lays out the concepts and approaches both for practical data analysis and for developing algorithms powerful enough to deal with the deluge of sequence data." Martin Vingron, Max Planck Institute for Molecular Genetics

Preface
1 Introduction
1.1 DNA, RNA, protein and cells
1.2 Sequencing technologies
1.3 First-generation sequencing
1.4 Second-generation sequencing
1.4.1 Template preparation
1.4.2 Base calling
1.4.3 Polymerase-mediated methods based on reversible terminator nucleotides
1.4.4 Polymerase-mediated methods based on unmodified nucleotides
1.4.5 Ligase-mediated method
1.5 Third-generation sequencing
1.5.1 Single-molecule real-time sequencing
1.5.2 Nanopore sequencing method
1.5.3 Direct imaging of DNA using electron microscopy
1.6 Comparison of the three generations of sequencing
1.7 Applications of sequencing
1.8 Summary and further reading
1.9 Exercises
2 NGS file formats
2.1 Introduction
2.2 Raw data files: fasta and fastq
2.3 Alignment files: SAM and BAM
2.3.1 FLAG
2.3.2 CIGAR string
2.4 Bed format
2.5 Variant Call Format (VCF)
2.6 Format for representing density data
2.7 Exercises
3 Related algorithms and data structures
3.1 Introduction
3.2 Recursion and dynamic programming
3.2.1 Key searching problem
3.2.2 Edit-distance problem
3.3 Parameter estimation
3.3.1 Maximum likelihood
3.3.2 Unobserved variable and EM algorithm
3.4 Hash data structures
3.4.1 Maintain an associative array by simple hashing
3.4.2 Maintain a set using a Bloom filter
3.4.3 Maintain a multiset using a counting Bloom filter
3.4.4 Estimating the similarity of two sets using minHash
3.5 Full-text index
3.5.1 Suffix trie and suffix tree
3.5.2 Suffix array
3.5.3 FM-index
3.5.3.1 Inverting the BWT B to the original text T
3.5.3.2 Simulate a suffix array using the FM-index
3.5.3.3 Pattern matching
3.5.4 Simulate a suffix trie using the FM-index
3.5.5 Bi-directional BWT
3.6 Data compression techniques
3.6.1 Data compression and entropy
3.6.2 Unary, gamma, and delta coding
3.6.3 Golomb code
3.6.4 Huffman coding
3.6.5 Arithmetic code
3.6.6 Order-k Markov Chain
3.6.7 Run-length encoding
3.7 Exercises
4 NGS read mapping
4.1 Introduction
4.2 Overview of the read mapping problem
4.2.1 Mapping reads with no quality score
4.2.2 Mapping reads with a quality score
4.2.3 Brute-force solution
4.2.4 Mapping quality
4.2.5 Challenges
4.3 Align reads allowing a small number of mismatches
4.3.1 Mismatch seed hashing approach
4.3.2 Read hashing with a spaced seed
4.3.3 Reference hashing approach
4.3.4 Suffix trie-based approaches
4.3.4.1 Estimating the lower bound of the number of mismatches
4.3.4.2 Divide and conquer with the enhanced pigeonhole principle
4.3.4.3 Aligning a set of reads together
4.3.4.4 Speed up utilizing the quality score
4.4 Aligning reads allowing a small number of mismatches and indels
4.4.1 q-mer approach
4.4.2 Computing alignment using a suffix trie
4.4.2.1 Computing the edit distance using a suffix trie
4.4.2.2 Local alignment using a suffix trie
4.5 Aligning reads in general
4.5.1 Seed-and-extension approach
4.5.1.1 BWA-SW
4.5.1.2 Bowtie 2
4.5.1.3 Bat-Align
4.5.1.4 Cushaw2
4.5.1.5 BWA-MEM
4.5.1.6 LAST
4.5.2 Filter-based approach
4.6 Paired-end alignment
4.7 Further reading
4.8 Exercises
5 Genome assembly
5.1 Introduction
5.2 Whole genome shotgun sequencing
5.2.1 Whole genome sequencing
5.2.2 Mate-pair sequencing
5.3 De novo genome assembly for short reads
5.3.1 Read error correction
5.3.1.1 Spectral alignment problem (SAP)
5.3.1.2 k-mer counting
5.3.2 Base-by-base extension approach
5.3.3 De Bruijn graph approach
5.3.3.1 De Bruijn assembler (no sequencing error)
5.3.3.2 De Bruijn assembler (with sequencing errors)
5.3.3.3 How to select k
5.3.3.4 Additional issues of the de Bruijn graph approach
5.3.4 Scaffolding
5.3.5 Gap filling
5.4 Genome assembly for long reads
5.4.1 Assemble long reads assuming long reads have a low sequencing error rate
5.4.2 Hybrid approach
5.4.2.1 Use mate-pair reads and long reads to improve the assembly from short reads
5.4.2.2 Use short reads to correct errors in long reads
5.4.3 Long read approach
5.4.3.1 MinHash for all-versus-all pairwise alignment
5.4.3.2 Computing consensus using Falcon Sense
5.4.3.3 Quiver consensus algorithm
5.5 How to evaluate the goodness of an assembly
5.6 Discussion and further reading
5.7 Exercises
6 Single nucleotide variation (SNV) calling
6.1 Introduction
6.1.1 What are SNVs and small indels?
6.1.2 Somatic and germline mutations
6.2 Determine variations by resequencing
6.2.1 Exome/Targeted sequencing
6.2.2 Detection of somatic and germline variations
6.3 Single locus SNV calling
6.3.1 Identifying SNVs by counting alleles
6.3.2 Identify SNVs by binomial distribution
6.3.3 Identify SNVs by Poisson-binomial distribution
6.3.4 Identifying SNVs by the Bayesian approach
6.4 Single locus somatic SNV calling
6.4.1 Identify somatic SNVs by the Fisher exact test
6.4.2 Identify somatic SNVs by verifying that the SNVs appear in the tumor only
6.4.2.1 Identify SNVs in the tumor sample by posterior odds ratio
6.4.2.2 Verify if an SNV is somatic by the posterior odds ratio
6.5 General pipeline for calling SNVs
6.6 Local realignment
6.7 Duplicate read marking
6.8 Base quality score recalibration
6.9 Rule-based filtering
6.10 Computational methods to identify small indels
6.10.1 Split-read approach
6.10.2 Span distribution-based clustering approach
6.10.3 Local assembly approach
6.11 Correctness of existing SNV and indel callers
6.12 Further reading
6.13 Exercises
7 Structural variation calling
7.1 Introduction
7.2 Formation of SVs
7.3 Clinical effects of structural variations
7.4 Methods for determining structural variations
7.5 CNV calling
7.5.1 Computing the raw read count
7.5.2 Normalize the read counts
7.5.3 Segmentation
7.6 SV calling pipeline
7.6.1 Insert size estimation
7.7 Classifying the paired-end read alignments
7.8 Identifying candidate SVs from paired-end reads
7.8.1 Clustering approach
7.8.1.1 Clique-finding approach
7.8.1.2 Confidence interval overlapping approach
7.8.1.3 Set cover approach
7.8.1.4 Performance of the clustering approach
7.8.2 Split-mapping approach
7.8.3 Assembly approach
7.8.4 Hybrid approach
7.9 Verify the SVs
7.10 Further reading
7.11 Exercises
8 RNA-seq
8.1 Introduction
8.2 High-throughput methods to study the transcriptome
8.3 Application of RNA-seq
8.4 Computational Problems of RNA-seq
8.5 RNA-seq read mapping
8.5.1 Features used in RNA-seq read mapping
8.5.1.1 Transcript model
8.5.1.2 Splice junction signals
8.5.2 Exon-first approach
8.5.3 Seed-and-extend approach
8.6 Construction of isoforms
8.7 Estimating expression level of each transcript
8.7.1 Estimating transcript abundances when every read maps to exactly one transcript
8.7.2 Estimating transcript abundances when a read maps to multiple isoforms
8.7.3 Estimating gene abundance
8.8 Summary and further reading
8.9 Exercises
9 Peak calling methods
9.1 Introduction
9.2 Techniques that generate density-based datasets
9.2.1 Protein DNA interaction
9.2.2 Epigenetics of our genome
9.2.3 Open chromatin
9.3 Peak calling methods
9.3.1 Model fragment length
9.3.2 Modeling noise using a control library
9.3.3 Noise in the sample library
9.3.4 Determination if a peak is significant
9.3.5 Unannotated high copy number regions
9.3.6 Constructing a signal profile by Kernel methods
9.4 Sequencing depth of the ChIP-seq libraries
9.5 Further reading
9.6 Exercises
10 Data compression techniques used in NGS files
10.1 Introduction
10.2 Strategies for compressing fasta/fastq files
10.3 Techniques to compress identifiers
10.4 Techniques to compress DNA bases
10.4.1 Statistical-based approach
10.4.2 BWT-based approach
10.4.3 Reference-based approach
10.4.4 Assembly-based approach
10.5 Quality score compression methods
10.5.1 Lossless compression
10.5.2 Lossy compression
10.6 Compression of other NGS data
10.7 Exercises
References
Index
Wing-Kin Sung