Short Read Sequencing Analysis Workshop, Day 1: Considerations for Sequencing

Different types of sequencing libraries
• Whole genome sequencing
• RNA sequencing/GRO-seq
• ChIP-seq
• DNase I-seq, ATAC-seq
• Exome sequencing
• Methyl-seq
• Metagenomic/amplicon (low diversity)

Platform Comparison

Platform         Output/run    Reads/run    Max read length  Time/run   2-/4-color Flow cell  Samples/FC
MiniSeq          1.65–7.5 Gb   7–25 M       2 x 150          7–24 h     2-color    PE         1
MiSeq            0.5–15 Gb     12–25 M      2 x 300          5–56 h     4-color    PE         1
NextSeq          16–120 Gb     130–400 M    2 x 150          11–30 h    2-color    PE         1
HiSeq 2500       9–500 Gb      300 M–4 B    2 x 250          7 h–6 d    4-color    SR/PE      2 or 8
HiSeq 3000/4000  105–750 Gb    2.1–2.5 B    2 x 150          1–3.5 d    4-color    Patterned  8
HiSeq X          800–900 Gb    2.6–3 B      2 x 150          <3 d       4-color    Patterned  8

How does Illumina sequencing work? Library generation and affixing library to flow cell
(Source: http://bitesizebio.com/13546/sequencing-by-synthesis-explaining-the-illumina-sequencing-technology/)

How does Illumina sequencing work? Cluster Generation

How does Illumina sequencing work? Sequencing by synthesis with reversible terminators

How does Illumina sequencing work?

Output: Millions of short read sequences

Read 1                 Index Read 1 (i7)   Index Read 2 (i5)   Read 2
ATCGACGGTTAACTGATCG…   TCAGTGCT            ACGTTCAT            CTGGTGACAACTGATGCTT…
ATGCGTGCTGCAGTGCCAC…   ACGTTCTA            CAACGTTC            TGACCATTGGGTACAACCC…
CGTGGACCAAATGGCACAT…   TCAGTGGG            ATTCAGTG            CCAGTGAACGTGAGCAAGT…
CTGTGAAACAATTGGGGAT…   CTCGGCGA            GCCTCGGC            GGTTGACCATTGGGGTGAC…
ATGCGTGCTGCAGTGCCAC…   ACGTTCTA            CAACGTTC            TGACCATTGGGTACAACCC…
ATCGACGGTTAACTGATCG…   TCAGTGCT            ACGTTCAT            CTGGTGACAACTGATGCTT…
ATGCGTGCTGCAGTGCCAC…   ACGTTCTA            CAACGTTC            TGACCATTGGGTACAACCC…
CGTGGACCAAATGGCACAT…   TCAGTGGG            ATTCAGTG            CCAGTGAACGTGAGCAAGT…
CTGTGAAACAATTGGGGAT…   CTCGGCGA            GCCTCGGC            GGTTGACCATTGGGGTGAC…
ATGCGTGCTGCAGTGCCAC…   ACGTTCTC            CAACGTTC            TGACCATTGGGTACAACCC…
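On disk, these reads typically end up as FASTQ files. A common convention (used by Illumina's bcl2fastq software) is to write Read 1 and Read 2 to separate files and to record the index reads at the end of each header line, as "i7+i5". The Python sketch below assumes that header layout and Phred+33 quality characters; the example record, its instrument/run/flow cell fields, and the names read_fastq and index_pair are hypothetical.

```python
from __future__ import annotations

import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass
class FastqRecord:
    header: str  # header line without the leading '@'
    seq: str
    qual: str    # one quality character per base (Phred+33 assumed)

    def index_pair(self) -> tuple[str, str]:
        """Return (i7, i5), assuming the header ends with '<read>:<filter>:<control>:<i7>+<i5>'."""
        _, _, comment = self.header.partition(" ")
        barcode = comment.rsplit(":", 1)[-1]
        i7, _, i5 = barcode.partition("+")
        return i7, i5


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records; raise ValueError on a malformed record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record starting at {header!r}")
        yield FastqRecord(header[1:], seq, qual)


# Hypothetical Read 1 record; the instrument, run and flow cell fields are made up.
example = "\n".join([
    "@INSTR:1:FLOWCELL:1:1101:1000:2000 1:N:0:TCAGTGCT+ACGTTCAT",
    "ATCGACGGTTAACTGATCG",
    "+",
    "I" * 19,
]) + "\n"

for rec in read_fastq(io.StringIO(example)):
    print(rec.seq, rec.index_pair())
```

Reading four lines at a time keeps memory use flat, which matters when a file holds millions of reads.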

Demultiplexing

Read 1                 Index Read 1 (i7)   Index Read 2 (i5)   Read 2
ATCGACGGTTAACTGATCG…   TCAGTGCT            ACGTTCAT            CTGGTGACAACTGATGCTT…
ATGCGTGCTGCAGTGCCAC…   ACGTTCTA            CAACGTTC            TGACCATTGGGTACAACCC…
CGTGGACCAAATGGCACAT…   TCAGTGGG            ATTCAGTG            CCAGTGAACGTGAGCAAGT…
CTGTGAAACAATTGGGGAT…   CTCGGCGA            GCCTCGGC            GGTTGACCATTGGGGTGAC…
ATGCGTGCTGCAGTGCCAC…   ACGTTCTA            CAACGTTC            TGACCATTGGGTACAACCC…

Current Illumina kits allow up to 384 unique indexes to be pooled.

Demultiplexing
Reads are split into per-sample files by index pair:

Sample 1   Read 1: ATCGACGGTTAACTGATCG…, CGTGGACCAAATGGCACAT…
           Read 2: CTGGTGACAACTGATGCTT…, CCAGTGAACGTGAGCAAGT…
Sample 2   Read 1: ATGCGTGCTGCAGTGCCAC…
           Read 2: TGACCATTGGGTACAACCC…
Sample 3   Read 1: CTGTGAAACAATTGGGGAT…
           Read 2: GGTTGACCATTGGGGTGAC…
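Demultiplexing amounts to looking up each cluster's (i7, i5) pair in a sample sheet, usually tolerating a small number of sequencing errors in the index reads. Below is a minimal sketch assuming a hypothetical three-sample sheet built from index pairs shown earlier and a tolerance of one mismatch per index; SAMPLE_SHEET and assign_sample are illustrative names, not an Illumina API. Pairs that match no sample, or more than one, go to "Undetermined".

```python
from __future__ import annotations

# Hypothetical sample sheet: (i7, i5) index pair -> sample name.
SAMPLE_SHEET = {
    ("TCAGTGCT", "ACGTTCAT"): "Sample 1",
    ("ACGTTCTA", "CAACGTTC"): "Sample 2",
    ("CTCGGCGA", "GCCTCGGC"): "Sample 3",
}


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("index sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(i7: str, i5: str,
                  sheet: dict[tuple[str, str], str] = SAMPLE_SHEET,
                  max_mismatches: int = 1) -> str:
    """Return the unique sample whose i7 and i5 are each within max_mismatches, else 'Undetermined'."""
    hits = [name for (s7, s5), name in sheet.items()
            if hamming(i7, s7) <= max_mismatches and hamming(i5, s5) <= max_mismatches]
    return hits[0] if len(hits) == 1 else "Undetermined"


# The ten (i7, i5) pairs from the "Output" slide.
observed = [
    ("TCAGTGCT", "ACGTTCAT"), ("ACGTTCTA", "CAACGTTC"), ("TCAGTGGG", "ATTCAGTG"),
    ("CTCGGCGA", "GCCTCGGC"), ("ACGTTCTA", "CAACGTTC"), ("TCAGTGCT", "ACGTTCAT"),
    ("ACGTTCTA", "CAACGTTC"), ("TCAGTGGG", "ATTCAGTG"), ("CTCGGCGA", "GCCTCGGC"),
    ("ACGTTCTC", "CAACGTTC"),
]
for i7, i5 in observed:
    print(i7, i5, "->", assign_sample(i7, i5))
```

With this sheet, ACGTTCTC+CAACGTTC is one mismatch away from Sample 2 and is kept, while TCAGTGGG+ATTCAGTG differs from every entry by more than one base and is left undetermined. Mismatch tolerance is only safe when the indexes in a pool differ from each other by enough bases that a single error cannot turn one sample's index into another's.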

What to do with the data?
Short read sequencing → quality metrics & trimming → downstream analyses:
• Align to reference genome
• Variant calling
• Expression/read depth
• Alternative splicing
• Peak/region identification
• Assembly
• Metagenomics

Quality Assessment & Trimming
• Pinpoint problems with library prep/sequencing
• Identify possible biases
• Improve mapping through trimming
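Quality scores are stored as one ASCII character per base; with the Phred+33 encoding used by current Illumina FASTQ output, Q = ord(char) - 33. A common way to trim low-quality 3' ends is the running-sum method that cutadapt documents as BWA's algorithm: walk in from the 3' end adding (cutoff - Q) per base, and cut where that sum peaks. The sketch assumes Phred+33 and a Q20 cutoff; the function names are hypothetical.

```python
from __future__ import annotations


def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode a quality string into integer Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in qual]


def quality_trim_index(qual: str, cutoff: int = 20, offset: int = 33) -> int:
    """Index at which to cut the 3' end, using a BWA-style running sum.

    Walking from the 3' end, add (cutoff - Q) per base; stop once the sum goes
    negative and cut where it was largest. Returns len(qual) if nothing is trimmed.
    """
    running, best, cut = 0, 0, len(qual)
    for i, q in reversed(list(enumerate(phred_scores(qual, offset)))):
        running += cutoff - q
        if running < 0:
            break
        if running > best:
            best, cut = running, i
    return cut


def trim_read(seq: str, qual: str, cutoff: int = 20) -> tuple[str, str]:
    """Trim a read and its quality string at the same 3' position."""
    cut = quality_trim_index(qual, cutoff)
    return seq[:cut], qual[:cut]


# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
print(trim_read("ACGTACGT", "IIIII###"))  # -> ('ACGTA', 'IIIII')
```

Unlike cutting at the first low-quality base, the running sum tolerates an isolated bad base inside an otherwise good stretch.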

Align to reference genome
[Figure: Sample 1, Sample 2 and Sample 3 reads aligned to the reference, chr1:1000–2500]
Aligners: Bowtie 2, TopHat 2, BWA
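Bowtie 2 and BWA index the reference with a Burrows–Wheeler (FM) index, and TopHat 2 builds spliced RNA-seq alignment on top of Bowtie. The toy below is not how those tools work internally; it is a deliberately simple stand-in that shows the seed-and-extend idea: look up a read's first k-mer in a hash of reference k-mers, then count mismatches across the whole read without gaps. It handles the forward strand only, and the reference string, k = 8, and the function names are made up for illustration.

```python
from __future__ import annotations

from collections import defaultdict


def build_kmer_index(ref: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def align_read(read: str, ref: str, index: dict[str, list[int]], k: int,
               max_mismatches: int = 2) -> tuple[int, int] | None:
    """Seed with the read's first k-mer, extend without gaps; return (position, mismatches) or None."""
    best = None
    for pos in index.get(read[:k], []):
        window = ref[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best


ref = "GGGATCGACGGTTAACTGATCGTTT"  # made-up reference fragment
index = build_kmer_index(ref, k=8)
print(align_read("ATCGACGGTTTACTGATCG", ref, index, k=8))  # -> (3, 1)
```

A read whose first k-mer contains a sequencing error is missed entirely here; real aligners use multiple seeds, both strands, and gapped extension to avoid that.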

Variant Calling
[Figure: reads on chr1:1000–2500 carrying C at a position where the reference has A]
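At its simplest, calling a variant means looking at the pile of read bases over one reference position and asking whether a non-reference base shows up often enough. The sketch below does exactly that with fixed allele-fraction cutoffs (0.2 for heterozygous, 0.8 for homozygous) and a minimum depth; those thresholds and the function name call_site are assumptions for illustration. Production variant callers also weigh base and mapping qualities in a statistical model, so treat this as a teaching aid only.

```python
from __future__ import annotations

from collections import Counter


def call_site(ref_base: str, pileup: str, min_depth: int = 5,
              het_frac: float = 0.2, hom_frac: float = 0.8) -> dict | None:
    """Naive single-site genotype call from the bases observed in aligned reads."""
    ref_base = ref_base.upper()
    counts = Counter(b for b in pileup.upper() if b in "ACGT")  # ignore N and gaps
    depth = sum(counts.values())
    if depth < min_depth:
        return None  # not enough reads to make a call
    alts = [(n, base) for base, n in counts.items() if base != ref_base]
    if not alts:
        return {"depth": depth, "alt": None, "alt_frac": 0.0, "genotype": "0/0"}
    n_alt, alt = max(alts)  # most frequent non-reference base
    alt_frac = n_alt / depth
    if alt_frac >= hom_frac:
        genotype = "1/1"  # homozygous alternate
    elif alt_frac >= het_frac:
        genotype = "0/1"  # heterozygous
    else:
        genotype = "0/0"  # likely sequencing error
    return {"depth": depth, "alt": alt, "alt_frac": alt_frac, "genotype": genotype}


# The slide's three C reads over a reference A (min_depth lowered to 3 to allow a call).
print(call_site("A", "CCC", min_depth=3))
```

With the default min_depth of 5, the same three reads return None, which is one reason resequencing targets around 30X.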

Differential Expression
[Figure: read coverage over the reference, chr1:1000–2500]
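Before read counts can be compared between samples, they have to be normalized for sequencing depth. A minimal sketch, assuming per-gene read counts have already been computed: convert them to counts per million (CPM) and take a log2 fold change with a pseudocount. Gene names and counts are hypothetical. This ignores replicates and count variance, which dedicated tools such as DESeq2 and edgeR model explicitly with negative binomial statistics.

```python
from __future__ import annotations

import math


def cpm(counts: dict[str, int]) -> dict[str, float]:
    """Counts per million: scale each gene's count by the library's total count."""
    total = sum(counts.values())
    return {gene: n * 1e6 / total for gene, n in counts.items()}


def log2_fold_change(control: dict[str, int], treated: dict[str, int],
                     pseudocount: float = 1.0) -> dict[str, float]:
    """log2(treated CPM / control CPM) per gene; the pseudocount avoids log(0)."""
    c, t = cpm(control), cpm(treated)
    return {gene: math.log2((t[gene] + pseudocount) / (c[gene] + pseudocount))
            for gene in control}


control = {"geneX": 100, "geneY": 900}   # hypothetical counts, library size 1,000
treated = {"geneX": 400, "geneY": 1600}  # hypothetical counts, library size 2,000
for gene, lfc in log2_fold_change(control, treated).items():
    print(f"{gene}: log2FC = {lfc:.2f}")
```

geneX's raw count quadruples, but the treated library is twice as large, so its CPM only doubles and its log2 fold change is about 1.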

Alternative Splicing

Peak/Region identification
[Figure: reads piling up into a peak on the reference, chr1:1000–2500]
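A peak is a region where read depth rises well above the background. The sketch below reduces this to its core: scan per-base depth and report runs at or above a fixed threshold, as half-open (start, end) coordinates shifted by the region's start (chr1:1000 in the figure). The threshold, minimum length, depth values and the name call_peaks are hypothetical; real peak callers such as MACS2 instead compare local read counts to a background model, typically Poisson.

```python
from __future__ import annotations


def call_peaks(depth: list[int], threshold: int, min_len: int = 1,
               offset: int = 0) -> list[tuple[int, int]]:
    """Return half-open (start, end) runs where depth >= threshold and length >= min_len."""
    peaks = []
    start = None
    for i, d in enumerate(depth):
        if d >= threshold:
            if start is None:
                start = i  # entering a region above threshold
        elif start is not None:
            if i - start >= min_len:
                peaks.append((offset + start, offset + i))
            start = None
    if start is not None and len(depth) - start >= min_len:
        peaks.append((offset + start, offset + len(depth)))  # region runs to the end
    return peaks


depth = [0, 1, 5, 6, 7, 2, 0, 8, 9, 1]  # hypothetical per-base depth starting at chr1:1000
print(call_peaks(depth, threshold=5, min_len=2, offset=1000))  # -> [(1002, 1005), (1007, 1009)]
```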

Experimental Design Considerations
• Genome size
• Read length
• Sequencing depth
• # of replicates
• Single-end vs. paired-end
• Insert size

Coverage & Read Depth
• Coverage = estimate of the average number of reads covering a single base
• Avg coverage = (# reads × read length) / size of genome

Typical Coverage Requirements
• DNA resequencing (SNPs, small indels) – 30X with paired-end reads
• De novo DNA-seq – 100X minimum, longest paired-end reads, multiple insert-size runs
• Exome – 100–200X of the exome

What that means in reads...
• 30X coverage with 2 x 150 bp reads
  – For E. coli (~4.6 Mb): 138 Mbp, 0.46 million read pairs, ~3% of a MiSeq run
  – For human (~3.2 Gb): 96 Gbp, 320 million read pairs, 80% of a NextSeq High Output run or 1.3 lanes of a HiSeq 2500 run
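The arithmetic on this slide follows directly from avg coverage = (# reads × read length) / genome size. Below is a small sketch that reproduces the E. coli and human numbers, assuming 2 x 150 bp pairs (300 bp of sequence per fragment) and using the NextSeq per-run and HiSeq 2500 per-lane read counts listed on the sequencer comparison slides; the function names are hypothetical.

```python
import math


def average_coverage(n_reads: float, read_length: int, genome_size: float) -> float:
    """Avg coverage = (# reads x read length) / genome size."""
    return n_reads * read_length / genome_size


def read_pairs_needed(genome_size: float, coverage: float, read_length: int) -> float:
    """Fragments needed when each paired-end fragment yields two reads of read_length bases."""
    return genome_size * coverage / (2 * read_length)


for name, genome_size in [("E. coli", 4.6e6), ("Human", 3.2e9)]:
    pairs = read_pairs_needed(genome_size, 30, 150)
    print(f"{name}: {genome_size * 30 / 1e6:,.0f} Mbp, {pairs / 1e6:,.2f} million read pairs")

human_pairs = read_pairs_needed(3.2e9, 30, 150)
print(f"NextSeq High Output (400 M): {human_pairs / 400e6:.0%} of a run")
print(f"HiSeq 2500 (250 M/lane): {human_pairs / 250e6:.2f} lanes, "
      f"i.e. {math.ceil(human_pairs / 250e6)} whole lanes")
```

Since lanes are bought whole, the 1.3-lane figure in practice means ordering 2 lanes.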

RNA-Seq Requirements
• Can't use coverage as a measure (transcript abundance varies widely from gene to gene)
• Differential expression (highly expressed genes)
  – Small genomes: 5 million reads
  – Large genomes: 10–30 million reads
• De novo assembly/DE (lowly expressed genes)
  – Small genomes: 30–65 million reads
  – Large genomes: 100–200 million reads
***For RNA-Seq, replicates are typically more powerful than greater read depth or read length

Which Sequencer Should I Use?
• MiSeq
  – 15–25 M reads/run
  – 8 h – 4 days/run
  – 1 x 50 to 2 x 300
  – $$$/bp
• NextSeq
  – 130–400 M reads/run
  – 12–30 h/run
  – 1 x 75 to 2 x 150
  – $$/bp
• HiSeq 2500
  – 250 M reads/lane, 8 lanes/run
  – 7 h – 3 d/run
  – 1 x 36 to 2 x 125
  – $$/bp
• HiSeq 4000
  – 312 M reads/lane, 8 lanes/run
  – 1–3.5 d/run
  – 1 x 50 to 2 x 150
  – $/bp
• HiSeq X Ten
  – 350 M reads/lane, 8 lanes/run
  – 3 d/run
  – 2 x 150
  – $/bp, BUT minimums on orders

Other Considerations
• Base diversity (at each position)
• Custom versus kitted libraries – kit biases
• PCR vs. PCR-free libraries
• How unique is the run type you want?
• Queue times/data delivery times
• Many more...
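Base diversity can be checked directly from the reads by tallying the base composition at each cycle. Libraries where most reads start with the same sequence (amplicons, for example) show a single base dominating the early cycles, which Illumina instruments handle poorly. A minimal sketch, assuming reads are given as plain strings; the 0.8 dominance cutoff, the example reads and the function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def per_cycle_composition(reads: list[str]) -> list[dict[str, float]]:
    """Fraction of A/C/G/T at each cycle (position), up to the shortest read."""
    if not reads:
        return []
    n_cycles = min(len(r) for r in reads)
    composition = []
    for cycle in range(n_cycles):
        counts = Counter(r[cycle].upper() for r in reads)
        composition.append({base: counts[base] / len(reads) for base in "ACGT"})
    return composition


def low_diversity_cycles(reads: list[str], max_base_frac: float = 0.8) -> list[int]:
    """0-based cycles where a single base makes up at least max_base_frac of the reads."""
    return [cycle for cycle, fracs in enumerate(per_cycle_composition(reads))
            if max(fracs.values()) >= max_base_frac]


amplicon_reads = ["ACGT", "ACGA", "ACTC", "ACAG", "ACCT"]  # hypothetical shared "AC" start
print(low_diversity_cycles(amplicon_reads))  # -> [0, 1]
```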

Questions?