High Throughput Sequencing Methods and Concepts
Cedric Notredame, adapted from S. M. Brown
DNA Sequencing
• The final essential tool in the molecular biology toolkit is the ability to read the base sequence of DNA molecules.
• Fred Sanger developed an elegant method to sequence DNA using the DNA polymerase enzyme (for which he was awarded the Nobel Prize in 1980).
• The Sanger method copies a piece of cloned DNA, but some of the copies are halted at each base along the sequence.
Sanger Method
• DNA polymerase adds free nucleotides to a primer that is complementary to the DNA template.
• Sanger used modified dideoxynucleotides (terminators), which stop replication when they are incorporated into the growing DNA chain.
• This produces a set of partial DNA copies of the original template sequence, each one stopping at a different base.
• Sanger used 4 different reactions, each containing terminators for only one of the bases.
• When the partial copies are sorted by size using electrophoresis, all fragments of a given size end with the same base.
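The logic of reading a sequence from four terminator reactions can be sketched in a few lines of Python. This is a toy model, not a simulation of real chemistry: it assumes the template is written 3'→5' so the copied strand is its plain complement, and that every possible fragment length is produced exactly once. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def terminator_lanes(copied_strand: str) -> dict:
    """For each terminator base, list the lengths of the partial copies it ends."""
    lanes = {base: [] for base in "ACGT"}
    for length in range(1, len(copied_strand) + 1):
        # A fragment of this length stopped at the base in position `length`.
        lanes[copied_strand[length - 1]].append(length)
    return lanes


def sanger_read(template_3_to_5: str) -> str:
    """Read the copied strand back from the four size-sorted lanes."""
    copied = "".join(COMPLEMENT[b] for b in template_3_to_5)
    lanes = terminator_lanes(copied)
    # Electrophoresis sorts fragments by size; each size appears in one lane only.
    size_to_base = {n: base for base, sizes in lanes.items() for n in sizes}
    return "".join(size_to_base[n] for n in sorted(size_to_base))


if __name__ == "__main__":
    print(terminator_lanes("ATGC"))
    print(sanger_read("TACG"))
```

Reading the lanes from the shortest fragment to the longest gives the copied strand one base at a time, which is what the gel readout does.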
Automated Sequencing
• Sequencing technology was improved in the late 1980s by Leroy Hood, who developed fluorescent color labels for the 4 terminator nucleotide bases.
• This allowed all 4 bases to be sequenced in a single reaction and sorted in a single gel lane.
• Hood also pioneered direct data collection by computer.
• Minor improvements in this technology now enable the sequencing of billion-base genomes in a year or less.
Automated sequencing machines, particularly those made by PE Applied Biosystems, use 4 colors, so they can read all 4 bases at once.
DNA sequencing capability has grown exponentially
[Figure: DNA sequences in GenBank over time. Doubling time = 18 months]
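A fixed doubling time translates directly into a growth factor of 2^(t / 18) after t months. The short sketch below only restates that arithmetic; the function name is hypothetical and the 18-month figure is the one quoted on the slide.

```python
def growth_factor(months: float, doubling_time_months: float = 18.0) -> float:
    """Fold increase after `months`, given a constant doubling time."""
    return 2.0 ** (months / doubling_time_months)


if __name__ == "__main__":
    for years in (1.5, 3, 10):
        print(f"{years} years: {growth_factor(years * 12):.1f}-fold")
```

Under this assumption, ten years of growth corresponds to roughly a 100-fold increase.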
Next Generation Sequencing
• 454 Life Sciences/Roche
  – Genome Sequencer FLX: currently produces 400-600 million bases per day per machine
  – Published 1 million bases of Neanderthal DNA in 2006
  – May 2007 published complete genome of James Watson (3.2 billion bases, ~20x coverage)
• Solexa/Illumina
  – 10 GB per machine/week
  – May 2008 published complete genomes for 3 HapMap subjects (14x coverage)
• ABI SOLiD
  – 20 GB per machine/week
“Paradigm Shift”
• Standard ABI “Sanger” sequencing
  – 96 samples/day
  – Read length ~650 bp
  – Total = 450,000 bases of sequence data
• 454 was the game changer!
  – ~400,000 different templates (reads)/day
  – Read length ~250 bp
  – Total = 100,000,000 bases of sequence data!!!
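The daily output of a platform is, to a first approximation, the number of reads multiplied by the read length. A minimal sketch of that arithmetic for the 454 figures quoted above (the function name is hypothetical, and real yields vary with run quality and filtering):

```python
def bases_per_day(reads_per_day: int, read_length_bp: int) -> int:
    """Nominal sequence output: number of reads times bases per read."""
    return reads_per_day * read_length_bp


if __name__ == "__main__":
    # 454 figures from the slide: ~400,000 reads/day at ~250 bp each.
    print(f"{bases_per_day(400_000, 250):,} bases/day")
```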
Solexa ups the Game
• Solexa (Illumina GA)
  – 60,000,000 different sequence templates (yes, that is an insane 60 million reads)
  – 36 bp read length
  – 4 billion bases of DNA per run (3 days)
Nanotechnology
• Each system works differently, but they are all based on similar principles:
  – Shear target DNA into small pieces
  – Bind individual DNA molecules to a solid surface
  – Amplify each molecule into a cluster
  – Copy one base at a time and detect different signals for A, C, T, and G bases
  – Requires very precise high-resolution imaging of tiny features (Solexa has 800 images @ 4 megapixels each)
One (of 800) tiles on Solexa Sequencer
Huge Amount of Image Data
• The raw image data is truly huge: 1-2 TB for the Solexa, more for ABI SOLiD, less for 454.
• The images are immediately processed into intensity data (spots with location and brightness).
• Intensity data is then processed into basecalls (A, C, T, or G plus a quality score for each).
• Basecall data is on the order of 5-10 GB per run (or a week of runs for 454).
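The step from intensity data to a basecall with a quality score can be illustrated with a deliberately simple sketch. The Phred scale, Q = -10 log10(P_error), is standard; the error model below (the share of signal outside the brightest channel) and the function name are assumptions made for illustration and do not describe any vendor's base caller.

```python
import math
from typing import Dict, Tuple


def call_base(intensities: Dict[str, float]) -> Tuple[str, int]:
    """Pick the brightest channel and attach a Phred-style quality score."""
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("no signal in any channel")
    base = max(intensities, key=intensities.get)
    # Hypothetical error model: fraction of signal NOT in the called channel.
    p_error = 1.0 - intensities[base] / total
    p_error = max(p_error, 1e-4)  # floor the error so quality is capped at Q40
    quality = round(-10 * math.log10(p_error))
    return base, quality


if __name__ == "__main__":
    print(call_base({"A": 90.0, "C": 5.0, "G": 3.0, "T": 2.0}))
```

A clean spot with nearly all signal in one channel gets a high score, while a spot with mixed signal gets a low one.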
454
• First high-throughput DNA sequencer, commercially available in 2004
• Now (10/08) produces ~500 MB of 500 bp reads
• Run of 8 samples in 10 hours, so can do multiple runs/week
• Uses pyrosequencing, beads, and a microtiter plate
• Low error rate, but insert/delete problems with homopolymers (stretches of a single base)
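The homopolymer problem follows from how pyrosequencing reads a strand: one nucleotide is flowed at a time, and a whole run of identical bases is incorporated in a single flow, so the run length has to be inferred from signal strength alone. The sketch below builds an idealized, noise-free flowgram; the flow order "TACG" and the function name are assumptions for illustration.

```python
from itertools import cycle
from typing import List, Tuple


def flowgram(read: str, flow_order: str = "TACG",
             max_flows: int = 400) -> List[Tuple[str, int]]:
    """Return (nucleotide, bases incorporated) for each flow of an ideal run."""
    flows = []
    pos = 0
    for nucleotide in cycle(flow_order):
        if pos >= len(read) or len(flows) >= max_flows:
            break
        run = 0
        # Every consecutive matching base is incorporated in this one flow.
        while pos < len(read) and read[pos] == nucleotide:
            run += 1
            pos += 1
        flows.append((nucleotide, run))
    return flows


if __name__ == "__main__":
    print(flowgram("TTTTAG"))
```

In this example the four T bases produce a single flow with signal 4; telling a signal of 4 from 5 in noisy data is much harder than telling 0 from 1, which is where the insertion and deletion errors come from.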
Illumina Genome Analyzer
• Originally developed by Solexa, now a subsidiary of Illumina
• Commercially available in 2006
• Now produces 8-12 million reads per sample of 36 bp length = 10 GB/week
• Run takes 3 days for 7 samples
• Low error rate, mostly base changes, few indels
Illumina sequencing technology in 12 steps
Source: http://www.illumina.com/downloads/SS_DNAsequencing.pdf
1. Prepare genomic DNA: Randomly fragment genomic DNA and ligate adapters to both ends of the fragments. (Figure labels: DNA, adapters.)
2. Attach DNA to surface: Bind single-stranded fragments randomly to the inside surface of the flow cell channels. (Figure labels: adapter, DNA fragment, dense lawn of primers.)
3. Bridge amplification: Add unlabeled nucleotides and enzyme to initiate solid-phase bridge amplification.
4. Fragments become double stranded: The enzyme incorporates nucleotides to build double-stranded bridges on the solid-phase substrate. (Figure labels: attached terminus, free terminus.)
5. Denature the double-stranded molecules: Denaturation leaves single-stranded templates anchored to the substrate. (Figure label: attached.)
6. Complete amplification: Several million dense clusters of double-stranded DNA are generated in each channel of the flow cell. (Figure label: clusters.)
7. Determine first base: The first sequencing cycle begins by adding four labeled reversible terminators, primers, and DNA polymerase. (Figure label: laser.)
8. Image first base: After laser excitation, the emitted fluorescence from each cluster is captured and the first base is identified.
9. Determine second base: The next cycle repeats the incorporation of four labeled reversible terminators, primers, and DNA polymerase. (Figure label: laser.)
10. Image second chemistry cycle: After laser excitation the image is captured as before, and the identity of the second base is recorded.
11. Sequencing over multiple chemistry cycles: The sequencing cycles are repeated to determine the sequence of bases in a fragment, one base at a time.
12. Align data: The data are aligned and compared to a reference, and sequencing differences are identified. (Figure labels: reference sequence, unknown variant identified and called, known SNP called.)
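The final "align data" step can be illustrated with a deliberately naive comparison: a read that has already been placed at a known position is checked base by base against the reference, and each difference is labeled as a known SNP or an unknown variant. The function name, the `known_snps` mapping, and the assumption of a gap-free placement are all hypothetical; real pipelines use dedicated aligners and statistical variant callers.

```python
from typing import Dict, List, Optional, Tuple


def find_differences(reference: str, read: str, offset: int,
                     known_snps: Optional[Dict[int, str]] = None
                     ) -> List[Tuple[int, str, str, str]]:
    """List (position, reference base, read base, label) for each mismatch."""
    if offset < 0 or offset + len(read) > len(reference):
        raise ValueError("read does not fit on the reference at this offset")
    known_snps = known_snps or {}
    calls = []
    for i, base in enumerate(read):
        ref_pos = offset + i
        ref_base = reference[ref_pos]
        if base != ref_base:
            is_known = known_snps.get(ref_pos) == base
            label = "known SNP" if is_known else "unknown variant"
            calls.append((ref_pos, ref_base, base, label))
    return calls


if __name__ == "__main__":
    calls = find_differences("ACGTACGTAC", "GTTCGA", offset=2,
                             known_snps={4: "T"})
    for call in calls:
        print(call)
```

With many-fold coverage, a difference seen in most reads over a position is more likely a true variant than a sequencing error; this sketch looks at one read only.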
Illumina Genome Analyzer Richard K. Wilson
Paired-End Sequencing Nature Methods 5, May 2008
[Figure: paired-end sequencing workflow. Panel labels: Sequencing First Read; Denaturation and De-Protection; Resynthesis of P5 Strand (15 Cycles); P7 Linearization; Block with ddNTPs; Denaturation and Hybridization; Sequencing Second Read]
ABI SOLiD
• First commercially available in late 2007
• Currently capable of producing 20 GB of data per run (week)
• Most users generate 6 GB/run
• Reads ~30 bp long
• Uses unique sequence-by-ligation method
• “Color-space” data
• Very low error rate
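"Color-space" means that each reported color describes a pair of adjacent bases rather than a single base. In the commonly described two-base encoding, the color equals the XOR of the two bases when they are numbered A=0, C=1, G=2, T=3, and the read starts from one known base. The sketch below assumes that scheme; the function names are hypothetical.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_color_space(seq: str) -> str:
    """First base, then one color (0-3) per adjacent pair of bases."""
    colors = [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]
    return seq[0] + "".join(str(c) for c in colors)


def decode_color_space(encoded: str) -> str:
    """Rebuild the bases by walking the colors from the known first base."""
    bases = [encoded[0]]
    for color in encoded[1:]:
        previous = BASE_TO_BITS[bases[-1]]
        bases.append(BITS_TO_BASE[previous ^ int(color)])
    return "".join(bases)


if __name__ == "__main__":
    print(encode_color_space("ATGGC"))
    print(decode_color_space("A3103"))
    print(decode_color_space("A3203"))  # one wrong color
```

A single wrong color changes every decoded base after it, whereas a real SNP changes two adjacent colors. That difference is what allows errors and true variants to be told apart when reads are compared to a reference in color space.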
Short Reads
• Short reads from Next-Gen machines are a challenge (Solexa = 36 bp)
  – Very hard to assemble whole genomes
  – Difficult to get any information on repeat regions
• Requires many-fold coverage
• New algorithms needed for many traditional bioinformatics operations
• Reads are getting longer: another moving target
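"Many-fold coverage" can be made concrete: average coverage is (number of reads × read length) / genome size, and under the idealized Lander-Waterman assumption of uniformly random read placement, the expected fraction of the genome hit by no read is e^(-coverage). The function names below are hypothetical, and real data deviate from this model because of repeats and sequencing bias.

```python
import math


def coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base of the genome is sequenced."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int,
                 genome_size: int) -> int:
    """Number of reads required to reach a target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


def expected_fraction_uncovered(cov: float) -> float:
    """Fraction of bases expected to be missed, assuming random placement."""
    return math.exp(-cov)


if __name__ == "__main__":
    genome = 3_200_000_000  # human genome size quoted earlier in the slides
    print(f"{reads_needed(20, 36, genome):,} reads of 36 bp for 20x")
    print(f"uncovered at 1x: {expected_fraction_uncovered(1):.1%}")
```

At 36 bp per read, 20x coverage of a 3.2 billion base genome takes close to 1.8 billion reads, which is why short reads demand so much throughput.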
PacBio
• High-throughput Single Molecule Real Time (SMRT) sequencing
PacBio: www.pacificbiosciences.com/
Applications
• “If you build it, they will come.”
• An explosion of scientific innovation!
• Every new technology enables new applications that were not directly foreseen by its original developers.
• Cheap access to high-volume sequencing becomes a data collection method for many different types of experimental applications.
“When All You Have is a Hammer, All Problems Look Like Nails” (Mark Twain)
Applications