Stanford Comprehensive Cancer Center Stanford Genome Technology Center

An Introduction to Next Generation Sequencing
Hanlee Ji, M.D.
© Stanford University

Overview
• Principles of next generation DNA sequencing
• Analysis of genetic variation and research applications

Advances in DNA sequencing technology
M. Stratton et al. Nature 458 (2009)

Applications
• Identifying genetic variants
  – Whole genome
  – Exome
  – Subsets
• Transcriptomes (e.g. RNASeq)
• ChIP-Seq
• Epigenomes (methylation)
• Many others!

Sequencing-by-synthesis
• Individual DNA molecules from a “sequencing library”.
• Sequencing via multiple cycles of nucleotide incorporation.
• Solid phase support.
• High-density reads detected with a photodetector (e.g. CCDs) or a solid-state system.
• Images acquired at each cycle provide the sequence data.
J. Shendure and H. Ji. Nat Biotech (2008)
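
The cycle-by-cycle readout can be illustrated with a toy base caller. The sketch makes three assumptions. Each cluster yields one intensity per nucleotide channel per cycle, the channel order is A, C, G, T, and the intensity values are invented for illustration. The caller picks the brightest channel in each cycle and reports a crude "purity" confidence. Real base callers also correct for phasing, channel cross-talk and signal decay, and none of that is modeled here.

```python
from typing import List, Sequence, Tuple

CHANNELS = "ACGT"  # assumed channel order: one intensity per nucleotide

def call_bases(cycles: Sequence[Sequence[float]]) -> Tuple[str, List[float]]:
    """Call one base per cycle from four-channel intensities.

    Returns the called sequence and a per-cycle purity score
    (brightest intensity / total intensity) as a crude confidence proxy.
    """
    bases: List[str] = []
    purities: List[float] = []
    for intensities in cycles:
        if len(intensities) != len(CHANNELS):
            raise ValueError("expected one intensity per channel")
        total = sum(intensities)
        best = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        bases.append(CHANNELS[best])
        purities.append(intensities[best] / total if total > 0 else 0.0)
    return "".join(bases), purities

# Hypothetical intensities for one cluster over four cycles (one "image" per cycle).
image_cycles = [
    [0.90, 0.05, 0.03, 0.02],  # A channel brightest
    [0.10, 0.80, 0.05, 0.05],  # C
    [0.05, 0.05, 0.85, 0.05],  # G
    [0.30, 0.05, 0.05, 0.60],  # T, but with a weaker, less pure signal
]
sequence, purity = call_bases(image_cycles)
print(sequence, [round(p, 2) for p in purity])
```

Cycles with a low purity score are natural candidates for a low base quality.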

Sequencing-by-ligation (Complete Genomics)
• DNA nanoballs from circles
• Combinatorial probe anchor ligation
• 10-base reads adjacent to 8 anchor sites
• 31- to 35-base mate-paired reads
Drmanac et al. Science (2010)

Solid state detection of DNA synthesis
[Figure: “nanowell” solid-state detection]
Rothberg et al. Nature (2011)

Single molecule sequencing
New technologies
• Single molecule detection
• Pacific Biosciences
  – Sequencing by synthesis
  – Single base incorporation
[Figure: “nanowell” sequencing-by-synthesis]

Nanopore sequencing
• DNA inserted into a nanopore in a lipid membrane
• Speed control provided by a phi29 DNA polymerase
• Translocation driven by an electrical field and the polymerase
• DNA sequence read from changes in the ionic current

Issues with next generation DNA sequencing
• Higher sequencing error rates
  – <0.1% to 10% or greater, depending on sequencing chemistry and configuration
• Systematic bias based on approach
• Short sequence reads (<250 bases)
• Massive data output
  – Data storage management
  – Variant calling analysis

Aspects of next generation sequencing
• DNA sequencing library preparation
• Processing of sequence reads
• Types of reads (e.g. “mate pairs”)
• Alignment
  – Fold coverage
• Assembly
• Variant calling
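
Fold coverage is commonly estimated with the Lander-Waterman relation C = N × L / G, where N is the number of reads, L the read length and G the genome size. If reads are assumed to land uniformly at random, depth at a base is roughly Poisson(C). Under that assumption the expected uncovered fraction is e^(-C). The sketch below applies this to a hypothetical run. Real libraries deviate from the Poisson ideal because of the systematic biases noted earlier, such as GC content and repeats.

```python
import math

def fold_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average fold coverage C = N * L / G (Lander-Waterman)."""
    return num_reads * read_length / genome_size

def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of bases with zero reads under a Poisson model."""
    return math.exp(-coverage)

def fraction_at_least(coverage: float, min_depth: int) -> float:
    """Expected fraction of bases with depth >= min_depth, depth ~ Poisson(coverage)."""
    below = sum(math.exp(-coverage) * coverage**k / math.factorial(k)
                for k in range(min_depth))
    return 1.0 - below

# Hypothetical run: 600 million 100-base reads on a ~3 Gb haploid genome.
c = fold_coverage(600_000_000, 100, 3_000_000_000)
print(f"{c:.1f}x coverage")
print(f"expected uncovered fraction: {fraction_uncovered(c):.2e}")
print(f"expected fraction with >=10 reads: {fraction_at_least(c, 10):.4f}")
```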

Overview of the process in whole genome sequencing
D. Koboldt et al. Briefings in Bioinformatics (2010)

Sequencing library preparation – 454 system

Sequencing process

Sequencing data generation and analysis
D. Koboldt et al. Briefings in Bioinformatics (2010)

Quality metrics to improve variant calls
• Sequencing fold coverage based on alignment
  – Higher fold coverage required in cancer genomes
• Elimination of duplicate reads
  – Bottlenecks which propagate errors from DNA amplification
• Using high quality base calls
  – Quality scores of 30 or higher
• Repeat sequences in genomes
• Significance or confidence values for variants
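
Base quality scores are Phred-scaled, Q = -10 log10(p), so Q30 corresponds to an error probability of 1 in 1,000. The sketch below assumes quality strings use the Phred+33 ASCII offset, as in Sanger-format FASTQ and SAM. It applies a Q30 filter, and it removes duplicates under a deliberately simplified rule: reads sharing chromosome, start position and strand are treated as amplification duplicates, and only the read with the highest summed quality is kept. The `Read` type is hypothetical, and dedicated duplicate-marking tools use richer criteria, such as considering both mates of a pair.

```python
import math
from typing import Dict, List, NamedTuple, Tuple

def phred_to_error(q: float) -> float:
    """Error probability for a Phred quality: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def error_to_phred(p: float) -> float:
    """Phred quality for an error probability: Q = -10 * log10(p)."""
    return -10 * math.log10(p)

def decode_phred33(qual: str) -> List[int]:
    """Decode an ASCII quality string, assuming the Phred+33 offset."""
    return [ord(ch) - 33 for ch in qual]

def high_quality_positions(qual: str, min_q: int = 30) -> List[int]:
    """Indices of base calls at or above the quality threshold."""
    return [i for i, q in enumerate(decode_phred33(qual)) if q >= min_q]

class Read(NamedTuple):  # hypothetical minimal read record
    name: str
    chrom: str
    start: int
    reverse: bool
    qual: str

def remove_duplicates(reads: List[Read]) -> List[Read]:
    """Keep one read per (chrom, start, strand): the one with the highest total quality."""
    best: Dict[Tuple[str, int, bool], Read] = {}
    for read in reads:
        key = (read.chrom, read.start, read.reverse)
        score = sum(decode_phred33(read.qual))
        if key not in best or score > sum(decode_phred33(best[key].qual)):
            best[key] = read
    return list(best.values())

reads = [
    Read("r1", "chr18", 1000, False, "IIII"),  # 'I' decodes to Q40
    Read("r2", "chr18", 1000, False, "5555"),  # '5' decodes to Q20; duplicate of r1
    Read("r3", "chr18", 1000, True, "I5I5"),   # opposite strand, so kept
]
print(phred_to_error(30))  # error probability at Q30
print([r.name for r in remove_duplicates(reads)])
print(high_quality_positions("I5I5"))
```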

DNA sequence data format and visualization
• Sequence Alignment/Map (SAM) format
• Viewing “pileups”
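
A SAM record is one tab-separated line with 11 mandatory fields: QNAME, FLAG, RNAME, POS (1-based), MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ and QUAL. A pileup stacks the aligned bases of every read covering each reference position. The minimal sketch below parses the mandatory fields and walks each CIGAR string to count bases per position. It skips unmapped reads and ignores optional tags and base qualities, which a real tool such as `samtools mpileup` handles. The two records are made up for illustration.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

SAM_FIELDS = ["qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual"]
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_sam_line(line: str) -> dict:
    """Split out the 11 mandatory SAM fields; optional tags are ignored."""
    values = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, values[:11]))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        record[key] = int(record[key])
    return record

def pileup(lines: Iterable[str]) -> Dict[Tuple[str, int], Counter]:
    """Count aligned read bases at each (chromosome, 1-based position)."""
    counts: Dict[Tuple[str, int], Counter] = defaultdict(Counter)
    for line in lines:
        rec = parse_sam_line(line)
        if rec["flag"] & 4 or rec["cigar"] == "*":  # flag bit 4 = unmapped
            continue
        ref_pos, read_pos = rec["pos"], 0
        for length, op in CIGAR_RE.findall(rec["cigar"]):
            n = int(length)
            if op in "M=X":    # aligned bases consume read and reference
                for i in range(n):
                    counts[(rec["rname"], ref_pos + i)][rec["seq"][read_pos + i]] += 1
                ref_pos += n
                read_pos += n
            elif op in "IS":   # insertion or soft clip: read only
                read_pos += n
            elif op in "DN":   # deletion or skipped region: reference only
                ref_pos += n
            # H (hard clip) and P (padding) consume neither
    return counts

sam = [
    "read1\t0\tchr18\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t16\tchr18\t101\t60\t1S3M\t*\t0\t0\tTCTT\tIIII",
]
piles = pileup(sam)
for pos in range(100, 104):
    print(pos, dict(piles[("chr18", pos)]))
```

Position 102 shows a G from one read and a T from the other, which is the kind of column a variant caller then evaluates.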

Genetic variation
• Point mutations
  – Nonsynonymous versus synonymous
• Insertions/deletions (indels)
• Copy number variations (CNVs)
• Structural variants (SVs)
  – Intrachromosomal
    • Large indels
    • Duplications
    • Inversions
  – Interchromosomal
    • Balanced translocations
    • Unbalanced translocations

Single nucleotide variants from cancer genomes
S. P. Shah et al. Nature 461 (2009)

Variant callers
• Genome Analysis Toolkit (GATK)
• VarScan
• SAMtools
• SNVMix
• Others…

Single nucleotide mutations
• Silent = synonymous
• Substitution = nonsynonymous (missense)
• Nonsense = premature stop codon
http://commons.wikimedia.org
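
The three categories can be assigned by translating the reference and mutant codons. The sketch below encodes the standard genetic code (NCBI translation table 1) as a 64-character string indexed in TCAG order. It assumes the codon is given on the coding strand with a known offset of the changed base. A change that removes a stop codon (stop-loss) falls into the nonsynonymous branch here.

```python
BASES = "TCAG"
# Standard genetic code, NCBI translation table 1; codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

def translate_codon(codon: str) -> str:
    """Translate one DNA codon; '*' marks a stop codon."""
    codon = codon.upper()
    index = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
    return AMINO_ACIDS[index]

def classify_snv(codon: str, offset: int, alt: str) -> str:
    """Classify a single-base change at position `offset` (0-2) within a codon."""
    mutant = codon[:offset] + alt + codon[offset + 1:]
    ref_aa, alt_aa = translate_codon(codon), translate_codon(mutant)
    if ref_aa == alt_aa:
        return "silent (synonymous)"
    if alt_aa == "*":
        return "nonsense (premature stop)"
    return "substitution (nonsynonymous)"

examples = [("GAA", 2, "G"), ("GAA", 0, "A"), ("TGG", 2, "A")]
for codon, offset, alt in examples:
    print(codon, offset, alt, "->", classify_snv(codon, offset, alt))
```

GAA to GAG keeps glutamate, GAA to AAA gives lysine, and TGG (tryptophan) to TGA creates a stop codon.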

Transition versus transversion mutations
[Figure: transition and transversion substitutions]
• Transitions
  – A <-> G
  – C <-> T
• Transversions
  – A <-> T
  – A <-> C
  – G <-> T
  – G <-> C
Ding et al. Nature (2010)
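
Transitions exchange a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T). Transversions cross between the two classes. A common summary of a call set is the transition/transversion (Ti/Tv) ratio, and a ratio far from what is expected for the sample type can hint at systematic errors. The sketch below computes it from a hypothetical list of (ref, alt) pairs.

```python
from typing import Iterable, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES

def ti_tv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Ratio of transitions to transversions among (ref, alt) pairs."""
    ti = tv = 0
    for ref, alt in snvs:
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")

calls = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "T"), ("G", "C")]  # hypothetical calls
print(ti_tv_ratio(calls))
```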

Small insertions and deletions

Targeting strategies for resequencing genomic subsets
• In-solution capture (e.g. molecular inversion probes)
• Array-based hybridization capture
• In-solution hybridization capture

Rapid targeted mutation analysis from cancer genomes
[Figure: (a) workflow of preparation (target-specific oligonucleotides, single-adaptor library), sequencing on a flow cell, and data processing; (b) STEP 1, primer-probe preparation; STEP 2, target capture by hybridization, extension and denaturation; STEP 3, cluster preparation, with the primer-probe, immobilized primers ‘C’ and ‘D’, immobilized DNA, and sequencing primers 1 and 2]

“Onconomic” diagnostic mutation analysis
• Rapid mutation analysis for point-of-care use
• Analysis of identified cancer drivers
• Determination of pathogenic mutations
• Example: nonsense mutation in SMAD4 (panels: Normal, Tumor)

Visualizing sequence: 1.5 Mb region on chromosome 18
[Figure track: SNP genotyping]

Whole genome sequencing
M. Stratton et al. Nature 458 (2009)

A needle in a human genome haystack?
• A human genome has 23 pairs of chromosomes.
• About 6 billion individual DNA base pairs per diploid genome.
• A single base pair error can be a disease mutation.

Exome sequencing
M. Clark et al. Nature Biotechnology (2012)

A cancer family pedigree
[Pedigree figure: AP; ages 43 y/o and 42 y/o; legend: male, female, no cancer, colorectal cancer, colon polyps]

Assessment of a cancer family – unaffected versus affected
[Figure panels: Father, Mother, AP]

Exome sequencing analysis for identifying inherited disease
• Identify the variants unique to the affected members.
[Figure: variant lists (1, 2, 3 … 11, etc.) for AP, Mother and Father, highlighting AP’s unique family variants]
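
Finding "variants unique to the affected members" reduces to set operations over each person's variant calls. The sketch keeps variants present in every affected member and absent from every unaffected member. The names, roles and variant tuples are hypothetical. Which relatives count as affected depends on the pedigree and on the inheritance model assumed, and a real analysis must also account for missed calls and low coverage.

```python
from typing import Dict, List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)

def variants_unique_to_affected(calls: Dict[str, Set[Variant]],
                                affected: List[str],
                                unaffected: List[str]) -> Set[Variant]:
    """Variants shared by all affected members and absent from all unaffected members."""
    shared = set.intersection(*(calls[name] for name in affected))  # needs >= 1 affected
    seen_in_unaffected = set().union(*(calls[name] for name in unaffected))
    return shared - seen_in_unaffected

# Hypothetical calls; roles assigned for illustration only.
calls = {
    "proband": {("chr1", 1000, "A", "G"), ("chr5", 2000, "C", "T"), ("chr18", 3000, "G", "A")},
    "mother": {("chr1", 1000, "A", "G")},
    "father": {("chr5", 2000, "C", "T")},
}
result = variants_unique_to_affected(calls, affected=["proband"], unaffected=["mother", "father"])
print(result)
```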

Interpretation of genetic variants
• Substitutions translated bioinformatically
• SIFT – probability that a substitution is tolerated
  – Scores < 0.05 are predicted deleterious
• PolyPhen – categorical definitions
  – "benign", "possibly damaging" and "probably damaging"
• Protein structural mapping
  – IDH1 mapping of the Arg132 cancer mutation
Parsons et al., Science (2008)

Sequence assembly
• Assembling fragments of random sequence to form a set of larger contiguous sequences (contigs).
• Used to assemble de novo genomes of new organisms.
• Useful for reconstructing regions of high complexity such as SVs.
Zerbino DR, Birney E, Genome Research, 18 (2008)
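
The cited Velvet assembler is based on a de Bruijn graph. The toy sketch below shows only that core idea. Each k-mer in the reads links its (k-1)-base prefix to its (k-1)-base suffix, and contigs are spelled out along non-branching paths. It omits everything that makes real assembly hard: sequencing errors, reverse complements, coverage weighting and repeat resolution. It also skips isolated cycles. The reads and the choice of k are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

def de_bruijn(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Link each (k-1)-mer prefix to the (k-1)-mer suffix of every k-mer in the reads."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def contigs(graph: Dict[str, Set[str]]) -> List[str]:
    """Spell out maximal non-branching paths as contigs (isolated cycles are skipped)."""
    indegree: Dict[str, int] = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1
    nodes = set(graph) | set(indegree)

    def one_in_one_out(node: str) -> bool:
        return indegree.get(node, 0) == 1 and len(graph.get(node, ())) == 1

    result = []
    for node in nodes:
        if one_in_one_out(node):
            continue  # interior node: reached by extending from the start of its path
        for nxt in graph.get(node, ()):
            path = node + nxt[-1]
            while one_in_one_out(nxt):
                nxt = next(iter(graph[nxt]))
                path += nxt[-1]
            result.append(path)
    return sorted(result)

reads = ["ACGTC", "GTCAG"]  # hypothetical reads overlapping by k-1 = 3 bases
print(contigs(de_bruijn(reads, k=4)))
```

With a branch in the graph, for example reads AAGT and AAGC at k = 3, the assembly breaks into separate contigs at the branch point.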

Metagenomic characterization of bacterial flora

Copy number from genome sequencing
• Genome shotgun sequencing comparison.
• Copy number variation derived directly from sequence reads.
• 15 kb windows with sequence tag counting.
Campbell et al., Nature Genetics, (2008)
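
Sequence tag counting in fixed windows can be sketched as follows. The read start positions of tumor and normal libraries are counted in 15 kb windows, each count is normalized by its library's total, and the log2 tumor/normal ratio is reported per window. Positive values suggest gains and negative values suggest losses. This is one simple way to implement the idea, not the published pipeline. The pseudocount and the synthetic read positions are assumptions, and real data also needs GC and mappability correction and segmentation.

```python
import math
from collections import Counter
from typing import Dict, Iterable

def window_counts(read_starts: Iterable[int], window: int = 15_000) -> Counter:
    """Count aligned read start positions falling in each fixed-size window."""
    return Counter(pos // window for pos in read_starts)

def copy_number_log2(tumor_starts: Iterable[int], normal_starts: Iterable[int],
                     window: int = 15_000, pseudocount: float = 0.5) -> Dict[int, float]:
    """Per-window log2(tumor/normal) read-count ratio, normalized by library size."""
    tumor = window_counts(tumor_starts, window)
    normal = window_counts(normal_starts, window)
    tumor_total = sum(tumor.values())
    normal_total = sum(normal.values())
    ratios = {}
    for w in sorted(set(tumor) | set(normal)):
        t = (tumor[w] + pseudocount) / tumor_total
        n = (normal[w] + pseudocount) / normal_total
        ratios[w] = math.log2(t / n)
    return ratios

# Synthetic data: ten 15 kb windows, the first one doubled in the tumor.
neutral = [w * 15_000 + 1_000 for w in range(1, 10) for _ in range(100)]
tumor = [1_000] * 200 + neutral
normal = [1_000] * 100 + neutral
for w, r in copy_number_log2(tumor, normal).items():
    print(f"window {w} ({w * 15}-{(w + 1) * 15} kb): log2 ratio {r:+.2f}")
```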

Copy number variations (CNVs) from genomic sequencing
Breast cancer – Chromosome 1
[Figure panels: genomic sequence analysis; array CGH CNV analysis]

Structural variations in human genomes
[Figure: intrachromosomal changes (deletion, duplication, inversion) and interchromosomal changes (insertion, translocation)]
http://commons.wikimedia.org

Structural variation
[Figure: mate pairs from a 300 nt insert. Normal: reads flank exon n and exon n+1 across an intact region. Tumor: the same 300 nt insert spans a deleted region.]
• Mate pair sequences depend on the genomic DNA insert size of the library (population).

Genomic deletion analysis
• Breast cancer genome sequencing.
• Mate pair sequences used in indel analysis.
• Changes in the location of mapped reads that are not concordant with the sequencing library insert size.
[Figure panels: Normal, Primary, Metastasis, Xenograft]
Ding et al., Nature (2010)
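
Discordance with the library insert size can be tested pair by pair. The sketch below takes the 300 nt insert from the preceding figure as the expected mean. The 30 nt standard deviation and the 3-SD cutoff are hypothetical. The span is approximated as the distance between the two mapped start positions, ignoring read length. Mates that map too far apart suggest a deletion, mates that map too close suggest an insertion, and mates on different chromosomes suggest a translocation. Orientation checks for inversions are omitted.

```python
def classify_pair(chrom1: str, pos1: int, chrom2: str, pos2: int,
                  mean_insert: float = 300.0, sd_insert: float = 30.0,
                  n_sd: float = 3.0) -> str:
    """Label a mapped mate pair relative to an assumed insert-size distribution."""
    if chrom1 != chrom2:
        return "interchromosomal (translocation candidate)"
    span = abs(pos2 - pos1)  # approximate insert size from mapped start positions
    if span > mean_insert + n_sd * sd_insert:
        return "too far apart (deletion candidate)"
    if span < mean_insert - n_sd * sd_insert:
        return "too close (insertion candidate)"
    return "concordant"

pairs = [
    ("chr1", 1_000, "chr1", 1_300),  # matches the expected insert
    ("chr1", 1_000, "chr1", 3_000),  # ~2 kb apart in the reference
    ("chr1", 1_000, "chr1", 1_100),
    ("chr8", 5_000, "chr20", 7_000),
]
for p in pairs:
    print(p, "->", classify_pair(*p))
```

In practice, a deletion is only called when several independent discordant pairs support the same breakpoints.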

Structural variants from small cell lung cancer genome
[Figure panels: Duplication, Inversion]
Campbell et al., Nature Genetics, (2008)

Translocations in colorectal cancer genomes
• Balanced translocations between the p arms of chromosomes 8 and 20
• Structural changes can only be delineated based on sequencing
Bass et al., Nature Genetics, (2011)

Cancer transcriptome sequencing (RNASeq)
• Mate pair analysis from prostate cancer mRNA
• Identification of reads indicating gene fusions
N. Palanisamy et al., Nat Med 16 (2010)

Sequenced cancer genomes – non-small cell lung cancer
Tumor coverage                          60X
Normal coverage                         46X
Mutation rate per Mb                    17.7
Total identified tumor mutations        83,000
Coding mutations                        540
Validated mutations                     302
Total identified indels                 54,921
Coding indels                           253
Total identified structural variants    79
Validated structural variants           43
Lee et al. Nature 465 (2010)

Whole genome analysis of colorectal cancer
• Cancer Genome Atlas analysis of colon adenocarcinoma
• “Circos” plots of whole genome data

Gene expression and RNASeq

ChIP-Seq

Ultrasensitive mutation detection
• Robust detection of 1 mutant allele among 1,000 wild-type alleles in heterogeneous mixtures
• Application to viral infections
• Analysis of cancer point mutations
Flaherty et al., Nucleic Acids Research, 2012
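
Detecting 1 mutant allele in 1,000 requires separating true mutant reads from sequencing errors. One simple framing is a binomial test: given depth n and a per-read error rate e toward the mutant base, how likely is it to see at least k mutant reads from errors alone? The sketch below computes that tail probability. It is a simplification and not necessarily the statistical model used by Flaherty et al. The depth and error rate are hypothetical.

```python
import math

def log_binom_pmf(i: int, n: int, p: float) -> float:
    """Natural log of P(X = i) for X ~ Binomial(n, p), via lgamma (0 < p < 1)."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))

def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed upward from k."""
    total = 0.0
    for i in range(k, n + 1):
        term = math.exp(log_binom_pmf(i, n, p))
        total += term
        if i > n * p and term <= total * 1e-12:
            break  # past the mode, remaining terms are negligible
    return min(total, 1.0)

depth = 100_000      # hypothetical read depth at the site
error_rate = 1e-4    # hypothetical per-read error rate toward the mutant base
mutant_reads = 100   # ~1 mutant allele per 1,000 at this depth
print(prob_at_least(mutant_reads, depth, error_rate))  # chance of 100+ error reads
print(prob_at_least(10, depth, error_rate))            # 10 reads is well within noise
```

The contrast between the two calls shows why depth matters. About 10 error reads are expected at this depth, so 10 mutant reads are indistinguishable from noise, while 100 are not.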

Deep resequencing for rare variants
• Pandemic H1N1 influenza (2009), derived from reassortment of swine and human flu in swine
• Reported in more than 214 countries in 2009
• More than 622,482 confirmed infections
• 18,449 deaths confirmed by WHO
Smith et al., Nature 2009

Oseltamivir resistance mutation in influenza

Detection of the oseltamivir resistance mutation
• Neuraminidase bound to oseltamivir (Tamiflu)
• Mutations cluster around the sialic acid/oseltamivir binding pocket
Collins et al. Nature 2008.

Phylogenetic tree of H1 influenza genomes

Conclusion
• Multiple approaches are available for analysis of genomes.
• The scale of sequence data requires extensive computational, bioinformatic and statistical analysis.
• Methods, technologies and analysis continue to improve and become simpler.

Genetics via large scale sequencing
• Mutations and other genomic DNA aberrations contribute to neoplastic development
• Specific genetic variants and other aberrations indicate clinical phenotype
• Utility as diagnostics

Cancer exome survey (Sanger sequencing)
[Figure label: TP53]
• Each row represents a chromosome.
• Peaks represent driver mutations.
Wood et al., Science, 318 (2007)

Tiers of cancer genome sequencing
[Figure: tiers labeled complete human genomes, exomes & transcriptomes, and genomic subsets, with applications labeled discovery, translational studies and cancer diagnostics (?)]

Whole cancer genome sequencing
• Pros
  – Most comprehensive coverage of the genome
  – Least bias; most objective analysis
  – Highest resolution, at the base pair level
  – Identification of complex structural variants
  – Experimentally straightforward…
• Cons
  – Cost (rapidly dropping!)
  – Rapid evolution of technologies
  – Challenging data management and analysis

Sample and genetic complexity of cancer
• Sample variability
  – Normal stroma contamination
  – Mixtures of variable lineages
  – Degradation of DNA
• Intratumoral genetic heterogeneity
  – Clonal subpopulations carrying different mutations
• Background random mutations (e.g. passengers)
• Complex genomic structure
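
Stromal contamination and subclonality both dilute the signal of a somatic mutation. The expected variant allele fraction (VAF) can be estimated by dividing the mutant allele copies contributed by tumor cells that carry the mutation by all allele copies at the locus from tumor and normal cells. The sketch below implements that mixture formula. It assumes a diploid normal genome and, by default, a diploid tumor with one mutant copy. The purity and cancer-cell-fraction values are hypothetical.

```python
def expected_vaf(purity: float, cancer_cell_fraction: float = 1.0,
                 mutant_copies: int = 1, tumor_ploidy: int = 2,
                 normal_ploidy: int = 2) -> float:
    """Expected variant allele fraction of a somatic mutation in a tumor/normal mixture."""
    if not (0.0 <= purity <= 1.0 and 0.0 <= cancer_cell_fraction <= 1.0):
        raise ValueError("purity and cancer_cell_fraction must lie in [0, 1]")
    # Mutant alleles come only from the tumor cells that carry the mutation.
    mutant = purity * cancer_cell_fraction * mutant_copies
    # All alleles at the locus, from tumor cells and contaminating normal cells.
    total = purity * tumor_ploidy + (1.0 - purity) * normal_ploidy
    return mutant / total

for purity in (1.0, 0.6, 0.2):
    clonal = expected_vaf(purity)
    subclonal = expected_vaf(purity, cancer_cell_fraction=0.5)
    print(f"purity {purity:.1f}: clonal VAF {clonal:.3f}, half-clone VAF {subclonal:.3f}")
```

A subclonal mutation in a low-purity sample can fall to a few percent VAF. At that level it approaches the sequencing error rates discussed earlier.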

Sequencing cancer genomes – clinical samples
• Type of samples
  – Cancer cell lines
  – Xenografts
  – Primary tumors
  – Purified cancer cells
• Requirement for matched samples
  – Normal diploid genome

False positive mutation rates in genome sequencing
• A low mutation false positive rate requires high accuracy.
  – 1 error per 10,000 bases = 300,000 false mutations
  – 1 error per 100,000 bases = 30,000 false mutations
• Ideal false positive rate
  – 1 base / 1,000 error
  – ~50% of candidate mutations are correct!
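
The false mutation counts follow from dividing the number of bases examined by the number of bases per error. The figures above imply about 3 billion bases, roughly one haploid human genome. The sketch reproduces that arithmetic and extends it to one further error rate for comparison.

```python
GENOME_SIZE = 3_000_000_000  # approximate haploid human genome, as implied by the figures above

def expected_false_calls(bases_per_error: int, genome_size: int = GENOME_SIZE) -> int:
    """Expected false mutation calls if one base in `bases_per_error` is miscalled."""
    return genome_size // bases_per_error

for rate in (10_000, 100_000, 1_000_000):
    print(f"1 error per {rate:,} bases -> {expected_false_calls(rate):,} false mutations")
```

Each tenfold gain in per-base accuracy cuts the expected false calls tenfold. That is why the filters on coverage, duplicates and base quality described earlier matter so much for somatic mutation calling.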