Medical variations Gabor T Marth Boston College Biology
Medical variations Gabor T. Marth Boston College Biology Department BI 543 Fall 2013 February 5, 2013
Phenotypic effects are often caused by genetic variants
Many SNPs have phenotypic effects
Some notable genetic diseases:
• cystic fibrosis (Mendelian recessive)
• sickle-cell anemia (Mendelian recessive)
Badano and Katsanis, NRG 2002
Genetic variants may affect drug metabolism: Pharmacogenetics Evans and Relling, Science 1999
Genetic variants in Pharmacogenetics Evans and Relling, Science 1999
Finding variants that cause genetic disease
Population genetics 101
• sequence variations are the result of mutation events
• mutations are propagated down through generations
• and determine present-day variation patterns
[Figure: genealogy descending from the MRCA, with sequences TAAAAAT and TAACAAT]
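As a small illustration of how a mutation shows up as present-day variation, the sketch below scans a set of aligned sequences and reports the sites where more than one allele is observed. The two sequences come from the slide; the sample of three chromosomes and the function name `segregating_sites` are assumptions made only for this example.

```python
from collections import Counter


def segregating_sites(sequences):
    """Return {position: allele counts} for sites with more than one allele.

    Positions are 0-based. All sequences must be aligned to the same length.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    sites = {}
    for i in range(length):
        counts = Counter(s[i] for s in sequences)
        if len(counts) > 1:  # more than one allele observed at this site
            sites[i] = dict(counts)
    return sites


# Hypothetical sample: one chromosome without the mutation, two carrying it.
sample = ["TAAAAAT", "TAACAAT", "TAACAAT"]
sites = segregating_sites(sample)
print(sites)
```

Here the only variable site is position 3, where A and C are both present in the sample.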
Mendelian diseases have simple inheritance
• Mendelian diseases have a simple relationship between genotype and phenotype
[Figure labels: genotype, phenotype, inheritance]
Linkage analysis
• compares the transmission of marker genotype and phenotype in families
• sequence regions of the genome to determine which loci are linked with the trait
• works well for Mendelian diseases
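One common way to quantify linkage between a marker and a trait locus is the LOD score, which compares the likelihood of the observed meioses at a recombination fraction theta against free recombination (theta = 0.5). The sketch below assumes the simplest textbook setting: phase-known meioses that can each be scored as recombinant or non-recombinant. The function name and the example counts are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """LOD score for phase-known meioses at recombination fraction theta.

    LOD = log10( theta^R * (1 - theta)^(N - R) / 0.5^N )
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_l_linked = (recombinants * math.log10(theta)
                    + non_recombinants * math.log10(1 - theta))
    log_l_unlinked = meioses * math.log10(0.5)
    return log_l_linked - log_l_unlinked


# Hypothetical family data: 1 recombinant among 10 informative meioses.
lod = lod_score(recombinants=1, meioses=10, theta=0.1)
print(round(lod, 2))
```

By convention a LOD score of 3 or more is usually taken as evidence for linkage, so a single small family like this one would normally be combined with others.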
However, some diseases have complex inheritance
A) Multiple genes may influence the trait.
B) E.g., retinitis pigmentosa requires heterozygosity for two genes.
Badano and Katsanis, NRG 2002
Population genetics continued…
• because of recombination, DNA sequences may not have a unique common ancestor, hence phylogenetic analysis may not apply
[Figure: aligned sequences accgttatgtaga, acggttatgtaga, acggttatgtaga, accgttatgtaga]
Genetic mapping
Allelic association (linkage disequilibrium, LD)
• allelic association is the nonrandom assortment between alleles, i.e. it measures how well knowledge of the allele state at one site permits prediction at another
• significant allelic association between a marker and a functional site permits localization (mapping) even without having the functional site in our collection
• allelic association, and the use of genetic markers, is the basis for mapping functional alleles
[Figure labels: marker site, functional site]
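Allelic association between two biallelic sites is usually summarized by D, D' and r². The sketch below computes all three from the frequency of the haplotype carrying allele A at the marker and allele B at the functional site, together with the two allele frequencies. The function name and the example frequencies are hypothetical.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic sites.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first site
    p_b:  frequency of allele B at the second site
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b  # deviation from random assortment
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies: both alleles at 50%, AB haplotype at 40%.
d, d_prime, r2 = ld_stats(p_ab=0.4, p_a=0.5, p_b=0.5)
print(round(d, 3), round(d_prime, 3), round(r2, 3))
```

An r² near 1 means the marker allele predicts the functional allele almost perfectly, which is the situation that makes mapping with markers possible.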
Case-control association testing
• genotyping cases and controls at various polymorphisms
• searching for markers with “significant” marker allele frequency differences between cases and controls; these markers signify regions of possible causative alleles
[Figure: clinical cases with AF(cases) versus clinical controls with AF(controls)]
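A minimal sketch of the allele-frequency comparison for a single marker, assuming a plain 2x2 allelic chi-square test (1 degree of freedom) with no correction for multiple testing or population structure. The allele counts are made up, and `allelic_chi_square` is a hypothetical helper name.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square test (1 df) on a 2x2 table of allele counts.

    Returns (AF in cases, AF in controls, chi-square statistic, p-value).
    """
    n_case = case_alt + case_ref
    n_ctrl = ctrl_alt + ctrl_ref
    n_alt = case_alt + ctrl_alt
    n_ref = case_ref + ctrl_ref
    if 0 in (n_case, n_ctrl, n_alt, n_ref):
        raise ValueError("every row and column total must be non-zero")
    n = n_case + n_ctrl
    chi2 = (n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
            / (n_case * n_ctrl * n_alt * n_ref))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return case_alt / n_case, ctrl_alt / n_ctrl, chi2, p_value


# Hypothetical counts: 100 cases and 100 controls (200 alleles each).
af_cases, af_controls, chi2, p_value = allelic_chi_square(60, 140, 40, 160)
print(af_cases, af_controls, round(chi2, 2), round(p_value, 3))
```

In a genome-wide scan the same test is repeated at every marker, so the significance threshold has to be far stricter than the usual 0.05.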
Genome-wide scans for human diseases
SNPs in the Complement Factor H (CFH) gene are associated with Age-related Macular Degeneration (AMD)
Klein et al., Science 2005
Where is the missing heritability of disease? Manolio et al. Nature 2009
Variant discovery in population sequencing data
Intro
• International project to construct a foundational data set for human genetics
  – Discover virtually all common human variations by investigating many genomes at the base pair level
  – Consortium with multiple centers, platforms, funders
• Aims
  – Discover population level human genetic variations of all types (95% of variation > 1% frequency)
  – Define haplotype structure in the human genome
  – Develop sequence analysis methods, tools, and other reagents that can be transferred to other sequencing projects
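To get a feel for the discovery aim, the sketch below estimates how likely a variant of a given population frequency is to appear at least once in a sample of diploid individuals. It assumes random sampling of chromosomes and perfect detection, so it ignores sequencing depth and genotyping error and gives an upper bound rather than a description of the project's actual power. Both function names are hypothetical.

```python
import math


def prob_variant_sampled(allele_freq, n_individuals):
    """Probability that an allele at the given frequency is present at
    least once among 2 * n_individuals randomly sampled chromosomes."""
    return 1 - (1 - allele_freq) ** (2 * n_individuals)


def min_individuals(allele_freq, target_prob):
    """Smallest number of diploid individuals whose sample contains the
    allele with probability >= target_prob."""
    chromosomes = math.ceil(math.log(1 - target_prob)
                            / math.log(1 - allele_freq))
    return math.ceil(chromosomes / 2)


# A 1% variant in a hypothetical sample of 100 individuals.
p_100 = prob_variant_sampled(0.01, 100)
n_95 = min_individuals(0.01, 0.95)
print(round(p_100, 3), n_95)
```

Under these idealized assumptions, sampling a 1% variant is easy; in practice the harder part is calling it reliably from the reads.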
1000 Genomes Project Populations
~2,500 samples representing all continents
[Figure: world map of sampled populations grouped by region (Europe, Americas, East Asia, South Asia, Africa), each marked as an International HapMap population, a HapMap 3 population, or a new 1000 Genomes population]
Population codes: IBS, CEU, FIN, GBR, TSI, MXL, PUR, CHB, ASW, ACB, CLM, JPT, CHS, CDX, KHV, PEL, GWD, MSL, ESN, YRI, GIH, LWK, ITU, PJL, BEB, STU
Sampling locations: Great Britain; Utah, USA; Colorado, USA; Finland; Los Angeles, USA; Italy; Spain; Southwest, USA; Houston, USA; The Gambia; Puerto Rico; Sierra Leone; Barbados; Nigeria; Medellín, Colombia; Beijing, China; Tokyo, Japan; Pakistan; Yunnan, Hunan & Fujian, China; Bangladesh; Vietnam; Kenya; Lima, Peru
Sequencing strategies
• Deep-coverage whole-exome data
• Low-coverage whole-genome data
1000 Genome Project variants
We know 99% of SNP variants in any individual
Date              Fraction not in dbSNP
February 2000     98%
February 2001     80%
April 2008        10%
February 2011     2%
May 2011          1%
38 M SNPs are known as of Phase 1 of the 1000 Genomes Project
Ryan Poplin, David Altshuler
Newly discovered SNPs are mostly rare
[Figure: number of sites (0 to 12 M) by frequency of alternate allele (0.001 to 1.0). Credit: Ryan Poplin]
Deep exome vs. low-cov. WG sequencing
Properties of low-frequency variation
Rare SNPs enriched for functional variants
Challenges for finding rare disease variants Bansal et al. NRG 2010
Concepts for method development Bansal et al. NRG 2010
A rare variant predictor (VAAST)
• Instead of individual variants, use a larger unit for comparison, e.g. a gene
• Weight predicted impact of variant (e.g. non-synonymous change, large allele frequency difference, etc.)
Yandell et al. GR 2011
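The two ideas on this slide, aggregating by gene and weighting by predicted impact, can be illustrated with a toy burden score. This is not VAAST's actual scoring method; it is a deliberately simple sketch in which each variant contributes its weight times the carrier-frequency difference between cases and controls. The gene names, weights and carrier counts are all invented for the example.

```python
from collections import defaultdict


def gene_burden_scores(variants, n_cases, n_controls):
    """Sum weight * (case carrier freq - control carrier freq) per gene."""
    scores = defaultdict(float)
    for v in variants:
        case_freq = v["case_carriers"] / n_cases
        control_freq = v["control_carriers"] / n_controls
        scores[v["gene"]] += v["weight"] * (case_freq - control_freq)
    return dict(scores)


# Hypothetical input: weight 1.0 for a predicted damaging change,
# 0.1 for a predicted benign one.
variants = [
    {"gene": "GENE_A", "weight": 1.0, "case_carriers": 3, "control_carriers": 0},
    {"gene": "GENE_A", "weight": 1.0, "case_carriers": 2, "control_carriers": 0},
    {"gene": "GENE_B", "weight": 0.1, "case_carriers": 5, "control_carriers": 4},
]
scores = gene_burden_scores(variants, n_cases=100, n_controls=100)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

Neither rare variant in GENE_A would stand out on its own, but together they push the gene to the top of the ranking, which is the motivation for testing at the gene level.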
Systems bringing high-res genetic knowledge to the “bedside”