6.047/6.878/HST.507 Computational Biology: Genomes, Networks, Evolution. Lecture 15: Regulatory variation and eQTLs. Chris Cotsapas, cotsapas@broadinstitute.org
Module 4: Population / Evolution / Phylogeny • L15/16: Association mapping for disease and molecular traits – Statistical genetics: disease mapping in populations (Mark Daly) – Quantitative traits and molecular variation: eQTLs, cQTLs • L17/18: Phylogenetics / Phylogenomics – Phylogenetics: Evolutionary models, Tree building, Phylo inference – Phylogenomics: gene/species trees, coalescent models, populations • L19/20: Human history, Missing heritability – Measuring natural selection in human populations – The missing heritability in genome-wide associations • And done! Last pset Nov 11 (no lab), In-class quiz on Nov 20 – No lab 4! Then entire focus shifts to projects, Thanksgiving, Frontiers
Today: Regulatory variation and eQTLs 1. Quantitative Trait Loci (QTLs), Regulatory Variation – Molecular phenotypes as QTs: expression, chromatin… – Discretization: a GWAS for each gene. Cis-/Trans-eQTLs – Underlying regulatory variation: eQTLs, GWAS, cis-eQTL 2. Finding trans-eQTLs (distal from gene that varies) – Challenges: Power, structure, sample size – Cross-phenotype analysis: trans QTLs affect many genes 3. Identifying underlying regulatory mechanisms – Cis-eQTLs: TSS-distance, cell type specificity – eQTLs vs. GWAS: Expression as intermediate trait 4. Population differences, emerging efforts – Shared associations, SNP-gene pairs, allelic direction
Quantitative traits - weight, height - anything measurable - today: gene expression QTLs (QT Loci) - The loci that control quantitative traits
Regulatory variation • What do trait-associated variants do? • Genetic changes to: – Coding sequence ** – Gene expression levels – Splice isomer levels – Methylation patterns – Chromatin accessibility – Transcription factor binding kinetics – Cell signaling – Protein-protein interactions Regulatory
BASIC CONCEPTS: History, eQTL, mQTL, others
Within a population • Damerval et al 1994 • 42/72 protein levels differ in maize • 2D electrophoresis, eyeball spot quantitation • Problems: – genome coverage – quantitation – post-translational modifications • Solution: use expression levels instead!
Usual mapping tools available • Discretization approach
Whole-genome eQTL analysis is an independent GWAS for expression of each gene [Figure: one association scan per gene, panels for gene 1, gene 2, gene 3, gene 4, gene 5, …, gene N]
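The "one GWAS per gene" idea can be sketched as a nested loop: for every gene, regress its expression on the genotype dosage at every SNP. This is a minimal illustration only. The function names, the 0/1/2 dosage coding, and the choice of ordinary least squares are assumptions made for the example and are not stated on the slide; real analyses add covariates, normalization, and permutation-based thresholds.

```python
import math

def simple_linear_regression(x, y):
    """Fit y = a + b*x by ordinary least squares.

    Returns (slope, t_statistic), or None when x has no variance
    (a monomorphic SNP) or the fit leaves no residual variance.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        return None
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return None
    return slope, slope / se

def eqtl_scan(genotypes, expression):
    """Run one association test per (gene, SNP) pair.

    genotypes:  {snp_id: [dosage per individual]}   (hypothetical layout)
    expression: {gene_id: [expression per individual]}
    """
    results = {}
    for gene, levels in expression.items():      # one "GWAS" per gene
        for snp, dosages in genotypes.items():   # every SNP tested
            fit = simple_linear_regression(dosages, levels)
            if fit is not None:
                results[(gene, snp)] = fit
    return results

# Toy data: six individuals, two SNPs, two genes (values are made up).
toy_genotypes = {"snp1": [0, 1, 2, 0, 1, 2], "snp2": [0, 0, 1, 1, 2, 2]}
toy_expression = {
    "geneA": [1.0, 2.1, 2.9, 1.1, 1.9, 3.0],
    "geneB": [5.0, 5.2, 4.9, 5.1, 5.0, 4.8],
}
toy_results = eqtl_scan(toy_genotypes, toy_expression)
```

With N genes and M SNPs this performs N × M tests, which is what drives the multiple-testing burden of genome-wide eQTL scans.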
Genetics of gene expression (eQTL) • cis-eQTL – The position of the eQTL maps near the physical position of the gene. – Promoter polymorphism? – Insertion/Deletion? – Methylation, chromatin conformation? • trans-eQTL – The position of the eQTL does not map near the physical position of the gene. – Regulator? – Direct or indirect? Modified from Cheung and Spielman 2009 Nat Gen
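The cis/trans distinction above comes down to a distance rule. A minimal sketch follows, assuming a 1 Mb window around the transcription start site as the meaning of "near"; the window size, the function name, and its parameters are illustrative assumptions, since the slide does not give a threshold.

```python
def classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_tss, cis_window=1_000_000):
    """Label a SNP-gene association as 'cis' or 'trans'.

    'cis' here means same chromosome and within cis_window base pairs of the
    gene's TSS. The 1 Mb default is an assumed convention, not from the slide.
    """
    same_chrom = snp_chrom == gene_chrom
    if same_chrom and abs(snp_pos - gene_tss) <= cis_window:
        return "cis"
    return "trans"

# Example calls with made-up coordinates.
near = classify_eqtl("chr1", 1_200_000, "chr1", 1_000_000)
far = classify_eqtl("chr1", 50_000_000, "chr1", 1_000_000)
other = classify_eqtl("chr2", 1_000_000, "chr1", 1_000_000)
```

Under this rule a distant SNP on the same chromosome counts as trans, which is one common reading of "does not map near the physical position of the gene".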
eQTL – THE ARRAY ERA: yeast, mouse, maize, human
Yeast • Brem et al Science 2002 • Linkage in 40 offspring of lab x wild strain cross • 1528/6215 genes differentially expressed (DE) between parents • 570 map in cross – multiple QTLs – 32% of 570 have cis linkage • 262 not DE in parents also map
trans hotspots Brem et al Science 2002
Yvert et al Nat Genet 2003
Mammals I • F2 mice on atherogenic diet • Expression arrays; whole-genome linkage Schadt et al Nature 2003
Mammals II 10% !! Chesler et al Nat Genet 2005
Mammals III • No major trans loci in humans – Cheung et al Nature 2003 – Monks et al AJHG 2004 – Stranger et al PLoS Genet 2005, Science 2007
Open question: WHERE ARE THE TRANS eQTLs?
Issues with trans mapping • Power – Genome-wide significance is 5e-8 – Multiple testing on ~20K genes – Sample sizes clearly inadequate • Data structure – Bias corrections deflate variance – Non-normal distributions • Sample sizes – Far too small
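To see why power is the first issue, it helps to combine the two numbers on the slide. The calculation below assumes a simple Bonferroni-style correction, dividing the genome-wide threshold by the number of genes tested; the slide does not name a correction method, so this is only an illustration of scale.

```latex
\alpha_{\text{per test}}
  = \frac{\alpha_{\text{genome-wide}}}{N_{\text{genes}}}
  = \frac{5 \times 10^{-8}}{2 \times 10^{4}}
  = 2.5 \times 10^{-12}
```

A trans association would need a p-value on the order of 10^-12 to survive under this assumption, which is hard to reach with the cohort sizes typical of eQTL studies.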
But… • Assume that trans e. QTLs affect many genes… • …and you can use cross-trait methods!
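Association data: an s × p matrix of association statistics $Z_{i,j}$
$$Z = \begin{pmatrix} Z_{1,1} & Z_{1,2} & \cdots & Z_{1,p} \\ Z_{2,1} & & & \vdots \\ \vdots & & \ddots & \\ Z_{s,1} & \cdots & & Z_{s,p} \end{pmatrix}$$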
Cross-phenotype meta-analysis (CPMA): $S_{\mathrm{CPMA}} \sim \dfrac{L(\text{data} \mid \lambda \neq 1)}{L(\text{data} \mid \lambda = 1)}$ Cotsapas et al, PLoS Genetics
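The likelihood comparison on this slide can be sketched for a single SNP tested against many traits. The sketch assumes that under the null the values -ln(p) follow an exponential distribution with rate λ = 1, that λ is estimated by maximum likelihood, and that twice the log-likelihood ratio is referred to a chi-square distribution with one degree of freedom. These modelling details and the function name are assumptions about how such a statistic could be computed, not the published implementation.

```python
import math

def cpma_statistic(p_values):
    """Likelihood-ratio test of lambda != 1 versus lambda = 1.

    p_values: association p-values of one SNP across many traits, in (0, 1].
    Returns (lambda_hat, statistic, p_value).
    """
    x = [-math.log(p) for p in p_values]   # exponential(1) under the null
    n = len(x)
    total = sum(x)
    if total == 0:
        raise ValueError("all p-values equal 1; rate cannot be estimated")
    lam_hat = n / total                    # MLE of the exponential rate
    loglik_null = -total                   # sum of log(1) - 1 * x_i
    loglik_alt = n * math.log(lam_hat) - lam_hat * total
    # Clamp at zero to protect against tiny negative rounding error.
    stat = max(2.0 * (loglik_alt - loglik_null), 0.0)
    # Survival function of chi-square with 1 df, using only the stdlib.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return lam_hat, stat, p_value

# A SNP with an excess of small p-values across ten traits (made-up values).
lam_hat, stat, p_value = cpma_statistic([1e-6] * 10)
```

An estimated rate well below 1 indicates more small p-values than expected by chance, which is the pattern a trans eQTL affecting many genes would produce.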
CPMA detects trans mixtures
Open research questions • Do trans effects exist? – Yes – heritability estimates suggest so. – Can we detect them? • Larger cohorts? – Most eQTL studies ~50-500 individuals – See later, GTEx Project • Better methods? – Collapsing data? – PCA, summary statistics, modeling?
CAN WE LEARN REGULATORY VARIATION FROM eQTL?
First, let’s define the question • Can we use genetic perturbations as a way to understand how genes are regulated? • In what groups, in which tissues? • To what stimuli/signaling events? • Do cis eQTLs perturb promoter elements? • Do trans eQTLs perturb TFs? Signaling cascades?
Significant associations are symmetrically distributed around the TSS. Most significant SNP per gene; 0.001 permutation threshold. Stranger et al., PLoS Genet 2012
69-80% of cis associations are cell type-specific [Figure: cell type-specific and cell type-shared gene associations, by no. of cell types with gene association (0.001 permutation threshold); values shown on figure: 268, 271, 262, 73, 85, 82, 86, 86, 86] • Cell type cis association sharing increases slightly when significance thresholds are relaxed • Cell type specificity verified experimentally for subset of eQTLs. Dimas et al Science 2009. Slide courtesy Antigone Dimas
Open research questions • Do cis eQTLs perturb functional elements? – Given each is independent, how can we know? • Do tissue-specific effects correlate with the expression of a gene across tissues? Or a regulator? – Perhaps a gene is expressed, but in response to different regulators across tissues? • If we ever find trans eQTLs… – Common regulators of coregulated genes? – Tissue specificity? – Mechanisms?
APPLICATION TO GWAS: Candidate genes, perturbations underlying organismal phenotypes
eQTLs as intermediate traits Schadt et al Nat Genet 2005
Exploring eQTLs in the relevant cell type is important for disease association studies [Figure panels: relevant cell type for disease; cell type not relevant for disease] Importance of cataloguing regulatory variation in multiple cell types. Slide courtesy Antigone Dimas. Modified from Nica and Dermitzakis Hum Mol Genet 2008
Barrett et al 2008 de Jager et al 2007
Franke et al 2010 Anderson et al 2011
POPULATION DIFFERENCES
Shared association in 8 HapMap populations. APOH: apolipoprotein H. Stranger et al., PLoS Genet 2012
Number of genes with cis-eQTL associations, 8 extended HapMap populations. SRC: permutation threshold. Stranger et al., PLoS Genet 2012
Direction of allelic effect: same SNP-gene combination across populations [Figure panels: AGREEMENT, OPPOSITE; axes: log2 expression in Population 1, log2 expression in Population 2] Stranger et al., PLoS Genet 2012
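The AGREEMENT/OPPOSITE comparison can be sketched as a sign check on the effect of the same allele in two populations. The function name and the idea of summarizing each population by a single regression slope are assumptions for illustration, and the sketch presumes both slopes are reported for the same reference allele.

```python
def allelic_direction(beta_pop1, beta_pop2):
    """Compare the direction of effect for one SNP-gene pair in two populations.

    beta_pop1, beta_pop2: effect of the same allele on log2 expression
    (hypothetical regression slopes, one per population).
    """
    if beta_pop1 == 0 or beta_pop2 == 0:
        return "UNDETERMINED"   # no direction to compare
    same_sign = (beta_pop1 > 0) == (beta_pop2 > 0)
    return "AGREEMENT" if same_sign else "OPPOSITE"

# Made-up slopes for three SNP-gene pairs.
calls = [
    allelic_direction(0.8, 0.5),
    allelic_direction(-0.3, -0.6),
    allelic_direction(0.4, -0.2),
]
```

If the two populations report effects for different reference alleles, one slope has to be sign-flipped before this comparison is meaningful.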
Slide courtesy Alkes Price
Population differences could have non-genetic basis • Differences due to environment? (Idaghdour et al. 2008) • Differences in cell line preparation? (Stranger et al. 2007) • Differences due to batch effects? (Akey et al. 2007) (Reviewed in Gilad et al. 2008) Slide courtesy Alkes Price
Gene expression experiment • Does gene expression in 60 CEU + 60 YRI vary with ancestry? c • Does gene expression in 89 AA vary with % Eur ancestry? • 60 CEU + 60 YRI from HapMap, 89 AA from Coriell HD100AA • Gene expression measurements at 4,197 genes obtained using Affymetrix Focus array. Slide courtesy Alkes Price
Gene expression differences in African Americans validate CEU-YRI differences. 12% ± 3% in cis. c = 0.43 (± 0.02) (P-value < 10^-25) Slide courtesy Alkes Price
EMERGING EFFORTS: RNAseq, GTEx
RNAseq questions • Standard eQTLs – Montgomery et al, Pickrell et al Nature 2010 • Isoform eQTLs – Depth of sequence! • Long genes are preferentially sequenced • Abundant genes/isoforms ditto • Power!? • Mapping biases due to SNPs
Strategies for transcript assembly. Garber et al. Nat Methods 8:469 (2011)
GTEx – Genotype-Tissue EXpression • An NIH Common Fund project • Current: 35 tissues from 50 donors • Scale up: 20K tissues from 900 donors • Novel methods groups: 5 current + RFA
RNAseq combined with other techs • Regulons: TF gene sets via ChIP-seq – Look for trans effects • Open chromatin states (DNase I; methylation) – Find active genes – Changes in epigenetic marks correlated to RNA – Genetic effects • RNA/DNA comparisons – Simultaneous SNP detection/genotyping – RNA editing???