PGC Worldwide Lab Call Details
DATE: Friday, April 12th, 2013
PRESENTER: Alkes Price, Harvard University
TITLE: “GWAS in multiple ancestries: heritability, association, and replication”
START: We will begin promptly on the hour.
  1000 EDT - US East Coast
  0700 PDT - US West Coast
  1500 BST - UK
  1600 CEST - Central Europe
  0000 AEST - Australia (Saturday, April 13th, 2013)
DURATION: 1 hour
TELEPHONE:
  - US Toll free: 1 866 515.2912
  - International direct: +1 617 399.5126
  - Toll-free number? See http://www.btconferencing.com/globalaccess/?bid=75_public
  - Operators will be on standby to assist with technical issues. “*0” will get you assistance.
  - This conference line can handle up to 300 participants.
PASSCODE: 275 694 38

Lines are Muted NOW Lines have been automatically muted by operators as it is possible for just one person to ruin the call for everyone due to background noise, electronic feedback, crying children, wind, typing, etc. Operators announce callers one at a time during question and answer sessions. Dial *1 if you would like to ask a question of the presenter. Presenter will respond to calls as time allows. Dial *0 if you need operator assistance at any time during the duration of the call.
UPCOMING PGC Worldwide Lab
DATE: Friday, May 10th, 2013
PRESENTER: To Be Announced
TITLE: To Be Announced
START: We will begin promptly on the hour.
  1000 EDT - US East Coast
  0700 PDT - US West Coast
  1500 BST - UK
  1600 CEST - Central Europe
  0000 AEST - Australia (Saturday, May 11th, 2013)
DURATION: 1 hour
TELEPHONE:
  - US Toll free: 1 866 515.2912
  - International direct: +1 617 399.5126
  - Toll-free number? See http://www.btconferencing.com/globalaccess/?bid=75_public
  - Operators will be on standby to assist with technical issues. “*0” will get you assistance.
  - This conference line can handle up to 300 participants.
PASSCODE: 275 694 38

GWAS in multiple ancestries: heritability, association, and replication Alkes L. Price Harvard School of Public Health April 12, 2013
Should East Asian (or African-American) samples be included in a predominantly European GWAS?
Advantages:
• More samples => more power if effect sizes are similar.
Disadvantages:
• Diverse samples => less power if effect sizes are different.
• (Requires careful treatment of population stratification.)
• (Analysis is more complex.)
Are effect sizes similar or different in diverse samples? How can we quantify this?

What does “effect sizes” mean in “are effect sizes similar or different”?
• Causal SNPs at genome-wide significant loci -- Hypothesis: same effect sizes at causal SNPs
• Top associated SNP at genome-wide significant loci -- This is what matters most in multi-ethnic GWAS
• All genotyped SNPs at genome-wide significant loci -- Less similar effect sizes due to different LD patterns
• All genotyped SNPs in the genome -- Again, less similar effect sizes due to different LD patterns

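One way to see why different LD patterns make tag-SNP effect sizes less similar, even under the hypothesis of identical causal effect sizes, is the relation sketched below. It rests on simplifying assumptions that are not stated on the slide: a single causal SNP at the locus, standardized genotypes, and an additive linear model (for log odds ratios it holds only approximately, and allele frequency differences also matter). The symbols \beta_causal and r^{(k)} (the LD correlation between tag and causal SNP in population k) are illustrative.

```latex
\beta_{\text{tag}}^{(k)} = r^{(k)} \, \beta_{\text{causal}},
\qquad k \in \{\text{EUR}, \text{EAS}\}
\quad\Longrightarrow\quad
\frac{\beta_{\text{tag}}^{(\text{EAS})}}{\beta_{\text{tag}}^{(\text{EUR})}}
  = \frac{r^{(\text{EAS})}}{r^{(\text{EUR})}}
```

Under these assumptions, a tag SNP with r = 0.9 in Europeans and r = 0.45 in East Asians would show half the marginal effect in East Asians although the causal effect is the same.
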
Outline
1. Cross-ethnic replication: are effect sizes similar or different?
2. GWAS in multiple ancestries: should people do it?
3. GWAS in multiple ancestries: how should people do it?
4. Heritability: another approach to similar vs. different effect sizes.

Things I am not planning to talk about today
• Use of local ancestry information in admixed populations. (Pasaniuc et al. 2011 PLoS Genet; Shriner et al. 2011 PLoS Comp Biol; reviewed in Seldin et al. 2011 Nat Rev Genet)
• Fine-mapping using multiple ancestries. (Zaitlen et al. 2010 Am J Hum Genet; Franceschini et al. 2012 Am J Hum Genet; Peters et al. 2013 PLoS Genet; Wu et al. 2013 PLoS Genet)
• Polygenic risk prediction in Europeans or multiple ancestries. (Purcell et al. 2009 Nature; Ripke et al. 2011 Nat Genet; Chatterjee et al. 2013 Nat Genet)

Cross-ethnic replication: let’s look at replication in Europeans first
Slope of log(OR) regression for replication in Europeans: SNPs at P < 5 x 10^-8 in training sample.
[Figure: log(odds ratio) in test sample (Umea) vs. log(odds ratio) in training sample]
Disclaimer: all results preliminary

Cross-ethnic replication: let’s look at replication in Europeans first
Slope of log(OR) regression for replication in Europeans: SNPs at P < 5 x 10^-8 in training sample.
[Figure: log(odds ratio) in test sample (Bulgarian trios) vs. log(odds ratio) in training sample]
Disclaimer: all results preliminary

Cross-ethnic replication: let’s look at replication in Europeans first
Slope of log(OR) regression for replication in Europeans, as a function of training sample P-value threshold:
P-val thresh: 10^-7, 5 x 10^-8, 10^-9, 10^-10, 10^-11
Umea slope: 0.74, 0.81, 0.68, 0.72, 0.77, 0.61
BUTR slope: 0.64, 0.62, 0.67, 0.70, 0.57, 0.76
Ice slope: 0.67, 0.73, 0.75, 0.77, 1.03, 0.97
Noice slope: 0.64, 0.62, 0.67, 0.74, 0.75
Disclaimer: all results preliminary

Cross-ethnic replication: let’s look at replication in Europeans first
Slope of log(OR) regression for replication in Europeans, as a function of training sample P-value threshold:
P-val thresh: 10^-7, 5 x 10^-8, 10^-9, 10^-10, 10^-11
AVG slope: 0.69, 0.73, 0.78, 0.77, 0.67
(Note: slope of log(OR) regression is not affected by replication sample size.)
Why is slope < 1?
• Winner’s curse (effect sizes overestimated in training sample) (Zollner & Pritchard 2007 Am J Hum Genet; Zhong & Prentice 2010 Genet Epidemiol)
• True heterogeneity across cohorts (Lee et al. 2012 Nat Genet: genetic correlations ~0.85)
Disclaimer: all results preliminary

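The winner's curse explanation can be illustrated with a small simulation. This is a minimal sketch, not the analysis behind the slides: it assumes true log(OR) values drawn from a normal distribution, independent Gaussian estimation noise in each cohort, no true heterogeneity, and a regression through the origin. All parameter values (`tau`, `se_train`, `se_test`) are hypothetical.

```python
import random


def replication_slope(n_snps=50_000, tau=0.05, se_train=0.02, se_test=0.04,
                      z_threshold=5.45, seed=1):
    """Simulate training/test log(OR) estimates and return (slope, n_selected).

    True effects are drawn from N(0, tau^2); each cohort adds independent
    Gaussian noise. Only SNPs passing the training z threshold are kept
    (|z| > 5.45 corresponds to two-sided P < 5e-8).
    """
    rng = random.Random(seed)
    sum_xy = 0.0
    sum_xx = 0.0
    n_selected = 0
    for _ in range(n_snps):
        beta = rng.gauss(0.0, tau)                 # true log(OR)
        b_train = beta + rng.gauss(0.0, se_train)  # training estimate
        if abs(b_train / se_train) < z_threshold:
            continue                               # not genome-wide significant
        b_test = beta + rng.gauss(0.0, se_test)    # replication estimate
        sum_xy += b_train * b_test
        sum_xx += b_train * b_train
        n_selected += 1
    # Least-squares slope of test on training estimates, through the origin.
    return sum_xy / sum_xx, n_selected


slope, n_selected = replication_slope()
expected = 0.05 ** 2 / (0.05 ** 2 + 0.02 ** 2)  # shrinkage factor under this model
print(f"{n_selected} SNPs selected, slope = {slope:.2f}, expected = {expected:.2f}")
```

In this model the expected slope is tau^2 / (tau^2 + se_train^2), which is below 1 even though the true effects are identical in both cohorts. The replication noise `se_test` changes only the scatter around the line, which is consistent with the note that the slope is not affected by replication sample size.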
Cross-ethnic replication: replication in East Asians
Slope of log(OR) regression for replication in East Asians: SNPs at P < 5 x 10^-8 in training sample.
[Figure: log(odds ratio) in test sample (East Asians) vs. log(odds ratio) in training sample]
Plot annotation (???): not an error; low MAF in East Asians.
Disclaimer: all results preliminary

Cross-ethnic replication: replication in East Asians
Slope of log(OR) regression for replication in East Asians, as a function of training sample P-value threshold:
P-val thresh: 10^-7, 5 x 10^-8, 10^-9, 10^-10, 10^-11
Asian slope: 0.40, 0.41, 0.64, 0.58, 0.52, 0.40
(Note: slope of log(OR) regression is not affected by replication sample size.)
Disclaimer: all results preliminary

Cross-ethnic replication: replication in African Americans Work in progress …
GWAS in multiple ancestries: should people do multi-ethnic meta-analysis?
• Of course GWAS should be conducted in diverse populations, since some loci may not be discovered in Europeans. (Unoki et al. 2008 Nat Genet; Yasuda et al. 2008 Nat Genet: KCNQ1 T2D locus. Also see Rosenberg et al. 2010 Nat Rev Genet; Bustamante et al. 2011 Nature)
• The real question is whether to meta-analyze GWAS of multiple ancestries at the discovery stage, before fine-mapping. (More samples => more power if effect sizes are similar. Diverse samples => less power if effect sizes are different.)
Many previous authors have chosen multi-ethnic meta-analysis (Morris et al. 2012 Nat Genet; Estrada et al. 2012 Nat Genet; Franceschini et al. 2012 Am J Hum Genet; Lu et al. 2013 Nat Genet; Fritsche et al. 2013 Nat Genet)

GWAS in multiple ancestries: should people do multi-ethnic meta-analysis?
Many previous studies have chosen multi-ethnic meta-analysis (Morris et al. 2012 Nat Genet; Estrada et al. 2012 Nat Genet; Franceschini et al. 2012 Am J Hum Genet; Lu et al. 2013 Nat Genet; Fritsche et al. 2013 Nat Genet)
Caveat: reports of similar effect sizes across populations may not represent a random sample if those associated SNPs were discovered in a multi-ethnic meta-analysis, which is primarily well-powered to detect associated SNPs with similar effect sizes across populations.
But many studies have also reported partial replication in multi-ethnic samples of variants discovered in Europeans. (e.g. Waters et al. 2010 PLoS Genet; N’Diaye et al. 2011 PLoS Genet)

GWAS in multiple ancestries: should people do multi-ethnic meta-analysis?
• Of course GWAS should be conducted in diverse populations, since some loci may not be discovered in Europeans. (Unoki et al. 2008 Nat Genet; Yasuda et al. 2008 Nat Genet: KCNQ1 T2D locus. Also see Rosenberg et al. 2010 Nat Rev Genet; Bustamante et al. 2011 Nature)
• The real question is whether to meta-analyze GWAS of multiple ancestries at the discovery stage, before fine-mapping.
If slope of cross-ethnic log(odds ratio) regression is high: YES
If slope of cross-ethnic log(odds ratio) regression is low: NO

Should people do multi-ethnic meta-analysis? An example
Example: 45K European samples, 5K non-European samples, slope* of cross-ethnic log(odds ratio) = 0.6 (*: corrected for winner’s curse)
By including 5K non-European samples in meta-analysis:
• Sample size increases by a factor of 1.111.
• log(odds ratio) decreases by a factor of 0.960.
• Sample size * [log(odds ratio)]^2 (which determines power) increases by a factor of 1.111 * [0.960]^2 = 1.024.
(“worst-case” computation for SNPs associated in Europeans)

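The arithmetic of this example can be checked with a few lines of Python. This is a minimal sketch; the function name `power_factor` and its argument names are illustrative, and the average effect is taken as the sample-size-weighted mean of the European effect (1) and the non-European effect (`slope`), as the numbers in the example imply.

```python
def power_factor(n_eur, n_non_eur, slope):
    """Return (sample-size factor, log(OR) factor, power-determining factor)."""
    n_total = n_eur + n_non_eur
    size_factor = n_total / n_eur
    # Sample-size-weighted average effect, relative to the European effect.
    effect_factor = (n_eur + slope * n_non_eur) / n_total
    return size_factor, effect_factor, size_factor * effect_factor ** 2


size_f, effect_f, power_f = power_factor(45_000, 5_000, 0.6)
print(f"{size_f:.3f} {effect_f:.3f} {power_f:.3f}")  # 1.111 0.960 1.024
```

Calling the function with other values of `slope` shows how quickly the gain disappears: with the same sample sizes and a slope of 0.4, the factor drops below 1.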
Should people do multi-ethnic meta-analysis? Yes if slope of cross-ethnic log(odds ratio) ≥ 0.5
In general: N1 European samples, N2 non-European samples, slope* of cross-ethnic log(odds ratio) = s (*: corrected for winner’s curse)
By including N2 non-European samples in meta-analysis:
• Sample size increases by a factor of (N1 + N2)/N1.
• log(odds ratio) decreases by a factor of (N1 + s N2)/(N1 + N2).
• Sample size * [log(odds ratio)]^2 (which determines power) increases by a factor of (N1 + s N2)^2 / [N1 (N1 + N2)].
Conclusion: power will increase if and only if s > [sqrt(N1 (N1 + N2)) - N1] / N2 (≈ 0.5 when N1 >> N2).
(“worst-case” computation for SNPs associated in Europeans)

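The break-even slope follows from requiring the power factor on this slide to exceed 1. The derivation below is a sketch that assumes s ≥ 0, so that both sides of the inequality can be square-rooted, and uses a first-order expansion for N1 >> N2.

```latex
\frac{(N_1 + sN_2)^2}{N_1 (N_1 + N_2)} > 1
\iff N_1 + sN_2 > \sqrt{N_1 (N_1 + N_2)}
\iff s > \frac{\sqrt{N_1 (N_1 + N_2)} - N_1}{N_2}

\sqrt{N_1 (N_1 + N_2)} = N_1 \sqrt{1 + N_2 / N_1} \approx N_1 + \frac{N_2}{2}
\quad (N_1 \gg N_2)
\quad\Longrightarrow\quad s \gtrsim \frac{1}{2}
```

For N1 = 45K and N2 = 5K, the exact threshold is about 0.487, slightly below the limiting value of 0.5.
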
GWAS in multiple ancestries: how should people do it?
Option #1: Single mega-analysis with union of all samples.
Option #2: Analyze each ethnicity separately, then do meta-analysis using weighted z-scores.
Option #3: Analyze each ethnicity separately, then do meta-analysis using inverse-variance weighting.

GWAS in multiple ancestries: minimizing dangers of population stratification
Option #1: Single mega-analysis with union of all samples.
  -- Subtle within-continent ancestries harder to distinguish in PCA of samples with diverse continental ancestries.
Option #2: Analyze each ethnicity separately, then do meta-analysis using weighted z-scores.
  -- OK for minimizing population stratification.
Option #3: Analyze each ethnicity separately, then do meta-analysis using inverse-variance weighting.
  -- OK for minimizing population stratification.

GWAS in multiple ancestries: maximizing power
Option #1: Single mega-analysis with union of all samples.
  -- Lose power if effect sizes same, allele freq different.
Option #2: Analyze each ethnicity separately, then do meta-analysis using weighted z-scores.
  -- Lose power if effect sizes same, allele freq different.
Option #3: Analyze each ethnicity separately, then do meta-analysis using inverse-variance weighting.
  -- OK for maximizing power.

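Options #2 and #3 can be sketched with the standard fixed-effect formulas. This is an illustrative sketch, not the code used for the talk: the function names are hypothetical, the weighted z-score method is assumed to use sqrt(N) weights, and the two input studies are made-up numbers chosen so that both studies have the same log(OR) but the smaller study is more informative per sample (as happens when its allele frequency is higher).

```python
import math


def weighted_z_meta(z_scores, sample_sizes):
    """Combine per-study z-scores with weights sqrt(N_i)."""
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    return numerator / math.sqrt(sum(w * w for w in weights))


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect inverse-variance-weighted estimate; returns (beta, se, z)."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, se, beta / se


# Hypothetical studies with the same log(OR) = 0.4. The smaller study has a
# higher allele frequency, so its standard error matches the larger study's.
betas, ses, ns = [0.4, 0.4], [0.1, 0.1], [400, 100]
z_scores = [b / s for b, s in zip(betas, ses)]
z_weighted = weighted_z_meta(z_scores, ns)
_, _, z_ivw = inverse_variance_meta(betas, ses)
print(f"weighted z = {z_weighted:.2f}, inverse-variance z = {z_ivw:.2f}")
```

With these inputs the weighted z-score is about 5.37 and the inverse-variance z-score about 5.66. When standard errors scale as 1/sqrt(N) (for example `ses = [0.1, 0.2]` with the same sample sizes), the two methods give the same z-score; they diverge only when sample size is a poor proxy for information, which is the situation described on the slide.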
GWAS in multiple ancestries: maximizing power: an example
Example:
  20K European cases:        p = 0.015 (odds ratio = 1.5)
  20K European controls:     p = 0.01
  2K non-European cases:     p = 0.6 (odds ratio = 1.5)
  2K non-European controls:  p = 0.5
Combined sample:
  22K cases:                 p = 0.0682 (odds ratio = 1.27)
  22K controls:              p = 0.0545
Option #1 (Mega-analysis): χ^2 = 35.49, P-value = 3 x 10^-9
Option #2 (Meta-analysis, weighted z-scores): χ^2 = 31.49, P-value = 2 x 10^-8
Option #3 (Meta-analysis, inverse-variance weighting): χ^2 = 44.69, P-value = 2 x 10^-11

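The combined-sample frequencies and odds ratio in this example can be reproduced directly from the per-population numbers. The sketch below does only that; it does not attempt to reproduce the χ^2 values, which depend on test details not given on the slide. The helper names are illustrative.

```python
def odds_ratio(p_case, p_control):
    """Allelic odds ratio from case and control allele frequencies."""
    return (p_case / (1 - p_case)) / (p_control / (1 - p_control))


def pooled_frequency(sizes, freqs):
    """Sample-size-weighted allele frequency across populations."""
    return sum(n * p for n, p in zip(sizes, freqs)) / sum(sizes)


# Numbers from the example, ordered as (Europeans, non-Europeans).
case_sizes, case_freqs = [20_000, 2_000], [0.015, 0.6]
control_sizes, control_freqs = [20_000, 2_000], [0.01, 0.5]

p_case = pooled_frequency(case_sizes, case_freqs)
p_control = pooled_frequency(control_sizes, control_freqs)
pooled_or = odds_ratio(p_case, p_control)
print(f"pooled cases p = {p_case:.4f}, pooled controls p = {p_control:.4f}")
print(f"pooled odds ratio = {pooled_or:.2f}")
```

Both populations have an odds ratio of about 1.5, yet pooling samples with very different allele frequencies gives an odds ratio of about 1.27, which is one way to see why the mega-analysis loses power here.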
Heritability explained by genotyped SNPs (h_g^2) Yang et al. 2010 Nat Genet; also see Purcell et al. 2009 Nature
Cross-trait h_g^2 Lee et al. 2012 Nat Genet; Lee et al. 2012 Bioinformatics
Cross-population h_g^2
• Employ same methods as Lee et al. 2012 papers.
• A few thousand samples from each population is sufficient. (Yang et al. 2010 Nat Genet; Lee et al. 2012 Bioinformatics)
• Quantifying genome-wide sharing of effect sizes across multi-ethnic populations will yield important clues about disease architecture.

Acknowledgements • Bjarni Vilhjalmsson, HSPH • Stephan Ripke, MGH • PGC Consortium members who have contributed to recent discussions on GWAS in multiple ancestries.