Genetic Meta-Analysis and Mendelian Randomization George Davey Smith

Genetic Meta-Analysis and Mendelian Randomization George Davey Smith MRC Centre for Causal Analyses in Translational Epidemiology, University of Bristol

RCT vs Observational Meta-Analysis: fundamental difference in assumptions
• In meta-analysis of observational studies, confounding, residual confounding and bias:
  – May introduce heterogeneity
  – May lead to misleading (albeit very precise) estimates

Mortality results from 33 trials of beta-blockers in secondary prevention after myocardial infarction
[Forest plot: relative risk (95% confidence interval) by trial, axis from 0.1 to 10]
Trial (Year): Barber (1967), Reynolds (1972), Wilhelmsson (1974), Ahlmark (1974), Multicentre International (1975), Yusuf (1979), Andersen (1979), Rehnqvist (1980), Baber (1980), Wilcox Atenolol (1980), Wilcox Propranolol (1980), Hjalmarson (1981), Norwegian Multicentre (1981), Hansteen (1982), Julian (1982), BHAT (1982), Taylor (1982), Manger Cats (1983), Rehnqvist (1983), Australian-Swedish (1983), Mazur (1984), EIS (1984), Salathia (1985), Roque (1987), LIT (1987), Kaul (1988), Boissel (1990), Schwartz low risk (1992), Schwartz high risk (1992), SSSD (1993), Darasz (1995), Basu (1997), Aronow (1997)
Overall (95% CI): 0.80 (0.74 to 0.86)
Adapted from Freemantle et al BMJ 1999

Results from 29 studies examining the association between intact foreskin and the risk of HIV infection in men
[Forest plot: relative risk (95% confidence interval) by study, axis from 0.2 to 10]
Study: Allen, Barongo, Bollinger, Bwayo, Cameron, Carael, Chao, Chiasson, Diallo, Greenblatt, Grosskurth, Hira, Hunter, Konde-Luc, Kreiss, Malamba, Mehendal, Moss, Nasio, Pepin, Quigley, Sassan, Sedlin, Seed, Simonsen, Tyndall, Urassa 1, Urassa 2, Urassa 3, Urassa 4, Urassa 5, Van de Perre
Adapted from Van Howe Int J STD AIDS 1999

Vitamin E supplement use and risk of Coronary Heart Disease
[Figure not reproduced; axis value 1.0]
Stampfer et al NEJM 1993; 328: 1444-9; Rimm et al NEJM 1993; 328: 1450-6; Eidelman et al Arch Intern Med 2004; 164: 1552-6

Genetic meta-analysis, while of observational data, may be analogous to RCT meta-analysis NOT conventional observational meta-analysis

Clustered environments and randomised genes (93 phenotypes, 23 SNPs)
• Phenotype / phenotype: 4278 pairwise associations; 43% significant
• Phenotype / genotype: 2139 pairwise combinations; 20 significant at p<0.01 vs 21 expected
• Genotype / genotype: 253 pairwise combinations; 4 / 253 significant at p<0.01 vs 3 expected
Davey Smith et al. PLoS Medicine 2007 in press
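The pairwise counts on this slide follow from simple combinatorics, and the "expected" figures from multiplying each count by the p<0.01 threshold. The sketch below is illustrative only; the variable names are assumptions and not part of the original analysis.

```python
from math import comb

n_phenotypes = 93
n_snps = 23
alpha = 0.01  # significance threshold quoted on the slide

# Number of distinct pairs in each comparison class.
pair_counts = {
    "phenotype/phenotype": comb(n_phenotypes, 2),
    "phenotype/genotype": n_phenotypes * n_snps,
    "genotype/genotype": comb(n_snps, 2),
}

# Under the null, a fraction alpha of tests is expected to be significant.
expected_by_chance = {label: count * alpha for label, count in pair_counts.items()}

for label, count in pair_counts.items():
    print(f"{label}: {count} pairs, about {expected_by_chance[label]:.1f} expected by chance")
```

Genotype comparisons produce roughly the number of significant results expected by chance, whereas phenotype comparisons produce far more, which is the contrast the slide draws.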

WTCCC: blood donors versus 1958 birth cohort controls

A leading epidemiologist speaks … “Forget what you learnt at the London School of Hygiene and Tropical Medicine … just get as many cases as possible and a bunch of controls from wherever you can …” Paul McKeigue, Nov 2002

Or the polite version … “This approach allows geneticists to focus on collecting large numbers of cases and controls at low cost, without the strict population-based sampling protocols that are required to minimize selection bias in case-control studies of environmental exposures” Am J Human Genetics 2003; 72: 1492-1504

If not confounding or selection bias, why have genetic association studies such a poor history of replication?

Are genetic association studies replicable?
Hirschhorn et al reviewed 166 putative associations for which there were 3 or more published studies and found that only 6 had been consistently replicated (defined as “achieving statistically significant findings in 75% or more of published studies”)
Hirschhorn JN et al. Genetics in Medicine 2002; 4: 45-61

Reasons for inconsistent genotype – disease associations
True variation
• Variation of allelic association between subpopulations
• Effect modification by other genetic or environmental factors that vary between populations
Spurious variation
• Misclassification of phenotype
• Confounding by population structure
• Lack of power
• Chance
• Publication bias
Colhoun et al, Lancet 2003; 361: 865-72

True variation in genotype and health outcome between populations
• Disease-causing allele is in LD with a different allele at the marker locus in different groups: more likely when the disease-causing variant is rare or has been subject to selection pressure
• Allelic heterogeneity (different variants within the same gene) between ethnic groups
• The association is modified by other genetic or environmental factors that vary between the groups studied: effect modification by genes is unlikely to account for failure to replicate studies in similar populations. Modification by environmental factors is more likely, especially when absolute risk of disease varies

Biases vary between studies
• Differential misclassification of genotypes: avoided by appropriate laboratory procedures
• Differential misclassification of outcome (possible if genotype is known when outcome is classified): unlikely, because outcome is usually confirmed in advance of genotyping

Confounding by population substructure
• Population is divided into strata that vary by disease risk and by allele frequencies at the marker locus
• Unlikely to be a serious problem in most studies: when confounding is a problem, it can be controlled in study design by restriction or use of family-based controls, or in the analysis by quantifying and

Case-mix heterogeneity
• Case-mix heterogeneity in an apparently homogeneous outcome between populations studied: for instance, in a study of stroke, the mix of haemorrhagic and thrombotic subtypes may vary
• Unlikely to be an explanation for failure to replicate studies in similar populations with similar case sampling strategies

Absence of power leading to false-negative results and failure to replicate
• Failure to consider that the initial effect size reported is an inflation of the true effect size
• Replication studies should be powered to detect effect sizes that are smaller than the initial effect size reported, especially when the initial study had low power

The Beavis effect
If the location of a variant and its phenotypic effect size are estimated from the same data sets, the effect size will be over-estimated, in many cases substantially. Statistical significance and the estimated magnitude of the parameter are highly correlated.
H Göring et al. Am J Hum Genetics 2001; 69: 1357-69
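The inflation described here can be illustrated with a small simulation: many underpowered studies estimate the same true effect, and only those reaching significance are "discovered". This is a sketch under assumed values (true effect 0.1, standard error 0.1, two-sided 5% threshold); none of these numbers come from the slides or from Göring et al.

```python
import random
from statistics import mean

def simulate_selected_effects(true_effect=0.1, std_error=0.1,
                              n_studies=20000, z_crit=1.96, seed=1):
    """Return (mean of all estimates, mean of significant estimates,
    fraction of studies that were significant)."""
    rng = random.Random(seed)
    # Each study reports an unbiased but noisy estimate of the same effect.
    estimates = [rng.gauss(true_effect, std_error) for _ in range(n_studies)]
    # Keep only estimates whose z statistic passes the significance threshold.
    significant = [b for b in estimates if abs(b / std_error) > z_crit]
    return mean(estimates), mean(significant), len(significant) / n_studies

mean_all, mean_significant, fraction_significant = simulate_selected_effects()
print(f"mean of all estimates:         {mean_all:.3f}")
print(f"mean of significant estimates: {mean_significant:.3f}")
print(f"fraction significant:          {fraction_significant:.3f}")
```

Averaged over all studies the estimate is close to the true effect, but averaged over the significant subset it is considerably larger, because selection on significance is selection on estimated magnitude.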

False positive results by chance in initial positive studies
Multiple testing and publication bias: multiple loci are assessed in each study, many statistical tests are done, and multiple studies are undertaken but only positive results are reported
Perhaps the most likely reason for failure to replicate?

What is being associated in genetic association studies?
• Estimates of 15 million SNPs in human genome (rare allele frequency >1% in at least one population)
• Large number of outcomes (diseases and subcategories of particular disease outcomes)
• Large number of potential subgroups
• Multiple possible genetic contrasts

What percentage of associations that are studied actually exist? … 1 in 10? (at 80% power, 5% significance level)

Result of experiment              | Polymorphism really is associated with disease | Polymorphism is not associated with disease | Total
Association declared to exist     | 80                                             | 45                                          | 125
Association not declared to exist | 20                                             | 855                                         | 875
Total                             | 100                                            | 900                                         | 1000

Oakes 1986; Davey Smith 1998; Sterne & Davey Smith 2001

Percentage of “significant” results that are false positives if 10% of studied associations actually exist

Power of study (% of time we reject null hypothesis if it is false) | P=0.05 | P=0.01 | P=0.001
20                                                                  | 69.2   | 31.0   | 4.3
50                                                                  | 47.4   | 15.3   | 1.8
80                                                                  | 36.0   | 10.1   | 1.1

Sterne & Davey Smith. BMJ 2001; 322: 226-231

Percentage of “significant” results that are false positives if 1% of studied associations actually exist

Power of study (% of time we reject null hypothesis if it is false) | P=0.05 | P=0.01 | P=0.001
20                                                                  | 96.1   | 83.2   | 33.1
50                                                                  | 90.8   | 66.4   | 16.5
80                                                                  | 86.1   | 55.3   | 11.0

Sterne & Davey Smith. BMJ 2001; 322: 226-231
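The figures in these two tables can be reproduced from the proportion of studied associations that truly exist, the power and the significance threshold. The sketch below uses the same arithmetic as the 2x2 table (for example 45 / 125 = 36%); the function name is an assumption made for illustration.

```python
def false_positive_percentage(prior, power, alpha):
    """Percentage of 'significant' results that are false positives.

    prior: proportion of studied associations that truly exist
    power: probability of declaring a true association significant
    alpha: significance threshold (false-positive rate under the null)
    """
    true_positives = prior * power
    false_positives = (1 - prior) * alpha
    return 100 * false_positives / (true_positives + false_positives)

for prior in (0.10, 0.01):
    print(f"{prior:.0%} of studied associations actually exist")
    for power in (0.2, 0.5, 0.8):
        row = [false_positive_percentage(prior, power, a) for a in (0.05, 0.01, 0.001)]
        print(f"  power {power:.0%}: " + "  ".join(f"{value:5.1f}" for value in row))
```

Lowering the prior from 10% to 1% moves the false-positive share at 80% power and P=0.05 from 36.0% to 86.1%, which is why a low prior probability matters so much in genetic epidemiology.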

P values often misinterpreted in both genetic and conventional epidemiology
• Low prior probability a major issue in genetic epidemiology
• Meaningless (but real) associations a major issue in conventional epidemiology

Why has replication proved to be so difficult?
§ LOW STATISTICAL POWER
§ A consistent feature of almost all analyses
§ Fundamental to many of the explanations or the approach needed to correct for them
§ If we need 5,000 cases to test for a given aetiological effect with a power of 80%, and with a critical p-value of 0.0001, how much power would there

Why has replication proved to be so difficult?
§ LOW STATISTICAL POWER!!
§ A key feature of almost all proffered explanations, and/or of the approach needed to correct for them
§ If we need 5,000 cases to test for a given aetiological effect with a power of 80%, and with a critical p-value of 0.0001, how much power would there be for a study with 500 cases? 0.008
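The 0.008 figure can be recovered with a normal approximation. The sketch below assumes a two-sided test and a test statistic whose expected value scales with the square root of the number of cases (same case-control ratio); these assumptions and the function name are illustrative and not taken from the slides.

```python
from statistics import NormalDist

def power_at_smaller_sample(n_ref, power_ref, alpha, n_new):
    """Power with n_new cases, given that n_ref cases give power_ref at level alpha."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)           # two-sided critical value
    # Expected z statistic that yields power_ref with n_ref cases
    # (the negligible opposite tail is ignored here).
    ncp_ref = z_crit + z.inv_cdf(power_ref)
    # The expected z statistic shrinks with the square root of the sample size.
    ncp_new = ncp_ref * (n_new / n_ref) ** 0.5
    # Probability of exceeding the critical value in either direction.
    return z.cdf(ncp_new - z_crit) + z.cdf(-ncp_new - z_crit)

power_500 = power_at_smaller_sample(n_ref=5000, power_ref=0.80, alpha=0.0001, n_new=500)
print(f"power with 500 cases: {power_500:.3f}")
```

Under these assumptions a tenfold smaller study has under 1% power, in line with the figure on the slide.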

Deducing “true numerical ratios” requires “the greatest possible number of individual values; and the greater the number of these the more effectively will mere chance be eliminated”. Gregor Mendel 1865/6

Association of GNB3 and Hypertension
Bagos et al, J Hypertens March 2007
34 Studies
Cases = 14,094
Controls = 17,760
Total = 31,854

Are genetic association studies replicable: take two?
Joel Hirschhorn’s group selected 25 of the 166 genetic associations that they had studied and performed formal meta-analysis, claiming that 8 of these (one third) were robust.
“One third” claim widely welcomed!
Lohmueller KE et al. Nature Genetics 2003; 33: 177-182

Replicable Studies
• ABCC8, type 2 diabetes: 2.28 (1.27-4.10)
• COL1A1, fracture: 1.59 (1.36-1.86)
• CTLA4, type 1 diabetes: 1.27 (1.17-1.37)
• DRD3, schizophrenia: 1.12 (1.02-1.23)
• GSTM1, head/neck cancer: 1.20 (1.09-1.33)
• HTR2A, schizophrenia: 1.07 (1.01-1.14)
• PPARG, type 2 diabetes: 1.22 (1.08-1.37)
• SLC2A1, type 2 diabetes: 1.76 (1.35-2.31)

Are genetic association studies replicable: take two?
“Low hanging fruit” and a best-case scenario.
Effect size estimates not so widely welcomed …

All Studies Combined: 14,585 cases, 17,968 controls
[Figure values: 1.17, 1.12, 1.13, 1.20, 1.12, 1.14, 1.12, 1.37, 1.14; label TCF 7]
Science, June 1, 2007

Distribution of ORs for 70 Common Disease Variants
[Histogram: % of variants by odds ratio]
Nature, June 7, 2007

For exposures with small effect sizes it is very difficult to exclude confounding and bias in conventional epidemiology, and the level of statistical “significance” does not help.
Statistical deviation from the null is more important in genetic epidemiology.

Mendel on Mendelian randomization
Mendel in 1862
“the behaviour of each pair of differentiating characteristics in hybrid union is independent of the other differences between the two original plants, and, further, the hybrid produces just so many kinds of egg and pollen cells as there are possible constant combination forms”
(Sometimes called Mendel’s second law, the law of independent assortment)
Gregor Mendel, 1865.

Mendelian randomization Genotypes can proxy for some modifiable environmental factors, and there should be no confounding of genotype by behavioural, socioeconomic or physiological factors (excepting those influenced by alleles at closely proximate loci or due to population stratification), no bias due to reverse causation, and lifetime exposure patterns can be captured
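A toy simulation can make this logic concrete. In the sketch below an unmeasured confounder affects both exposure and outcome, while genotype is assigned at random and affects only the exposure. The ratio of the genotype-outcome covariance to the genotype-exposure covariance (a Wald-type ratio, which is an assumption of this sketch and not a method named on the slides) is compared with the ordinary exposure-outcome slope. All parameter values are invented for illustration.

```python
import random

def covariance(xs, ys):
    """Sample covariance of two equally long lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)

def simulate(n=20000, causal_effect=0.5, allele_freq=0.3, seed=7):
    rng = random.Random(seed)
    genotype, exposure, outcome = [], [], []
    for _ in range(n):
        # Genotype: number of effect alleles (0, 1 or 2), independent of the confounder.
        g = (rng.random() < allele_freq) + (rng.random() < allele_freq)
        u = rng.gauss(0, 1)                             # unmeasured confounder
        x = 1.0 * g + u + rng.gauss(0, 1)               # exposure depends on genotype and confounder
        y = causal_effect * x + u + rng.gauss(0, 1)     # outcome depends on exposure and confounder
        genotype.append(g)
        exposure.append(x)
        outcome.append(y)
    # Conventional observational estimate: slope of outcome on exposure (confounded).
    observational = covariance(exposure, outcome) / covariance(exposure, exposure)
    # Genotype-based estimate: ratio of genotype-outcome to genotype-exposure covariance.
    mendelian = covariance(genotype, outcome) / covariance(genotype, exposure)
    return observational, mendelian

observational, mendelian = simulate()
print(f"true causal effect:       0.50")
print(f"observational estimate:   {observational:.2f}")
print(f"genotype-based estimate:  {mendelian:.2f}")
```

In this setup the observational slope overstates the effect because of the confounder, while the genotype-based ratio stays close to the true value, provided the genotype influences the outcome only through the exposure.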

Mendelian randomisation and RCTs
MENDELIAN RANDOMISATION
• RANDOM SEGREGATION OF ALLELES
• EXPOSED: FUNCTIONAL ALLELES
• CONTROL: NULL ALLELES
• CONFOUNDERS EQUAL BETWEEN GROUPS
• OUTCOMES COMPARED BETWEEN GROUPS
RANDOMISED CONTROLLED TRIAL
• RANDOMISATION METHOD
• EXPOSED: INTERVENTION
• CONTROL: NO INTERVENTION
• CONFOUNDERS EQUAL BETWEEN GROUPS
• OUTCOMES COMPARED BETWEEN GROUPS