Multiple Comparisons; Measures of LD
Jess Paulus, Sc.D.
January 29, 2013

Today’s topics
1. Multiple comparisons
2. Measures of linkage disequilibrium
   • D’ and r²
   • r² and power

Multiple testing & significance thresholds
• Concern about multiple testing
   • Standard thresholds (p < 0.05) will lead to a large number of “significant” results
   • Vast majority of which are false positives
• Various approaches to handling this statistically

Possible Errors in Statistical Inference

                                Unobserved Truth in the Population
Observed in the Sample          Ha: SNP prevents DM                  H0: No association
Reject H0: SNP prevents DM      True positive (1 – β)                False positive, Type I error (α)
Fail to reject H0: No assoc.    False negative, Type II error (β)    True negative (1 – α)

Probability of Errors
• α = probability of Type I error: rejecting the null hypothesis when it is in fact true (false positive), typically 5%
   • Also known as: “level of significance”
• p value = the probability of obtaining a result as extreme or more extreme than you found in your study by chance alone (that is, if the null hypothesis is true)

Type I Error (α) in Genetic and Molecular Research
A genome-wide association scan of 500,000 SNPs will yield:
• 25,000 false positives by chance alone using α = 0.05
• 5,000 false positives by chance alone using α = 0.01
• 500 false positives by chance alone using α = 0.001
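The counts on this slide are the number of tests multiplied by α. A minimal sketch of that arithmetic follows, assuming (as the slide implicitly does) that none of the SNPs is truly associated; the function name is illustrative and does not come from the slides.

```python
def expected_false_positives(num_tests: int, alpha: float) -> float:
    """Expected number of false positives when every null hypothesis is true.

    Assumption: no SNP is truly associated, so each test has probability
    alpha of being declared significant by chance.
    """
    return num_tests * alpha


if __name__ == "__main__":
    # Reproduce the three rows of the slide for a 500,000-SNP scan.
    for alpha in (0.05, 0.01, 0.001):
        count = expected_false_positives(500_000, alpha)
        print(f"alpha = {alpha}: about {count:,.0f} false positives expected")
```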

Multiple Comparisons Problem
• Multiple comparisons (or "multiple testing") problem occurs when one considers a set, or family, of statistical inferences simultaneously
• Type I errors are more likely to occur
• Several statistical techniques have been developed to attempt to adjust for multiple comparisons
   • Bonferroni adjustment
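To see why Type I errors become more likely, one can compute the chance of at least one false positive across a family of tests. The sketch below assumes the tests are independent and all null hypotheses are true; the slides do not state that assumption, and the function name is illustrative.

```python
def family_wise_error_rate(num_tests: int, alpha: float) -> float:
    """Probability of at least one Type I error across num_tests tests.

    Assumption: the tests are independent and every null hypothesis is true,
    so the chance of no false positive at all is (1 - alpha) ** num_tests.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    for m in (1, 10, 20, 100):
        rate = family_wise_error_rate(m, 0.05)
        print(f"{m:>3} tests at alpha = 0.05: P(at least one false positive) = {rate:.3f}")
```

With a single test the rate equals α; it climbs quickly as tests are added, which is the motivation for adjustments such as Bonferroni.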

Adjusting alpha
• Standard Bonferroni correction
   • Test each SNP at the α* = α/m₁ level
   • Where m₁ = number of markers tested
   • Assuming m₁ = 500,000, a Bonferroni-corrected threshold of α* = 0.05/500,000 = 1 × 10⁻⁷
• Conservative when the tests are correlated
• Permutation or simulation procedures may increase power by accounting for test correlation
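The Bonferroni threshold above can be sketched in a few lines. The function names and the example p-values are illustrative assumptions, not taken from the slides; the comparison uses p ≤ α*, one common convention.

```python
def bonferroni_threshold(alpha: float, num_tests: int) -> float:
    """Per-test significance level alpha* = alpha / m1."""
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    return alpha / num_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p-value as significant at the Bonferroni-corrected level.

    The number of tests is taken to be the number of p-values supplied.
    """
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    # The slide's example: 500,000 markers at alpha = 0.05.
    print(bonferroni_threshold(0.05, 500_000))
    # Hypothetical p-values for three markers (threshold is 0.05 / 3).
    print(significant_after_bonferroni([0.001, 0.02, 0.04]))
```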

Measures of LD
Jess Paulus, Sc.D.
January 29, 2013

Haplotype definition
• Haplotype: an ordered sequence of alleles at a subset of loci along a chromosome
• Moving from examining single genetic markers to sets of markers

Measures of linkage disequilibrium
[Figure: chromosomes carrying two-locus haplotypes, e.g. a g, A G, A G, A g, a g, A G, A G, A G, a g]
• Basic data: table of haplotype frequencies

        A        a
G       8        0        50%
g       2        6        50%
        62.5%    37.5%
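A table like this is built by tallying observed haplotypes. The sketch below assumes a list of 16 haplotypes chosen to match the counts in the table (8 AG, 2 Ag, 6 ag, no aG); the list is reconstructed for illustration, since the figure survives only in part, and the function name is hypothetical.

```python
from collections import Counter

# Assumed data: 16 haplotypes matching the table's counts (not the original figure).
haplotypes = ["AG"] * 8 + ["Ag"] * 2 + ["ag"] * 6


def haplotype_table(haps):
    """Return haplotype counts, haplotype frequencies, and the marginal
    frequencies of allele A (first locus) and allele G (second locus)."""
    counts = Counter(haps)
    total = sum(counts.values())
    freqs = {hap: count / total for hap, count in counts.items()}
    p_A = sum(freq for hap, freq in freqs.items() if hap[0] == "A")
    p_G = sum(freq for hap, freq in freqs.items() if hap[1] == "G")
    return counts, freqs, p_A, p_G


if __name__ == "__main__":
    counts, freqs, p_A, p_G = haplotype_table(haplotypes)
    print(dict(counts))
    print(f"freq(A) = {p_A:.3f}, freq(G) = {p_G:.3f}")
```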

D’ and r² are most common
• Both measure correlation between two loci
• D prime …
   • Ranges from 0 [no LD] to 1 [complete LD]
• R squared …
   • also ranges from 0 to 1
   • is the squared correlation between alleles on the same chromosome

D
• Deviation of the observed frequency of a haplotype from the frequency expected if the loci were independent is a quantity called the linkage disequilibrium (D)
• If two alleles are in LD, it means D ≠ 0
   • If D’ = 1, there is complete dependency between loci
   • Linkage equilibrium means D = 0
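As a small illustration of this definition, D can be computed as the observed haplotype frequency minus the product of the two allele frequencies. The function and argument names are assumptions for this sketch; the input values come from the 16-haplotype table shown earlier in the deck.

```python
def linkage_disequilibrium_d(p_AG: float, p_A: float, p_G: float) -> float:
    """D = observed frequency of haplotype AG minus the frequency expected
    under independence of the two loci (p_A * p_G)."""
    return p_AG - p_A * p_G


if __name__ == "__main__":
    # From the haplotype table: freq(AG) = 8/16, freq(A) = 10/16, freq(G) = 8/16.
    d = linkage_disequilibrium_d(8 / 16, 10 / 16, 8 / 16)
    print(f"D = {d}")
    # Under linkage equilibrium the observed frequency equals the product, so D = 0.
    print(linkage_disequilibrium_d(0.625 * 0.5, 0.625, 0.5))
```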

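Measures of LD from a 2 × 2 table of haplotype counts

        G       g
A       n11     n01     n1
a       n10     n00     n0
        n1      n0

Measure            Formula    Ref.
D’                 [lost]     Lewontin (1964)
r²                 [lost]     Hill and Weir (1994)
[symbol lost]*     [lost]     Levin (1953)
[symbol lost]      [lost]     Edwards (1963)
Q                  [lost]     Yule (1900)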

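[Figure: chromosomes carrying two-locus haplotypes, e.g. a g, A G, A G, A g, a g, A G, A G, A G, a g]

        A        a
G       8        0        50%
g       2        6        50%
        62.5%    37.5%

$D' = \dfrac{8 \times 6 - 0 \times 2}{8 \times 6} = 1$

$R^2 = r^2 = \dfrac{(8 \times 6 - 0 \times 2)^2}{10 \times 6 \times 8 \times 8} = 0.6$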
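The worked example can be checked with a short sketch that computes D’ and r² from the four haplotype counts. It uses the standard frequency-based definitions (D’ = |D| / D_max and r² = D² / (p_A p_a p_G p_g)), which give the same numbers as the count-based fractions on the slide; the function and variable names are illustrative.

```python
def ld_measures(n_AG: int, n_Ag: int, n_aG: int, n_ag: int):
    """Return (D, D', r^2) for two biallelic loci from haplotype counts."""
    n = n_AG + n_Ag + n_aG + n_ag
    p_A = (n_AG + n_Ag) / n  # frequency of allele A
    p_G = (n_AG + n_aG) / n  # frequency of allele G
    d = n_AG / n - p_A * p_G

    # D_max depends on the sign of D (Lewontin's normalisation).
    if d >= 0:
        d_max = min(p_A * (1 - p_G), (1 - p_A) * p_G)
    else:
        d_max = min(p_A * p_G, (1 - p_A) * (1 - p_G))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    denom = p_A * (1 - p_A) * p_G * (1 - p_G)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r_squared


if __name__ == "__main__":
    # Counts from the slide's table: AG = 8, Ag = 2, aG = 0, ag = 6.
    d, d_prime, r_squared = ld_measures(8, 2, 0, 6)
    print(f"D = {d}, D' = {d_prime}, r^2 = {r_squared:.2f}")
```

Here D’ reaches 1 because one of the four haplotypes (aG) is absent, while r² stays below 1 because the allele frequencies at the two loci differ.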

r² and power
• r² is directly related to study power
   • A low r² means a large sample size is required to detect the LD between the markers
   • r²*N is the “effective sample size”
• If a marker M and causal gene G are in LD, then a study with N cases and controls which measures M (but not G) will have the same power to detect an association as a study with r²*N cases and controls that directly measured G

r² and power
• Example:
   • N = 1000 (500 cases and 500 controls)
   • r² = 0.4
   • If you had genotyped the causal gene directly, you would only need a total N = 400 (200 cases and 200 controls)
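The effective sample size relationship is a single multiplication, and its inverse gives the sample size needed when only the marker is genotyped. A minimal sketch with illustrative function names:

```python
def effective_sample_size(n_marker_study: float, r_squared: float) -> float:
    """Size of a study genotyping the causal variant directly that has the
    same power as a study of n_marker_study subjects genotyping the marker."""
    return r_squared * n_marker_study


def required_marker_sample_size(n_direct_study: float, r_squared: float) -> float:
    """Inverse relationship: subjects needed when genotyping the marker to
    match a direct study of n_direct_study subjects."""
    if r_squared <= 0:
        raise ValueError("r_squared must be positive")
    return n_direct_study / r_squared


if __name__ == "__main__":
    # The slide's example: N = 1000 and r^2 = 0.4.
    print(effective_sample_size(1000, 0.4))
    print(required_marker_sample_size(400, 0.4))
```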

Today’s topics
1. Multiple comparisons
2. Measures of linkage disequilibrium
   • D’ and r²
   • r² and power