Association analysis Shaun Purcell Boulder Twin Workshop 2004
Overview • Candidate gene association • Haplotypes and linkage disequilibrium • Linkage and association • Family-based association
What is association? • Categorical traits – disease susceptibility genes • Continuous traits – quantitative trait loci, QTL
Disease traits
Is there a difference in allele/genotype frequency between cases and controls?
      Case  Control
AA    n1    n2
Aa    n3    n4
aa    n5    n6
Disease traits
Is there a difference in allele/genotype frequency between cases and controls?
      Case  Control
AA    30    25
Aa    50    50
aa    20    25
Test for independence: χ², p-value
Disease traits
General model (2 df)
      Case      Control
AA    n1        n2
Aa    n3        n4
aa    n5        n6
Additive model (1 df)
      Case      Control
A     2n1+n3    2n2+n4
a     2n5+n3    2n6+n4
Dominant model for A (1 df)
      Case      Control
A*    n1+n3     n2+n4
aa    n5        n6
Effect sizes calculated as odds ratios
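The three tables above can be tested with an ordinary Pearson chi-square. The sketch below is a minimal illustration, not the workshop's software: the counts are illustrative, the helper names `chi_square` and `odds_ratio` are hypothetical, and using the Pearson statistic for the test of independence is an assumption.

```python
# Pearson chi-square tests for a case/control genotype table.
# Counts are illustrative only (order: AA, Aa, aa).
case = [30, 50, 20]      # n1, n3, n5
control = [25, 50, 25]   # n2, n4, n6

def chi_square(table):
    """Pearson chi-square for a contingency table given as a list of rows."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat

def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

n1, n3, n5 = case
n2, n4, n6 = control

# General model: 2 x 3 genotype table, 2 df.
general = chi_square([case, control])

# Additive model: allele counts (A, a) per group, 1 df.
allele_table = [[2 * n1 + n3, 2 * n5 + n3], [2 * n2 + n4, 2 * n6 + n4]]
additive = chi_square(allele_table)

# Dominant model for A: carriers (AA or Aa) versus aa, 1 df.
dom_table = [[n1 + n3, n5], [n2 + n4, n6]]
dominant = chi_square(dom_table)

or_allele = odds_ratio(*allele_table[0], *allele_table[1])
or_dominant = odds_ratio(*dom_table[0], *dom_table[1])
print(general, additive, dominant, or_allele, or_dominant)
```

Collapsing genotypes before testing is what buys the 1 df tests: fewer parameters, at the price of assuming the additive or dominant model holds.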
Quantitative traits
Y = aA + dD + e
ID    Y      G    A    D
001   0.34   aa   -1   0
002   1.23   Aa   0    1
003   1.66   Aa   0    1
004   2.74   AA   1    0
005   1.33   AA   1    0
…     …      …    …    …
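As a rough sketch of how the coding works, the code below maps genotypes to the A and D scores and estimates the effects from genotype means. It assumes an intercept m (not written on the slide) and uses only the five example rows; `fit_additive_dominance` is a hypothetical helper name.

```python
# Genotype coding and genotype-mean estimates for Y = m + a*A + d*D + e.
# The intercept m is an assumption added here; data are the five example rows.
CODING = {"AA": (1, 0), "Aa": (0, 1), "aa": (-1, 0)}   # genotype -> (A, D)

data = [("001", 0.34, "aa"), ("002", 1.23, "Aa"), ("003", 1.66, "Aa"),
        ("004", 2.74, "AA"), ("005", 1.33, "AA")]

def genotype_means(rows):
    """Mean trait value per genotype."""
    sums, counts = {}, {}
    for _id, y, g in rows:
        sums[g] = sums.get(g, 0.0) + y
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}

def fit_additive_dominance(rows):
    """With three genotype groups and three parameters, the least-squares
    fit reproduces the genotype means, so m, a, d follow directly."""
    mu = genotype_means(rows)
    m = (mu["AA"] + mu["aa"]) / 2      # midpoint of the two homozygotes
    a = (mu["AA"] - mu["aa"]) / 2      # additive effect
    d = mu["Aa"] - m                   # dominance deviation
    return m, a, d

m, a, d = fit_additive_dominance(data)
print(m, a, d)
```

Setting d = 0 gives the purely additive model, in which the heterozygote mean sits exactly midway between the homozygotes.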
Some web resources
• BGIM http://statgen.iop.kcl.ac.uk/bgim/ Introductory tutorials on twin analysis, primer on maximum likelihood, Mx language.
• GxE moderator models http://statgen.iop.kcl.ac.uk/gxe/
• Power calculation http://statgen.iop.kcl.ac.uk/gpc/
• Case/control association tools http://statgen.iop.kcl.ac.uk/gpc/model/
Relative risk
Genotype   P(D|G)    RR
AA         P(D|AA)   RR(AA) = P(D|AA) / P(D|aa)
Aa         P(D|Aa)   RR(Aa) = P(D|Aa) / P(D|aa)
aa         P(D|aa)   1
Genetic models
Model            RR(Aa)   RR(AA)
General          x        y
Multiplicative   x        x²
Dominant         x        x
Recessive        1.000    x
No effect        1.000    1.000
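The model table can be written as a small lookup, and the relative risks converted to penetrances once an allele frequency and a disease prevalence are fixed. This is a sketch under stated assumptions: Hardy-Weinberg genotype frequencies in the population are assumed, and the function names are hypothetical.

```python
def relative_risks(model, x=1.0, y=1.0):
    """Return (RR(Aa), RR(AA)), both relative to aa, for a named model."""
    if model == "general":
        return x, y
    if model == "multiplicative":
        return x, x * x
    if model == "dominant":
        return x, x
    if model == "recessive":
        return 1.0, x
    if model == "none":
        return 1.0, 1.0
    raise ValueError(f"unknown model: {model}")

def penetrances(p, prevalence, rr_het, rr_hom):
    """P(D|aa), P(D|Aa), P(D|AA) for allele-A frequency p.

    Assumes Hardy-Weinberg proportions, so that
    prevalence = P(D|aa) * (q^2 + 2pq*RR(Aa) + p^2*RR(AA)).
    """
    q = 1.0 - p
    f_aa = prevalence / (q * q + 2 * p * q * rr_het + p * p * rr_hom)
    return f_aa, f_aa * rr_het, f_aa * rr_hom

rr_het, rr_hom = relative_risks("multiplicative", x=2.0)
f_aa, f_Aa, f_AA = penetrances(0.3, 0.05, rr_het, rr_hom)
print(rr_het, rr_hom, f_aa, f_Aa, f_AA)
```

Each restricted model is the general model with one constraint on (x, y), which is why it can be tested against the general model with 1 df.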
Tests
Test                                          Alternate        Null
Any effect?                                   General          No effect
Any effect assuming a multiplicative gene?    Multiplicative   No effect
Any effect assuming a dominant gene?          Dominant         No effect
Any effect assuming a recessive gene?         Recessive        No effect
Can we assume a multiplicative effect?        General          Multiplicative
Can we assume a dominant effect?              General          Dominant
Can we assume a recessive effect?             General          Recessive
Multiple samples • Constrain frequencies across samples • Constrain effects across samples – Can test genetic models with effects and/or frequencies constrained to be equal – Can perform tests of homogeneity of effects and/or frequencies across samples
An example
2 case/control samples • Population frequency 5%
Sample 1
      Case  Control
AA    17    11
Aa    35    59
aa    24    40
Sample 2
      Case  Control
AA    37    10
Aa    67    43
aa    20    37
Homogeneous effects across samples
Homogeneous allele frequencies across samples
Model   p       RR(Aa)   RR(AA)   -2LL
-----   -----   ------   ------   -------
Gen     0.367   1.979    3.663    793.143
Mult    0.367   1.911    3.651    793.199
Dom     0.401   1.990    1.990    802.927
Rec     0.405   1.000    1.921    805.064
None    0.442   1.000    1.000    815.628
Heterogeneous effects across samples
Homogeneous allele frequencies across samples
(paired values: sample 1 / sample 2)
Model   p       RR(Aa)          RR(AA)          -2LL
-----   -----   -------------   -------------   -------
Gen     0.367   1.235 / 2.890   2.136 / 5.547   786.498
Mult    0.367   1.440 / 2.282   2.073 / 5.208   788.262
Dom     0.401   1.216 / 2.936   1.216 / 2.936   796.422
Rec     0.405   1.000           1.519 / 2.195   803.849
None    0.443   1.000           1.000           815.628
TESTS OF GENETIC MODELS -- ASSUMING EQ EFFECTS & EQ FREQS
=============================
Gen  vs None (2 df) : 22.485   p = 0.000
Mult vs None (1 df) : 22.429   p = 0.000
Dom  vs None (1 df) : 12.701   p = 0.000
Rec  vs None (1 df) : 10.564   p = 0.001
Gen  vs Mult (1 df) :  0.056   p = 0.813
Gen  vs Dom  (1 df) :  9.784   p = 0.002
Gen  vs Rec  (1 df) : 11.921   p = 0.001

TESTS OF GENETIC MODELS -- ASSUMING UNEQ EFFECTS & EQ FREQS
==============================
Gen  vs None (4 df) : 29.130   p = 0.000
Mult vs None (2 df) : 27.366   p = 0.000
Dom  vs None (2 df) : 19.205   p = 0.000
Rec  vs None (2 df) : 11.779   p = 0.003
Gen  vs Mult (2 df) :  1.764   p = 0.414
Gen  vs Dom  (2 df) :  9.925   p = 0.007
Gen  vs Rec  (2 df) : 17.351   p = 0.000

TESTS OF EQUAL EFFECTS -- ASSUMING EQ FREQS
======================
w/ Gen model  (2 df) : 6.645   p = 0.036
w/ Mult model (1 df) : 4.938   p = 0.026
w/ Dom model  (1 df) : 6.505   p = 0.011
w/ Rec model  (1 df) : 1.215   p = 0.270
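Each statistic in this output is a difference in -2LL between two nested models, referred to a chi-square distribution. A minimal sketch of that calculation follows, using the -2LL values from the equal-effects, equal-frequencies fit. The helper `chi2_sf` is a hypothetical stdlib-only implementation for integer df, and the parameter counts per model are an assumption read off the genetic-models table.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 1:
        k, q = 1, math.erfc(math.sqrt(y))   # closed form for df = 1
    else:
        k, q = 2, math.exp(-y)              # closed form for df = 2
    while k < df:
        # Incomplete-gamma recurrence: step from k to k + 2 degrees of freedom.
        q += y ** (k / 2.0) * math.exp(-y) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

# -2LL per model, equal effects and equal frequencies across samples.
minus2ll = {"Gen": 793.143, "Mult": 793.199, "Dom": 802.927,
            "Rec": 805.064, "None": 815.628}
# Number of free relative-risk parameters per model (assumed).
n_params = {"Gen": 2, "Mult": 1, "Dom": 1, "Rec": 1, "None": 0}

def lrt(alt, null):
    """Likelihood-ratio statistic, df and p-value for alt versus null."""
    stat = minus2ll[null] - minus2ll[alt]
    df = n_params[alt] - n_params[null]
    return stat, df, chi2_sf(stat, df)

for alt, null in [("Gen", "None"), ("Mult", "None"), ("Gen", "Mult")]:
    stat, df, p = lrt(alt, null)
    print(f"{alt} vs {null} ({df} df): {stat:.3f}  p = {p:.3f}")
```

In this example the multiplicative model fits almost as well as the general model, while the dominant and recessive models are rejected against it.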
Indirect association Genotyped markers QTL Ungenotyped markers
Recombination Homologous chromosomes in one parent Paternal chromosome Maternal chromosome Recombination event during meiosis Recombinant gamete transmitted, harboring mutation
Recombination Homologous chromosomes in one parent Paternal chromosome Maternal chromosome No recombination event during meiosis Nonrecombinant gamete transmitted, not harboring mutation
Linkage: affected sib pairs Paternal chromosome Maternal chromosome First affected offspring, no recombination Second affected offspring, recombinant gamete IBD sharing from this one parent (0 or 1) 1 0
Association analysis • Mutation occurs on a ‘red’ chromosome
Association analysis • Association due to ‘linkage disequilibrium’
Haplotypes
     M    m
A    AM   Am
a    aM   am
This individual has aa and Mm genotypes and am and aM haplotypes
This individual has Aa and Mm genotypes and AM and am haplotypes … but given only genotype data,
This individual has AA and Mm genotypes and AM and Am haplotypes
Equilibrium haplotype frequencies
         M (p)   m (q)
A (r)    pr      qr
a (s)    ps      qs
Linkage disequilibrium
         M (p)    m (q)
A (r)    pr + D   qr - D
a (s)    ps - D   qs + D
DMAX = Min(qr, ps) if D > 0; Min(pr, qs) if D < 0
D’ = D / DMAX
r² = D² / pqrs
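A short sketch of these quantities, computed from the two allele frequencies and the AM haplotype frequency. The function name `ld_measures` is hypothetical; the bound on D depends on its sign because every haplotype frequency must stay non-negative, and r² is taken as D² / (pqrs).

```python
def ld_measures(p, r, h_AM):
    """D, D' and r^2 for two biallelic loci.

    p = frequency of M, r = frequency of A, h_AM = frequency of the AM haplotype.
    Both loci are assumed polymorphic (0 < p < 1 and 0 < r < 1).
    """
    q, s = 1.0 - p, 1.0 - r
    D = h_AM - p * r                    # departure from the equilibrium value pr
    if D >= 0:
        d_max = min(q * r, p * s)       # Am and aM frequencies cannot go below 0
    else:
        d_max = min(p * r, q * s)       # AM and am frequencies cannot go below 0
    d_prime = D / d_max if d_max > 0 else 0.0
    r2 = D * D / (p * q * r * s)
    return D, d_prime, r2

D, d_prime, r2 = ld_measures(p=0.5, r=0.5, h_AM=0.4)
print(D, d_prime, r2)
```

D' reaches 1 whenever one of the four haplotypes is absent, whereas r² reaches 1 only when the two loci carry the same information, which is why r² is the more relevant measure for indirect association.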
Haplotype analysis
1. Estimate haplotypes from genotypes
2. Associate haplotypes with trait
Haplotype   Freq.   Odds Ratio
AAGG        40%     1.00*
AAGT        30%     2.21
CGCG        25%     1.07
AGCT        5%      0.92
* baseline, fixed to 1.00
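Step 1 is commonly done with an EM algorithm. The sketch below is a minimal two-SNP version, offered as an illustration rather than the method used for the table above: only double heterozygotes have ambiguous phase, and they are split between the two possible haplotype pairs in proportion to the current frequency estimates. Random mating is assumed and the function name is hypothetical.

```python
def em_haplotypes(genotype_counts, n_iter=100):
    """Estimate AM, Am, aM, am frequencies from unphased two-SNP genotypes.

    genotype_counts maps (copies of A, copies of M) -> number of individuals.
    """
    freqs = {"AM": 0.25, "Am": 0.25, "aM": 0.25, "am": 0.25}
    n_chrom = 2 * sum(genotype_counts.values())
    for _ in range(n_iter):
        counts = dict.fromkeys(freqs, 0.0)
        for (g_a, g_m), n in genotype_counts.items():
            if g_a == 1 and g_m == 1:
                # E-step for double heterozygotes: AM/am versus Am/aM.
                cis = freqs["AM"] * freqs["am"]
                trans = freqs["Am"] * freqs["aM"]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts["AM"] += n * w
                counts["am"] += n * w
                counts["Am"] += n * (1 - w)
                counts["aM"] += n * (1 - w)
            else:
                # At most one heterozygous SNP: the two haplotypes are known.
                a_alleles = "A" * g_a + "a" * (2 - g_a)
                m_alleles = "M" * g_m + "m" * (2 - g_m)
                for a, m in zip(a_alleles, m_alleles):
                    counts[a + m] += n
        # M-step: new frequencies are the expected haplotype counts.
        freqs = {h: c / n_chrom for h, c in counts.items()}
    return freqs

example = {(2, 2): 5, (0, 0): 5, (1, 1): 10}   # illustrative counts
estimated = em_haplotypes(example)
print(estimated)
```

In the example every unambiguous individual carries only AM or am haplotypes, so the algorithm resolves the double heterozygotes as AM/am.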
Linkage and association (figure panels)
Linkage: sib correlation in the trait against IBD (0, 1, 2) at the QTL, and against IBD at a marker separated from the QTL by recombination fraction RF
Association: trait against QTL genotype (aa, Aa, AA), and against genotype at a marker in LD with the QTL
Variance Components
• Means (ASSOCIATION)
  M1    M2
• Variance-covariance matrix (LINKAGE)
  V1    C12
  C21   V2
Variance Components
• Means (ASSOCIATION)
  M1 + bG1    M2 + bG2
  b = regression coef., G = individual’s genotype
• Variance-covariance matrix (LINKAGE)
  V1                C12 + q(π - ½)
  C21 + q(π - ½)    V2
  q = regression coef., π = IBD sharing 0, ½, 1
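The split between means (association) and covariances (linkage) can be sketched for a single sib pair as follows. This is an illustration under simplifying assumptions: both sibs share one mean m and one variance v, and `sib_pair_moments` is a hypothetical name.

```python
def sib_pair_moments(m, b, g1, g2, v, c, q, pi_hat):
    """Expected means and covariance matrix for one sib pair.

    m: trait mean            b: association effect (regression on genotype)
    g1, g2: genotype scores  v: trait variance
    c: sib covariance at average IBD sharing
    q: linkage effect        pi_hat: proportion of alleles shared IBD (0, 0.5, 1)
    """
    means = [m + b * g1, m + b * g2]        # association acts on the means
    cov12 = c + q * (pi_hat - 0.5)          # linkage acts on the covariance
    covariance = [[v, cov12], [cov12, v]]
    return means, covariance

means, cov = sib_pair_moments(m=0.0, b=0.4, g1=1, g2=-1,
                              v=1.0, c=0.3, q=0.2, pi_hat=1.0)
print(means, cov)
```

Because π is centred at ½, pairs with average sharing keep the baseline covariance, and q measures how much the covariance rises or falls with IBD sharing.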
Components of a Genetic Theory
• POPULATION MODEL
– Allele & genotype frequencies
– Demographics & population history
– Linkage disequilibrium, haplotype structure
• TRANSMISSION MODEL
– Mendelian segregation
– Identity by descent & genetic relatedness
• PHENOTYPE MODEL
– Biometrical model of quantitative traits
– Additive & dominance components
Linkage without association • Both families are ‘linked’ with the marker… …but a different allele is involved. (Pedigree figure; marker genotypes shown: 3/5 3/6 2/6 5/6 3/5 3/2 2/6 5/2)
Linkage and association • All families are ‘linked’ with the marker… … and allele 6 is ‘associated’ with disease • Linkage is just association within families (Pedigree figure; marker genotypes shown: 3/5 3/6 2/6 5/6 3/2 2/4 6/2 4/6 6/6 2/6 6/6)
Association without linkage • Allele 6 is more common in the GREEN population • The disease is more common in the GREEN population … a ‘spurious association’ (Figure of controls and cases; marker genotypes shown: 6/6 6/2 3/5 3/4 3/6 2/4 3/2 5/6 3/6 4/6 2/2 2/6 5/2 2/5)
TDT • Transmission disequilibrium test – test for linkage and association (Pedigree figure; genotypes shown: AA Aa Aa AA AA AA aa AA Aa Aa)
TDT
“A” disease allele
Mating       AA x Aa      aa x Aa
Offspring    AA    Aa     Aa    aa
Additive     +     -      +     -
Dominant       0.5        +     -
Recessive    +     -        0.5
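For reference, the allelic form of the TDT counts transmissions from heterozygous parents to affected offspring and compares them with a McNemar-type statistic, (b - c)² / (b + c). The sketch below implements that standard statistic, not the model-specific scoring in the table above; the genotype coding and the function name `tdt` are assumptions.

```python
def tdt(trios):
    """Allelic TDT from parent-offspring trios with an affected child.

    Each trio is (father, mother, child), coded as copies of allele A (0, 1, 2).
    Returns (b, c, chi_square): b = transmissions of A and c = transmissions
    of a from heterozygous parents.
    """
    b = c = 0
    for father, mother, child in trios:
        if father == 1 and mother == 1:
            # Both parents heterozygous: every child allele is informative.
            b += child
            c += 2 - child
        elif father == 1 or mother == 1:
            other = mother if father == 1 else father
            from_other = other // 2           # homozygous parent passes A only if AA
            transmitted = child - from_other  # 1 if the het parent passed A, else 0
            if transmitted not in (0, 1):
                raise ValueError("Mendelian inconsistency in trio")
            b += transmitted
            c += 1 - transmitted
        # Trios with two homozygous parents carry no information.
    chi_square = (b - c) ** 2 / (b + c) if b + c else 0.0
    return b, c, chi_square

trios = [(1, 2, 2), (1, 0, 1), (1, 1, 2), (1, 1, 1), (2, 2, 2), (0, 1, 0)]
b, c, chi_square = tdt(trios)
print(b, c, chi_square)
```

Because only within-family transmissions are counted, population stratification cannot generate a signal, which is what makes this a joint test of linkage and association.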
Between and within components Sib 1 = B + W Sib 2 = B - W
Between and within components • Fulker et al (1999)
Sib genotypes   S1   S2   B     W
AA   AA         1    1    1     0
AA   Aa         1    0    0.5   0.5
AA   aa         1    -1   0     1
S1 = B+W, S2 = B-W
Note : W = S1 – B
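The decomposition in the table can be sketched in a few lines: B is the sib-pair mean of the genotype scores and W is each sib's deviation from it. The scoring AA = 1, Aa = 0, aa = -1 follows the table; the function name is hypothetical.

```python
SCORE = {"AA": 1, "Aa": 0, "aa": -1}

def between_within(g1, g2):
    """Return (B, W for sib 1, W for sib 2) from two sib genotypes."""
    s1, s2 = SCORE[g1], SCORE[g2]
    b = (s1 + s2) / 2          # between-family component: pair mean
    return b, s1 - b, s2 - b   # within-family components: deviations from B

for pair in [("AA", "AA"), ("AA", "Aa"), ("AA", "aa")]:
    print(pair, between_within(*pair))
```

The two W values always sum to zero, so W carries only the contrast between sibs and is unaffected by anything shared at the family level.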
Parental genotypes
• Use parental genotypes to generate B
• Examples
– AA from AAxAA   W = 0
– Aa from AAxAa   W = -0.5
– Aa from AaxAa   W = 0
Pat   Mat   B
1     1     1
1     0     0.5
1     -1    0
0     1     0.5
0     0     0
0     -1    -0.5
-1    1     0
-1    0     -0.5
-1    -1    -1
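When parents are genotyped, B can be taken as the mean of the two parental scores, which is the expected score of any offspring, and W is the offspring's deviation from that expectation. A minimal sketch, with a hypothetical function name and the same -1, 0, 1 scoring:

```python
def within_from_parents(child, father, mother):
    """Return (B, W) for one offspring, using scores -1, 0, 1 for aa, Aa, AA."""
    b = (father + mother) / 2   # expected offspring score given the parents
    return b, child - b

print(within_from_parents(1, 1, 1))   # AA from AA x AA
print(within_from_parents(0, 1, 0))   # Aa from AA x Aa
print(within_from_parents(0, 0, 0))   # Aa from Aa x Aa
```

This works for families with a single offspring as well, which the sib-pair definition of B cannot handle.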
assoc.mx • Sibling pair sample • B and W components precalculated in input file • Single SNP genotype • Quantitative trait
assoc.dat s1 -0.007 -0.829 0.369 0.318 1.52 -0.948 0.596 -1.91 0.499 -1.17 -0.16 s2 -0.972 -0.196 0.645 1.55 0.910 -1.55 -0.394 -0.905 0.940 -1.29 -1.81 g1 -1 1 1 0 0 1 1 1 g2 0 1 1 1 0 1 0 0 1 b -0.5 1 1 0.5 0.5 1 w1 -0.5 0 0 0.5 -0.5 0 w2 0.5 0 0 -0.5 0
! Mx script for QTL association: sib pairs, univariate
Group 1 :
Calc NG=2
Begin Matrices;
! ** Parameters
B Full 1 1 free ! association : between component
W Full 1 1 free ! association : within component
M Full 1 1 free ! mean
S Full 1 1 free ! Shared residual variance
N Full 1 1 free ! Nonshared residual variance
! ** Definition variables **
C Full 1 1 ! association : between
X Full 1 1 ! association : within, sib 1
Y Full 1 1 ! association : within, sib 2
End Matrices;
! ** Uncomment for B=W model
! Equate W 1 1 1 B 1 1 1
! Starting values
Matrix B 0
Matrix W 0
Matrix M 0
Matrix S 0.5
Matrix N 0.5
End
Group 2 : Data Group
Data NI=7 NO=0
RE file=assoc.dat
Labels Sib1 Sib2 g1 g2 b w1 w2
Select Sib1 Sib2 b w1 w2 /
Definition b w1 w2 /
Matrices = Group 1
Means M + B*C + W*X | M + B*C + W*Y /
Covariance S + N | S _
           S | S + N /
Specify C b /
Specify X w1 /
Specify Y w2 /
End
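To make explicit what the script fits, here is a rough Python sketch of the bivariate normal -2 log-likelihood implied by the Means and Covariance statements. It is an illustration of the model, not a reimplementation of Mx: the function names are hypothetical, and the additive constant is included here, which has no effect on differences between nested models.

```python
import math

def minus2ll_pair(y1, y2, b, w1, w2, B, W, M, S, N):
    """-2 log-likelihood of one sib pair.

    Means: M + B*b + W*w1 and M + B*b + W*w2.
    Covariance: [[S + N, S], [S, S + N]].
    """
    r1 = y1 - (M + B * b + W * w1)     # residual, sib 1
    r2 = y2 - (M + B * b + W * w2)     # residual, sib 2
    var, cov = S + N, S
    det = var * var - cov * cov        # determinant of the 2x2 covariance
    # Quadratic form r' * inverse(covariance) * r, written out for 2x2.
    quad = (var * r1 * r1 - 2 * cov * r1 * r2 + var * r2 * r2) / det
    return 2 * math.log(2 * math.pi) + math.log(det) + quad

def minus2ll(rows, B, W, M, S, N):
    """Sum over pairs; rows hold (s1, s2, b, w1, w2) as selected in the script."""
    return sum(minus2ll_pair(s1, s2, b, w1, w2, B, W, M, S, N)
               for s1, s2, b, w1, w2 in rows)

# One illustrative pair, evaluated at the script's starting values.
print(minus2ll([(0.3, -0.2, 0.5, 0.5, -0.5)], B=0, W=0, M=0, S=0.5, N=0.5))
```

Minimising this function over B, W, M, S and N, with or without the constraint B = W, gives the nested fits that are compared by likelihood-ratio test.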
Models
B&W
  B Full 1 1 free
  W Full 1 1 free
  !Equate W 1 1 1 B 1 1 1
B=W
  B Full 1 1 free
  W Full 1 1 free
  Equate W 1 1 1 B 1 1 1
B
  B Full 1 1 free
  W Full 1 1
  !Equate W 1 1 1 B 1 1 1
B=W=0
  B Full 1 1
  W Full 1 1
  !Equate W 1 1 1 B 1 1 1
Tests
Test                        HA     H0
Standard association test   B=W    B=W=0
Test of stratification      B&W    B=W
Robust association test     B&W    B
assoc.mx
Model   B         W           -2LL      df
B&W     -0.478    -0.365      2103.96   795
B=W     -0.420    (= B)       2105.05   796
B       -0.4778   (fixed 0)   2127.01   796
B=W=0   (fixed 0) (fixed 0)   2163.34   797
Test of total association: HA B=W (2105.05), H0 B=W=0 (2163.34); Δ-2LL = 58.29, df = 1, p < 1e-14
Test of stratification: HA B&W (2103.96), H0 B=W (2105.05); Δ-2LL = 1.09, df = 1, p = 0.29
Test of within association: HA B&W (2103.96), H0 B (2127.01); Δ-2LL = 23.06, df = 1, p < 1e-6
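The three comparisons can be reproduced from the tabulated -2LL values. The sketch below is a stdlib-only illustration with hypothetical names; it handles 1 df comparisons only, and small differences from the quoted statistics reflect rounding of the tabulated values.

```python
import math

# (-2LL, df) for each fitted model, as tabulated above.
fits = {"B&W": (2103.96, 795), "B=W": (2105.05, 796),
        "B": (2127.01, 796), "B=W=0": (2163.34, 797)}

def lrt_1df(alt, null):
    """Likelihood-ratio statistic and p-value for a 1-df nested comparison."""
    (ll_alt, df_alt), (ll_null, df_null) = fits[alt], fits[null]
    if df_null - df_alt != 1:
        raise ValueError("this helper only handles 1-df comparisons")
    stat = ll_null - ll_alt
    p = math.erfc(math.sqrt(stat / 2))   # chi-square upper tail for 1 df
    return stat, p

for label, alt, null in [("total association", "B=W", "B=W=0"),
                         ("stratification", "B&W", "B=W"),
                         ("within association", "B&W", "B")]:
    stat, p = lrt_1df(alt, null)
    print(f"{label}: chi-square = {stat:.2f}, p = {p:.2g}")
```

A non-significant stratification test supports equating B and W, in which case the more powerful total association test can be used.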
Implementation • QTDT – Abecasis et al (2001) AJHG – extends between/within model to general pedigrees – multiple alleles – covariates – combined test of linkage and association – discrete as well as quantitative traits
Linkage • families • detectable over large distances >10 cM • large effects OR>3, variance>10%
Association • unrelateds or families • detectable over small distances <1 cM • small effects OR<2, variance<1%