Using Y chromosomal haplogroups in genetic association studies

Introduction to Population Genetics Father Mother Childre n Example of transmission of autosomes (and

Inheriting Y-DNA and mt. DNA National Geographic

Y-DNA phylogenetic tree Underhill et al, Dec 2007

Haplogroups �“Group of haplotypes which share a common ancestor with a SNP mutation” ◦

Haplogroups – Exercise SNP 1 4 SNP 1 SNP 3 SNP 5 SNP 8

Haplogroups SNP 1 4 SNP 6 SNP 2 SNP 1 SNP 3 SNP 5

Introduction to study � In apparently homogeneous populations (as defined by PCA), there is

Y-DNA haplogroup R and I? PC 1 Theory PC 2 Is there Y-DNA variability

Aims and objectives �Derive Y-DNA haplogroups of 6, 537 males from two cohorts ◦

Methods � Genome-wide individuals � QC – PLINK SNP chip genotyping of 9, 912

Methods (continued) � Y-DNA haplogroup determination – Yfitter ◦ Haplogroups clustered under ‘Clade’ name

Results – ALSPAC 9, 912 participants with 500, 527 genotyped SNPs QC 8, 365

Results – Direct Association Linear regression between BMI and Y-DNA haplogroup I in ALSPAC

Results – Interaction ALSPAC 1958 BC ALSPAC + 1958 BC Figures a-c: Subgroup analysis

Results – Sub-clustering PC 1 PC 2 Y-DNA haplogroup vs top two principal components

Conclusions �Haplogroup R is the most frequent (72%) and I the second most common

Discussion � Population stratification is a potential confounder in genetic association studies. Haplotypic variation

Discussion � Stratified analysis of Y-DNA haplogroups could be used to: (i) Inform genetic

Limitations of Y-DNA haplogroup studies �Y-DNA haplogroup analyses excludes females �Sample size, especially in

My haplogroups �Y-DNA haplogroup = R 1 b 1 b 2 a R 1

My haplogroups �mt. DNA haplogroup = H 1 Haplogroup H 1 is widespread in

Slides: 27

Download presentation

Using Y chromosomal haplogroups in genetic association studies and suggested implications Erzurumluoglu et al. , Apr 2016 Translation: “Can (non-recombining) paternal ancestry information be of any use in GWAS? ” Nice review: Use of Y chromosome and mitochondrial DNA population structure in tracing human migrations. Underhill et al. , 2007, Ann Rev Genet Fantastic website: http: //www. eupedia. com/europe/origins_haplogroups_europe. shtml Journal club: 15/06/16 Mesut

Introduction to Population Genetics Father Mother Childre n Example of transmission of autosomes (and X chromosome) to offspring Recombination event during meiosis

Inheriting Y-DNA and mt. DNA National Geographic

Y-DNA phylogenetic tree Underhill et al, Dec 2007

Haplogroups �“Group of haplotypes which share a common ancestor with a SNP mutation” ◦ Exercise next slide �Can give information about ancestry, bottlenecks (e. g. war, catastrophe), migrations and disease ◦ Where anthropology, history and genetics meets

Haplogroups – Exercise SNP 1 4 SNP 1 SNP 3 SNP 5 SNP 8 SNP 6 SNP 2 SNP 7 6 individuals (males) genotyped at 8 Y-DNA SNP loci How many haplogroups (clusters) are there? SNP 6 SNP 2

Haplogroups SNP 1 4 SNP 6 SNP 2 SNP 1 SNP 3 SNP 5 Haplogroup A Ancestral SNP: SNP 1 Haplogroup B 1 Ancestral SNP: SNP 6 SNP 2 SNP 7 SNP 8 SNP 6 SNP 2 Haplogroup B Ancestral SNP: SNP 2

Introduction to study � In apparently homogeneous populations (as defined by PCA), there is still Y-DNA haplogroup variation which will result from population history. ◦ Therefore, hidden stratification and/or differential phenotypic effects by Y-DNA haplogroups could exist � To test this, we hypothesised that stratifying individuals according to their Y-DNA haplogroups before testing associations between autosomal SNPs and phenotypes will yield difference in association � Chose to study BMI as a few Y-DNA haplogroups have been directly associated with cardiometabolic diseases ◦ Larger sample sizes in ALSPAC and 1958 BC ◦ BMI studies were well-established at the time of study (2012)

Y-DNA haplogroup R and I? PC 1 Theory PC 2 Is there Y-DNA variability and sub-clustering within a PCA defined ‘homogenous’ cluster (e. g. in CEU)? A real life example would be the North/South sub-clustering that one observes in some populations (e. g. Italian)

Aims and objectives �Derive Y-DNA haplogroups of 6, 537 males from two cohorts ◦ ALSPAC (N=5, 080, 816 Y-DNA SNPs) ◦ 1958 Birth Cohort (N=1, 457, 1, 849 Y-DNA SNPs) �Carry out association study within different strata using 32 (published) BMI SNPs �Explore (sub)stratification within PCA defined groups ◦ Does it exist? ◦ What effect? If any…

Methods � Genome-wide individuals � QC – PLINK SNP chip genotyping of 9, 912 ◦ Individuals �Mismatch of sex �Minimal or excessive heterozygosity �<0. 320 and >0. 345 for the Sanger data �<0. 310 and >0. 330 for the Lab. Corp data �Individual missingness >3% �Cryptic relatedness >10% IBD �Non-European (and White) ancestry ◦ SNPs �MAF <1% �Call rate <95% �HWE P<5 e-7 �Pseudo-autosomal region � Imputation – Ma. CH

Methods (continued) � Y-DNA haplogroup determination – Yfitter ◦ Haplogroups clustered under ‘Clade’ name � Association study between BMI and the Y-DNA haplogroups in ALSPAC - Stata ◦ Individuals in haplogroup R used as baseline (coded as ‘ 0’), and individuals in haplogroup I coded as ‘ 1’ ◦ Covariates: Age, age^2 and top 10 PCs � Analysis of the effects of Y-DNA haplogroups on SNPs associated with BMI - Stata ◦ 32 SNPs associated with BMI (Speliotes et al, 2010) ◦ Covariates: Age, age^2 and top 10 PCs ◦ Likelihood ratio test �Compare the two regression models: With and without an interaction term (genotype dosage x Y-DNA haplogroup) � Replication in 1958 BC - Stata

Results – ALSPAC 9, 912 participants with 500, 527 genotyped SNPs QC 8, 365 unrelated individuals Y-DNA Haplogroup 5, 080 males with Y-DNA haplogroup data Y-DNA Clades Y-DNA haplogroup clades R (72%) and I (19%)

Results – ALSPAC Y-DNA profile Y-DNA haplogroups in ALSPAC has within it individuals belonging to 12 of the major Y-DNA haplogroups (C, E, G, H, I, J, L, N, O, Q, R, T), albeit only 5 of the groups have 50 (>1%) or more individuals in them. These five clades are E, G, I, J and R and have 153 (3%), 94 (1. 9%), 960 (19%), 142 (2. 8%) and 3564 (72%) individuals in them respectively.

Results – 1958 BC Y-DNA profile Y-DNA haplogroups in 1958 Birth Cohort (Dataset: EGAD 0000022) 1958 BC has within it individuals belonging to five major Y-DNA haplogroups (E, I, J, C, R), albeit 2 of the groups have less than 50 individuals in them. The clades E, I, J, C and R have 44 (3%), 296 (20%), 34 (2%), 1 and 1078 (74%) individuals in them respectively.

Results – Direct Association Linear regression between BMI and Y-DNA haplogroup I in ALSPAC and 1958 BC The z-test for heterogeneity shows that the effect size of Y-DNA haplogroup I on BMI is differential depending on the cohort (albeit not to be taken seriously as result has not been replicated somewhere else – see discussion slide)

Results – Interaction ALSPAC 1958 BC ALSPAC + 1958 BC Figures a-c: Subgroup analysis comparing effect size of rs 8050136 on BMI in two Y-DNA haplogroups The statistics above represent p values from the likelihood ratio test for interaction between Y-DNA haplogroup I and rs 8050136 (FTO). Heterogeneity tests (z test) comparing Y-DNA haplogroups I and R yielded p values of 0. 005, 0. 4169 and 0. 014 for a, b and c respectively. ggplot 2 package in R was used to create the plot.

Results – Sub-clustering PC 1 PC 2 Y-DNA haplogroup vs top two principal components in ALSPAC individuals Plotting the Y-DNA haplogroup clades on a PCA plot reveals that there is no apparent sub-clustering within the ALSPAC individuals. Thus adding Y-DNA haplogroup information as covariates to control for additional population stratification in ALSPAC is not needed

Conclusions �Haplogroup R is the most frequent (72%) and I the second most common Y-DNA haplogroup (19%) in ALSPAC (and 1958 BC) �There was no strong evidence of association between Y-DNA haplogroups and BMI ◦ P=0. 066 in ALSPAC ◦ P=0. 107 in 1958 cohort �One instance of heterogeneity (p= 0. 008) between the two haplogroups was observed ◦ Beta=0. 266 (SE=0. 066) and 0. 079 (SE=0. 032) in Y-DNA haplogroup I and R respectively ◦ But result not replicated in 1958 BC �No sub-clustering observed in ALSPAC

Discussion � Population stratification is a potential confounder in genetic association studies. Haplotypic variation and sub-clustering can still be present even after accounting for principal components � Largest Y-DNA haplogroup analysis � Largest Y-DNA haplogroup profile for the (South-West of the) UK � First analysis of its kind ◦ i. e. a stratified approach using Y-DNA haplogroups � No sub-clustering observed in ALSPAC ◦ Structure in common variant analysis does not seem to be a problem after controlling for principal components � If result was replicated, Y-DNA haplogroup I could potential have been a marker for an (unknown) epigenetic (Gx. E) or epistatic (Gx. G) interaction – following slide

Discussion � Stratified analysis of Y-DNA haplogroups could be used to: (i) Inform genetic association studies by identifying which haplogroup(s) account for the association (ii) Assess the strength of known associations and observing whether it still holds in all Y-DNA haplogroups (iii) Observe an effect modification (iv) In relation to point (iii), determine whether the effect modification observed gives a hint about epigenetic (Gx. E) interactions or epistasis (Gx. G) occurring between the autosomal loci being analysed and the Y-DNA loci associated with the haplogroup(s) in which one observes that effect modification (i) Y-DNA (or mt. DNA) haplogroup info could be a way of identifying sub-groups of individuals for different traits/diseases. Real-life example: Bi. Dil (a drug for the treatment of heart failure in self-identified black patients)

Limitations of Y-DNA haplogroup studies �Y-DNA haplogroup analyses excludes females �Sample size, especially in deeper branches of the Y-DNA phylogenetic tree �Very hard to find independent cohorts to replicate studies ◦ ALSPAC and 1958 BC studies kids and adults ◦ Even Europe-based (non-UK) cohorts have remarkably different Y-DNA

Appendice s

European Y-DNA haplogroups

European mt. DNA haplogroups

My haplogroups �Y-DNA haplogroup = R 1 b 1 b 2 a R 1 b 1 b 2 is the most common haplogroup in western Europe (>50% of men). Ancient representatives of the haplogroup were among the first people to repopulate the western part of Europe after the Ice Age ended about 12, 000 years ago. Region: Europe Example Populations: Irish, Basques, British, French 23 andm

My haplogroups �mt. DNA haplogroup = H 1 Haplogroup H 1 is widespread in Europe, especially the western part of the continent. It originated about 13, 000 years ago (likely in the Iberian peninsula), not long after the Ice Age ended Region: Europe, Near East, Central Asia, Northwestern Africa Example Populations: Norway (30%), Spanish (25%), Berbers, Lebanese Highlight: H 1 appears to have been common in Doggerland, an ancient land now flooded by the North Sea. 23 andm