HapMap Project Basics
HapMap • The International HapMap Project is analyzing DNA from populations with African, Asian, and European ancestry.
Multiple Populations • The DNA samples for the HapMap have come from a total of 270 people. – The Yoruba people of Ibadan, Nigeria, provided 30 sets of samples from two parents and an adult child (each such set is called a trio). – In Japan, 45 unrelated individuals from the Tokyo area provided samples. – In China, 45 unrelated individuals from Beijing provided samples. – Thirty U.S. trios provided samples, which were collected in 1980 from U.S. residents with northern and western European ancestry by the Centre d'Etude du Polymorphisme Humain (CEPH).
Methods • The blood samples are being converted into cell lines, and DNA is extracted from them. • The samples and cell lines are not linked to any individual in the populations studied. However, they are identified as coming from one of the four populations participating in the study, which raises ethical issues associated with conducting genetic research in named populations.
SNP Nomenclature • http://snp500cancer.nci.nih.gov/terms_snp_region.cfm
Hardy-Weinberg Test • http://innateimmunity.net/IIPGA2/Bioinformatics/exacthweform
IIPGA
Exact HWE
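The exact HWE test named on this slide can be sketched in a few lines. This is a minimal illustration, not the code behind the IIPGA web form: it assumes a biallelic SNP, takes the three genotype counts, and sums the probabilities of every heterozygote count that is no more likely than the observed one, conditional on the allele counts. The function name `hwe_exact_p` is hypothetical.

```python
from math import exp, lgamma, log


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact Hardy-Weinberg p-value from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb      # number of individuals
    n_a = 2 * n_aa + n_ab       # copies of allele A
    n_b = 2 * n_bb + n_ab       # copies of allele B

    def prob(het):
        # Probability of observing `het` heterozygotes given n_a, n_b and n.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        log_p = (het * log(2.0)
                 + lgamma(n + 1) - lgamma(hom_a + 1)
                 - lgamma(het + 1) - lgamma(hom_b + 1)
                 + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))
        return exp(log_p)

    p_obs = prob(n_ab)
    # Heterozygote counts must have the same parity as the allele count n_a.
    hets = range(n_a % 2, min(n_a, n_b) + 1, 2)
    total = sum(p for p in map(prob, hets) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```

A small p-value suggests departure from HWE, which in practice is often a sign of genotyping error rather than biology.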
Fisher's Exact Test
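As a rough illustration of Fisher's exact test on a 2x2 table (for example, allele counts in two groups), the sketch below uses only the standard library. It assumes the common two-sided definition that sums all tables with fixed margins that are no more probable than the observed one; the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more likely than the observed table.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_obs * (1 + 1e-9))
```

For the table [[3, 0], [0, 3]] only the two most extreme tables qualify, each with probability 1/20, so the p-value is 0.1.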
Homework http://www.hsph.harvard.edu/bioinfocore/Documents/Talk%20slides/Bioinfo_training_August_10_05_tutorial_Niu_T.pdf
SNPcutter http://bioinfo.bsd.uchicago.edu/SNP_cutter.htm
SNP and Cancer • A SNP is defined as a genomic locus where two or more alternative bases occur with appreciable frequency (>1%). • SNPs occur roughly every several hundred bases. • Whole-genome SNP analysis is possible.
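The >1% criterion can be checked directly from genotype data. The sketch below assumes genotypes are given as two-character strings such as "AG" (a hypothetical input format) and treats the frequency of the second most common allele as the minor allele frequency.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Frequency of the second most common allele across all chromosomes."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) < 2:
        return 0.0  # only one allele observed: the locus is monomorphic
    total = sum(counts.values())
    return counts.most_common(2)[1][1] / total


def is_snp(genotypes, threshold=0.01):
    """Apply the slide's definition: alternative allele frequency above 1%."""
    return minor_allele_frequency(genotypes) > threshold
```

For example, 95 "AA" and 5 "AG" individuals give a minor allele frequency of 5/200 = 0.025, which passes the threshold.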
Applications • Direct Association Analysis: – Test association between putative functional variants and disease risk. • Evaluation of nonsynonymous SNPs or regulatory polymorphisms = functional SNPs. • Problem: there are relatively few functional SNPs. • Uncharacterized de novo mutations???
Examples • Two MMP9 nonsynonymous SNPs associated with risk of lung cancer with metastasis (Hu et al. 2005b). • Coding polymorphisms within UGT1A7 predict response of colorectal cancer patients to capecitabine (Carlini et al. 2005). • Functional MTHFR mutations linked to several different cancers.
Direct Association • Candidate gene or genomic region. – Linkage analysis – Expression array analysis – Knowledge of development and physiology – Comparative genomics
Tools • PANTHER database: evolutionary analysis of coding SNPs. • SNPEffect: estimates the likelihood that a particular SNP has a functional effect. • SNPSeek: >90,000 coding SNPs in the exons of known genes. • SNP500Cancer: identification, validation, and characterization of polymorphisms.
PANTHER http://www.pantherdb.org/tools/csnpScoreForm.jsp
PANTHER
ABCA1
PolyPhen • http://genetics.bwh.harvard.edu/pph/
SNPEffect http://snpeffect.vib.be/search.php
SNPSeek
Search for BRCA1
SNP500Cancer Database
SNP500 vs HDP
Test if SNP500 and HDP differ
Do subpopulations differ?
Compare Caucasian vs African
Compare Caucasian vs Hispanic
Test whether in HWE
HWE
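A quick way to check whether genotype counts are consistent with HWE is the chi-square goodness-of-fit approximation. Note that this is the large-sample approximation, not the exact test, and it is shown only as a sketch for a biallelic SNP; the function name is hypothetical. Expected counts are N·p², 2N·p·q, and N·q², and the statistic has 1 degree of freedom.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square HWE test; returns (statistic, p-value) with 1 df."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    if not 0.0 < p < 1.0:
        raise ValueError("locus is monomorphic; HWE cannot be tested")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2.0))
```

Counts of 25/50/25 match the expectation exactly (statistic 0), whereas 50/0/50 shows a complete heterozygote deficit (statistic 100).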
TDT
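The transmission disequilibrium test (TDT) uses trios such as those described earlier. A minimal sketch, assuming a biallelic marker and counts taken only from heterozygous parents: b is the number of times the allele of interest was transmitted to an affected child, c the number of times it was not. The statistic (b − c)² / (b + c) is compared with a chi-square distribution with 1 degree of freedom. The function name is hypothetical.

```python
from math import erfc, sqrt


def tdt(transmitted, untransmitted):
    """McNemar-type TDT statistic and p-value (1 df)."""
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no heterozygous parents, so no informative transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / informative
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2.0))
```

With 30 transmissions and 10 non-transmissions the statistic is 400/40 = 10.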
HapMap • Polymorphisms identified by HapMap are likely to be neutral in phenotypic effect but can inform on nearby alleles that might play a role in disease.
Haplotype • SNP alleles tend to be correlated in a predictable way; a set of alleles inherited together is known as a haplotype. – The linear, LD-ordered arrangement of alleles on a chromosome. • The correlation between SNPs is mediated by linkage disequilibrium (LD). – LD exists when alleles at distinct loci occur together more frequently than expected given the known allele frequencies and the recombination fraction between the loci.
Disease allele and haplotypes • In the presence of LD, polymorphisms in physical proximity to a causal polymorphism will also show a difference in frequency between cases and controls.
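A case-control difference of this kind is commonly assessed with a chi-square test on a 2x2 table of allele counts. The sketch below is one simple way to do that, assuming a biallelic marker and independent chromosomes; the function and parameter names are hypothetical.

```python
from math import erfc, sqrt


def allelic_chi_square(case_a, case_b, control_a, control_b):
    """Pearson chi-square (1 df) on allele counts in cases vs controls."""
    n = case_a + case_b + control_a + control_b
    row_case, row_control = case_a + case_b, control_a + control_b
    col_a, col_b = case_a + control_a, case_b + control_b
    if 0 in (row_case, row_control, col_a, col_b):
        raise ValueError("a row or column of the table is empty")
    cross = case_a * control_b - case_b * control_a
    chi2 = n * cross ** 2 / (row_case * row_control * col_a * col_b)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2.0))
```

For 60/40 allele counts in cases against 40/60 in controls the statistic is 8. A marker in LD with the causal variant would show a signal of this kind even though it is not itself functional.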
HapMap • Three phases: I, II, III. • Phase I: completed in October 2005; genotyping of 1 M SNPs at an average spacing of 5 kb. Additional SNP discovery was carried out in 48 samples from the original populations across 10 specific 500 kb ENCODE regions (chosen to represent a genome-wide range of evolutionary conservation and gene density). This was later extended to 269 samples.
HapMap • Phase II: in the 269 samples, a further 2.9 M SNPs were genotyped, for a total of 3.9 M. • Phase III: other populations will be added.
Results of Phase I and Phase II • Density of SNP data across ENCODE regions: 1 SNP/279 bp. • Density of Phase II HapMap: 1 SNP/kb.
Robust measures of LD • D' and r² are the two major measures of LD. • D': if two SNPs have not been separated by recombination during the history of the sample, D' is 1. • r² is the squared correlation between two SNPs; when the alleles of two SNPs are always observed together, r² is 1. r² is generally the better measure.
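Both measures follow from the four haplotype counts at a pair of biallelic SNPs. The sketch below assumes phased haplotypes labelled AB, Ab, aB, and ab (hypothetical labels) and computes D = p(AB) − p(A)·p(B), then normalizes it two ways: by its maximum possible magnitude for D', and by the allele-frequency variances for r².

```python
def ld_measures(n_ab_upper, n_a_upper_b_lower, n_a_lower_b_upper, n_ab_lower):
    """Return (D, |D'|, r^2) from haplotype counts AB, Ab, aB, ab."""
    n = n_ab_upper + n_a_upper_b_lower + n_a_lower_b_upper + n_ab_lower
    p_a = (n_ab_upper + n_a_upper_b_lower) / n   # frequency of allele A
    p_b = (n_ab_upper + n_a_lower_b_upper) / n   # frequency of allele B
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both SNPs must be polymorphic")
    d = n_ab_upper / n - p_a * p_b
    # Largest |D| attainable given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared
```

With counts AB=40, Ab=10, aB=0, ab=50, only three of the four haplotypes are present, so D' is 1 while r² is about 0.67. This illustrates why the two measures are not interchangeable.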
Linkage Studies • Family-based approaches to identify a disease gene. • When a disease gene segregates in a family, genomic markers in close proximity to the disease gene will segregate in the same manner because recombination between them is rare. – Identify families with disease; genotype each individual. – Compare the marker allele and disease distributions within the family. Assign a LOD score.
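The LOD score in the last step can be illustrated for the simplest case. This sketch assumes phase-known, fully informative meioses, so the likelihood at recombination fraction θ is θ^k·(1 − θ)^(n − k) for k recombinants in n meioses, compared against free recombination (θ = 0.5). Real pedigree analysis is considerably more involved; the function name is hypothetical.

```python
from math import log10


def lod_score(recombinants, meioses, theta):
    """LOD = log10 of L(theta) / L(0.5) for phase-known meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    non_recombinants = meioses - recombinants
    likelihood = theta ** recombinants * (1.0 - theta) ** non_recombinants
    null_likelihood = 0.5 ** meioses
    if likelihood == 0.0:
        # Recombinants were observed, so theta = 0 is impossible.
        return float("-inf")
    return log10(likelihood / null_likelihood)
```

Ten meioses with no recombinants give a LOD of 10·log10(2) ≈ 3.01 at θ = 0, just above the conventional threshold of 3 for declaring linkage.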
Linkage studies • Genome-wide scans for linkage analysis are performed using several hundred microsatellites at a 10 cM density throughout the genome. • SNP-based linkage studies use a panel of 10,000 SNPs.
Examples • Multiple sclerosis • Neonatal diabetes • Familial glucocorticoid deficiency.