The International HapMap Project: a Rich Resource
- Slides: 52
The International HapMap Project: a Rich Resource of Genetic Information
Julia Krushkal, Department of Preventive Medicine, The University of Tennessee Health Science Center
jkrushka{at}utmem.edu
HapMap Population Samples
Project launched in 2002 to provide a public resource for accelerating medical genetic research
270 individuals from 4 geographically diverse populations:
• YRI: 90 Yoruba from Ibadan, Nigeria (30 parent-offspring trios)
• CEU: 90 Utah (USA) residents of northern and western European descent, from the Centre d’Etude du Polymorphisme Humain (CEPH) collection (30 parent-offspring trios)
• CHB: 45 unrelated Han Chinese from Beijing, China
• JPT: 45 unrelated Japanese from Tokyo, Japan
HapMap: http://www.hapmap.org/
NHGRI: http://www.genome.gov/page.cfm?pageID=10001688
The International HapMap Project
“…Determine the common patterns of DNA sequence variation in the human genome, by characterizing sequence variants, their frequencies, and correlations between them, in DNA samples from populations with ancestry from parts of Africa, Asia and Europe.” Nature (2003)
• Population-specific sequence variation
• Allele frequencies
• Linkage disequilibrium patterns
• Haplotype information
• Tag SNPs
• Structural genome variation
• Better understanding of human population dynamics and of the history of human populations
• Cell lines available from the Coriell Institute for Medical Research
• A rich resource for biomedical genetic analysis
International HapMap Project Papers
• The Int. HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature 449: 851-861. 2007.
• The Int. HapMap Consortium. A Haplotype Map of the Human Genome. Nature 437: 1299-1320. 2005.
• The Int. HapMap Consortium. The International HapMap Project. Nature 426: 789-796. 2003.
• The Int. HapMap Consortium. Integrating Ethics and Science in the International HapMap Project. Nature Reviews Genet 5: 467-475. 2004.
• Thorisson et al. The International HapMap Project Web site. Genome Res 15: 1591-1593. 2005.
HapMap-related papers
• Sabeti et al. Genome-wide detection and characterization of positive selection in human populations. Nature 449: 913-918. 2007.
• Clark et al. Ascertainment bias in studies of human genome-wide polymorphism. Genome Res 15: 1496-1502. 2005.
• Clayton et al. Population structure, differential bias and genomic control in a large-scale, case-control association study. Nature Genet 37(11): 1243-1246. 2005.
• de Bakker et al. Efficiency and power in genetic association studies. Nature Genet 37(11): 1217-1223. 2005.
• Goldstein, Cavalleri. Genomics: Understanding human diversity. Nature 437: 1241-1242. 2005.
• Hinds et al. Whole genome patterns of common DNA variation in three human populations. Science 307: 1072-1079. 2005.
• Myers et al. A fine-scale map of recombination rates and hotspots across the human genome. Science 310: 321-324. 2005.
• Nielsen et al. Genomic scans for selective sweeps using SNP data. Genome Res 15: 1566-1575. 2005.
• Smith et al. Sequence features in regions of weak and strong linkage disequilibrium. Genome Res 15: 1519-1534. 2005.
• Weir et al. Measures of human population structure show heterogeneity among genomic regions. Genome Res 15: 1468-1476. 2005.
Nature (2003)
Human Chromosomes
• Contain DNA
• 22 pairs of autosomes + sex chromosomes (X and Y) + mitochondrial genome
• Contain functional units (genes) and other DNA
The human genome sequence is available as a reference, as a result of the Human Genome Project
A significant amount of inter-individual variation exists
Some Basic Definitions
Locus: a site in the genome
The DNA in the human genome is not a static entity. There are differences between different copies:
Allele: a genetic variant, i.e., a form (state) of a locus
Mutation: a genetic change
An individual carries two copies of each autosomal locus
Alleles are passed from parents to offspring (one from each parent)
Genotype: the set of alleles an individual carries at a given locus
Chromosomes are sets of continuously linked genetic loci
Example: Integrated map of chromosome 5 from the International HapMap Project, http://www.hapmap.org
Genetic Variation
• Some DNA loci vary among individuals
• Linked genetic loci are inherited non-independently
• Loci may change with time (mutation, selection, genetic drift)
• Some DNA changes lead to quantitative changes in RNA expression and to quantitative or qualitative changes in protein production
• Some genetic changes, even small ones, may lead to disease
• A large amount of natural variation occurs in healthy individuals, i.e., many changes are neutral
• Loci genetically linked to the disease-causing locus can be used as genetic markers to search for the disease locus
[Figure labels: SNP 1, SNP 2]
There are many types of DNA variation, e.g.:
Sequence variation: AAA[C/T]GGCTA
Microsatellite repeats: …AATG AATG…
Polymorphic Site
A locus with common DNA variation
≥ 2 alleles in a population
Shows difference in DNA sequence among individuals
In most definitions: the most common allele has frequency < 99%, or minor allele frequency (MAF) ≥ 1%, or MAF ≥ 2%, or at least two alleles have frequencies ≥ 1%.
A locus whose rare allele occurs in < 1% of the population is usually not considered a polymorphic site.
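The MAF thresholds on this slide can be checked directly from genotype calls. The sketch below is illustrative only: the input format (one two-letter genotype string per diploid individual) and the function names are assumptions, not part of the HapMap data formats.

```python
from collections import Counter

def minor_allele_frequency(genotypes):
    """Return the frequency of the rarest allele at one locus.

    `genotypes` is a list of two-character strings such as "AC",
    one per diploid individual (a hypothetical input format).
    """
    counts = Counter(allele for g in genotypes for allele in g)
    if len(counts) < 2:
        return 0.0  # only one allele observed: monomorphic in this sample
    return min(counts.values()) / sum(counts.values())

def is_polymorphic(genotypes, threshold=0.01):
    """Apply the 'MAF >= 1%' style of definition; the threshold is adjustable."""
    return minor_allele_frequency(genotypes) >= threshold

sample = ["AA", "AC", "CC", "AA", "AA"]   # 7 A alleles, 3 C alleles
print(minor_allele_frequency(sample))
print(is_polymorphic(sample))
```

Note that an estimate from a small sample is noisy near the 1% boundary: with 90 individuals (180 chromosomes), a single copy of an allele already gives a sample frequency of about 0.6%.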
SNP = Single Nucleotide Polymorphism
A SNP locus on the distal end of the long arm of human chromosome 5 (data from Ensembl, http://www.ensembl.org)
SNP locus rs6870660: CAAATTCCATG[A or C]AGAAGGAAATACAT
A and C are alleles at SNP locus rs6870660
A SNP locus on the distal end of the long arm of chromosome 5
SNP locus rs6870660, http://www.hapmap.org
Regulatory Interactions: The ENCODE Project
2003: Pilot project launched (1% of the genome)
2007: Pilot project completed; production phase launched on the entire genome
Components: Production Scale Effort, Pilot Scale Effort, Data Coordination Center, Technology Development Effort
High-throughput experimental and computational approaches to studies of DNA regulatory sites, regulatory interactions, and DNA modification
Genome SNP Variation
Size of the human genome is 3.2 × 10^9 bp
99.9% identical between individuals
9-10 million SNPs may have MAF ≥ 5%
30,000 genes
HapMap SNP Density Coverage
• Phase I (published in 2005): 1,007,329 SNPs that passed quality control; 1 SNP / 3000 bp; 11,500 nsSNPs
• 10 ENCODE regions, 500 kb each: 17,944 SNPs; 1 SNP / 279 bp
• Phase II (published in 2007): >3,806,000 SNPs; 1 SNP / 875 bp; 25-30% of all SNPs with MAF ≥ 5%
Figure caption: The cumulative number of non-redundant SNPs (each mapped to a single location in the genome) is shown as a solid line, as well as the number of SNPs validated by genotyping (dotted line) and double-hit status (dashed line). Years are divided into quarters (Q1–Q4).
http://www.hapmap.org/
SNP Differences among Individuals Far Exceed Differences among Populations
Phase I
Autosomes: Across the 1 million SNPs genotyped, only 11 have fixed differences between CEU and YRI, 21 between CEU and CHB/JPT, and 5 between YRI and CHB/JPT.
X chromosome: 123 SNPs were completely differentiated between YRI and CHB/JPT, but only 2 between CEU and YRI and 1 between CEU and CHB/JPT.
Haplotypes
A haplotype is a set of alleles at multiple loci located on the same copy of the chromosome
Genotype calls obtained from sequencing or DNA chip genotyping do not indicate which of the two chromosomal copies a particular allele belongs to.
E.g., genotypes for individual X:
SNP          SNP A    SNP B    SNP C
Genotype     A1A2     B1B2     C1C2
Alleles      A/T      T/C      G/C
Haplotypes:
Haplotype 1: A1 B2 C2 = A C C
Haplotype 2: A2 B1 C1 = T T G
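To illustrate why unphased genotypes are ambiguous, the sketch below enumerates every haplotype pair that is consistent with a set of genotype calls. It is a brute-force illustration with a hypothetical input format (one allele pair per SNP), not the statistical phasing method used by the project.

```python
from itertools import product

def possible_haplotype_pairs(genotypes):
    """List all haplotype pairs consistent with unphased genotypes.

    `genotypes` is a list of (allele, allele) tuples, one per SNP,
    in chromosomal order (a hypothetical input format).
    """
    pairs = set()
    # Try both orientations at every SNP; mirror-image assignments
    # describe the same pair, so sort each pair before storing it.
    for flips in product([False, True], repeat=len(genotypes)):
        hap1 = "".join(g[1] if f else g[0] for g, f in zip(genotypes, flips))
        hap2 = "".join(g[0] if f else g[1] for g, f in zip(genotypes, flips))
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)

# Individual X from the slide: heterozygous at SNPs A, B and C.
individual_x = [("A", "T"), ("T", "C"), ("G", "C")]
for pair in possible_haplotype_pairs(individual_x):
    print(pair)
```

With three heterozygous SNPs there are 2^(3-1) = 4 candidate pairs, and the genotypes alone cannot say which one is real; the pair shown on the slide (ACC and TTG) is one of the four.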
Recombination
“Random” event
Occurs during meiosis
The larger the distance between loci, or the more generations pass, the more likely it is that recombination(s) will occur
[Diagram: parental haplotypes A1B1 × A2B2]
Nonrecombinant haplotypes: A1B1, A2B2
Recombinant haplotypes (after recombination, i.e., crossing-over): A1B2, A2B1
Two ancestral chromosomes being scrambled through recombination over many generations to yield different descendant chromosomes. If an A allele on the ancestral chromosome increases the risk of a disease, the two individuals in the current generation who inherit that part of the ancestral chromosome will be at increased risk. Source: the International HapMap Project
Linkage Disequilibrium
Associations among alleles at different loci
[Diagram: Locus A with alleles A1, A2; Locus B with alleles B1, B2]
Linkage disequilibrium coefficient (coefficient of association): $D = p_{A_1B_1} - p_{A_1} p_{B_1}$
Normalized disequilibrium coefficient: $D' = D / |D|_{max}$, where $|D|_{max} = \min(p_{A_1} p_{B_2},\, p_{A_2} p_{B_1})$ for $D > 0$ and $|D|_{max} = \min(p_{A_1} p_{B_1},\, p_{A_2} p_{B_2})$ for $D < 0$; $-1 \le D' \le 1$
Correlation coefficient: $r = D / \sqrt{p_{A_1} p_{A_2} p_{B_1} p_{B_2}}$
In case of no association, $D = 0$ (linkage equilibrium)
Practical implications in fine gene mapping: search for locus B using association of marker loci with disease
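As a worked example of the coefficients on this slide, the sketch below computes D, D' and r² for two biallelic loci from one haplotype frequency and two allele frequencies. The function name and argument layout are illustrative assumptions; the D < 0 branch uses the standard companion formula for the maximum of |D|.

```python
from math import sqrt

def ld_statistics(p_a1b1, p_a1, p_b1):
    """Return (D, D', r^2) for two biallelic loci.

    p_a1b1: frequency of the A1-B1 haplotype
    p_a1, p_b1: frequencies of alleles A1 and B1
    """
    p_a2, p_b2 = 1.0 - p_a1, 1.0 - p_b1
    variance_term = p_a1 * p_a2 * p_b1 * p_b2
    if variance_term == 0:
        raise ValueError("both loci must be polymorphic")
    d = p_a1b1 - p_a1 * p_b1
    # The largest |D| the allele frequencies allow depends on the sign of D.
    if d >= 0:
        d_max = min(p_a1 * p_b2, p_a2 * p_b1)
    else:
        d_max = min(p_a1 * p_b1, p_a2 * p_b2)
    d_prime = d / d_max
    r = d / sqrt(variance_term)
    return d, d_prime, r * r

# Made-up frequencies: A1 at 0.6, B1 at 0.5, A1-B1 haplotype at 0.4.
d, d_prime, r2 = ld_statistics(0.4, 0.6, 0.5)
print(round(d, 3), round(d_prime, 3), round(r2, 3))
```

In this example D' (0.5) and r² (about 0.17) differ noticeably, which is why tagging thresholds are usually stated in terms of r² rather than D'.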
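The value of D decreases geometrically with each generation
[Diagram: two linked loci, with alleles A/a and B/b]
$D(t) = (1 - \theta)\, D(t-1)$
$D(t) = (1 - \theta)^t\, D(0)$
where $\theta$ is the recombination fraction between the two loci and $t$ is the number of generations.
Unless the two loci are closely linked, the value of D should rapidly decrease to 0. The occurrence of association between two loci implies that they are closely linked.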
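The geometric decay can be made concrete with a few lines of arithmetic. The sketch below writes the per-generation recombination fraction between the two loci as `theta` (an assumed name) and uses made-up values.

```python
import math

def ld_after(d0, theta, generations):
    """D(t) = (1 - theta)^t * D(0)."""
    return (1.0 - theta) ** generations * d0

def generations_to_fraction(theta, fraction=0.5):
    """Smallest whole number of generations t with (1 - theta)^t <= fraction."""
    return math.ceil(math.log(fraction) / math.log(1.0 - theta))

# Unlinked loci (theta = 0.5): D halves every generation.
print(ld_after(0.25, 0.5, 2))
# Tightly linked loci (theta = 0.01): generations needed for D to halve.
print(generations_to_fraction(0.01))
```

The contrast between the two cases is the point of the slide: association between unlinked loci disappears within a few generations, while association between tightly linked loci persists long enough to be useful for mapping.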
Haplotype Maps Generated by The International HapMap Project
3 steps in constructing the HapMap:
(a) SNPs are identified in DNA samples from multiple individuals.
(b) Adjacent SNPs that are inherited together are compiled into haplotypes.
(c) "Tag" SNPs are identified within haplotypes that uniquely describe those haplotypes.
Source: The International HapMap Project
Haplotype Maps of the Human Genome
Helmuth 2001, Science 293: 583-585
Find correlations among groups of SNPs
Haplotypes were inferred for the HapMap project from trio data and from unrelated individuals using PHASE (Stephens et al. 2001; Stephens and Donnelly 2003)
Haplotype Maps of the Human Genome
Regions are decomposed into discrete haplotype blocks, which capture similarity in haplotype organization
Patil et al. 2001. Blocks of Limited Haplotype Diversity Revealed by High-Resolution Scanning of Human Chromosome 21. Science 294(5547): 1719-23
Haplotype Block Partition Results for Three Populations
1,586,383 SNPs genotyped in 71 Americans of European, African, and Asian ancestry
Population          Blocks     Average size, kb*   Required SNPs†
African-American    235,663    8.8                 570,886
European-American   109,913    20.7                275,960
Han Chinese         89,994     25.2                220,809
* Average distance spanned by segregating sites in each block.
† Minimum number of SNPs required to distinguish common haplotype patterns with frequencies of 5% or higher.
Hinds et al. 2005, Science
Hinds et al. 2005
Extended LD bin and haplotype block structure around the CFTR gene. LD bins, where each bin has at least one SNP with r² > 0.8 with every other SNP, are depicted as light horizontal bars, with the positions of constituent SNPs indicated by vertical tick marks as well as the extreme ends of the bars. Isolated SNPs are indicated by plain tick marks. Haplotype blocks, within which at least 80% of observed haplotypes could be grouped into common patterns with frequencies of at least 5%, are depicted as dark horizontal bars. Unlike haplotype blocks, which are by design sequential and nonoverlapping, SNPs in one LD bin can be interdigitated with SNPs in multiple other overlapping bins.
Population differences in local bin structure
Differences in allele and haplotype frequencies
“Although analysis panels are characterized both by different haplotype frequencies and, to some extent, different combinations of alleles, both common and rare haplotypes are often shared across populations” (The Int. HapMap Consortium, Nature, 2005)
Tag SNP (htSNP) selection
Pairwise LD-based and haploblock-based tagging methods
Partition haplotypes into blocks
Can use haplotype-based (haploblocks) or genotype-based (LD-blocks) partitioning
Select representative htSNPs from each block
Latest DNA microarrays aim to capture SNPs with r² ≥ 0.8
“Tags are the subset of variants genotyped in a disease study. SNPs that are not typed in the study but whose effect can be studied through LD with a tag are termed proxies. A tag with perfect correlation (r² = 1) to an untyped putative causal allele is termed a perfect proxy.” de Bakker et al., 2005
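Pairwise LD-based tagging can be sketched as a greedy set-cover over a matrix of pairwise r² values: repeatedly pick the SNP that captures the most not-yet-captured SNPs at the chosen threshold. This is one simple heuristic shown for illustration, not the specific algorithm used by the HapMap project or by any chip vendor; the SNP names and r² values are made up.

```python
def captured_by(snp, r2, untagged, threshold):
    """SNPs in `untagged` that `snp` captures (itself, or r^2 >= threshold)."""
    return {t for t in untagged if t == snp or r2[snp].get(t, 0.0) >= threshold}

def greedy_tag_snps(r2, threshold=0.8):
    """Pick tags until every SNP is captured by at least one tag.

    `r2` maps each SNP name to a dict of pairwise r^2 values with other
    SNPs (symmetric; missing pairs are treated as r^2 = 0).
    """
    untagged = set(r2)
    tags = []
    while untagged:
        # Sorted order makes tie-breaking deterministic.
        best = max(sorted(r2), key=lambda s: len(captured_by(s, r2, untagged, threshold)))
        tags.append(best)
        untagged -= captured_by(best, r2, untagged, threshold)
    return tags

# Hypothetical pairwise r^2 values for four SNPs.
r2 = {
    "snp1": {"snp2": 0.90, "snp3": 0.85, "snp4": 0.10},
    "snp2": {"snp1": 0.90, "snp3": 0.70, "snp4": 0.10},
    "snp3": {"snp1": 0.85, "snp2": 0.70, "snp4": 0.10},
    "snp4": {"snp1": 0.10, "snp2": 0.10, "snp3": 0.10},
}
print(greedy_tag_snps(r2))
```

Here snp1 serves as a proxy for snp2 and snp3, while snp4 is in weak LD with everything else and has to be typed itself.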
Tag SNP, Haplotypes, and LD
The Int. HapMap Consortium, Nature, 2005
Use of Haplotypes in Association Analysis
• Testing one marker at a time for associations is very time-consuming
• Problem of multiple testing
• When testing individual SNPs, we are not utilizing information from other markers
Benefits of Using Haplotypes
• Haplotypes allow us to use information from multiple loci simultaneously
• LD information between loci is captured
Benefits of Haplotype Analysis
• Construct a single highly informative mega-locus from a number of less informative but closely linked loci
• Identify genotyping or data entry errors. Likelihood ratio tests indicate which typings are more likely to be errors
• Find boundaries of conserved haplotypes associated with a trait
• Employs recombinations from the entire history of a population
Amount of Captured Sequence Variation in HapMap Phase II
For common variants (MAF ≥ 0.05), the mean maximum r² of any SNP to a typed one is 0.90 in YRI, 0.96 in CEU and 0.95 in CHB/JPT.
1.09 million SNPs capture all common Phase II SNPs with r² ≥ 0.8 in YRI.
Very common SNPs with MAF ≥ 0.25 are captured extremely well (mean maximum r² of 0.93 in YRI to 0.97 in CEU).
Rarer SNPs with MAF < 0.05 are less well covered (mean maximum r² of 0.74 in CHB/JPT to 0.76 in YRI).
Recombination Hot Spots
Structural Genome Variation
HapMap samples are also used as a resource for CNV analysis
• Large number of copy number variants (CNVs) and other genome rearrangements found among individuals
• Some variation is assumed to be normal; other variation may cause disease
• Genome databases, e.g. the Database of Genomic Variants at TCAG of the Hospital for Sick Children in Toronto, and the Copy Number Variation Project Map at the Sanger Centre
• Segmental duplications are recombination hotspots, causing global genome rearrangements
HapMap Genome Browser
Perlegen Genotype Browser
UCSC Genome Browser http://genome.ucsc.edu/
DNA Chips and Resequencing: High-throughput Analysis of Sequence Variation
An easy way to access genome-wide variation
Both Affymetrix and Illumina DNA chips contain representative SNP and CNV probes
• Affymetrix GeneChip 6.0: 1.8 million markers for genetic variation, including 906,000 SNPs and 946,000 copy number probes
• Illumina 1M BeadChip and 1M-Duo BeadChip: ~950,000 genome-spanning tag SNPs; ~100,000 additional non-HapMap SNPs; >565,000 SNPs in and near coding regions, such as nsSNPs, promoter regions, and 3’ and 5’ UTRs; dense coverage in ADME and MHC regions; ~260,000 markers located in novel and reported copy number polymorphic regions
• Sequenom mass arrays (based on MALDI-TOF)
Genome-Wide Association
• Select representative htSNPs from low-diversity haplotype blocks
• Adjustment for multiple comparisons
• LD values are highly variable: smoothing function needed
• Haplotypes in a sliding window
OR: screen for top SNPs, likely functional SNPs in genes involved in pathways of interest
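The slide does not say which multiple-comparison adjustment is meant. As one common and deliberately conservative choice, the sketch below applies a Bonferroni correction, dividing the significance level by the number of tests; the SNP names and p-values are made up.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance cutoff that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests

def significant_snps(p_values, alpha=0.05):
    """Return SNP names whose p-value passes the Bonferroni cutoff.

    `p_values` maps SNP name -> association p-value (hypothetical data).
    """
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return [snp for snp, p in p_values.items() if p < cutoff]

# With one million tested SNPs the per-SNP cutoff is about 5e-8.
print(bonferroni_threshold(0.05, 1_000_000))

toy_results = {"rs_a": 1e-9, "rs_b": 0.01, "rs_c": 0.04}
print(significant_snps(toy_results))
```

Because SNPs in LD are correlated, the effective number of independent tests is smaller than the number of SNPs typed, so Bonferroni tends to be stricter than necessary; permutation-based or LD-aware adjustments are alternatives.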
Use of Phase-Resolved Data in Association Analysis
• Find association with haplotypes, similar to analyses of individual SNP alleles; need to consider multiple testing
• Test for tendency of cases to ‘cluster’ around groups of ‘similar’ haplotypes
• Extend log-linear approach to take haplotype structure into account
Modifications are also used for ambiguous phase
http://www.genome.gov/26525384
As of 04/14/2008, GWAS of 150 traits posted
Special Thanks to
• Ken Manly, whose presentation ideas for the HapMap module 2006 inspired and helped organize this presentation