A multi-strain, high-resolution mouse haplotype map reveals three distinctive genetic signatures Laboratory of Population Genetics

Motivations
1. An accurate high-resolution haplotype map of the mouse genome enables prioritization of QTL candidate genes
2. Different haplotype block structures have been reported in different studies
  • >10 Mb block size in the GNF study (Wiltshire et al., PNAS 2003)
  • 1.0-2.0 Mb block size in the WI study (Wade et al., Nature 2002)
  • 100-150 kb block size in an 8 Mb region of chr 19 (Park et al., Genome Research 2003)
3. Analysis of a 10 Mb region on chromosome 7 using the Celera mouse SNPs reveals a different genetic variation pattern
4. Celera mouse chromosome 16 SNP data are publicly available

Objectives
1. Develop an integrated, high-resolution, multi-strain mouse haplotype map
2. Compare the haplotype structure derived from high-density SNPs with those derived from low-density markers
3. Perform experimental validation in regions of conflict and in regions of interest across 20 inbred strains
4. Analyze biological factors that have contributed to the formation of mouse genetic variation patterns

Data Sources
1. Chromosome 16 reference sequence
  • MGSCv3 (NCBI build 30, Feb. 2003)
2. SNP Data

Construction of Multi-Strain Haplotype Blocks with High Density SNP Markers
Method
• Greedy algorithm that starts with two haplotypes per block
• Seed: a minimum of two adjacent SNPs with no ambiguity in haplotype assignment
• A singleton SNP that breaks the two-haplotype configuration does not affect block extension
Results
• 2,083 blocks
• 65,068 (95%) Celera SNPs in 5 laboratory inbred strains
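The Method bullets can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each SNP is a string with one character per strain ("0" for the B6 allele, "1" for the non-B6 allele, "N" for a missing call), and the function names `is_singleton`, `compatible` and `build_blocks` are hypothetical.

```python
def is_singleton(snp):
    """True when exactly one called strain carries the minor allele."""
    calls = [a for a in snp if a != "N"]
    ones = calls.count("1")
    return min(ones, len(calls) - ones) == 1


def compatible(snp, pattern):
    """True when every called allele agrees with the block's pattern."""
    return all(a == "N" or a == p for a, p in zip(snp, pattern))


def build_blocks(snps):
    """Greedy scan; returns (first_index, last_index, pattern) per block."""
    blocks = []
    i = 0
    while i < len(snps) - 1:
        a, b = snps[i], snps[i + 1]
        # Seed: two adjacent, fully called SNPs with the same pattern.
        if "N" in a or "N" in b or a != b:
            i += 1
            continue
        pattern, end = a, i + 1
        j = i + 2
        while j < len(snps):
            if compatible(snps[j], pattern):
                end = j          # extend the block
            elif not is_singleton(snps[j]):
                break            # a real conflict ends the block
            # singletons are skipped without ending the block
            j += 1
        blocks.append((i, end, pattern))
        i = end + 1
    return blocks


snps = ["00111", "00111", "10000", "00111", "01011", "01011"]
print(build_blocks(snps))  # [(0, 3, '00111'), (4, 5, '01011')]
```

In the example the third SNP is a singleton, so the first block extends across it and ends only when a conflicting two-strain pattern appears.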

Distribution of Haplotype Block Size

Blocks of Different Sizes Have Similar SNP Density Distributions

A 2.4-Mb Haplotype Block with Varying SNP Density
[Figure: haplotypes of DBA/2J, A/J, 129X1/SvJ, 129S1/SvImJ and C57BL/6J across positions 0 to 2,400,000, with B6 and non-B6 alleles marked and SNP density binned as 0, 1, 2-5, 6-10, 11-20 and >20 SNPs per 10 kb]
SNP Experimental Validation
• 374 SNPs over 2.4 Mb; average density = 0.156/kb
• 153 of these were in hotspots (red and orange)
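The density figures on this slide can be reproduced with a short sketch. It assumes SNP positions are integer base-pair coordinates and uses the bins from the "#SNP/10 kb" legend; the function names are hypothetical, and how the slide's hotspots were actually called is not stated.

```python
WINDOW = 10_000  # bp, matching the "#SNP/10 kb" legend


def density_class(count):
    """Map a per-window SNP count to the legend's density bin."""
    if count == 0:
        return "0"
    if count == 1:
        return "1"
    if count <= 5:
        return "2-5"
    if count <= 10:
        return "6-10"
    if count <= 20:
        return "11-20"
    return ">20"


def window_counts(positions, start, end, window=WINDOW):
    """Count SNPs in consecutive windows covering [start, end)."""
    n_windows = -(-(end - start) // window)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        if start <= pos < end:
            counts[(pos - start) // window] += 1
    return counts


def mean_density_per_kb(n_snps, span_bp):
    """Average number of SNPs per kilobase over a span."""
    return n_snps / (span_bp / 1000)


# The slide's summary figure: 374 SNPs over 2.4 Mb.
print(round(mean_density_per_kb(374, 2_400_000), 3))  # 0.156
```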

A 2.4-Mb Region with High SNP Density but Heterogeneous Variation Pattern (Erosion)
Ataxin 2 binding protein 1 (nucleic acid binding, RNA binding)
[Figure: haplotypes of DBA/2J, A/J, 129X1, 129S1 and B6; the legend shows B6 allele, non-B6 allele, missing data, and SNP density bins of 0, 1, 2-5, 6-10, 11-20 and >20 SNPs per 10 kb]

Details of Haplotype Erosion Across 160 kb
Location
• 5,721,639-5,878,633 bp on chr 16
Blocks
• 179 SNPs in 14 blocks with the major pattern
• 116 SNPs in 19 blocks with the other patterns
• 49 singleton SNPs
[Figure: haplotypes of DBA/2J, A/J, 129X1/SvJ, 129S1/SvImJ and C57BL/6J with a SNP density track]

Other Heterogeneous Haplotype Patterns
2) Segmentation
[Figure: haplotypes of DBA/2J, A/J, 129X1, 129S1 and C57BL/6J with a SNP density track]

Other Heterogeneous Haplotype Patterns
3) Segmentation with Erosion
4) Random

Three Major Variation Patterns
1. SNP Deserts: >1 Mb with <0.5 SNP per 10 kb
2. Large Blocks: >300 kb “melded” haplotype blocks with consistent variation patterns
3. Block Breakers: regions with heterogeneous variation patterns
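The SNP desert definition can be turned into a simple scan. The slides do not say how deserts were called, so this sketch makes an assumption: a density below 0.5 SNP per 10 kb is read as consecutive SNPs lying more than 20 kb apart, and a desert is a run of such gaps spanning more than 1 Mb. This is stricter than an average-density criterion. The function name `find_deserts` is hypothetical.

```python
def find_deserts(positions, min_len=1_000_000, max_per_10kb=0.5):
    """Return (start, end) spans where every SNP-to-SNP gap is wide
    enough to keep density below the threshold and the span exceeds
    min_len."""
    min_gap = 10_000 / max_per_10kb  # 20 kb spacing for 0.5 SNP per 10 kb
    pos = sorted(positions)
    deserts = []
    run_start = run_end = None
    for left, right in zip(pos, pos[1:]):
        if right - left > min_gap:
            if run_start is None:
                run_start = left
            run_end = right
        else:
            # A dense gap closes the current sparse run, if any.
            if run_start is not None and run_end - run_start > min_len:
                deserts.append((run_start, run_end))
            run_start = None
    if run_start is not None and run_end - run_start > min_len:
        deserts.append((run_start, run_end))
    return deserts


positions = [0, 1_000, 2_000, 500_000, 1_100_000, 1_101_000, 1_102_000]
print(find_deserts(positions))  # [(2000, 1100000)]
```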

Predictive Power of Haplotype Structures
• Test the ability to use the haplotype structure in one study to predict allelic variations in another study
• Our Haplotype Blocks
  • 98% accuracy on WI B6/129S1 SNPs that do not overlap with Celera SNPs
  • 92% accuracy on GNF B6/129S1/AJ/DBA haplotypes
• WI B6/129S1 Haplotype Blocks
  • 74% accuracy on Celera B6/129S1 genotypes
  • 85% accuracy on GNF B6/129S1 genotypes
• 80% of GNF markers are non-polymorphic across inbred strains used in Celera and WI shotgun sequencing
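One way such a cross-study accuracy could be scored is sketched below. The slides do not define the scoring rule, so everything here is an assumption: blocks are sorted, non-overlapping `(start, end, pattern)` tuples with one "0"/"1" character per strain, a prediction counts as correct when the block pattern splits the shared strains the same way as the observed alleles, and SNPs outside any block or with missing calls are not handled. All names are hypothetical.

```python
from bisect import bisect_right


def find_block(blocks, pos):
    """Return the block containing pos, or None (blocks sorted by start)."""
    starts = [b[0] for b in blocks]
    k = bisect_right(starts, pos) - 1
    if k >= 0 and blocks[k][0] <= pos <= blocks[k][1]:
        return blocks[k]
    return None


def same_partition(a, b):
    """True when two 0/1 strings split the strains identically,
    regardless of which allele is labeled 0."""
    flipped = "".join("1" if c == "0" else "0" for c in b)
    return a == b or a == flipped


def prediction_accuracy(blocks, test_snps, strain_idx):
    """Fraction of test SNPs inside a block whose observed pattern
    matches the block pattern restricted to the shared strains."""
    hits = total = 0
    for pos, observed in test_snps:
        block = find_block(blocks, pos)
        if block is None:
            continue  # SNPs outside every block are not scored
        predicted = "".join(block[2][i] for i in strain_idx)
        total += 1
        hits += same_partition(predicted, observed)
    return hits / total if total else float("nan")


blocks = [(100, 500, "00111"), (600, 900, "01011")]
test_snps = [(200, "01"), (300, "10"), (700, "00"), (550, "01")]
print(prediction_accuracy(blocks, test_snps, strain_idx=[0, 3]))
```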

SNP Deserts in Chromosome 16
• 6 SNP deserts >1 Mb in the five inbred strains used for Celera shotgun sequencing
• All 6 SNP deserts overlap with WI SNP deserts conserved across all WI strains
  • 0.21% of WI B6/SvJ SNPs are in our SNP deserts
  • 0.97% of all WI SNPs are in our SNP deserts
• 5 out of the 6 deserts have at least one end as part of large haplotype blocks
• SNP deserts are not genetically homogeneous
  • There are STRP polymorphisms
  • There are indel polymorphisms
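Percentages such as the share of WI SNPs falling inside the deserts come down to an interval lookup. The sketch below assumes deserts are non-overlapping `(start, end)` spans in base pairs and SNP positions are integers; the function name `percent_in_regions` is hypothetical and the example coordinates are made up for illustration.

```python
from bisect import bisect_right


def percent_in_regions(positions, regions):
    """Percentage of positions that fall inside any (start, end) region."""
    regions = sorted(regions)
    starts = [s for s, _ in regions]
    inside = 0
    for pos in positions:
        # Last region starting at or before pos, if any.
        k = bisect_right(starts, pos) - 1
        if k >= 0 and pos <= regions[k][1]:
            inside += 1
    return 100.0 * inside / len(positions) if positions else 0.0


# Illustrative coordinates only, not data from the study.
print(percent_in_regions([5, 150, 250, 1000], [(100, 200), (900, 1100)]))  # 50.0
```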

Validation of a SNP Desert
[Figure: 0/1 allele matrix with some N calls; rows are labeled 11 other laboratory inbred strains, B6, AKRJ, Skive and Czech]
• 5 STRPs and 1 SNP in a 15 kb SNP desert in all laboratory inbred strains
• The STRPs and the 1 SNP have the same variation pattern as the neighboring regions with high SNP density among the laboratory inbred strains
• An additional 120 SNPs were discovered between the laboratory inbred strains and feral inbred strains

A Gene-Coding Region with Varying SNP Density
[Figure: WI SNP and Celera SNP tracks over the gene, with labels down, UTR3, e4, silent, e12 and mis-sense??]
• The mRNA sequence is an MGC clone from mammary tissue metastasized to lung
• The 10 kb region is included in a 77 kb haplotype block with 44 SNPs
• Variations in the mRNA sequence do not overlap with WI and Celera SNPs
• >=2 haplotypes in the region??

Results of Experimental Validation
>down
hap1 011110 129S1; 129X1; AJ; BALB; C3HHe; DBA
hap2 000000 AKRJ; C57BL
hap3 110001 Czech; Skive
>UTR3 /num_SNP=25 /num_strain=10 /num_hap=4
hap1 1100000010000010 129S1; 129X1; AJ; BALB; C3HHe; DBA2J;
hap4 110001010011100N11000 AKRJ;
hap3 111111001110101 Czech; Skive;
hap2 0000000000000 C57BL;
>NM_145481_e12 /num_SNP=7 /num_strain=10 /num_hap=4
hap1 0000100 129S1; 129X1; AJ; BALB;
hap2 0010001 AKRJ;
hap3 0000000 C3HHe; C57BL; DBA2J;
hap4 1101011 Czech; Skive;
>e4
hap1 0 Others
hap2 1 Czech; Skive
Silent SNP validates
Mis-sense SNP does not validate
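Listings like the one on this slide can be produced by grouping strains that share an identical allele string within an amplicon. The sketch below is an assumption about the procedure, not the authors' pipeline: it takes a hypothetical dict of strain name to allele string, numbers haplotypes from the most common to the least, and does not treat "N" calls specially. The example uses the "down" amplicon values shown on the slide.

```python
from collections import defaultdict


def group_haplotypes(genotypes):
    """Group strains by identical allele string.

    Returns (label, alleles, strains) tuples, largest group first;
    ties are broken by the allele string."""
    groups = defaultdict(list)
    for strain, alleles in genotypes.items():
        groups[alleles].append(strain)
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [
        (f"hap{n}", alleles, strains)
        for n, (alleles, strains) in enumerate(ordered, start=1)
    ]


down = {
    "129S1": "011110", "129X1": "011110", "AJ": "011110",
    "BALB": "011110", "C3HHe": "011110", "DBA": "011110",
    "AKRJ": "000000", "C57BL": "000000",
    "Czech": "110001", "Skive": "110001",
}
for label, alleles, strains in group_haplotypes(down):
    print(label, alleles, "; ".join(strains))
```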

Mouse cSNPs
• Synonymous: 185
• Non-synonymous: 100

Haplotype Diversity in 43 Target Regions Assayed by 94 Amplicons

Conclusions
• We have compiled an accurate, multi-strain, high-resolution haplotype map for mouse chromosome 16
• We have discovered three distinctive genetic variation patterns for laboratory inbred mice: SNP deserts, large blocks and block breakers
• Large haplotype blocks may consist of regions with varying SNP density
• Selection in inbreeding may have an effect on SNP distribution in protein coding regions as well as SNP rate in gene coding regions
• Our method is scalable for whole-genome analysis

Acknowledgement
Laboratory of Population Genetics
• Ken Buetow
• Kent Hunter
• Michael Gandolph
• Bill Rowe
• Michael Edmonson
• Jenny Kelly
University of Wisconsin
• Rob Williams