ENCODE variation analysis


Analysis goals
• Quantify genetic variation in ENCODE regions
• Detect selective constraint in ENCODE features
• Develop rules for interpretation of functional variation
• Motivate experiments to test functional variation

Data
• ENCODE SNPs (HapMap resequencing)
• 5 kb HapMap SNPs
• DIPs
• Gene expression variation

Metrics of variation
• Derived allele frequency spectrum (Manolis)
• Diversity/Het (Ewan)
• SNP density (Ewan, others)
• DIP density (Jim, Taane)
• LD/Recombination (Daryl/Oxford)
• Regions of contiguous DNA without variation (Manolis)
• Accelerated (positively selected?) regions (Manolis)
• Standard tests of neutrality: McDonald-Kreitman, Tajima's D, etc. (Mike, others)
• Other non-parametric tests of selection (Andy)
• Tagging (Paul)
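One of the standard neutrality tests named in the list above, Tajima's D, can be sketched as follows. This is a minimal illustration using the textbook formula, not the workflow used for the ENCODE analysis; the function name `tajimas_d` and the input format (phased haplotypes as strings of '0'/'1') are assumptions made for the example.

```python
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for a list of equal-length '0'/'1' strings (one per chromosome).

    Hypothetical helper for illustration. Returns None when D is undefined
    (fewer than 4 chromosomes, or no segregating sites).
    """
    n = len(haplotypes)
    if n < 4:
        return None

    # Count the '1' allele at every site; keep only segregating sites.
    counts = [col.count("1") for col in zip(*haplotypes)]
    seg_counts = [k for k in counts if 0 < k < n]
    s = len(seg_counts)
    if s == 0:
        return None

    # Mean number of pairwise differences, summed over sites.
    pi = sum(2 * k * (n - k) / (n * (n - 1)) for k in seg_counts)

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return None
    return (pi - s / a1) / sqrt(variance)


# An excess of rare alleles gives a negative D ...
print(tajimas_d(["1000", "0100", "0010", "0001"]))
# ... while intermediate-frequency alleles give a positive D.
print(tajimas_d(["11", "11", "00", "00"]))
```

A negative value is often read as a sign of purifying selection or population expansion, and a positive value as a sign of balancing selection or population structure, so D alone does not separate selection from demography.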

Analysis plans
Analysis with respect to genomic features
• Calculate variability in a large number of genomic features with all metrics
• Correlate variability metrics with "intensity" of feature (e.g. levels of conservation with levels of variability)
• Variation, alternative splicing and expression
• Distance effects from genomic features
• Association of gene expression with SNPs (some is in UCSC and some will be provided by Manolis at the workshop)
Analysis independent of genomic features (in principle)
• Tag SNPs and comparison of resequencing data to the 5 kb map. Here it will be a good idea to see how the 5 kb map captures variation within genomic elements. If we really aim to capture variation mainly in functional genomic elements (e.g. known regulatory regions, or nonsynonymous SNPs), how can we modify the tag algorithms?
• General description of levels of variation with respect to the functional content of the 44 ENCODE regions

Ewan Birney: Diversity in features
Columns: av 2pq/SNP, av 2pq/pos, #SNPs
Values and row labels, in the order they appear on the slide: 0.15 0.16 0.00045 0.00041 856 737 Completely Rnd: 0.16 0.00045 1584 Exons : RRnd Exons 0.14 0.15 0.00039 0.00040 635 636 0.16 0.00042 16609 Promoters : Region Rnd 2 : Overall
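The column headings on this slide (av 2pq/SNP, av 2pq/pos, #SNPs) suggest expected heterozygosity 2pq averaged per SNP and per base position. A minimal sketch of that calculation follows, assuming each SNP is biallelic with allele frequency p and q = 1 - p; the function name `diversity_summary` and its inputs are hypothetical, and the actual procedure behind the slide is not given in the source.

```python
def diversity_summary(allele_freqs, region_length):
    """Return (average 2pq per SNP, average 2pq per position, number of SNPs).

    allele_freqs: frequency p of one allele at each SNP inside the feature.
    region_length: total number of bases covered by the feature.
    Hypothetical helper for illustration only.
    """
    if region_length <= 0:
        raise ValueError("region_length must be positive")

    # Expected heterozygosity at a biallelic site is 2pq with q = 1 - p.
    het = [2 * p * (1 - p) for p in allele_freqs]
    total = sum(het)

    # With no SNPs, the per-SNP average is reported as 0.0 by convention here.
    per_snp = total / len(het) if het else 0.0
    per_pos = total / region_length
    return per_snp, per_pos, len(het)


per_snp, per_pos, n_snps = diversity_summary([0.5, 0.1], region_length=1000)
print(per_snp, per_pos, n_snps)
```

The per-SNP average reflects the allele frequencies of the SNPs that are present, while the per-position average also reflects SNP density, so a feature can differ from background in one measure but not the other.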

Derived allele frequency spectrum: CNS intersection, P = 0.003

Derived allele frequency spectrum: Transfrags union, P = 0.204
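A derived allele frequency spectrum like those on the two slides above can be built by binning the derived allele frequency of each SNP inside a feature. The sketch below shows only the binning step; the function name `daf_spectrum` and the input format are assumptions, and the statistical test behind the reported P values is not stated in the source, so it is not reproduced here.

```python
def daf_spectrum(derived_counts, n_chromosomes, n_bins=10):
    """Fraction of segregating SNPs falling in each derived-allele-frequency bin.

    derived_counts: number of chromosomes carrying the derived allele, per SNP.
    n_chromosomes: number of chromosomes sampled (assumed equal for all SNPs).
    Hypothetical helper for illustration only.
    """
    bins = [0] * n_bins
    for k in derived_counts:
        # Skip sites that are fixed for the ancestral or the derived allele.
        if 0 < k < n_chromosomes:
            # Integer arithmetic avoids floating-point edge effects at bin borders.
            bins[k * n_bins // n_chromosomes] += 1

    total = sum(bins)
    return [b / total for b in bins] if total else [0.0] * n_bins


# Two rare and two common derived alleles among 10 chromosomes, two bins.
print(daf_spectrum([1, 1, 5, 9, 0, 10], n_chromosomes=10, n_bins=2))
```

Comparing the spectrum inside a feature with a matched background spectrum is the usual next step: an excess of low-frequency derived alleles in the feature is consistent with purifying selection.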

Taane Clark Heterozygosity


Indels


Regions accelerated in humans


Nuria Lopez: Selective constraints differ for genes expressed in different tissues

Genes expressed in more tissues are under stronger selective constraint (lower dN)

Paul de Bakker: Tagging
• ENCODE is a near-complete inventory of common (MAF ≥ 5%) sites
• How well do tag SNPs picked from thinned versions of ENCODE (to mimic ascertainment of Phase I and II) capture:
  – all common variants
  – functional sites

Coverage of common variants by tags picked from simulated Phase I and II HapMap
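Coverage of this kind is commonly measured as the fraction of common variants that are in strong pairwise LD with at least one tag. The sketch below assumes phased 0/1 haplotypes and an r² threshold of 0.8; the threshold, the function names and the data layout are assumptions made for the example. Only the MAF ≥ 5% cutoff comes from the slides.

```python
def r_squared(a, b):
    """Pairwise LD (r^2) between two SNPs given as 0/1 alleles on the same chromosomes."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(x & y for x, y in zip(a, b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0  # at least one SNP is monomorphic, so r^2 is undefined
    d = pab - pa * pb
    return d * d / denom


def tag_coverage(snps, tag_ids, min_maf=0.05, r2_threshold=0.8):
    """Fraction of common SNPs captured by at least one tag SNP.

    snps: dict mapping SNP id to a list of 0/1 alleles (phased haplotypes).
    tag_ids: ids of the chosen tag SNPs; a tag captures itself (r^2 = 1).
    Hypothetical helper; the r^2 threshold of 0.8 is an assumed default.
    """
    common = []
    for snp_id, alleles in snps.items():
        p = sum(alleles) / len(alleles)
        if min(p, 1 - p) >= min_maf:
            common.append(snp_id)
    if not common:
        return 0.0

    captured = sum(
        1
        for snp_id in common
        if any(r_squared(snps[snp_id], snps[t]) >= r2_threshold for t in tag_ids)
    )
    return captured / len(common)


snps = {
    "a": [0, 0, 1, 1],
    "b": [0, 0, 1, 1],  # perfectly correlated with "a"
    "c": [0, 1, 0, 1],  # independent of "a"
    "d": [0, 0, 0, 0],  # monomorphic, excluded by the MAF filter
}
print(tag_coverage(snps, tag_ids=["a"]))
```

Restricting `snps` to variants inside functional elements before calling `tag_coverage` would give the feature-specific coverage that the tagging slide asks about.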