Computational methods for the analysis of rare variants
- Slides: 32
Computational methods for the analysis of rare variants
Shamil Sunyaev, Harvard-MIT Health Sciences & Technology Division
Combine all non-synonymous variants in a single test
Theory:
1) Most new missense mutations are functional (mutagenesis, population genetics, comparative genomics)
2) Most new missense mutations are only weakly deleterious (population genetics)
3) Most functional missense mutations are likely to influence phenotype in the same direction (mutagenesis, medical genetics)
Data: multiple candidate gene studies (HDL-C, LDL-C, triglycerides, BMI, blood pressure, colorectal adenomas)
Kryukov et al., PNAS 2009
Combining variants in a single test [Figure: variants in disease vs. control samples]
Combining variants in a single test [Figure: disease vs. control samples, with sequencing errors marked]
Combining variants in a single test [Figure: disease vs. control samples, with functional and neutral variants marked]
We should focus on functionally significant variation
Assign a genotypic score to each gene (or pathway) in each individual in the study. Genotypic scores take into account:
• The probability that the variation is real
• The probability that the variation is functionally significant
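A minimal sketch of one way to form such a genotypic score. The slide does not say how the two probabilities are combined, so the product weight and the sum over variants used here are assumptions, and the function name `genotypic_scores` and all input values are hypothetical.

```python
def genotypic_scores(genotypes, p_real, p_functional):
    """Return one score per individual for a single gene (or pathway).

    genotypes:     one row per individual, one minor-allele count (0, 1, 2) per variant
    p_real:        per-variant probability that the variant call is real
    p_functional:  per-variant probability that the variant is functionally significant
    """
    # Assumption: weight each variant by the product of the two probabilities.
    weights = [r * f for r, f in zip(p_real, p_functional)]
    # Assumption: an individual's score is the weighted sum of allele counts.
    return [sum(g * w for g, w in zip(row, weights)) for row in genotypes]


# Made-up data: three individuals, three variants in one gene.
genotypes = [
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 0],
]
p_real = [1.0, 0.5, 0.9]
p_functional = [0.8, 0.4, 0.1]
scores = genotypic_scores(genotypes, p_real, p_functional)
```

With these inputs, a carrier of a confidently called, probably functional variant scores higher than a carrier of two doubtful ones, and a non-carrier scores zero.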
Most tests can be generalized to use genotypic scores [Figure: disease vs. control samples carrying Score 1, Score 2, Score 3]
For quantitative traits, we regress trait values on genotypic scores.
Software prototypes exist.
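The regression for quantitative traits can be sketched as ordinary least squares with a single predictor. This is an illustration only, not the software prototypes mentioned on the slide. The function name `regress_trait_on_scores` and the data are hypothetical, and the returned statistic is simply the slope divided by its standard error.

```python
import math


def regress_trait_on_scores(scores, trait):
    """Simple linear regression of trait values on genotypic scores.

    Returns (slope, z), where z is the slope divided by its standard error.
    """
    n = len(scores)
    if n < 3:
        raise ValueError("need at least three individuals")
    mean_x = sum(scores) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    if sxx == 0:
        raise ValueError("genotypic scores do not vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, trait))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, with n - 2 degrees of freedom.
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(scores, trait))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / std_err


# Made-up genotypic scores and trait values for six individuals.
scores = [0.0, 0.0, 0.3, 0.8, 1.1, 0.0]
trait = [1.0, 1.2, 1.5, 2.1, 2.2, 0.9]
slope, z = regress_trait_on_scores(scores, trait)
```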
How do we know that the variant is real and functional?
The probability that the variant is real is provided by the sequencing quality assessment pipeline.
How do we know that the variant is real and functional? [Figure: population genetics and bioinformatics both feed into the probability that the variant is functional]
Most functional mutations are under selective pressure even if the trait is not
Probability that a variant is functionally significant given its allele frequency
However, this dependence is not robust with respect to s_0!
“Goldilocks” alleles
• Special case in terms of study design: alleles of large effect that are frequent enough to be followed up individually in a larger population sample.
• Such “goldilocks” alleles are observed in the simulations.
There is no optimal and robust weighting scheme or optimal threshold!
Variable threshold (VT) approach
Variable Threshold (VT) approach [Figure: z-score vs. allele frequency threshold, with the maximum marked for the data and for the permutations]
z(T) is the z-score of a regression across samples of phenotypes vs. counts of alleles with frequency below threshold T. We maximize z(T) over T. Type I error is controlled by permutations.
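The procedure described above can be sketched as follows. This is a simplified illustration, not the published implementation. The form of z(T), the use of observed allele frequencies as candidate thresholds (inclusive), the one-sided direction of the test, the function names and the data are all assumptions.

```python
import math
import random


def max_z_over_thresholds(genotypes, phenotypes):
    """Return (max z(T), best T) over candidate frequency thresholds T."""
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of each variant in the sample (diploid individuals).
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    mean_y = sum(phenotypes) / n
    centered = [y - mean_y for y in phenotypes]
    best_z, best_t = -math.inf, None
    # Assumption: candidate thresholds are the observed nonzero frequencies.
    for t in sorted(set(f for f in freqs if f > 0)):
        keep = [j for j in range(m) if 0 < freqs[j] <= t]
        # Per-individual count of alleles at variants passing the threshold.
        burden = [sum(row[j] for j in keep) for row in genotypes]
        # Assumption: a simple score statistic stands in for the regression z-score.
        z = sum(b * c for b, c in zip(burden, centered)) / math.sqrt(
            sum(b * b for b in burden)
        )
        if z > best_z:
            best_z, best_t = z, t
    return best_z, best_t


def vt_test(genotypes, phenotypes, n_perm=1000, seed=0):
    """Permutation p-value for the maximum of z(T) over T (one-sided)."""
    rng = random.Random(seed)
    observed, threshold = max_z_over_thresholds(genotypes, phenotypes)
    shuffled = list(phenotypes)
    exceed = 0
    for _ in range(n_perm):
        # Shuffling phenotypes breaks any genotype-phenotype association.
        rng.shuffle(shuffled)
        if max_z_over_thresholds(genotypes, shuffled)[0] >= observed:
            exceed += 1
    return observed, threshold, (exceed + 1) / (n_perm + 1)


# Made-up data: eight individuals, three variants.
genotypes = [
    [1, 0, 0], [1, 0, 1], [0, 1, 1], [0, 0, 1],
    [0, 0, 0], [0, 0, 0], [0, 0, 1], [0, 0, 0],
]
phenotypes = [3.1, 2.8, 2.5, 1.0, 0.9, 1.1, 1.2, 1.0]
observed, threshold, p_value = vt_test(genotypes, phenotypes)
```

Because the maximization over T is repeated inside every permutation, the p-value already accounts for having searched over thresholds.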
Allelic age is informative even conditionally on frequency
Intuition behind the effect Allelic age can be measured by LD decay
Bioinformatics predictions
Does the mutation fit the pattern of past evolution?
human VVSTADLCAPSSTKLDER
dog   FVSTSELCAGSTTRLEER
fish  FLSTSELCVPSTLKVNEK
[Figure labels on the alignment: A, A, V]
Statistical issues:
- sequences are related by phylogeny
- generally, we have too few sequences
Does the mutation fit the pattern of past evolution?
• We assume a constant fitness landscape: what is good for fish is good for human!
• We can estimate whether the mutation fits the pattern of amino acid changes.
• We can also estimate the rate of evolution at the amino acid site.
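A toy sketch of the two estimates above, using the three aligned sequences from the previous slide. It deliberately ignores phylogeny and works with only three sequences, which are the two statistical issues the slides raise, so it is an illustration and not how PolyPhen-2 or any other real predictor scores substitutions. The pseudocount scheme and all function names are assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Aligned sequences from the slide; the first one is treated as the reference.
ALIGNMENT = [
    "VVSTADLCAPSSTKLDER",  # human
    "FVSTSELCAGSTTRLEER",  # dog
    "FLSTSELCVPSTLKVNEK",  # fish
]


def column(alignment, pos):
    """Residues observed at a 0-based alignment position."""
    return [seq[pos] for seq in alignment]


def substitution_score(alignment, pos, mutant, pseudocount=1.0):
    """Log-ratio of smoothed column frequencies: reference residue vs. mutant.

    Higher values mean the mutant fits the observed pattern less well.
    """
    counts = Counter(column(alignment, pos))
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    p_ref = (counts[alignment[0][pos]] + pseudocount) / total
    p_mut = (counts[mutant] + pseudocount) / total
    return math.log(p_ref / p_mut)


def distinct_residues(alignment, pos):
    """Crude proxy for the rate of evolution at a site."""
    return len(set(column(alignment, pos)))


# A -> V at a variable site (V is already seen in fish) vs. S -> P at an invariant site.
tolerated = substitution_score(ALIGNMENT, 8, "V")
suspicious = substitution_score(ALIGNMENT, 2, "P")
```

A substitution to a residue already seen in another species at a variable site scores lower than a substitution to an unseen residue at an invariant site.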
Predictions based on protein structure
• Most pathogenic mutations affect protein stability (good news?).
• ΔΔG is difficult to estimate.
• The unfolded protein response pathway has to be taken into account.
• Heuristic structural parameters help, but less than comparative genomics.
PolyPhen-2 www.genetics.bwh.harvard.edu/pph2
Adzhubei et al., Nature Methods 2010
Compensatory mutations
Incorporation of PolyPhen-2 scores into the VT test
Kumar S et al., Genome Research 2009
We incorporated weights approximating these distributions into the test for alleles with frequency below 1%.
Price, Kryukov et al., AJHG 2010 (accepted)
This is a general approach
• Prediction scores can be easily incorporated into other tests such as WSS, CMC, RVE, etc.
• Other available prediction methods include SIFT, Pmut, SNAP, SNPs3D, etc.
We are likely to be underpowered to detect the effect of individual genes on traits
• Combining signal from multiple genes can dramatically increase power.
• Although we do not know the right pathways, we can attempt to construct them automatically.
SNIPE method http://string.embl.de/
Acknowledgments
The lab: Gregory Kryukov, Alex Shpunt, Adam Kiezun, Ivan Adzhubei, Saurabh Asthana, Victor Spirin, Steffen Schmidt, David Nusinow, Daniel Jordan
HSPH, BWH, MGH: Lee-Jen Wei, Alkes Price, Paul de Bakker, Shaun Purcell