Using phylogenetic profiles to predict protein function and localization
As discussed by Catherine Grasso

Papers
• Pellegrini, et al. Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles. (1999) PNAS 96, 4285-4288.
• Marcotte, et al. Localizing proteins in the cell from their phylogenetic profiles. (2000) PNAS 97, 12115-12120.

Basic Idea:
• Sequence alignment is a good way to infer protein function when two proteins do the exact same thing in two different organisms.
• Proteins with > 30% sequence identity generally share the same fold, and typically the same function.

Basic Idea:
• But can we decide whether two proteins function in the same pathway, such as histidine biosynthesis, or in the same biomolecular structure, such as the flagellum or the ribosome, even if they don’t do the exact same thing?
• Yes. Assume that if two proteins function together, they must evolve in a correlated fashion, so every organism that has a homolog of one of the proteins must also have a homolog of the other.

Phylogenetic Profile
• For a given protein, BLAST against N sequenced genomes.
• Construct a vector with N coordinates.
• If the protein has a homolog in organism n, set coordinate n to 1. Otherwise set it to 0.
Protein P1: 0 0 1 1 0 0
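The construction above can be sketched in a few lines of Python. This is only an illustration: the genome names, the E-values, and the significance cutoff of 1e-6 are made-up assumptions, and the BLAST search itself is assumed to have been run already, with its best E-value per genome collected in a dictionary.

```python
def binary_profile(best_evalues, genomes, cutoff=1e-6):
    """Return a 0/1 vector with one coordinate per genome.

    best_evalues maps genome name -> best BLAST E-value for the query
    protein (genomes with no reported hit are simply absent).
    The cutoff is an illustrative assumption, not a value from the slides.
    """
    profile = []
    for genome in genomes:
        e = best_evalues.get(genome)  # None means no hit was reported
        profile.append(1 if e is not None and e < cutoff else 0)
    return profile


# Hypothetical genomes and E-values, chosen to reproduce the P1 example.
genomes = ["g1", "g2", "g3", "g4", "g5", "g6"]
hits = {"g3": 1e-40, "g4": 3e-12, "g5": 0.5}
print(binary_profile(hits, genomes))  # [0, 0, 1, 1, 0, 0]
```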

Functional Link
• Assign a degree of functional linkage between P1 and P2 based on the number of positions (or bits) at which their profiles differ.
Protein P1: 0 0 1 1 0 0
Protein P2: 0 1 1 0 0
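Counting the positions at which two profiles differ is the Hamming distance between them. A minimal sketch follows; the function name and the second profile are hypothetical, chosen only to give a six-genome comparison against P1.

```python
def profile_distance(a, b):
    """Number of coordinates at which two equal-length profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same set of genomes")
    return sum(x != y for x, y in zip(a, b))


p1 = [0, 0, 1, 1, 0, 0]
other = [0, 1, 1, 1, 0, 0]  # hypothetical profile, not taken from the slides
print(profile_distance(p1, other))  # 1
```

A smaller distance means the two proteins tend to be present and absent in the same genomes, which is read as a stronger functional link.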

What They Did:
• Computed phylogenetic profiles for 4,290 proteins in E. coli.
• Aligned each protein sequence Pi with the proteins from 16 other fully sequenced genomes.
• Genome n is defined as including a homolog of Pi if one of its encoded proteins aligns to Pi with a score that is deemed statistically significant.

Conclusions
• Comparing profiles is a useful tool for identifying the complex or pathway in which a protein participates.
• As the number of fully sequenced genomes increases, scientists will be able to construct longer, more informative profiles.
• In 1999, 100 more genomes were due to be completed in the next few months.
• This suggests that as eukaryotic genomes come out, profiles will be a useful tool for studying pathways in higher organisms.

Evolutionary Origin of Eukaryotic Cell
• Mitochondria, chloroplasts and perhaps other organelles descended from microbes captured by progenitors of eukaryotic cells.
• You exist because of a bad case of indigestion!

Evolutionary Origin of Eukaryotic Cell
• This endosymbiosis was stabilized by the shifting of organelle genes into the nuclear genome and the establishment of transport systems to shuttle organellar proteins from the cytoplasm into the organelles.
• Contemporary mitochondrial genomes encode only a few genes (<20), primarily large integral membrane proteins which can’t be transported.

Evidence
• Proteins of these organelles have molecular properties resembling prokaryotic rather than eukaryotic proteins:
  1. Average lengths
  2. Domain composition
  3. Amino acid composition
  4. Homologs among prokaryotes

Phylogenetic profiles
• Will show that proteins with similar phylogenetic profiles localize to similar subcellular locations.
• Actually, will primarily show this for the mitochondria.

Calculating phylogenetic profiles
• In this study, the value at each position of the profile is equal to -1/log E, where E is the BLAST expectation value of the best-matching protein in a genome.
• This is calculated only for E < 1 × 10⁻⁶; the value is set to 1.0 otherwise. So values near zero indicate a near-perfect match and one indicates no match.
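As a rough illustration of this scoring rule, the sketch below maps a best-hit E-value to a profile value. Two things are assumptions here rather than statements from the slides: the logarithm is taken to be base 10, and an E-value reported as exactly 0 is mapped to 0.0 (a perfect match), since the logarithm is undefined there.

```python
import math


def profile_value(e_value, cutoff=1e-6):
    """Map a best-hit BLAST E-value to a profile value in [0, 1].

    Hits at or above the cutoff (or missing hits, passed as None) get 1.0.
    Base-10 log and the handling of E == 0 are illustrative assumptions.
    """
    if e_value is None or e_value >= cutoff:
        return 1.0
    if e_value == 0.0:
        return 0.0  # log undefined; treat as a perfect match
    return -1.0 / math.log10(e_value)


for e in (1e-100, 1e-10, 1e-3, None):
    print(e, profile_value(e))
```

Under these assumptions, stronger hits (smaller E) give values closer to zero, and anything weaker than the cutoff collapses to 1.0.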

Three Categories
• Prokaryote Derived: Only has homologs in prokaryotes.
• Eukaryote Derived: Only has homologs in eukaryotes.
• Organism Specific: Has no homologs.
Why split these categories? They should have different functions and roles in mitochondria.

Linear Discriminant Functions
[Figure labels: t, MP, Non-MP]
Varying t increases prediction accuracy at the expense of coverage.
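The trade-off controlled by the threshold t can be illustrated with a toy example. The sketch assumes that each protein already has a discriminant score, that a protein is predicted as MP when its score exceeds t, that "accuracy" means the fraction of predicted MPs that are truly MP, and that "coverage" means the fraction of true MPs that are predicted. These definitions, the scores, and the labels are all assumptions made for illustration.

```python
def accuracy_and_coverage(scores, is_mp, t):
    """Return (accuracy, coverage) for the rule 'predict MP if score > t'."""
    predicted = [s > t for s in scores]
    true_pos = sum(p and m for p, m in zip(predicted, is_mp))
    n_pred = sum(predicted)
    n_mp = sum(is_mp)
    accuracy = true_pos / n_pred if n_pred else float("nan")
    coverage = true_pos / n_mp if n_mp else float("nan")
    return accuracy, coverage


# Hypothetical discriminant scores and known labels (True = MP).
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
is_mp = [True, True, False, True, False, False]
for t in (0.2, 0.5, 0.7):
    print(t, accuracy_and_coverage(scores, is_mp, t))
```

With these made-up numbers, raising t from 0.2 to 0.7 lifts accuracy from 0.6 to 1.0 while coverage falls from 1.0 to 2/3.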

Testing Algorithm
• First, predicted the location of yeast proteins of known location (open diamonds).
• Second, a jackknife test was performed, repeated 100 times with different random sets (filled diamonds). Coverage 58% at 50% accuracy.
• Third, used yeast proteins as the training set and worm proteins as the test set. Coverage 65% at 50% accuracy.
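The slides do not spell out the jackknife procedure, so the following is only one plausible reading: repeatedly hold out a random subset, fit on the remainder, and score the held-out proteins. The toy one-dimensional classifier (a threshold at the midpoint of the class means), the 20% holdout fraction, and the data are all hypothetical.

```python
import random


def fit_threshold(scores, labels):
    """Toy 1-D discriminant: midpoint between the two class means."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def random_holdout_trials(scores, labels, n_trials=100, holdout=0.2, seed=0):
    """Mean fraction of held-out items classified correctly over n_trials.

    Assumes every training split still contains both classes.
    """
    rng = random.Random(seed)
    idx = list(range(len(scores)))
    k = max(1, int(len(idx) * holdout))
    fractions = []
    for _ in range(n_trials):
        rng.shuffle(idx)
        test, train = idx[:k], idx[k:]
        t = fit_threshold([scores[i] for i in train],
                          [labels[i] for i in train])
        correct = sum((scores[i] > t) == labels[i] for i in test)
        fractions.append(correct / len(test))
    return sum(fractions) / len(fractions)


# Hypothetical, cleanly separable data: five MP and five non-MP proteins.
scores = [0.9, 0.8, 0.7, 0.85, 0.75, 0.1, 0.2, 0.3, 0.15, 0.25]
labels = [True] * 5 + [False] * 5
print(random_holdout_trials(scores, labels))
```

Because the model is refit without the held-out proteins on every trial, the averaged score reflects performance on data the classifier has not seen.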

Prediction
• Applied algorithm to all yeast proteins. Estimate ~630 total mitochondrion-targeted genes in yeast, or 10% of genome.
• Applied algorithm to all worm proteins. Estimate ~660 total mitochondrion-targeted genes in worms, or 4% of genome.

Verifications
• Tested whether functions of newly predicted mitochondrial proteins matched functions of known mitochondrial proteins better than the functions of a random set of proteins. (Jaccard Coefficient, Pie Charts)
• Fraction of predicted mitochondrial proteins with predicted transmembrane segments or signal peptides.
• 2D gel of whole rat liver and human placental mitochondria reveals ~250-350 visible proteins.
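For reference, the Jaccard coefficient of two sets is the size of their intersection divided by the size of their union. The sketch below applies it to sets of functional annotation keywords; the keyword sets are invented for illustration, and how the annotations were actually encoded in the study is not stated in the slides.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical annotation keywords for two groups of proteins.
predicted = {"respiration", "translation", "transport"}
known = {"respiration", "transport", "lipid metabolism"}
print(jaccard(predicted, known))  # 0.5
```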

Conclusions
• There is information in the phylogenetic profiles, but it is quite noisy.
• Yields approximate numbers of genes that migrated to the nuclear genomes from the mitochondria.
• Gives even more evidence for the endosymbiotic theory.
• However, verifications did not confirm results as much as one might like.
• Perhaps the fundamental assumption is flawed.