EVOLUTION OF GLOBINS
Evolution of visual pigments and related molecules
Evolution of gene clusters
• Many genes occur as multigene families (e.g., actin, tubulin, globins, Hox)
  – Inference is that they evolved from a common ancestor
  – Families can be
    • Clustered – nearby on chromosomes (α-globins, HoxA)
    • Dispersed – on various chromosomes (actin, tubulin)
    • Both – related clusters on different chromosomes (α-, β-globins; HoxA, B, C, D)
  – Members of clusters may show stage- or tissue-specific expression
    • Implies means for coregulation as well as individual regulation
Evolution of gene clusters
• Multigene families (contd)
  – Gene number tends to increase with evolutionary complexity
    • Globin genes increase in number from primitive fish to humans
  – Clusters evolve by duplication and divergence
• History of gene families can be traced by comparing sequences
  – Molecular clock model holds that rate of change within a group is relatively constant
    • Not totally accurate – check rat genome sequence paper
  – Distance between related sequences combined with clock leads to inference about when duplication took place
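The inference in the last bullet can be written as simple arithmetic. The sketch below assumes a strict clock in which both duplicated copies accumulate substitutions independently at the same rate r, so a distance K between them corresponds to a time T = K / (2r). The function name and the numbers are hypothetical and chosen only for illustration; as noted above, real rates are not perfectly constant.

```python
def duplication_time(distance, rate_per_site_per_year):
    """Estimate time since duplication under a strict molecular clock.

    distance: substitutions per site separating the two copies (K)
    rate_per_site_per_year: substitutions per site per year on one lineage (r)
    Both copies change independently, hence the factor of 2.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate_per_site_per_year)


# Hypothetical values, not taken from any globin data set
t = duplication_time(distance=0.9, rate_per_site_per_year=1e-9)
print(f"Estimated duplication time: {t:.2e} years")
```

If the rate differs between lineages, the single-rate assumption breaks down and the estimate should be treated as rough.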
Classic phylogenetic studies of sequence conservation: the globins

The globins are the best studied family in terms of sequence conservation, partly because they were one of the first families for which multiple members were sequenced, and partly because some of the earliest protein structures solved (in fact, the earliest) were globins. The classic papers of Perutz, Kendrew and Watson were the first to correlate sequence conservation with aspects of protein structure and function. They drew their conclusions from only a few aligned sequences. Later globin studies, such as that of Bashford, Chothia and Lesk, expanded the analysis of globin sequence conservation to include hundreds of sequences.

Perutz, Kendrew & Watson, J Mol Biol 13, 669 (1965)
Bashford, Chothia & Lesk, J Mol Biol 196, 199 (1987)

Figure: Scapharca inaequivalvis oxygenated hemoglobin
Conservation of functional residues

There were only 2 perfectly conserved residues among the 8 known globin structures at the time of the Bashford et al. study. These are residues critical in binding of heme and/or interaction with heme-bound oxygen. It will often be found that the best conserved residues in related proteins are those involved in critical aspects of the general function.

Residues involved in more specific aspects of function may or may not be conserved, depending upon the relationship between the proteins under consideration. For example, residues involved in substrate specificity for serine proteases may be conserved among orthologs, such as the chymotrypsins, but not between paralogs, such as chymotrypsins and trypsins.

Figure labels: Phe 43, heme, His 87
Conservation at buried positions
• Core residues, which are usually hydrophobic, often tolerate conservative substitutions, i.e. to other hydrophobics
• Overall core volume is well-conserved (Lim & Ptitsyn, 1970), though individual core positions tolerate variation in volume
• This reflects what we know about packing and the effects of core mutations on stability; thus sequence conservation is partly related to maintaining a stable structure

Figure: portion of alignment of prokaryotic and eukaryotic globins, and human hemoglobin beta chain with buried residues Y 140 and H 156 marked. Color key: yellow = small neutral/polar, green = hydrophobic, red/pink = polar/acidic, blue = basic.
Conservation at solvent-exposed positions
• Solvent-exposed (surface) positions are mutable and usually tolerate mutation to many residue types, including hydrophobics. Bashford et al., however, noted that for globins at least, some surface positions do not tolerate large hydrophobics. Since polar-to-hydrophobic mutations on protein surfaces do not reduce stability, this conservation could reflect constraints on solubility. Indeed, it is clear that the overall polar character of the surface is conserved for soluble, globular proteins, even though a certain number of hydrophobics may be tolerated.

Figure: examples of surface residues in human hemoglobin beta chain (Y 140 and H 156 labeled). Color key: yellow = small neutral/polar, green = hydrophobic, red/pink = polar/acidic, blue = basic.
Conservation of loops and turns
• “Spacer” regions between secondary structures, such as loops and turns, are often hypermutable and vary not only in sequence but in length, tolerating insertion and deletion events. (Insertions and deletions are much less often found within secondary structure elements. Why?)

Figure: part of alignment of animal hemoglobin α and β chains; human α chain

Are the α and β chains related to each other by paralogy or orthology?
Sequence identity and homology: poor coverage

The two proteins have the same fold and both bind heme and oxygen in the same place: good independent structural/functional evidence for homology. Yet alignments of their sequences reveal only 24% identity. There are also many examples of related globins and other proteins with much lower identity than this.

Figure: hemoglobin and myoglobin (1MBO and 1HBB)

Any reasonable sequence identity criterion, whether it is a flat percent cutoff or a length-dependent cutoff, will give incomplete coverage; in other words, it will fail to identify many distant but true relationships.
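As a minimal sketch of what a flat percent-identity criterion computes, the function below counts identical residues over the aligned, non-gap columns of two pre-aligned sequences. The choice of denominator is an assumption (several conventions exist), and the two short sequences are made up for illustration; they are not globin sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap.

    The denominator (ungapped aligned columns) is one convention among
    several; it is an assumption of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def passes_flat_cutoff(aligned_a, aligned_b, cutoff=30.0):
    """Flat cutoff test; the 30% default is a hypothetical threshold."""
    return percent_identity(aligned_a, aligned_b) >= cutoff


# Toy aligned sequences (hypothetical)
print(percent_identity("MKT-AYLV", "MRTGA-LI"))
```

A pair at 24% identity would fail a 30% cutoff of this kind even when structure and function indicate homology, which is the coverage problem described above.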
Evolutionary analysis: one step into the a priori predicti

Consensus:  AAT GGC TCT TTT GAA AAA ...
             N   G   S   F   E   K
Seq 2:      AAC GGA TGT TTC GAG AAA ...
             N   G   C   F   E   K
            AAT GGC TGT TTT GAA AAA ...
             N   G   C   F   E   K

Substitution types: synonymous and non-synonymous.

Figure: Seq 1 to Seq 11 aligned against the consensus, and a plot of number of individuals against number of mutations with regions labeled purifying selection, neutrally fixed and positive selection.
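The distinction between synonymous and non-synonymous changes can be checked mechanically. The sketch below translates each codon with the standard genetic code and labels every codon difference between the consensus and Seq 2 from this slide. The function and variable names are hypothetical, and the comparison is codon by codon with no correction for multiple hits.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_changes(seq_a, seq_b):
    """Label each codon pair as identical, synonymous or non-synonymous.

    seq_a, seq_b: space-separated codons of equal number.
    """
    codons_a = seq_a.split()
    codons_b = seq_b.split()
    if len(codons_a) != len(codons_b):
        raise ValueError("sequences must contain the same number of codons")
    results = []
    for ca, cb in zip(codons_a, codons_b):
        if ca == cb:
            kind = "identical"
        elif CODON_TABLE[ca] == CODON_TABLE[cb]:
            kind = "synonymous"
        else:
            kind = "non-synonymous"
        results.append((ca, cb, kind))
    return results


consensus = "AAT GGC TCT TTT GAA AAA"
seq2 = "AAC GGA TGT TTC GAG AAA"
for ca, cb, kind in classify_codon_changes(consensus, seq2):
    print(ca, CODON_TABLE[ca], "->", cb, CODON_TABLE[cb], kind)
```

Under the standard code only the third codon (TCT to TGT) changes the amino acid; the other differences are silent.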
Neutral evolution vs selection

Figure: non-synonymous nucleotide substitution → amino acid replacements → changes in protein function or structure → biological fitness (W). Amino acid changes are classed as purifying selection, neutrality (Neutral Theory of molecular evolution) or positive selection.
Measuring the strength of selection (ω = dN/dS, the ratio of non-synonymous to synonymous substitution rates)
• ω = 1: neutrality
• ω < 1: purifying selection
• ω > 1: positive selection
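A minimal sketch of the three cases, assuming the ratio on this slide is ω = dN/dS with dN and dS already estimated elsewhere. The function name, the tolerance and the input values are hypothetical. In practice a statistical test is needed before concluding that ω differs from 1.

```python
def selection_regime(dn, ds, tol=1e-9):
    """Return (omega, label) for non-synonymous and synonymous rates.

    dn: non-synonymous substitutions per non-synonymous site
    ds: synonymous substitutions per synonymous site
    tol: hypothetical tolerance for treating omega as equal to 1
    """
    if ds <= 0:
        raise ValueError("dS must be positive")
    omega = dn / ds
    if abs(omega - 1.0) <= tol:
        label = "neutrality"
    elif omega < 1.0:
        label = "purifying selection"
    else:
        label = "positive selection"
    return omega, label


# Hypothetical rate estimates
for dn, ds in [(0.02, 0.10), (0.10, 0.10), (0.30, 0.10)]:
    print(dn, ds, selection_regime(dn, ds)[1])
```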
Two ways of testing the functional importance of peptide regions
• Experimental (Functional Biologists): serial deletions and random directed mutagenesis
• Predictive (Evolutionary Biologists): evolutionary and structural analysis

Consensus:  AAT GGC TCT TTT GAA AAA ...
             N   G   S   F   E   K
Seq 2:      AAC GGA TGT TTC GAG AAA ...
             N   G   C   F   E   K
Methods to detect adaptive evolution using DNA divergence data

Input: multiple alignment (and tree), e.g.
Sq 1: ...ATGGGCGTC...
Sq 2: ...ATGGACGTA...
Sq 3: ...ATGGGAGAG...
Sq 4: ...ATGAGCGTC...

Two families of approaches:
• Maximum-likelihood models
• Kimura-based models

Methods shown in the figure:
• Models to detect adaptive evolution at single codon sites
• Models to detect adaptive evolution at specific lineages of the tree
• Parsimony method to detect selection at single sites
• Sliding-window based methods

Figure: panels A and B with numbered sub-panels, each showing the Sq 1 to Sq 4 alignment mapped onto a tree with branches labeled a, b and 1 to 6.
Different levels of protein’s function and evolution
• Intra-molecular coevolution
• Inter-protein/gene coevolution
• Co-evolution/interaction between two different biological systems

Tully and Fares (2006) Evol. Bioinf.
Covariation analysis

Substitution patterns at different positions in a sequence alignment are not necessarily independent. This is sometimes referred to as covariation or correlated evolution.

name  sequence
A     YADLGRIKS
B     YSDLGSEKE
C     IDDFGEIAA
D     IDDFGVIGT

For example, in the mini multiple alignment shown above, the identity of the residue at the 4th position is correlated with the identity of the residue at the 1st position.

A statistical perturbation analysis can be used to characterize this covariation. An alignment of related sequences is “perturbed” by considering only those sequences in which, for example, the first position is Y. The effect of this perturbation on the residue distribution observed at other positions is then measured. If the distribution changes significantly, covariation between sequence changes at the first site and other sites in the alignment is inferred.
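The perturbation idea can be tried directly on the four-sequence alignment from this slide. In the sketch below the alignment is restricted to sequences with Y at position 1, and the residue distribution at another position is compared before and after. The slide does not specify a measure of change, so the total variation distance used here is an assumption of this sketch, and with only four sequences no significance can be claimed. All function names are hypothetical.

```python
from collections import Counter

# Mini alignment from the slide (names A to D)
ALIGNMENT = {
    "A": "YADLGRIKS",
    "B": "YSDLGSEKE",
    "C": "IDDFGEIAA",
    "D": "IDDFGVIGT",
}


def column_distribution(seqs, pos):
    """Residue frequencies at a 1-based alignment position."""
    counts = Counter(s[pos - 1] for s in seqs)
    total = sum(counts.values())
    return {res: n / total for res, n in counts.items()}


def perturb(seqs, pos, residue):
    """Keep only sequences carrying `residue` at 1-based position `pos`."""
    return [s for s in seqs if s[pos - 1] == residue]


def distribution_shift(seqs, perturbed_pos, residue, observed_pos):
    """Total variation distance between the full and perturbed distributions.

    0 means no change at the observed position; larger values mean the
    perturbation altered the residue distribution there.
    """
    full = column_distribution(seqs, observed_pos)
    sub = column_distribution(perturb(seqs, perturbed_pos, residue), observed_pos)
    residues = set(full) | set(sub)
    return 0.5 * sum(abs(full.get(r, 0.0) - sub.get(r, 0.0)) for r in residues)


seqs = list(ALIGNMENT.values())
for observed in range(2, 10):
    print(observed, distribution_shift(seqs, 1, "Y", observed))
```

Position 4 shifts from an even L/F split to all L once position 1 is fixed to Y, while the invariant positions 3 and 5 show no shift.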
Covariation and hydrophobic core packing

The hydrophobic core residues in related proteins tend to be covariant due to constraints on core packing. One sees compensatory volume changes at different positions. Davidson and coworkers found that for 266 aligned SH3 domain sequences, the strongest covariation was observed for a cluster of central hydrophobic residues. For example, substitution of a smaller residue (Ala->Gly) at 39 was strongly correlated with substitution of a larger residue (Ile->Phe) at 50.

Figure: hydrophobic core of SH3 domains, with most frequently covarying residues shown in yellow

S. M. Larson, A. A. DiNardo and A. R. Davidson, J Mol Biol 303, 433 (2000)
Some recent studies (Suel et al.) have suggested a connection between covarying clusters of residues and transduction of signals between distant sites in proteins. For example, G-protein coupled receptors bind a ligand on one side of a membrane, and then transduce that signal to the other side through conformational change. Suel et al. showed that the main clusters of covarying residues tended to connect the ligand and G-protein binding sites.

Figure: G-protein coupled receptor in the membrane, showing the ligand, the covarying networks (brown) and the G-protein binding sites

Suel et al., Nat Struct Biol 2003
A novel method to detect co-evolution in protein-coding genes (Fares and Travers, Genetics 2006)

Example alignment:
AAMWCGPCPNDEE
CAMCCGMCMNDEE
CAMDCGACANDEE
AAMMCGCCCNDEE

For the two sites A and B:

\[ (\theta_{ek})_{ij} = \left( B_{ek} \times \frac{1}{t} \right)_{ij} \]

\[ \bar{\theta}_A = \frac{1}{T} \sum_{S=1}^{T} (\theta_{ek})_S \qquad \bar{\theta}_B = \frac{1}{T} \sum_{S=1}^{T} (\theta_{ek})_S \]

\[ \hat{D}_{ek} = \left[ (\theta_{ek})_{ij} - \bar{\theta}_A \right]^2 \qquad \hat{D}_{ek} = \left[ (\theta_{ek})_{ij} - \bar{\theta}_B \right]^2 \]

\[ \bar{D}_A = \frac{1}{T} \sum_{S=1}^{T} \left[ (\theta_{ek})_S - \bar{\theta}_A \right]^2 \qquad \bar{D}_B = \frac{1}{T} \sum_{S=1}^{T} \left[ (\theta_{ek})_S - \bar{\theta}_B \right]^2 \]

Testing the significance of the correlation coefficient

\[ P(\rho_i > 0.95) = \frac{1}{1000} \sum_{i=1}^{1000} (R_i), \quad \forall \rho \ge 0.95 \;\approx\; Z = \frac{\rho_i - \bar{\rho}}{\sigma(\rho)} \]
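The quantities on this slide can be sketched numerically. The code below takes the θ values for two sites A and B as given (hypothetical numbers, one per pairwise sequence comparison), computes the squared deviations from each site mean and the mean variability, then correlates the two sets of deviations. Using a Pearson coefficient for that correlation is an assumption of this sketch, and the calculation of θ from B_ek and t is not reproduced.

```python
import math


def mean(values):
    return sum(values) / len(values)


def variabilities(thetas):
    """Squared deviation of each theta value from the site mean."""
    site_mean = mean(thetas)
    return [(t - site_mean) ** 2 for t in thetas]


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


# Hypothetical theta values for sites A and B over the same T = 4 comparisons
theta_a = [0.0, 1.0, 4.0, 1.0]
theta_b = [3.0, 5.0, 11.0, 5.0]

d_a = variabilities(theta_a)
d_b = variabilities(theta_b)
print("mean variability A:", mean(d_a))
print("mean variability B:", mean(d_b))
print("correlation:", pearson(d_a, d_b))
```

In this toy input site B changes in step with site A, so the correlation is close to 1. Judging significance would still require comparison against re-sampled data.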
Molecular co-evolution analyses: CAPS (Fares and McNally, Bioinformatics 2006)

Flow of information in CAPS
• Inputs: sequence alignment, tree (Clade 1 > 75%, Clade 2 > 75%), 3D
• Collate results from ‘re-sampling’ and ‘real’ data and sort
• Calculate probabilities of R-values applying the step-down permutational correction
• Identify groups of co-evolving pairs with P > 0.95

Example values from the figure:
Re-sampling: 1 = 0.1, 2 = 0.15, 3 = 0.35, ..., i = 0.40, i+1 = 0.55, ..., N-1 = 0.98, N = 0.99
Real: 1 = 0.55, 2 = 0.98
Comparative analysis of sensitivities

Figure: mean sensitivity (true positives, scale 0 to 100) of CAPS, MICK, Dependency and lnLCorr, plotted against divergence (distance 0.1, 0.5, 1) and against number of sequences (10, 20, 30).
Three-dimensional spheres to detect protein interfaces
• Co-evolving amino acid sites
• Spheres of 4 Å radius
• Highly conserved sites at overlapping areas
• Co-evolving amino acids share properties of hydrophobicity and molecular weight
• Protein-protein interfaces could be predicted with greater accuracy