Translation elongation, amino acid usage, and codon usage indices

Xuhua Xia
xxia@uottawa.ca
http://dambe.bio.uottawa.ca

Objectives
• Understand how amino acid and codon usage biases affect translation efficiency and gene expression
• Biomedical and biopharmaceutical relevance
  – Protein drug production in the pharmaceutical industry
  – Transgenic experiments in agriculture
• Factors affecting amino acid and codon usage bias
• Indices measuring codon usage bias
• Develop bioinformatic skills to study genomic codon usage

Energetic cost

Amino acid  Code  Precursor metabolites      ~P     H      Total ~P
Ala         A     pyr                        1.0    5.3    11.7
Cys         C     3pg                        7.3    8.7    24.7
Asp         D     oaa                        1.3    5.7    12.7
Glu         E     αkg                        2.7    6.3    15.3
Phe         F     2 pep, eryP                13.3   19.3   52.0
Gly         G     3pg                        2.3    4.7    11.7
His         H     penP                       20.3   9.0    38.3
Ile         I     pyr, oaa                   4.3    14.0   32.3
Lys         K     oaa, pyr                   4.3    13.0   30.3
Leu         L     2 pyr, acCoA               2.7    12.3   27.3
Met         M     oaa, Cys, -pyr             9.7    12.3   34.3
Asn         N     oaa                        3.3    5.7    14.7
Pro         P     αkg                        3.7    8.3    20.3
Gln         Q     αkg                        3.7    6.3    16.3
Arg         R     αkg                        10.7   8.3    27.3
Ser         S     3pg                        2.3    4.7    11.7
Thr         T     oaa                        3.3    7.7    18.7
Val         V     2 pyr                      2.0    10.7   23.3
Trp         W     2 pep, eryP, PRPP, -pyr    27.7   23.3   74.3
Tyr         Y     eryP, 2 pep                13.3   18.3   50.0

Hiroshi Akashi and Takashi Gojobori 2002, PNAS 99:3695-3700

Numerical Prediction: Usage of energetically expensive (and also rare) amino acids should decrease with gene expression. A large ~P/Copy should be associated with a small NumCopy, and a small ~P/Copy with a large NumCopy.
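The energetic-cost argument can be made concrete by averaging the Total ~P values of Akashi and Gojobori (2002) over the residues of a protein. This is only a minimal sketch: the helper name `mean_cost` and the two toy sequences are assumptions made for illustration and do not come from the slides.

```python
# Total ~P per amino acid (Total column of the energetic-cost table,
# Akashi and Gojobori 2002), keyed by one-letter code.
TOTAL_P = {
    "A": 11.7, "C": 24.7, "D": 12.7, "E": 15.3, "F": 52.0,
    "G": 11.7, "H": 38.3, "I": 32.3, "K": 30.3, "L": 27.3,
    "M": 34.3, "N": 14.7, "P": 20.3, "Q": 16.3, "R": 27.3,
    "S": 11.7, "T": 18.7, "V": 23.3, "W": 74.3, "Y": 50.0,
}

def mean_cost(protein):
    """Average Total ~P per residue (hypothetical helper, not from the slides)."""
    costs = [TOTAL_P[aa] for aa in protein.upper()]
    return sum(costs) / len(costs)

# Toy sequences (assumed): one built from cheap residues, one from costly ones.
cheap = mean_cost("GASGAS")
costly = mean_cost("WFYWFY")
```

Under the prediction above, a highly expressed protein would be expected to sit closer to the `cheap` end of this scale than a lowly expressed one.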

AA usage and tRNA abundance
[Figure panels: Saccharomyces cerevisiae; Salmonella typhimurium.]
Xia, X. 1998. Genetics 149:37-44

AA usage and tRNA gene copies
[Figure: amino acid frequency in 11 ssDNA coliphages (y-axis) against the number of tRNA genes in E. coli (x-axis), with points labelled by amino acid; y = 231.88x + 244.93, r = 0.8426, p < 0.0001.]
Chithambaram, S. et al. 2014. Genetics 197:301-315

Summary of AA usage
• Energetic cost: mass-produced proteins should use cheap amino acids.
• Translation efficiency: mass-produced proteins should use abundant amino acids.
• Much-used amino acids should have more tRNA (in gene copies and in abundance) to carry them than little-used amino acids.

Codon Usage Bias
• Observation: strongly biased codon usage in a variety of species, ranging from viruses, mitochondria and plastids to prokaryotes and eukaryotes.
• Hypotheses:
  – Differential mutation hypothesis, e.g., the transcriptional hypothesis of codon usage (Xia 1996 Genetics 144:1309-1320)
  – Differential selection hypothesis, e.g., Xia 1998 Genetics 149:37-44
• Predictions:
  – From the mutation hypothesis: concordance between codon usage and mutation pressure
  – From the selection hypothesis:
    • Concordance between differential availability of tRNA and differential codon usage.
    • The concordance is stronger in highly expressed genes than in lowly expressed genes (CAI is positively correlated with gene expression).
[Figure labels: Gene 1, Gene 2, Gene 3, polycistronic mRNA, ribosome, protein, RNA polymerase, GCC~tRNA~Gly, UCC~tRNA~Gly.]

Codon usage of HEGs in yeast
Xuhua Xia 2007. Bioinformatics and the Cell.

Major and minor codons
• Major codon: the codon in a synonymous codon family that is translated most efficiently in a species, typically with three associated properties:
  – it is over-represented in highly expressed genes relative to lowly expressed genes;
  – it corresponds to the most abundant tRNA;
  – replacing it with another codon reduces translation efficiency (reduced protein production).
• Minor codon: the opposite.
• Their identification is NOT based on the codon frequencies of all coding sequences in a species.
• Different species may have different major and minor codons in the same synonymous codon family.

Calculation of RSCU and proportion: different scaling
RSCU (Sharp et al. 1986) is codon-specific.
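A minimal sketch of the two scalings, assuming the standard definition of RSCU (the observed codon count divided by the mean count of its synonymous family), while the proportion divides by the family total. The function names and the codon counts are hypothetical and serve only as an illustration; the sketch also assumes every family has at least one observed codon.

```python
def rscu(counts):
    """RSCU for one synonymous family: count / mean count in the family."""
    mean = sum(counts.values()) / len(counts)
    return {codon: n / mean for codon, n in counts.items()}

def proportion(counts):
    """Proportion for one synonymous family: count / total count in the family."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}

# Hypothetical counts for the four-fold Ala family in some set of genes.
ala = {"GCA": 20, "GCC": 10, "GCG": 60, "GCU": 10}
ala_rscu = rscu(ala)        # values sum to the family size (4)
ala_prop = proportion(ala)  # values sum to 1
```

With even usage every RSCU value is 1 regardless of family size, whereas the proportion would be 1/2 in a two-fold family and 1/4 in a four-fold family; that is the difference in scaling.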

Codon adaptation: E. coli & phage
[Figure: RSCU of phage TLS (y-axis) against RSCU of E. coli (x-axis); y = 0.4046x + 0.5954, R² = 0.672.]

Calculation of CAI
N2, N3, N4: number of 2-, 3- and 4-fold codon families.
Compound 6- or 8-fold codon families should be broken into two codon families.
CAI is gene-specific, with 0 ≤ CAI ≤ 1.
CAI values computed with different reference sets are not comparable.
Problem with computing w as Fi/Fi.max: suppose an amino acid is rarely used in highly expressed genes. Then there is little selection on it, and its codon usage might be close to even, with wi ≈ 1. If a lowly expressed gene happens to be made entirely of this amino acid, its CAI would be close to 1, which is misleading.
There has been no good alternative. Further research is needed.
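A minimal sketch of the calculation for a single codon family, assuming the usual formulation in which w is the reference-set frequency divided by the family maximum and CAI is the count-weighted geometric mean of w over the codons of a gene. The function names are hypothetical; the counts are the Ala example used later in this deck (HEG counts 40/60, Gene 1 with 10/40, Gene 2 with 20/30).

```python
import math

def cai_weights(ref_counts):
    """w = F / F_max within one codon family of the reference (HEG) set."""
    f_max = max(ref_counts.values())
    return {codon: f / f_max for codon, f in ref_counts.items()}

def cai(gene_counts, w):
    """exp( sum(F * ln w) / sum(F) ): count-weighted geometric mean of w."""
    total = sum(gene_counts.values())
    log_sum = sum(n * math.log(w[c]) for c, n in gene_counts.items())
    return math.exp(log_sum / total)

# Reference counts in highly expressed genes for the Ala codons GCA and GCG.
w = cai_weights({"GCA": 40, "GCG": 60})   # GCA: 2/3, GCG: 1
cai1 = cai({"GCA": 10, "GCG": 40}, w)     # Gene 1
cai2 = cai({"GCA": 20, "GCG": 30}, w)     # Gene 2
```

Rounded to four decimals, these give the 0.9221 and 0.8503 quoted for Gene 1 and Gene 2 in the later example. A real calculation would pool codons from all families with more than one codon, which this sketch does not attempt.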

Weak mRNA predictive power
[Figure: protein abundance (y-axis) against mRNA abundance (x-axis); y = 5.6507x + 4.1367, R² = 0.1936; ENO1 and FRS2 are labelled.]

Effect of Codon Usage Bias
[Figure: protein abundance (y-axis) against codon usage bias (x-axis); y = 70.398x - 11.739, R² = 0.5668; ENO1 and FRS2 are labelled.]

Hypothesis and Predictions

Amino acid  tRNA/anticodon  G-ending codon  A-ending codon
Met         tRNAMet/CAU     AUG             AUA
Leu         tRNALeu/UAA     UUG             UUA
Glu         tRNAGlu/UUC     GAG             GAA
Lys         tRNALys/UUU     AAG             AAA
Gln         tRNAGln/UUG     CAG             CAA
Arg         tRNAArg/UCU     AGG             AGA
Trp         tRNATrp/UCA     UGG             UGA

AUA is favoured by mutation, but not by tRNA-mediated selection. In the other families, A-ending codons are favoured by both mutation and tRNA-mediated selection.

Predictions:
1. The proportion of A-ending codons (P_NNA = N_NNA/N_NNG) or RSCU should be smaller in the Met codon family than in other R-ending codon families.
2. Availability of tRNAMet/UAU should increase P_AUA.

Xuhua Xia et al. 2007

Testing prediction 1
Carullo, M. and Xia, X. 2008. J Mol Evol 66:484-493.

Testing prediction 2
Fig. 5. Relationship between P_AUA and P_UUA, highlighting the observation that P_AUA is greater when both a tRNAMet/CAU and a tRNAMet/UAU are present than when only tRNAMet/CAU is present in the mtDNA, for bivalve species (a) and chordate species (b). The filled squares are for mtDNA containing both tRNAMet/CAU and tRNAMet/UAU genes, and the open triangles are for mtDNA without a tRNAMet/UAU gene.

Why a systems biology perspective?
"No aphorism is more frequently repeated in connection with field trials, than that we must ask Nature few questions, or ideally, one question at a time. The writer is convinced that this view is wholly mistaken. Nature, he suggests, will respond to a logical and carefully thought out questionnaire; indeed, if we ask her a single question, she will often refuse to answer until some other topic has been discussed."
--Ronald A. Fisher (1926). Journal of the Ministry of Agriculture of Great Britain 33:503-513

Simpson’s paradox

               Treatment A      Treatment B
Small stones   93% (81/87)      87% (234/270)
Large stones   73% (192/263)    69% (55/80)
Pooled         78% (273/350)    83% (289/350)

Treatment A: all open procedures
Treatment B: percutaneous nephrolithotomy
Question: which treatment is better?
C. R. Charig et al. 1986. Br Med J (Clin Res Ed) 292(6524):879-882
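The reversal can be checked directly from the counts in the table. The sketch below is only an illustration: the variable and function names are assumptions, and the numbers are the success and patient counts from Charig et al. (1986) as given above.

```python
# (successes, patients) per treatment and stone size, from the table above.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, patients):
    """Success rate within one stratum."""
    return successes / patients

def pooled_rate(strata):
    """Success rate after pooling all strata of one treatment."""
    successes = sum(s for s, _ in strata.values())
    patients = sum(n for _, n in strata.values())
    return successes / patients

# Treatment A wins within each stone-size stratum ...
a_wins_each_stratum = all(
    rate(*data["A"][size]) > rate(*data["B"][size]) for size in ("small", "large")
)
# ... yet Treatment B wins once the strata are pooled.
b_wins_pooled = pooled_rate(data["B"]) > pooled_rate(data["A"])
```

The reversal arises because the two treatments were applied to very different mixes of cases: 263 of the 350 patients under Treatment A had large stones, whereas 270 of the 350 under Treatment B had small stones.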

RSCU (HIV-1 vs Human)
Fig. 1. Relative synonymous codon usage (RSCU) of HIV-1 compared to RSCU of highly expressed human genes. Data points for codons ending with A, C, G or U are annotated with different combinations of colors and symbols. A-ending codons exhibit strong discordance in their usage between HIV-1 and human and are annotated with their coded amino acids.
[Figure: RSCU (HIV-1) on the y-axis against RSCU (Human) on the x-axis; legend: A-ending, C-ending, G-ending, U-ending.]
van Weringh et al. 2011. MBE.

RSCU (HTLV-1 vs Human)
Relative synonymous codon usage (RSCU) of HTLV-1 compared to RSCU of highly expressed human genes. Data points for codons ending with A, C, G or U are annotated with different combinations of colors and symbols. A-ending codons exhibit strong discordance in their usage between HIV-1 and human and are annotated with their coded amino acids.

Differential adaptation: early & late genes

Any problem with the mutation hypothesis?

Table 2. Frequency of A residues, length and codon adaptation index (CAI) for the three HIV-1 early (tat, rev and nef) and five late (gag-pol, vif, vpu, vpr, and env) coding sequences (CDS).

Gene  CDS (bp)  CAI
tat   261       0.66875
rev   351       0.66211
nef   621       0.67523
gag   1503      0.62784
pol   3012      0.58139
vif   579       0.61941
vpr   291       0.64272
vpu   249       0.49068
env   2571      0.61924

van Weringh et al. 2011. Molecular Biology and Evolution 28:1827-1834.

CAI values may change depending on which reference set of highly expressed genes is used, but their relative magnitudes should be maintained (unless the reference set does not consist of highly expressed genes).

tRNA
van Weringh et al. 2011. MBE.

I/A wobble pair is error-prone

Translation rate & codon adaptation
Kudla et al. (2009, Science) engineered a synthetic library of 154 genes, all encoding the same protein but differing in degree of codon adaptation, to quantify the effect of differential codon usage on protein production in E. coli. They concluded that “codon bias did not correlate with gene expression” and that “translation initiation, not elongation, is rate-limiting for gene expression”.
[Figure: protein abundance (y-axis) against codon adaptation index, CAI (x-axis); R² = 0.0052.]

Problem with CAI and a new ITE

Identification of major and minor codons:
AA  Codon  CFnon-HEG  CFHEG
A   GCA    20         40
A   GCG    80         60
tRNA: 3

AA  Codon  CFnon-HEG  CFHEG  w (CAI)  pHEG  pnon-HEG  s     w (ITE)
A   GCA    20         40     2/3      0.4   0.2       2     1
A   GCG    80         60     1        0.6   0.8       0.75  0.375

AA  Codon  CFnon-HEG  CFHEG  w (CAI)  pHEG  pnon-HEG  s     w (ITE)
A   GCA    50         40     2/3      0.4   0.5       0.8   2/3
A   GCG    50         60     1        0.6   0.5       1.2   1

CAI is a special case of ITE (when there is no background codon usage bias).

Problem with CAI and a new ITE

AA  Codon  CFnon-HEG  CFHEG  w    Gene 1  Gene 2
A   GCA    20         40     2/3  10      20
A   GCG    80         60     1    40      30
CAI1 = 0.9221; CAI2 = 0.8503
Wrong conclusions:
1. Excellent codon adaptation in the codon family (high CAI values).
2. Gene 1 has better codon adaptation than Gene 2.

AA  Codon  CFnon-HEG  CFHEG  pHEG  pnon-HEG  s     w      Gene 1  Gene 2
A   GCA    20         40     0.4   0.2       2     1      10      20
A   GCG    80         60     0.6   0.8       0.75  0.375  40      30
ITE1 = 0.4563; ITE2 = 0.5552
Correct conclusions:
1. Poor codon adaptation in the codon family (low ITE values).
2. Gene 2 has better codon adaptation than Gene 1.
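The worked example can be reproduced with a few lines of code. This is a sketch that follows the tabulated steps only (s = pHEG/pnon-HEG, w = s/max(s), then a count-weighted geometric mean of w over the gene's codons) for a single codon family; the function names are hypothetical, and it is not meant as a full implementation of ITE.

```python
import math

def proportions(counts):
    """Codon proportions within one synonymous family."""
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def ite_weights(heg_counts, non_heg_counts):
    """s = p_HEG / p_non-HEG per codon, then w = s / max(s)."""
    p_heg = proportions(heg_counts)
    p_non = proportions(non_heg_counts)
    s = {c: p_heg[c] / p_non[c] for c in heg_counts}
    s_max = max(s.values())
    return {c: v / s_max for c, v in s.items()}

def weighted_geometric_mean(gene_counts, w):
    """exp( sum(F * ln w) / sum(F) ) over the codons of one gene."""
    total = sum(gene_counts.values())
    log_sum = sum(n * math.log(w[c]) for c, n in gene_counts.items())
    return math.exp(log_sum / total)

heg = {"GCA": 40, "GCG": 60}
non_heg = {"GCA": 20, "GCG": 80}
w = ite_weights(heg, non_heg)                               # GCA: 1, GCG: 0.375
ite1 = weighted_geometric_mean({"GCA": 10, "GCG": 40}, w)   # Gene 1
ite2 = weighted_geometric_mean({"GCA": 20, "GCG": 30}, w)   # Gene 2

# With an even background (50/50) the weights reduce to the CAI weights.
w_flat = ite_weights(heg, {"GCA": 50, "GCG": 50})           # GCA: 2/3, GCG: 1
```

Rounded to four decimals, `ite1` and `ite2` match the 0.4563 and 0.5552 in the table, and the even-background case recovers w = 2/3 and 1, which is the sense in which CAI is a special case of ITE.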

[Figure: ranked protein abundance (rProt, y-axis) against the index of translation elongation (ITE, x-axis); R² values shown: 0.1814, 0.1686, 0.1509, 0.0203.]