Translation elongation amino acid usage and codon usage

Translation elongation, amino acid usage, and codon usage indices
Prof. Xuhua Xia
xxia@uottawa.ca
http://dambe.bio.uottawa.ca
University of Ottawa

Objectives
• Understand how amino acid and codon usage biases affect translation efficiency and gene expression
• Biomedical and biopharmaceutical relevance
  – Protein drug production in the pharmaceutical industry
  – Transgenic experiments in agriculture
• Factors affecting amino acid and codon usage bias
• Indices measuring codon usage bias
• Learn to be critical and to develop a coherent conceptual framework
  – Changing tRNA pool during HIV-1 infection
  – Interaction between initiation and elongation

Amino acid usage
• Prediction: In rapidly proliferating unicellular organisms, mass-produced proteins should maximize the usage of abundant and energetically cheap amino acids and avoid rare and costly ones
• Difficulties:
  – The prediction is ambiguous:
    • Prediction 1: Maximize the usage of abundant amino acids instead of rare ones
    • Prediction 2: Maximize the usage of energetically cheap amino acids instead of costly ones
    • Unless abundant amino acids are also cheap, the two predictions conflict with each other and would require different empirical support
  – Relevant data:
    • Protein abundance
    • Concentration of the 20 amino acids in cells
    • Energetic costs of synthesizing each of the 20 amino acids

Energetic cost

Amino acid  Code  Precursor metabolites      ~P    H     Total ~P
Ala         A     pyr                        1.0   5.3   11.7
Cys         C     3pg                        7.3   8.7   24.7
Asp         D     oaa                        1.3   5.7   12.7
Glu         E     αkg                        2.7   6.3   15.3
Phe         F     2 pep, eryP               13.3  19.3   52.0
Gly         G     3pg                        2.3   4.7   11.7
His         H     penP                      20.3   9.0   38.3
Ile         I     pyr, oaa                   4.3  14.0   32.3
Lys         K     oaa, pyr                   4.3  13.0   30.3
Leu         L     2 pyr, acCoA               2.7  12.3   27.3
Met         M     oaa, Cys, −pyr             9.7  12.3   34.3
Asn         N     oaa                        3.3   5.7   14.7
Pro         P     αkg                        3.7   8.3   20.3
Gln         Q     αkg                        3.7   6.3   16.3
Arg         R     αkg                       10.7   8.3   27.3
Ser         S     3pg                        2.3   4.7   11.7
Thr         T     oaa                        3.3   7.7   18.7
Val         V     2 pyr                      2.0  10.7   23.3
Trp         W     2 pep, eryP, PRPP, −pyr   27.7  23.3   74.3
Tyr         Y     eryP, 2 pep               13.3  18.3   50.0

Table 1 in Hiroshi Akashi and Takashi Gojobori 2002, PNAS 99: 3695–3700

Data compilation
Prediction: Gene (protein) expression (GE) should be negatively correlated with mean energetic cost (MeanEC)

Gene  Sequence                             MeanEC  GE
G1    MVHLTPEEKSAVTALWGKVNVDEVGGEALGRL...  33      50
G2    VYPWTQRFFESFGDLSTPDAVMGNPKVKAHGK...  50      45
G3    KVLGAFSDGLAHLDNLKGTFATLSELHCDKLH...  30      51
G4    VDPENFRLLGNVLVCVLAHHFGKEFTPPVQAA...  44      48
G5    KVVAGVANALAHKYHAAVNGLWGKVNPDDVGG...  67      32
G6    EALGRLLVVYPWTQRYFDSFGDLSSASAIMGN...  48      45
G7    VKAHGKKVINAFNDGLKHLDNLKGTFAHLSEL...  54      49
G8    HCDKLHVDPENFRLLGNMIVIVLGHHLGKEFS...  62      29
G9    AQAAFQKVVAGVASALAHKYHMVHLTPEEKNA...  57      37
G10   VTTLWGKVNVDEVGGEALGRLLVVYPWTQRFC...  37      49
...

Energetic cost (EC) per amino acid:
AA   Ala   Cys   Asp   Glu   Phe   Gly   His   Ile   Lys   Leu
     A     C     D     E     F     G     H     I     K     L
EC   11.7  24.7  12.7  15.3  52.0  11.7  38.3  32.3  30.3  27.3

AA   Met   Asn   Pro   Gln   Arg   Ser   Thr   Val   Trp   Tyr
     M     N     P     Q     R     S     T     V     W     Y
EC   34.3  14.7  20.3  16.3  27.3  11.7  18.7  23.3  74.3  50.0
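The MeanEC column can be sketched as a simple average of per-residue costs. This is a minimal illustration, assuming MeanEC is the unweighted mean of the EC values above over all residues of a protein; the function name `mean_energetic_cost` is hypothetical, and the MeanEC values in the table are for full sequences, so the truncated sequences shown will not reproduce them.

```python
# Total energetic cost (~P equivalents) per amino acid, copied from the EC rows above.
EC = {
    "A": 11.7, "C": 24.7, "D": 12.7, "E": 15.3, "F": 52.0,
    "G": 11.7, "H": 38.3, "I": 32.3, "K": 30.3, "L": 27.3,
    "M": 34.3, "N": 14.7, "P": 20.3, "Q": 16.3, "R": 27.3,
    "S": 11.7, "T": 18.7, "V": 23.3, "W": 74.3, "Y": 50.0,
}


def mean_energetic_cost(protein: str) -> float:
    """Unweighted mean EC over the standard residues of a protein sequence.

    Hypothetical helper: characters that are not one of the 20 standard
    amino acids (gaps, stops, ambiguity codes) are skipped.
    """
    costs = [EC[aa] for aa in protein.upper() if aa in EC]
    if not costs:
        raise ValueError("sequence contains no standard amino acids")
    return sum(costs) / len(costs)


# A cheap peptide (Gly/Ala/Ser) versus an expensive one (Trp/Phe/Tyr).
cheap = mean_energetic_cost("GASGAS")
costly = mean_energetic_cost("WFYWFY")
```

Under the prediction, highly expressed proteins should sit closer to the `cheap` end of this scale.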

Data compilation: 2
Prediction: Gene (protein) expression (GE) should be negatively correlated with mean energetic cost (MeanEC)

Genes are grouped into GE bins, and MeanEC and mean GE are computed for each bin:

GE bin      MeanEC  MeanGE
<50         52      47
50 to <60   38      56
60 to <70   30      65
...

N (genes per bin): 100 100
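The binning step can be sketched as follows. This is an illustration only: the bin edges (<50, then steps of 10) follow the slide, the function name `summarize_by_ge_bin` is hypothetical, and the input is the ten example genes from the previous table rather than the full data set, so the bin means differ from those shown above.

```python
import math
from collections import defaultdict


def summarize_by_ge_bin(genes, floor=50, width=10):
    """Group (mean_ec, ge) pairs into GE bins and average both values per bin.

    Returns {lower_edge: (n, mean of MeanEC, mean of GE)}; every gene with
    GE below `floor` goes into a single bin keyed by -inf.
    """
    bins = defaultdict(list)
    for mean_ec, ge in genes:
        if ge < floor:
            lower = -math.inf
        else:
            lower = floor + width * ((ge - floor) // width)
        bins[lower].append((mean_ec, ge))

    summary = {}
    for lower in sorted(bins):
        rows = bins[lower]
        n = len(rows)
        summary[lower] = (
            n,
            sum(ec for ec, _ in rows) / n,
            sum(ge for _, ge in rows) / n,
        )
    return summary


# (MeanEC, GE) for genes G1..G10 from the example table.
example_genes = [
    (33, 50), (50, 45), (30, 51), (44, 48), (67, 32),
    (48, 45), (54, 49), (62, 29), (57, 37), (37, 49),
]
summary = summarize_by_ge_bin(example_genes)
```

Binning trades per-gene noise for a smaller number of more stable points, which is why the slide reports bin means rather than individual genes.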

Testing Prediction
Prediction: Gene (protein) expression (GE) should be negatively correlated with mean energetic cost (MeanEC)
[Figure: gene expression (y-axis) plotted against mean energetic cost (x-axis)]
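One way to test the prediction numerically is a Pearson correlation between MeanEC and GE. The sketch below uses only the ten example genes listed in the data compilation table, so it illustrates the calculation and is not the result plotted on the slide; `pearson_r` is a hypothetical helper written out in plain Python.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# MeanEC and GE for genes G1..G10 from the example table.
mean_ec = [33, 50, 30, 44, 67, 48, 54, 62, 57, 37]
ge = [50, 45, 51, 48, 32, 45, 49, 29, 37, 49]

r = pearson_r(mean_ec, ge)  # negative, as the prediction expects
```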

AA usage and tRNA abundance
[Figure panels: Saccharomyces cerevisiae; Salmonella typhimurium]
Xia, X. 1998. Genetics 149: 37–44

AA usage and tRNA gene copies
[Figure: amino acid frequency in 11 ssDNA coliphages (y-axis) plotted against the number of tRNA genes in E. coli (x-axis), with points labeled by amino acid. Fitted line: y = 231.88x + 244.93, r = 0.8426, p < 0.0001]
Chithambaram, S. et al. 2014. Genetics 197: 301–315

Number of synonymous codons
[Figure: amino acid count in coding sequences (y-axis) plotted against number of synonymous codons (x-axis); R² = 0.4947]

Mutation bias and AA usage
Amino acid usage in Mycoplasma pneumoniae (NZ_CP010546) and M. pulmonis (NC_002771) coding sequences. Nucleotide frequencies at the third codon site reflect mutation bias.

Proportion of nucleotides A, C, G and T at third codon sites:

Species                 A       C       G       T
Mycoplasma pulmonis     0.4200  0.0891  0.0583  0.4326
Mycoplasma pneumoniae   0.2630  0.2257  0.1894  0.3218

Observation: Mutation is more AT-biased in Mycoplasma pulmonis than in M. pneumoniae.
Prediction: Amino acids encoded by AT-rich codons should be more frequent in Mycoplasma pulmonis than in M. pneumoniae. Amino acids encoded by GC-rich codons should behave the opposite.

Mutation bias and AA usage
Prediction: Amino acids encoded by AT-rich codons should be more frequent in Mycoplasma pulmonis than in M. pneumoniae. Amino acids encoded by GC-rich codons should behave the opposite.
On the slide, amino acids encoded by AT-rich codons are shown in red, and those encoded by GC-rich codons in blue.

AA3  AA1  Codon                          Mpul   Mpne   Mpul%    Mpne%
Ala  A    GCA, GCC, GCG, GCU             13413  16265   4.6891   6.6016
Cys  C    UGC, UGU                         840   1917   0.2937   0.7781
Asp  D    GAC, GAU                       15395  12185   5.3819   4.9456
Glu  E    GAA, GAG                       19905  13901   6.9586   5.6421
Phe  F    UUC, UUU                       17255  13875   6.0322   5.6315
Gly  G    GGA, GGC, GGG, GGU             12874  13468   4.5006   5.4664
His  H    CAC, CAU                        4051   4455   1.4162   1.8082
Ile  I    AUA, AUC, AUU                  27080  16338   9.4669   6.6312
Lys  K    AAA, AAG                       31464  21315  10.9995   8.6513
Leu  L    CUA, CUC, CUG, CUU, UUA, UUG   27407  25545   9.5812  10.3681
Met  M    AUG                             4329   3843   1.5134   1.5598
Asn  N    AAC, AAU                       21791  15278   7.6179   6.2010
Pro  P    CCA, CCC, CCG, CCU              7619   8641   2.6635   3.5072
Gln  Q    CAA, CAG                       10583  13196   3.6997   5.3560
Arg  R    AGA, AGG, CGA, CGC, CGG, CGU    8227   8740   2.8761   3.5474
Ser  S    AGC, AGU, UCA, UCC, UCG, UCU   21792  15888   7.6183   6.4486
Thr  T    ACA, ACC, ACG, ACU             13282  14701   4.6433   5.9668
Val  V    GUA, GUC, GUG, GUU             15266  15868   5.3368   6.4405
Trp  W    UGA, UGG                        2736   2999   0.9565   1.2172
Tyr  Y    UAC, UAU                       10740   7962   3.7546   3.2316
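The prediction can be checked directly against the percentages in the table. The color coding of the slide is not available here, so the two groups below are an assumption: "AT-rich" is taken to mean amino acids whose codons have only A or U at the first two positions (F, I, K, N, Y) and "GC-rich" those with mostly G or C there (A, G, P, R). The names `AT_RICH`, `GC_RICH` and `usage_ratio` are hypothetical.

```python
# Percent usage copied from the Mpul% and Mpne% columns of the table.
MPUL_PCT = {
    "F": 6.0322, "I": 9.4669, "K": 10.9995, "N": 7.6179, "Y": 3.7546,
    "A": 4.6891, "G": 4.5006, "P": 2.6635, "R": 2.8761,
}
MPNE_PCT = {
    "F": 5.6315, "I": 6.6312, "K": 8.6513, "N": 6.2010, "Y": 3.2316,
    "A": 6.6016, "G": 5.4664, "P": 3.5072, "R": 3.5474,
}

# Assumed grouping (not taken from the slide's colors).
AT_RICH = "FIKNY"
GC_RICH = "AGPR"


def usage_ratio(aa: str) -> float:
    """Ratio of usage in M. pulmonis to usage in M. pneumoniae."""
    return MPUL_PCT[aa] / MPNE_PCT[aa]


at_ratios = {aa: usage_ratio(aa) for aa in AT_RICH}
gc_ratios = {aa: usage_ratio(aa) for aa in GC_RICH}

# Prediction: ratios above 1 for AT-rich, below 1 for GC-rich amino acids.
at_supported = all(r > 1 for r in at_ratios.values())
gc_supported = all(r < 1 for r in gc_ratios.values())
```

With this grouping, every amino acid in both groups moves in the predicted direction.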

Summary of AA usage
• Selection:
  – Minimizing energetic cost: mass-produced proteins should use cheap amino acids.
  – Maximizing translation efficiency: mass-produced proteins should use
    • abundant amino acids
    • amino acids carried by many tRNAs (need to control for the number of synonymous codons to evaluate its effect)
• Mutation:
  – Amino acids encoded by AT-rich codons increase with AT-biased mutation
  – Amino acids encoded by GC-rich codons increase with GC-biased mutation


Codon Usage Bias
• Observation: One codon in a synonymous codon family is often used far more frequently than other synonymous codons. N(CAA) >> N(CAG) in yeast highly expressed genes.
• Hypotheses:
  – Mutation bias, e.g., the transcriptional hypothesis of codon usage (Xia 1996, Genetics 144: 1309–1320)
  – tRNA-mediated selection, e.g., Ikemura (1981)
• Predictions:
  – From the mutation hypothesis: concordance between codon usage and mutation pressure
  – From the selection hypothesis:
    • High codon usage is associated with high availability of tRNA.
    • The association is stronger in highly expressed genes than in lowly expressed genes.
[Diagram: a polycistronic mRNA (Gene 1, Gene 2, Gene 3) with polymerase, ribosome and protein; abundant tRNAGly/GCC favors GGC usage]

Codon usage in three microbial species
Xia, X. 1998. Genetics 149: 37–44

Codon usage of HEGs in yeast
Xia, X. 2007. Bioinformatics and the Cell.

Major and minor codons
• Major codon: the codon in a synonymous codon family that can be most efficiently translated in a species, typically with three associated properties:
  – it is over-represented in highly expressed genes (HEGs) relative to lowly expressed genes (LEGs)
  – it corresponds to the most abundant isoacceptor tRNA
  – replacing it with another codon leads to reduced translation efficiency (reduced protein production)
• Minor codon: the opposite
• Their identification is NOT based on the codon frequencies of all coding sequences in a species, because sometimes mutation and selection go in opposite directions.
• Different species may have different major and minor codons in the same synonymous codon family.

Example (mutation is AT-biased):

tRNA          C_tRNA  Codon  HEG   LEG
tRNAGln/CUG   5       CAG    45%   30%
tRNAGln/UUG   1       CAA    55%   70%
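The Gln example can be turned into a small check of why overall frequency is the wrong criterion. This is a sketch under one assumption: the major codon is picked as the codon with the largest HEG-to-LEG frequency ratio, which is one way of operationalizing "over-represented in HEGs relative to LEGs". The function name `major_codon` is hypothetical, and LEG frequencies are assumed to be nonzero.

```python
def major_codon(heg_freq, leg_freq):
    """Codon with the strongest enrichment in HEGs relative to LEGs.

    Both arguments map codon -> frequency within one synonymous family.
    Assumes every LEG frequency is greater than zero.
    """
    return max(heg_freq, key=lambda codon: heg_freq[codon] / leg_freq[codon])


# Values from the Gln example table.
heg = {"CAG": 0.45, "CAA": 0.55}
leg = {"CAG": 0.30, "CAA": 0.70}
trna_copies = {"CAG": 5, "CAA": 1}  # tRNAGln/CUG reads CAG, tRNAGln/UUG reads CAA

by_enrichment = major_codon(heg, leg)                # enrichment criterion
by_raw_frequency = max(heg, key=heg.get)             # naive criterion
by_trna = max(trna_copies, key=trna_copies.get)      # most abundant isoacceptor
```

Here CAA is the most frequent codon even in HEGs because of AT-biased mutation, yet CAG is the one enriched in HEGs and matched by the more abundant tRNA.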


Calculation of RSCU

Codon  AA   N    RSCU    Codon  AA   N    RSCU    Codon  AA   N    RSCU
GCU    Ala  52   0.84    CCU    Pro  42   0.87    UAA    *    8    3.20
GCC    Ala  91   1.47    CCC    Pro  63   1.31    UAG    *    1    0.40
GCA    Ala  103  1.66    CCA    Pro  85   1.76    AGA    *    1    0.40
GCG    Ala  2    0.03    CCG    Pro  3    0.06    AGG    *    0    0
GAA    Glu  78   1.64    CAA    Gln  79   1.82    AAA    Lys  90   1.78
GAG    Glu  17   0.36    CAG    Gln  8    0.18    AAG    Lys  11   0.22
GGU    Gly  29   0.53    CGU    Arg  7    0.44    ACU    Thr  44   0.57
GGC    Gly  62   1.13    CGC    Arg  11   0.70    ACC    Thr  96   1.25
GGA    Gly  97   1.77    CGA    Arg  42   2.67    ACA    Thr  153  1.99
GGG    Gly  31   0.57    CGG    Arg  3    0.19    ACG    Thr  15   0.19
UUA    Leu  110  1.11    AUA    Met  218  1.66    UGA    Trp  92   1.77
UUG    Leu  16   0.16    AUG    Met  44   0.34    UGG    Trp  12   0.23
CUU    Leu  62   0.62    UCU    Ser  51   1.11    GUU    Val  40   0.84
CUC    Leu  95   0.95    UCC    Ser  65   1.42    GUC    Val  48   1.01
CUA    Leu  285  2.86    UCA    Ser  99   2.16    GUA    Val  87   1.83
CUG    Leu  29   0.29    UCG    Ser  5    0.11    GUG    Val  15   0.32
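The RSCU values in the table are consistent with the usual definition: a codon's count divided by the mean count of the codons in its synonymous family, so that RSCU = 1 means no bias. A minimal sketch, with the hypothetical function name `rscu` and the Ala counts from the table as input:

```python
def rscu(family_counts):
    """Relative synonymous codon usage for ONE synonymous codon family.

    RSCU_i = N_i / (sum of N over the family / number of codons in the family).
    Returns 0.0 for every codon if the family is not used at all.
    """
    total = sum(family_counts.values())
    k = len(family_counts)
    if total == 0:
        return {codon: 0.0 for codon in family_counts}
    return {codon: n * k / total for codon, n in family_counts.items()}


# Ala counts from the table.
ala = {"GCU": 52, "GCC": 91, "GCA": 103, "GCG": 2}
ala_rscu = {codon: round(v, 2) for codon, v in rscu(ala).items()}
```

RSCU values within a family always sum to the number of codons in that family, which is a quick sanity check on any table of this kind.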

Codon adaptation: E. coli & phage
[Figure: phage TLS RSCU (y-axis) plotted against E. coli RSCU (x-axis); fitted line y = 0.4046x + 0.5954, R² = 0.672]


Calculation of CAI
Cref: codon count in the reference set; w: relative adaptiveness; C: codon count in Gene X.

Codon  AA  Cref  w      C (Gene X)
UGA    *   6     0.375  0
UAG    *   4     0.250  0
UAA    *   16    1.000  0
GCA    A   195   0.606  1
GCU    A   322   1.000  15
GCG    A   81    0.252  0
GCC    A   242   0.752  8
UGC    C   123   1.000  3
UGU    C   112   0.911  3
GAU    D   69    1.000  9
GAC    D   40    0.580  11
GAG    E   289   0.863  11
GAA    E   335   1.000  14
UUU    F   118   0.554  3
UUC    F   213   1.000  9
…
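The w column follows from Cref by dividing each codon's count by the largest count in its synonymous family. The sketch below reproduces that step for the five sense-codon families listed; `relative_adaptiveness` is a hypothetical name, and the grouping into families uses the standard genetic code.

```python
# Reference-set codon counts (Cref) for the listed families.
CREF = {
    "A": {"GCA": 195, "GCU": 322, "GCG": 81, "GCC": 242},
    "C": {"UGC": 123, "UGU": 112},
    "D": {"GAU": 69, "GAC": 40},
    "E": {"GAG": 289, "GAA": 335},
    "F": {"UUU": 118, "UUC": 213},
}


def relative_adaptiveness(cref):
    """w for each codon: its Cref divided by the maximum Cref in its family."""
    w = {}
    for family in cref.values():
        top = max(family.values())
        for codon, count in family.items():
            w[codon] = count / top
    return w


w = relative_adaptiveness(CREF)
w_rounded = {codon: round(value, 3) for codon, value in w.items()}
```

Every family has exactly one codon with w = 1 (ties aside), namely the codon used most often in the reference set.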

Calculation of CAI
Gene Perfect: a gene that uses only the codon with w = 1 in each family.

Codon  AA  Cref  w      C (Gene Perfect)
GCA    A   195   0.606  0
GCU    A   322   1.000  24
GCG    A   81    0.252  0
GCC    A   242   0.752  0
UGC    C   123   1.000  6
UGU    C   112   0.911  0
GAU    D   69    1.000  20
GAC    D   40    0.580  0
GAG    E   289   0.863  0
GAA    E   335   1.000  25
UUU    F   118   0.554  0
UUC    F   213   1.000  12
…

Calculation of CAI
• N2, N3, N4: number of 2-, 3-, and 4-fold codon families
• Compound 6- or 8-fold codon families should be broken into two codon families
• CAI is gene-specific, with 0 ≤ CAI ≤ 1
• CAI values computed with different reference sets are not comparable
• Problem with computing w as F_i/F_i.max: suppose an amino acid is rarely used in highly expressed genes. There is then little selection on it, and its codon usage might be close to even, with w_i ≈ 1. If a lowly expressed gene happens to be made entirely of this amino acid, its CAI would be close to 1, which is misleading.
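The points above can be illustrated with a short calculation. The sketch assumes the standard definition of CAI as the count-weighted geometric mean of w over a gene's sense codons; the function name `cai` is hypothetical, only the codons listed in the tables are included (the elided rows are ignored), and the last example uses made-up codon names and w values purely to show the pitfall.

```python
import math


def cai(codon_counts, w):
    """Count-weighted geometric mean of w over the codons used by a gene.

    Codons with zero count are ignored; every used codon must have w > 0.
    """
    used = {c: n for c, n in codon_counts.items() if n > 0}
    total = sum(used.values())
    if total == 0:
        raise ValueError("gene has no counted codons")
    log_sum = sum(n * math.log(w[c]) for c, n in used.items())
    return math.exp(log_sum / total)


# w values as listed in the table (sense codons only).
W = {
    "GCA": 0.606, "GCU": 1.000, "GCG": 0.252, "GCC": 0.752,
    "UGC": 1.000, "UGU": 0.911, "GAU": 1.000, "GAC": 0.580,
    "GAG": 0.863, "GAA": 1.000, "UUU": 0.554, "UUC": 1.000,
}

gene_x = {
    "GCA": 1, "GCU": 15, "GCG": 0, "GCC": 8, "UGC": 3, "UGU": 3,
    "GAU": 9, "GAC": 11, "GAG": 11, "GAA": 14, "UUU": 3, "UUC": 9,
}
gene_perfect = {"GCU": 24, "UGC": 6, "GAU": 20, "GAA": 25, "UUC": 12}

cai_x = cai(gene_x, W)
cai_perfect = cai(gene_perfect, W)

# Pitfall: a hypothetical amino acid with near-even usage in the reference set
# (w close to 1 for both codons) gives a high CAI to any gene made of it.
W_EVEN = {"NNA": 1.00, "NNB": 0.98}      # made-up codons and w values
odd_gene = {"NNA": 50, "NNB": 50}        # made-up lowly expressed gene
cai_odd = cai(odd_gene, W_EVEN)
```

For the partial codon table shown, Gene X comes out at roughly 0.87 and Gene Perfect at exactly 1, while the made-up gene scores about 0.99 without any codon adaptation at all.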


We are in a multifactorial world
"No aphorism is more frequently repeated in connection with field trials, than that we must ask Nature few questions, or ideally, one question at a time. The writer is convinced that this view is wholly mistaken. Nature, he suggests, will respond to a logical and carefully thought out questionnaire; indeed, if we ask her a single question, she will often refuse to answer until some other topic has been discussed."
Ronald A. Fisher (1926). Journal of the Ministry of Agriculture of Great Britain 33: 503–513

Simpson's paradox

               Treatment A     Treatment B
Small stones   93% (81/87)     87% (234/270)
Large stones   73% (192/263)   69% (55/80)
Pooled         78% (273/350)   83% (289/350)

C. R. Charig et al. 1986. Br Med J (Clin Res Ed) 292 (6524): 879–882
Treatment A: all open procedures
Treatment B: percutaneous nephrolithotomy
Question: which treatment is better?
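The reversal in the table can be verified from the raw counts. The sketch below uses exact fractions to avoid rounding effects; the variable names are hypothetical and the counts are those given in the table.

```python
from fractions import Fraction

# (successes, patients) per stone size and treatment, from the table.
counts = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}


def success_rate(successes, patients):
    """Exact success rate as a fraction."""
    return Fraction(successes, patients)


# Within each stratum, compare A against B.
a_better_within = {
    size: success_rate(*arms["A"]) > success_rate(*arms["B"])
    for size, arms in counts.items()
}

# Pool the strata by adding up successes and patients per treatment.
pooled = {
    arm: (
        sum(counts[size][arm][0] for size in counts),
        sum(counts[size][arm][1] for size in counts),
    )
    for arm in ("A", "B")
}
a_better_pooled = success_rate(*pooled["A"]) > success_rate(*pooled["B"])
```

Treatment A wins within both strata but loses after pooling, because A was applied mostly to large stones (263 of 350 cases) and B mostly to small ones (270 of 350), and stone size itself strongly affects the success rate.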

Codon adaptation: E. coli & phage
[Figure: phage TLS RSCU (y-axis) plotted against E. coli RSCU (x-axis); fitted line y = 0.4046x + 0.5954, R² = 0.672]

RSCU (HIV-1 vs Human)
[Figure: RSCU (HIV-1) on the y-axis plotted against RSCU (Human) on the x-axis; legend: A-ending, C-ending, G-ending, U-ending]
Fig. 1. Relative synonymous codon usage (RSCU) of HIV-1 compared to RSCU of highly expressed human genes. Data points for codons ending with A, C, G or U are annotated with different combinations of colors and symbols. A-ending codons exhibit strong discordance in their usage between HIV-1 and human and are annotated with their coded amino acids.
van Weringh et al. 2011. MBE.


Codon usage in human and HIV-1

AA(Codon)   RSCU_Hum  RSCU_HIV
Arg(AGA)    0.97      1.44
Arg(AGG)    1.03      0.56
Ile(AUA)    0.24      1.59
Ile(AUY)    2.76      1.41
Leu(UUA)    0.68      1.38
Leu(UUG)    1.32      0.62
Lys(AAA)    0.76      1.27
Lys(AAG)    1.24      0.73
Gly(GGA)    0.93      2.08
Gly(GGB)    3.07      1.92
Val(GUA)    0.39      2.08
Val(GUB)    3.61      1.92
Thr(ACA)    0.97      1.94
Thr(ACB)    3.03      2.06

Ile tRNAs: 14 tRNAIle/IAU, 3 tRNAIle/GAU, 5 tRNAIle/UAU (very rare in cell)
• AUC: 14 tRNAIle/IAU, 3 tRNAIle/GAU
• AUU: 14 tRNAIle/IAU, 3 tRNAIle/GAU
• AUA: 5 tRNAIle/UAU

• Highly expressed human genes should encode Ile by AUY, and they do.
• Modifying HIV-1 codon usage according to host codon usage has been shown to increase the production of viral proteins (Haas et al. 1996; Ngumbela et al. 2008).
• Mutation hypothesis for poor codon adaptation in HIV-1 genes, with HTLV-1 data as support

RSCU (HTLV-1 vs Human)
[Figure: RSCU of HTLV-1 plotted against RSCU of highly expressed human genes]
Relative synonymous codon usage (RSCU) of HTLV-1 compared to RSCU of highly expressed human genes. Data points for codons ending with A, C, G or U are annotated with different combinations of colors and symbols. A-ending codons exhibit strong discordance in their usage between HIV-1 and human and are annotated with their coded amino acids.

Differential adaptation: early & late genes
Table 2. Frequency of A residues, length and codon adaptation index (CAI) for the three HIV-1 early (tat, rev and nef) and five late (gag-pol, vif, vpu, vpr, and env) coding sequences (CDS).

Gene  CDS (bp)  CAI      Class
tat   261       0.66875  Early
rev   351       0.66211  Early
nef   621       0.67523  Early
gag   1503      0.62784  Late
pol   3012      0.58139  Late
vif   579       0.61941  Late
vpr   291       0.64272  Late
vpu   249       0.49068  Late
env   2571      0.61924  Late

van Weringh et al. 2011. MBE.
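The contrast between early and late genes can be summarized by comparing group means of the CAI values in Table 2. This is a descriptive sketch only: it uses unweighted means, makes no claim about statistical significance, and the variable names are hypothetical.

```python
# CAI values from Table 2.
EARLY_CAI = {"tat": 0.66875, "rev": 0.66211, "nef": 0.67523}
LATE_CAI = {
    "gag": 0.62784, "pol": 0.58139, "vif": 0.61941,
    "vpr": 0.64272, "vpu": 0.49068, "env": 0.61924,
}


def mean(values):
    """Arithmetic mean of a non-empty collection of numbers."""
    values = list(values)
    return sum(values) / len(values)


early_mean = mean(EARLY_CAI.values())
late_mean = mean(LATE_CAI.values())

# Every early gene has a higher CAI than every late gene in this table.
clean_separation = min(EARLY_CAI.values()) > max(LATE_CAI.values())
```

In this table the lowest early-gene CAI (rev) is still above the highest late-gene CAI (vpr), so the two groups do not overlap.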

Translation rate & codon adaptation
Kudla et al. (2009, Science) engineered a synthetic library of 154 genes, all encoding the same protein but differing in degrees of codon adaptation, to quantify the effect of differential codon usage on protein production in E. coli. They concluded that "codon bias did not correlate with gene expression" and that "translation initiation, not elongation, is rate-limiting for gene expression".
[Figure: protein abundance (y-axis) plotted against codon adaptation index, CAI (x-axis); R² = 0.0052]

[Figure: ranked protein abundance (rProt, y-axis) plotted against the Index of Translation Elongation (ITE, x-axis), with fitted lines labeled R² = 0.1814, R² = 0.1686, R² = 0.1509 and R² = 0.0203]
Xia, 2015. Genetics