Coding Theory and Protein Synthesis (Avogadro-Scale Engineering: Form and Function)

Coding Theory and Protein Synthesis
Avogadro-Scale Engineering: Form and Function, November 18 and 19, 2003
Elebeoba E. May*, Computational Biology Department, Sandia National Laboratories
*eemay@sandia.gov
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94AL85000.

Agenda
"It is the glory of God to conceal a matter; to search out a matter is the glory of kings." (Proverbs 25:2, NIV)
• Error Control at Diverse Molecular Scales
• Coding Theory Models of Protein Synthesis
  – Gatlin
  – Yockey
  – May et al.
• Applications of Coding Theory to
  – Genetic Classification
  – Molecular Computation
  – Construction and Control in Protein Synthesis

Nucleotides: Did nature select a parity check code?
D. A. Mac Dónaill: "Numerical interpretation of nucleotides depicted as positions on a B^4 hypercube: (a) even-parity nucleotides; (b) odd-parity nucleotides. The natural alphabet is structured as an error-checking code."
*D. A. Mac Dónaill, "A parity code interpretation of nucleotide alphabet composition," Chem. Comm. (2002) 2062-2063, and http://www.tcd.ie/Chemistry/People/macdonaill/
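The parity-check property behind the hypercube figure can be illustrated with a short Python sketch. It enumerates the 16 vertices of B^4, splits them by parity, and confirms that two words of the same parity are always at least Hamming distance 2 apart, so any single bit flip moves a word into the other parity class and is detectable. The specific chemical features Mac Dónaill assigns to each bit are not given on the slide and are not reproduced here; this is an illustration of the code property only, not his encoding.

```python
from itertools import combinations, product


def parity(bits):
    """Return 0 for an even-parity word, 1 for an odd-parity word."""
    return sum(bits) % 2


def hamming(u, v):
    """Number of positions where two equal-length words differ."""
    return sum(a != b for a, b in zip(u, v))


# All 16 vertices of the 4-dimensional binary hypercube B^4.
vertices = list(product((0, 1), repeat=4))
even = [v for v in vertices if parity(v) == 0]
odd = [v for v in vertices if parity(v) == 1]

# No two same-parity words are adjacent on the hypercube, so the
# minimum distance inside one parity class is 2.
min_even = min(hamming(u, v) for u, v in combinations(even, 2))


def detects_single_flip(word):
    """Flip each bit in turn and confirm that the parity check fails."""
    return all(
        parity(word[:i] + (1 - word[i],) + word[i + 1:]) != parity(word)
        for i in range(len(word))
    )


print(len(even), len(odd), min_even)              # 8 8 2
print(all(detects_single_flip(w) for w in even))  # True
```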

Protein: Degeneracy of the genetic code
[Figure: table of the genetic code, from http://www.people.virginia.edu/~rjh9u/code.html]
B. Hayes: "how quickly a biochemical puzzle … was reduced to an abstract problem in symbol manipulation."
B. Hayes, "The Invention of the Genetic Code," American Scientist, 1998. (Physicist George Gamow and coding theorist Solomon W. Golomb; experimental evidence from Marshall W. Nirenberg and J. Heinrich Matthaei, NIH.)
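Degeneracy can be made concrete with a sketch that builds the standard genetic code from the compact 64-letter string used for NCBI translation table 1 (codons ordered over U, C, A, G with the third base varying fastest; "*" marks a stop codon) and counts how many codons map to each amino acid. The names `CODE`, `degeneracy`, and `translate` are illustrative only.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons in the order UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() varies the last position fastest, matching the string's order.
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# How many of the 64 codons encode each amino acid (or stop).
degeneracy = Counter(CODE.values())


def translate(mrna):
    """Translate an in-frame mRNA string codon by codon."""
    return "".join(CODE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))


print(len(CODE), degeneracy["L"], degeneracy["M"], degeneracy["*"])  # 64 6 1 3
print(translate("AUGUUUUAA"))                                       # MF*
```

Leucine, serine, and arginine each have six codons while methionine and tryptophan have one, which is the many-to-one mapping the slide refers to.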

Protein: Information theory and binding sites
T. D. Schneider, "Strong minor groove base conservation in sequence logos implies DNA distortion or base flipping during replication and transcription initiation," Nucleic Acids Research, 2001, Vol. 29, No. 23, 4881-4891.
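As a rough illustration of the measure plotted in sequence logos, the sketch below computes R(l) = 2 - H(l) bits at each position l of a set of aligned DNA sites. It deliberately omits the small-sample correction e(n) that Schneider's logos include, and the example sites are hypothetical.

```python
import math
from collections import Counter


def column_information(sites):
    """Per-position information R(l) = 2 - H(l), in bits, for aligned DNA sites.

    The small-sample correction e(n) is omitted in this sketch.
    """
    n = len(sites)
    result = []
    for l in range(len(sites[0])):
        counts = Counter(site[l] for site in sites)
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        result.append(2.0 - h)
    return result


# Hypothetical aligned sites: position 0 fully conserved, position 1 uniform.
sites = ["AC", "AG", "AT", "AA"]
print(column_information(sites))  # [2.0, 0.0]
```

A fully conserved position carries the maximum 2 bits, while a position where all four bases are equally common carries none; summing R(l) over positions gives the total information of the site.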

Genome: Increased length, increased fidelity
Mutation rates (per genome per replication):
• RNA viruses: 1 to 0.1
• DNA microbes: 1/300
• Higher eukaryotes: 1/300 (per effective genome, Ef. Gn)
[Figure: comparison of microbial genome base mutation rate to genome size. The data exhibit power-law behavior: the base mutation rate varies inversely with genome size.]
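One way to see why a roughly constant per-genome rate yields a power law is sketched below: if a genome of size G accumulates about 1/300 mutations per replication, the per-base rate is (1/300)/G, and a least-squares fit on log-log axes returns an exponent of -1. The genome sizes are hypothetical values chosen only to span several orders of magnitude, not the data plotted on the slide, and real data need not follow this exact exponent.

```python
import math

# Assumption: a constant per-genome rate of 1/300 mutations per replication,
# the DNA-microbe figure quoted on the slide.
PER_GENOME_RATE = 1 / 300


def per_base_rate(genome_size):
    """Per-base mutation rate implied by a constant per-genome rate."""
    return PER_GENOME_RATE / genome_size


def loglog_slope(xs, ys):
    """Least-squares slope of log10(y) against log10(x), i.e. a power-law exponent."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Hypothetical genome sizes in bases (illustrative, not measured data).
sizes = [1e4, 1e5, 1e6, 1e7]
rates = [per_base_rate(g) for g in sizes]
print(loglog_slope(sizes, rates))  # approximately -1.0
```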

G. Battail: "… increasing the codeword length results in a decreasing probability of error…"
[Figure: comparison of higher eukaryotic genome base mutation rate to genome size: the base mutation rate varies inversely with genome size.]
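A minimal numerical illustration of Battail's statement, using the simplest possible code rather than anything he proposes: for an n-fold repetition code with majority decoding over a binary symmetric channel with crossover probability p < 1/2, the decoding error probability falls as n grows. Repetition codes pay for this by driving the rate toward zero; the stronger coding-theorem result is that long codes can make the error probability vanish at a fixed rate below channel capacity.

```python
from math import comb


def repetition_error_probability(n, p):
    """Probability that majority decoding of an n-fold repetition code fails
    on a binary symmetric channel with crossover probability p (n odd)."""
    if n % 2 == 0:
        raise ValueError("use an odd repetition length to avoid ties")
    # Decoding fails when more than half of the n copies are flipped.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


for n in (1, 3, 5, 7):
    print(n, repetition_error_probability(n, 0.1))
```

For p = 0.1 the error probability drops from 0.1 at n = 1 to 0.028 at n = 3, and keeps shrinking for longer codewords.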

Evidence: Is there evidence of error control in the protein synthesis process?
• Computational experiments by Liebovitch et al. (1996) and Rosen and Moore (2003) did not find evidence for linear block codes.
• That approach was not comprehensive: it did not consider convolutional coding or noise.
• May et al. looked for an optimal generator for translation initiation sites.
• It is highly probable that the encoding model does not conform to known error-control codes.
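To make "linear block code" concrete: such a code is defined by a parity-check matrix H, and a binary word w is a codeword exactly when its syndrome H w (mod 2) is zero. The sketch uses the textbook (7,4) Hamming code as an assumed example; it is not the search procedure used by Liebovitch et al., Rosen and Moore, or May et al.

```python
def hamming74_parity_check():
    """Parity-check matrix H of the (7,4) Hamming code: column j (1-based)
    is the 3-bit binary representation of j, least significant bit in row 0."""
    return [[(j >> i) & 1 for j in range(1, 8)] for i in range(3)]


def syndrome(H, word):
    """s = H w (mod 2); w is a codeword iff every entry of s is 0."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]


def syndrome_index(s):
    """Read the syndrome as an integer; for one error it names the flipped position."""
    return sum(bit << i for i, bit in enumerate(s))


H = hamming74_parity_check()
codeword = [1, 1, 1, 0, 0, 0, 0]
print(syndrome(H, codeword))                   # [0, 0, 0]
received = codeword[:]
received[4] ^= 1                               # flip position 5 (1-based)
print(syndrome_index(syndrome(H, received)))   # 5
```

Testing a genetic sequence for a block-code structure amounts to asking whether some H of this kind sends the observed words to a zero (or small, noise-explained) syndrome.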

Central Dogma of Genetics = Genetic Information Transmission
[Figure: A → Encode (eukaryotes) → Channel → Decode → B; image: http://www-stat.stanford.edu/~susan/courses/s166/central.gif]

Coding Theory Models of Protein Synthesis
• Gatlin, L. L., Information Theory and the Living System, 1972.
• Yockey, Hubert, Information Theory and Molecular Biology, 1992.
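Gatlin's information-theoretic treatment of DNA is usually summarized by two divergences: D1 = log2(4) - H1, the divergence from equiprobable base composition, and D2 = H1 - H_M, the divergence from independence, where H_M is the first-order Markov (conditional) entropy. The sketch estimates both from one sequence with simple counts; the estimator (overlapping dinucleotides, no bias correction) is an assumption, not taken from the book.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of an empirical distribution given as counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)


def gatlin_divergences(seq):
    """Return (D1, D2) in bits for a nucleotide sequence.

    D1 = log2(4) - H1   divergence from equiprobability
    D2 = H1 - H_M       divergence from independence
    H_M = H(X_i, X_{i+1}) - H(X_i), estimated from overlapping dinucleotides.
    """
    h1 = entropy(Counter(seq))
    pairs = Counter(zip(seq, seq[1:]))
    firsts = Counter(seq[:-1])
    h_m = entropy(pairs) - entropy(firsts)
    return math.log2(4) - h1, h1 - h_m


# A perfectly periodic sequence: uniform composition, fully predictable next base.
print(gatlin_divergences("ACGT" * 10))  # (0.0, 2.0)
```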

Coding Theory View of Protein Synthesis (May et al., JFI 2004)
[Figure: Genetic Information → Genetic Encoder → Channel (Errors) → Genetic Decoder, shown acting on an mRNA with the AUG start codon, the UAA stop codon, and the 3' end marked.]
Principal hypothesis: if mRNA is viewed as a noisy encoded signal, it is feasible to use principles of error-control coding theory to interpret the mechanism of genetic translation initiation.
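The "noisy encoded signal" view can be simulated with a simple channel model. The sketch assumes a quaternary symmetric channel, in which each base is independently replaced by one of the other three with probability p; this model and the example sequence are illustrative assumptions, not the channel model of May et al.

```python
import random

BASES = "ACGU"


def quaternary_symmetric_channel(mrna, p, rng):
    """Substitute each base with probability p by one of the other three bases,
    chosen uniformly. Illustrative noise model only."""
    out = []
    for base in mrna:
        if rng.random() < p:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


rng = random.Random(0)             # seeded so runs are repeatable
sent = "GGAGGAAAAAUGUUUUAA"        # hypothetical leader + short coding region
received = quaternary_symmetric_channel(sent, 0.05, rng)
errors = sum(a != b for a, b in zip(sent, received))
print(received, errors)
```

Under this model a decoder sees only `received`, and error-control reasoning asks how much structure in `sent` survives the substitutions.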

Engineering Communication System
• A: k-bit information 1-0-0-1
• Error control encoder: n-bit information 111-000-111
• Channel: noise corrupts the word to 111-000-110 (errors!)
• Decoder: receives noise + n-bit information 111-000-110 and recovers the k-bit information 1-0-0-1
• B: k-bit information 1-0-0-1
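The bit strings in the diagram (111-000-111 received as 111-000-110) behave like a three-fold repetition code, so the sketch below assumes one: each information bit is sent three times and the decoder takes a majority vote per block. Under that assumption the single error that turns 111 into 110 is corrected. The function names are hypothetical.

```python
def encode(bits, n=3):
    """Repeat every information bit n times (an (n, 1) repetition code)."""
    return [b for b in bits for _ in range(n)]


def decode(received, n=3):
    """Majority vote over each block of n received bits."""
    return [int(sum(received[i:i + n]) > n // 2) for i in range(0, len(received), n)]


info = [1, 0, 0, 1]
sent = encode(info)          # 111 000 000 111
received = sent[:]
received[-1] ^= 1            # channel noise: the last block becomes 110
print(decode(received) == info)  # True
```

A repetition code corrects one error per block at the cost of sending three symbols per information bit; two errors in the same block would be decoded wrongly.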

Biological Coding Theory
• David Loewenstern et al.
  – Compression for DNA sequence classification
• Leonard Adleman et al.; Lila Kari et al.
  – Molecular computation
  – Encoding for DNA computing
  – Error-control coding
• Thomas Schneider et al.
  – Biological information theory
  – Error control via sphere packing
[Diagram labels: Storage, Transmission]
Error-Control Coding Based Methods
• Efficient Coding for the Desoxyribonucleic Channel (S. W. Golomb, 1962)
  – Applied biorthogonal codes to the genetic coding problem (the codon-to-amino-acid mapping challenge)
• Andrzej K. Konopka (1984)
• Gerard Battail
• Table-Based Convolutional Code for E. coli Promoter (P. Bermel)
  – Based on the information content of the E. coli promoter, approximates the coding rate for the promoter region as 1/9.
  – Developed a possible rate-1/5 binary code for the E. coli promoter region.
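Several of the methods listed above rely on convolutional codes. For readers unfamiliar with them, the sketch below is the standard textbook rate-1/2 encoder with constraint length 3 and generators 111 and 101 (octal 7 and 5). It is a generic example, not Bermel's table-based promoter code or the code used by May et al.

```python
def conv_encode(bits, flush=True):
    """Rate-1/2 convolutional encoder, constraint length 3,
    generators g1 = 111 and g2 = 101 (octal 7, 5)."""
    s1 = s2 = 0  # s1 = previous input bit, s2 = the bit before that
    out = []
    # Two trailing zeros return the shift register to the all-zero state.
    stream = list(bits) + ([0, 0] if flush else [])
    for u in stream:
        out.append(u ^ s1 ^ s2)  # g1 = 1 + D + D^2
        out.append(u ^ s2)       # g2 = 1 + D^2
        s1, s2 = u, s1
    return out


print(conv_encode([1, 0, 1, 1]))  # [1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1]
```

Unlike a block code, each output pair depends on the current bit and the two before it, which is why a convolutional view can capture context-dependent structure that a linear block code search would miss.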

Coding Theory in RBS Classification
[Figure: plot with regions labeled DB, NRD, SD, and AUG. Horizontal axis: position relative to the first base of the initiation codon. Vertical axis: mean of the aligned minimum Hamming distance values by position, for the 3 sequence groups.]
Hamming distance = number of positions where two vectors differ.
May et al., BioSystems 2004
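The distance used in the caption is simple to compute. The sketch defines Hamming distance and the minimum distance from a received word to a codebook; the two-word codebook is hypothetical and stands in for whatever code a classifier actually uses.

```python
def hamming_distance(u, v):
    """Number of positions where two equal-length vectors differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have equal length")
    return sum(a != b for a, b in zip(u, v))


def min_distance_to_codebook(word, codebook):
    """Smallest Hamming distance from a received word to any codeword."""
    return min(hamming_distance(word, c) for c in codebook)


codebook = ["AGGAGG", "AGGAGA"]  # hypothetical codewords, for illustration only
print(hamming_distance("AGGAGG", "AGCAGG"))          # 1
print(min_distance_to_codebook("UGGAGA", codebook))  # 1
```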

Coding Theory in RBS Classification
[Figure: leader bases b-15, b-14, …, b-11, …, b-1 upstream of the A-U-G initiation codon, each mapped to a positional average distance Davg-15, Davg-14, ….]
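The positional averages Davg can be sketched as follows, assuming each sequence is aligned so that its initiation codon begins at a fixed index and that a window of codeword length starts at each upstream position. The window convention, the codebook, and the function name are assumptions for illustration, not the exact procedure of May et al.

```python
START_CODONS = ("AUG", "GUG", "UUG")


def hamming_distance(u, v):
    """Number of positions where two equal-length strings differ."""
    return sum(a != b for a, b in zip(u, v))


def positional_average_distance(sequences, codebook, upstream=15):
    """Mean minimum Hamming distance Davg(p) for windows starting at
    positions p = -upstream, ..., relative to the first base of the start codon.

    Each sequence must be aligned so its start codon begins at index `upstream`;
    windows are kept entirely inside the leader.
    """
    for s in sequences:
        if s[upstream:upstream + 3] not in START_CODONS:
            raise ValueError("sequence is not aligned on a start codon")
    w = len(codebook[0])
    result = {}
    for start in range(0, upstream - w + 1):
        dists = [
            min(hamming_distance(s[start:start + w], c) for c in codebook)
            for s in sequences
        ]
        result[start - upstream] = sum(dists) / len(dists)
    return result


# Hypothetical aligned sequences and a one-word hypothetical codebook.
seqs = ["A" * 15 + "AUG", "GGA" + "A" * 12 + "AUG"]
d_avg = positional_average_distance(seqs, ["GGA"])
print(d_avg[-15], d_avg[-13])  # 1.0 2.0
```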

Coding Theory and Molecular Computation
• Leonard M. Adleman et al.; Lila Kari et al.
  – Molecular computation
  – Encoding for DNA computing
  – Error-control coding
[Figure: strands v1 and v2 joined by ligase; image: http://www.scs.uiuc.edu/~scott/index_files/ligation.gif]
• M. Stojanovic and D. Stefanovic, "A deoxyribozyme-based molecular automaton," Nature Biotechnology, 2003
  – Such molecular computers can achieve computational robustness using coding theory.
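Error-control ideas enter DNA word design as distance constraints. A constraint set often used in that literature requires distinct words to be far apart in Hamming distance and every word to be far from the reverse complement of every word (including itself), to discourage unwanted hybridization. The sketch reports the smallest distance across both constraints; the word sets are hypothetical.

```python
from itertools import combinations

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(word):
    """Watson-Crick reverse complement of a DNA word."""
    return "".join(COMPLEMENT[b] for b in reversed(word))


def hamming_distance(u, v):
    """Number of positions where two equal-length words differ."""
    return sum(a != b for a, b in zip(u, v))


def min_constraint_distance(words):
    """Smallest distance over both constraints:
    H(u, v) for distinct words, and H(u, rc(v)) for every pair including u == v."""
    pairwise = [hamming_distance(u, v) for u, v in combinations(words, 2)]
    rc = [hamming_distance(u, reverse_complement(v)) for u in words for v in words]
    return min(pairwise + rc)


print(min_constraint_distance(["AAAA", "TTTT"]))  # 0: each is the other's reverse complement
print(min_constraint_distance(["AACC", "CCAA"]))  # 4
```

A word set meets a design distance d when this value is at least d; the first set fails because its words would hybridize with each other.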

Construction and control: Quantify and Optimize Protein Translation
[Figure: 30S and 50S ribosomal subunits and initiation factors on a messenger RNA (mRNA), 5' to 3': leader* region, start codon (AUG, GUG, or UUG), coding region, stop codon (UAA, UAG, or UGA); the polypeptide becomes the protein.]
*The ribosome binding site is contained in the leader region.
• Phases of translation: initiation, elongation, termination.
• Initiation is the most time-consuming phase and affects the overall gene expression level.
• A qualitative outline of the initiation process exists:
  1) the 30S subunit and initiation factors (IFs) bind to the mRNA and fMet-tRNA;
  2) the ternary complex binds the 50S subunit;
  3) the IFs are released prior to elongation.
• mRNA is the only variable aspect of translation initiation.
• Information encoded in the mRNA determines specificity and efficiency.
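The codon sets on this slide are enough to separate a transcript into leader and coding region. The sketch returns the first start codon (AUG, GUG, or UUG) that has an in-frame stop codon downstream; everything before it is the leader. Ribosomes do not simply choose the first start codon, so this is a bookkeeping sketch, not a model of initiation, and `first_orf` is a hypothetical name.

```python
START_CODONS = {"AUG", "GUG", "UUG"}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_orf(mrna):
    """Return (start, end) of the first start codon with an in-frame stop
    downstream; end is exclusive and includes the stop codon. None if absent."""
    for i in range(len(mrna) - 2):
        if mrna[i:i + 3] in START_CODONS:
            # Walk codon by codon in the start codon's reading frame.
            for j in range(i, len(mrna) - 2, 3):
                if mrna[j:j + 3] in STOP_CODONS:
                    return i, j + 3
    return None


mrna = "CCAUGAAAUAGCC"
start, end = first_orf(mrna)
print(mrna[:start], mrna[start:end])  # CC AUGAAAUAG
```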

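Construction and control: Quantify and Optimize Protein Translation
[Figure: E. coli mRNA, 5' to 3': leader region (UTR) containing a non-random ribosome binding site domain, then the start codon (AUG, GUG, or UUG) and the downstream box; the 3' end of the 16S rRNA is shown as 3'-AUUCCUCCACUAG…-5'. Label: "Modify E. coli intergenic".]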
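The rRNA stretch on the slide is written 3' to 5', so complementing it base by base gives, read left to right, the 5' to 3' mRNA sequence that could pair with it. The sketch computes that partner and then slides a target along a leader to find the window with the smallest Hamming distance. Only Watson-Crick pairs are considered (G-U wobble is ignored); the example leader is hypothetical, and the target AGGAGG is the familiar Shine-Dalgarno core.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def pairing_sequence(rrna_3to5):
    """Given an rRNA stretch written 3'->5', return the 5'->3' mRNA
    sequence that would base-pair with it (Watson-Crick pairs only)."""
    return "".join(PAIR[b] for b in rrna_3to5)


def best_match(leader, target):
    """Slide target along the leader; return (index, Hamming distance) of the
    closest window, preferring the leftmost window on ties."""
    w = len(target)
    scores = [
        (sum(a != b for a, b in zip(leader[i:i + w], target)), i)
        for i in range(len(leader) - w + 1)
    ]
    dist, index = min(scores)
    return index, dist


partner = pairing_sequence("AUUCCUCCACUAG")
print(partner)                                      # UAAGGAGGUGAUC
print(best_match("AAAGGAGGAAAAAAAUG", "AGGAGG"))    # (2, 0)
```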

Acknowledgments
• Collaborators
  – NCSU: Mladen Vouk, Donald Bitzer, Winser Alexander, and Ann Stomp
  – SNL: Anna Johnston, William Hart, Jean-Paul Watson, and Richard Pryor
• NIEHS: John Drake (mutagenesis data)
• Support:
  – SNL Tier 1 Seniors Council LDRD/DOE
  – NSF, Ford Foundation