Molecular Biomedical Informatics Laboratory (分子生醫資訊實驗室)
Machine Learning and Bioinformatics (機器學習與生物資訊學)
Molecular biology
- Nucleic acid
  - DNA
  - RNA
- Central dogma
  - Transcription
  - Translation
- Protein
  - Amino acid
  - Primary structure
  - Secondary structure
  - Tertiary structure
Nucleic acid
- A nucleic acid is a macromolecule composed of chains of monomeric nucleotides
- In biochemistry these molecules carry genetic information or form structures within cells
- The most common nucleic acids are deoxyribonucleic acid (DNA) and ribonucleic acid (RNA)
http://juang.bst.ntu.edu.tw/BC2008/images/NA%20Fig1.jpg
Nucleic acid components: Sugar
http://www.mun.ca/biology/scarr/iGen3_02-07.html
Nucleic acid components: Base
- Purine
  - Adenine (A) and guanine (G)
- Pyrimidine
  - Thymine (T), cytosine (C)
  - Uracil (U, only in RNA)
http://www.elmhurst.edu/~chm/vchembook/images/580bases.gif
Phosphodiester bond
http://www.uic.edu/classes/bios100/lectures/chemistry.htm
DNA
- Chemically, DNA is a long polymer of simple units called nucleotides, with a backbone made of sugars and phosphate groups joined by ester bonds
- Attached to each sugar is one of four types of molecules called bases
- It is the sequence of these four bases along the backbone that encodes information
http://upload.wikimedia.org/wikipedia/commons/8/87/DNA_orbit_animated_small.gif
DNA: Base pairing
- Each type of base on one strand forms a bond with just one type of base on the other strand
- Here, purines form hydrogen bonds to pyrimidines, with A bonding only to T, and C bonding only to G
- DNA sequence
  - 5' CpGpCpApT, 3' TpApCpGpC
  - CGCGAATT
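The pairing rule can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the names `PAIRS`, `complement`, and `reverse_complement` are hypothetical, and the sketch assumes an unambiguous sequence containing only A, C, G, and T.

```python
# Watson-Crick pairing from the slide: A pairs only with T, C only with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the paired base at each position, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand written 5'->3' (the two strands are antiparallel)."""
    return complement(strand)[::-1]


print(complement("CGCGAATT"))          # GCGCTTAA
print(reverse_complement("CGCGAATT"))  # AATTCGCG
```

Because every base has exactly one partner, either strand fully determines the other.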
http://www.ucl.ac.uk/~sjjgsca/NucleotidePairing.jpg
Double helix
http://www.coe.drexel.edu/ret/personalsites/2005/dayal/curriculum1_files/image001.jpg
Hydrogen bond
- A hydrogen bond exists between an electronegative atom and a hydrogen atom bonded to another electronegative atom
- This type of force always involves a hydrogen atom, hence the name hydrogen bonding; the strongest hydrogen bonds approach the energy of weak covalent bonds (about 155 kJ/mol), although most are considerably weaker
- Biological functions
  - DNA/RNA base pairing
  - protein secondary/tertiary structure formation
  - some properties of the water molecule
  - antibody-antigen (and other protein-protein) binding
Hydrogen bonds result from differences in electronegativity
http://upload.wikimedia.org/wikipedia/commons/4/43/Liquid_water_hydrogen_bond.png
Grooves
http://courses.biology.utah.edu/horvath/biol.3525/1_DNA/Fig2/marty_1.jpg
DNA structure
http://www.youtube.com/watch?v=qy8dk5iS1f0&NR=1
Any questions about DNA?
Central dogma
http://fig.cox.miami.edu/~cmallery/255hist/mcb4.1.dogma.jpg
Central dogma
- The process by which information is extracted from the nucleotide sequence of a gene and then used to make a protein is essentially the same for all living things on Earth and is described by the grandly named central dogma of molecular biology
- Information in cells passes from DNA to RNA to proteins
http://upload.wikimedia.org/wikipedia/commons/3/3a/Crick's_1958_central_dogma.svg
RNA
- Information stored in DNA is used to make a more transient, single-stranded polynucleotide called RNA (ribonucleic acid)
- RNA is very similar to DNA, but differs in a few important structural details
  - in the cell RNA is usually single-stranded, while DNA is usually double-stranded
  - RNA nucleotides contain ribose while DNA contains deoxyribose (a type of ribose that lacks one oxygen atom)
  - in RNA the base uracil substitutes for thymine, which is present in DNA
http://commons.wikimedia.org/wiki/File:Difference_DNA_RNA-DE.svg
Central dogma: Transcription
- Transcription is the synthesis of RNA under the direction of DNA
- Both nucleic acid sequences use the same language, and the information is simply transcribed, or copied
- The DNA sequence is copied by RNA polymerase to produce a complementary RNA strand, called messenger RNA (mRNA)
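Transcription as complementary copying can be sketched as follows. This is a simplified illustration under stated assumptions, not part of the original slides: the function name `transcribe` and the example sequence are hypothetical, the template strand is assumed to be written 3'->5' so that the RNA reads 5'->3' from left to right, and promoters, termination, and RNA processing are ignored.

```python
# Complementary pairing during transcription: the RNA uses U where DNA would use T.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_3to5: str) -> str:
    """Build the mRNA complementary to a DNA template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())


print(transcribe("TACGGC"))  # AUGCCG
```

The resulting mRNA matches the non-template (coding) DNA strand, with U in place of T.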
DNA transcription
http://www.youtube.com/watch?v=vJSmZ3DsntU
Transcription detail
http://wwwclass.unl.edu/biochem/gp2/m_biology/animation/m_animations/gene2.swf
RNA: Various types
- mRNA
  - messenger RNA (mRNA) is the RNA that carries information from DNA to the ribosome
  - the coding sequence of the mRNA determines the amino acid sequence in the protein that is produced
- Non-coding RNA
Various RNA types: Non-coding RNA
- Many RNAs do not code for protein
- These ncRNAs are encoded by their own genes (RNA genes) or derived from mRNA introns
- The most common ncRNAs are transfer RNA (tRNA) and ribosomal RNA (rRNA)
- Other ncRNAs, such as microRNA (miRNA), are involved in post-transcriptional gene regulation
http://eurheartj.oxfordjournals.org/content/vol0/issue2010/images/large/ehp57301.jpeg
Central dogma: Translation
- Translation is the second stage of protein biosynthesis
- Translation occurs in the cytoplasm, where the ribosomes are located
- In translation, mRNA is decoded to produce a specific polypeptide according to the rules specified by the genetic code
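Decoding by the genetic code can be sketched as a table lookup over three-base codons. This is a minimal illustration, not part of the original slides: `CODON_TABLE` and `translate` are hypothetical names, the table is the standard genetic code in one-letter amino acid symbols, and starting at the first AUG is a deliberate simplification of how real ribosomes choose a start site.

```python
from itertools import product

# Standard genetic code, listed for codons in U, C, A, G order at each position.
# '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCCUUUUAAGG"))  # MAF
```

In the example the reading frame is AUG GCC UUU UAA, giving Met-Ala-Phe before the stop codon.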
From RNA to protein synthesis
http://www.youtube.com/watch?v=NJxobgkPEAo
Protein translation
http://www.youtube.com/watch?v=nl8pSlonmA0
http://www.lucasbrouwers.nl/blog/wp-content/uploads/2010/04/genetic-code.jpg
Any questions about the central dogma?
Protein
Protein
- Proteins are large organic compounds made of amino acids arranged in a linear chain and joined together by peptide bonds between the carboxyl and amino groups of adjacent amino acid residues
- Proteins can also work together to achieve a particular function, and they often associate to form stable complexes
Protein: Amino acid
- In chemistry, an amino acid is a molecule that contains both amine and carboxyl functional groups
- In biochemistry, this term refers to alpha-amino acids with the general formula H2NCHRCOOH, where R is an organic substituent
http://upload.wikimedia.org/wikipedia/commons/thumb/c/ce/AminoAcidball.svg/702px-AminoAcidball.svg.png
Amino acid: Various side chains
- The various alpha-amino acids differ in which side chain (R group) is attached to their alpha carbon
- They can vary in size from just a hydrogen atom in glycine, through a methyl group in alanine, to a large heterocyclic group in tryptophan
http://amit1b.wordpress.com/the-molecules-of-life/about/amino-acids/
http://juang.bst.ntu.edu.tw/BC2008/images/Amino%281%29%202007/A1-7.JPG
http://dyerfitness.ca/2013/04/26/supplements-with-dyerfitness-branch-amino-acids-for-those-why-are-trying-to-muscle-up/
http://juang.bst.ntu.edu.tw/BC2008/images/Amino%281%29%202007/A1-9.JPG
http://www.biomedcentral.com/1471-2105/10/113/figure/F3?highres=y
Amino acid: The building blocks of proteins
- Amino acids combine in a condensation reaction, and the resulting amino acid residues are held together by peptide bonds
- Proteins are defined by their unique sequence of residues (primary structure)
- Just as letters form various words, amino acids form a vast variety of sequences/proteins
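The condensation reaction has a simple bookkeeping consequence: each peptide bond releases one water molecule, so a linear chain of n amino acids weighs the sum of the free amino acids minus (n - 1) waters. The sketch below illustrates this and is not part of the original slides: `FREE_MASS` and `peptide_mass` are hypothetical names, and the table holds rounded average molecular masses for only three amino acids.

```python
# Rounded average molecular masses in g/mol (illustrative subset only).
WATER = 18.015
FREE_MASS = {
    "G": 75.07,   # glycine
    "A": 89.09,   # alanine
    "S": 105.09,  # serine
}


def peptide_mass(sequence: str) -> float:
    """Mass of a linear peptide: free amino acids minus one water per peptide bond."""
    if not sequence:
        return 0.0
    peptide_bonds = len(sequence) - 1
    return sum(FREE_MASS[aa] for aa in sequence) - peptide_bonds * WATER


# Glycine + alanine -> glycylalanine + water
print(round(peptide_mass("GA"), 2))
```

A single amino acid forms no peptide bond, so its mass is returned unchanged.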
http://upload.wikimedia.org/wikipedia/commons/thumb/6/6d/Peptidformationball.svg/2000px-Peptidformationball.svg.png
http://juang.bst.ntu.edu.tw/BC2008/images/Amino(1)%202007/A1-11.JPG
http://juang.bst.ntu.edu.tw/BC2008/images/Amino(1)%202007/A1-13.JPG
Protein: After knowing amino acids
- Amino acids form short polymer chains called peptides or longer chains called either polypeptides or proteins
- The process of such formation from an mRNA template (obeying the genetic code) is known as translation, which is part of protein biosynthesis
Protein structure hierarchy
http://cropandsoil.oregonstate.edu/classes/css430/lecture%209-07/figure-09-03.JPG
http://juang.bst.ntu.edu.tw/BC2008/images/Protein(1)%202007/P1-4.JPG
http://juang.bst.ntu.edu.tw/BC2008/images/Protein(1)%202007/P1-8.JPG
http://juang.bst.ntu.edu.tw/BC2008/images/Protein(1)%202007/P1-9.JPG
Protein structure hierarchy: Secondary structure
- In biochemistry and structural biology, secondary structure is the general three-dimensional form of local segments of biopolymers such as proteins and nucleic acids
- It does not, however, describe specific atomic positions in three-dimensional space, which are considered to be tertiary structure
http://juang.bst.ntu.edu.tw/BC2008/images/Protein(2)%202007/P2-3.JPG
Protein structure hierarchy: Tertiary structure
- The three-dimensional structure of a protein or any other macromolecule, as defined by the atomic coordinates
- Describes the spatial relations among its secondary structures
- Tertiary structure is considered to be largely determined by the protein's primary sequence
Protein tertiary structure: Experimental techniques
- The majority of protein structures have been solved with X-ray crystallography
- The second most common method is NMR (nuclear magnetic resonance)
  - lower resolution
  - limited to small proteins
  - provides time-dependent information in solution
http://campusapps.fullerton.edu/news/arts/2003/photos/protein-art.jpg
Protein structure hierarchy: Quaternary structure
- Many proteins are actually assemblies of more than one polypeptide chain, which in the context of the larger assemblage are known as protein subunits
- In addition to the tertiary structure of the subunits, multiple-subunit proteins possess a quaternary structure, which is the arrangement into which the subunits assemble
http://courses.cm.utexas.edu/jrobertus/ch339k/overheads-1/ch6_quat-struct1.jpg
Protein sub-structure
Protein sub-structure: Domain
- A part of protein sequence and structure that can evolve, function, and exist independently
- About 25–500 aa
- Often forms functional units
http://upload.wikimedia.org/wikipedia/commons/6/67/1pkn.png
Zinc fingers are small protein structural motifs that can coordinate zinc ions to help stabilize their folds
http://upload.wikimedia.org/wikipedia/commons/7/79/Zinc_finger_DNA_complex.png
Protein sub-structure: Motif
- A sequence motif is a nucleotide or amino acid sequence pattern that is widespread and often has biological significance
  - one may say that a regular expression is a string motif
  - in fact, regular expressions can be used to describe sequence motifs
- For proteins, a sequence motif is distinguished from a structural motif, a motif formed by the three-dimensional arrangement of amino acids, which may not be adjacent in the sequence
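To make the regular-expression analogy concrete, the sketch below searches a protein sequence for the N-glycosylation sequon, commonly written N-{P}-[ST]-{P} (asparagine, anything but proline, serine or threonine, anything but proline). The choice of motif, the name `find_motif`, and the example sequence are illustrative assumptions, not taken from the original slides.

```python
import re

# N-{P}-[ST]-{P} written as a Python regular expression.
# The lookahead (?=...) lets overlapping occurrences be reported as well.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position, matched residues) for every motif occurrence."""
    return [
        (match.start() + 1, match.group(1))
        for match in N_GLYCOSYLATION.finditer(sequence.upper())
    ]


# Hypothetical sequence, made up for this example.
print(find_motif("MKNGSAQNPTLNLTV"))  # [(3, 'NGSA'), (12, 'NLTV')]
```

The asparagine at position 8 is skipped because it is followed by proline, which the pattern excludes.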
Protein sub-structure: Structural motif
- A 3D structural element or fold that also appears in a variety of other molecules
- In the context of proteins, the term is sometimes used interchangeably with "structural domain", although a domain need not be a motif and, if it contains a motif, need not consist of only one
- Roughly
  - a domain is something with obvious boundaries
  - a motif is something frequently observed
http://www.biomedcentral.com/content/figures/1471-2164-8-60-8.jpg
http://juang.bst.ntu.edu.tw/BC2008/images/Protein(1)%202007/P1-3.JPG
Molecular biology: Reference
- Website of Prof. Juang (莊榮輝), National Taiwan University
  - http://juang.bst.ntu.edu.tw/BC2008/index.htm
- Molecular biology website, National Chiao Tung University
  - http://www.life.nctu.edu.tw/~mb/c40101.htm
Any questions about molecular biology?