An Introduction to Bioinformatics Algorithms Molecular Biology Primer
Angela Brooks, Raymond Brown, Calvin Chen, Mike Daly, Hoa Dinh, Erinn Hama, Robert Hinman, Julio Ng, Michael Sneddon, Hoa Troung, Jerry Wang, Che Fung Yung
Outline:
1. What Is Life Made Of?
2. What Is Genetic Material?
3. What Carries Information between DNA and Proteins?
4. How Are Proteins Made?
5. How to Analyze a Genome (Some Lab Techniques)
Section 1: What Is Life Made Of?
Life Begins with the Cell
• A cell is the smallest structural unit of an organism that is capable of independent functioning.
• All cells have some common features.
All Cells Have Common Cycles
• Cells are born, eat, replicate, and die.
Two Types of Cells: Prokaryotes vs. Eukaryotes
Prokaryotes and Eukaryotes
• According to the most recent evidence, there are three main branches to the tree of life.
• Prokaryotes include Archaea (“ancient ones”) and bacteria.
• Eukaryotes make up the domain Eukarya, which includes plants, animals, fungi, and certain algae.
Prokaryotes and Eukaryotes, continued
Prokaryotes                                    Eukaryotes
Single cell                                    Single or multicellular
No nucleus                                     Nucleus
No membrane-bound organelles                   Organelles
One piece of circular DNA                      Chromosomes
No mRNA post-transcriptional modification      Exons/introns splicing
Section 2: Genetic Material of Life
DNA: The Code of Life
• The structure and the four genomic letters code for all living organisms.
• Adenine, Guanine, Thymine, and Cytosine pair A-T and C-G on complementary strands.
DNA, continued
• DNA has a double helix structure built from nucleotides, each composed of:
  • a sugar molecule
  • a phosphate group
  • a base (A, C, G, T)
• New DNA and RNA strands are always synthesized from the 5’ end to the 3’ end during replication and transcription.
5’ ATTTAGGCC 3’
3’ TAAATCCGG 5’
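The pairing rules and the antiparallel 5’/3’ layout can be made concrete in code. The following is a minimal Python sketch, assuming plain A/C/G/T strings; the names `complement` and `reverse_complement` are illustrative and not taken from the slides.

```python
# Watson-Crick pairing partners in DNA: A-T and C-G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Partner base for each position, in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Partner strand rewritten 5' -> 3' (the two strands run antiparallel)."""
    return complement(strand)[::-1]


top = "ATTTAGGCC"                 # 5' ATTTAGGCC 3'
print(complement(top))            # TAAATCCGG, the bottom strand read 3' -> 5'
print(reverse_complement(top))    # GGCCTAAAT, the same bottom strand read 5' -> 3'
```

Sequences are conventionally written 5’ → 3’, which is why the reverse complement, rather than the plain complement, is the operation that usually appears in sequence-analysis code.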
DNA, the Genetic Makeup
• Genes are inherited and are expressed:
  • genotype (genetic makeup)
  • phenotype (physical expression)
• On the left are the eye phenotypes produced by green and black eye-color genes.
Genetic Information: Chromosomes
(1) Double helix DNA strand
(2) Chromatin strand (DNA with histones)
(3) Condensed chromatin during interphase, with centromere
(4) Condensed chromatin during prophase
(5) Chromosome during metaphase
Chromosomes
Organism                              Number of base pairs    Number of chromosomes
-----------------------------------------------------------------------------------
Prokaryotic
Escherichia coli (bacterium)          4 × 10^6                1
Eukaryotic
Saccharomyces cerevisiae (yeast)      1.35 × 10^7             17
Drosophila melanogaster (insect)      1.65 × 10^8             4
Homo sapiens (human)                  2.9 × 10^9              23
Zea mays (corn)                       5.0 × 10^9              10
The organization of genes on a human chromosome
Human genome sequence
Comparison of genomes
Discovery of DNA
• DNA Sequences
  • Chargaff and Vischer, 1949
    • DNA consisting of A, T, G, C
    • Adenine, Guanine, Cytosine, Thymine
  • Chargaff Rule
    • Noticing #A ≈ #T and #G ≈ #C
    • A “strange but possibly meaningless” phenomenon.
• Wow!! A Double Helix
  • Watson and Crick, Nature, April 25, 1953
  • 1 Biologist
  • 1 Physics Ph.D. Student
  • 900 words
  • Nobel Prize
• Rich, 1973
  • Structural biologist at MIT
  • DNA’s structure in atomic resolution
(Figure: Crick and Watson.)
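Base pairing is what turns Chargaff’s “strange” observation into a necessity: every A on one strand faces a T on the other. A small Python sketch (the function names are hypothetical) counts bases on a single strand and on the full, perfectly paired double helix to show the difference.

```python
from collections import Counter

PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def strand_counts(strand: str) -> Counter:
    """Base counts on one strand only."""
    return Counter(strand.upper())


def duplex_counts(top: str) -> Counter:
    """Base counts over both strands of a perfectly paired double helix."""
    bottom = "".join(PAIR[b] for b in top.upper())
    return strand_counts(top) + strand_counts(bottom)


top = "ATTTAGGCC"
single, both = strand_counts(top), duplex_counts(top)
print(single["A"], single["T"])   # 2 3: one strand alone need not obey the rule
print(both["A"], both["T"])       # 5 5: #A == #T once both strands are counted
print(both["G"], both["C"])       # 4 4: #G == #C as well
```

In this idealized model the equality holds exactly by construction, for any input sequence.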
Watson & Crick: “…the secret of life”
• Watson: a zoologist; Crick: a physicist
• “In 1947 Crick knew no biology and practically no organic chemistry or crystallography...” (www.nobel.se)
• Applying Chargaff’s rules and the X-ray image from Rosalind Franklin, they constructed a “tinkertoy” model showing the double helix.
• Their 1953 Nature paper: “It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material.”
(Figures: Watson & Crick with DNA model; Rosalind Franklin with X-ray image of DNA.)
DNA: The Basis of Life
• Humans have about 3 billion base pairs.
  • How do you package it into a cell?
  • How does the cell know where in the highly packed DNA to start transcription?
    • Special regulatory sequences
  • DNA size does not imply greater complexity.
• Complexity of DNA
  • Eukaryotic genomes consist of variable amounts of DNA:
    • Single-copy or unique DNA
    • Highly repetitive DNA
Human Genome Composition
DNA, continued
• DNA has a double helix structure. However, it is not symmetric: each strand has a “forward” and a “backward” direction.
• The ends are labeled 5’ and 3’ after the carbon atoms in the sugar component.
5’ AATCGCAAT 3’
3’ TTAGCGTTA 5’
• New strands are always synthesized 5’ to 3’ during transcription and replication.
Basic Structure
(Figure labels: phosphate, sugar)
DNA Replication
• DNA can replicate by splitting and rebuilding each strand.
• The rebuilding of each strand uses slightly different mechanisms due to the 5’/3’ asymmetry, but each new strand is complementary to its template, so each daughter double helix is an exact copy of the original molecule.
http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/D/DNAReplication.html
Section 3: What Carries Information between DNA and Proteins?
The flow of genetic information
DNA → RNA: Transcription
• DNA gets transcribed by a protein known as RNA polymerase.
• This process builds a chain of bases that will become mRNA.
• RNA and DNA are similar, except that RNA is single-stranded and less stable than DNA.
  • Also, in RNA, the base uracil (U) is used instead of thymine (T), its DNA counterpart.
DNA → RNA
(Figure: base pairing between DNA and an RNA copy; A=T and G=C within DNA, with uracil (U) in RNA in place of thymine (T).)
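Transcription can be modeled as string rewriting. This Python sketch assumes an idealized template with no promoter or terminator logic; the function names are hypothetical. It shows that reading the template strand (given 3’ → 5’) with RNA pairing rules yields the same mRNA as copying the coding strand with U in place of T.

```python
# RNA partner for each DNA template base: A pairs with U in RNA, not T.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe_template(template_3_to_5: str) -> str:
    """Build the RNA base by base against a template written 3' -> 5'.

    The resulting RNA is written 5' -> 3', the direction it is synthesized.
    """
    return "".join(TEMPLATE_TO_RNA[b] for b in template_3_to_5.upper())


def transcribe_coding(coding_5_to_3: str) -> str:
    """Shortcut: the mRNA equals the coding (non-template) strand with T -> U."""
    return coding_5_to_3.upper().replace("T", "U")


print(transcribe_template("TAAATCCGG"))   # AUUUAGGCC
print(transcribe_coding("ATTTAGGCC"))     # AUUUAGGCC
```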
Definition of a Gene
• Regulatory regions: up to 50 kb upstream of the +1 site
• Exons: protein-coding and untranslated regions (UTR)
  • 1 to 178 exons per gene (mean 8.8)
  • 8 bp to 17 kb per exon (mean 145 bp)
• Introns: splice acceptor and donor sites, junk DNA
  • average 1 kb to 50 kb per intron
• Gene size: largest 2.4 Mb (dystrophin); mean 27 kb
Transcription: DNA → pre-mRNA
• Transcription occurs in the nucleus.
• A σ factor (in bacteria) or general transcription factors (in eukaryotes) recognize the promoter sequence, and RNA polymerase opens a small portion of the double helix, exposing the DNA bases.
• RNA polymerase II catalyzes the formation of the phosphodiester bonds that link nucleotides into a linear chain, built from 5’ to 3’; it unwinds the helix just ahead of the active site, where complementary nucleotides are added.
• The hydrolysis of high-energy bonds in the substrates (the nucleoside triphosphates ATP, CTP, GTP, and UTP) provides the energy that drives the reaction.
• During transcription, the DNA helix re-forms as the RNA is made.
• When the terminator sequence is reached, the polymerase halts and releases both the DNA template and the RNA.
Central Dogma Revisited
(Figure: DNA → transcription (nucleus) → pre-mRNA → splicing (spliceosome) → mRNA → translation (ribosome in cytoplasm) → protein.)
• Base pairing rule: A and T (or U) are held together by 2 hydrogen bonds, and G and C are held together by 3 hydrogen bonds.
• Note: Some RNA is never translated and stays as RNA (i.e., noncoding RNA).
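The base pairing rule lends itself to simple arithmetic. A minimal Python sketch (the helper name is hypothetical) totals the hydrogen bonds between a strand and its perfectly complementary partner, using only the 2 and 3 bonds per pair given above; real duplex stability also depends on other factors, such as base stacking, which this ignores.

```python
# Hydrogen bonds per Watson-Crick pair, keyed by the base on either strand.
H_BONDS = {"A": 2, "T": 2, "U": 2, "G": 3, "C": 3}


def hydrogen_bonds(strand: str) -> int:
    """Hydrogen bonds joining `strand` to a fully complementary partner."""
    return sum(H_BONDS[base] for base in strand.upper())


print(hydrogen_bonds("ATTTAGGCC"))                     # 5 A/T pairs * 2 + 4 G/C pairs * 3 = 22
print(hydrogen_bonds("AAAA"), hydrogen_bonds("GGGG"))  # 8 12
```

This is one reason GC-rich stretches of DNA are harder to separate than AT-rich ones.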
RNA Processing: pre-RNA → mature RNA
• 5’ cap
• Poly-A
• Splicing
• Editing
Splicing
Alternative splicing
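Once exon boundaries are known, splicing and alternative splicing reduce to selecting and joining intervals. This Python sketch assumes the exon coordinates are already given (0-based, end-exclusive); it does not model how the spliceosome recognizes splice sites, and the toy pre-mRNA below is made up for illustration.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the chosen exon intervals in order, dropping everything else."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


# Hypothetical pre-mRNA: exon1 | intron1 | exon2 | intron2 | exon3
pre = "AUGGCU" + "GUAAAG" + "CCA" + "GUCUAG" + "UUCUAA"
exon1, exon2, exon3 = (0, 6), (12, 15), (21, 27)

print(splice(pre, [exon1, exon2, exon3]))  # AUGGCUCCAUUCUAA (all exons kept)
print(splice(pre, [exon1, exon3]))         # AUGGCUUUCUAA (exon 2 skipped)
```

The two outputs illustrate alternative splicing: one pre-mRNA, two different mature mRNAs depending on which exons are kept.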
5’ Cap of RNA
Poly-A addition
Section 4: How Are Proteins Made?
Revisiting the Central Dogma
• In going from DNA to proteins, there is an intermediate step where mRNA is made from DNA, which then makes protein.
  • This is known as the Central Dogma.
• Why the intermediate step?
  • DNA is kept in the nucleus, while protein synthesis happens in the cytoplasm, with the help of ribosomes.
The Central Dogma (cont’d)
Translation
• The process of going from RNA to polypeptide.
• Three bases of mRNA (called a codon) correspond to one amino acid, based on a fixed table.
• Always starts with methionine and ends with a stop codon.
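Translation can be sketched as a dictionary lookup over consecutive codons. The following Python sketch builds the standard genetic code table and translates from the first AUG to the first stop codon; starting at the first AUG and writing stop as `*` are simplifying assumptions, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code. Codons are enumerated UUU, UUC, UUA, UUG, UCU, ..., GGG
# (bases in the order U, C, A, G); "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate from the first AUG (Met) up to, not including, the first stop."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCUUUCUAAGG"))  # MAF  (AUG GCU UUC, then UAA = stop)
```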
tRNA
Translation, continued
• Catalyzed by the ribosome
• Using two different sites, the ribosome continually binds tRNA, joins the amino acids together, and moves to the next location along the mRNA.
• ~10 codons/second, but multiple translations can occur simultaneously.
http://wong.scripps.edu/PIX/ribosome.jpg
The genetic code
Reading frames
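Because codons are read in non-overlapping triplets, a DNA sequence can be read in three frames on each strand, six in total. A minimal Python sketch, assuming uppercase A/C/G/T input; the frame labels `+1` to `-3` follow a common convention, and the helper names are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna: str) -> str:
    """The opposite strand, written 5' -> 3'."""
    return "".join(PAIR[base] for base in reversed(dna.upper()))


def codons(seq: str, offset: int) -> list[str]:
    """Complete, non-overlapping codons starting at `offset` (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def six_frames(dna: str) -> dict[str, list[str]]:
    """Frames +1..+3 on the given strand and -1..-3 on its reverse complement."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = codons(dna.upper(), offset)
        frames[f"-{offset + 1}"] = codons(rc, offset)
    return frames


for name, frame in six_frames("ATGGCCTAA").items():
    print(name, frame)
```

Deciding which of the six frames is actually translated is one of the problems gene-finding algorithms address.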
Protein Synthesis: Summary
• There are twenty amino acids, each coded by three-base sequences in DNA called “codons.”
  • This code is degenerate.
• The central dogma describes how proteins derive from DNA:
  • DNA → mRNA (splicing?) → protein
• The protein adopts a 3-D structure specific to its amino acid arrangement and function.
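The degeneracy of the code can be checked by counting how many of the 64 codons map to each amino acid. This Python sketch writes the standard codon table as a 64-letter string (codons ordered UUU, UUC, ..., GGG with bases in the order U, C, A, G, and `*` for stop); the helper name is hypothetical.

```python
from collections import Counter

AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

# Number of codons that encode each amino acid (one-letter code) or stop ("*").
codons_per_symbol = Counter(AMINO_ACIDS)


def amino_acids_with(n_codons: int) -> list[str]:
    """Amino acids (excluding stop) encoded by exactly `n_codons` codons."""
    return sorted(aa for aa, n in codons_per_symbol.items()
                  if n == n_codons and aa != "*")


print(len(codons_per_symbol) - 1, "amino acids,", codons_per_symbol["*"], "stop codons")
for n in (1, 2, 3, 4, 6):
    print(n, amino_acids_with(n))
```

Only methionine and tryptophan have a single codon; leucine, serine, and arginine each have six.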
Simultaneous translation
Proteins
• Complex organic molecules made up of amino acid subunits
• 20* different kinds of amino acids. Each has a 1- and 3-letter abbreviation.
• Proteins are often enzymes that catalyze reactions.
• Also called “polypeptides”
*A few other amino acids, such as selenocysteine, also occur in some proteins.
Proteins
• Composed of a chain of amino acids; each amino acid carries one of 20 possible R groups:
       R
       |
  H2N--C--COOH
       |
       H
20 amino acids
Proteins
       R
       |
  H2N--C--COOH
       |
       H
Dipeptide
       R  O      R
       |  ||     |
  H2N--C--C--NH--C--COOH
       |         |
       H         H
• The C(=O)–NH link joining the two amino acids is a peptide bond.
Protein Structure
• A linear sequence of amino acids folds to form a complex 3-D structure.
• The structure of a protein is intimately connected to its function.
Section 5: How to Analyze DNA
Analyzing a Genome
• How to analyze a genome in four easy steps:
  • Cut it: use enzymes to cut the DNA into small fragments.
  • Copy it: copy it many times to make it easier to see and detect.
  • Read it: use special chemical techniques to read the small fragments.
  • Assemble it: take all the fragments and put them back together. This is hard!!!
• Bioinformatics takes over:
  • What can we learn from the sequenced DNA?
  • Compare interspecies and intraspecies.
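The “assemble it” step is where algorithms come in. As a minimal illustration only, here is a Python sketch of a naive greedy assembler that repeatedly merges the two reads with the longest suffix-prefix overlap. It assumes error-free reads, no repeats, and no read contained inside another, which are exactly the complications that make real assembly hard; the function names and example reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (>= min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair of contigs until no overlaps remain."""
    contigs = list(reads)
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


reads = ["ATTAGACC", "GACCTGCC", "TGCCGGAA"]   # hypothetical error-free reads
print(greedy_assemble(reads))                  # ['ATTAGACCTGCCGGAA']
```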
Polymerase Chain Reaction (PCR)
• Used to massively replicate DNA sequences.
• How it works:
  • Separate the two strands with high heat.
  • Add free nucleotides, primer sequences, and DNA polymerase.
    • Primer sequences create a seed from which double-stranded DNA grows.
    • DNA polymerase creates double-stranded DNA from a single strand.
  • Now you have two copies. Repeat.
• The amount of DNA grows exponentially:
  • 1 → 2 → 4 → 8 → 16 → 32 → 64 → 128 → 256…
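The doubling pattern is exponential growth. A small Python sketch, where the `efficiency` parameter (the fraction of templates copied per cycle) is an assumption added for illustration; with `efficiency=1.0` it reproduces the ideal doubling sequence.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after `cycles` rounds if each round multiplies the DNA by (1 + efficiency)."""
    return initial_copies * (1.0 + efficiency) ** cycles


print([int(pcr_copies(1, n)) for n in range(9)])  # [1, 2, 4, 8, 16, 32, 64, 128, 256]
print(f"{pcr_copies(1, 30):.3g}")                  # 1.07e+09 copies after 30 ideal cycles
```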
DNA Cloning
• Insert the DNA fragment into the genome of a living organism and watch it multiply.
• Once you have enough, remove the organism and keep the DNA.
• Or use the Polymerase Chain Reaction (PCR).
(Figure: cloning a DNA fragment with vector DNA.)
Cutting DNA
• Restriction enzymes cut DNA.
  • They only cut at special sequences.
  • DNA contains thousands of these sites.
• Applying different restriction enzymes creates fragments of varying size.
(Figure: cutting sites for restriction enzyme “A”, for restriction enzyme “B”, and for “A” and “B” together; the “A” and “B” fragments overlap.)
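Finding cutting sites and fragment sizes is a string-search problem. This Python sketch uses the EcoRI (G^AATTC) and BamHI (G^GATCC) recognition sites as examples; the toy DNA sequence is made up, and the model only tracks cut positions on the top strand, ignoring the staggered “sticky” ends on the other strand.

```python
def cut_positions(dna: str, site: str, cut_offset: int) -> list[int]:
    """Top-strand cut positions: `cut_offset` bases into every occurrence of `site`."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = dna.find(site, start + 1)
    return positions


def digest(dna: str, enzymes: list[tuple[str, int]]) -> list[int]:
    """Fragment lengths, left to right, after cutting with all `enzymes` together."""
    cuts = sorted({p for site, offset in enzymes
                   for p in cut_positions(dna, site, offset)})
    bounds = [0] + cuts + [len(dna)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


ECORI = ("GAATTC", 1)   # cuts G^AATTC
BAMHI = ("GGATCC", 1)   # cuts G^GATCC
dna = "AAGAATTCTTGGATCCAAGAATTCAA"   # hypothetical 26-bp sequence

print(digest(dna, [ECORI]))          # [3, 16, 7]
print(digest(dna, [BAMHI]))          # [11, 15]
print(digest(dna, [ECORI, BAMHI]))   # [3, 8, 8, 7]
```

The double digest produces fragments that overlap the single-enzyme fragments, the kind of information restriction-mapping algorithms use to place the cutting sites.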
Pasting DNA
• Two pieces of DNA can be fused together by adding chemical bonds:
  • Hybridization: complementary base pairing
  • Ligation: forming covalent bonds that seal the backbone of each strand
Electrophoresis
• Agarose, a polysaccharide built from galactose-derived sugars, forms a gel when melted and recooled; its pore size depends on the concentration of agarose.
• The phosphate backbone of DNA is highly negatively charged, so DNA migrates toward the positive electrode in an electric field.
• The size of DNA fragments can then be determined by comparing their migration in the gel to known size standards.
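Comparing migration to size standards is commonly done with a standard curve: over the gel’s useful range, migration distance is roughly linear in the logarithm of fragment size, with smaller fragments travelling farther. This Python sketch fits that line by least squares and interpolates an unknown band; the ladder distances and sizes are invented for illustration, and the log-linear model is an approximation.

```python
import math


def fit_standard_curve(distances_mm, sizes_bp):
    """Least-squares fit of log10(size) = slope * distance + intercept."""
    ys = [math.log10(s) for s in sizes_bp]
    n = len(distances_mm)
    mean_x = sum(distances_mm) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances_mm, ys))
    sxx = sum((x - mean_x) ** 2 for x in distances_mm)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size(distance_mm, slope, intercept):
    """Fragment size (bp) predicted for a band at `distance_mm`."""
    return 10 ** (slope * distance_mm + intercept)


# Hypothetical ladder: bands of known size and how far each migrated.
ladder_mm = [10.0, 20.0, 30.0]
ladder_bp = [10000, 1000, 100]
slope, intercept = fit_standard_curve(ladder_mm, ladder_bp)
print(round(estimate_size(25.0, slope, intercept)))  # 316
```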
DNA Microarray
• Millions of DNA strands build up on each location.
• Tagged probes become hybridized to the DNA chip’s microarray.
http://www.affymetrix.com/corporate/media/image_library_1.affx
www.bioalgorithms.info; 12/25/2021; May 11, 2004
DNA Microarray
• An Affymetrix microarray is a tool for analyzing gene expression that consists of a glass slide.
• Each blue spot indicates the location of a PCR product. On a real microarray, each spot is about 100 µm in diameter.
www.geneticsplace.com
Affymetrix GeneChip® Arrays
Data from an experiment showing the expression of thousands of genes on a single GeneChip® probe array.
http://www.affymetrix.com/corporate/media/image_library_1.affx
Beta Globins
• Beta globin chains of closely related species are highly similar. Observe the simple alignments below:
Human β chain: MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLL
Mouse β chain: MVHLTDAEKAAVNGLWGKVNPDDVGGEALGRLL
Human β chain: VVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
Mouse β chain: VVYPWTQRYFDSFGDLSSASAIMGNPKVKAHGKKVIN
Human β chain: AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGN
Mouse β chain: AFNDGLKHLDNLKGTFAHLSELHCDKLHVDPENFRLLGN
Human β chain: VLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
Mouse β chain: MIVIVLGHHLGKEFTPCAQAAFQKVVAGVASALAHKYH
• There are a total of 27 mismatches, or (147 - 27) / 147 = 81.6% identical.
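The identity figure is simple to recompute. A minimal Python sketch, assuming the two chains are already aligned position by position with no gaps, as in the human/mouse alignment above; the function name is hypothetical.

```python
def percent_identity(a: str, b: str) -> tuple[int, float]:
    """Mismatch count and % identity for two gap-free, equal-length alignments."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches, 100.0 * (len(a) - mismatches) / len(a)


human = ("MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLL"
         "VVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG"
         "AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGN"
         "VLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH")
mouse = ("MVHLTDAEKAAVNGLWGKVNPDDVGGEALGRLL"
         "VVYPWTQRYFDSFGDLSSASAIMGNPKVKAHGKKVIN"
         "AFNDGLKHLDNLKGTFAHLSELHCDKLHVDPENFRLLGN"
         "MIVIVLGHHLGKEFTPCAQAAFQKVVAGVASALAHKYH")

mismatches, identity = percent_identity(human, mouse)
print(len(human), mismatches, round(identity, 1))  # 147 27 81.6
```

With gaps, the two chains would first need to be aligned, which is the problem that sequence-alignment algorithms solve.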
Beta Globins, cont.
Human β chain:   MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLL
Chicken β chain: MVHWTAEEKQLITGLWGKVNVAECGAEALARLL
Human β chain:   VVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
Chicken β chain: IVYPWTQRFFASFGNLSSPTAILGNPMVRAHGKKVLT
Human β chain:   AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGN
Chicken β chain: SFGDAVKNLDNIKNTFSQLSELHCDKLHVDPENFRLLGD
Human β chain:   VLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
Chicken β chain: ILIIVLAAHFSKDFTPECQAAWQKLVRVVAHALARKYH
• There are a total of 44 mismatches, or (147 - 44) / 147 = 70.1% identical.
• As expected, the mouse β chain is ‘closer’ to that of humans than the chicken’s is.
Molecular evolution can be visualized with a phylogenetic tree.
Origins of New Genes
• All animal lineages trace back to a common ancestor, a protist, about 700 million years ago.
How Do Different Species Differ?
• As many as 99% of human genes are conserved across all mammals.
• The functionality of many genes is virtually the same among many organisms.
• It is highly unlikely that the same gene with the same function would spontaneously develop among all currently living species.
• The theory of evolution suggests all living things evolved from incremental change over millions of years.
Mouse and Human Overview
• Mouse has 2.1 × 10^9 base pairs versus 2.9 × 10^9 in human.
• About 95% of genetic material is shared.
• 99% of genes are shared, out of about 30,000 total.
• The 300 genes that have no homologue in either species deal largely with immunity, detoxification, smell, and sex.*
*Scientific American, Dec. 5, 2002
Human and Mouse
• Significant chromosomal rearrangement occurred after humans and mice diverged.
• Here is a mapping of human chromosome 3. It contains sequences homologous to at least 5 mouse chromosomes.
Comparative Genomics
• What can be done with the full human and mouse genomes? One possibility is to create “knockout” mice: mice lacking one or more genes. Studying the phenotypes of these mice gives predictions about the function of those genes in both mice and humans.