Things that may help with comprehension of bioinformatics issues in general and Rosalind problems in particular
Problems
1. Counting DNA nucleotides
2. Transcribing DNA into RNA
3. Complementing a strand of DNA
4. Rabbits and Recurrence Relations
   • Population growth model
5. GC Content
Problems
1. Counting DNA nucleotides
   • Intro to nucleotides
   • What are they?
   • Put together in genome as a string
   • Some cool features of genome structure
2. Transcribing DNA into RNA
   • The protein coding parts
   • The central dogma
   • Codons and other structural elements of genes
3. Complementing a strand of DNA
   • Issues of complementarity
4. Rabbits and Recurrence Relations
   • Population growth model
5. GC Content
   • Signatures of different parts of genomes and differences among genomes
This course pays a bit of extra attention to data applications in the life sciences, such as DNA sequencing.
Bioinformatics
• “the science of collecting and analyzing complex biological data such as genetic codes.”
• “conceptualizing biology in terms of macromolecules (in the sense of physical-chemistry) and then applying "informatics" techniques (derived from disciplines such as applied maths, computer science, and statistics) to understand and organize the information associated with these molecules, on a large-scale.”
http://www.ncbi.nlm.nih.gov/news/01-23-2015-genbank-trillion-bases/
http://www.nature.com/scitable/resource?action=showFullImageForTopic&imgSrc=/scitable/content/ne0000/ne0000/78429/Databases_Fig1_FULL.jpg
Characteristics of DNA relevant for bioinformatics
• A linear string
• Exists as double stranded helix
  – Each strand has directionality
  – Rules for pairing
• Genome has different regions
  – Protein coding
    • Genetic code
  – RNA coding
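Because a DNA strand can be treated as a linear string, the first Rosalind problem (counting nucleotides) reduces to counting characters. The following is a minimal sketch; the function name `count_nucleotides` and the example string are illustrative assumptions, not part of the original slides.

```python
from collections import Counter


def count_nucleotides(dna: str) -> dict:
    """Return how many times each of A, C, G and T occurs in a DNA string."""
    counts = Counter(dna.upper())
    # Report all four bases, even those that do not occur (count 0).
    return {base: counts[base] for base in "ACGT"}


if __name__ == "__main__":
    example = "ATGCCATGATAG"  # short illustrative DNA string
    print(count_nucleotides(example))
```

Any symbol other than A, C, G or T (for example N for an unknown base) is simply ignored in this sketch.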
>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT
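Records like the ones above are in FASTA format: a header line starting with ">" followed by one or more sequence lines. A rough sketch of how the GC Content problem might be approached is shown here. The helper names `parse_fasta` and `gc_content` are assumptions made for illustration, and the parser handles only simple, well-formed input.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {record name: sequence}, joining wrapped lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def gc_content(seq: str) -> float:
    """Percentage of symbols in seq that are G or C (0.0 for an empty string)."""
    if not seq:
        return 0.0
    gc = sum(1 for base in seq.upper() if base in "GC")
    return 100.0 * gc / len(seq)


SAMPLE = """>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT
"""

records = parse_fasta(SAMPLE)
# Pick the record whose sequence has the highest GC percentage.
best = max(records, key=lambda name: gc_content(records[name]))

if __name__ == "__main__":
    print(best, round(gc_content(records[best]), 6))
```

For real datasets an established parser (for example the one in Biopython) would usually be a safer choice than a hand-written one.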
RNA and DNA each contain four nitrogenous bases
Bases, sugars, and phosphates combine to form “nucleotides”
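RNA and DNA differ in the sugar molecule that they contain (ribose in RNA, deoxyribose in DNA). Each sugar has five carbons, numbered 1’ to 5’; the 5’ and 3’ carbons define the direction of a strand.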
The building blocks of DNA (and RNA) are nucleotides (supplied to the polymerase as nucleoside triphosphates)
Nucleotides are linked together in chains of RNA or DNA by phosphodiester (phosphate-sugar) bonds
RNA and DNA differ in two ways:
1. The sugar molecule they use
2. One base: uracil in RNA, but thymine in DNA. The other bases (adenine, guanine, and cytosine) occur in both molecules.
Hydrogen bonds connect A with T and G with C (Watson and Crick base pairing)
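The pairing rules (A with T, G with C), combined with the fact that the two strands run in opposite directions, are what the "Complementing a strand of DNA" problem relies on. A minimal sketch follows; the names `complement` and `reverse_complement` are chosen here for illustration.

```python
# Translation table implementing the pairing rules A<->T and G<->C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(dna: str) -> str:
    """Return the base-by-base complement, in the same left-to-right order."""
    return dna.upper().translate(COMPLEMENT)


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written in its own 5' to 3' direction."""
    return complement(dna)[::-1]


if __name__ == "__main__":
    strand = "AAAACCCGGT"
    print(complement(strand))
    print(reverse_complement(strand))
```

The reversal step is needed because the complementary strand is anti-parallel: reading it 5’ to 3’ means reading the complement from right to left.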
Based on X-ray crystallography data from Rosalind Franklin and Wilkins, Watson and Crick proposed a double-helix model of DNA
THE CENTRAL DOGMA
http://www.ncbi.nlm.nih.gov/Class/MLACourse/Modules/MolBioReview/central_dogma.html
Transcribing a gene in more detail: Making sense of anti-sense
• Sense strand: What we think of as the coding sequence for a gene. Its sequence matches the mRNA sequence (with T in place of U). Also called “plus strand” or “non-template strand”.
• Anti-sense strand: The strand actually read by RNA polymerase to create the mRNA. RNA polymerase reads this strand 3’ to 5’ (anti-parallel to the RNA) so that the RNA is built in the 5’ to 3’ direction. Also called “minus strand” or “template strand”.
Problem solving
The DNA for a given gene reads as follows:
3’ TACGGTACTATC 5’
5’ ATGCCATGATAG 3’
The bottom strand shows the coding region/non-template strand.
A. What should the newly synthesized RNA read?
   5’ AUGCCAUGAUAG 3’
B. Which strand will RNA polymerase attach to, and in which direction will it read?
   The top strand, reading 3’ to 5’
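The worked problem above can be checked in code in two ways: by replacing T with U in the coding strand, or by complementing the template strand base by base. This is a small sketch, and the function names are illustrative assumptions. The template strand is assumed to be written 3’ to 5’, as on the slide, so no reversal is needed.

```python
# Pairing rules used when RNA is built against a DNA template:
# A->U, C->G, G->C, T->A.
TEMPLATE_TO_RNA = str.maketrans("ACGT", "UGCA")


def transcribe_coding(coding_5to3: str) -> str:
    """mRNA from the coding (sense) strand: same sequence with T replaced by U."""
    return coding_5to3.upper().replace("T", "U")


def transcribe_template(template_3to5: str) -> str:
    """mRNA (5' to 3') from the template strand written 3' to 5'."""
    return template_3to5.upper().translate(TEMPLATE_TO_RNA)


if __name__ == "__main__":
    template = "TACGGTACTATC"  # top strand, written 3' to 5'
    coding = "ATGCCATGATAG"    # bottom strand, written 5' to 3'
    print(transcribe_coding(coding))
    print(transcribe_template(template))
```

Both calls give the same RNA, which is why the Rosalind transcription problem can work from the coding strand alone.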
The ‘universal’ degenerate code
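"Degenerate" here means that most amino acids are specified by more than one codon. The sketch below builds the standard genetic code table from a compact string and counts the codons per amino acid. The names `CODON_TABLE` and `translate` are illustrative. This simplified `translate` starts at the first base rather than searching for a start codon, and it assumes the input contains only A, C, G and U.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, with codons ordered UUU, UUC, UUA, UUG, UCU, ...
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate mRNA from its first base, stopping at the first stop codon."""
    protein = []
    mrna = mrna.upper()
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Number of codons that encode each amino acid (and the stop signal "*").
degeneracy = Counter(CODON_TABLE.values())

if __name__ == "__main__":
    print(translate("AUGCCAUGAUAG"))
    print(degeneracy["L"], degeneracy["M"], degeneracy["W"])
```

Leucine is encoded by six codons, while methionine and tryptophan have one each. The code is ‘universal’ only in quotation marks because some genomes (mitochondria, for example) use slightly different tables.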
Bioinformatics sites
• Translate DNA to protein
  – http://web.expasy.org/translate/
• Search NCBI database to identify sequence
  – http://blast.ncbi.nlm.nih.gov/blast/Blast.cgi
An unknown sequence for you to play around with as you wish
GGCACGAGAAAAGACTAGTTGCTCACTGGAAAAAGTC
TAAAAATGAGGTTTCTCGTTGGAGCAGTATTAGTTGTT
GTGTTGGTGGCTTGTGCCACGGCATTCGAAAGTGATG
CCGAAACTTTTAAATCTCTTGTTGTAGAAGAAAA
TGCCACGGAGATGGTTCCAAGGGCTGTGCCACAAAGC
CTGATGACTGGTGCTGCAAGAATACACCTTGCAAGTG
CCCCGCCTGGTCCTCCACAAGTGCAGGTGCGCA
ATGGACTGCAGCCGAAGATGCAAAGGCAAACGAGCA
TTGTTGTTGCCAGTTGAGACTCACCGACTACTCTTCCCT
GAACAATGGTGAAGCCATTGACATCGATATATCATCTA
CTGTTATGTACTGTAAAAACAAATAAAGTTACTTATGC
AGTAAAAAAAAAAAAA