Sequence Analysis with Artemis and Artemis Comparison Tool
- Slides: 38
Sequence Analysis with Artemis and Artemis Comparison Tool (ACT) Caribbean Bioinformatics Workshop, 18th-29th January 2010
[Slide: a block of raw genomic DNA sequence]
[Slide: the same raw DNA sequence, overlaid with text] SEQUENCE ANNOTATION Extracting information & interpreting: What's there? Where are the genes? Which genes? How to find them? Sequencing is just the beginning of the process
Strategies for sequence annotation • Predictive methods: interpretation of the DNA sequence into genes according to rules • Comparative methods: interpretation of the DNA sequence into genes according to similarities with other sequences • Experimental methods: interpretation of the DNA sequence into genes according to experimental results (e.g. cDNA)
EST Blast Hit
Gene prediction programs: ORFs and CDSs • ORFs are not equivalent to CDSs • Not all open reading frames are coding sequences
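To make the ORF/CDS distinction concrete, here is a minimal sketch that lists every stop-free stretch on the forward strand. It assumes the standard stop codons and a hypothetical `min_codons` cut-off; neither the function name nor the threshold comes from the slides.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code

def find_orfs(seq, min_codons=3):
    """Return (frame, start, end) for each stop-free stretch on the forward strand.

    Coordinates are 0-based and end-exclusive; the terminating stop codon
    is not included. `min_codons` is an arbitrary illustrative threshold.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = frame
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((frame, start, pos))
                start = pos + 3  # next ORF begins after the stop codon
        # Stretch that runs to the end of the sequence without a stop
        end = frame + ((len(seq) - frame) // 3) * 3
        if (end - start) // 3 >= min_codons:
            orfs.append((frame, start, end))
    return orfs

orfs = find_orfs("ATGAAACCCGGGTAAATGTTT")
```

On this toy sequence all three frames contain an open stretch, but only the frame 0 stretch begins with ATG and ends at a stop codon. An ORF is only a candidate; further evidence is needed before calling it a CDS.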
Gene prediction • Orpheus • PHAT • GeneMark • Glimmer • Genefinder
Gene finding programs • Gene-finding software packages use Hidden Markov Models. • They predict coding, intergenic and intron sequences. • They need to be trained on a specific organism. • They are never perfect!
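The Hidden Markov Model idea can be illustrated with a toy two-state model decoded by the Viterbi algorithm. This is only a sketch: the states, the probabilities and the function name are made-up illustrative assumptions, not the parameters or design of any of the programs named above, which use far richer models trained on real genes.

```python
import math

STATES = ("noncoding", "coding")
# All probabilities are invented for illustration, not trained values.
START = {"noncoding": 0.5, "coding": 0.5}
TRANS = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
EMIT = {
    "noncoding": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},  # AT-rich
    "coding": {"A": 0.2, "T": 0.2, "G": 0.3, "C": 0.3},     # more GC
}

def viterbi(seq):
    """Return the most probable state path for a sequence of A/C/G/T."""
    seq = seq.upper()
    if not seq:
        return []
    # Log-probabilities avoid numerical underflow on long sequences.
    scores = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for base in seq[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            pointers[s] = prev
            new_scores[s] = (scores[prev] + math.log(TRANS[prev][s])
                             + math.log(EMIT[s][base]))
        back.append(pointers)
        scores = new_scores
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

path = viterbi("ATATATATATGCGCGCGCGC")
```

Because the emission tables stand in for "training", changing them changes the predictions, which is why a model trained on one organism can perform poorly on another.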
Gene prediction programs: Problems • ORFs are not equivalent to CDSs • Gene prediction programs find new genes that share properties with a given set of genes. • They can be confounded by: – Sequence constraints (ribosomal proteins etc.) – Sequence biases – Different sets of genes – Horizontal gene transfer – Non-coding DNA
Gene prediction programs: Problems Different gene training sets: Plasmodium falciparum Original annotation Updated annotation
Gene prediction programs: Problems Non-protein coding regions: S. typhi ribosomal RNA genes [Figure tracks: final, genefinder, orpheus, glimmer]
Gene prediction programs: Problems Non-protein coding regions: N. meningitidis DNA repeats [Figure tracks: final, orpheus, glimmer]
Gene prediction programs: Problems Pseudogenes M. leprae
Gene prediction programs: Problems Pseudogenes: M. leprae Glimmer
Gene prediction programs: Problems Pseudogenes: M. leprae ORPHEUS
Gene prediction programs: Problems Pseudogenes: M. leprae WUBLASTX vs. M. tuberculosis
Gene prediction programs: Problems Pseudogenes: M. leprae Final annotation
The Gene Prediction Process ESTs ANALYSIS SOFTWARE DNA SEQUENCE FASTA BlastX Gene finders Codon Usage AT content Annotator Useful CDS Prediction
Eukaryotic gene [Diagram: 5' UTR, Exon I, intron, Exon II, Exon III, 3' UTR; ATG start, GT and AG intron boundaries, stop; mRNA with CAP and poly(A) tail (AAAAAAAAAA); cDNA (TTTTT); EST (TTTTT)]
AT content • In AT-rich genomes, coding regions have a higher GC content than non-coding regions
AT content
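A sliding-window GC calculation is one simple way to look for the GC-richer stretches described above. The sketch below is illustrative only; the function name, window size and step are arbitrary assumptions, not settings taken from Artemis.

```python
def gc_content_windows(seq, window=120, step=60):
    """Return (window_start, GC fraction) pairs along the sequence.

    `window` and `step` are in nucleotides and are illustrative defaults.
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, gc / window))
    return results

# Toy sequence: an AT-only stretch followed by a GC-only stretch
profile = gc_content_windows("AT" * 10 + "GC" * 10, window=20, step=20)
```

In an AT-rich genome, windows whose GC fraction rises clearly above the local background are candidates for coding regions, to be checked against other evidence.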
CODON USAGE • Codon bias is different for each organism. • DNA content in coding regions is restricted, but it is not restricted in non-coding regions. • The codon usage for any particular gene can influence expression.
Codon usage • All organisms have a preferred set of codons.
Valine codons (fraction of use):
Organism      GUU   GUC   GUA   GUG
Malaria       0.41  0.06  0.42  0.11
Trypanosoma   0.28  0.19  0.14  0.39
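Fractions like those for the valine codons can be computed by counting in-frame codons in a coding sequence. The following is a minimal sketch; the function name and the toy sequence are hypothetical, and real codon usage tables are built from many genes rather than one.

```python
from collections import Counter

VALINE = ("GUU", "GUC", "GUA", "GUG")

def codon_fractions(cds, codons=VALINE):
    """Fraction of each codon in `codons` among all in-frame uses of that set.

    `cds` is assumed to be an in-frame coding sequence (DNA or RNA letters).
    """
    rna = cds.upper().replace("T", "U")
    counts = Counter(rna[i:i + 3] for i in range(0, len(rna) - 2, 3))
    total = sum(counts[c] for c in codons)
    if total == 0:
        return {c: 0.0 for c in codons}
    return {c: counts[c] / total for c in codons}

# Toy CDS: ATG GTT GTT GTA GTG TAA
fractions = codon_fractions("ATGGTTGTTGTAGTGTAA")
```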
Codon Usage • http://www.kazusa.or.jp/codon/
Codon Usage in Artemis Forward frames Reverse frames
Codon usage & gene finding in Leishmania
GC frame plot • Plots the third-position GC content of each frame of a DNA sequence. • In coding DNA the GC content of the 3rd base is often higher. • A good predictor of coding regions in malaria and trypanosomes.
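The third-position calculation behind a GC frame plot can be sketched as below. This is an illustration under stated assumptions, not the Artemis implementation: the function name, window size and step are hypothetical, and only the three forward frames are handled.

```python
def gc3_by_frame(seq, window=120, step=30):
    """Return (window_start, [gc3_frame0, gc3_frame1, gc3_frame2]) per window.

    Frame f is the reading frame whose first codon starts at sequence
    index f, so its third codon positions are f + 2, f + 5, and so on.
    """
    seq = seq.upper()
    rows = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        values = []
        for frame in range(3):
            # Offset inside the window of the first third-codon position
            offset = (frame + 2 - start) % 3
            thirds = chunk[offset::3]
            gc = thirds.count("G") + thirds.count("C")
            values.append(gc / len(thirds) if thirds else 0.0)
        rows.append((start, values))
    return rows

# Toy sequence: every codon in frame 0 is AAG, so only frame 0 has G at position 3
rows = gc3_by_frame("AAG" * 20, window=30, step=15)
```

The frame whose third-position GC content stands out from the other two is the likely coding frame in that window.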
GC frame plot of tubulin gene cluster on T. brucei Chr 1
Homology Data • Coding regions are more conserved than non-coding regions due to selective pressure. • Comparing all possible translations against all known proteins will give clues to known genes. • Blastx
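"All possible translations" means the six reading frames: three on each strand. The sketch below shows only that translation step, using the standard genetic code; it performs no database search, and the function names are illustrative rather than part of any BLAST tool.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frame_translations(seq):
    """Return translations keyed by frame: +1, +2, +3, -1, -2, -3."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames

frames = six_frame_translations("ATGGCCTAA")
```

Each of the six protein strings would then be compared against a protein database; a frame with strong hits is evidence for a gene in that frame and on that strand.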
Gene finding: using ACT P. yoelii P. falciparum P. knowlesi TBLASTX comparisons
Gene finding by RNA-Seq (Transcriptional landscape of Neospora caninum tachyzoites) Day 3 Tachyzoites (RNAseq) Day 4 Tachyzoites (RNAseq)
Transcriptome sequencing in Neospora (RNAseq is useful for predicting/confirming UTR boundaries) Day 3 Tachyzoites (RNAseq) Day 4 Tachyzoites (RNAseq) N. caninum Chr 08 TBLASTX matches visualised in ACT T. gondii Chr 08 5’ UTR 3’ UTR
RNA-Seq: correcting gene models Before %GC __16 hr, __32 hr, __48 hr After %GC