INTRODUCTION TO BIOINFORMATICS
Dilvan Moreira (based on a presentation by Prof. André de Carvalho)
Objectives
Learn the basic concepts, main techniques, future prospects, and applications of Bioinformatics.
André de Carvalho - ICMC/USP 25/09/2020
Content
Molecular biology; sequence analysis; gene recognition; sequence alignment; use of Markov chains in biological data; variation among and within species.
Content (continued)
Natural selection at the molecular level; phylogenetic analysis; genome comparison; gene expression analysis; identification of regulatory sequences.
By the end of the course, you will be able to:
Understand statistical and algorithmic approaches to sequence and gene expression analysis.
Understand the role of computation in modern biology.
Read and comprehend recent articles about genomes, in particular the aspects related to data analysis.
Become familiar with standard problems and tools, as well as the current and future objectives of the area.
Access and handle real genomic data.
About the course
The course is problem-oriented. Reading the book is indispensable! Classes are experimental, discussion-based, and built on the course material. Weekly homework.
Bibliography
Nello Cristianini and Matthew W. Hahn: Introduction to Computational Genomics: A Case Studies Approach, Cambridge University Press, 2007.
Neil C. Jones and Pavel A. Pevzner: An Introduction to Bioinformatics Algorithms, MIT Press, 2004.
Site: www.computational-genomics.net
Bioinformatics
Research and development of computational, mathematical, and statistical tools to solve problems from biology, especially molecular biology.
"Computation is to Biology what Mathematics is to Physics." (Harold Morowitz)
Bioinformatics
Areas of Bioinformatics: development of new techniques and algorithms for research on biological databases; development and implementation of tools that allow efficient management of, and access to, several kinds of information.
Bioinformatics
History: 1960s: first amino acid sequence databases. 1960-1970: development of algorithms to analyze those data. 1980s: launch of GenBank and other public databases with analysis tools. 1990s: huge growth of databases such as GenBank and PDB.
Bioinformatics
Research benefits several areas: medicine, pharmacy, agriculture.
In medicine, it can improve disease diagnosis, detect genetic propensity to diseases, enable gene therapy as a treatment, and allow the development of "personalized medicine" based on an individual's genetic profile.
Bioinformatics
The first step to understanding how a cell operates is knowing its nucleotide sequence (DNA), which defines the entire organism.
Genome: the set of all of an organism's DNA.
Bioinformatics
Genomics ground zero: publication of the complete genome sequence of a free-living organism (Science, 1995), the bacterium Haemophilus influenzae, by The Institute for Genomic Research (TIGR), led by Craig Venter.
Until then, only small viral genomes or small parts of other genomes had been sequenced. The first genome sequence, that of the bacteriophage phiX174, was published in 1977.
Bioinformatics
Haemophilus influenzae causes several clinical diseases, such as meningitis and septicemia (both usually in children) and middle ear infection. Until 1933, it was incorrectly considered the cause of influenza (flu).
It was chosen because one of the project leaders had been working with it for decades, so the team could build high-quality DNA libraries.
Genome: 1,830,140 DNA base pairs in a single circular chromosome.
Bioinformatics
A few months later, TIGR published another bacterial genome: Mycoplasma genitalium, a cause of pelvic inflammation. TIGR created revolutionary computational methods to obtain and assemble genome sequences.
Soon after, another group published the first eukaryotic genome sequence: the fungus Saccharomyces cerevisiae (yeast).
Bioinformatics
January 2008: TIGR creates a synthetic bacterial genome, using parts of Mycoplasma genitalium. Plan: insert it into a living cell and create the first artificial organism.
Bioinformatics
In recent years many genomes have been sequenced, generating a great amount of data. Up to January 2008 there were more than 3,500 sequencing projects, and around 700 organisms had been completely sequenced.
Bioinformatics
[Figure: genome sequencing project statistics. Source: http://www.genomesonline.org/gold_statistics.htm]
Growth of Nucleotide Sequence Databases
[Figure: growth of DNA sequence databases compared with Moore's Law]
Ongoing Genome Projects (2007)
[Table: ongoing genome projects in 2007 by domain and by country or sector (USA, EU, Japan, others, industry). Totals: Archaea 12, Prokaryotes 193, Eukaryotes 81. The per-country breakdown is not recoverable from the extracted text.]
Archaea (archaebacteria): unicellular prokaryotic organisms, considered an intermediate group between eukaryotes and prokaryotes.
Bioinformatics
Pace of genome projects. Examples of complete published genomes: human, mouse, Drosophila melanogaster, Arabidopsis thaliana, yeast, and domestic animals.
Bioinformatics
Best-known sequencing projects: humans (trillions of cells) and yeast (a single cell).
The human genome was sequenced by two groups: the Human Genome Project, a public international consortium, and Celera Genomics, a private company.
The Human Genome Project
Officially started in 1990 and originally planned to last 15 years; technological advances brought its conclusion forward to 2003. Coordinated by the U.S. Department of Energy and the National Institutes of Health.
G16 Human Genome Sequencing Consortium
1. Baylor College of Medicine, Houston, Texas, USA
2. Beijing Human Genome Center, Institute of Genetics, Chinese Academy of Sciences, Beijing, China
3. Gesellschaft für Biotechnologische Forschung mbH, Braunschweig, Germany
4. Genoscope, Evry, France
5. Genome Therapeutics Corporation, Waltham, MA, USA
6. Institute for Molecular Biotechnology, Jena, Germany
7. Joint Genome Institute, U.S. Department of Energy, Walnut Creek, CA, USA
8. Keio University, Tokyo, Japan
9. Max Planck Institute for Molecular Genetics, Berlin, Germany
10. RIKEN Genomic Sciences Center, Saitama, Japan
11. The Sanger Centre, Hinxton, U.K.
12. Stanford DNA Sequencing and Technology Development Center, Palo Alto, CA, USA
13. University of Washington Genome Center, Seattle, WA, USA
14. University of Washington Multimegabase Sequencing Center, Seattle, WA, USA
15. Whitehead Institute for Biomedical Research, MIT, Cambridge, MA, USA
16. Washington University Genome Sequencing Center, St. Louis, MO, USA
Funding: National Institutes of Health (US), Department of Energy (US), Medical Research Council of Great Britain and Northern Ireland (UK), Wellcome Trust (UK).
The Human Genome Project
Main objectives: to identify all genes in human DNA; to determine the sequence of the base pairs that make up human DNA; to store this information in databases; to improve tools for data analysis; to transfer the acquired technology to the private sector; to address the ethical, legal, and social issues that could arise from the project.
The Human Genome Project
Some results: the genome has about 3 billion bases (A's, C's, G's and T's). The exact number of genes is still unknown; current estimates are between 20,000 and 25,000 genes.
Human Genome Chromosomes
[Figure: the chromosomes of the human genome]
The Human Genome Project
More recent numbers, released in October 2004: 19,599 genes identified, plus 2,188 possible genes. That is fewer than rice, with 50,000 to 60,000 (shorter) genes, and only a little more than the nematode, with about 20,000 genes.
Bioinformatics
The nematode Caenorhabditis elegans is a worm with more than 20,000 genes. It has already given hints about diabetes, the aging process, and cancer development. It has a gene that regulates organ formation, which could help advance the development of artificial organs.
Bioinformatics
Some results of the Human Genome Project: an average gene has around 3,000 bases, but sizes vary a lot; the largest known human gene has 2.4 million bases. Genes are concentrated in random areas along the genome. The function of more than 50% of the genes is unknown.
Bioinformatics
Some results of the Human Genome Project: around 2% of the genome codes for protein synthesis. More than 40% of the predicted human proteins show similarities with worm and fruit fly proteins. The human and chimpanzee genomes are 98.5% identical.
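To make the idea of a percentage of identity between two sequences concrete, here is a minimal Python sketch, assuming two sequences that are already aligned and have the same length. The function name `percent_identity` and the toy sequences are hypothetical illustrations, not real human or chimpanzee DNA. Genome-wide figures such as the human-chimpanzee comparison come from large-scale alignment pipelines that also handle gaps, insertions and deletions, which this sketch ignores.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two sequences have the same base.

    Assumption: the sequences are already aligned (equal length, no gap handling).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Compare position by position, ignoring upper/lower case.
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


# Hypothetical toy sequences: 20 bases with a single mismatch (position 12).
human_like = "ACGTACGTACGTACGTACGT"
chimp_like = "ACGTACGTACGAACGTACGT"
print(percent_identity(human_like, chimp_like))  # 95.0
```

With one mismatch in 20 aligned positions the identity is 19/20, that is, 95%; at genome scale, 98.5% identity still leaves tens of millions of differing bases out of roughly 3 billion.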
Bioinformatics
Goal of the genetics community: sequencing an entire human genome for US$1,000.
News from the "Terra" website, January 22, 2008: the American genetics research company 23andMe, backed by Google, offers customers in Europe, via the internet, a personalized DNA analysis for around US$999.
http://tecnologia.terra.com.br/interna/0,,OI2260729-EI4802,00.html
Genomes
The genomes of several species, and of several individuals of the same species, are available, and scientists are comparing them.
Global Market
[Figure: global sales in Bioinformatics, in millions of US dollars (axis from $0 to $6,000), by year, 2001 to 2010]
Bioinformatics
The generated data need to be analyzed: the emphasis is progressively shifting from data accumulation to interpretation. Laboratory analysis is expensive and laborious, hence the need for sophisticated computational tools.
Bioinformatics
Areas of computing that contribute to Bioinformatics: data structures, computer networks, artificial and computational intelligence, bio-inspired computing, theory of computation, parallel computing, software engineering, optimization, the internet, computer graphics, databases, and image processing.
Questions?