Bioinformatics
蔡懷寬
E-mail: d7526010@csie.ntu.edu.tw
Please tell me
• Why are you here?
• Give a definition of bioinformatics
Introduction
• What is bioinformatics?
• Why bioinformatics?
• The past, present, and future of bioinformatics
REVOLUTION IN BIO-MEDICAL RESEARCH
CLASSICAL APPROACH
• Northern Hybridization
• Western Hybridization
• Southern Hybridization
• RFPD
HIGH-THROUGHPUT APPROACH
• Differential Display
• Subtraction Library
• Real-Time PCR
• Microarray
• 2-Dimensional Protein Electrophoresis
• Serial Analysis of Gene Expression
• Sequence Tags
EXPERIMENT DRIVEN: Hypothesis → Experiment
INFORMATION DRIVEN: Experiment → Hypothesis
GenBank Data (9/25/2002)
Year   Base Pairs  Sequences
1982       680338        606
1983      2274029       2427
1984      3368765       4175
1985      5204420       5700
1986      9615371       9978
1987     15514776      14584
1988     23800000      20579
1989     34762585      28791
1990     49179285      39533
1991     71947426      55627
1992    101008486      78608
1993    157152442     143492
1994    217102462     215273
1995    384939485     555694
1996    651972984    1021211
1997   1160300687    1765847
1998   2008761784    2837897
1999   3841163011    4864570
2000  11101066288   10106023
2001  15849921438   14976310
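The growth in the GenBank figures above can be summarized with a little arithmetic. The following is a minimal sketch, assuming growth is roughly exponential over the whole period; the function names are illustrative and not part of the original slides.

```python
import math

# Base-pair totals per year, copied from the GenBank table above.
GENBANK_BASE_PAIRS = {
    1982: 680338, 1983: 2274029, 1984: 3368765, 1985: 5204420,
    1986: 9615371, 1987: 15514776, 1988: 23800000, 1989: 34762585,
    1990: 49179285, 1991: 71947426, 1992: 101008486, 1993: 157152442,
    1994: 217102462, 1995: 384939485, 1996: 651972984, 1997: 1160300687,
    1998: 2008761784, 1999: 3841163011, 2000: 11101066288, 2001: 15849921438,
}


def fold_increase(data, start_year, end_year):
    """How many times larger the database is in end_year than in start_year."""
    return data[end_year] / data[start_year]


def doubling_time_years(data, start_year, end_year):
    """Average doubling time, assuming exponential growth between the two years."""
    doublings = math.log2(fold_increase(data, start_year, end_year))
    return (end_year - start_year) / doublings


if __name__ == "__main__":
    fold = fold_increase(GENBANK_BASE_PAIRS, 1982, 2001)
    t2 = doubling_time_years(GENBANK_BASE_PAIRS, 1982, 2001)
    print(f"1982-2001 fold increase: {fold:,.0f}x")
    print(f"Average doubling time: {t2:.2f} years")
```

Under this assumption, the base-pair count doubled on average about every 1.3 years between 1982 and 2001.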
Protein Data Bank Data (9/25/2002)
Genome
• All the genetic material in the chromosomes of a particular organism
• Its size is generally given as its total number of base pairs.
Genome Sizes
• Human: 3000 million bases
• Mouse: 3000 million bases
• Drosophila (fruit fly): 165 million bases
• Nematode (roundworm): 100 million bases
• Yeast (fungus): 14 million bases
• E. coli (bacterium): 4.67 million bases
Human Genome Project
• Abbreviated HGP
• Main goals:
  • identify all the genes in human DNA
  • determine the sequences of the 3 billion chemical bases that make up human DNA
  • store this information in databases
  • develop tools for data analysis
  • transfer related technologies to the private sector
  • address the ethical, legal, and social issues (ELSI) that may arise from the project
Human Genome
History and Progress of the HGP (continued)
• February 2001:
  • Initial sequencing and analysis of the human genome (Nature, Vol. 409, 15 Feb. 2001, by International Human Genome Sequencing Consortium)
  • The sequence of the human genome (Science, Vol. 291, 16 Feb. 2001, by J. C. Venter, et al.)
Biology moves into the silicon stage
in vivo → in vitro → in silico
Before HGP
• String analysis
  • Pair-wise, multiple sequence alignment
Sequence Analysis: Alignment
• Pair-wise alignment
    SURVIVE
    SUR-IUE
• Multiple sequence alignment
  Input sequences S:
    s1  RPCVCPVLRQAAQ
    s2  RPCACCPVLRQVVQ
    s3  KPCLCPRQLRQV
    s4  KPCCPRQAAQ
  Alignment A:
    a1  RPCVC_P__VLRQAAQ
    a2  RPCACCP__VLRQVVQ
    a3  KPCLC_P_RQLRQV__
    a4  KPC_C_P____RQAAQ
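The pair-wise example above (SURVIVE vs. SURIUE) can be reproduced with a standard global alignment. The slides do not name an algorithm or a scoring scheme, so the following is a minimal sketch of Needleman-Wunsch dynamic programming with assumed scores (match +1, mismatch -1, gap -1); the function names are illustrative.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b.

    Returns (aligned_a, aligned_b, score). The scoring values are
    assumptions chosen for illustration, not taken from the slides.
    """
    def sub(x, y):
        # Score for aligning character x with character y.
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align two characters
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    top, bottom, total = needleman_wunsch("SURVIVE", "SURIUE")
    print(top)
    print(bottom)
    print("score:", total)
```

With these assumed scores the alignment is SURVIVE over SUR-IUE, matching the slide. Multiple sequence alignment is considerably harder and is usually approached with heuristics rather than a direct extension of this table.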
Before HGP
• String alignment
  • Pair-wise, multiple alignment
• Linkage analysis
Linkage Analysis
Before HGP
• String alignment
  • Pair-wise, multiple alignment
• Linkage analysis
• Phylogenetic tree
Phylogenetic Tree
Before HGP
• String alignment
  • Pair-wise, multiple alignment
• Linkage analysis
• Phylogenetic tree
• Protein structure prediction
• …
Protein Structure Prediction
During HGP
• Sequencing
  • Physical mapping
  • Fragment assembly
Sequencing Strategies (1)
• Map-Based Assembly:
  • Create a detailed, complete fragment map
  • Time-consuming and expensive
  • Provides scaffold for assembly
  • Original strategy of Human Genome Project
Sequencing Strategies (2)
• Shotgun:
  • Quick, highly redundant: requires 7-9X coverage for sequencing reads of 500-750 bp. This means that for the Human Genome of 3 billion bp, 21-27 billion bases need to be sequenced to provide adequate fragment overlap.
  • Computationally intensive
  • Troubles with repetitive DNA
  • Original strategy of Celera Genomics
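The coverage arithmetic on this slide can be checked directly. The following is a small sketch using the slide's own figures (3 billion bp genome, 7-9X coverage, 500-750 bp reads); the function names are illustrative, and the read-count estimate is an added step that assumes every read has the same length.

```python
def total_bases(genome_size_bp, coverage):
    """Bases that must be sequenced to cover the genome `coverage` times over."""
    return genome_size_bp * coverage


def reads_needed(genome_size_bp, coverage, read_length_bp):
    """Number of reads required, assuming all reads have the same length."""
    total = total_bases(genome_size_bp, coverage)
    return -(-total // read_length_bp)  # ceiling division on integers


if __name__ == "__main__":
    human_genome_bp = 3_000_000_000
    for coverage in (7, 9):
        print(f"{coverage}X coverage: {total_bases(human_genome_bp, coverage):,} bases")
    # Fewest reads: lowest coverage with the longest reads.
    print(f"fewest reads: {reads_needed(human_genome_bp, 7, 750):,}")
    # Most reads: highest coverage with the shortest reads.
    print(f"most reads:   {reads_needed(human_genome_bp, 9, 500):,}")
```

This reproduces the 21-27 billion bases quoted above and, under the equal-read-length assumption, corresponds to roughly 28 to 54 million reads.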
Shotgun Sequencing: Assembly of Random Sequence Fragments
• To sequence a Bacterial Artificial Chromosome (100-300 kb), millions of copies are sheared randomly, inserted into plasmids, and then sequenced. If enough fragments are sequenced, the BAC can be reconstructed from the overlapping fragments.
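Reconstruction from overlapping fragments can be illustrated with a toy greedy assembler that repeatedly merges the two fragments sharing the longest suffix-prefix overlap. This is only a sketch under strong assumptions (error-free reads, a single strand, no repeats); it is not the method used by the HGP or Celera, and the function names and the `min_len` threshold are illustrative.

```python
def overlap(a, b, min_len=1):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Merge fragments greedily by largest overlap; returns the remaining contigs."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Drop fragments fully contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the remaining fragments stay separate contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


if __name__ == "__main__":
    # Three overlapping reads taken from the made-up sequence ATGCGTACGTTAGC.
    reads = ["CGTTAGC", "ATGCGTAC", "GTACGTTA"]
    print(greedy_assemble(reads))
```

Repetitive DNA breaks this kind of greedy merging, because a repeat makes unrelated fragments appear to overlap, which is one reason shotgun assembly is computationally demanding.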
During HGP
• Sequencing
  • Physical mapping
  • Fragment assembly
• Gene Prediction
During HGP
• Sequencing
  • Physical mapping
  • Fragment assembly
• Gene Prediction
• …
After HGP (Post-Genomic)
• Microarray
Microarray
After HGP (Post-Genomic)
• Microarray
• Regulatory network
Regulatory Network
Simplified representation of the NF-κB network.
After HGP (Post-Genomic)
• Microarray
• Regulatory network
• Proteomics
• …
Bioinformatics-Related Topics (6)
• Proteomics
  • methodological developments in protein separation and characterization,
  • advances in bioinformatics, and
  • novel applications of proteomics in all areas of the life sciences and industry.
  (These endeavours give new insights into protein functions, interactions and pathways.)
Bioinformatics-Related Topics (8)
• Other topics:
  • RNA secondary structure prediction
  • Comparative genomics
  • Genetic networks
  • Microarrays (also called gene chips)
  • Molecular computers
Bioinformatics and Computational Biology-Related Journals:
• Bioinformatics (formerly CABIOS)
• Bulletin of Mathematical Biology
• Computers and Biomedical Research
• Genome Research
• Genomics
• Journal of Computational Biology
• Journal of Molecular Biology
• Nature
• Science
Bioinformatics and Computational Biology-Related Conferences:
• The first IEEE Computer Society Bioinformatics Conference (CSB 2002, CA, USA)
• Intelligent Systems for Molecular Biology (ISMB 2003, Brisbane, Australia)
• Pacific Symposium on Biocomputing (PSB 2003, Kauai, Hawaii, USA)
• The Seventh Annual International Conference on Research in Computational Molecular Biology (RECOMB 2003, Berlin, Germany)
Bioinformatics and Computational Biology-Related Books:
• Calculating the Secrets of Life: Applications of the Mathematical Sciences in Molecular Biology, by Eric S. Lander and Michael S. Waterman (1995)
• Introduction to Computational Biology: Maps, Sequences, and Genomes, by Michael S. Waterman (1995)
• Introduction to Computational Molecular Biology, by Joao Carlos Setubal and Joao Meidanis (1996)
• Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, by Dan Gusfield (1997)
• Computational Molecular Biology: An Algorithmic Approach, by Pavel Pevzner (2000)
• Introduction to Bioinformatics, by Arthur M. Lesk (2002)
Bioinformatics-Related Websites
• MIT Biology Hypertextbook:
  • http://www.mit.edu:8001/afs/athena/course/other/esgbio/www/7001main.html
  • A very good online biology resource
• The International Society for Computational Biology:
  • http://www.iscb.org/
• National Center for Biotechnology Information (NCBI, NIH):
  • http://www.ncbi.nlm.nih.gov/
  • (NCBI, EBI, and DDBJ are currently the three major repositories for biological sequences; they exchange data with one another)
• European Bioinformatics Institute (EBI):
  • http://www.ebi.ac.uk/
• DNA Data Bank of Japan (DDBJ):
  • http://www.ddbj.nig.ac.jp/