Bioinformatics (生物資訊) 林育慶
Bioinformatics
Sequence alignment
Pairwise Alignment
Sequence A: CTTAACT
Sequence B: CGGATCAT
An alignment of A and B:
C---TTAACT   (Sequence A)
CGGATCA--T   (Sequence B)
Pairwise Alignment
Sequence A: CTTAACT
Sequence B: CGGATCAT
An alignment of A and B:
C---TTAACT
CGGATCA--T
Each column of the alignment is one of: a match (e.g. C over C), a mismatch (e.g. T over C), an insertion gap, or a deletion gap.
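Alignment Graph
Sequence A: CTTAACT
Sequence B: CGGATCAT
[Figure: alignment graph with the letters of the two sequences along the axes; the alignment below corresponds to a path through the graph]
C---TTAACT
CGGATCA--T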
How to measure similarity
• Match: +8 (w(x, y) = 8 if x = y)
• Mismatch: -5 (w(x, y) = -5 if x ≠ y)
• Each gap symbol: -3 (w(-, x) = w(x, -) = -3)

 C  -  -  -  T  T  A  A  C  T
 C  G  G  A  T  C  A  -  -  T
+8 -3 -3 -3 +8 -5 +8 -3 -3 +8

Alignment score = +12
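The column-by-column scoring above can be checked with a few lines of Python. This is a minimal sketch; the function names `column_score` and `alignment_score` are illustrative and do not come from the slides.

```python
# Scoring scheme from the slide: match +8, mismatch -5, gap symbol -3.
MATCH, MISMATCH, GAP = 8, -5, -3

def column_score(x, y):
    """Score one alignment column, i.e. w(x, y)."""
    if x == "-" or y == "-":
        return GAP
    return MATCH if x == y else MISMATCH

def alignment_score(row_a, row_b):
    """Sum the column scores of two gapped rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return sum(column_score(x, y) for x, y in zip(row_a, row_b))

print(alignment_score("C---TTAACT", "CGGATCA--T"))  # 12
```

Four matches (+32), one mismatch (-5), and five gap symbols (-15) give the total of +12.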
k best local alignments
• Smith-Waterman (Smith and Waterman, 1981; Waterman and Eggert, 1987)
• FASTA (Wilbur and Lipman, 1983; Lipman and Pearson, 1985)
• BLAST (Altschul et al., 1990; Altschul et al., 1997)
FASTA
1) Find runs of identities, and identify regions with the highest density of identities.
2) Re-score using PAM matrix, and keep top scoring segments.
3) Eliminate segments that are unlikely to be part of the alignment.
4) Optimize the alignment in a band.
FASTA Step 1: Find runs of identities, and identify regions with the highest density of identities.
[Figure: plot with Sequence A and Sequence B on the axes]
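One common way to locate dense regions of identities is to count shared k-tuples per diagonal, where a diagonal is the offset i - j between a position i in Sequence A and a position j in Sequence B. The sketch below illustrates that idea only; it is not the FASTA implementation, and the function name `diagonal_hits`, the word length k = 2, and the second test sequence are assumptions made for the example.

```python
from collections import defaultdict

def diagonal_hits(seq_a, seq_b, k=2):
    """Count identical k-tuples on each diagonal (i - j, 0-based positions)."""
    # Index every k-tuple of seq_a by its start position.
    positions = defaultdict(list)
    for i in range(len(seq_a) - k + 1):
        positions[seq_a[i:i + k]].append(i)

    # Each k-tuple of seq_b that also occurs in seq_a adds one hit
    # to the diagonal on which the two occurrences lie.
    counts = defaultdict(int)
    for j in range(len(seq_b) - k + 1):
        for i in positions.get(seq_b[j:j + k], ()):
            counts[i - j] += 1
    return dict(counts)

# "TTGATCGG" is a made-up second sequence for illustration.
hits = diagonal_hits("AGATCGAT", "TTGATCGG", k=2)
print(hits)                     # {-1: 4, 3: 2}
print(max(hits, key=hits.get))  # -1, the diagonal with the most identities
```

Diagonals with many hits mark the candidate regions that the later steps re-score and refine.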
FASTA Step 2: Re-score using PAM matrix, and keep top scoring segments.
FASTA Step 3: Eliminate segments that are unlikely to be part of the alignment.
FASTA Step 4: Optimize the alignment in a band.
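Optimizing in a band means running the dynamic program only on cells close to a chosen diagonal instead of on the full matrix. The following is a rough sketch of a banded local-alignment score, not FASTA's actual code. It reuses the +8 / -5 / -3 scores from the earlier slide; the function name `banded_local_score`, the parameters `center` and `width`, and the second sequence are assumptions for illustration.

```python
def banded_local_score(a, b, center=0, width=2, match=8, mismatch=-5, gap=-3):
    """Best local alignment score using only cells with |(i - j) - center| <= width."""
    neg_inf = float("-inf")
    prev = {}   # scores of the previous row, keyed by column j (1-based)
    best = 0
    for i in range(1, len(a) + 1):
        cur = {}
        lo = max(1, i - center - width)
        hi = min(len(b), i - center + width)
        for j in range(lo, hi + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The diagonal predecessor is on the same diagonal, so it is either
            # in the band or on the matrix border (score 0).
            diag = prev.get(j - 1, 0) + s
            # Gap moves are allowed only from cells that lie inside the band.
            up = prev[j] + gap if j in prev else neg_inf
            left = cur[j - 1] + gap if (j - 1) in cur else neg_inf
            cur[j] = max(0, diag, up, left)
            best = max(best, cur[j])
        prev = cur
    return best

# Band of half-width 1 around diagonal -1 for two example sequences.
print(banded_local_score("AGATCGAT", "TTGATCGG", center=-1, width=1))  # 40
```

With a band of half-width w, each row touches at most 2w + 1 cells, so the work grows with the band width rather than with the product of the two sequence lengths.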
BLAST
• Basic Local Alignment Search Tool (by Altschul, Gish, Miller, Myers and Lipman)
• The central idea of the BLAST algorithm is that a statistically significant alignment is likely to contain a high-scoring pair of aligned words.
BLAST Step 1: Build the hash table for Sequence A (3-tuple example).
For DNA sequences:
Seq. A    = AGATCGAT
Positions = 12345678

3-tuple   Start positions in Seq. A
AAA       (none)
AAC       (none)
...
AGA       1
...
ATC       3
...
CGA       5
...
GAT       2, 6
...
TCG       4
...
TTT       (none)
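The 3-tuple table for Seq. A can be built with a dictionary that maps each word to its start positions. This is a minimal sketch; `build_word_table` is an illustrative name, and unlike the slide's table it stores only the words that actually occur.

```python
from collections import defaultdict

def build_word_table(seq, k=3):
    """Map each k-tuple in seq to the list of its 1-based start positions."""
    table = defaultdict(list)
    for start in range(len(seq) - k + 1):
        table[seq[start:start + k]].append(start + 1)  # 1-based, as on the slide
    return dict(table)

table = build_word_table("AGATCGAT")
for word in sorted(table):
    print(word, table[word])
# AGA [1]
# ATC [3]
# CGA [5]
# GAT [2, 6]
# TCG [4]
```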
BLAST Step 2: Scan sequence B for hits.
BLAST Step 3: Extend hits.
[Figure: a hit between the two sequences]
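Steps 2 and 3 can be sketched as a word lookup followed by an ungapped extension in both directions. This is a simplified illustration, not the BLAST implementation. The +8 / -5 scores are taken from the earlier scoring slide; the drop-off threshold `x_drop`, the function names, the 0-based positions, and the second sequence are assumptions for the example.

```python
from collections import defaultdict

def find_hits(seq_a, seq_b, k=3):
    """Step 2: return (i, j) pairs where seq_a[i:i+k] == seq_b[j:j+k] (0-based)."""
    table = defaultdict(list)
    for i in range(len(seq_a) - k + 1):
        table[seq_a[i:i + k]].append(i)
    return [(i, j)
            for j in range(len(seq_b) - k + 1)
            for i in table.get(seq_b[j:j + k], [])]

def _extend(pairs, match, mismatch, x_drop):
    """Walk along character pairs; return (best gain, length at the best gain)."""
    score = best = best_len = 0
    for n, (x, y) in enumerate(pairs, start=1):
        score += match if x == y else mismatch
        if score > best:
            best, best_len = score, n
        elif best - score > x_drop:
            break  # the score fell too far below the best seen so far
    return best, best_len

def extend_hit(seq_a, seq_b, i, j, k=3, match=8, mismatch=-5, x_drop=10):
    """Step 3: ungapped extension; returns (score, start_a, start_b, length)."""
    right_gain, right_len = _extend(zip(seq_a[i + k:], seq_b[j + k:]),
                                    match, mismatch, x_drop)
    left_gain, left_len = _extend(zip(reversed(seq_a[:i]), reversed(seq_b[:j])),
                                  match, mismatch, x_drop)
    score = k * match + left_gain + right_gain
    return score, i - left_len, j - left_len, left_len + k + right_len

seq_a, seq_b = "AGATCGAT", "TTGATCGG"   # seq_b is a made-up example
print(find_hits(seq_a, seq_b))          # [(1, 2), (5, 2), (2, 3), (3, 4)]
print(extend_hit(seq_a, seq_b, 1, 2))   # (40, 1, 2, 5)
```

Here the hit GAT at positions (1, 2) extends to the five-letter segment GATCG shared by both sequences, scoring 5 × 8 = 40.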
Bioinformatics and Computational Biology-Related Journals:
• Bioinformatics (previously called CABIOS)
• Bulletin of Mathematical Biology
• Computers and Biomedical Research
• Genome Research
• Genomics
• Journal of Bioinformatics and Computational Biology
• Journal of Molecular Biology
• Nature
• Nucleic Acids Research
• Science
Bioinformatics and Computational Biology-Related Conferences:
• Intelligent Systems for Molecular Biology (ISMB)
• Pacific Symposium on Biocomputing (PSB)
• The Annual International Conference on Research in Computational Molecular Biology (RECOMB)
• The IEEE Computer Society Bioinformatics Conference (CSB)
• ...
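Major bioinformatics websites
• The major bioinformatics websites have gradually been combining databases, search engines, and analysis software into a single service.
• NCBI (National Center for Biotechnology Information)
• ExPASy (Expert Protein Analysis System)
• EMBnet (European Molecular Biology network)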
Reference
• EBI (European Bioinformatics Institute): http://www.ebi.ac.uk/
• ExPASy (Expert Protein Analysis System): http://www.expasy.org/
• GenomeNet (Japanese Bioinformatics Center): http://www.genome.ad.jp/
• NCBI (National Center for Biotechnology Information): http://www.ncbi.nlm.nih.gov/
• NIH (National Institutes of Health): http://www.mh.nih.gov/
• 國家衛生研究院 (National Health Research Institutes): http://www.nhri.org.tw/
• Slides by Prof. 趙坤茂: http://www.csie.ntu.edu.tw/~kmchao
• Slides from earlier presenters…