CISC 667 Intro to Bioinformatics (Fall 2005)
Support Vector Machines (II): Bioinformatics Applications
CISC 667, F 05, Lec 23, Liao
Combining pairwise similarity with SVMs for protein homology detection
[Figure: workflow. Training: protein homologs (positive train) and protein nonhomologs (negative train) are converted into (1) positive and negative pairwise score vectors, which train (2) a support vector machine. Testing: a target protein of unknown function receives (3) a binary classification.]
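The vectorization step in this workflow can be sketched as follows. This is a minimal illustration, not the method used in the lecture's experiments: the `kmer_similarity` function is a hypothetical stand-in for a real pairwise alignment score, and the sequences are invented toy data. Each protein is represented by its vector of similarity scores against every training protein.

```python
def kmer_similarity(a, b, k=2):
    """Hypothetical stand-in for a pairwise alignment score:
    the number of distinct k-mers the two sequences share."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return len(kmers_a & kmers_b)


def vectorize(seq, training_seqs):
    """Represent a sequence by its scores against all training sequences."""
    return [kmer_similarity(seq, t) for t in training_seqs]


# Toy data (invented for illustration only).
positives = ["MKVLAAGI", "MKVLSAGI"]   # homologs
negatives = ["GGSPTQRW", "PTQRWGGS"]   # nonhomologs
training = positives + negatives
labels = [1, 1, -1, -1]

# Step 1: pairwise score vectors for the training proteins.
train_vectors = [vectorize(s, training) for s in training]

# Step 3 input: the same vectorization applied to a target protein.
target_vector = vectorize("MKVLAAGV", training)
```

The vectors in `train_vectors`, together with `labels`, would then be passed to any SVM implementation for step 2; the trained classifier is applied to `target_vector`.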
Experiment: known protein families (Jaakkola, Diekhans and Haussler, 1999)
Vectorization
A measure of sensitivity and specificity
ROC: the receiver operating characteristic score is the normalized area under a curve that plots true positives as a function of false positives.
[Figure: three example curves, with ROC = 1, ROC = 0.67, and ROC = 0.]
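A minimal sketch of the ROC score as defined here, assuming the test examples have already been sorted by classifier score, best first, with no tied scores. The function name and the toy rankings are illustrative, not taken from the lecture.

```python
def roc_score(ranked_labels):
    """Normalized area under the curve of true positives versus false
    positives, for labels sorted best-first (1 = positive, 0 = negative)."""
    n_pos = sum(ranked_labels)
    n_neg = len(ranked_labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    area = 0
    true_positives = 0
    for label in ranked_labels:
        if label == 1:
            true_positives += 1        # curve moves up one step
        else:
            area += true_positives     # curve moves right; add a column
    return area / (n_pos * n_neg)


perfect = roc_score([1, 1, 1, 0, 0])    # all positives ranked first
partial = roc_score([1, 1, 0, 1])       # one positive ranked below a negative
worst = roc_score([0, 0, 1, 1, 1])      # all negatives ranked first
```

The three toy rankings give scores of 1, about 0.67, and 0.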
Performance Comparison (1)
Using Phylogenetic Profiles & SVMs
Gene: YAL001C
E-value:              0.122  1.064  3.589  0.008  0.692  8.49   14.79  0.584  1.567  0.324  0.002  3.456
Phylogenetic profile: 1      0      0      1      1      0      0      1      0      1      1      0
E-value:              2.135  0.142  0.001  0.112  1.274  0.234  4.562  3.934  0.489  0.002  2.421  0.112
Phylogenetic profile: 0      1      1      1      0      1      0      0      1      1      0      1
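The mapping from E-values to profile bits can be sketched as a simple thresholding step. The cutoff of 1.0 used below is an assumption inferred from the numbers on this slide (it reproduces every bit shown); the slide itself does not state the threshold.

```python
# E-values and profile bits for YAL001C, as listed on the slide.
e_values = [0.122, 1.064, 3.589, 0.008, 0.692, 8.49, 14.79, 0.584,
            1.567, 0.324, 0.002, 3.456, 2.135, 0.142, 0.001, 0.112,
            1.274, 0.234, 4.562, 3.934, 0.489, 0.002, 2.421, 0.112]
slide_bits = [1, 0, 0, 1, 1, 0, 0, 1,
              0, 1, 1, 0, 0, 1, 1, 1,
              0, 1, 0, 0, 1, 1, 0, 1]

THRESHOLD = 1.0  # assumed cutoff, inferred from the data above


def to_profile(values, threshold=THRESHOLD):
    """Mark a genome with 1 when the best hit's E-value is below the cutoff."""
    return [1 if v < threshold else 0 for v in values]


profile = to_profile(e_values)
```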
Phylogenetic profiles and evolution patterns
[Figure: a phylogenetic tree whose nodes are labeled 1 or 0, showing one possible evolution pattern for a profile.]
It is impossible to know for sure whether the gene followed exactly this evolution pattern.
Tree Kernel (Vert, 2002)
§ For a phylogenetic profile x and an evolution pattern e:
• p(e) quantifies how "natural" the pattern is
• p(x|e) quantifies how likely it is that the pattern e is the "true history" of the profile x
§ Tree kernel: $K_{\text{tree}}(x, y) = \sum_e p(e)\, p(x \mid e)\, p(y \mid e)$
§ Can be proved to be a kernel
§ Intuition: two profiles get closer in the feature space when they share common evolution patterns with high probability.
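The sum in the tree kernel can be illustrated by brute force on a toy model. Everything about the model below is an assumption made for illustration: patterns are plain bit tuples, the prior p(e) is uniform rather than derived from a tree, and p(x|e) is a hypothetical independent-noise model. Only the form of the sum follows the definition above; the efficient tree-based computation is not shown.

```python
from itertools import product


def likelihood(profile, pattern, flip=0.1):
    """Hypothetical p(x|e): each bit of the pattern is observed correctly
    with probability 1 - flip, independently of the others."""
    p = 1.0
    for observed, true in zip(profile, pattern):
        p *= (1.0 - flip) if observed == true else flip
    return p


def tree_kernel(x, y, prior):
    """K(x, y) = sum over patterns e of p(e) * p(x|e) * p(y|e)."""
    return sum(p_e * likelihood(x, e) * likelihood(y, e)
               for e, p_e in prior.items())


# Toy prior: uniform over all 3-bit patterns (a real prior would be
# defined by an evolutionary model on the phylogenetic tree).
patterns = list(product((0, 1), repeat=3))
prior = {e: 1.0 / len(patterns) for e in patterns}

k_same = tree_kernel((1, 1, 0), (1, 1, 0), prior)
k_close = tree_kernel((1, 1, 0), (1, 0, 0), prior)
k_far = tree_kernel((1, 1, 0), (0, 0, 1), prior)
```

In this toy setting the kernel value drops as the two profiles share fewer plausible patterns, which matches the intuition stated on the slide.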
Tree-Encoded Profile (Narra & Liao, 2004)
[Figure: a phylogenetic tree whose leaves carry the profile bits (0 or 1) and whose internal nodes carry scores (0.33, 0.67, 0.34, 0.5, 0.75, 0.55); a post-order traversal of the tree produces the encoded profile.]
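A minimal sketch of a post-order encoding, under two assumptions that the extracted slide does not confirm: each internal node is scored as the mean of its children (the values in the figure are consistent with this reading), and every node's score is emitted in visit order. The nested-tuple tree below is hypothetical and is not the tree from the figure.

```python
def encode(node):
    """Return (score, post-order list of scores) for a nested-tuple tree.
    Leaves are profile bits; an internal node is a tuple of children."""
    if not isinstance(node, tuple):          # leaf: a profile bit
        return float(node), [float(node)]
    child_scores, visited = [], []
    for child in node:
        score, sub = encode(child)
        child_scores.append(score)
        visited.extend(sub)                  # children first ...
    score = sum(child_scores) / len(child_scores)   # assumed: mean of children
    visited.append(score)                    # ... then the node itself
    return score, visited


# Hypothetical tree: two clades with leaf bits (1, 1, 0) and (1, 0).
root_score, encoded = encode(((1, 1, 0), (1, 0)))
```

The encoded vector is longer than the original profile because it carries one extra value per internal node.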
Using Support Vector Machines
Kernel function parameter: r = 0.10
Soft margin regularization: C = 1.50
$L(\alpha) = \sum_i \alpha_i - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \left( K(x_i \cdot x_j) + \delta_{ij}/C \right)$
Coding scheme: BIN21
Evaluation:
• $Q_3 = (P_1 + P_2 + P_3)/N$
• $C = (TP \cdot TN - FP \cdot FN) / \sqrt{PP \cdot PN \cdot AP \cdot AN}$
• SOV: segment overlap accuracy
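The first two evaluation measures can be sketched as follows. The reading of the symbols is an assumption: P1 to P3 are taken as the numbers of correctly predicted residues in each of three states, N as the total number of residues, and PP, PN, AP, AN as the counts of predicted positives, predicted negatives, actual positives, and actual negatives for one state. The state letters and example strings are invented; SOV is not implemented here.

```python
from math import sqrt


def q3(predicted, actual):
    """Fraction of residues whose state is predicted correctly."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def correlation(predicted, actual, state):
    """Correlation coefficient C for one state versus the rest."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == state and a == state for p, a in pairs)
    tn = sum(p != state and a != state for p, a in pairs)
    fp = sum(p == state and a != state for p, a in pairs)
    fn = sum(p != state and a == state for p, a in pairs)
    # PP = tp + fp, PN = tn + fn, AP = tp + fn, AN = tn + fp
    denom = sqrt((tp + fp) * (tn + fn) * (tp + fn) * (tn + fp))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example (invented): H, E, C stand for three structural states.
actual = "HHHEEECCC"
predicted = "HHHEECCCC"

accuracy = q3(predicted, actual)
c_h = correlation(predicted, actual, "H")
c_e = correlation(predicted, actual, "E")
```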
Design tertiary classifiers
Nguyen & Rajapakse, Genome Informatics 14: 218-227 (2003)
A two-stage SVM