HPV Risk Classification Using Kernel-Based Learning
정제균 (E-mail: jgjoung@cbit.snu.ac.kr)
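Example: The Genetic Map of HPV-31
[Figure: circular genome map of HPV-31 (origin at position 7912/1, tick marks at 1000 through 7000) showing the URR, the early genes E6, E7, E1, E2, E4, and E5, and the late genes L2 and L1 (major capsid protein). Annotations: "Transformation" and "Important genes for genotyping".]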
Methods
- Profile learning by string kernel method
  - Define a feature map from the space of protein sequences to a vector space
  - Mismatch string kernels
  - Support Vector Machine (SVM) classifier
- Search for the profile that maximizes classification performance
  - HMMs (Hidden Markov Models)
  - Log-odds score
Data Pre-processing
- HMMs: find more informative sites!
[Figure labels: 12 residues; Positive Set (High Risk); Negative Set (Low Risk); 206 residues (E6).]
Log-Likelihood Ratio for Each Position
[Figure: log-likelihood ratio per position; labeled positions: P1, P2, P3, P4, P11, P16, P17, P75, P134, P138, P149, P150, P151.]
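The slide does not say how the per-position ratio is computed. As a minimal sketch, assuming aligned sequences of equal length and replacing the HMM emission probabilities with plain per-column residue frequencies (add-one pseudocounts), each position can be scored by the expected log-odds of high-risk versus low-risk residues. The function names and the toy sequences are hypothetical, not from the original.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_probs(seqs, pos, pseudo=1.0):
    """Smoothed residue frequencies at one alignment column.

    Assumption: every sequence has a standard amino acid at `pos`
    (gaps and ambiguity codes are not handled in this sketch).
    """
    counts = Counter(s[pos] for s in seqs)
    total = len(seqs) + pseudo * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudo) / total for a in AMINO_ACIDS}

def position_scores(pos_set, neg_set):
    """Expected log-odds (positive vs. negative set) for each column."""
    scores = []
    for i in range(len(pos_set[0])):
        p = column_probs(pos_set, i)
        q = column_probs(neg_set, i)
        scores.append(sum(p[a] * math.log(p[a] / q[a]) for a in AMINO_ACIDS))
    return scores

def top_positions(scores, n):
    """Indices of the n highest-scoring columns."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]

# Toy data (hypothetical): only the middle column separates the two sets.
high_risk = ["MKA", "MKA", "MRA"]
low_risk = ["MDA", "MEA", "MDA"]
scores = position_scores(high_risk, low_risk)
best = top_positions(scores, 1)
```

Columns where both sets share the same residue distribution score zero, so only discriminative sites rise to the top of the ranking.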
Simple Concept of SVMs (Support Vector Machines)
- Maximum margin
- X: input space; Z: feature space
- $X \mapsto Z = \Phi(X) \mapsto W \cdot Z \mapsto \hat{y}$
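The maximum-margin idea can be written out explicitly. The slide shows only the diagram, so the following is the standard hard-margin formulation stated as an assumption; the labels $y_i \in \{-1, +1\}$, the bias $b$, and the index $i$ do not appear in the original.

```latex
\begin{aligned}
\min_{W,\,b}\quad & \tfrac{1}{2}\,\lVert W \rVert^{2} \\
\text{subject to}\quad & y_i\,(W \cdot Z_i + b) \ge 1,
  \qquad Z_i = \Phi(X_i),\quad i = 1,\dots,n \\
\text{prediction:}\quad & \hat{y} = \operatorname{sign}(W \cdot Z + b)
\end{aligned}
```

Under these constraints the margin between the two classes is $2 / \lVert W \rVert$, so minimizing $\lVert W \rVert$ maximizes the margin.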
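The k-Spectrum Kernel
- Example (k = 3): MHQKRTAMFQ → MHQ, HQK, QKR, KRT, RTA, TAM, AMF, MFQ
- k-spectrum feature map: a vector (0, 0, …, 1, …, 0) indexed by all k-mers AAA, AAC, …, HQK, …, TAM, …, YYY
- Dimension of feature space: $20^k$
- $\Phi^{(k)}(x) = (\phi_a(x))_{a \in \mathcal{A}^k}$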
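A minimal sketch of the k-spectrum feature map and the resulting kernel, assuming $\phi_a(x)$ counts the occurrences of the k-mer $a$ in $x$ (the slide shows only 0/1 entries). The vector is stored sparsely, since almost all of the $20^k$ coordinates are zero. Function names and the second sequence are hypothetical.

```python
from collections import Counter

def spectrum_features(seq, k):
    """Sparse k-spectrum feature map: k-mer -> number of occurrences in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(x, y, k):
    """Inner product of the two sparse k-spectrum feature vectors."""
    fx, fy = spectrum_features(x, k), spectrum_features(y, k)
    return sum(count * fy[kmer] for kmer, count in fx.items())

# The slide's example sequence, with k = 3.
features = spectrum_features("MHQKRTAMFQ", 3)
self_similarity = spectrum_kernel("MHQKRTAMFQ", "MHQKRTAMFQ", 3)
# A hypothetical second sequence that shares only the 3-mer HQK.
cross_similarity = spectrum_kernel("MHQKRTAMFQ", "AAHQKAA", 3)
```

The kernel value is simply the number of shared k-mer occurrences, which is why it can be computed without ever materializing the full feature space.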
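(k, m)-Mismatch Feature Map
- Feature map for k-spectrum, allowing m mismatches
- Example: AKQ → DKQ, …, EKQ, …, AKY, AAQ
- $\Phi^{(k,m)}(\alpha) = (\phi_\beta(\alpha))_{\beta \in \mathcal{A}^k}$, where $\alpha = a_1 \dots a_k$ and $\beta = b_1 \dots b_k$
- $\phi_\beta(\alpha) = P(b_1 \mid a_1)\, P(b_2 \mid a_2) \cdots P(b_k \mid a_k)$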
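The neighborhood in the AKQ example can be enumerated directly. This is a sketch under a stated simplification: it uses a 0/1 indicator ($\phi_\beta(\alpha) = 1$ when $\beta$ differs from $\alpha$ in at most m positions) rather than the substitution probabilities $P(b_i \mid a_i)$ in the slide's formula, because the slide gives no probability table. Function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mismatch_neighborhood(kmer, m, alphabet=AMINO_ACIDS):
    """All k-mers that differ from `kmer` in at most m positions."""
    neighbors = {kmer}
    for r in range(1, m + 1):
        for positions in combinations(range(len(kmer)), r):
            for letters in product(alphabet, repeat=r):
                candidate = list(kmer)
                for pos, letter in zip(positions, letters):
                    candidate[pos] = letter
                neighbors.add("".join(candidate))
    return neighbors

def mismatch_features(seq, k, m):
    """Sparse (k, m)-mismatch feature map with 0/1 indicator weights."""
    feats = Counter()
    for i in range(len(seq) - k + 1):
        # Each k-mer in seq adds 1 to every coordinate in its neighborhood.
        feats.update(mismatch_neighborhood(seq[i:i + k], m))
    return feats

neighborhood = mismatch_neighborhood("AKQ", 1)
features = mismatch_features("AKQ", 3, 1)
```

Explicit enumeration grows quickly with k and m, which is the motivation for a tree-based traversal instead of building each neighborhood.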
Example: Traversing the Mismatch Tree
- Traversal for input sequence: AVLALKAVLL, k=8, m=1
[Figure: mismatch tree traversal; the 8-mers AVLALKAV, VLALKAVL, and LALKAVLL are carried down the tree with their mismatch counts (0 or 1) updated at each node.]
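The traversal can be sketched as a depth-first search in which every node keeps only the k-mer instances that are still within m mismatches of the path so far. This is a minimal sketch assuming 0/1 mismatch weights and a plain (unnormalized) kernel; the function name and the tuple layout are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mismatch_kernel_matrix(seqs, k, m, alphabet=AMINO_ACIDS):
    """Kernel matrix of the (k, m)-mismatch kernel via mismatch-tree traversal."""
    n = len(seqs)
    K = [[0] * n for _ in range(n)]
    # One live instance per k-mer: (sequence index, start offset, mismatches so far).
    root = [(s, i, 0) for s, seq in enumerate(seqs)
            for i in range(len(seq) - k + 1)]

    def descend(instances, depth):
        if not instances:
            return  # prune: no k-mer instance survives on this path
        if depth == k:
            # Leaf: the path spells one k-mer; count surviving instances per sequence.
            counts = [0] * n
            for s, _, _ in instances:
                counts[s] += 1
            for a in range(n):
                if counts[a]:
                    for b in range(n):
                        K[a][b] += counts[a] * counts[b]
            return
        for letter in alphabet:
            survivors = []
            for s, i, mismatches in instances:
                updated = mismatches + (seqs[s][i + depth] != letter)
                if updated <= m:
                    survivors.append((s, i, updated))
            descend(survivors, depth + 1)

    descend(root, 0)
    return K

# With m = 0 the result reduces to the k-spectrum kernel.
K_exact = mismatch_kernel_matrix(["MHQKRTAMFQ", "AAHQKAA"], 3, 0)
# The slide's example: one sequence, k = 8, m = 1.
K_slide = mismatch_kernel_matrix(["AVLALKAVLL"], 8, 1)
```

Pruning empty branches keeps the search far smaller than the full $20^k$ tree, since only paths within m mismatches of some observed k-mer are ever visited.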
ROC (Receiver Operating Characteristic) Result: Scores in Best Positions (SVM Runs)
ROC (Receiver Operating Characteristic)
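For reference, a ROC curve is obtained by sweeping a threshold over the classifier scores and recording the false-positive and true-positive rates. The sketch below assumes binary labels (1 = high risk, 0 = low risk) with at least one example of each class; the scores are made-up illustrative values, not results from the slides.

```python
def roc_curve(scores, labels):
    """Return (false-positive rate, true-positive rate) points, threshold high to low."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for idx, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last example of a group of tied scores.
        if idx + 1 == len(ranked) or ranked[idx + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical scores: one perfectly separated set and one with an inversion.
perfect = auc(roc_curve([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0]))
mixed = auc(roc_curve([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

An area of 1.0 means every high-risk example outranks every low-risk one, while 0.5 corresponds to random ranking.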
Worst vs. Best Position
Whole Sequence?
Conclusion
- HPV risk classification using SVM learning with the string kernel method
- Combined with prediction information from a generative model (HMMs)
- Computational efficiency in sequence classification: fast prediction
- Classifying risk type is important for understanding the mechanisms of infection and for developing novel instruments for medical examination
- Can be applied to other viral organisms to identify risk types
- Additional features, e.g. protein structure, could be considered to improve prediction accuracy