HPV Risk Classification Using Kernel Based Learning

HPV Risk Classification Using Kernel-Based Learning
정제균, E-mail: [email protected] snu.ac.kr

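Example: The Genetic Map of HPV-31
[Figure: circular genome map of HPV 31, origin marked 7912/1, with position ticks at 1000, 2000, 3000, 4000, 5000, 6000 and 7000. Labeled regions: URR; early genes E6, E7, E1, E2, E4, E5; late genes L1 (major capsid protein) and L2. Annotations: "Transformation" and "Important genes for genotyping".]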

Methods
  • Profile learning by string kernel method
    - Define a feature map from the space of protein sequences to a vector space
    - Mismatch string kernels
    - Support Vector Machine (SVM) classifier
  • Search for the profile that maximizes classification performance
    - HMMs (Hidden Markov Models)
    - Log-odds score

Data Pre-processing
  • HMMs: find more informative sites
[Figure: E6 sequences (206 residues) grouped into a positive set (high risk) and a negative set (low risk), with a 12-residue span marked.]

Log-Likelihood Ratio for Each Position
[Figure: log-likelihood ratio plotted per sequence position. Labeled positions: P1, P2, P3, P4, P11, P16, P17, P75, P134, P138, P149, P150, P151.]
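A per-position log-likelihood ratio of this kind can be sketched as follows. This is a simplified illustration, not the procedure used in the slides: it replaces the HMM profiles with independent per-column residue frequencies, and it assumes pre-aligned, equal-length sequences over the 20 standard amino acids with no gaps. The function names, the pseudocount, and the toy sequences are all hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_probs(seqs, pos, pseudocount=1.0):
    """Smoothed residue frequencies in one alignment column (assumed model, not an HMM)."""
    counts = Counter(s[pos] for s in seqs)
    total = len(seqs) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}

def position_scores(positive, negative, pseudocount=1.0):
    """Mean log-likelihood ratio log(P_pos / P_neg) of the positive-set residues, per column."""
    scores = []
    for pos in range(len(positive[0])):
        p_pos = column_probs(positive, pos, pseudocount)
        p_neg = column_probs(negative, pos, pseudocount)
        ratios = [math.log(p_pos[s[pos]] / p_neg[s[pos]]) for s in positive]
        scores.append(sum(ratios) / len(ratios))
    return scores

# Toy data (hypothetical): column 0 is identical in both sets, column 1 differs.
high_risk = ["AC", "AC"]
low_risk = ["AD", "AE"]
scores = position_scores(high_risk, low_risk)
```

Columns with a score near zero carry little information for separating the two sets, while columns with a large score are candidates for the "more informative sites".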

Simple Concept of SVMs (Support Vector Machines)
  • Maximum margin
  • X: input space; Z: feature space
  • x ∈ X → Φ(x) = z ∈ Z → w·z → ŷ
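For reference, the standard hard-margin formulation behind the "maximum margin" idea can be written as below. The bias term b, the training labels y_i in {-1, +1}, and the index i are notation assumed here; only w, z and ŷ appear on the slide.

\[
\hat{y} = \operatorname{sign}(w \cdot z + b), \qquad z = \Phi(x)
\]
\[
\min_{w,\,b} \; \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad y_i \,(w \cdot z_i + b) \ge 1 \;\; \text{for all } i
\]

Minimizing the norm of w under these constraints maximizes the margin 2/‖w‖ between the two classes in feature space.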

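The k-Spectrum Kernel
  • Example sequence: MHQKRTAMFQ
  • Its 3-mers: MHQ, HQK, QKR, KRT, RTA, TAM, AMF, MFQ
  • k-spectrum feature map: ( 0, 0, …, 1, …, 0 ), with coordinates indexed by AAA, AAC, …, HQK, …, TAM, …, YYY
  • Dimension of feature space: $20^k$
  • $\Phi_{(k)}(x) = (\phi_\alpha(x))_{\alpha \in \mathcal{A}^k}$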
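The feature map above can be sketched in a few lines, assuming each coordinate counts how often a k-mer occurs in the sequence (the usual spectrum-kernel definition). The sparse `Counter` representation and the function names are illustrative choices, not taken from the slides.

```python
from collections import Counter

def spectrum_features(seq, k):
    """Sparse k-spectrum feature vector: k-mer -> number of occurrences in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(x, y, k):
    """Inner product of the two k-spectrum feature vectors."""
    fx, fy = spectrum_features(x, k), spectrum_features(y, k)
    return sum(count * fy[kmer] for kmer, count in fx.items())

# The slide's example sequence with k = 3.
features = spectrum_features("MHQKRTAMFQ", 3)
```

Only k-mers that actually occur are stored, so the 20^k-dimensional vector never has to be materialized.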

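(k, m)-Mismatch Feature Map
  • Feature map for k-spectrum, allowing m mismatches
  • Example: AKQ → DKQ, …, EKQ, …, AKY, AAQ
  • $\Phi_{(k,m)}(\alpha) = (\phi_\beta(\alpha))_{\beta \in \mathcal{A}^k}$
  • $\phi_\beta(\alpha) = P(\beta_1 \mid \alpha_1)\, P(\beta_2 \mid \alpha_2) \cdots P(\beta_k \mid \alpha_k)$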
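A minimal sketch of a mismatch feature map follows. Note the substitution: instead of the probabilistic weights P(β_i | α_i) in the formula above, it uses the simpler indicator version, where a coordinate β receives 1 from a k-mer α whenever β differs from α in at most m positions. No substitution probabilities are given in the slides, so none are invented here. Function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mismatch_neighborhood(kmer, m, alphabet=AMINO_ACIDS):
    """All k-mers within Hamming distance m of kmer (brute-force enumeration)."""
    neighbors = {kmer}
    for n_sub in range(1, m + 1):
        for positions in combinations(range(len(kmer)), n_sub):
            for letters in product(alphabet, repeat=n_sub):
                candidate = list(kmer)
                for pos, letter in zip(positions, letters):
                    candidate[pos] = letter
                neighbors.add("".join(candidate))
    return neighbors

def mismatch_features(seq, k, m):
    """Indicator-weighted (k, m)-mismatch feature vector, stored sparsely."""
    features = Counter()
    for i in range(len(seq) - k + 1):
        for neighbor in mismatch_neighborhood(seq[i:i + k], m):
            features[neighbor] += 1
    return features

def mismatch_kernel(x, y, k, m):
    """Inner product of the two mismatch feature vectors."""
    fx, fy = mismatch_features(x, k, m), mismatch_features(y, k, m)
    return sum(count * fy[kmer] for kmer, count in fx.items())

neighbors_of_akq = mismatch_neighborhood("AKQ", 1)
```

Brute-force enumeration is only practical for small k and m; it is used here because it makes the definition easy to read.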

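Example: Traversing the Mismatch Tree
  • Traversal for input sequence: AVLALKAVLL, k=8, m=1
[Figure: mismatch tree traversal. At the root, the three 8-mers of the input (AVLALKAV, VLALKAVL, LALKAVLL) carry mismatch counts 0, 0, 0. After descending the branch A the counts are 0, 1, 1. After descending the next branch L, two 8-mers remain, each with count 1.]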
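The bookkeeping in this traversal can be sketched as follows: each tree node keeps the k-mers of the input that are still within m mismatches of the path from the root, and descending one branch updates their counts and prunes any that exceed m. This is an illustrative fragment covering a single descent step, not a full kernel computation; `kmers` and `descend` are hypothetical names.

```python
def kmers(seq, k):
    """All contiguous k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def descend(instances, depth, letter, m):
    """Follow the branch labeled `letter` at tree depth `depth`.

    instances: list of (kmer, mismatches so far) alive at the current node.
    Returns the instances still within m mismatches at the child node.
    """
    survivors = []
    for kmer, mismatches in instances:
        if kmer[depth] != letter:
            mismatches += 1
        if mismatches <= m:
            survivors.append((kmer, mismatches))
    return survivors

# Slide example: AVLALKAVLL with k = 8, m = 1, following the path A then L.
root = [(kmer, 0) for kmer in kmers("AVLALKAVLL", 8)]
after_a = descend(root, 0, "A", m=1)
after_al = descend(after_a, 1, "L", m=1)
```

Here `after_a` holds all three 8-mers with counts 0, 1, 1, and `after_al` keeps only AVLALKAV and VLALKAVL, each with one mismatch; LALKAVLL is pruned because it would need two.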

ROC (Receiver-Operating Characteristic) Result Scores in Best Positions (SVMs Run)

ROC (Receiver Operating Characteristic)
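As a reminder of how an ROC curve is obtained from classifier scores, here is a minimal sketch. It assumes real-valued decision scores (higher means more likely high risk) and boolean labels with both classes present; the scores and labels are made-up toy values, not results from the slides.

```python
def roc_points(scores, labels):
    """ROC curve as (false positive rate, true positive rate) points.

    labels: True for the positive class. Tied scores are handled as one threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy example (hypothetical scores): one negative is ranked above one positive.
toy_points = roc_points([0.9, 0.8, 0.7, 0.6], [True, False, True, False])
toy_auc = auc(toy_points)
```

An area of 1.0 means every positive example is scored above every negative one; 0.5 corresponds to random ranking.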

Worst vs. Best Position

Whole Sequence?

Conclusion
  • HPV risk classification using SVM learning with the string kernel method
  • Combined with prediction information from a generative model (HMMs)
  • Computational efficiency in sequence classification: fast prediction
  • Classifying risk type is important for understanding the mechanisms of infection and for developing novel instruments for medical examination
  • Can be applied to other viral organisms to identify risk types
  • Additional features, e.g. protein structure, could be considered to improve prediction accuracy