Protein Fold Recognition with Relevance Vector Machines
Patrick Fernie
COMS 6772 Advanced Machine Learning
12/05/2005

Relevance Vector Machine
• A Bayesian treatment of a generalized linear model
• Yields a formulation similar to that of a Support Vector Machine
• Hyperparameters instead of margin/costs
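
For reference, the model in Tipping's sparse Bayesian formulation can be written as below: a kernel expansion over the N training points, with a separate zero-mean Gaussian prior on each weight whose precision αi is a hyperparameter. These αi take the place of the SVM's margin/cost trade-off. The classification line assumes the usual logistic sigmoid link.

```latex
\begin{aligned}
y(\mathbf{x};\mathbf{w}) &= \sum_{i=1}^{N} w_i\,K(\mathbf{x},\mathbf{x}_i) + w_0 \\
p(\mathbf{w}\mid\boldsymbol{\alpha}) &= \prod_{i=0}^{N} \mathcal{N}\!\left(w_i \mid 0,\ \alpha_i^{-1}\right) \\
P(t=1\mid\mathbf{x}) &= \sigma\big(y(\mathbf{x};\mathbf{w})\big), \qquad \sigma(z) = \frac{1}{1+e^{-z}}
\end{aligned}
```

Because K only has to define basis functions, it does not need to satisfy Mercer's condition.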

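Relevance Vector Machine: SVM vs. RVM
• Outputs: SVM gives hard binary outputs or point estimates; RVM gives probabilistic outputs
• Kernel: SVM requires a Mercer kernel; RVM can use an arbitrary kernel
• “Nuisance” parameters: SVM must determine suitable values for cost and insensitivity; RVM determines its values automatically
• Sparsity: SVM is sparse (USPS: ~2500 support vectors); RVM is sparser (USPS: ~316 relevance vectors!)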

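Relevance Vector Machine
• Can’t use qp(): the hyperparameters are found by maximizing the marginal likelihood, which is not a quadratic program
• Must solve iteratively (cf. Sequential Minimal Optimization for SVMs)
• As we iterate, many hyperparameter values (αi) become arbitrarily large, which forces the corresponding weights to zero and allows those basis functions to be pruned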
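
The iteration can be sketched for the simpler regression case (Gaussian noise), using Tipping's re-estimation rules αi ← γi/μi² with γi = 1 − αiΣii and β ← (N − Σγi)/‖t − Φμ‖². This is a minimal NumPy sketch, not the code used for these experiments: the RBF kernel, its width, the pruning threshold `prune_at` and the function names are illustrative assumptions, and classification would additionally need a Laplace approximation of the weight posterior at each step.

```python
import numpy as np

def rbf_kernel(X, Y, width=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-||X_i - Y_j||^2 / (2 width^2)) (illustrative choice)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def rvm_regression(X, t, width=1.0, n_iter=1000, prune_at=1e6, tol=1e-6):
    """Sparse Bayesian regression by iterative hyperparameter re-estimation.
    Returns indices of surviving basis functions (0 = bias), their posterior mean weights, and beta."""
    N = X.shape[0]
    # Design matrix: a bias column plus one kernel basis function per training point
    Phi = np.hstack([np.ones((N, 1)), rbf_kernel(X, X, width)])
    active = np.arange(Phi.shape[1])     # basis functions not yet pruned
    alpha = np.ones(Phi.shape[1])        # one weight-precision hyperparameter per active basis
    beta = 1.0 / max(np.var(t), 1e-6)    # noise precision (inverse noise variance)
    for _ in range(n_iter):
        P = Phi[:, active]
        # Posterior over the active weights: Sigma = (A + beta P^T P)^-1, mu = beta Sigma P^T t
        Sigma = np.linalg.inv(np.diag(alpha) + beta * P.T @ P)
        mu = beta * Sigma @ P.T @ t
        # gamma_i measures how well-determined weight i is by the data (clipped for numerical safety)
        gamma = np.clip(1.0 - alpha * np.diag(Sigma), 1e-12, 1.0)
        new_alpha = gamma / np.maximum(mu ** 2, 1e-12)
        resid = t - P @ mu
        beta = max(N - gamma.sum(), 1e-6) / max(resid @ resid, 1e-12)
        change = np.max(np.abs(np.log(new_alpha) - np.log(alpha)))
        # Prune basis functions whose alpha has grown "arbitrarily large" (weight pinned at zero)
        keep = new_alpha < prune_at
        if not keep.any():
            break
        active, alpha = active[keep], new_alpha[keep]
        if change < tol:
            break
    # Final posterior mean on the surviving basis functions
    P = Phi[:, active]
    Sigma = np.linalg.inv(np.diag(alpha) + beta * P.T @ P)
    mu = beta * Sigma @ P.T @ t
    return active, mu, beta
```

For example, fitting 50 noisy samples of sin(x) on [−5, 5] with `rvm_regression(X, t)` keeps only a small subset of the 51 candidate basis functions; the rest are pruned as their αi diverge.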

Relevance Vector Machine
• Faster algorithm (still not as fast as an SVM)
• Minimizes the number of active kernel functions to reduce computation time
• Analytic approach to pruning/adding basis functions
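
The analytic add/prune decision (Tipping and Faul's fast marginal likelihood method) can be sketched for the regression case. For each candidate basis function φi, the marginal likelihood as a function of αi alone has a closed-form maximum: with sparsity factor si and quality factor qi, αi = si²/(qi² − si) if qi² > si, and otherwise αi = ∞ (delete it, or do not add it). The helper name `optimal_alpha` is hypothetical, and this is only one evaluation step; the full algorithm repeatedly picks a candidate, adds, re-estimates or deletes it, and updates the posterior and noise level, which is omitted here.

```python
import numpy as np

def optimal_alpha(Phi, t, alpha, beta):
    """Closed-form best alpha_i for every candidate basis function (column of Phi),
    holding all other alphas fixed. alpha[i] == np.inf means phi_i is not in the model;
    a returned np.inf means 'delete it' (or 'do not add it')."""
    N, M = Phi.shape
    inc = np.isfinite(alpha)                      # basis functions currently in the model
    P = Phi[:, inc]
    # Marginal covariance of the targets: C = beta^-1 I + Phi_A A^-1 Phi_A^T
    C = np.eye(N) / beta + P @ np.diag(1.0 / alpha[inc]) @ P.T
    Cinv_Phi = np.linalg.solve(C, Phi)            # C^-1 phi_i for every column at once
    S = np.einsum("nm,nm->m", Phi, Cinv_Phi)      # S_i = phi_i^T C^-1 phi_i
    Q = Cinv_Phi.T @ t                            # Q_i = phi_i^T C^-1 t
    # Sparsity (s_i) and quality (q_i) factors: for basis functions already in the model,
    # remove their own contribution to C; for excluded ones, s_i = S_i and q_i = Q_i
    s, q = S.copy(), Q.copy()
    a = alpha[inc]
    s[inc] = a * S[inc] / (a - S[inc])
    q[inc] = a * Q[inc] / (a - S[inc])
    # Analytic maximiser: finite alpha only if the quality outweighs the sparsity penalty
    new_alpha = np.full(M, np.inf)
    relevant = q ** 2 > s
    new_alpha[relevant] = s[relevant] ** 2 / (q[relevant] ** 2 - s[relevant])
    return new_alpha, s, q

# Toy check: column 0 is aligned with the targets, column 1 is orthogonal to them
Phi = np.column_stack([[1.0, 1, 1, 1], [1.0, -1, 1, -1]])
t = 2.0 * Phi[:, 0]
new_alpha, s, q = optimal_alpha(Phi, t, np.array([np.inf, np.inf]), beta=1.0)
print(new_alpha)  # column 0 gets 16/60 (about 0.267); column 1 stays at inf
```

The decision needs only si and qi, so the algorithm can test every candidate cheaply while keeping the set of active kernel functions, and hence the matrices it inverts, small.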

Protein Fold Recognition
• Protein structure families
• Many fold families are not necessarily directly related by protein sequence

Protein Fold Recognition
• Prime situation for machine learning techniques! (NN, SVM, etc.)
• Large number of classes

Protein Fold Recognition
• 27 fold families
• Train many 2-class (binary) classifiers:
  ◦ One vs. Others: prone to false positives
  ◦ Unique One vs. Others: like One vs. Others, with another round of training
  ◦ All vs. All: requires a lot of classifiers!
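
The classifier counts behind these schemes are easy to check: with K = 27 fold families, One vs. Others needs K = 27 binary classifiers while All vs. All needs K(K − 1)/2 = 351. The sketch below also shows one simple way to handle One vs. Others when several classifiers fire on the same sample: count the positive claims and fall back to the most probable class. The function names, the 0.5 threshold and the argmax fallback are assumptions for illustration, not necessarily the decision rule used in this project, and the second training round of Unique One vs. Others is not sketched.

```python
import numpy as np

def n_classifiers(n_classes, scheme):
    """Number of binary (2-class) classifiers a decomposition scheme must train."""
    if scheme == "one_vs_others":
        return n_classes                            # one classifier per class
    if scheme == "all_vs_all":
        return n_classes * (n_classes - 1) // 2     # one classifier per pair of classes
    raise ValueError(f"unknown scheme: {scheme!r}")

def one_vs_others_predict(probs, threshold=0.5):
    """probs[n, k] = P(sample n is in class k) from the k-th one-vs-others classifier.
    Returns the most probable class per sample and how many classifiers claimed it;
    a claim count above 1 flags conflicting (false-positive) votes."""
    probs = np.asarray(probs, dtype=float)
    claims = (probs > threshold).sum(axis=1)
    return probs.argmax(axis=1), claims

print(n_classifiers(27, "one_vs_others"), n_classifiers(27, "all_vs_all"))  # 27 351
```

Probabilistic outputs make the fallback meaningful: comparing P(class k | x) across classifiers is more informative than comparing hard ±1 votes.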

RVMs & Protein Folds
Why RVMs?
• Probabilistic outputs
• Sparsity (useful only in assessment)
• True multiclass prediction
• No need to find “nuisance” parameters
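
The difference between probabilistic outputs from a set of binary RVMs and a true multiclass prediction can be sketched as follows. A binary RVM outputs σ(φ(x)ᵀw), a valid probability for its own two-class problem, but nothing forces the outputs of 27 separate classifiers to sum to one. A multiclass model can instead keep one weight vector per class and apply a softmax, so the class probabilities are normalized by construction. The softmax (multinomial) link and the function names are assumptions for illustration; training the weights is not shown.

```python
import numpy as np

def binary_prob(Phi, w):
    """Binary RVM output: P(t = 1 | x) = sigmoid(phi(x)^T w) for each row of Phi."""
    return 1.0 / (1.0 + np.exp(-(Phi @ w)))

def multiclass_prob(Phi, W):
    """Softmax over one weight vector per class (columns of W): P(class k | x)."""
    z = Phi @ W
    z = z - z.max(axis=1, keepdims=True)   # shift by the row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)
```

Here each row of `Phi` holds the kernel basis values φ(x) for one protein, so sparsity carries over directly: only the columns kept as relevance vectors need to be evaluated at prediction time.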

Issues/Future Work
• Optimize RVM classification
• Implement true multiclass
• Reduced greediness and sequential convergence optimization
• Novel kernels?
