Classification Prediction Classification Supervised Learning Use a training
- Slides: 20
Classification / Prediction
Classification “Supervised Learning” Use a “training set” of examples to create a model that is able to predict, given an unknown sample, which of two or more classes that sample belongs to. ?
What we’ll cover • How to build a classifier. • How to evaluate a classifier. • Using Gene. Pattern to classify expression data.
What Is a Classifier • A predictive rule that uses a set of inputs (genes) to predict the values of the output (phenotype). • Known examples (train data) are used to build the predictive rule. • Goal: – Achieve high generalization (predictive) power. – Avoid over-fitting.
Classification Computational methodology Data Known Classes Assess Gene-Class Correlation Feature (Marker) Selection Build Classifier Test Classifier by Cross. Validation Evaluate Classifier on Independent Test Set
Classification Computational methodology Expression Data Known Classes Assess Gene-Class Correlation Feature Selection Regression Trees (CART) Weighted Voting Build Classifier k-Nearest Neighbors Support Vector Machines Test Classifier by Cross. Validation Evaluate Classifier on Independent Test Set
Classifiers Important issues: – Few cases, many variables (genes) – redundancy: many highly correlated genes. – noise: measurements are very imprecise. – feature selection: reducing p is a necessity. Avoid over-fitting.
K-nn Classifier Example: K=5, 2 genes, 2 classes project samples in gene space gene 2 class black class orange gene 1
K-nn Classifier Example: K=5, 2 genes, 2 classes project unknown sample gene 2 class black class orange ? gene 1
K-nn Classifier Example: K=5, 2 genes, 2 classes class black "consult" 5 closest neighbors: - 3 black - 2 orange Distance measures: • Euclidean distance gene 2 • 1 -Pearson correlation • … class orange ? gene 1
Support Vector Machine (SVM) Noble, Nat Biotech 2006
Weighted Voting new sample Mixture of Experts approach: – Each gene casts a vote for one of the possible classes. – The vote is weighted by a score assessing the reliability of the expert (in this case, the gene). – The class receiving the highest vote will be the predicted class. – The vote can be used as a proxy for the probability of class membership (prediction strength). Class 1 centroid g 1 g 2 … gi … gn-1 gn Class 2 centroid ? g 1 g 2 … gi … gn 1 gn Slonim et al. , RECOMB 2000
Class Prediction Computational methodology Expression Data Known Classes Assess Gene-Class Correlation Feature Selection Build Predictor Test Predictor by Cross. Validation Evaluate Predictor on Independent Test Set Regression Trees (CART) Weighted Voting k-Nearest Neighbors Support Vector Machines
Testing the Classifier • Evaluation on independent test set – Build the classifier on the train set – Assess prediction performance on test set. • Maximize generalization/Avoid overfitting. • Performance measure error rate = # of cases correctly classified total # of cases
Testing the Classifier • Evaluation on independent test set – What if we don’t have an independent test set? • Cross Validation (XV): – Split the dataset into n folds (e. g. , 10 folds of 10 cases each). – For each fold (e. g. , for each group of 10 cases), • train (i. e. , build model) on n-1 folds (e. g. , on 90 cases), • test (i. e. , predict) on left-out fold (e. g. , on remaining 10 cases). – Combine test results. – Frequently, leave-one-out XV (when small sample size).
Testing the Classifier Learning curves – leave one out cross validation ALL vs. MLL vs. AML
Testing the Classifier • Error rate estimate: – Evaluate on independent test set: • Best error estimate. – Cross Validation • Needed when small sample size and for model selection.
Classification Cookbook • Start by splitting data into train and test set (stratified). “Forget” about the test set until the very end. • Explore different feature selection methods and different classifiers on train set by XV. • Once the “best” classifier and best classifier parameters have been selected (based on XV) – Build a classifier with given parameters on entire train set. – Apply classifier to test set.
Classification Gene. Pattern methods • Split data into train and test set – Split. Dataset. Train. Test • Explore different feature selection methods and different classifiers on train set by CV. – CARTXValidation – KNNXValidation – Weighted. Voting. XValidation • Once the “best” classifier and best classifier parameters have been selected (based on CV) – CART – KNN – Weighted. Voting • Examine results: – Prediction. Reults. Viewer – Feature. Summary. Viewer
Classification Example
- Supervised learning dan unsupervised learning
- Contoh supervised learning
- On training targets for supervised speech separation
- Alexandru niculescu-mizil
- Supervised learning pipeline
- Partially supervised learning
- Www.youtube.com
- Supervised and unsupervised learning
- Deep q network
- Interactive supervised classification
- Partially supervised classification of text documents
- Supervised classification
- Supervised classification
- Avoiding discrimination through causal reasoning
- Machine learning algorithms for restaurants
- Stealing machine learning models via prediction apis
- Prediction of nba games based on machine learning methods
- Stealing machine learning models via prediction apis
- Cuadro comparativo entre e-learning b-learning y m-learning
- Normalized cut loss for weakly-supervised cnn segmentation
- Normalized cut loss for weakly-supervised cnn segmentation