Semi-supervised learning based on kernel methods and graph cut algorithms
Ph.D. defense of Tijl De Bie, 23 May 2005
Promotor: Prof. Bart De Moor
Overview
• Learning?
• Class learning
• Semi-supervised learning methods
• Other general learning settings
Learning?
• Learning =
– Observing information (data)…
– …detecting regularities…
– …with the goal of making reliable predictions.
• Prerequisites for learning:
– Statistical assumptions
– Enough information, relative to the ‘complexity’ of the type of regularity to be learned
Learning to do what?
• Divide the pixels of an image into foreground and background (image segmentation)
• Divide genes into cell-cycle related and cell-cycle unrelated genes (bioinformatics)
• Divide websites into those related and unrelated to some query (information retrieval)
• Divide pictures of faces into groups belonging to particular persons (machine vision)
Learn classes or clusters in data!
Overview
• Learning?
• Class learning
– Supervised class learning: classification
– Unsupervised class learning: clustering
– Semi-supervised class learning: transduction and side-information learning
– Examples of semi-supervised learning problems
• Semi-supervised learning methods
• Other general learning settings
Supervised class learning: Classification
• Given: a set of data points and their class labels (the training set).
• How to tell the label from a data point? Learn this from the training set (induction), i.e., come up with a good classifier.
• Then make predictions about the labels of a test set (deduction).
Unsupervised class learning: Clustering
• Given: a set of data points, but no class labels.
• How to divide the data points into coherent groups?
• (Data points that become available only later can be labeled according to the cluster center they lie closest to.)
Semi-supervised class learning: Transduction
• Given: a training set of data points with their class labels, and a test set of unlabeled data points.
• How to divide the data points into coherent groups while respecting the labels?
Semi-supervised class learning: Side-information learning
• Given: a set of data points and constraints on their labels (side-information).
• How to divide the data points into coherent groups while respecting the side-information?
Why semi-supervised learning?
• Unlabeled data is often easy to obtain
• Labeled data is often expensive or scarce
• Use all information available!
• The statistical assumptions are weaker and more realistic
Very often the ‘right’ approach
• Examples:
– Image segmentation: find coherent sets of pixels in a figure… what is ‘coherent’?
  Cluster the pixels…
Very often the ‘right’ approach
• Examples:
– Image segmentation: find coherent sets of pixels in a figure… what is ‘coherent’?
  Transductive approach!
Very often the ‘right’ approach (continued)
– Bioinformatics: many genes; the wet lab may label some of them, but at a high cost (DNA sequence… microarrays…)
Very often the ‘right’ approach (continued)
– Information retrieval on the web: the user labels a few websites (e.g. likes and dislikes), and the machine learning system uses additional unlabeled websites from the web
Very often the ‘right’ approach (continued)
– Face recognition based on different frames of a video sequence
– …
But…
Semi-supervised learning = an intrinsically hard combinatorial problem!
Overview
• Learning?
• Class learning
• Semi-supervised learning methods
– Learning a distance metric using side-information
– Label constrained graph cut clustering
– Transductive SVM
• Other general learning settings
Semi-supervised learning methods
• 3 important approaches:
– Learning a distance metric using side-information: data points & metric, plus side-information → dim. reduction → data points & new metric → clustering → resulting classes
– Label constrained graph cut clustering: data points & affinity measure → label constrained clustering → resulting classes
– Transductive SVM: data points & metric (kernel) → classification with unlabeled data → resulting classes
Learning a distance metric using side-information
• Consider the data points (in the real 2-D space). Can we distinguish classes/clusters?
• Clustering?
Learning a distance metric using side-information
• Consider the data points (in the real 2-D space). Can we distinguish classes/clusters?
• What if we know pairwise constraints on the labels?
Learning a distance metric using side-information
• Can be achieved using semi-supervised dimensionality reduction, followed by clustering: project the data points onto a lower-dimensional hyperplane…
Learning a distance metric using side-information (continued)
• We developed a technique based on canonical correlation analysis to perform such dimensionality reduction. It can be kernelized to obtain a nonlinear version. (Own contribution)
Label constrained graph cut clustering
• Graph cut clustering = ?
• Define a measure of affinity (similarity) between data points: [formula not preserved]
• Graph clustering = divide the graph into coherent parts…
Label constrained graph cut clustering
• Notation:
– Affinity matrix: [definition not preserved]
– Label vector: [definition not preserved]
– (Graph Laplacian: [definition not preserved])
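The notation on this slide can be illustrated with a small sketch. The affinity formula itself is not preserved, so the Gaussian affinity used here is an assumption, as are the names `gaussian_affinity`, `graph_laplacian` and the bandwidth `sigma`. The Laplacian follows the usual definition L = D - A, with D the diagonal matrix of row sums of A.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Assumed affinity: A[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)   # clip tiny negative round-off
    A = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)               # no self-loops
    return A

def graph_laplacian(A):
    """Unnormalized graph Laplacian L = D - A, with D = diag(row sums of A)."""
    return np.diag(A.sum(axis=1)) - A

# Four points in 2-D forming two well-separated pairs.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
A = gaussian_affinity(X)
L = graph_laplacian(A)
```

Nearby points get an affinity close to 1 and distant points an affinity close to 0, so the graph already reflects the cluster structure that the cut-based methods try to recover.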
Label constrained graph cut clustering
• Normalized cut cost clustering: [formula not preserved; its parts are annotated ‘cut cost’, ‘balance’ and ‘labels’]
• Very hard combinatorial problem!
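For reference, the standard two-class normalized cut cost (as in Shi & Malik) can be written in terms of a label vector as follows. This is the textbook form and may differ in notation or scaling from the formula on the original slide. The symbols are assumptions: A is the affinity matrix, d_i = \sum_j A_{ij} the degree of point i, D = diag(d), L = D - A the graph Laplacian, and y the label vector.

```latex
\min_{y \in \{-1,+1\}^n} \;
\underbrace{\frac{y^{\top} L\, y}{4}}_{\text{cut cost}}
\cdot
\underbrace{\left( \frac{1}{s_{+}} + \frac{1}{s_{-}} \right)}_{\text{balance}},
\qquad
s_{\pm} = \sum_{i \,:\, y_i = \pm 1} d_i .
```

The factor 1/4 appears because y^{\top} L y = \tfrac{1}{2} \sum_{i,j} A_{ij} (y_i - y_j)^2, and every edge crossing the cut contributes (y_i - y_j)^2 = 4. The balance term penalizes splits in which one side has a very small total degree.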
Label constrained graph cut clustering (eigenvalue)
• Relaxation to an eigenvalue problem: rewrite in terms of [formula not preserved]
• Observation: [formula not preserved]; relax the combinatorial constraint to this norm constraint
Label constrained graph cut clustering (eigenvalue)
• Solved by an eigenvalue problem: [equation not preserved]
• The eigenvector with the smallest nonzero eigenvalue is an approximation for the unrelaxed label vector
(Own contribution; also: Shi & Malik)
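A minimal sketch of this kind of spectral relaxation, assuming the usual generalized eigenvalue problem L v = λ D v of the normalized cut (the slide's own equation is not preserved). The function name `spectral_bipartition` and the toy graph are illustrative, and the graph is assumed to be connected so that only one eigenvalue is zero.

```python
import numpy as np

def spectral_bipartition(A):
    """Two-way split from the relaxed normalized cut (assumes a connected graph)."""
    d = A.sum(axis=1)                          # node degrees
    L = np.diag(d) - A                         # graph Laplacian L = D - A
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric form D^{-1/2} L D^{-1/2} of the generalized problem L v = lambda D v.
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)   # eigenvalues in ascending order
    # Index 0 is the trivial eigenvalue 0; index 1 is the smallest nonzero one.
    v = d_inv_sqrt * eigvecs[:, 1]
    # Threshold the relaxed solution at zero to get a label vector.
    return np.where(v >= 0, 1, -1), eigvals[1]

# Toy graph: two tightly connected triples joined by weak edges.
A = np.full((6, 6), 0.01)
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
np.fill_diagonal(A, 0.0)
labels, lam = spectral_bipartition(A)
```

On this toy graph the two triples end up on opposite sides of the split. The overall sign of the eigenvector is arbitrary, so which triple is called +1 is not determined.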
Label constrained graph cut clustering (eigenvalue)
• Transductive setting: parameterize the label vector as [formula not preserved; a brace marks the training set part]
(Own contribution)
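One plausible way to build label constraints into the spectral relaxation is sketched below. The exact parameterization on the slide is not preserved, so this is an assumption: the label vector is written as y = M z, where M (a hypothetical name) forces all positive training points to share one value, all negative training points to share another, and leaves each unlabeled point free. The relaxed problem then becomes the smaller generalized eigenproblem (MᵀLM) z = λ (MᵀDM) z.

```python
import numpy as np

def constrained_spectral_labels(A, pos_idx, neg_idx):
    """Spectral relaxation with y = M z tying together equally labeled training points."""
    n = A.shape[0]
    labeled = set(pos_idx) | set(neg_idx)
    free_idx = [i for i in range(n) if i not in labeled]
    # Columns of M: positive training points, negative training points,
    # then one column per unlabeled point.
    M = np.zeros((n, 2 + len(free_idx)))
    M[pos_idx, 0] = 1.0
    M[neg_idx, 1] = 1.0
    for col, i in enumerate(free_idx, start=2):
        M[i, col] = 1.0
    d = A.sum(axis=1)
    L = np.diag(d) - A
    L_r = M.T @ L @ M                      # reduced Laplacian M^T L M
    D_r = M.T @ (d[:, None] * M)           # M^T D M, diagonal since columns of M do not overlap
    s = 1.0 / np.sqrt(np.diag(D_r))
    S = s[:, None] * L_r * s[None, :]      # symmetric form of the reduced problem
    eigvals, eigvecs = np.linalg.eigh(S)   # ascending eigenvalues
    z = s * eigvecs[:, 1]                  # smallest nonzero eigenvalue (connected graph assumed)
    y = M @ z
    if y[pos_idx[0]] < 0:                  # orient so positive training points score positive
        y = -y
    return np.where(y >= 0, 1, -1)

# Same toy graph as two weakly linked triples; two labeled points per class.
A = np.full((6, 6), 0.01)
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
np.fill_diagonal(A, 0.0)
labels = constrained_spectral_labels(A, pos_idx=[0, 1], neg_idx=[4, 5])
```

Because the constant vector is still in the span of M, the trivial zero eigenvalue remains and the second eigenvector carries the split. Unlike the unconstrained case, the training labels fix the sign of the solution.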
Label constrained graph cut clustering (SDP)
• Relaxation to a semi-definite programming (SDP) problem
• Trick: introduce a (rank one) label matrix
• Relax to a positive semi-definiteness constraint [primal and dual formulations not preserved]
• Much tighter relaxation
• Quite a bit slower… doable up to 1000 data points
(Own contribution)
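The lifting step behind such SDP relaxations can be written out generically. This is the standard construction for ±1 label vectors, not a reconstruction of the primal and dual on the slide. The symbol Γ for the label matrix is an assumption.

```latex
y \in \{-1,+1\}^n
\;\Longrightarrow\;
\Gamma = y\, y^{\top}
\quad\text{satisfies}\quad
\Gamma \succeq 0, \qquad
\operatorname{diag}(\Gamma) = \mathbf{1}, \qquad
\operatorname{rank}(\Gamma) = 1 .
```

Quadratic forms in y become linear in Γ, for example y^{\top} L y = \operatorname{tr}(L\,\Gamma). Dropping the rank constraint while keeping \Gamma \succeq 0 and \operatorname{diag}(\Gamma) = \mathbf{1} leaves a convex feasible set, which is what makes the relaxed problem an SDP.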
Label constrained graph cut clustering (SDP)
• Transductive setting: parameterize the label matrix as [formula not preserved; a brace marks the training set part]
(Own contribution)
Label constrained graph cut clustering (SDP)
• Can we speed up the relaxation and retain its accuracy?
• Yes! Combine the eigenvalue relaxation and the SDP relaxation…
• Then often feasible up to 5000 data points
• The subspace trick: useful to speed up relaxations of many hard combinatorial problems!
(Own contribution)
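One way to read the combination of the two relaxations is sketched below. The slide does not state the form, so this is an assumption: the label matrix Γ is restricted to the subspace spanned by a few eigenvectors of the eigenvalue relaxation, collected as the columns of a matrix V (a hypothetical symbol).

```latex
\Gamma = V M V^{\top},
\qquad V \in \mathbb{R}^{n \times d}, \quad
M \in \mathbb{R}^{d \times d}, \quad
M \succeq 0, \quad d \ll n .
```

Under this restriction the SDP variable shrinks from an n × n matrix to a d × d matrix, which would account for the larger feasible problem sizes quoted on the slide.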
Transductive SVM
• Support Vector Machine (SVM): a method for classification problems
[Figure: points labeled + and - separated by a hyperplane, with the margin indicated]
• SVMs search for the maximum margin hyperplane, parameterized by a weight vector [symbol not preserved]
Transductive SVM
• The weight vector is the solution of: [primal problem not preserved]
• Dual: [dual problem not preserved], where [symbols not preserved] are inner products
• The inner product can be any symmetric, bilinear, positive definite function, i.e. a kernel function: [formula not preserved]
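For reference, the standard soft-margin SVM primal and dual are given below. The slide's own formulas are not preserved and may use a different variant (for example hard margin or no bias term), so the symbols w, b, ξ_i, α_i, C and k are assumptions.

```latex
\text{Primal:}\quad
\min_{w,\, b,\, \xi}\;
\tfrac{1}{2}\, w^{\top} w + C \sum_{i} \xi_i
\quad \text{s.t.} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i, \;\; \xi_i \ge 0 .

\text{Dual:}\quad
\max_{\alpha}\;
\sum_{i} \alpha_i
- \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, x_i^{\top} x_j
\quad \text{s.t.} \quad
0 \le \alpha_i \le C, \;\; \sum_{i} \alpha_i y_i = 0 .
```

The data enter the dual only through the inner products x_i^{\top} x_j. Replacing them by kernel evaluations k(x_i, x_j) yields a nonlinear classifier without changing the optimization problem.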
Transductive SVM
• Transductive SVM: also uses the set of unlabeled data points
• (Dual) optimization problem (due to Vapnik): [formula not preserved]
• Very hard combinatorial problem!
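In generic form, the transductive SVM also optimizes over the unknown labels of the unlabeled points. The sketch below uses the standard soft-margin dual and is not a reconstruction of the formula on the slide. The symbols y_ℓ (known training labels), y_u (labels of the m unlabeled points), C and k are assumptions.

```latex
\min_{y_u \in \{-1,+1\}^{m}} \;\;
\max_{\alpha} \;\;
\sum_{i} \alpha_i
- \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, k(x_i, x_j)
\quad \text{s.t.} \quad
0 \le \alpha_i \le C, \;\; \sum_{i} \alpha_i y_i = 0,
\qquad y = \begin{pmatrix} y_\ell \\ y_u \end{pmatrix}.
```

The inner maximization is an ordinary SVM dual for a fixed labeling. The outer minimization ranges over all 2^m labelings of the unlabeled points, which is the source of the combinatorial hardness.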
Transductive SVM
• Relaxation to an SDP problem, after several simplifications, and with [definition not preserved]: [formulation not preserved]
• Feasible up to 100 unlabeled and 1000 labeled data points
• The subspace trick also works here: up to 1000 data points in total
(Own contribution)
Transductive SVM
• Performance on an artificial dataset, with only 2 training points… [figure not preserved]
Overview
• Learning?
• Class learning
• Semi-supervised learning methods
• Other general learning settings
– Classification using heterogeneous information sources: a bioinformatics case study
– Inferring transcriptional modules using heterogeneous information sources
– Canonical correlation analysis: study of regularization
Classification using heterogeneous information
• Bioinformatics case study
– Classify genes: coding for transmembrane proteins or not
– Classify genes: coding for ribosomal proteins or not
– Information available: DNA sequence of the genes; upstream DNA sequences; gene expression profiles; protein sequence; protein-protein interaction data
– Positive results using a data fusion approach based on SVMs, in a transductive setting
(Own contribution)
Inferring transcriptional modules using heterogeneous information
Find modules as sets of genes satisfying:
• All share the same set of regulators
• All share the same set of motifs in their upstream DNA
• Their expression profiles are strongly correlated
(Own contribution)
Inferring transcriptional modules using heterogeneous information
• Discovery step…
• Validation step…
Thanks to Karen Lemmens!
Regularization of canonical correlation analysis
• Regularization:
– To prevent overfitting, improve generalization
– To improve numerical stability
– To deal with noise
• We derived regularized CCA using the last approach
• Showed connections between CCA and independent component analysis (ICA)
(Own contribution)
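A minimal sketch of regularized CCA, assuming the common ridge-style form in which a multiple of the identity is added to each within-view covariance. This is equivalent to assuming isotropic noise of variance `reg` on every coordinate, which fits the noise-based view above, but it is not necessarily the exact form derived in the thesis. The function name `regularized_cca` and the parameter `reg` are illustrative.

```python
import numpy as np

def regularized_cca(X, Y, reg=1e-3):
    """First canonical correlation and directions, with ridge terms reg * I (an assumed form)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])   # regularized within-view covariances
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n                              # cross-covariance

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Cxx_is = inv_sqrt(Cxx)
    Cyy_is = inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Cxx_is @ Cxy @ Cyy_is)
    wx = Cxx_is @ U[:, 0]
    wy = Cyy_is @ Vt[0, :]
    return s[0], wx, wy

# Two views that share one strongly correlated direction.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = np.column_stack([X[:, 0] + 0.1 * rng.normal(size=200),
                     rng.normal(size=200)])
rho, wx, wy = regularized_cca(X, Y)
```

Larger values of `reg` shrink the estimated correlation towards zero, which counteracts the spuriously high correlations that unregularized CCA finds when the number of samples is small relative to the dimension.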
Conclusions
• Semi-supervised learning is useful!
• Semi-supervised learning is algorithmically hard: it is inherently a combinatorial problem
• We proposed 3 approaches:
– Using dimensionality reduction
– By adapting a graph cut clustering algorithm to take label constraints into account
– By relaxing the transductive SVM classifier
• Other contributions in data fusion and multivariate statistics… (not shown in presentation)
Outlook
• Convex relaxations to approximately solve combinatorial problems? New optimization results needed…
• The subspace trick as a means of speeding up SDP relaxations!
• Statistical study of semi-supervised learning
• Approaches to integrate information coming from heterogeneous data sources
No Ph.D. without…
…my promotor Bart De Moor,
Johan Suykens, Lieven De Lathauwer, Kathleen Marchal, Yves Moreau, Joos Vandewalle, Jan Willems, André Barbé, Bart Preneel, Sabine Van Huffel, Adhemar Bultheel, Nello Cristianini!
Michael Jordan, Roman Rosipal, Laurent El Ghaoui, Peter Bartlett, Wolfgang Polonik, Dan Gusfield, Michinari Momma, Bill Noble, John Shawe-Taylor, Edgard Nyssen, Gert Lanckriet, Pieter Abbeel,
SCD: Andy, Bart, Bert, Carlos, Diana, Dries, Evelyne, Frank, Geert, Ivan, Jeroen, Johan, Jos, Luc, Lukas, Katrien, Koenraad, Kristiaan, Maarten, Marcelo, Mieke, Oscar, Pieter, Simon, Sven, Tom, Tony, and…
Bioi@SCD! Bert, Cynthia, Frank, Frizo, Geert, Gert, Janick, Joke, Karen, Kristof, Mik, Nathalie, Olivier, Patrick, Peter, Raf, Ruth, Pieter, Steffen, Stein, Steven, Tim, Tom, Wouter,
the FWO for sponsoring my Ph.D. and visits abroad!
Thanks!