Isomap Algorithm
http://isomap.stanford.edu/
Yuri Barseghyan, Yasser Essiarab
University of Joensuu, Dept. of Computer Science, P.O. Box 111, FIN-80101 Joensuu, Tel. +358 13 251 7959, fax +358 13 251 7955, www.cs.joensuu.fi

Linear Methods for Dimensionality Reduction
– PCA (Principal Component Analysis): rotate the data so that the principal axes lie in the directions of maximum variance
– MDS (Multi-Dimensional Scaling): find coordinates that best preserve pairwise distances

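To make the PCA bullet concrete, here is a minimal sketch using NumPy's singular value decomposition. The function name `pca` and its return values are illustrative choices, not something taken from the slides.

```python
import numpy as np

def pca(X, d):
    """Project the rows of X onto the d directions of maximum variance.

    Hypothetical helper: returns (projected data, principal axes,
    variance captured along each axis).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center the data
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:d]
    explained_variance = S[:d] ** 2 / (len(X) - 1)
    return Xc @ components.T, components, explained_variance
```

For points that lie on a straight line, a single component captures all of the variance, which is exactly the situation in which a linear method is sufficient.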
Limitations of Linear Methods
• What if the data does not lie within a linear subspace?
• Do all convex combinations of the measurements generate plausible data?
• The data may lie on a low-dimensional non-linear manifold embedded in a higher-dimensional space
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/Dim.Reduction1.pdf

Non-linear Dimensionality Reduction
• What about data that cannot be described by a linear combination of latent variables?
  – Ex: swiss roll, s-curve
• In the end, linear methods do nothing more than “globally transform” (rotate/translate/scale) the data. Sometimes we need to “unwrap” the data first
[Figure: PCA]
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/Dim.Reduction2.pdf

Non-linear Dimensionality Reduction
• Unwrapping the data = “manifold learning”
• Assume the data can be embedded on a lower-dimensional manifold
• Given a data set X = {x_i}, i = 1…n, find a representation Y = {y_i}, i = 1…n, where Y lies on a lower-dimensional manifold
• Instead of preserving global pairwise distances, non-linear dimensionality reduction tries to preserve only the geometric properties of local neighborhoods

Isometry
• From MathWorld: two Riemannian manifolds M and N are isometric if there is a diffeomorphism between them such that the Riemannian metric from one pulls back to the metric on the other. For a complete Riemannian manifold, d(x, y) is the geodesic distance between x and y
• Informally, an isometry is a smooth invertible mapping that looks locally like a rotation plus translation
• Intuitively, for the 2-dimensional case, isometries include whatever physical transformations one can perform on a sheet of paper without introducing tears, holes, or self-intersections

Trustworthiness [2]
The trustworthiness quantifies how trustworthy a projection of a high-dimensional data set onto a low-dimensional space is. Specifically, a projection is trustworthy if the t nearest neighbors of each data point in the low-dimensional space are also close by in the original space.
• r(i, j) is the rank of data point j in the ordering according to distance from i in the original data space
• U_t(i) denotes the set of data points that are among the t nearest neighbors of data point i in the low-dimensional space but not in the original space
The maximal value that the trustworthiness M(t) can take is one. The closer M(t) is to one, the better the low-dimensional space describes the original data.

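The formula for M(t) did not survive on this slide. The sketch below assumes the standard Venna and Kaski definition, M(t) = 1 - 2 / (n t (2n - 3t - 1)) * sum over i of sum over j in U_t(i) of (r(i, j) - t), which is consistent with the definitions of r(i, j) and U_t(i) above. The function name and the brute-force neighbor search are illustrative only, and the normalization assumes t < n/2.

```python
import math

def trustworthiness(X, Y, t):
    """Trustworthiness M(t) of the embedding Y of the original data X.

    X and Y are sequences of points (tuples) in the same order.
    Assumes the Venna-Kaski normalization, valid for t < n / 2.
    """
    n = len(X)

    def ranked_neighbors(P, i):
        # All other points, ordered by increasing distance from point i.
        return sorted((j for j in range(n) if j != i),
                      key=lambda j: math.dist(P[i], P[j]))

    penalty = 0
    for i in range(n):
        order_X = ranked_neighbors(X, i)
        rank_X = {j: r for r, j in enumerate(order_X, start=1)}  # r(i, j)
        knn_X = set(order_X[:t])
        # U_t(i): neighbors in the embedding that are not neighbors in X.
        for j in ranked_neighbors(Y, i)[:t]:
            if j not in knn_X:
                penalty += rank_X[j] - t
    return 1.0 - 2.0 / (n * t * (2 * n - 3 * t - 1)) * penalty
```

An embedding that keeps every neighborhood intact scores exactly one; any point that intrudes into a neighborhood it did not belong to lowers the score in proportion to how far away it originally was.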
Several methods to learn a manifold
• Two to start:
  – Isomap [Tenenbaum 2000]
  – Locally Linear Embeddings (LLE) [Roweis and Saul, 2000]
• Recently:
  – Semidefinite Embeddings (SDE) [Weinberger and Saul, 2005]

An important observation
• Small patches on a non-linear manifold look linear
• These locally linear neighborhoods can be defined in two ways
  – k-nearest neighbors: find the k nearest points to a given point, under some metric. Guarantees all items are similarly represented, limits dimension to k-1
  – ε-ball: find all points that lie within ε of a given point, under some metric. Best if density of items is high and every point has a sufficient number of neighbors
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/Dim.Reduction1.pdf

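The two neighborhood definitions can be sketched as follows. This is a brute-force illustration under the Euclidean metric; the function names are made up for the example.

```python
import math

def k_nearest_neighbors(points, i, k):
    """Indices of the k points closest to points[i] (excluding i itself)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]

def epsilon_ball(points, i, eps):
    """Indices of all points within distance eps of points[i] (excluding i)."""
    return [j for j in range(len(points))
            if j != i and math.dist(points[i], points[j]) <= eps]
```

The difference shows up for isolated points: `k_nearest_neighbors` always returns k neighbors, however far away they are, while `epsilon_ball` can return an empty list when the data is sparse around the query point.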
Isomap
• Find coordinates on a lower-dimensional manifold that preserve geodesic distances instead of Euclidean distances
• Key observation: if the goal is to discover the underlying manifold, geodesic distance makes more sense than Euclidean distance
[Figure labels: small Euclidean distance; large geodesic distance]
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/Dim.Reduction1.pdf

Calculating geodesic distance
• We know how to calculate Euclidean distance
• Locally linear neighborhoods mean that we can approximate geodesic distance within a neighborhood using Euclidean distance
• A graph is constructed by connecting each point to its K nearest neighbours
• Approximate geodesic distances are calculated by finding the length of the shortest path in the graph between points
• Use Dijkstra’s algorithm to fill in the remaining distances
http://www.maths.lth.se/bioinformatics/calendar/20040527/Nilsson.J_KI_27maj04.pdf

Dijkstra’s Algorithm
• Greedy best-first algorithm (a weighted generalization of breadth-first search) that computes the shortest paths from one point to all other points; edge weights must be non-negative
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/Dim.Reduction2.pdf

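A minimal sketch of Dijkstra's algorithm with a binary heap, assuming the graph is given as an adjacency dictionary (this representation and the function name are choices made for the example, not part of the slides).

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path lengths from source to every node.

    graph: dict mapping node -> list of (neighbor, weight) pairs,
    with non-negative weights. Unreachable nodes keep distance inf.
    """
    dist = {node: float("inf") for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale queue entry, already improved
        for v, w in graph[u]:
            nd = d + w
            if nd < dist[v]:              # found a shorter route to v
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

In Isomap the nodes would be data points and the weights the Euclidean lengths of the neighborhood edges; running the function once per point fills in the full matrix of approximate geodesic distances.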
Isomap Algorithm
– Compute the neighborhood of points for each item; the resulting neighborhood graph should be connected
  • Can be k nearest neighbors or ε-ball
– Calculate pairwise Euclidean distances within each neighborhood
– Use Dijkstra’s algorithm to compute the shortest path from each point to non-neighboring points
– Run MDS on the resulting distance matrix
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/Dim.Reduction2.pdf

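The steps above can be sketched end to end in NumPy. This is an illustration under several assumptions, not a reference implementation: it uses a symmetrized k-nearest-neighbor graph, it substitutes a vectorized Floyd-Warshall pass for Dijkstra's algorithm to keep the code short (both yield the same all-pairs shortest paths), and it uses classical (eigenvector-based) MDS. The function name `isomap` and its parameters are hypothetical.

```python
import numpy as np

def isomap(X, k, d):
    """Embed the rows of X in d dimensions, preserving graph geodesics."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Step 1: pairwise Euclidean distances and k-NN graph (inf = no edge).
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nn = np.argsort(D, axis=1)[:, 1:k + 1]   # skip self; assumes no duplicate points
    for i in range(n):
        G[i, nn[i]] = D[i, nn[i]]
        G[nn[i], i] = D[i, nn[i]]            # symmetrize the graph

    # Step 2: all-pairs shortest paths (Floyd-Warshall, swapped in for Dijkstra).
    for m in range(n):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighborhood graph is disconnected; increase k")

    # Step 3: classical MDS on the geodesic distance matrix.
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (G ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:d]      # d largest eigenvalues
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))
```

Applied to points sampled along a half circle, a one-dimensional embedding should place the two end points roughly an arc length apart rather than at their much shorter straight-line distance. Floyd-Warshall costs O(n^3), so for larger data sets running Dijkstra from each point on the sparse graph is the usual choice.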
Isomap Algorithm [3]

Time Complexity of Algorithm
http://www.cs.rutgers.edu/~elgammal/classes/cs536/lectures/NLDR.pdf

Isomap Results
Find a 2D embedding of the 3D S-curve
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/Dim.Reduction2.pdf

Residual Fitting Error
Plotting the eigenvalues from MDS indicates the intrinsic dimensionality of the data: look for the dimension beyond which the curve levels off
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/Dim.Reduction2.pdf

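The slide does not define the residual error. One common choice, used in the original Isomap paper, is the residual variance 1 - R², where R is the linear correlation between the geodesic distances and the distances in the embedding. The sketch below assumes that definition; the function name is illustrative.

```python
import numpy as np

def residual_variance(D_geo, Y):
    """1 - R^2 between input distances D_geo and distances in embedding Y."""
    Y = np.asarray(Y, dtype=float)
    diff = Y[:, None, :] - Y[None, :, :]
    D_emb = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(len(Y), k=1)        # each pair of points once
    r = np.corrcoef(np.asarray(D_geo, dtype=float)[iu], D_emb[iu])[0, 1]
    return 1.0 - r ** 2
```

Computing this value for embeddings of dimension 1, 2, 3, and so on, and plotting it against the dimension, gives a curve whose flattening point suggests the intrinsic dimensionality.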
Neighborhood Graph
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/Dim.Reduction2.pdf

More Isomap Results
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/Dim.Reduction2.pdf

Results on projecting the face dataset to two dimensions (Trustworthiness−Continuity) [1]

More Isomap Results
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/Dim.Reduction2.pdf

Isomap Failures
• Isomap has problems on closed manifolds of arbitrary topology
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/Dim.Reduction2.pdf

Isomap: Advantages
• Nonlinear
• Globally optimal
  – Still produces a globally optimal low-dimensional Euclidean representation even though the input space is highly folded, twisted, or curved
• Guaranteed asymptotically to recover the true dimensionality

Isomap: Disadvantages
• Recovery of the geometric structure of nonlinear manifolds is guaranteed only asymptotically
  – As N increases, pairwise distances provide better approximations to geodesics by “hugging the surface” more closely
  – Graph discreteness overestimates d_M(i, j)
• K must be small enough, or the sampling density high enough, to avoid “linear shortcuts” near regions of high surface curvature
• Mapping novel test images to the manifold space is not directly supported

Literature
[1] Jarkko Venna and Samuel Kaski, Nonlinear dimensionality reduction viewed as information retrieval, NIPS 2006 workshop on Novel Applications of Dimensionality Reduction, 9 Dec 2006. http://www.cis.hut.fi/projects/mi/papers/nips06_nldrws_poster.pdf
[2] Claudio Varini, Visual Exploration of Multivariate Data in Breast Cancer by Dimensional Reduction, March 2006. http://deposit.ddb.de/cgibin/dokserv?idn=98073472x&dok_var=d1&dok_ext=pdf&filename=98073472x.pdf
[3] Yiming Wu and Kap Luk Chan, An Extended Isomap Algorithm for Learning Multi-Class Manifold, Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference, Aug. 2004. http://ww2.cs.fsu.edu/~ywu/PDF-files/ICMLC2004.pdf