SVD and PCA (COS 323)

Dimensionality Reduction
• Map points in high-dimensional space to a lower number of dimensions
• Preserve structure: pairwise distances, etc.
• Useful for further processing:
  – Less computation, fewer parameters
  – Easier to understand, visualize

PCA
• Principal Components Analysis (PCA): approximating a high-dimensional data set with a lower-dimensional linear subspace
[Figure: scatter of data points on the original axes, with the first and second principal components drawn through the cloud]

SVD and PCA
• Data matrix A with points as rows; take the SVD A = U W Vᵀ
  – Subtract out the mean first (centering; full “whitening” also rescales each dimension to unit variance)
• Columns of Vk are the principal components
• The singular value wi gives the importance of each component
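
As a concrete illustration of the recipe above, here is a minimal NumPy sketch (the toy data, seed, and variable names are my own, not from the course): center the data matrix, take the SVD, and read the principal components off the rows of Vᵀ.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 100 points in 3-D that mostly vary along one direction
X = rng.normal(size=(100, 1)) @ np.array([[2.0, 1.0, 0.5]]) \
    + 0.1 * rng.normal(size=(100, 3))

Xc = X - X.mean(axis=0)                          # subtract out the mean
U, w, Vt = np.linalg.svd(Xc, full_matrices=False)

first_pc = Vt[0]   # rows of Vt (columns of V) are the principal components
print(w)           # singular values: w[0] dwarfs the rest
```

On this data the first singular value is an order of magnitude larger than the others, and the first principal component lines up with the direction the points were generated along.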

PCA on Faces: “Eigenfaces”
• Average face, first principal component, and other components
• For all except the average, “gray” = 0, “white” > 0, “black” < 0

Uses of PCA
• Compression: each new image can be approximated by projection onto the first few principal components
• Recognition: for a new image, project onto the first few principal components, then match feature vectors
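
A sketch of the compression idea, using synthetic “images” rather than real faces (all data here is made up for illustration): project each image onto the first k principal components, keep only the k coefficients as the feature vector, and reconstruct from them.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data: 50 "images" of 20 pixels that actually vary
# along only 2 directions, plus a little noise.
basis = rng.normal(size=(2, 20))
X = rng.normal(size=(50, 2)) @ basis + 0.01 * rng.normal(size=(50, 20))

mean = X.mean(axis=0)
U, w, Vt = np.linalg.svd(X - mean, full_matrices=False)

k = 2                              # keep the first k principal components
coeffs = (X - mean) @ Vt[:k].T     # feature vector: k numbers per image
X_approx = mean + coeffs @ Vt[:k]  # reconstruction from k components

err = np.linalg.norm(X - X_approx) / np.linalg.norm(X)
print(err)   # small: 2 components capture almost everything
```

Each 20-pixel image is compressed to 2 coefficients, and the rank-2 reconstruction is accurate to about the noise level.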

PCA for Relighting
• Images under different illumination [Matusik & McMillan]
• Most variation captured by first 5 principal components
  – can re-illuminate by combining only a few images

PCA for DNA Microarrays
• Measure gene activation under different conditions [Troyanskaya]

PCA for DNA Microarrays
• PCA shows patterns of correlated activation
  – Genes with the same pattern might have similar function [Wall et al.]

Multidimensional Scaling
• In some experiments, we can only measure similarity or dissimilarity
  – e.g., is the response to stimuli similar or different?
  – Frequent in psychophysical experiments, preference surveys, etc.
• Want to recover absolute positions in k-dimensional space

Multidimensional Scaling
• Example: given pairwise distances between cities
  – Want to recover their locations [Pellacini et al.]

Euclidean MDS
• Formally, let’s say we have an n × n matrix D consisting of squared distances dij² = ‖xi – xj‖²
• Want to recover the n × d matrix X of positions in d-dimensional space

Euclidean MDS
• Observe that dij² = xi·xi + xj·xj – 2 xi·xj
• Strategy: convert the matrix D of dij² into a matrix B of dot products xi·xj
  – “Centered” distance matrix
  – B = XXᵀ

Euclidean MDS
• Centering:
  – Sum of row i of D = sum of column i of D = Σj dij² = n xi·xi + Σj xj·xj – 2 xi·(Σj xj)
  – Sum of all entries in D = Σij dij² = 2n Σj xj·xj – 2 ‖Σj xj‖²

Euclidean MDS
• Choose Σ xi = 0
  – Solution will have its average position at the origin
  – Then xi·xj = –½ ( dij² – (1/n) Σk dik² – (1/n) Σk dkj² + (1/n²) Σkl dkl² )
• So, to get B:
  – compute the row (or column) sums of D
  – compute the sum of sums
  – apply the formula above to each entry of D
  – divide by –2
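
The centering steps above amount to “double centering” the squared-distance matrix. A small NumPy check, with made-up points that are already centered at the origin:

```python
import numpy as np

rng = np.random.default_rng(2)
# Points with average position at the origin
X = rng.normal(size=(6, 2))
X -= X.mean(axis=0)

# Matrix of squared pairwise distances d_ij^2
D = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)

# Double centering: subtract the row and column means, add the grand
# mean, divide by -2 -- this recovers B_ij = x_i . x_j
row = D.mean(axis=1, keepdims=True)
col = D.mean(axis=0, keepdims=True)
B = -0.5 * (D - row - col + D.mean())

print(np.allclose(B, X @ X.T))   # True
```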

Euclidean MDS
• Now have B; want to factor it into XXᵀ
• If X is n × d, B must have rank d
• Take the SVD, set all but the top d singular values to 0
  – Eliminate the corresponding columns of U and V
  – Have B̂ = Ud Wd Vdᵀ
  – B is square and symmetric, so U = V
  – Take X = Ud times the square root of Wd
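
Putting the whole pipeline together, here is a minimal classical-MDS sketch (synthetic positions of my choosing; any rotation or reflection of the answer is equally valid, so we verify by comparing pairwise distances):

```python
import numpy as np

rng = np.random.default_rng(3)
X_true = rng.normal(size=(8, 2))   # unknown positions we hope to recover
D = np.sum((X_true[:, None] - X_true[None, :])**2, axis=-1)  # observed d_ij^2

n = len(D)
# Centered dot-product matrix B = XX^T via double centering
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ D @ J

# Factor B: take the SVD, keep the top d singular values (B is
# symmetric, so U = V) and set X = U_d * sqrt(W_d)
U, w, Vt = np.linalg.svd(B)
d = 2
X_rec = U[:, :d] * np.sqrt(w[:d])

# Recovered up to rotation/reflection: pairwise distances must match
D_rec = np.sum((X_rec[:, None] - X_rec[None, :])**2, axis=-1)
print(np.allclose(D, D_rec))   # True
```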

Multidimensional Scaling
• Result (d = 2): [Pellacini et al.]

Multidimensional Scaling
• Caveat: the actual axes and center are not necessarily what you want (can’t recover them!)
• This is “classical” or “Euclidean” MDS [Torgerson 52]
  – Distance matrix assumed to be actual Euclidean distance
• More sophisticated versions available
  – “Non-metric MDS”: not Euclidean distance, sometimes just inequalities
  – “Weighted MDS”: account for observer bias

Computation
• SVD is very closely related to eigenvalue/eigenvector computation
  – The singular vectors/values of A come from the eigenvectors/eigenvalues of AᵀA
  – In practice, a similar class of methods is used, but operating on A directly
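
A quick numerical check of this relationship, on a random matrix: the squared singular values of A equal the eigenvalues of AᵀA.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(5, 3))

w = np.linalg.svd(A, compute_uv=False)     # singular values, descending
lam = np.linalg.eigvalsh(A.T @ A)[::-1]    # eigenvalues of A^T A, descending

print(np.allclose(w**2, lam))   # True
```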

Methods for Eigenvalue Computation
• Simplest: power method
  – Begin with an arbitrary vector x0
  – Compute xi+1 = A xi
  – Normalize
  – Iterate
• Converges to the eigenvector with the maximum-magnitude eigenvalue!
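
A direct transcription of these steps, on a small symmetric test matrix of my choosing (its eigenvalues are (7 ± √5)/2):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])      # symmetric test matrix

x = np.array([1.0, 0.0])        # arbitrary starting vector
for _ in range(100):
    x = A @ x                   # x_{i+1} = A x_i
    x /= np.linalg.norm(x)      # normalize

lam = x @ A @ x                 # Rayleigh quotient gives the eigenvalue
print(lam)                      # ~ (7 + sqrt(5)) / 2
```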

Power Method
• Expand the starting vector in the eigenvector basis: x0 = c1 e1 + c2 e2 + …
• Each iteration multiplies the coefficient of ei by λi, so after normalization the component along ei scales as (λi / λ1)ᵏ
• As this is repeated, the coefficient of e1 approaches 1

Power Method II
• To find the smallest eigenvalue, use a similar process:
  – Begin with an arbitrary vector x0
  – Solve A xi+1 = xi
  – Normalize
  – Iterate
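
The inverse-iteration variant changes only one line: instead of multiplying by A, solve a linear system at each step (a sketch on a small symmetric test matrix; in practice one would factor A once rather than re-solving every iteration):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])      # eigenvalues (7 +/- sqrt(5)) / 2

x = np.array([1.0, 0.0])        # arbitrary starting vector
for _ in range(100):
    x = np.linalg.solve(A, x)   # solve A x_{i+1} = x_i
    x /= np.linalg.norm(x)

lam = x @ A @ x
print(lam)                      # smallest eigenvalue, ~ (7 - sqrt(5)) / 2
```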

Deflation
• Once we have found an eigenvector e1 with eigenvalue λ1, we can compute the matrix A – λ1 e1 e1ᵀ
• This makes the eigenvalue of e1 equal to 0, but (for symmetric A) has no effect on the other eigenvectors/values
• In principle, we could find all eigenvectors this way
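
A sketch of deflation on a small symmetric matrix: find the dominant eigenpair with the power method, subtract λ1 e1 e1ᵀ, then run the power method again to get the second eigenpair (the `power` helper is my own, not from the course):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])      # eigenvalues (7 +/- sqrt(5)) / 2

def power(M, iters=200):
    """Power method: return (eigenvalue, eigenvector) of largest |lambda|."""
    x = np.array([1.0, 0.3])    # arbitrary starting vector
    for _ in range(iters):
        x = M @ x
        x /= np.linalg.norm(x)
    return x @ M @ x, x

lam1, e1 = power(A)
A_defl = A - lam1 * np.outer(e1, e1)   # deflate: A - lambda_1 e_1 e_1^T
lam2, e2 = power(A_defl)               # dominant eigenpair of the deflated matrix

print(lam1, lam2)
```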

Other Eigenvector Computation Methods
• The power method is OK for a few eigenvalues, but slow and sensitive to roundoff error
• Modern methods for eigendecomposition/SVD use a sequence of similarity transformations to reduce the matrix to diagonal form, then read off the eigenvalues