PCA, SVD, Eigenvectors, Applications, Streaming. Big Data Class, TAU 2013
Some Notation
Eigenvalues and Eigenvectors
Eigenvalues and Eigenvectors: real-valued matrices can have complex eigenvalues/eigenvectors
Characteristic Polynomial: eigenvalues can be complex and can appear multiple times
Example
Motivation: Web Surfer Model (Google PageRank). 1. Follow outgoing edges uniformly at random with probability α, and 2. jump to a uniformly random page with probability 1 − α. PageRank obtains a probability distribution; intuitively: "the places we find the surfer most often are important pages."
Stochastic Matrix of the (static) Web
PageRank = eigenvector with eigenvalue 1
Personalized PageRank
Stochastic Matrix for Personalized PageRank: the right eigenvector with eigenvalue 1 gives the personalized PageRank (see the sketch below)
More Properties of Eigenvectors/Eigenvalues
More Properties of Eigenvectors/Eigenvalues: eigenvalues of a real symmetric matrix are real
More Properties of Eigenvectors/Eigenvalues: eigenvectors of two different eigenvalues of a real symmetric matrix are orthogonal; eigenvectors belonging to distinct eigenvalues are linearly independent
Properties of Eigenvectors/Eigenvalues
Transformations
Power Iteration. Assumes that A is full rank. Alternative: normalize at each step (see the sketch below).
Tricks
Principal Component Analysis
Singular Value Decomposition (SVD)
ej: the vector with a 1 at position j and zeros elsewhere
SVD: column times row gives a matrix, so A is a sum of rank-1 outer products (see the sketch below)
PageRank vs. Hubs and Authorities
• PageRank (and personalized PageRank) is the right eigenvector of a stochastic matrix for a random walk on the web.
• PageRank can be interpreted as follows: the importance of a node is the sum of the importance of the nodes that point to it (each node splits its importance among all nodes it points to).
• Hubs and Authorities: a different idea (SVD). Hub nodes confer authority to "authorities"; this is not split among the out-degree; authorities confer "hub quality" to hubs.
Matrix for Hubs and Authorities
SVD and Eigenvalues
SVD and Eigenvalues
SVD and Eigenvectors
SVD: Motivation. Input: 2-dimensional points. Output: the 1st (right) singular vector, the direction of maximal variance, and the 2nd (right) singular vector, the direction of maximal variance after removing the projection of the data along the first singular vector. (Figure: scatter of the points with both directions drawn.)
Motivation: Singular Values. (Figure: the same point set with the 1st and 2nd right singular vectors; the singular values measure the spread of the data along those directions.)
Single Nucleotide Polymorphisms (SNPs): the most common type of genetic variation in the genome across different individuals. They are known locations in the human genome where two alternate nucleotide bases (alleles) are observed (out of A, C, G, T). (Figure: an individuals × SNPs matrix whose entries are genotypes such as AG, CT, GT, GG, ...) There are ~10 million SNPs in the human genome, so this matrix could have ~10 million columns.
Our data as a matrix: encode each genotype as an integer; for example, for a SNP with alleles A and G: AA = 1, AG = 0, GG = -1. This turns the individuals × SNPs genotype table into a numeric matrix with entries in {-1, 0, 1}.
Rank-k approximations (Ak)
Rank-k approximations (Ak)
Uk solves an optimization problem… Given an m × n matrix A, we seek an m × k matrix C with orthonormal columns that minimizes the residual Frobenius norm ||A - C C^T A||_F; the minimum is attained at C = Uk.
Vk solves an optimization problem… Given an m × n matrix A, we seek an n × k matrix C with orthonormal columns that minimizes the residual Frobenius norm ||A - A C C^T||_F; the minimum is attained at C = Vk.
Rank-k approximations via the SVD: A = U Σ V^T. (Figure: the objects × features matrix split into a significant part plus noise; the truncated SVD keeps the significant part and discards the noise.)
SVD issues… SVD and PCA are often used to summarize and approximate matrices, and they have enjoyed enormous success in data analysis. BUT:
- For large sparse graphs they require large amounts of memory, exactly because the resulting factor matrices are no longer sparse. (Common networks such as the web, the Internet topology graph, and the who-trusts-whom social network are all large and sparse.)
- Running time becomes an issue.
- Assigning "meaning" to the singular vectors (reification) is tricky…
SVD Application: Document Retrieval. The Vector Space Method
• Term (rows) by document (columns) matrix, based on occurrence counts
• Translate documents into vectors in a vector space: one vector for each document
• Use the cosine to measure distance between vectors (documents): small angle = large cosine = similar; large angle = small cosine = dissimilar
The Problem
• Two problems with the vector space model:
  – synonymy: many ways to refer to the same object, e.g. car and automobile; leads to poor recall
  – polysemy: most words have more than one distinct meaning, e.g. model, python, chip; leads to poor precision
The Problem
• Example: Vector Space Model (from Lillian Lee). One document contains {auto, engine, bonnet, tyres, lorry, boot} and another {car, emissions, hood, make, model, trunk}: they share no terms, so they will have a small cosine although they are related (synonymy). A third document contains {make, hidden, Markov, model, emissions, normalize}: it shares terms such as make, model, and emissions with the second, so they will have a large cosine although they are not truly related (polysemy).
The Problem • Latent Semantic Indexing was proposed to address these two problems with the vector space model for Information Retrieval
Latent Semantic Analysis by Example
• To see how this works, let's look at a small example.
• This example is taken from: Deerwester, S., Dumais, S. T., Landauer, T. K., Furnas, G. W., and Harshman, R. A. (1990). "Indexing by latent semantic analysis." Journal of the American Society for Information Science, 41(6), 391-407.
• Slides are from a presentation by Tom Landauer and Peter Foltz.
A Small Example: Technical Memo Titles
c1: Human machine interface for ABC computer applications
c2: A survey of user opinion of computer system response time
c3: The EPS user interface management system
c4: System and human system engineering testing of EPS
c5: Relation of user perceived response time to error measurement
m1: The generation of random, binary, ordered trees
m2: The intersection graph of paths in trees
m3: Graph minors IV: Widths of trees and well-quasi-ordering
m4: Graph minors: A survey
A Small Example – 2
A Small Example – 3 • Singular Value Decomposition: A = U S V^T • Dimension Reduction: Ã = Ũ S̃ Ṽ^T (keep only the top dimensions)
A Small Example – 4 • U =
A Small Example – 5 • S =
A Small Example – 6 • V =
A Small Example – 7
0.92 -0.72 1.00
Lecture 8: More about Streaming Dimension Reduction
Again: Rank-k approximations (Ak)
Uk solves an optimization problem… Given an m × n matrix A, we seek an m × k matrix C with orthonormal columns that minimizes the residual Frobenius norm ||A - C C^T A||_F; the minimum is attained at C = Uk.
Vk solves an optimization problem… Given an m × n matrix A, we seek an n × k matrix C with orthonormal columns that minimizes the residual Frobenius norm ||A - A C C^T||_F; the minimum is attained at C = Vk.
Proofs of some claims
Some claims
Streaming SVD: A appears row by row
• Paper by Edo Liberty: Simple and Deterministic Matrix Sketches, KDD 2013: the "Frequent Directions" algorithm, an extension of Misra-Gries
• Paper by Mina Ghashami and Jeff Phillips: Relative Errors for Deterministic Low-Rank Matrix Approximations, preprint; uses Frequent Directions
Let's study the SVD of a few matrices
Let's study the SVD of a few matrices
Let's study the SVD of a few matrices
Let's study the SVD of a few matrices
Let's study the SVD of a few matrices: we need to find the eigenvectors/eigenvalues of this matrix.
Let's study the SVD of a few matrices
Streaming Suggestion: Incremental PCA
• Assume that after i-1 rows we have the rank-k approximation A_{i-1}
• Append the new row to A_{i-1}, compute the SVD of this new matrix, and truncate it down to the k largest singular values (see the sketch below)
Incremental PCA: Problematic Example on whiteboard
Relative Error Heavy Hitters (Misra-Gries)
Relative Error Heavy Hitters (Misra-Gries)
Edo Liberty: Frequent Directions
Edo Liberty: Frequent Directions. SVD (change of notation because of the Ghashami-Phillips paper)
Edo Liberty: Frequent Directions. SVD, with all singular values reduced
Edo Liberty: Frequent Directions. SVD, with all singular values reduced (see the sketch below)
• This ends the proof of the original Frequent Directions
Lemma. This ends the proof of the original Frequent Directions
Sampling for approximate SVD • Fast Monte-Carlo Algorithms for Finding Low-Rank Approximations, Frieze, Kannan, Vempala
Sampling assumption • Sample the rows of A so that row i is sampled with probability proportional to its squared length, ||A_i||^2 / ||A||_F^2
A small sample contains a good approximation
A small sample S contains a good approximation
A small sample S contains a good approximation
A small sample S contains a good approximation: projecting to subspaces does not depend on the basis
Bound the expected squared distance
References
• Edo Liberty: Simple and Deterministic Matrix Sketches, KDD 2013 (the "Frequent Directions" algorithm, an extension of Misra-Gries)
• Mina Ghashami and Jeff Phillips: Relative Errors for Deterministic Low-Rank Matrix Approximations, preprint (uses Frequent Directions)