Principal Component Analysis
Bamshad Mobasher
DePaul University
Principal Component Analysis

- PCA is a widely used data compression and dimensionality reduction technique
- PCA takes a data matrix, A, of n objects by p variables, which may be correlated, and summarizes it by uncorrelated axes (principal components, or principal axes) that are linear combinations of the original p variables
  - The first k components display most of the variance among objects
  - The remaining components can be discarded, resulting in a lower-dimensional representation of the data that still captures most of the relevant information
- PCA is computed by determining the eigenvectors and eigenvalues of the covariance matrix
  - Recall: the covariance of two random variables is their tendency to vary together
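As a quick end-to-end illustration (not part of the original slides), here is a minimal sketch of this pipeline using scikit-learn's PCA; the synthetic data matrix and the choice of k = 2 are assumptions for illustration only:

```python
# Minimal sketch of the PCA workflow: correlated data in, a smaller set of
# uncorrelated components out. Data and k are illustrative, not from the slides.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))       # n = 100 objects, p = 5 variables
A[:, 1] = A[:, 0] + 0.1 * A[:, 1]   # make two of the variables correlated

pca = PCA(n_components=2)           # keep only the top k = 2 principal components
scores = pca.fit_transform(A)       # uncorrelated coordinates, shape (100, 2)

print(pca.explained_variance_ratio_)  # fraction of variance captured by each component
```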
Geometric Interpretation of PCA

- The goal is to rotate the axes of the p-dimensional space to new positions (principal axes) that have the following properties:
  - ordered such that principal axis 1 has the highest variance, axis 2 has the next highest variance, ..., and axis p has the lowest variance
  - the covariance between each pair of principal axes is zero (the principal axes are uncorrelated)

[Figure: a scatter of points with principal axes PC 1 and PC 2 overlaid. Note: each principal axis is a linear combination of the original two variables. Credit: Loretta Battaglia, Southern Illinois University]
PCA: Coordinate Transformation

From p original variables x_1, x_2, ..., x_p, produce p new variables y_1, y_2, ..., y_p:

y_1 = a_{11} x_1 + a_{12} x_2 + ... + a_{1p} x_p
y_2 = a_{21} x_1 + a_{22} x_2 + ... + a_{2p} x_p
...
y_p = a_{p1} x_1 + a_{p2} x_2 + ... + a_{pp} x_p

The y_i are the principal components, such that:
- the y_i are uncorrelated (orthogonal)
- y_1 explains as much as possible of the original variance in the data set
- y_2 explains as much as possible of the remaining variance
- etc.
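In matrix form this transformation is just y = A x, with the coefficients a_{ij} as the rows of A. A tiny sketch (the 2×2 coefficients below are illustrative, chosen to be orthonormal):

```python
# The coordinate transformation y = A x: each new variable y_i is a
# linear combination of the original variables with coefficients a_{ij}.
import numpy as np

A = np.array([[0.8, 0.6],      # row 1: coefficients a_11, a_12 of y_1
              [-0.6, 0.8]])    # row 2: coefficients a_21, a_22 of y_2
x = np.array([2.0, 1.0])       # a point in the original variables x_1, x_2

y = A @ x                      # its coordinates in the new variables y_1, y_2
print(y)                       # [ 2.2 -0.4]
```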
Principal Components

[Figure: a data scatter with the 1st principal component, y_1, and the 2nd principal component, y_2, drawn as new axes]
Principal Components: Scores

[Figure: a point with original coordinates (x_{i1}, x_{i2}) has scores (y_{i,1}, y_{i,2}) along the principal axes]
Principal Components: Eigenvalues

[Figure: the eigenvalues λ_1 and λ_2 represent the variances along the directions of the principal components]
Principal Components: Eigenvectors

- z_1 = [a_{11}, a_{12}, ..., a_{1p}]: 1st eigenvector of the covariance (or correlation) matrix, and coefficients of the 1st principal component
- z_2 = [a_{21}, a_{22}, ..., a_{2p}]: 2nd eigenvector of the covariance (or correlation) matrix, and coefficients of the 2nd principal component
- ...
- z_p = [a_{p1}, a_{p2}, ..., a_{pp}]: pth eigenvector of the covariance (or correlation) matrix, and coefficients of the pth principal component

Dimensionality Reduction: we can take only the top k principal components y_1, y_2, ..., y_k, effectively transforming the data into a lower-dimensional space.
Covariance Matrix

Recall: PCA is computed by determining the eigenvectors and eigenvalues of the covariance matrix.

The covariance of variables i and j is

cov(x_i, x_j) = \frac{1}{n-1} \sum_{m=1}^{n} (x_{mi} - \bar{x}_i)(x_{mj} - \bar{x}_j)

where the sum runs over the n objects, x_{mi} is the value of variable i in object m, and \bar{x}_i is the mean of variable i.

Notes:
- For a variable x, cov(x, x) = var(x)
- For independent variables x and y, cov(x, y) = 0
- The covariance matrix is a matrix C with elements C_{i,j} = cov(i, j)
- The covariance matrix is square and symmetric
- For independent variables, the covariance matrix is diagonal, with the variances along the diagonal and zero covariances in the off-diagonal elements
- To calculate the covariance matrix from a dataset, first center the data by subtracting the mean of each variable to obtain A, then compute C = \frac{1}{n-1} A^T A
Covariance Matrix: Example

[Figure: an original data matrix X, the corresponding centered data matrix A, and the resulting covariance matrix Cov(X) = \frac{1}{n-1} A^T A]
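A worked sketch of this computation (the small data matrix below is illustrative, not the slide's original numbers):

```python
# Covariance matrix as on the slide: center the data, then form
# (1/(n-1)) * A^T A, and confirm against numpy's built-in.
import numpy as np

X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])            # n = 5 objects, p = 2 variables

A = X - X.mean(axis=0)                # centered data: subtract each variable's mean
C = (A.T @ A) / (X.shape[0] - 1)      # p x p covariance matrix

print(C)
print(np.cov(X, rowvar=False))        # numpy's covariance agrees
```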
Summary: Eigenvalues and Eigenvectors

- Finding the principal axes involves finding the eigenvalues and eigenvectors of the covariance matrix C = \frac{1}{n-1} A^T A (see the sketch below)
  - eigenvalues are values λ such that C z = λ z (the z are the eigenvectors)
  - this can be rewritten as (C - λI) z = 0
  - eigenvalues can be found by solving the characteristic equation det(C - λI) = 0
- The eigenvalues λ_1, λ_2, ..., λ_p are the variances of the coordinates on each principal component axis
  - the sum of all p eigenvalues equals the trace of C (the sum of the variances of the original variables)
- The eigenvectors of the covariance matrix are the axes of maximum variance
  - a good approximation of the full matrix can be computed using only a subset of the eigenvectors and eigenvalues
  - the eigenvalues below some threshold are truncated; the data is then reprojected onto the remaining r eigenvectors to get a rank-r approximation
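A minimal sketch of this step, assuming random illustrative data; since the covariance matrix is symmetric, numpy's eigh applies:

```python
# Find the principal axes: eigendecomposition of the covariance matrix,
# sorted so the largest-variance axis comes first.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
A = X - X.mean(axis=0)                     # center the data
C = (A.T @ A) / (X.shape[0] - 1)           # covariance matrix

eigvals, Z = np.linalg.eigh(C)             # eigh: eigendecomposition for symmetric matrices
order = np.argsort(eigvals)[::-1]          # sort eigenvalues in decreasing order
eigvals, Z = eigvals[order], Z[:, order]   # columns of Z are the eigenvectors

# The sum of the eigenvalues equals the trace of C (the total variance)
print(np.isclose(eigvals.sum(), np.trace(C)))   # True
```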
Eigenvalues and Eigenvectors

Eigenvalues of the covariance matrix C: λ_1 = 73.718, λ_2 = 0.384, λ_3 = 0.298

Note: λ_1 + λ_2 + λ_3 = 74.4 = trace of C (the sum of the variances on the diagonal)

[Figure: the covariance matrix C and its matrix of eigenvectors Z]
Reduced Dimension Space

- The coordinates of each object on the kth principal axis, known as the scores on PC k, are computed as

  Y = X Z

  where Y is the n × k matrix of PC scores, X is the n × p centered data matrix, and Z is the p × k matrix of eigenvectors
- The variance of the scores on each PC axis is equal to the corresponding eigenvalue for that axis
  - the eigenvalue represents the variance displayed ("explained" or "extracted") by the kth axis
  - the sum of the first k eigenvalues is the variance explained by the k-dimensional reduced matrix
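A sketch of computing the scores Y = X Z and checking the variance property above (data and k = 2 are illustrative):

```python
# Project centered data onto the top k eigenvectors to get the PC scores,
# and verify that the score variances equal the corresponding eigenvalues.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
X = X - X.mean(axis=0)                    # centered n x p data matrix

C = (X.T @ X) / (X.shape[0] - 1)
eigvals, Z = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, Z = eigvals[order], Z[:, order]

k = 2
Y = X @ Z[:, :k]                          # n x k matrix of PC scores

print(Y.var(axis=0, ddof=1))              # variance of the scores on each axis...
print(eigvals[:k])                        # ...matches the eigenvalues
```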
Reduced Dimension Space

- Each eigenvalue represents the variance displayed ("explained") by a PC. The sum of the first k eigenvalues is the variance explained by the k-dimensional reduced matrix.

[Figure: a scree plot of eigenvalues against component number]
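A sketch of producing such a scree plot, assuming matplotlib is available and reusing the example eigenvalues from the earlier slide:

```python
# Scree plot: eigenvalue (variance explained) per principal component,
# plus the cumulative fraction of variance explained by the first k PCs.
import numpy as np
import matplotlib.pyplot as plt

eigvals = np.array([73.718, 0.384, 0.298])   # eigenvalues from the example
explained = eigvals / eigvals.sum()          # fraction of variance per PC

plt.plot(range(1, len(eigvals) + 1), eigvals, "o-")
plt.xlabel("Principal component")
plt.ylabel("Eigenvalue (variance explained)")
plt.title("Scree Plot")
plt.show()

print(np.cumsum(explained))   # cumulative variance explained by the first k PCs
```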
Reduced Dimension Space

- So, to generate the data in the new space: FinalData = RowFeatureVector × RowZeroMeanData
- RowFeatureVector: the matrix with the eigenvectors in the columns, transposed so that the eigenvectors are now in the rows, with the most significant eigenvector at the top
- RowZeroMeanData: the mean-adjusted data, transposed, i.e. the data items are in the columns, with each row holding a separate dimension
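A sketch of this transposed convention (variable names mirror the slide; the data is illustrative). It is the same computation as Y = X Z, just transposed: Z^T A^T = (A Z)^T.

```python
# FinalData = RowFeatureVector x RowZeroMeanData: eigenvectors as rows,
# data items as columns, giving the transformed data transposed.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(10, 3))
A = X - X.mean(axis=0)                       # mean-adjusted data, n x p

C = (A.T @ A) / (X.shape[0] - 1)
eigvals, Z = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
Z = Z[:, order]                              # most significant eigenvector first

row_feature_vector = Z.T                     # eigenvectors in the rows
row_zero_mean_data = A.T                     # data items in the columns
final_data = row_feature_vector @ row_zero_mean_data   # p x n transformed data

print(np.allclose(final_data, (A @ Z).T))    # same scores as Y = A Z, transposed
```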
Example: Revisited

Eigenvalues: λ_1 = 73.718, λ_2 = 0.384, λ_3 = 0.298

[Figure: the centered data matrix A, the eigenvalues, and the matrix of eigenvectors Z]
Reduced Dimension Space

Projecting the data onto all principal axes:

U = Z^T A^T

Taking only the top k = 1 principal component:

U = Z_k^T A^T

[Figure: the resulting matrices of transformed coordinates]
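A sketch of this k = 1 reduction on illustrative data, including the rank-k reconstruction mentioned earlier:

```python
# Keep only the top eigenvector, project (U = Z_k^T A^T), then map back
# to the original space to get a rank-k approximation of the data.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(20, 3))
A = X - X.mean(axis=0)                          # centered data

C = (A.T @ A) / (X.shape[0] - 1)
eigvals, Z = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
Z = Z[:, order]

k = 1
Zk = Z[:, :k]                                   # top k eigenvectors, p x k
U = Zk.T @ A.T                                  # k x n reduced representation

A_approx = (Zk @ U).T                           # rank-k reconstruction of A
print(np.linalg.norm(A - A_approx))             # residual from the discarded PCs
```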