Applied Multivariate Quantitative Methods Principal Components Analysis PCA
- Slides: 46
Applied Multivariate Quantitative Methods: Principal Components Analysis (PCA)
By Jen-pei Liu, Ph.D., Division of Biometry, Department of Agronomy, National Taiwan University, and Wei-Chi Chie, MD, Ph.D., Department of Public Health, National Taiwan University
12/31/2021 Copyright by Jen-pei Liu, Ph.D. and Wei-Chi Chie, MD, Ph.D.
Principal Components Analysis
- Introduction
- Procedures
- Properties
- Examples
- Summary
Introduction
- Described by K. Pearson (1901)
- Computing methods by Hotelling (1933)
- Objective: to transform the original variables $X_1, …, X_p$ into index variables $Z_1, …, Z_p$
  - $Z_1, …, Z_p$ are linear combinations of $X_1, …, X_p$
  - $Z_1, …, Z_p$ are uncorrelated and are ordered by importance
  - To describe the variation in the data
Introduction
- Lack of correlation: the index variables measure different dimensions (domains) of the data
- Lack of correlation: only the variances of the index variables need to be considered; covariances do not have to be taken into account
- Ordering: $\mathrm{Var}(Z_1) \ge \mathrm{Var}(Z_2) \ge \dots \ge \mathrm{Var}(Z_p)$
- The Z index variables are called the principal components
Introduction
- Most of the variation in the full data set can be adequately described by a few Z index variables
- Reduction of dimension from a two-digit number of variables to just 2 to 4 principal components
- This requires high correlations among the original variables
Introduction
Correlations of Female Sparrows
                                 X1     X2     X3     X4     X5
Total length (X1)                1.000
Alar length (X2)                 0.735  1.000
Length of beak and head (X3)     0.662  0.674  1.000
Length of humerus (X4)           0.645  0.769  0.763  1.000
Length of keel of sternum (X5)   0.605  0.529  0.626  0.607  1.000
Introduction
                       Coefficients for Components
Component  Variance    X1      X2      X3      X4      X5
1          3.616       0.452   0.462   0.451   0.471   0.398
2          0.532      -0.051   0.300   0.325   0.185  -0.877
3          0.386       0.691   0.341  -0.455  -0.411  -0.179
4          0.302      -0.420   0.548  -0.606   0.388   0.069
5          0.165       0.374  -0.530  -0.343   0.652  -0.192
Introduction
- $Z_1 = 0.452X_1 + 0.462X_2 + 0.451X_3 + 0.471X_4 + 0.398X_5$
- The variance of $Z_1$ is 3.62
- The variance of $Z_1$ accounts for 72.3% (3.62/5.00) of the total variation
- All coefficients of $Z_1$ are smaller than 1, and the sum of squares of these coefficients is equal to 1
- Because the coefficients are nearly equal, $Z_1$ is in effect the average (or sum) of $X_1, X_2, X_3, X_4$ and $X_5$
- $Z_1$ can be interpreted as an index of the size of the sparrow
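
The two numerical claims about $Z_1$ can be checked with a few lines of arithmetic. This is only a sketch: the variable names are illustrative, and because the coefficients and variances are the three-decimal values from the sparrow table, the results are approximate.

```python
# Coefficients of the first component and the five component variances,
# copied from the sparrow table (rounded to three decimals).
a1 = [0.452, 0.462, 0.451, 0.471, 0.398]
variances = [3.616, 0.532, 0.386, 0.302, 0.165]

# Unit-length constraint: the squared coefficients should sum to about 1.
sum_sq = sum(a * a for a in a1)

# Share of the total variation carried by Z1. With a correlation matrix the
# total equals the number of variables (5), up to rounding.
share_z1 = variances[0] / sum(variances)

print(f"sum of squared coefficients = {sum_sq:.3f}")
print(f"share of variation in Z1    = {share_z1:.1%}")
```

Small departures from exactly 1 and exactly 5 come from rounding in the published table, not from the method.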
Procedures
Data Structure
Case   X1    X2    …   Xp
1      x11   x12   …   x1p
2      x21   x22   …   x2p
…      …     …     …   …
n      xn1   xn2   …   xnp
Procedures
- The First Component
  - The first component is a linear combination of $X_1, X_2, …, X_p$
  - $Z_1 = a_{11}X_1 + a_{12}X_2 + \dots + a_{1p}X_p$
  - $\mathrm{Var}(Z_1)$ is as large as possible subject to the condition that $a_{11}^2 + a_{12}^2 + \dots + a_{1p}^2 = 1$
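
Why this constrained maximization leads to an eigenvalue problem can be sketched with a Lagrange multiplier. The symbols $\Sigma$ (covariance matrix of the $X$ variables), $a_1$ (the coefficient vector) and $\lambda$ (the multiplier) are assumptions introduced for this sketch; the slide itself does not define them.

```latex
\begin{aligned}
\operatorname{Var}(Z_1) &= a_1^{\top}\Sigma\,a_1, \qquad a_1 = (a_{11},\dots,a_{1p})^{\top},\\
L(a_1,\lambda) &= a_1^{\top}\Sigma\,a_1 - \lambda\,(a_1^{\top}a_1 - 1),\\
\frac{\partial L}{\partial a_1} &= 2\Sigma a_1 - 2\lambda a_1 = 0
\;\Longrightarrow\; \Sigma a_1 = \lambda a_1,\\
\operatorname{Var}(Z_1) &= a_1^{\top}\Sigma\,a_1 = \lambda\,a_1^{\top}a_1 = \lambda .
\end{aligned}
```

Under these assumptions, $a_1$ is an eigenvector of $\Sigma$, and the variance is largest when $\lambda$ is the largest eigenvalue.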
Procedures
- The Second Component
  - The second component is also a linear combination of $X_1, X_2, …, X_p$
  - $Z_2 = a_{21}X_1 + a_{22}X_2 + \dots + a_{2p}X_p$
  - $\mathrm{Var}(Z_2)$ is as large as possible subject to the conditions that $a_{21}^2 + a_{22}^2 + \dots + a_{2p}^2 = 1$ and that $Z_1$ and $Z_2$ are not correlated
  - $\mathrm{Var}(Z_2)$ is the second largest
Procedures
- The Third Component
  - The third component is also a linear combination of $X_1, X_2, …, X_p$
  - $Z_3 = a_{31}X_1 + a_{32}X_2 + \dots + a_{3p}X_p$
  - $\mathrm{Var}(Z_3)$ is as large as possible subject to the conditions that $a_{31}^2 + a_{32}^2 + \dots + a_{3p}^2 = 1$ and that $Z_1$, $Z_2$ and $Z_3$ are not correlated
  - $\mathrm{Var}(Z_3)$ is the third largest
Procedures
- Continue until all p principal components are computed
- Covariance matrix of the p variables
Procedures
- Different variables might have different units and magnitudes
- PCA might be influenced by these magnitudes and units
- Standardize each variable to have zero mean and unit variance
- The covariance matrix of the standardized variables is the correlation matrix
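
The last point can be checked numerically. The sketch below uses two short series of made-up values in very different units (they are hypothetical, not data from these slides) and shows that the covariance of the standardized variables equals the correlation of the raw variables.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Sample covariance with divisor n - 1."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def standardize(v):
    """Center to zero mean and scale to unit sample variance."""
    m, s = mean(v), sqrt(cov(v, v))
    return [(a - m) / s for a in v]

# Hypothetical measurements on very different scales (illustration only).
length_mm = [156.0, 154.0, 153.0, 155.0, 163.0, 157.0]
weight_kg = [0.0245, 0.0231, 0.0229, 0.0240, 0.0262, 0.0248]

corr_raw = cov(length_mm, weight_kg) / sqrt(
    cov(length_mm, length_mm) * cov(weight_kg, weight_kg)
)
cov_std = cov(standardize(length_mm), standardize(weight_kg))

print(f"correlation of raw variables         = {corr_raw:.6f}")
print(f"covariance of standardized variables = {cov_std:.6f}")
```

The raw covariance, by contrast, would change if the weights were recorded in grams, which is why unstandardized PCA depends on the units.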
Procedures
- Steps of PCA
  - Standardize the variables $X_1, X_2, …, X_p$ to have zero means and unit variances, unless the importance of the variables is reflected in their variances
  - Calculate the covariance matrix (this is the correlation matrix if the variables were standardized)
Procedures
- Steps of PCA (continued)
  - Find the eigenvalues $\lambda_1, \lambda_2, …, \lambda_p$ and their corresponding eigenvectors $a_1, a_2, …, a_p$
  - The coefficients of the ith principal component $Z_i$ are the elements of $a_i$, and $\lambda_i$ is the variance of $Z_i$
  - Discard any components that account for only a small proportion of the variation in the data
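
The steps above can be sketched in a few lines of NumPy. This is one possible arrangement, not the authors' code: the function name `pca`, its `standardize` flag and the randomly generated demonstration data are all assumptions made for illustration.

```python
import numpy as np

def pca(X, standardize=True):
    """Return (eigenvalues, A, scores), components in decreasing variance.

    Row i of A holds the coefficients a_i of the i-th component.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # zero means
    if standardize:
        Xc = Xc / X.std(axis=0, ddof=1)      # unit sample variances
    C = np.cov(Xc, rowvar=False)             # correlation matrix if standardized
    eigvals, eigvecs = np.linalg.eigh(C)     # symmetric input, ascending order
    order = np.argsort(eigvals)[::-1]        # re-order: largest variance first
    eigvals = eigvals[order]
    A = eigvecs[:, order].T
    scores = Xc @ A.T                        # Z = values of the components
    return eigvals, A, scores

# Hypothetical data: 50 cases, 4 variables, two of them made correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 1] += 0.8 * X[:, 0]

eigvals, A, Z = pca(X)
print("proportion of variation:", np.round(eigvals / eigvals.sum(), 3))
```

The final step, discarding components, is then a matter of keeping only the leading columns of `Z`; how many to keep is a judgment call.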
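Properties
With $Z = AX$, where the rows of $A$ are the eigenvectors $a_1, …, a_p$, $\mu = E(X)$, $\Sigma = V(X)$, and $c_{jj} = \mathrm{Var}(X_j)$:
- $E(Z) = A\mu$
- $V(Z) = A\Sigma A' = \mathrm{diag}\{\lambda_i,\ i = 1, …, p\}$
- $\mathrm{Cov}(Z_i, X_j) = a_{ij}\lambda_i$
- $\mathrm{Corr}(Z_i, X_j) = a_{ij}\sqrt{\lambda_i}/\sqrt{c_{jj}}$
- $\mathrm{Corr}(Z_i, X_j) = a_{ij}\sqrt{\lambda_i}$, if the correlation matrix is used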
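
The covariance and correlation properties follow in one line from the eigen-equation. In this sketch $\Sigma$ is the covariance matrix of $X$ with diagonal elements $c_{jj}$, $a_i$ is the i-th eigenvector with eigenvalue $\lambda_i$, and $e_j$ is the j-th unit vector; $e_j$ is a symbol introduced here for convenience.

```latex
\begin{aligned}
\operatorname{Cov}(Z_i, X_j) &= \operatorname{Cov}(a_i^{\top}X,\; e_j^{\top}X)
 = a_i^{\top}\Sigma\,e_j = (\Sigma a_i)^{\top} e_j = \lambda_i\,a_{ij},\\
\operatorname{Corr}(Z_i, X_j) &= \frac{\lambda_i\,a_{ij}}{\sqrt{\lambda_i}\,\sqrt{c_{jj}}}
 = \frac{a_{ij}\sqrt{\lambda_i}}{\sqrt{c_{jj}}}.
\end{aligned}
```

When the correlation matrix is analyzed, every $c_{jj} = 1$ and the correlation reduces to $a_{ij}\sqrt{\lambda_i}$.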
Examples
- Determination of the number of principal components
  - Depends upon the needs of practitioners
  - The proportion of the total variation explained by the selected principal components should be high, e.g., at least 80%
  - If the correlation matrix is used, select the principal components with variance greater than 1, because they account for more variation than any single original variable (variance = 1)
  - Use a scree plot
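
The two numerical rules are easy to express as functions. The sketch below is illustrative only: the function names are made up, and the eigenvalues are the five component variances of the female sparrow example.

```python
def n_by_cumulative(eigenvalues, threshold=0.80):
    """Smallest k such that the first k components explain >= threshold."""
    total = sum(eigenvalues)
    running = 0.0
    for k, lam in enumerate(eigenvalues, start=1):
        running += lam
        if running / total >= threshold:
            return k
    return len(eigenvalues)

def n_by_variance_above_one(eigenvalues):
    """Number of components with variance above 1 (correlation-matrix PCA)."""
    return sum(1 for lam in eigenvalues if lam > 1.0)

# Component variances of the female sparrow example, largest first.
sparrow = [3.616, 0.532, 0.386, 0.302, 0.165]

print("80% rule:       ", n_by_cumulative(sparrow))
print("variance > 1:   ", n_by_variance_above_one(sparrow))
```

For these values the two rules do not agree, which is one reason the choice is left to the needs of the practitioner.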
Examples
- Evaluation of Statistics Course
  - 16 students for 11 items (variables)
  - Evaluation scales: 1 (poor or not at all) to 5 (excellent, strongly, or difficult)
  - The first two principal components explain 76.0% of the total variation, and the last four principal components explain only 2.2%
Examples
Test scores of 10 students in 4 subjects
Student  Chinese (X1)  English (X2)  Math (X3)  Social (X4)
1        85            76            60         85
2        90            95            80         72
3        60            45            38         80
4        70            65            60         76
5        68            56            70         70
6        77            80            65         68
7        50            30            40         80
8        80            70            60         66
9        85            75            65         84
10       55            60            40         50
Source: Shen (1998)
Examples
Correlation Matrix
      X1      X2      X3       X4
X1    1       0.8846  0.8375   0.2784
X2            1       0.8059  -0.1101
X3                    1        0.1118
X4                             1
Examples
Eigenvalues and Eigenvectors
                                   Eigenvector
Eigenvalue  Prop.   Cum. Prop.  X1       X2       X3       X4
2.70159     0.6754  0.6754      0.5897   0.5682   0.5657   0.0969
1.06380     0.2660  0.9414      0.1254  -0.2651  -0.0281   0.9556
0.19870     0.0497  0.9910      0.3592   0.4378  -0.8227   0.0501
0.03591     0.0090  1.0000     -0.7124   0.6444   0.0485   0.2737
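
The leading eigenvalue and its eigenvector can be reproduced from the correlation matrix of the four subjects. The sketch below uses power iteration, a simple technique chosen here for illustration rather than the method used to produce the table, and relies only on the Python standard library.

```python
from math import sqrt

# Correlation matrix of the four subjects (X1..X4), filled in symmetrically.
R = [
    [1.0000, 0.8846, 0.8375, 0.2784],
    [0.8846, 1.0000, 0.8059, -0.1101],
    [0.8375, 0.8059, 1.0000, 0.1118],
    [0.2784, -0.1101, 0.1118, 1.0000],
]

def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def power_iteration(M, iterations=200):
    """Largest eigenvalue and unit-length eigenvector of a symmetric matrix."""
    v = [1.0] * len(M)
    for _ in range(iterations):
        w = matvec(M, v)
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]             # keep the vector at unit length
    lam = sum(x * y for x, y in zip(v, matvec(M, v)))  # Rayleigh quotient
    return lam, v

lam1, a1 = power_iteration(R)
print("largest eigenvalue:", round(lam1, 4))
print("eigenvector a1:    ", [round(x, 4) for x in a1])
print("proportion:        ", round(lam1 / len(R), 4))
```

Dividing the eigenvalue by 4, the number of standardized variables, gives the proportion of total variation for the first component.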
Examples
- Because the first two principal components account for 94.14% of the total variation, we can use just these two principal components
- The first principal component can be interpreted as an index for the sum of the Chinese, English and math scores
- The second principal component can be thought of as social science
Examples
- The above results can also be obtained by inspecting the correlation matrix
- Correlations among Chinese, English, and math exceed 0.8
- Correlations of Chinese, English, and math with social science are below 0.3
Examples
Correlations between the first principal component and the original variables
- $\mathrm{Corr}(Z_1, X_1) = a_{11}\sqrt{\lambda_1} = 0.5897\sqrt{2.70159} = 0.9692$
- $\mathrm{Corr}(Z_1, X_2) = a_{12}\sqrt{\lambda_1} = 0.5682\sqrt{2.70159} = 0.9339$
- $\mathrm{Corr}(Z_1, X_3) = a_{13}\sqrt{\lambda_1} = 0.5657\sqrt{2.70159} = 0.9298$
- $\mathrm{Corr}(Z_1, X_4) = a_{14}\sqrt{\lambda_1} = 0.0969\sqrt{2.70159} = 0.1592$
Examples
Correlations between the second principal component and the original variables
- $\mathrm{Corr}(Z_2, X_1) = a_{21}\sqrt{\lambda_2} = 0.1254\sqrt{1.0638} = 0.1294$
- $\mathrm{Corr}(Z_2, X_2) = a_{22}\sqrt{\lambda_2} = -0.2651\sqrt{1.0638} = -0.2734$
- $\mathrm{Corr}(Z_2, X_3) = a_{23}\sqrt{\lambda_2} = -0.0281\sqrt{1.0638} = -0.0290$
- $\mathrm{Corr}(Z_2, X_4) = a_{24}\sqrt{\lambda_2} = 0.9556\sqrt{1.0638} = 0.9856$
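
These correlations are the eigenvector elements scaled by the square root of the eigenvalue, which is simple to tabulate. A minimal sketch, with illustrative variable names and the eigenvalues and coefficients quoted on the two slides above:

```python
from math import sqrt

eigenvalues = [2.70159, 1.06380]
eigenvectors = [
    [0.5897, 0.5682, 0.5657, 0.0969],    # a1: coefficients of Z1
    [0.1254, -0.2651, -0.0281, 0.9556],  # a2: coefficients of Z2
]
subjects = ["Chinese", "English", "Math", "Social"]

# Corr(Z_i, X_j) = a_ij * sqrt(lambda_i) when the correlation matrix is used.
loadings = [
    [a * sqrt(lam) for a in vec]
    for lam, vec in zip(eigenvalues, eigenvectors)
]

for i, row in enumerate(loadings, start=1):
    for name, r in zip(subjects, row):
        print(f"Corr(Z{i}, {name}) = {r:.4f}")
```

The pattern supports the interpretation of the components: the first correlates strongly with Chinese, English and math, the second almost only with social science.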
Examples
Principal component scores
Student  1st Component  2nd Component
1         0.91883        1.12685
2         2.58868       -0.41488
3        -1.85920        0.84509
4         0.03527        0.23932
5         0.01741       -0.21745
6         0.92643       -0.65337
7        -2.67248        0.96553
8         0.52758       -0.65459
9         1.32646        0.92471
10       -1.80897       -0.16121
Examples
Female sparrows: the correlation matrix and the component coefficients are those given in the Introduction.
Examples
- The first principal component: $Z_1 = 0.452X_1 + 0.462X_2 + 0.451X_3 + 0.471X_4 + 0.398X_5$
  - An index of bird size
- The second principal component: $Z_2 = -0.051X_1 + 0.300X_2 + 0.325X_3 + 0.185X_4 - 0.877X_5$
  - An index of bird shape
Examples
- The value of the first principal component for the first bird:
  $Z_1 = 0.452(-0.542) + 0.462(0.725) + 0.451(0.177) + 0.471(0.055) + 0.398(-0.33) = 0.064$
- The value of the second principal component for the first bird:
  $Z_2 = -0.051(-0.542) + 0.300(0.725) + 0.325(0.177) + 0.185(0.055) + (-0.877)(-0.33) = 0.602$
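
Each component value is a dot product of the coefficient vector with the bird's measurements. The sketch below repeats the calculation for the first bird; the dictionary layout and names are illustrative choices.

```python
# Coefficients of the first two components (sparrow example).
coefficients = {
    "Z1": [0.452, 0.462, 0.451, 0.471, 0.398],
    "Z2": [-0.051, 0.300, 0.325, 0.185, -0.877],
}

# Values of X1..X5 for the first bird, as used on the slide.
first_bird = [-0.542, 0.725, 0.177, 0.055, -0.33]

# Score = sum over variables of coefficient * value.
scores = {
    name: sum(a * x for a, x in zip(coefs, first_bird))
    for name, coefs in coefficients.items()
}

for name, value in scores.items():
    print(f"{name} = {value:.3f}")
```

Applying the same dot products to every bird gives the full table of component scores.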
Examples
             Mean                     Standard Deviation
Component    Survivor  Nonsurvivor    Survivor  Nonsurvivor
1            -0.100     0.075         1.506     2.176
2             0.004    -0.003         0.684     0.776
3            -0.140     0.105         0.522     0.677
4             0.073    -0.055         0.563     0.543
5             0.023    -0.017         0.411     0.408
Examples
Employment in European Countries (correlation matrix)
       AGR     MIN     MAN     PS      CON     SER     FIN     SPS     TC
AGR    1.000
MIN    0.316   1.000
MAN   -0.254  -0.672   1.000
PS    -0.382  -0.387   0.388   1.000
CON   -0.349  -0.129  -0.034   0.165   1.000
SER   -0.605  -0.407  -0.033   0.155   0.473   1.000
FIN   -0.176  -0.248  -0.274   0.094  -0.018   0.379   1.000
SPS   -0.811  -0.316   0.050   0.238   0.072   0.388   0.166   1.000
TC    -0.487   0.045   0.243   0.105  -0.055  -0.085  -0.391   0.475   1.000
Examples
- 9 eigenvalues: 3.112 (34.6%), 1.809 (20.1%), 1.496 (16.6%), 1.063 (11.8%), 0.710 (7.9%), 0.311 (3.5%), 0.293 (3.3%), 0.204 (2.4%), and 0 (0.0%)
- The employment percentages of each country sum to 100%
- The columns of the correlation matrix are therefore linearly dependent
- As a result, the last eigenvalue is 0
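
The zero eigenvalue is a general consequence of rows that add up to a constant, and it can be reproduced with synthetic data. The sketch below generates hypothetical percentages for 30 cases and 9 categories (random numbers, not the European employment data) and inspects the eigenvalues of their correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical positive amounts, converted so each row sums to 100%.
raw = rng.uniform(1.0, 10.0, size=(30, 9))
pct = 100.0 * raw / raw.sum(axis=1, keepdims=True)

# Correlation matrix of the 9 percentage columns and its eigenvalues,
# sorted from largest to smallest.
R = np.corrcoef(pct, rowvar=False)
eigvals = np.linalg.eigvalsh(R)[::-1]

print("row sums:   ", np.round(pct.sum(axis=1)[:3], 6))
print("eigenvalues:", np.round(eigvals, 3))
```

Because one column is determined by the other eight, the data really have at most eight dimensions, and the smallest eigenvalue is zero up to floating-point error.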
Examples
- Selecting the principal components with eigenvalues greater than 1 gives the first 4 principal components, which explain about 83% of the total variation in the data
- The first two principal components alone account for only 55% of the total variation
Examples
- The first principal component:
  $Z_1 = 0.51(\mathrm{AGR}) + 0.37(\mathrm{MIN}) - 0.25(\mathrm{MAN}) - 0.31(\mathrm{PS}) - 0.22(\mathrm{CON}) - 0.38(\mathrm{SER}) - 0.13(\mathrm{FIN}) - 0.42(\mathrm{SPS}) - 0.21(\mathrm{TC})$
  - A contrast between AGR (agriculture, forestry, and fishing) and MIN (mining and quarrying) versus the other sectors
Examples
- The second principal component:
  $Z_2 = -0.02(\mathrm{AGR}) + 0.00(\mathrm{MIN}) + 0.43(\mathrm{MAN}) + 0.11(\mathrm{PS}) - 0.24(\mathrm{CON}) - 0.41(\mathrm{SER}) - 0.55(\mathrm{FIN}) + 0.05(\mathrm{SPS}) + 0.52(\mathrm{TC})$
  - A contrast between MAN (manufacturing) and TC (transport and communication) versus CON (construction), SER (service industry) and FIN (finance)
Summary
- Principal components are linear combinations of the original variables
- PCA tries to reduce a large number of variables to a few index variables
- The index variables are uncorrelated and ordered by the magnitude of their variation
- Illustration with real examples