Introduction to SAS Essentials: Mastering SAS for Data Analytics
Alan Elliott and Wayne Woodward

SAS ESSENTIALS -- Elliott & Woodward

Chapter 17: FACTOR ANALYSIS

LEARNING OBJECTIVES
• To be able to perform an exploratory factor analysis using PROC FACTOR
• To be able to use PROC FACTOR to identify underlying factors or latent variables in a data set
• To be able to use PROC FACTOR to rotate factors for improved interpretation
• To be able to use PROC FACTOR to compute factor scores

Factor Analysis
• Factor analysis is a dimension reduction technique that expresses the observed variables in terms of a smaller number of underlying latent variables (factors).
• Exploratory factor analysis involves identifying factors, determining how many factors are needed to describe the original data satisfactorily, and interpreting what those factors mean.
• Confirmatory factor analysis involves testing hypotheses about a factor structure that is specified in advance from theory.

17.1 FACTOR ANALYSIS BASICS
• The typical steps in performing an exploratory factor analysis are:
(a) Compute a correlation (or covariance) matrix for the observed variables.
(b) Extract the factors. This involves deciding how many factors to extract, which extraction method to use, and which values to use for the prior communality estimates.
(c) Rotate the factors to improve interpretation.
(d) Compute factor scores (if needed).
• Factor analysis solutions are not unique, and several of the choices above are subjective. Consequently, there is a certain amount of "art" involved in any factor analysis solution.
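A minimal sketch of how steps (a) through (d) map onto SAS code. It assumes a hypothetical data set WORK.MYDATA with numeric variables X1-X6; these names are placeholders, not from the book, and two factors are assumed only for illustration.

```sas
/* Step (a): inspect the correlations among the observed variables.  */
/* PROC FACTOR computes its own correlation matrix, so this step is   */
/* for inspection only. WORK.MYDATA and X1-X6 are hypothetical.       */
PROC CORR DATA=WORK.MYDATA NOPROB;
   VAR X1-X6;
RUN;

/* Steps (b)-(d) in one PROC FACTOR call:                             */
/*   (b) extract two factors by the principal factor method, using    */
/*       squared multiple correlations as prior communalities         */
/*   (c) apply a Varimax rotation                                     */
/*   (d) write the data plus factor scores to WORK.SCORES             */
PROC FACTOR DATA=WORK.MYDATA METHOD=PRINCIPAL PRIORS=SMC
            NFACTORS=2 ROTATE=VARIMAX OUT=WORK.SCORES;
   VAR X1-X6;
RUN;
```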

Using PROC FACTOR
• The SAS procedure used to perform exploratory factor analysis is PROC FACTOR. A simplified syntax for this procedure is:
    PROC FACTOR <options>;
        VAR variables;
        PRIORS communalities;
    RUN;
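The PRIORS statement supplies prior communality values directly, one per variable, in the same order as the VAR statement. A sketch using the six intelligence variables from the hands-on exercise later in this chapter; the prior values are arbitrary illustrative numbers, not estimates from the book.

```sas
/* Principal factor analysis with user-supplied prior communalities. */
/* The values below are illustrative only.                           */
PROC FACTOR DATA=MYSASLIB.INTEL METHOD=PRINCIPAL NFACTORS=2;
   VAR COMPUTATION VOCABULARY INFERENCE REASONING WRITING GRAMMAR;
   /* One value per VAR variable, same order, each between 0 and 1   */
   PRIORS 0.6 0.5 0.6 0.4 0.5 0.5;
RUN;
```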

Table 17.1 Common Options for PROC FACTOR
Option           Explanation
DATA=dataname    Specifies which data set to use.
METHOD=option    Specifies the estimation method. Options include ML and PRINCIPAL.
MINEIGEN=n       Specifies the smallest eigenvalue for retaining a factor.
NFACTORS=n       Specifies the maximum number of factors to retain.
NOPRINT          Suppresses output.
PRIORS=option    Specifies the method for obtaining prior communalities.
ROTATE=name      Specifies the rotation method. The default is ROTATE=NONE. Common rotation methods are VARIMAX, QUARTIMAX, EQUAMAX, and PROMAX. All of these are orthogonal rotations except PROMAX, which is oblique.
SCREE            Displays a scree plot of the eigenvalues.
SIMPLE           Displays means, standard deviations, and number of observations.
CORR             Displays the correlation matrix.
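Two ways of controlling how many factors are kept, sketched with the chapter's MYSASLIB.INTEL data set (variable names taken from the hands-on exercise). MINEIGEN=1 is the familiar eigenvalue-greater-than-one rule, shown here with PRIORS=ONE (principal components); NFACTORS= fixes the count directly.

```sas
/* Keep components whose eigenvalue is at least 1.                   */
/* PRIORS=ONE gives a principal component analysis.                  */
PROC FACTOR DATA=MYSASLIB.INTEL METHOD=PRINCIPAL PRIORS=ONE
            MINEIGEN=1 SIMPLE CORR;
   VAR COMPUTATION VOCABULARY INFERENCE REASONING WRITING GRAMMAR;
RUN;

/* Fix the number of factors at two and request a scree plot.        */
PROC FACTOR DATA=MYSASLIB.INTEL METHOD=PRINCIPAL PRIORS=SMC
            NFACTORS=2 SCREE;
   VAR COMPUTATION VOCABULARY INFERENCE REASONING WRITING GRAMMAR;
RUN;
```

Because NFACTORS= is a maximum, PROC FACTOR may still retain fewer factors if another retention criterion stops it first.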

Common Statements for PROC FACTOR (Table 17.1 Continued)
Statement                   Explanation
VAR variable list;          Specifies the numeric variables to be analyzed. The default is to use all numeric variables.
BY, FORMAT, LABEL, WHERE    These statements are common to most procedures and may be used here.
• NOTE: If the METHOD=PRINCIPAL option is used, then a principal component analysis is performed when the PRIORS= option is not used or is set to ONE (the default).
• If you specify a PRIORS= value other than PRIORS=ONE, then a principal factor analysis is performed.
• A common choice is PRIORS=SMC, in which case the prior communality for each variable is the squared multiple correlation of that variable with all other variables.
• After the factors are extracted, the communalities represent the proportion of the variance in each of the original variables that is retained by the extracted factors.
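For reference, the squared multiple correlation has a closed form in terms of the inverse of the correlation matrix R. The notation here (p variables, and r^{ii} for the i-th diagonal element of R^{-1}) is introduced for this illustration and is not the book's.

```latex
\mathrm{SMC}_i \;=\; R^2_{i \,\cdot\, \text{rest}} \;=\; 1 - \frac{1}{r^{ii}},
\qquad i = 1, \dots, p
```

When R is nonsingular, 0 ≤ SMC_i < 1, so these priors are always below the value of 1 used by PRIORS=ONE; replacing the ones on the diagonal of R with smaller values is what distinguishes the principal factor method from principal components.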

Do Hands-On Exercise p. 379 (AFACTOR1.SAS)
• Two of the types of intelligence are Logical-Mathematical Intelligence and Linguistic Intelligence. In this example, we examine a hypothetical data set that contains six variables, each measured on a 0 to 10 scale:
• COMPUTATION - A test of mathematical computation
• VOCABULARY - A vocabulary test
• INFERENCE - A test of the use of inductive and deductive inference
• REASONING - A test of sequential reasoning
• WRITING - A score on a writing sample
• GRAMMAR - A test of proper grammar usage

Using PROC FACTOR
    PROC FACTOR DATA=MYSASLIB.INTEL
        SIMPLE CORR        /* displays common statistics */
        METHOD=PRINCIPAL   /* specifies the estimation method */
        PRIORS=SMC         /* specifies the method for obtaining prior communalities */
        ROTATE=VARIMAX     /* specifies the rotation method */
        PLOTS=SCREE;       /* requests a scree plot */
    RUN;

Observe Output From PROC FACTOR
• Simple Statistics
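To find the names of the tables that make up this output (for example, to keep only some of them), ODS TRACE can list them in the SAS log. A minimal sketch using the chapter's INTEL data; the variable list is taken from the exercise description.

```sas
/* Write the name and label of each output table to the SAS log */
ODS TRACE ON;
PROC FACTOR DATA=MYSASLIB.INTEL SIMPLE CORR
            METHOD=PRINCIPAL PRIORS=SMC ROTATE=VARIMAX;
   VAR COMPUTATION VOCABULARY INFERENCE REASONING WRITING GRAMMAR;
RUN;
ODS TRACE OFF;
```

Once the table names appear in the log, an ODS SELECT statement listing them can be placed before PROC FACTOR so that only those tables are displayed.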

Correlation Matrix for Six Variables
• The high pairwise correlations among COMPUTATION, INFERENCE, and (to a lesser extent) REASONING suggest that these variables tend to measure Math Intelligence, while VOCABULARY, WRITING, and GRAMMAR, which appear to measure Linguistic Intelligence, are also positively correlated with one another.

Prior Communality Estimates
• Because we specified METHOD=PRINCIPAL and PRIORS=SMC, SAS uses the principal factor method, in which the prior communality estimate for each variable is the squared multiple correlation of that variable with all other variables. These prior communality estimates are given in this table.
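Because a squared multiple correlation is the R-square from regressing one variable on all the others, any prior communality in this table can be checked with PROC REG. A sketch for COMPUTATION, assuming the INTEL data set has no missing values in these six variables (so both procedures use the same observations):

```sas
/* The R-Square reported here should match the prior communality    */
/* estimate for COMPUTATION (up to rounding).                        */
PROC REG DATA=MYSASLIB.INTEL;
   MODEL COMPUTATION = VOCABULARY INFERENCE REASONING WRITING GRAMMAR;
RUN;
QUIT;
```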

Scree Plot
• The scree plot gives a visual illustration of the sizes of the eigenvalues. It is clear that there are two dominant eigenvalues.

Eigenvalues
• This table displays the eigenvalues associated with the factors, based on the reduced correlation matrix. There are clearly two dominant eigenvalues (2.319 and 1.725), so by any reasonable criterion a two-factor solution should be used.
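The principal factor method behind this table can be written compactly. In illustrative notation (not the book's), R is the p × p correlation matrix, c_i is the prior communality of variable i (the SMC here), v_k are unit-length eigenvectors, and m is the number of retained factors. The diagonal of R is replaced by the priors, the resulting reduced matrix is eigen-decomposed, and the loadings on factor j are the j-th eigenvector scaled by the square root of its eigenvalue:

```latex
\begin{aligned}
R_c &= R - I + \operatorname{diag}(c_1, \dots, c_p) \\
R_c &= \sum_{k=1}^{p} \lambda_k \, \mathbf{v}_k \mathbf{v}_k^{\top},
  \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \\
\boldsymbol{\ell}_j &= \sqrt{\lambda_j}\, \mathbf{v}_j, \qquad j = 1, \dots, m \\
\sum_{k=1}^{p} \lambda_k &= \operatorname{tr}(R_c) = \sum_{i=1}^{p} c_i
\end{aligned}
```

Because the reduced matrix need not be positive semidefinite, some of the smaller eigenvalues can be negative. The last identity shows why an eigenvalue's share of the common variance can be computed as λ_j divided by the sum of the priors.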

Communality Estimates
• The communalities in this table are the proportion of the variance in each of the original variables retained by the extracted factors. All six variables appear to be sufficiently well represented by the two factors; REASONING has the smallest communality, 0.335.
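In illustrative notation (ℓ_ij for the loading of variable i on factor j, L for the p × m loading matrix, m retained factors), the final communality is the sum of squared loadings in one row of the factor pattern matrix. An orthogonal rotation T leaves L Lᵀ, and therefore every communality on its diagonal, unchanged:

```latex
\begin{aligned}
h_i^2 &= \sum_{j=1}^{m} \ell_{ij}^2, \qquad 1 - h_i^2 = \text{unique variance of variable } i \\
L^{*} &= L T,\quad T T^{\top} = I
  \;\Rightarrow\; L^{*} L^{*\top} = L T T^{\top} L^{\top} = L L^{\top}
\end{aligned}
```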

Factor Pattern Matrix
• In this table, every variable has a positive coefficient on Factor 1, ranging from 0.41 for REASONING to 0.77 for WRITING.
• A reasonable interpretation of this factor is that it is an overall measure of intelligence.
• The second factor (Factor 2) has negative loadings on the variables measuring Linguistic Intelligence and positive loadings on the others.

Interpreting the Factor Analysis Results
• Because these factors are less interpretable than we would like, we use a rotation in the hope of producing more interpretable results. (Recall that, by construction, there should be two factors: Math Intelligence and Linguistic Intelligence.)
• The option ROTATE=VARIMAX instructs SAS to perform a Varimax rotation.
• SAS provides several rotation options. Varimax is a popular "orthogonal rotation": the rotated factors remain orthogonal (uncorrelated), and the rotation pushes each factor's loadings toward either large or near-zero values, which often makes the factors easier to interpret.
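The Varimax criterion can be stated in its raw form (shown without the row normalization, known as Kaiser normalization, that is commonly applied). In illustrative notation, L is the unrotated p × m loading matrix and T an orthogonal rotation; Varimax chooses T to maximize V, the sum over factors of the variance of the squared loadings in each column:

```latex
\begin{aligned}
L^{*} &= L T, \qquad T^{\top} T = I \\
V(T) &= \sum_{j=1}^{m} \left[ \frac{1}{p} \sum_{i=1}^{p} \left(\ell^{*}_{ij}\right)^{4}
  - \left( \frac{1}{p} \sum_{i=1}^{p} \left(\ell^{*}_{ij}\right)^{2} \right)^{2} \right]
\end{aligned}
```

A column with a few large loadings and many near-zero loadings has a high variance of squared loadings, which is why maximizing V tends to produce factors that are easier to name.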

Interpreting the Rotated Factor Pattern Matrix
• In this table, the coefficients for COMPUTATION are the correlations of the variable COMPUTATION with each of the two factors. (Because Varimax is an orthogonal rotation, the loadings equal these correlations.)
• There is a large positive correlation between COMPUTATION and Factor 2 and a very small correlation between COMPUTATION and Factor 1.
• Similar interpretations show that Factor 1 is highly correlated with the three variables measuring Linguistic Intelligence, while Factor 2 tends to correspond to Math Intelligence.
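Reading a rotated pattern is easier when variables are grouped by factor and small loadings are hidden. A sketch using the REORDER and FUZZ= options of PROC FACTOR; the 0.3 cutoff is an arbitrary illustrative choice, not a value from the book.

```sas
/* REORDER lists variables grouped by the factor on which they load  */
/* highest; FUZZ=0.3 prints loadings with absolute value below 0.3   */
/* as missing, so the large loadings stand out.                      */
PROC FACTOR DATA=MYSASLIB.INTEL METHOD=PRINCIPAL PRIORS=SMC NFACTORS=2
            ROTATE=VARIMAX REORDER FUZZ=0.3;
   VAR COMPUTATION VOCABULARY INFERENCE REASONING WRITING GRAMMAR;
RUN;
```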

Storing Factor Scores
• Suppose you want to calculate factor scores and save them in a temporary SAS data set named FSCORE. To accomplish this, add the following options to the PROC FACTOR statement, before PLOTS=SCREE:
        SCORE         /* displays the factor scoring coefficients */
        NFACTORS=2    /* number of factors; OUT= requires NFACTORS= */
        OUT=FSCORE    /* outputs a SAS data set named FSCORE */
• Then, after the RUN; statement, add the code:
    PROC PRINT DATA=FSCORE;
        VAR FACTOR1 FACTOR2;
    RUN;
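Factor scores can also be computed for observations that were not part of the original analysis. A sketch, assuming a hypothetical data set WORK.NEWSUBJ containing the same six test variables: OUTSTAT= saves the scoring coefficients along with the means and standard deviations, and PROC SCORE uses those to standardize and score the new data.

```sas
/* Fit the two-factor Varimax solution once and save its statistics, */
/* including scoring coefficients, in WORK.FACTSTAT.                 */
PROC FACTOR DATA=MYSASLIB.INTEL METHOD=PRINCIPAL PRIORS=SMC
            NFACTORS=2 ROTATE=VARIMAX SCORE OUTSTAT=FACTSTAT NOPRINT;
   VAR COMPUTATION VOCABULARY INFERENCE REASONING WRITING GRAMMAR;
RUN;

/* Apply the saved coefficients to new subjects.                     */
/* WORK.NEWSUBJ is a hypothetical data set used for illustration.    */
PROC SCORE DATA=WORK.NEWSUBJ SCORE=FACTSTAT OUT=NEWSCORES;
   VAR COMPUTATION VOCABULARY INFERENCE REASONING WRITING GRAMMAR;
RUN;
```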

Results of OUT=FSCORE
• The two factor scores are given the default names FACTOR1 and FACTOR2 (the prefix "FACTOR" can be changed using the PREFIX= option).
• Recalling that Factor 1 is a measure of Linguistic Intelligence and Factor 2 measures Math Intelligence, the factor scores show that Subject 1 has a higher Linguistic Intelligence score, Subject 2 seems to have high Math Intelligence, and Subject 3 unfortunately does not seem to have strength in either dimension.
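A sketch of the PREFIX= option with the same two-factor analysis. FS is an arbitrary prefix chosen for illustration, so the scores are written as FS1 and FS2; the OBS=3 data set option limits the listing to the first three subjects discussed above.

```sas
/* Name the factor score variables FS1 and FS2 instead of FACTOR1/2 */
PROC FACTOR DATA=MYSASLIB.INTEL METHOD=PRINCIPAL PRIORS=SMC NFACTORS=2
            ROTATE=VARIMAX OUT=FSCORE PREFIX=FS NOPRINT;
   VAR COMPUTATION VOCABULARY INFERENCE REASONING WRITING GRAMMAR;
RUN;

/* Show the scores for the first three subjects only */
PROC PRINT DATA=FSCORE(OBS=3);
   VAR FS1 FS2;
RUN;
```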

Do Hands-On Example p. 386 (AFACTOR2.SAS)
• Olympic Data
• This data set contains scores of 193 athletes who completed all 10 decathlon events in the 1988 through 2012 Olympic Games.
• The 10 events in the decathlon are the 100-m run, long jump, shot put, high jump, 400-m run, 110-m hurdles, discus, pole vault, javelin, and 1500-m run.
• These events measure a wide variety of athletic abilities, and in this example we use the decathlon data set to explore whether there are some underlying dimensions of athletic ability.
• Note that the times in the running events are given negative signs so that larger values are better than smaller values, as is the case in the distance measurements.
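The sign convention for running times can be applied in a DATA step. A sketch, assuming a hypothetical raw data set WORK.DECATHLON_RAW in which the four running events are stored as positive times in seconds under the variable names used in this chapter's program:

```sas
DATA WORK.DECATHLON;
   SET WORK.DECATHLON_RAW;  /* hypothetical raw data, times positive */
   /* For running events a smaller time is better, so flip the sign  */
   /* to make larger values better, as for the distance events.      */
   ARRAY TIMES{4} RUN100 RUN400 HURDLES RUN1500S;
   DO I = 1 TO 4;
      TIMES{I} = -TIMES{I};
   END;
   DROP I;
RUN;
```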

Factor Analysis Code for Olympic Data
    PROC FACTOR SIMPLE CORR DATA=MYSASLIB.OLYMPIC
        METHOD=PRINCIPAL
        MSA             /* Kaiser's measure of sampling adequacy */
        PRIORS=SMC ROTATE=VARIMAX
        OUTSTAT=FACT    /* saves the factor analysis statistics */
        ALL             /* displays all optional output except plots */
        PLOTS=SCREE;
        VAR RUN100 LONGJUMP SHOTPUT HIGHJUMP RUN400
            HURDLES DISCUS POLEVAULT JAVELIN RUN1500S;
    RUN;

Simple Statistics for Olympic Data
• As mentioned earlier, times in the running events are given negative signs so that larger values are better than smaller values, as is the case in the distance measurements.
• Moreover, the 1500-m results are given in (negative) seconds rather than in the usual minutes-and-seconds format.

Correlations for Olympic Data
• There are positive correlations between speed events such as the 100-m run and the 110-m hurdles (0.692) and between the strength events SHOTPUT and DISCUS (0.748).
• The 1500-m run is not highly correlated with any of the other events; for example, its correlation with the 400-m run is 0.368.
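To see each event's strongest correlations at a glance instead of scanning the full matrix, the BEST= option of PROC CORR lists only the largest correlations (in absolute value) for each variable. A sketch using the variable names from the factor analysis program; BEST=3 is an arbitrary choice.

```sas
/* List the three largest correlations for each event, no p-values */
PROC CORR DATA=MYSASLIB.OLYMPIC BEST=3 NOPROB;
   VAR RUN100 LONGJUMP SHOTPUT HIGHJUMP RUN400
       HURDLES DISCUS POLEVAULT JAVELIN RUN1500S;
RUN;
```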

Communality Estimates, Olympic Data
• Since we specified METHOD=PRINCIPAL and PRIORS=SMC, SAS uses the principal factor method, in which the prior communality estimate for each variable is the squared multiple correlation of that variable with all other variables. This table shows the prior communality estimates (slightly rearranged from the original output).

Eigenvalues for Olympic Data
• The eigenvalues table shows factors based on the reduced correlation matrix. PROC FACTOR selected three factors, and both the eigenvalue table and the scree plot show that there are three dominant eigenvalues.

• The communalities in this table (rearranged slightly from the output) are the proportion of the variance in each of the original variables retained by the extracted factors.
• All 10 events appear to be fairly well represented by the three factors, with all communalities above 0.33.
• However, HIGHJUMP, POLEVAULT, JAVELIN, and RUN1500S all have communalities below 0.4.

Factor Patterns
• As was the case for the unrotated solution for the Intelligence data, every event has a positive coefficient on Factor 1, and all of these coefficients are above 0.4 except for RUN1500S, which has a coefficient of 0.17.
• A reasonable interpretation is that Factor 1 measures overall athletic ability, primarily related to the first nine events. Factors 2 and 3 are more difficult to interpret.

Use ROTATE=VARIMAX
• Because the three-factor solution in the previous table is difficult to interpret, we again use a rotation to produce more interpretable results.
• Using the option ROTATE=VARIMAX produces the rotated factor pattern matrix given on the following slide.
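Varimax forces the factors to stay uncorrelated, yet speed, strength, and endurance abilities may well be related, so an oblique rotation is a reasonable comparison. A sketch using ROTATE=PROMAX (listed in Table 17.1), with the number of factors fixed at the three that PROC FACTOR selected here:

```sas
/* Oblique (Promax) rotation of the three-factor solution, for       */
/* comparison with the Varimax results.                              */
PROC FACTOR DATA=MYSASLIB.OLYMPIC METHOD=PRINCIPAL PRIORS=SMC
            NFACTORS=3 ROTATE=PROMAX;
   VAR RUN100 LONGJUMP SHOTPUT HIGHJUMP RUN400
       HURDLES DISCUS POLEVAULT JAVELIN RUN1500S;
RUN;
```

With an oblique rotation, the output also includes correlations among the factors; if those are small, the Varimax and Promax interpretations should be broadly similar.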

Rotated Factor Patterns
• The first rotated factor seems to focus on the 100-m run, long jump, 400-m run, and 110-m hurdles, events that involve speed and spring.
• Factor 2 seems to be primarily an arm strength factor, with high coefficients for shot put and discus and lesser ones for javelin, pole vault, and high jump.
• The only event with a large coefficient on Factor 3 is the 1500-m run. This is consistent with the correlation matrix, which suggested that the 1500-m run was "different" from the other events.

17.2 SUMMARY
• In this chapter, we have discussed methods for using PROC FACTOR to perform exploratory factor analysis. In the Hands-on Examples, we have illustrated the use of rotation to obtain more understandable results.
• Continue to Chapter 18: CREATING CUSTOM GRAPHS

These slides are based on the book: Introduction to SAS Essentials: Mastering SAS for Data Analytics, 2nd Edition, by Alan C. Elliott and Wayne A. Woodward
Paperback: 512 pages
Publisher: Wiley; 2nd edition (August 3, 2015)
Language: English
ISBN-10: 111904216X
ISBN-13: 978-1119042167
These slides are provided for you to use to teach SAS using this book. Feel free to modify them for your own needs. Please send comments about errors in the slides (or suggestions for improvements) to acelliott@smu.edu. Thanks.