# Principal component analysis: strategy for solving problems, sample problem

Principal component analysis

- Strategy for solving problems
- Sample problem
- Steps in principal component analysis

Principal components factor analysis

Obtaining a factor solution through principal components analysis is an iterative process that usually requires repeating the SPSS factor analysis procedure a number of times to reach a satisfactory solution. We begin by identifying a group of variables whose variance we believe can be represented more parsimoniously by a smaller set of factors, or components. The end result of the principal components analysis will tell us which variables can be represented by which components, and which variables should be retained as individual variables because the factor solution does not adequately represent their information.

Strategy for solving problems - 1

A principal component factor analysis requires the following:

- The variables included must be metric level or dichotomous (dummy-coded) nominal level.
- The sample size must be greater than 50 (preferably 100).
- The ratio of cases to variables must be 5 to 1 or larger.
- The correlation matrix for the variables must contain 2 or more correlations of 0.30 or greater.
- Variables with measures of sampling adequacy less than 0.50 must be removed.
- The overall measure of sampling adequacy must be 0.50 or higher.
- The Bartlett test of sphericity must be statistically significant.

The first phase of a principal component analysis is devoted to verifying that we meet these requirements. If we do not meet these requirements, factor analysis is not appropriate.

Strategy for solving problems - 2

The second phase of a principal component factor analysis focuses on deriving a factor model, or pattern of relationships between variables and components, that satisfies the following requirements:

- The derived components explain 50% or more of the variance in each of the variables, i.e., each variable has a communality greater than 0.50.
- None of the variables has loadings, or correlations, of 0.40 or higher on more than one component, i.e., no variable has complex structure.
- None of the components has only one variable loading on it.

To meet these requirements, we remove problematic variables from the analysis and repeat the principal component analysis procedure in SPSS.

Strategy for solving problems - 3

At the conclusion of this process, we can substitute the components for the variables in further analyses if:

- the components have more than one variable loading on them,
- the components explain at least 50% of the variance in each of the included variables, and
- the components collectively explain 60% or more of the variance in the set of included variables.

Variables that were removed from the analysis should be included individually in further analyses.

Substituting components for variables

Substitution of components for individual variables is accomplished in one of two ways:

- using only the highest loading variable in place of the other variables loading on the component, or
- combining the variables loading on each component to create a new variable.
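The two substitution options can be sketched in Python as follows. The sketch makes several assumptions. Loadings are held in a plain dictionary that maps each variable name to its list of rotated loadings, one per component. All substantial loadings are positive, so no variable needs reverse-coding. The combined variable is the mean of the variables' z-scores. The helper names `surrogate_variables` and `summated_scale` are hypothetical and are not SPSS features. Averaging z-scores is only one common way to combine variables; the slides do not prescribe a method.

```python
from statistics import mean, pstdev

def surrogate_variables(loadings, threshold=0.40):
    """Hypothetical helper: for each component, pick the variable with the highest loading.

    loadings: dict mapping variable name -> list of loadings, one per component.
    Only loadings whose absolute value is at least `threshold` are considered.
    """
    n_components = len(next(iter(loadings.values())))
    best = {}
    for c in range(n_components):
        candidates = [(abs(row[c]), name) for name, row in loadings.items()
                      if abs(row[c]) >= threshold]
        if candidates:
            best[c] = max(candidates)[1]  # name paired with the largest absolute loading
    return best

def summated_scale(data, variables):
    """Hypothetical helper: average the z-scores of `variables` for each case.

    data: dict mapping variable name -> list of values (equal lengths, no missing values).
    Assumes every variable loads positively, so none needs to be reverse-coded first.
    """
    z = {}
    for v in variables:
        m, s = mean(data[v]), pstdev(data[v])
        z[v] = [(x - m) / s for x in data[v]]
    n_cases = len(data[variables[0]])
    return [mean(z[v][i] for v in variables) for i in range(n_cases)]
```

For the made-up loadings `{"a": [0.82, 0.10], "b": [0.75, 0.05], "c": [0.12, 0.88], "d": [0.20, 0.71]}`, `surrogate_variables` returns `{0: "a", 1: "c"}`.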

Notes - 2

If there are two or more components in the component matrix, the pattern of loadings is based on the SPSS Rotated Component Matrix. If there is only one component in the solution, the Rotated Component Matrix is not computed, and the pattern of loadings is based on the Component Matrix. It is possible that the analysis will break down and we will have too few variables in the analysis to support the use of principal component analysis.

Question 1

Answer 1

To answer the first question, we examine the level of measurement for each variable listed in the problem to make certain each is metric or dichotomous. In this example, all variables satisfied the level of measurement requirement. We added a caution because we are treating ordinal variables as metric.

Question 2

To answer this question, we compute the principal components analysis.

Computing a principal component analysis

To compute a principal component analysis in SPSS, select the Data Reduction | Factor… command from the Analyze menu.

Add the variables to the analysis

First, move the variables listed in the problem to the Variables list box. Second, click on the Descriptives… button to specify statistics to include in the output.

Complete the descriptives dialog box

First, mark the Univariate descriptives checkbox to get a tally of valid cases. Second, keep the Initial solution checkbox marked to get the statistics needed to determine the number of factors to extract. Third, mark the Coefficients checkbox to get a correlation matrix, one of the outputs needed to assess the appropriateness of factor analysis for the variables. Fourth, mark the KMO and Bartlett’s test of sphericity checkbox to get more outputs used to assess the appropriateness of factor analysis for the variables. Fifth, mark the Anti-image checkbox to get the anti-image matrices, which are also used to assess the appropriateness of factor analysis for the variables. Sixth, click on the Continue button.

Select the extraction method

First, click on the Extraction… button to specify the extraction method. The extraction method refers to the mathematical method that SPSS uses to compute the factors or components.

Complete the extraction dialog box

First, retain the default method, Principal components. Second, click on the Continue button.
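The principal components extraction can be sketched in pure Python. The sketch finds the eigenvalues and eigenvectors of the correlation matrix with cyclic Jacobi rotations. Loadings are the eigenvectors scaled by the square root of their eigenvalues, and communalities are the row sums of squared loadings. It assumes complete data and a small number of variables. All function names are hypothetical. SPSS's own numerical routines may differ in precision and in the sign of each component.

```python
import math

def correlation_matrix(columns):
    """Pearson correlations between equal-length columns (lists of numbers, no missing values)."""
    def corr(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / math.sqrt(sxx * syy)
    return [[corr(x, y) for y in columns] for x in columns]

def _matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def _transpose(x):
    return [list(row) for row in zip(*x)]

def jacobi_eigen(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues and eigenvectors (stored as columns) of a symmetric matrix via Jacobi rotations."""
    n = len(a)
    a = [list(row) for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation chosen so that element (p, q) becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                rot = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
                rot[p][p] = rot[q][q] = c
                rot[p][q], rot[q][p] = s, -s
                a = _matmul(_transpose(rot), _matmul(a, rot))
                v = _matmul(v, rot)
    return [a[i][i] for i in range(n)], v

def principal_components(r, n_components=None):
    """Return (eigenvalues sorted high to low, loadings, communalities) for correlation matrix r."""
    values, vectors = jacobi_eigen(r)
    order = sorted(range(len(values)), key=lambda k: values[k], reverse=True)
    values = [values[k] for k in order]
    if n_components is None:
        # Latent root criterion: keep components whose eigenvalue is greater than 1.0.
        n_components = sum(1 for ev in values if ev > 1.0)
    loadings = [[vectors[i][order[j]] * math.sqrt(max(values[j], 0.0))
                 for j in range(n_components)] for i in range(len(r))]
    communalities = [sum(x * x for x in row) for row in loadings]
    return values, loadings, communalities
```

For two variables correlated at 0.6, the eigenvalues are 1.6 and 0.4. Only the first component passes the latent root criterion, and it gives each variable a communality of 0.80.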

Select the rotation method

First, click on the Rotation… button to specify the rotation method. The rotation method refers to the mathematical method that SPSS uses to rotate the axes in geometric space. This makes it easier to determine which variables load on which components.

Complete the rotation dialog box

First, mark the Varimax method as the type of rotation to use in the analysis. Second, click on the Continue button.
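Varimax rotation can be sketched with Kaiser's pairwise-angle method. For each pair of components, the method computes the angle that maximizes the variance of the squared loadings and then rotates that pair of columns. It repeats until the angles are negligible. By default, the sketch applies Kaiser normalization: rows are scaled to unit length during rotation and rescaled afterward. We assume this corresponds to SPSS's "Varimax with Kaiser Normalization". The output is assumed to match SPSS only up to column order, column sign, and convergence tolerance; this is not verified. The function name is hypothetical.

```python
import math

def varimax(loadings, normalize=True, max_iter=100, tol=1e-8):
    """Hypothetical helper: varimax rotation using Kaiser's pairwise-angle method.

    loadings: list of rows, one per variable, each a list of unrotated loadings.
    normalize: apply Kaiser normalization (rows scaled to unit length while rotating).
    """
    p, m = len(loadings), len(loadings[0])
    # Square roots of the communalities; assumes no variable has a communality of zero.
    h = [math.sqrt(sum(x * x for x in row)) for row in loadings]
    a = [[x / h[i] if normalize else x for x in row] for i, row in enumerate(loadings)]
    for _ in range(max_iter):
        largest_angle = 0.0
        for j in range(m - 1):
            for k in range(j + 1, m):
                u = [a[i][j] ** 2 - a[i][k] ** 2 for i in range(p)]
                v = [2 * a[i][j] * a[i][k] for i in range(p)]
                big_a, big_b = sum(u), sum(v)
                big_c = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                big_d = 2 * sum(ui * vi for ui, vi in zip(u, v))
                # Angle that maximizes the varimax criterion for this pair of columns.
                phi = math.atan2(big_d - 2 * big_a * big_b / p,
                                 big_c - (big_a ** 2 - big_b ** 2) / p) / 4
                largest_angle = max(largest_angle, abs(phi))
                c, s = math.cos(phi), math.sin(phi)
                for i in range(p):
                    x, y = a[i][j], a[i][k]
                    a[i][j], a[i][k] = x * c + y * s, -x * s + y * c
        if largest_angle < tol:
            break
    return [[x * h[i] if normalize else x for x in row] for i, row in enumerate(a)]
```

Because each step is an orthogonal rotation, every variable's communality (the row sum of squared loadings) is unchanged. Only the way that variance is split across components changes.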

Complete the request for the analysis

Click on the OK button to request the output.

Sample size requirement: minimum number of cases

The number of valid cases for this set of variables is 620, so the preferred minimum sample size of 100 valid cases is satisfied. Principal component analysis can be conducted on a sample with more than 50 but fewer than 100 cases, but we should then be cautious about its interpretation.

Sample size requirement: ratio of cases to variables

The ratio of cases to variables in a principal component analysis should be at least 5 to 1. With 620 valid cases and 12 variables, the ratio of cases to variables is 51.67 to 1, which exceeds the requirement.
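The two sample size checks can be combined into one small function. The sketch follows the order of the decision steps at the end of this section: the ratio of cases to variables first, then the 50-case minimum, then the preferred 100 cases. The function name and return strings are hypothetical stand-ins for the answers used in this example.

```python
def sample_size_verdict(n_cases, n_variables, min_ratio=5.0):
    """Hypothetical helper applying this example's sample size rules, in order."""
    if n_cases / n_variables < min_ratio:
        return "Incorrect application of a statistic"  # fewer than 5 cases per variable
    if n_cases < 50:
        return "Incorrect application of a statistic"  # below the absolute minimum
    if n_cases < 100:
        return "True with caution"  # usable, but below the preferred 100 cases
    return "True"
```

With this example's figures, `sample_size_verdict(620, 12)` returns `"True"`, and `620 / 12` rounds to 51.67.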

Question 3

Appropriateness of factor analysis: Presence of substantial correlations

Principal components analysis requires two or more correlations of 0.30 or greater among the variables included in the analysis. For this set of variables, there are 7 correlations in the matrix greater than 0.30, satisfying this requirement. The correlations greater than 0.30 are highlighted in yellow.
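Counting the substantial correlations can be sketched as a scan of the upper triangle of the correlation matrix, so that each pair of variables is counted once. The sketch assumes the cutoff applies to the absolute value of a correlation, so strong negative correlations also count. The slides do not state this. The function names are hypothetical.

```python
def substantial_correlations(r, names, cutoff=0.30):
    """Hypothetical helper: list each distinct pair whose |correlation| is at least the cutoff."""
    pairs = []
    for i in range(len(r)):
        for j in range(i + 1, len(r)):  # upper triangle only: each pair counted once
            if abs(r[i][j]) >= cutoff:
                pairs.append((names[i], names[j], r[i][j]))
    return pairs

def enough_correlations(r, names, cutoff=0.30, minimum=2):
    """True when at least `minimum` distinct pairs reach the cutoff."""
    return len(substantial_correlations(r, names, cutoff)) >= minimum
```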

Appropriateness of factor analysis: Sampling adequacy of individual variables

Principal component analysis requires that the Kaiser-Meyer-Olkin Measure of Sampling Adequacy be greater than 0.50 for each individual variable as well as for the set of variables. There are two anti-image matrices: the anti-image covariance matrix and the anti-image correlation matrix. We are interested in the anti-image correlation matrix. The Measure of Sampling Adequacy (MSA) is described as marvelous if it is 0.90 or greater, meritorious if it is in the 0.80s, middling if in the 0.70s, mediocre if in the 0.60s, miserable if in the 0.50s, and unacceptable if below 0.50.

Appropriateness of factor analysis: Sampling adequacy of individual variables

SPSS reports the Measures of Sampling Adequacy on the diagonal of the anti-image correlation matrix, highlighted in yellow. In our initial analysis, the MSA for the variable "importance of ethnic identity" [ethimp] was 0.467. Since this is less than 0.50, the variable should be removed from the principal component analysis.
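Per-variable MSA and overall KMO can be sketched from the correlation matrix alone, using the usual Kaiser-Meyer-Olkin formula. A variable's MSA is the sum of its squared correlations with the other variables, divided by that same sum plus the sum of its squared partial correlations with them. The partial correlations come from the inverse of the correlation matrix. The anti-image correlations are understood to be the negatives of these partial correlations, which makes no difference once squared. All names are hypothetical, and the matrix is assumed to be invertible.

```python
def invert(m):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for k in range(n):
            if k != col:
                f = aug[k][col]
                aug[k] = [x - f * y for x, y in zip(aug[k], aug[col])]
    return [row[n:] for row in aug]

def sampling_adequacy(r):
    """Hypothetical helper: return (MSA for each variable, overall KMO) for correlation matrix r."""
    n = len(r)
    inv = invert(r)
    # Partial correlation of variables i and j, controlling for all other variables.
    q = [[-inv[i][j] / (inv[i][i] * inv[j][j]) ** 0.5 for j in range(n)] for i in range(n)]
    r2 = [[r[i][j] ** 2 if i != j else 0.0 for j in range(n)] for i in range(n)]
    q2 = [[q[i][j] ** 2 if i != j else 0.0 for j in range(n)] for i in range(n)]
    msa = [sum(r2[i]) / (sum(r2[i]) + sum(q2[i])) for i in range(n)]
    total_r2, total_q2 = sum(map(sum, r2)), sum(map(sum, q2))
    return msa, total_r2 / (total_r2 + total_q2)

def msa_label(value):
    """Descriptive label for an MSA value, using the bands listed above."""
    bands = [(0.90, "marvelous"), (0.80, "meritorious"), (0.70, "middling"),
             (0.60, "mediocre"), (0.50, "miserable")]
    for cutoff, label in bands:
        if value >= cutoff:
            return label
    return "unacceptable"
```

Applied to the values reported here, `msa_label(0.467)` gives `"unacceptable"` for ethimp. The overall MSA of 0.762 reported after its removal would be labeled `"middling"`.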

Re-running the principal components analysis

To re-run the analysis, click on the Dialog Recall button and select Factor Analysis from the pop-up menu. The dialog box from the last factor analysis run will be displayed.

Removing the variable

First, highlight the variable to be removed, ethimp. Second, click on the button with the arrow pointing left to move the highlighted variable back to the list of variables.

Producing the revised output

Click on the OK button to request the revised output.

Appropriateness of factor analysis: Sampling adequacy of individual variables

In the revised analysis, the MSA for all of the variables is now greater than 0.50, so we satisfy that requirement.

Appropriateness of factor analysis: Sampling adequacy for the set of variables

In addition, the overall MSA for the set of variables included in the analysis was 0.762, which exceeds the minimum requirement of 0.50 for overall MSA. The eleven variables in the analysis satisfy this criterion for the appropriateness of factor analysis.

Appropriateness of factor analysis: Bartlett test of sphericity

Principal component analysis requires that the probability associated with Bartlett's test of sphericity be less than the level of significance. The probability associated with the Bartlett test is p < 0.001, which satisfies this requirement. The variables now included in the analysis satisfy the screening criteria for the appropriateness of factor analysis. The next step is to determine the number of factors that should be included in the factor solution.
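Bartlett's test checks whether the correlation matrix could be an identity matrix, meaning the variables are uncorrelated. It can be sketched with the standard statistic χ² = -(n - 1 - (2p + 5)/6) ln|R|, with p(p - 1)/2 degrees of freedom, where n is the number of cases and p the number of variables. The upper-tail probability uses the closed-form chi-square survival function for integer degrees of freedom, so no statistics library is needed. The function names are hypothetical, and the correlation matrix is assumed to be positive definite.

```python
import math

def log_determinant(m):
    """ln|m| for a positive-definite matrix, by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    logdet = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]  # a swap only flips the sign; |det| > 0 here
        logdet += math.log(abs(a[col][col]))
        for k in range(col + 1, n):
            f = a[k][col] / a[col][col]
            for c in range(col, n):
                a[k][c] -= f * a[col][c]
    return logdet

def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (closed-form series)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(x) * math.sqrt(2.0 / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total

def bartlett_sphericity(r, n_cases):
    """Hypothetical helper: return (chi-square, df, p) for Bartlett's test of sphericity."""
    p = len(r)
    chi2 = -(n_cases - 1 - (2 * p + 5) / 6.0) * log_determinant(r)
    df = p * (p - 1) // 2
    return chi2, df, chi2_sf(chi2, df)
```

We compare the returned `p` with the chosen level of significance. A value below it, like the p < 0.001 reported above, satisfies the requirement.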

Question 4

Number of factors to extract

The latent root criterion for the number of factors to extract would indicate that there were 3 components to be extracted for these variables, since there were 3 eigenvalues greater than 1.0 (3.032, 1.647, and 1.272). In contrast, the cumulative proportion of variance criterion would require 4 components to satisfy the criterion of explaining 60% or more of the total variance in the original set of variables. A 4 component solution would explain 63.131% of the total variance. Since the SPSS default is to extract the number of components indicated by the latent root criterion, our initial factor solution was based on the extraction of 3 components.
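The two criteria can be compared with a short sketch. For a correlation matrix, the eigenvalues sum to the number of variables. Here, the three eigenvalues above 1.0 sum to 5.951, about 54.1% of the total variance of the 11 variables, which is why a fourth component is needed to reach 60%. The function name is hypothetical. The test below uses made-up eigenvalues, because the full set for this example is not listed.

```python
def components_to_extract(eigenvalues, variance_target=60.0):
    """Hypothetical helper comparing the latent root and cumulative-variance criteria.

    eigenvalues: all eigenvalues of the correlation matrix.
    Returns (count by latent root, count by cumulative variance, percent explained by the latter).
    """
    values = sorted(eigenvalues, reverse=True)
    total = sum(values)  # equals the number of variables for a correlation matrix
    latent_root = sum(1 for ev in values if ev > 1.0)
    cumulative = 0.0
    for k, ev in enumerate(values, start=1):
        cumulative += 100.0 * ev / total
        if cumulative >= variance_target:
            return latent_root, k, cumulative
    return latent_root, len(values), cumulative
```

For the made-up eigenvalues `[2.0, 1.2, 0.95, 0.9, 0.55, 0.4]`, the latent root criterion keeps 2 components. The cumulative-variance criterion needs 3, which together explain about 69.2% of the variance. This reproduces the kind of disagreement seen in this example.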

Answer 4

The question stated that there were 2 components, but our output indicated 3, so the statement is false.

Question 5

Evaluating communalities

The first adjustment that we make to the factor solution is to examine the communalities. The communalities represent the proportion of the variance for each of the variables included in the analysis that is explained or accounted for by the components in the factor solution. The derived components should explain at least half of each original variable's variance, so the communality value for each variable should be 0.50 or higher. If one or more variables have a value for communality that is less than 0.50, the variable with the lowest communality should be excluded and the principal component analysis should be computed again.
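The communality rule can be sketched as follows. A variable's communality is the sum of its squared loadings on the retained components. If any communality falls below 0.50, the variable with the lowest one is flagged for removal. The sketch makes only this single decision. In practice the principal component analysis is re-run after each removal, so the loadings must be recomputed before the rule is applied again. The names and test loadings are hypothetical.

```python
def communalities(loadings):
    """Communality of each variable: the sum of its squared loadings on the retained components."""
    return {name: sum(x * x for x in row) for name, row in loadings.items()}

def variable_to_remove(loadings, minimum=0.50):
    """Hypothetical helper: the lowest-communality variable if it is below `minimum`, else None."""
    h2 = communalities(loadings)
    name, value = min(h2.items(), key=lambda item: item[1])
    return name if value < minimum else None
```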

Communality requiring variable removal

Examination of the first principal components model extracted by SPSS resulted in the removal of the variable "agreement that harmony in US best achieved by ignoring ethnic differences" [ethignor] from the analysis. The communality for "agreement that harmony in US best achieved by ignoring ethnic differences" [ethignor] was 0.260. The communality for this variable was less than the minimum requirement that the factor solution should explain at least 50% of the variance in the original variable, so this variable was removed from the analysis. While other variables in the analysis also had communalities lower than 0.50, this variable was selected for removal because it had the lowest communality.

Answer 5

To remove ethignor from the analysis, we follow the same sequence of steps that we used to eliminate ethimp.

Question 6

Communality requiring variable removal

Examination of the second principal components model extracted by SPSS resulted in the removal of the variable "agreement that ethnic minorities must better adapt to mainstream American culture" [ethadapt] from the analysis. The communality for "agreement that ethnic minorities must better adapt to mainstream American culture" [ethadapt] was 0.338. The communality for this variable was less than the minimum requirement that the factor solution should explain at least 50% of the variance in the original variable, so this variable was removed from the analysis.

Answer 6

To remove ethadapt from the analysis, we follow the same sequence of steps that we used to eliminate ethimp.

Question 7

Communality requiring variable removal

Examination of the third principal components model extracted by SPSS resulted in the removal of the variable "agreement that ethnic group members are similar to one another" [ethsame] from the analysis, not "feelings toward African Americans" [feelblks]. The communality for "agreement that ethnic group members are similar to one another" [ethsame] was 0.368. The communality for this variable was less than the minimum requirement that the factor solution should explain at least 50% of the variance in the original variable, so this variable was removed from the analysis.

Answer 7

To remove ethsame from the analysis, we follow the same sequence of steps that we used to eliminate ethimp.

Satisfactory communalities for all variables

Running the principal components analysis after removing the last variable produces a table of communalities in which all values are above 0.50.

Variable loadings on components

Once variables have been removed for low communalities, we examine the pattern of factor loadings (loadings greater than 0.40) to make certain that each variable loads on one and only one component. This pattern is called simple structure, and it accurately describes the loadings in this table. If a variable does not have simple structure, it is removed from the analysis. If we remove a variable for complex structure, we run the factor analysis again and start over by examining the communalities.

Single variable components

If we end up with a component that contains only a single variable, the variable should be removed from the analysis. There is no advantage to using a single component to represent one variable. In this table of loadings, all three components have two or more variables loading on them. If we remove a variable because it is the only one loading on a component, we run the factor analysis again and start over by examining the communalities.
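The two structure checks, complex structure and single-variable components, can be sketched as one scan of the rotated loadings. The sketch counts a loading as substantial when its absolute value is at least 0.40. This follows the "0.40 or higher" wording of the strategy slides, although some slides say "greater than 0.40". The sketch only flags problems. Choosing which variable to drop, and re-running the analysis, are left to the analyst. The function name and test loadings are hypothetical.

```python
def structure_problems(loadings, threshold=0.40):
    """Hypothetical helper: flag complex-structure variables and single-variable components.

    loadings: dict mapping variable name -> list of rotated loadings, one per component.
    Returns (variables loading on 2+ components, [(component index, its only variable)]).
    """
    n_components = len(next(iter(loadings.values())))
    members = {j: [] for j in range(n_components)}
    complex_vars = []
    for name, row in loadings.items():
        high = [j for j, x in enumerate(row) if abs(x) >= threshold]
        if len(high) > 1:
            complex_vars.append(name)  # substantial loadings on two or more components
        for j in high:
            members[j].append(name)
    single = [(j, names[0]) for j, names in members.items() if len(names) == 1]
    return complex_vars, single
```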

Question 8

Answer 8

The variables that we removed, either in the screening for suitability for factor analysis or in the extraction of factors, should be used as individual variables in future analyses.

Question 9

Pattern of factor loadings

For this statement to be true, the components and the variables they contain must match the table of component loadings. In this example, the pattern of loadings is correctly described.


Question 10

Cumulative percent of variance explained

The components explain 69.718% of the total variance in the variables included in the analysis. This satisfies the goal of explaining 60% or more of the total variance in the original variables. If the percentage of variance explained is less than 60%, we should attach a note of caution to our solution, since using the components as substitutes for the variables may be of limited use.
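The cumulative percent of variance can be sketched from the loadings on the retained components. With standardized variables, each variable contributes a variance of 1, so the percent explained is the total sum of squared loadings divided by the number of variables. This figure is the same before and after an orthogonal rotation such as varimax. The function names and test loadings are hypothetical. The verdict strings mirror the answers used in this example.

```python
def percent_variance_explained(loadings):
    """Percent of total variance explained, given rows (one per variable) of retained loadings."""
    total_ss = sum(x * x for row in loadings for x in row)
    return 100.0 * total_ss / len(loadings)  # each standardized variable has variance 1

def variance_verdict(percent, target=60.0):
    """Hypothetical helper mirroring this example's rule for the cumulative variance question."""
    return "True" if percent >= target else "True with caution"
```

With this example's reported figure, `variance_verdict(69.718)` returns `"True"`.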

Steps in answering questions about principal components analysis - 1

Question: Do the variables included satisfy the level of measurement requirements?

- Are the variables included in the analysis metric or dichotomous? Yes: True. No: Incorrect application of a statistic.

Steps in answering questions about principal components analysis - 2

Question: Do the number of variables and cases satisfy the sample size requirements?

- Is the ratio of cases to variables at least 5 to 1? No: Incorrect application of a statistic. Yes: go to the next check.
- Is the number of valid cases 50 or more? No: Incorrect application of a statistic. Yes: go to the next check.
- Is the number of valid cases 100 or more? Yes: True. No: True with caution.

Steps in answering questions about principal components analysis - 3a

Question: Do the available data satisfy the suitability criteria for principal components analysis?

- Are there two or more correlations that are 0.30 or greater? No: False. Yes: go to the next check.
- Is the probability for the Bartlett test of sphericity less than the level of significance? No: False. Yes: continue with step 3b.

Steps in answering questions about principal components analysis - 3b

Question: Do the available data satisfy the suitability criteria for principal components analysis?

- Is the measure of sampling adequacy larger than 0.50 for each variable? No: remove the variable with the lowest MSA and repeat the analysis. Yes: go to the next check.
- Is the overall measure of sampling adequacy greater than 0.50? Yes: True. No: False.

Steps in answering questions about principal components analysis - 4

Question: How many components should be extracted initially?

- Is the stated number of eigenvalues greater than 1.0 correct? Yes: True. No: False.

Steps in answering questions about principal components analysis - 5

Question: Does examination of the outputs indicate that a variable should be excluded from the principal components analysis?

- Is the communality for any variable less than 0.50? Yes: True; remove the variable with the lowest communality and repeat the analysis. No: go to the next check.
- Does any variable show complex structure (two or more loadings greater than 0.40)? Yes: True; remove the variable with complex structure and the lowest communality and repeat the analysis. No: go to the next check.
- Does any component have only one variable loading on it? Yes: True; remove the single variable loading on that component and repeat the analysis. No: False.

Steps in answering questions about principal components analysis - 6

Question: Which omitted variables should be included as individual variables in further analyses?

- Is the list of variables omitted from the analysis correct? Yes: True. No: False.

Steps in answering questions about principal components analysis - 7

Question: Which components can be substituted for individual variables?

- Are the number of components and the pattern of loadings correct? Yes: True. No: False.

Steps in answering questions about principal components analysis - 8

Question: Does the principal components solution explain a satisfactory percentage of the variance in the included variables?

- Is the cumulative proportion of variance explained 60% or higher? Yes: True. No: True with caution.