Factor Analysis Structural Equation Models 1 Sociology 8811

Factor Analysis & Structural Equation Models 1
Sociology 8811, Class 28
Copyright © 2007 by Evan Schofer
Do not copy or distribute without permission

Announcements
• Paper #2 due today!
• Schedule: Structural equation models
  – I’ll start with related issues: factor analysis and path models
• Monday lab:
  – Factor analysis
  – Whatever else we can squeeze in (path models, SEM)
  – NO graded lab assignment

Factor Analysis
• Factor analysis is an exploratory tool
  – Often called “Exploratory Factor Analysis”
• Helps identify simple patterns that underlie complex multivariate data
  – Not about hypothesis testing
  – Rather, it is more like data mining
• And also helps us understand some principles of SEM
• Note: “Factor analysis” is informally used to refer to two different methods
  – Factor analysis (FA)
  – Principal component analysis (PCA)
• Differences aren’t critical here
  – I will focus on FA, which is most useful in understanding SEM
  – Most of the lecture will also apply to PCA

Factor Analysis
• The basic idea: FA seeks to identify a small number of “underlying variables” that effectively summarize multivariate data
• Ex: Suppose we have many political opinion variables
  – Approval of president; environmental views; etc.
• Perhaps one unmeasured “factor” accounts for people’s positions on all those variables…
  – Ex: Liberalism vs. conservatism…
• FA seeks to identify common patterns
  – But it is up to the researcher to determine what the underlying pattern really means…

Factor Analysis: ‘Depression’
• Suppose we believe in a theoretical construct such as “depression”
• There is no single variable that perfectly measures it… but we believe it exists
• Hypothetical questions:
  – HAPPY: How happy are you? (1-10)
  – WORLDGOOD: How much do you agree with the statement that “The world is a good place”? (1-5)
  – HOPELESS: Do you often feel hopeless? (1-5)
  – SAD: Do you often feel sad? (1-5)
  – TIRED: Do you often feel tired or discouraged? (1-10)

Example: ‘Depression’
• Strategy 1: We could ask many questions & create an index that combines all measures
  – Note: we would have to flip signs on some measures
  – “Happy” would have to be reversed to effectively measure ‘depression’
• Strategy 2: We could ask many questions and then conduct a factor analysis
  – To see if answers to questions exhibit an underlying pattern (which we could label “depression”)
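Strategy 1 can be sketched in a few lines. This is only an illustration, not a recommended scoring rule: the three respondents are invented, and rescaling every item to 0-1 before averaging is my own assumption (the slide only says that some measures must be flipped). The scale ranges come from the hypothetical questions on the earlier slide.

```python
# Scale range of each hypothetical survey item (from the 'Depression' slide).
SCALES = {
    "happy": (1, 10),
    "worldgood": (1, 5),
    "hopeless": (1, 5),
    "sad": (1, 5),
    "tired": (1, 10),
}
# Items where a HIGH answer means LESS depression, so the sign must be flipped.
REVERSED = {"happy", "worldgood"}


def rescale(value, low, high):
    """Map an answer onto 0-1 so items with different ranges are comparable (an assumption)."""
    return (value - low) / (high - low)


def depression_index(answers):
    """Average the rescaled items after reversing the positively worded ones."""
    parts = []
    for name, (low, high) in SCALES.items():
        score = rescale(answers[name], low, high)
        if name in REVERSED:
            score = 1.0 - score  # flip so that high always means "more depressed"
        parts.append(score)
    return sum(parts) / len(parts)


# Invented respondents, for illustration only.
respondents = [
    {"happy": 10, "worldgood": 5, "hopeless": 1, "sad": 1, "tired": 1},
    {"happy": 1, "worldgood": 1, "hopeless": 5, "sad": 5, "tired": 10},
    {"happy": 5, "worldgood": 3, "hopeless": 3, "sad": 2, "tired": 6},
]
for person in respondents:
    print(round(depression_index(person), 3))
```

Note that the index weights every item equally. Factor analysis (Strategy 2) instead lets the data suggest how strongly each item relates to the underlying pattern.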

Factor Analysis: Depression
• Hypothetical results from a factor analysis:

Factor Loadings
Variable      Factor 1   Factor 2
Happy           -.86        …
WorldGood       -.75        …
Hopeless         .92        …
Sad              .95        …
Tired            .71        …

• A factor is a variable that explains lots of variance among the variables being analyzed (happy, sad, hopeless, etc.)
• Loadings are the correlations between each variable and the unobserved factor…
• The loadings tell you a lot about patterns of variation among cases…
  – Notably: People who score high on “sad”, “hopeless”, and “tired” tend to score very low on “happy” and “worldgood”, and vice versa…

Factor Analysis: Depression
• Issue: It is wholly up to the researcher to interpret the factors
  – We are just data mining…
  – To ascribe meaning to factors requires much careful thought, and is ideally informed by theory…

Variable      Factor 1
Happy           -.86
WorldGood       -.75
Hopeless         .92
Sad              .95
Tired            .71

• What might factor 1 represent? Does it seem like it captures “Depression”? Might it mean something else?

Factor Analysis: Depression
• Factor analysis is agnostic to direction of factor variables… results might look like this:

Variable      Factor 1
Happy            .86
WorldGood        .75
Hopeless        -.92
Sad             -.95
Tired           -.71

• For all intents & purposes, these results are identical… but flipped
• The factor is capturing the inverse of depression… (happiness?)
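One way to see why the flipped solution is “identical”: under a single factor, the correlation implied between two observed variables is the product of their loadings, and multiplying every loading by -1 leaves all of those products unchanged. A minimal numpy check, using the hypothetical loadings from the slide:

```python
import numpy as np

# Hypothetical Factor 1 loadings: Happy, WorldGood, Hopeless, Sad, Tired
loadings = np.array([-0.86, -0.75, 0.92, 0.95, 0.71])
flipped = -loadings  # the "inverse of depression" solution

# Off-diagonal entries are the correlations a single factor implies between
# pairs of observed variables (product of their loadings).
implied = np.outer(loadings, loadings)
implied_flipped = np.outer(flipped, flipped)

print(np.allclose(implied, implied_flipped))  # both solutions imply the same pattern
print(round(implied[0, 3], 3))                # implied Happy-Sad correlation is negative
```

So the data cannot tell the two solutions apart. Which direction to report is a labeling choice for the researcher.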

Factor Analysis
• Things you can do with factor analysis:
• 1. Examine factor loadings
  – Use them to interpret factors that are identified in the data
• 2. Plot factor loadings
  – Vividly describe which variables “go together” (people who score high on one tend to score high on another, or vice versa)
• 3. Compute factor scores
  – Estimate how individual cases score on underlying factors
  – How depressed is each case?
• 4. Determine variation explained by factors
  – See which factors account for the major patterns in your data
• 5. “Rotate” the factors
  – Modify them to enhance interpretability… Will discuss later.

FA Example: Civic Engagement
• How do people participate in politics?
• Do people vary systematically in civic participation?
• Is there such a thing as “civic engagement”?
  – A common pattern of behavior that appears in empirical data?
• World Values Survey data for USA:
  – Membership in civic groups
  – Volunteering
  – Participation in demonstrations
  – Participation in strikes
  – Participation in boycotts
  – Signing petitions

FA Example: Civic Engagement
• Factor analysis of US civic participation

. factor member volunteer petition boycott demonstrate strike occupybldg

Factor analysis/correlation          Number of obs    =   1110
Method: principal factors            Retained factors =      3
Rotation: (unrotated)                Number of params =     18

----------------------------------------------------------------
   Factor  |  Eigenvalue   Difference   Proportion   Cumulative
-----------+----------------------------------------------------
  Factor1  |    1.51105      0.71238      0.8319       0.8319
  Factor2  |    0.79867      0.67994      0.4397       1.2717
  Factor3  |    0.11872      0.20190      0.0654       1.3370
  Factor4  |   -0.08318      0.04249     -0.0458       1.2912
  Factor5  |   -0.12567      0.05446     -0.0692       1.2221
  Factor6  |   -0.18013      0.04305     -0.0992       1.1229
  Factor7  |   -0.22318            .     -0.1229       1.0000
----------------------------------------------------------------
LR test: independent vs. saturated: chi2(21) = 1405.19  Prob>chi2 = 0.0000

• Initial output describes the process of factor extraction: identifying factors within the data
• Stata identifies many factors (all possible patterns until it runs out of variation)
• But only factors with large eigenvalues explain a lot…
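The Proportion and Cumulative columns can be recomputed from the eigenvalues alone: each proportion is the eigenvalue divided by the sum of all seven eigenvalues. Because some eigenvalues are negative, the running total climbs above 1 before returning to 1.0. A short sketch using only the numbers reported above:

```python
# Eigenvalues copied from the Stata output above.
eigenvalues = [1.51105, 0.79867, 0.11872, -0.08318, -0.12567, -0.18013, -0.22318]

total = sum(eigenvalues)
running = 0.0
rows = []
for number, eigenvalue in enumerate(eigenvalues, start=1):
    proportion = eigenvalue / total  # share of the total attributed to this factor
    running += proportion            # cumulative share up to and including this factor
    rows.append((number, eigenvalue, proportion, running))
    print(f"Factor{number}  {eigenvalue:9.5f}  {proportion:8.4f}  {running:8.4f}")
```

The first factor’s share (about 0.83) dwarfs the third’s (about 0.07), which is the numerical version of “only factors with large eigenvalues explain a lot”.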

FA Example: Civic Engagement
• Output (cont’d)

Factor loadings (pattern matrix) and unique variances
---------------------------------------------------------------
   Variable  |  Factor1   Factor2   Factor3  |  Uniqueness
-------------+-------------------------------+-----------------
     member  |   0.7111   -0.5941    0.0984  |    0.1316
  volunteer  |   0.6689   -0.6450    0.0939  |    0.1278
   petition  |   0.3485    0.2288   -0.6927  |    0.3464
    boycott  |   0.6350    0.3756   -0.2149  |    0.4095
demonstrate  |   0.6210    0.4021   -0.1098  |    0.4406
     strike  |   0.4035    0.4387    0.4021  |    0.4830
 occupybldg  |   0.2698    0.4038    0.5597  |    0.4509
---------------------------------------------------------------

• Next, Stata reports the main factors it finds. Factor 1 explains most variation, others less…
• Factor 1 correlates with ALL measures of civic participation
  – In other words, people tend to be high on all measures or low on all
• Factor 2: Some people are LOW on membership & moderately high on demonstrations/strikes. Others are the converse…
  – Is this “civic engagement”? Maybe some people are alienated or active in social movements?

FA Example: Civic Engagement
• Output (cont’d): same factor loadings table as on the previous slide
• Factor 3 finds that some people engage in strikes/occupation of buildings but do not sign petitions
  – A bit hard to interpret…
• Focus your energies on the first few factors that have big eigenvalues…
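The Uniqueness column in the loadings table is not a separate estimate: for each variable it is 1 minus the sum of its squared loadings on the retained factors (the share of variance that the three factors do not account for). A quick check against the reported values:

```python
import numpy as np

variables = ["member", "volunteer", "petition", "boycott",
             "demonstrate", "strike", "occupybldg"]

# Unrotated loadings on Factor1-Factor3, copied from the Stata output.
loadings = np.array([
    [0.7111, -0.5941,  0.0984],
    [0.6689, -0.6450,  0.0939],
    [0.3485,  0.2288, -0.6927],
    [0.6350,  0.3756, -0.2149],
    [0.6210,  0.4021, -0.1098],
    [0.4035,  0.4387,  0.4021],
    [0.2698,  0.4038,  0.5597],
])
reported_uniqueness = np.array([0.1316, 0.1278, 0.3464, 0.4095,
                                0.4406, 0.4830, 0.4509])

communality = (loadings ** 2).sum(axis=1)  # variance shared with the factors
uniqueness = 1.0 - communality             # variance left unexplained

for name, value in zip(variables, uniqueness):
    print(f"{name:>12}  {value:.4f}")
```

Low uniqueness (member, volunteer) means the retained factors capture most of that variable’s variation; high uniqueness means much of it is left over.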

FA Example: Civic Engagement
• A visual representation of factor loadings
  – Command: “loadingplot” (run after factor analysis)
[Figure: plot of factor loadings]
• Descriptive patterns emerge from the data
  – Membership & volunteering go together…
  – But are far from strikes, protests, etc.

Factor Rotation
• Factors can be “rotated”
  – Rotation = recalculating them to maximize differences between them
  – This can improve interpretability of factors

Rotated factor loadings (pattern matrix) and unique variances
---------------------------------------------------------------
   Variable  |  Factor1   Factor2   Factor3  |  Uniqueness
-------------+-------------------------------+-----------------
     member  |   0.8061    0.0974    0.0139  |    0.3405
  volunteer  |   0.8055    0.0377   -0.0087  |    0.3497
   petition  |   0.0615    0.3130   -0.1456  |    0.8771
    boycott  |   0.1504    0.5724    0.0165  |    0.6494
demonstrate  |   0.1358    0.5614    0.0671  |    0.6619
     strike  |   0.0371    0.3536    0.2421  |    0.8150
 occupybldg  |  -0.0030    0.2439    0.2501  |    0.8780
---------------------------------------------------------------

• Here, we see a clearer pattern… Factors 1 & 2 are more distinct
• Factor 1 = civic membership; factor 2 = protest/social movements, etc.
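For intuition about what a rotation does mechanically, here is a rough sketch of varimax using a common SVD-based iteration. It is not Stata’s implementation: it applies no normalization, the function name `varimax` and its arguments are my own, and its output is not meant to reproduce the rotated table on the slide. The input is the unrotated loadings table from the earlier slide.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation (illustrative sketch, no Kaiser normalization)."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Matrix proportional to the gradient of the varimax criterion
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / n_vars
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # nearest orthogonal matrix to the gradient step
        if s.sum() < previous * (1 + tol):
            break          # criterion has stopped improving
        previous = s.sum()
    return loadings @ rotation, rotation


# Unrotated loadings from the earlier Stata output.
unrotated = np.array([
    [0.7111, -0.5941,  0.0984],
    [0.6689, -0.6450,  0.0939],
    [0.3485,  0.2288, -0.6927],
    [0.6350,  0.3756, -0.2149],
    [0.6210,  0.4021, -0.1098],
    [0.4035,  0.4387,  0.4021],
    [0.2698,  0.4038,  0.5597],
])
rotated, rotation = varimax(unrotated)
print(np.round(rotated, 3))
```

Two properties are worth noticing: the rotation matrix is orthogonal, and each variable’s sum of squared loadings is the same before and after. Rotation redistributes explained variance across factors; it does not change how much of each variable is explained.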

FA Example: Civic Engagement
• Let’s plot the rotated factor loadings:
[Figure: plot of rotated factor loadings]
• Pattern is similar to unrotated…
• But rotation moves variables closer to axes

Factor Scores
• Factors = variables…
• We can compute their values for a given case…
  – Ex: How high do I score on F1 (depression)?
• Stata syntax: “predict f1 f2 f3…”
  – If you only want scores from the first 2 factors, just list 2 variable names…
  – Note: If done after rotation, scores will be based on rotated factor loadings! Results will differ
• This is a powerful way to create index variables…
  – Ex: Depression. You could sum several variables to create an index…
  – Or do a factor analysis and compute scores for a factor that appeared to reflect depression…
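To make “computing a factor score” concrete, here is a sketch of regression scoring for a single factor: the scoring coefficients are the inverse of the correlation matrix times the loadings, and each case’s score is its standardized answers weighted by those coefficients. Everything below is simulated. The data are generated from the hypothetical depression loadings, and the loadings are treated as known rather than estimated, which a real analysis would not do.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical loadings: Happy, WorldGood, Hopeless, Sad, Tired
loadings = np.array([-0.86, -0.75, 0.92, 0.95, 0.71])
n_cases = 2000

# Simulate an unobserved factor and five observed variables driven by it.
factor = rng.standard_normal(n_cases)
noise = rng.standard_normal((n_cases, loadings.size))
observed = factor[:, None] * loadings + noise * np.sqrt(1 - loadings ** 2)

# Standardize, then apply regression scoring: coefficients = R^-1 * loadings.
z = (observed - observed.mean(axis=0)) / observed.std(axis=0)
corr = np.corrcoef(z, rowvar=False)
coefficients = np.linalg.solve(corr, loadings)
scores = z @ coefficients

agreement = np.corrcoef(scores, factor)[0, 1]
print(np.round(coefficients, 3))
print(round(agreement, 3))  # how well the scores track the simulated factor
```

Unlike a simple summed index, the coefficients weight items unequally: items that track the factor more closely get more weight, and reverse-worded items get negative weight automatically.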

FA Example: Civic Engagement
• Factor scores from some sample cases:

. predict f1 f2 f3
(regression scoring assumed)
Scoring coefficients (method = regression; based on varimax rotated factors)

. list member volunteer f1 f2

     +-------------------------------------------+
     | member   volunt~r          f1          f2 |
     |-------------------------------------------|
  1. |      3          2    .3280279    .4303528 |
  2. |      1          0   -.6338809    -.305814 |
  3. |      3          3     .575327   -.8480528 |
  4. |      5          5     1.52282    .3150256 |
  5. |      7          3    1.450748    .4064942 |
  6. |      4          4    1.044003   -.4640276 |
  8. |      0          0   -.8484179    .5083777 |
  9. |      5          5    1.523822   -.9253936 |
 12. |      2          2    .1134908    1.244545 |
 13. |      1          0   -.6204671    .5076937 |
 14. |      5          4    1.276523     .353012 |
 15. |      7          5    1.956463   -.4956342 |
 16. |      9          1    1.374107   -.3197608 |

• Cases that are high on membership & volunteering score very high on factor 1

FA Example: Civic Engagement
• Factor scores can also be plotted
[Figure: plot of factor scores]
• This is most useful when you have a small number of cases…
  – Ex: countries, which can be labeled on the plot

Stata: Loadingplots & scoreplots
• Notes:
• 1. Plots can be done of all factors…
  – I’ve only shown the first two… to keep things simple
  – Syntax: loadingplot, factors(3)
• 2. Case labels can be useful on scoreplots
  – Syntax: scoreplot, mlabel(countryid)
  – Jitter can sometimes be useful, too…
• 3. Some software allows “biplots”
  – Plotting loadings & scores together
  – Helps uncover patterns in data

Example: Biplot
• Cross-national data on civic participation
[Figure: biplot of countries and civic participation variables]
• Note that France falls near activities like “strikes”
• US is nearer to mtot (membership)

Factor Analysis: Methods
• There are MANY algorithms to extract & rotate factors
  – A thorough discussion is beyond the scope of this class
• Some defaults (if you don’t choose):
  – SPSS: Principal components extraction; varimax rotation
  – Stata: Principal factors extraction; varimax rotation
• Results can vary if you use different methods…
  – In practice, few people are skilled in choosing among methods… people mainly use defaults
  – I recommend trying multiple methods to ensure that results are robust…
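A small sketch of how the two default extraction methods differ. Principal components takes eigenvalues of the correlation matrix as is (1s on the diagonal). Principal factors first replaces the diagonal with communality estimates, here squared multiple correlations, and then takes eigenvalues. The loadings below are invented for illustration, and this is a simplified picture rather than what either package does in full.

```python
import numpy as np

# Invented loadings for five variables driven by one factor.
loadings = np.array([0.7, 0.6, 0.5, 0.6, 0.7])

# Correlation matrix implied by that one-factor structure.
corr = np.outer(loadings, loadings)
np.fill_diagonal(corr, 1.0)

# Principal components: eigenvalues of the full correlation matrix.
pca_eigenvalues = np.linalg.eigvalsh(corr)[::-1]

# Principal factors: put squared multiple correlations (SMCs) on the diagonal first.
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(corr))
reduced = corr.copy()
np.fill_diagonal(reduced, smc)
pf_eigenvalues = np.linalg.eigvalsh(reduced)[::-1]

print(np.round(pca_eigenvalues, 3))  # all positive, sum to the number of variables
print(np.round(pf_eigenvalues, 3))   # one large value, the rest slightly negative
```

This is also why a principal-factors eigenvalue table can contain negative eigenvalues and cumulative proportions above 1, while a principal-components table cannot.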

Confirmatory Factor Analysis
• Factor analysis is purely exploratory
  – It is data mining, not a model
• However, it is based on the idea that factors, which are unobserved, give rise to (i.e., cause) variation on observed variables
[Diagram: latent variable “Depression” with arrows to Happy, WGood, Hopeless, Sad, Tired]

Confirmatory Factor Analysis
• Idea: Let’s imagine that depression is a latent variable
  – i.e., a variable we can’t directly measure… but that gives rise to observed patterns in things we can observe
• Note: No observed variable perfectly measures the latent variable
  – There is error…
  – So, observed variables aren’t perfectly correlated with the latent variable (even though they are “caused” by it)…

Confirmatory Factor Analysis
• This forms the basis for a kind of model:
[Diagram: latent variable “Depression” with arrows to Happy, WGood, Hopeless, Sad, Tired; each observed variable also has its own error term e]

Confirmatory Factor Analysis
• Idea: We can model real data based on those presumed relationships…
• Estimate slope coefficients for each arrow
  – How do latent variables affect observed variables?
• Examine overall model fit
  – How well does our theoretically-informed view of the world map onto observed data?
  – If the model fits well, our concept of “depression” (and our measurement strategy) is likely to be good
• “Confirmatory” implies that we aren’t just “exploring”
  – Different from “exploratory factor analysis”…
  – Rather than data mining, we’re testing a theoretically-informed model
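The idea of “model fit” can be illustrated without a full SEM package. A one-factor model implies that the correlation between any two indicators equals the product of their loadings, so we can compare the implied correlations with the observed ones and summarize the gap. The sketch below uses simulated data and fixed hypothetical loadings. A real confirmatory analysis would estimate the loadings (e.g., by maximum likelihood) and report formal fit statistics, and the root-mean-square residual used here is just one simple summary that I chose for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical loadings: Happy, WGood, Hopeless, Sad, Tired
loadings = np.array([-0.86, -0.75, 0.92, 0.95, 0.71])
n_cases = 2000

# Simulated "real data": one latent variable plus an error term per indicator.
latent = rng.standard_normal(n_cases)
errors = rng.standard_normal((n_cases, loadings.size)) * np.sqrt(1 - loadings ** 2)
data = latent[:, None] * loadings + errors
observed_corr = np.corrcoef(data, rowvar=False)


def implied_corr(lam):
    """Correlations implied by a one-factor model with loadings lam."""
    matrix = np.outer(lam, lam)
    np.fill_diagonal(matrix, 1.0)
    return matrix


def rms_residual(observed, implied):
    """Root-mean-square gap between observed and implied correlations (off-diagonal)."""
    rows, cols = np.triu_indices_from(observed, k=1)
    gaps = observed[rows, cols] - implied[rows, cols]
    return float(np.sqrt(np.mean(gaps ** 2)))


fit_hypothesized = rms_residual(observed_corr, implied_corr(loadings))
# A deliberately wrong model: pretends all five items point the same direction.
fit_wrong_signs = rms_residual(observed_corr, implied_corr(np.abs(loadings)))

print(round(fit_hypothesized, 3), round(fit_wrong_signs, 3))
```

A model that matches how the data were generated leaves only small residuals. The wrong-sign model leaves large ones, which is the kind of evidence that would lead us to reject a measurement strategy.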

SEM
• Next step: Structural Equation Models (SEM) with Latent Variables
• Once we’ve identified latent variables, it makes sense to analyze them!
  – We can develop models in which we estimate slopes relating latent variables…
• This is particularly useful when we are interested in latent concepts that are difficult to measure with any single variable.