Structural equation models and confirmatory factor analysis in small samples: theory and applications

Andrej Srakar, Ph.D.
Institute for Economic Research, Ljubljana, and Faculty of Economics, University of Ljubljana, Slovenia

Structure of the presentation
• Research problem and literature review
• Data and variables
• Methodology
• Results – factors, indexes, clustering
• Results – dynamic SEM
• Conclusion and discussion

Main objective and research questions
• In recent years, the construction of composite indicators has been widely discussed.
• Yet, to the best of our knowledge, there have been very few (if any) attempts to provide a method of construction when the number of units is limited (e.g. macroeconomic data).
• Two main research questions:
  1) Can we use small-sample (high-dimensionality) corrections to construct an index of sustainable development (and is the method applicable to similar indexes as well)?
  2) Can similar small-sample corrections be implemented for validation of the index using CFA (and for SEM models on this data in general)?

Short literature review
• Literature on composite indicators:
  • Brancato and Simeoni (2008): capacity of standard quality indicators to reflect quality components and overall quality, using structural equation models.
  • Cecconi, Polidoro and Ricci (2004): methodological approach to synthesizing basic indicators in order to compare territorial data collection quality, for the Italian consumer price survey.
  • Munda and Nardo (2006): consistency between the mathematical aggregation rule used to construct composite indicators and the meaning of weights.
  • Nardo, Saisana, Saltelli, Tarantola, Hoffman and Giovannini (2008): OECD handbook, i.e. a guide on constructing and using composite indicators, with a focus on composite indicators which compare and rank countries’ performances.
  • Polidoro, Ricci and Sgamba (2006): novel methodology that expands on the methods detailed in Cecconi, Polidoro and Ricci (2004).
  • Smith and Weir (2000): how to obtain some overall measure of quality by considering quality as a multivariate measure for any dataset, where each quality indicator represents one dimension of quality.
  • Cherchye and colleagues (2008; 2009): propose developments of composite indicators with imprecise data and using DEA analysis.

Steps in the construction of composite indicators (OECD, 2008)
1) Theoretical framework.
2) Data selection.
3) Imputation of missing data.
4) Multivariate analysis. An exploratory analysis should investigate the overall structure of the indicators, assess the suitability of the data set and explain the methodological choices, e.g. weighting, aggregation.
5) Normalisation. Indicators should be normalised to render them comparable. Attention needs to be paid to extreme values as they may influence subsequent steps in the process of building a composite indicator. Skewed data should also be identified and accounted for.
6) Weighting and aggregation. Indicators should be aggregated and weighted according to the underlying theoretical framework. Correlation and compensability issues among indicators need to be considered and either be corrected for or treated as features of the phenomenon that need to be retained in the analysis.
7) Robustness and sensitivity. Analysis should be undertaken to assess the robustness of the composite indicator in terms of, e.g., the mechanism for including or excluding single indicators, the normalisation scheme, the imputation of missing data, the choice of weights and the aggregation method.
8) Back to the real data. Composite indicators should be transparent and fit to be decomposed into their underlying indicators or values.
9) Links to other variables. Attempts should be made to correlate the composite indicator with other published indicators, as well as to identify linkages through regressions.
10) Presentation and visualisation. Composite indicators can be visualised or presented in a number of different ways, which can influence their interpretation.
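
The normalisation and the weighting and aggregation steps can be illustrated with a minimal sketch. It assumes min-max normalisation and a weighted arithmetic mean, which are only one of several possible choices and are not necessarily those used in this presentation; the indicator names, values and weights are made up for illustration.

```python
def min_max(values):
    """Rescale a list of numbers to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant indicator carries no information; map it to zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite(indicators, weights):
    """Weighted arithmetic mean of min-max normalised indicators, per unit."""
    normalised = {name: min_max(vals) for name, vals in indicators.items()}
    total = sum(weights.values())
    n_units = len(next(iter(indicators.values())))
    return [
        sum(weights[name] * normalised[name][i] for name in indicators) / total
        for i in range(n_units)
    ]


# Hypothetical data: three units, two indicators, equal weights.
indicators = {"a": [10.0, 20.0, 30.0], "b": [5.0, 5.0, 15.0]}
weights = {"a": 0.5, "b": 0.5}
scores = composite(indicators, weights)
```

Changing `min_max` or `weights` and recomputing `scores` is a simple way to run the robustness checks that the list asks for.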

Short literature review – Dynamic SEM
• Allison (2014) and Moral-Benito (2013) claim that the dynamic SEM approach has several advantages over both GMM methods and previous ML methods:
  – there is no “incidental parameters” problem
  – initial conditions are treated as completely exogenous and do not need to be modeled
  – no difficulties arise when the autoregressive parameter is at or near 1.0
  – missing data are easily handled by full-information maximum likelihood
  – coefficients can be estimated for time-invariant predictors (the Arellano-Bond (A-B) method cannot do this because it uses difference scores, which cause all time-invariant variables to drop out)
  – many model constraints can be easily relaxed and/or tested
  – it is well known that likelihood-based approaches (ML) are preferred to method-of-moments (GMM) counterparts in terms of finite-sample performance (see Anderson, Kunitomo, and Sawa 1982), and that ML is more efficient than GMM under normality. Moral-Benito (2013) compares the widely used panel GMM estimator of Arellano-Bond (1991) with its likelihood-based counterpart and confirms these results in the case of dynamic panel models with predetermined regressors.
• We use Dynamic SEM as implemented in Stata
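
The point about time-invariant predictors can be seen in a toy example. The sketch below only illustrates first differencing on made-up numbers for a single hypothetical unit; it is not an implementation of the Arellano-Bond estimator or of the dynamic SEM.

```python
def first_difference(series):
    """Return the period-to-period changes of a time series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Hypothetical unit observed over four periods.
y = [1.0, 1.5, 2.5, 2.0]   # outcome that varies over time
z = [3.0, 3.0, 3.0, 3.0]   # time-invariant predictor

dy = first_difference(y)   # still carries information
dz = first_difference(z)   # all zeros: the predictor drops out
```

Because `dz` is identically zero, no coefficient on `z` can be estimated from differenced data, whereas a model fitted in levels keeps it.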

Data and method – SDI
• Eurostat Sustainable Development Indicators (SDI)
  - The indicator framework covers 10 thematic areas belonging to the economic, the social, the environmental, the global and the institutional dimensions. We include 123 indicators, listed on the next slide.
  - Socioeconomic development;
  - Sustainable consumption and production;
  - Social inclusion;
  - Demographic changes;
  - Public health;
  - Climate change and energy;
  - Sustainable transport;
  - Natural resources;
  - Global partnership;
  - Good governance.

Method - imputation
• First phase: dealing with missing data
• Multiple imputation using a Markov chain method (fully conditional specification, FCS; see van Buuren et al. 2006), which allows simultaneous imputation of variables with missing values
• The method depends on the ordering of the variables; our strategy was therefore to always put "substance" indicators before the general development indicators in the imputation equations
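
The idea of imputing variable by variable in a fixed order can be sketched as follows. This is a deliberately simplified, deterministic illustration for two columns only. A proper FCS procedure draws imputations from a predictive distribution with noise and repeats the process to obtain multiple data sets. The column names and values are hypothetical.

```python
def ols_fit(x, y):
    """Intercept and slope of the least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def chained_impute(data, order, n_iter=10):
    """Chained-equations imputation for exactly two columns.

    data:  dict mapping column name to a list, with None for missing values.
    order: sequence in which the columns are visited in every sweep.
    """
    missing = {k: [v is None for v in col] for k, col in data.items()}
    filled = {}
    for k, col in data.items():
        observed = [v for v in col if v is not None]
        mean = sum(observed) / len(observed)
        filled[k] = [mean if v is None else v for v in col]  # starting values
    for _ in range(n_iter):
        for target in order:
            (other,) = [k for k in order if k != target]
            rows = [i for i, m in enumerate(missing[target]) if not m]
            a, b = ols_fit([filled[other][i] for i in rows],
                           [filled[target][i] for i in rows])
            for i, m in enumerate(missing[target]):
                if m:
                    filled[target][i] = a + b * filled[other][i]
    return filled


# Hypothetical data with one gap in each column.
data = {"substance": [1.0, 2.0, None, 4.0],
        "development": [2.0, 4.0, 6.0, None]}
filled = chained_impute(data, order=["substance", "development"])
```

The `order` argument fixes the visiting sequence, which is where the dependence on variable ordering enters.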

Methodology – ordinary and MHRM factor analysis
• Second phase: regular and rotated factor analysis on a standardised set of data: transformation of the variables into quartiles
• Results of EFA showed that we can speak of at most 5 factors
• Main statistical problem: small sample (far fewer units than variables)
• Some solutions in the literature: bootstrap correction (Fisher et al. 2014); corrected principal components estimator (Bai 2002) / maximum likelihood estimator (Bai & Li 2012); high-dimensional algorithms (Cai 2010a; Cai 2010b; Asparouhov & Muthén, 2012)
• We use the Metropolis-Hastings Robbins-Monro (MHRM) algorithm to adjust for the high dimensionality of the dataset – see e.g. Cai 2010a; Cai 2010b
• Related algorithms: Bock and Aitkin’s (1981) EM algorithm; Joint Maximum Likelihood (JML; see Baker & Kim, 2004); SAEM algorithm (see e.g. McCullagh & Nelder, 1989); Gu and Kong’s (1998) stochastic approximation Newton–Raphson algorithm; Monte Carlo Newton–Raphson algorithm (MCNR; McCulloch & Searle, 2001); Lange’s (1995) gradient algorithm; Titterington’s (1984) algorithm for incomplete data estimation
• Finally: hierarchical clustering, strengthened by the non-hierarchical K-means method
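
The transformation of a variable into quartiles can be sketched as below. The coding 1 to 4 and the cut-point rule (the default of Python's `statistics.quantiles`) are assumptions for illustration; the presentation does not state which convention was used, and the input values are made up.

```python
import bisect
import statistics


def quartile_code(values):
    """Code each value 1-4 according to the quartile it falls into."""
    cuts = statistics.quantiles(values, n=4)  # three cut points
    return [bisect.bisect_left(cuts, v) + 1 for v in values]


# Hypothetical indicator observed for eight units.
values = [3.0, 7.0, 1.0, 9.0, 5.0, 11.0, 13.0, 15.0]
codes = quartile_code(values)
```

Coding into quartiles puts indicators with very different units and distributions on a common ordinal scale before the factor analysis.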

Methodology – Metropolis-Hastings Robbins-Monro algorithm
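
As a rough illustration of the Robbins-Monro part of the name only, the sketch below applies a stochastic approximation recursion with a decreasing gain sequence to noisy draws. In MHRM as described by Cai (2010a), the noisy quantities come from a Metropolis-Hastings sampler of the latent factor scores and the update is scaled by an approximated information matrix; none of that is reproduced here, and the target value and gain sequence are assumptions chosen for the toy example.

```python
import random


def robbins_monro_mean(draws):
    """Recursion theta_{k+1} = theta_k + gamma_k * (x_k - theta_k)."""
    theta = 0.0
    for k, x in enumerate(draws, start=1):
        gamma = 1.0 / k              # decreasing gain sequence
        theta += gamma * (x - theta)  # move towards the noisy observation
    return theta


# Hypothetical noisy observations centred on 2.0.
rng = random.Random(0)
draws = [rng.gauss(2.0, 1.0) for _ in range(5000)]
estimate = robbins_monro_mean(draws)
```

With the gain 1/k the recursion reproduces the running mean of the draws, which shows how the decreasing gains average out the simulation noise.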

Indexes estimation
• Calculation of indexes: exploiting the fact that factors are by definition standardised normal variables
• Index = ((Factor + 3) / 6) × 100
  • transform the factors by adding 3 to each value (making them positive in approximately 99.87% of cases)
  • divide their values by 6 (which is the range of the factor in 99.73% of cases)
  • multiply by 100 to get the conventional scale of the index values
• Calculation of the final index: modification of Fernando, Samita & Abeynayake (2012): calculation of PCA and taking the average of the first two principal components
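
The index formula and the coverage shares it relies on can be checked with a short sketch. It assumes, as the slide does, that factor scores follow a standard normal distribution; the function name is hypothetical.

```python
from statistics import NormalDist


def factor_to_index(score):
    """Map a standardised factor score to the 0-100 index scale."""
    return (score + 3.0) / 6.0 * 100.0


z = NormalDist()                              # standard normal distribution
share_positive = 1.0 - z.cdf(-3.0)            # P(Factor + 3 > 0)
share_in_range = z.cdf(3.0) - z.cdf(-3.0)     # P(-3 < Factor < 3)

midpoint = factor_to_index(0.0)               # an average unit
```

Scores beyond three standard deviations map outside the 0 to 100 range, so the scale is conventional rather than strictly bounded.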

Results: Exploratory factor analysis vs. MHRM
5 general factors:
• General development and governance issues
• Employment
• Poverty and social in/exclusion
• Health care, environment and energy consumption

Transformation into a single index

Clustering analysis, final groupings
(1) Eastern European countries: Bulgaria, Hungary, Poland, Romania, Slovakia, Croatia
(2) Baltic countries: Estonia, Latvia, Lithuania
(3) "Mediterranean" countries: Greece, Portugal
(4) "Medium developed" Western countries: Belgium, Ireland, Austria
(5) Large Western countries: Germany, Spain, France, Italy, United Kingdom
(6) Best achievers: Denmark, Luxembourg, Netherlands, Finland, Sweden
(7) Outliers: Czech Republic, Cyprus, Malta, Slovenia

A Dynamic SEM Stata model

---------------------------------------------------------------------
       |    Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
-------+-------------------------------------------------------------
index2 |
index1 | .4714223   .0403507   11.68   0.000    .3923365   .5505082
lnbdp1 | .1979494   .0098281   20.14   0.000    .1786866   .2172122
    d1 | .5274382   .3873893    1.36   0.173   -.2318308   1.286707
    d2 | 1.112265   .1922060    5.79   0.000    .7355486   1.488982
---------------------------------------------------------------------
Number of units = 252. Number of periods = 9.
LR test of model vs. saturated: chi2(71) = 110.23, Prob > chi2 = 0.0020
RMSEA = 0.065   SRMR = 0.071   CFI = 0.861   TLI = 0.819
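
The z statistics and confidence intervals in the output follow from each coefficient and its standard error. The sketch below recomputes them for the lagged index coefficient (0.4714223, standard error 0.0403507), assuming the usual normal-based Wald interval; the helper name is hypothetical.

```python
from statistics import NormalDist


def wald(coef, se, level=0.95):
    """Return the z statistic, two-sided p-value and Wald confidence interval."""
    normal = NormalDist()
    z = coef / se
    p = 2.0 * (1.0 - normal.cdf(abs(z)))
    crit = normal.inv_cdf(0.5 + level / 2.0)   # about 1.96 for 95%
    return z, p, (coef - crit * se, coef + crit * se)


# Lagged index coefficient and standard error taken from the output above.
z, p, (lower, upper) = wald(0.4714223, 0.0403507)
```

The same call with the values for `d1` gives an interval that includes zero, consistent with its reported p-value of 0.173.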

Discussion of the main findings
Factor analysis and indexes construction:
• Clear set of 4-5 dimensions: economic development clusters into 2 dimensions (general economic development and employment); social inclusion is a clear and separate component; the environmental dimension clusters separately, in 1-2 dimensions; good governance does not cluster as a separate dimension
• No systematic difference between EFA and MHRM scores; furthermore, the difference is not large
• At the level of indexes, countries of the social-democratic (Scandinavian) and Continental regimes clearly score best, while Eastern European countries score worst

Discussion of the main findings
• Seven main clusters: (1) Eastern European countries; (2) Baltic countries; (3) "Mediterranean" countries; (4) "Medium developed" Western countries; (5) Large Western countries; (6) Best achievers; (7) Outliers
• The clusters do not differ significantly whether we take ranks or indexes
• Most Mediterranean countries (Spain, Italy, France) score closer to the Western countries
• Liberal and/or social-democratic countries do not form a clearly separate cluster
• Eastern European countries have a subcluster of Baltic countries
• Several outliers (Malta, Cyprus, Slovenia, Czech Republic): they score better than Eastern and/or Mediterranean countries but slightly worse than the Western regime
• The dynamic SEM model confirms the validity of the construction and the path dependency of the index

Conclusion and discussion
Scientific relevance:
1) New index of sustainable development, using sophisticated statistical methodology.
2) Solution to the small-sample problem, frequently present in similar (macro-data based) indexes.
3) Empirical construction of clusters/groups of countries by level of sustainable development, supporting some of the claims in the literature; this has to be developed and tested in the future.
Limitations and future research:
1) Improvements in statistical methodology, particularly improving the fit of the model (CFA) and the distribution of the index values (MHRM might not be the best method due to the very limited sample, see Cai 2010a)
2) Comparison with different weighting methods and different sets of SDI indicators
3) Validation with different indexes of sustainable development and on the basis of worldwide data
4) Testing the methodology for other indexes as well; for some applications see Srakar, Verbic & Copic (2015; 2016) and Srakar & Vecco (2016)

THANK YOU FOR LISTENING AND COMMENTS!
andrej.srakar@ier.si
andrej_srakar@t-2.net
