What is… Quantitative Longitudinal Analysis?
Paul Lambert and Vernon Gayle, University of Stirling
Prepared for: National Centre for Research Methods, Research Methods Festival, St Catherine’s College, Oxford, 2 July 2008
www.longitudinal.stir.ac.uk
July 2008: LDA 1

So what is quantitative longitudinal analysis?
You already know...
• Working with (survey) datasets that hold longitudinal information (data about time), and the specialist techniques of statistical analysis that are appropriate to them
You maybe don’t realise...
• Complex data and data management components
• Groups of techniques and data types
• Reasons why longitudinal analysis is advocated

Quantitative longitudinal research in the social sciences
• Survey resources
  – Data analysis is used to give a parsimonious summary of patterns of relations between variables in the survey dataset
• Longitudinal
  – Data concerned with more than one time point [e.g. Taris 2000; Blossfeld and Rohwer 2002]
  – Repeated measures over time [e.g. Menard 2002; Martin et al 2006]

Motivations for QnLA
• Focus on change / stability
• Focus on the life course
  – Distinguish age, period and cohort effects
  – Career trajectories / life course sequences
• Focus on time / durations
  – Substantive role of durations (e.g. unemployment)
• Getting the ‘full picture’
  – Causality and residual heterogeneity
  – Examining multivariate relationships
  – Representative conclusions
[e.g. Abbott 2006; Mayer 2005; Menard 2002; Baltagi 2001; Rose 2000; Dale and Davies 1994; Hannan and Tuma 1979; Moser 1958]

What’s exciting about quantitative longitudinal analysis?
• A personal view: by and large, the analytical and methodological issues aren’t so exciting; they have mostly been known about for some time
• What is exciting is the rapid expansion of secondary quantitative longitudinal data: its quality, its volume and its accessibility
  (a) new data
  (b) new ways of accessing existing data

Some comments on quantitative longitudinal analysis
• Working with secondary surveys
  – Expense of long-term data collection
  – Complex data files; need good habits in data management (using syntax)
  – In practical terms: lots of gruelling computer programming
• Resources for supporting researchers
  – Easy access to data, e.g. http://www.data-archive.ac.uk/ and http://www.esds.ac.uk/longitudinal/
  – Training in relevant analytical methods and data management, e.g. http://www.longitudinal.stir.ac.uk/
• Distinctive research traditions and research centres...

Research traditions (methodology)
1) Statistical methods for quantitative longitudinal data [esp. Dale & Davies 1994]
2) Research on data quality
  – Variable constructions in longitudinal research
    – www.longitudinal.stir.ac.uk/variables/
    – Harmonisation, standardisation, comparability
  – Missing data and attrition

Research traditions (applications)
• ‘Geographers study space and economists study time’ [adage quoted in Fotheringham et al. 2000: 245]
  – Vast economics literature using techniques for temporal analysis
  – Other social science disciplines are mostly catching up
  – ...we’ll come back to geography later
• Data expansions from c. 1990 have opened new substantive application areas, for example:
  – [Platt 2005]: ethnic minorities’ social mobility 1971-2001
  – [Pahl & Pevalin 2005]: friendship patterns over time
  – [Verbakel & de Graaf 2008]: spouses’ effect on careers 1941-2003
• Here, one critical challenge is getting used to talking about time in a more disciplined way: e.g. traditional sociological characterisations of ‘the past’ and ‘social change’ may not be empirically satisfactory

Some detail: five traditions in quantitative longitudinal analysis (cf. www.longitudinal.stir.ac.uk)
1. Repeated cross-sections
2. Panel datasets
3. Cohort studies
4. Event history datasets
5. Time series analyses

Repeated cross-sectional data: in the social sciences, the most widely used form of longitudinal analysis
[Table: illustrative dataset of N_s = 3 surveys covering N_c = 8 different persons (survey column: 1, 1, 1, 2, 2, 3, 3, 3; person column: 1 to 8), with person-level variables on each row. Each person appears in only one survey.]

Repeated cross sections
Advantages:
• Easy to communicate and appealing: shows how things have changed between certain time points
• Partially distinguishes age / period / cohort
• Easier to analyse: less data management
However:
• Don’t get the other QnLR attractions (nature of changers; residual heterogeneity; causality; durations)
• Hidden complications: are sampling methods and variable operationalisations really comparable?
  – cf. http://www.longitudinal.stir.ac.uk/variables/ => measures are more often robust than not...

Example 1.1: UK Census
• Directly access aggregate statistics from census reports, books or web, e.g.:
Wales: proportion able to speak Welsh
  Year    %
  1891    54
  1981    19
  1991    19
  2001    21
• Census vs. surveys: larger scale surveys often have more data points and more reliable measures

Example 1.2i: Labour Force Survey yearly stats
Percent of UK workers with a higher degree, by employment category (Profess. / Non-Prof) and gender (m / f)
Sample size ~35,000 m / 30,000 f each year
  1991:   14.4    1.3   11.0    0.6
  1996:   19.9    2.5   24.4    2.3
  2001:   24.9    3.5   28.3    3.2

Example 1.2ii: LFS and time

Panel datasets
Information collected on the same cases at more than one point in time
– ‘Classic’ longitudinal design
– Incorporates ‘follow-up’, ‘repeated measures’, and ‘cohort’
– Large and small scale panels are common

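Illustration: Unbalanced panel
[Table: illustrative unbalanced panel of N_p = 3 persons over N_w = 3 waves (wave column: 1, 1, 1, 2, 2, 3, 3, 3), with person-level variables recorded at each wave. Only two of the three persons have a record at wave 2.]
*A wave is also called a ‘sweep’, ‘contact’, ...
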
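
Whether a panel is balanced can be checked mechanically from its long-format (person, wave) rows. The sketch below is a minimal illustration only: the rows are hypothetical and simply mimic the 8-row, 3-wave layout of the illustration, with person 3 assumed to be the one missing at wave 2. The function names are made up for this example.

```python
from collections import defaultdict

# Hypothetical long-format panel: one (person, wave) pair per row.
# Person 3 is assumed to miss wave 2, giving 8 rows over 3 waves.
rows = [(1, 1), (2, 1), (3, 1),
        (1, 2), (2, 2),
        (1, 3), (2, 3), (3, 3)]

def waves_by_person(rows):
    """Map each person id to the set of waves in which they were observed."""
    observed = defaultdict(set)
    for person, wave in rows:
        observed[person].add(wave)
    return dict(observed)

def is_balanced(rows):
    """A panel is balanced when every person is observed at every wave."""
    all_waves = {wave for _, wave in rows}
    return all(waves == all_waves for waves in waves_by_person(rows).values())

def balanced_subset(rows):
    """Keep only persons observed at every wave (a complete-case panel)."""
    all_waves = {wave for _, wave in rows}
    observed = waves_by_person(rows)
    return [(p, w) for p, w in rows if observed[p] == all_waves]
```

Dropping incomplete cases in this way is the simplest route to a balanced panel, but it discards data and is only safe under strong assumptions about why cases are missing.
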
Panel data advantages
• Study ‘changers’: how many of them, what are they like, what caused change
• Control for individuals’ unknown characteristics (‘residual heterogeneity’)
• Develop a full and reliable life history, e.g. family formation, employment patterns

Challenges for panel data analysis
• Complex data analysis and data management
  – Need for training and good habits (syntax programming)
• Data issues
  – Confidential data; time lag until most useful data
  – Variable constructions and comparability
• Unbalanced panels and attrition
  – Balanced data is still required for many analytical techniques (transition tables; dynamic effects; trajectory profiles)
  – Unbalanced cases and attrition as missing data:
    – Complete case analysis (assumes data are missing completely at random, ‘MCAR’)
    – Ad hoc methods and imputation
    – Missing data models, e.g. www.missingdata.org.uk

Example 2.1: Panel transitions
Young people’s household circumstance changes by subjective well-being between 1994 and 1995. BHPS youth panel, 11-14 yrs in 1994, row percents.
              Stays happy   Cheers up   Becomes miserable   Stays miserable   N
HH Stable     54%           19%         10%                 18%               499
HH Changes    42%           22%         14%                 22%               81

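
Categories such as ‘stays happy’ or ‘cheers up’ come from pairing each respondent’s state at two waves. A minimal sketch of that step, using made-up respondents rather than BHPS data, cross-tabulates the (wave 1, wave 2) pairs and reports row percentages. The function name and the state labels are assumptions for illustration.

```python
from collections import Counter

def transition_row_percents(pairs):
    """Cross-tabulate (origin, destination) pairs and return row percentages.

    pairs: iterable of (state_at_t1, state_at_t2), one per panel member.
    Returns {origin: {destination: percent_of_origin_row}}.
    """
    counts = Counter(pairs)
    row_totals = Counter()
    for (origin, _), n in counts.items():
        row_totals[origin] += n
    table = {}
    for (origin, dest), n in counts.items():
        table.setdefault(origin, {})[dest] = 100.0 * n / row_totals[origin]
    return table

# Hypothetical well-being states for five respondents in two waves.
pairs = [("happy", "happy"), ("happy", "happy"), ("happy", "miserable"),
         ("miserable", "happy"), ("miserable", "miserable")]
table = transition_row_percents(pairs)
```

Only respondents observed at both waves can contribute a pair, which is one reason transition tables need balanced data.
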
Analytical approaches
Panel data models: $Y_{it} = \beta X_{it} + \dots + \varepsilon$
[Table: example panel data layout with one row per case i and year t, followed by the variables.]

Panel data model types
• Fixed and random effects
  – Ways of estimating panel regressions
• Growth curves
  – Time effect in panel regression (cf. multilevel models)
• Dynamic lag-effects models
  – Theoretically appealing...
Analytically complex and often need advanced or specialist software
  – Econometrics literature
  – Stata / GLLAMM; R; S-PLUS; SABRE / GLIM; LIMDEP; MLwiN; MPLUS; …

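
To make the fixed effects idea concrete, here is a minimal sketch of the ‘within’ estimator for a single explanatory variable. It subtracts each case’s own means from X and Y, so that any time-constant characteristic of the case drops out. The data are invented, the function name is an assumption, and real analyses would normally use one of the packages listed above rather than hand-rolled code.

```python
from collections import defaultdict

def within_estimator(records):
    """Fixed-effects (within) slope for one regressor.

    records: iterable of (case_id, x, y). Each case's own means are subtracted
    from x and y, which removes any time-constant case effect, and the slope
    is then the least-squares slope through the demeaned data.
    """
    by_case = defaultdict(list)
    for case, x, y in records:
        by_case[case].append((x, y))
    sxy = sxx = 0.0
    for obs in by_case.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    return sxy / sxx

# Hypothetical data built as y = 2*x + case effect (10 for case 1, 50 for case 2).
records = [(1, 1.0, 12.0), (1, 2.0, 14.0), (1, 3.0, 16.0),
           (2, 1.0, 52.0), (2, 2.0, 54.0), (2, 4.0, 58.0)]
beta = within_estimator(records)
```

Because the estimate uses only change within cases, it cannot be computed when X never varies within any case (the denominator would be zero), which is why fixed effects models say nothing about time-constant variables.
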
Example 2.2: Panel model

Cohort datasets
Information on a group of cases which share a common circumstance, collected repeatedly as they progress through a life course
– Intuitive type of repeated contact data, e.g. the ‘7-up’ series
– Cohort comparisons, e.g. UK birth cohort studies of 1946, 1958, 1970 and 2000

Cohort data and analysis in the social sciences
• Many circumstances parallel other panel types:
  – Large scale studies are ambitious and expensive
  – Small scale cohorts still quite common…
  – Attrition problems often more severe
  – Considerable study duration limits
• Glenn (2005) argues that ‘cohort analysis’ should be specifically directed to understanding effects of ageing / progression over time
  – Other uses of cohort data are in effect panel data analysis
• It remains hard, even with extensive cohort data, to authoritatively understand ageing effects (age = period – cohort)

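
The identity age = period – cohort is what makes ageing effects hard to pin down: once two of the three are known, the third carries no extra information. The sketch below illustrates this under the simplifying assumption of purely linear effects. It shows that two different sets of age, period and cohort coefficients give identical predictions. The function name, coefficient values and years are all hypothetical.

```python
def linear_apc(age_eff, period_eff, cohort_eff, period, cohort):
    """Predicted outcome under linear age, period and cohort effects."""
    age = period - cohort  # the identity quoted on the slide
    return age_eff * age + period_eff * period + cohort_eff * cohort

# Hypothetical survey years (period) and birth years (cohort).
cells = [(p, c) for p in (1991, 1996, 2001) for c in (1946, 1958, 1970)]

# Two different sets of effects: an arbitrary amount k is moved between terms.
k = 0.5
fit_a = [linear_apc(1.0, 0.2, -0.3, p, c) for p, c in cells]
fit_b = [linear_apc(1.0 + k, 0.2 - k, -0.3 + k, p, c) for p, c in cells]
```

The two fits agree in every cell, so no amount of data on these cells could tell the two sets of effects apart without an additional assumption.
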
Event history data analysis [esp. Blossfeld et al 2007]
Focus shifts to the length of time spent in a ‘state’: the analysis examines the determinants of time in state
• Alternative data sources:
  – Panel / cohort (more reliable)
  – Retrospective (cheaper, but recall errors)
• Aka: ‘survival data analysis’; ‘failure time analysis’; ‘hazards’; ‘risks’; ...

Event histories differ:
• In form of dataset (cases are spells of time in a state)
  – Raises data management challenges
  – Comment: data analysis techniques are not well suited to complex variates; some argue that many event history applications are artificially simplistic in their variables
• In types of analytical method
  – Many techniques are new (and/or not well known), and specialist software may be needed
Examples:
• Time to labour market transitions
• Time to recidivism
• Time to end of cohabitation

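
Much of the data management challenge lies in turning person-period records into a file where each case is a spell. The following is a minimal sketch of that conversion, assuming a gap-free yearly history for one respondent. The field names and the example history are hypothetical.

```python
from itertools import groupby

def to_spells(case_id, yearly_states):
    """Collapse a yearly state history into spells.

    yearly_states: list of (year, state) in chronological order with no gaps.
    Returns one dict per spell; the last spell is flagged as censored because
    the record ends before we can see when that state ends.
    """
    spells = []
    for state, run in groupby(yearly_states, key=lambda ys: ys[1]):
        years = [year for year, _ in run]
        spells.append({"case": case_id, "state": state,
                       "start": years[0], "end": years[-1],
                       "duration": years[-1] - years[0] + 1,
                       "censored": False})
    if spells:
        spells[-1]["censored"] = True
    return spells

# Hypothetical employment history for one respondent.
history = [(2001, "unemployed"), (2002, "unemployed"), (2003, "employed"),
           (2004, "employed"), (2005, "employed"), (2006, "unemployed")]
spells = to_spells(1, history)
```

Here the six yearly records become three spells, and the final unemployment spell is marked as censored rather than treated as having ended in 2006.
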
Key to event histories is ‘state space’

Event history analysis software
• SPSS – limited analysis options
• Stata – wide range of pre-prepared methods
• SAS – as Stata
• S-Plus/R – vast capacity but non-introductory
• GLIM / SABRE – some unique options
• TDA – simple but powerful freeware
• MLwiN; lEM; {others} – small packages targeted at specific analysis situations

Eg 4.1: Kaplan-Meier survival

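
For readers who have not met it, the Kaplan-Meier estimator can be written in a few lines. The sketch below is a bare-bones illustration with invented spell durations, not the analysis behind the example. The function name is an assumption, and statistical packages add confidence intervals and group comparisons that are omitted here.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of the survival function.

    durations: time in state for each spell.
    observed:  True if the spell ended in an event, False if censored.
    Returns [(t, S(t))] at each distinct event time, where S(t) is the
    product over event times t_j <= t of (1 - d_j / n_j), with d_j the
    events at t_j and n_j the spells still at risk just before t_j.
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical unemployment spells in months; False marks a censored spell.
durations = [2, 3, 3, 5, 8]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
```

Censored spells never count as events, but they stay in the risk set up to the time they were last observed, which is how the method uses incomplete spells instead of discarding them.
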
Eg 4.2: Cox’s regression

Sequences / Trajectories: characterise event history progression through states into clusters / sequences / frameworks
• Growing recent social science interest
  – Optimal matching analysis / Correspondence analysis / log-linear models / Latent growth curves
• Often analyse membership of a cluster as an outcome (Problem: neutrality of data, e.g. cluster 1 = men in full time employment)

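
Optimal matching analysis rests on a distance between pairs of sequences: the cheapest set of insertions, deletions and substitutions that turns one sequence into the other. A minimal sketch of that distance follows. The cost values, the function name and the example sequences are illustrative assumptions, and in practice the resulting distance matrix would be passed to a clustering routine.

```python
def om_distance(seq_a, seq_b, indel=1.0, substitution=2.0):
    """Optimal matching distance between two state sequences.

    Dynamic programming over insertions, deletions and substitutions,
    as in edit distance. The cost values are illustrative defaults.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * indel
    for j in range(1, cols):
        dist[0][j] = j * indel
    for i in range(1, rows):
        for j in range(1, cols):
            same = seq_a[i - 1] == seq_b[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + indel,                                # delete
                dist[i][j - 1] + indel,                                # insert
                dist[i - 1][j - 1] + (0.0 if same else substitution),  # match or substitute
            )
    return dist[-1][-1]

# Hypothetical yearly states: E = employed, U = unemployed, S = in education.
a = "SSEEE"
b = "SEEEE"
c = "UUUEE"
```

With these costs, sequence a is closer to b (one year of difference) than to c, so a and b would tend to fall in the same cluster. The choice of costs is a substantive decision and affects which trajectories are treated as similar.
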
Time series data
Statistical summary of one particular concept, collected at repeated time points from one or more subjects
Examples:
• Unemployment rates by year in UK
• University entrance rates by year by country
Comment:
– Panel = many variables, few time points (= ‘cross-sectional time series’ to economists)
– Time series = few variables, many time points

Time series analysis
i) Descriptive analyses
  – Charts / text commentaries on values by time periods and different groups
  – Widely used (= repeated cross-sectional analysis)
ii) Time series statistical models
  – Advanced methods of modelling data are possible, but require specialist stats functions
  – Autoregressive functions: $Y_t = Y_{t-1} + X_t + e$
  – Widely employed in business / economics, but limited use in other disciplines

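
The core of an autoregressive model is that each value is regressed on the value before it. The sketch below shows this for the simplest case, under assumptions that go beyond the slide: there is no X term and no intercept, and a coefficient phi (not shown in the slide’s formula) is estimated by least squares. The series and function names are hypothetical.

```python
def lagged_pairs(series):
    """Pair each value with its predecessor: [(y_{t-1}, y_t), ...]."""
    return list(zip(series[:-1], series[1:]))

def ar1_coefficient(series):
    """Least-squares estimate of phi in y_t = phi * y_{t-1} + e_t (no intercept)."""
    pairs = lagged_pairs(series)
    num = sum(prev * cur for prev, cur in pairs)
    den = sum(prev * prev for prev, _ in pairs)
    return num / den

# Hypothetical noise-free series that halves each period, so phi is 0.5.
series = [16.0, 8.0, 4.0, 2.0, 1.0]
phi = ar1_coefficient(series)
```

Building the lagged pairs costs one observation, since the first value has no predecessor. Specialist time series software handles this, along with noise, trends and additional lags, far more carefully than this sketch.
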
1. Repeated cross-sections
2. Panel datasets
3. Cohort studies
4. Event history datasets
5. Time series analyses
… Phew!

Summary: Quantitative Longitudinal Analysis
1) Pros and cons of QnL research:
  i. Appealing analytical possibilities: e.g. analysis of change, controls for residual heterogeneity
  ii. Pragmatic constraints: data access, management, and analytical methods; practical applications often over-simplify topics
  iii. Uneven penetration of applications between research fields at present

2) Undertaking QnL research:
  i. Needs a bit of effort: learn software syntax and data management routines; workshops and training facilities are available; exploit UK networks
  ii. Remain substantively driven: dangers of ‘methodolatry’ (applications ‘forced’ into favourite complex techniques) and of simplification (simpler techniques in the more popular and influential reports)
  iii. Learn by doing (...with Stata syntax examples...!)

3) Some speculation on the future
  i. Process of mainstreaming QnLA into social science discourses (so we all need to know ‘what is’!!)
  ii. Complex multi-process models: new data and software for complex longitudinal statistical models
  iii. More new longitudinal data resources
    – More and more micro-data (e.g. UKHLS)
    – Data linking (e.g. administrative datasets)
    – Geographical data over time

References
• Abbott, A. 2006. 'Mobility: What? When? How?' in Morgan, S. L., Grusky, D. B. and Fields, G. S. (eds.) Mobility and Inequality. Stanford: Stanford University Press.
• Baltagi, B. H. 2001. Econometric Analysis of Panel Data. New York: Wiley.
• Blossfeld, H. P. and Rohwer, G. 2002. Techniques of Event History Modelling: New Approaches to Causal Analysis, 2nd Edition. Mahwah, NJ: Lawrence Erlbaum Associates.
• Blossfeld, H. P., Golsch, K. and Rohwer, G. 2007. Event History Analysis with Stata. New York: Lawrence Erlbaum.
• Davies, R. B. 1994. 'From Cross-Sectional to Longitudinal Analysis' in Dale, A. and Davies, R. B. (eds.) Analysing Social and Political Change: A casebook of methods. London: Sage.
• Fotheringham, A. S., Brunsdon, C. and Charlton, M. 2000. Quantitative Geography: Perspectives on Spatial Data Analysis. London: Sage.
• Glenn, N. D. 2005. Cohort Analysis, 2nd Edition. London: Sage.
• Hannan, M. T. and Tuma, N. B. 1979. 'Methods for Temporal Analysis'. Annual Review of Sociology 5: 303-328.
• Lambert, P. S., Prandy, K. and Bottero, W. 2007. 'By Slow Degrees: Two Centuries of Social Reproduction and Mobility in Britain'. Sociological Research Online 12.
• Martin, J., Bynner, J., Kalton, G., Boyle, P., Goldstein, H., Gayle, V., Parsons, S. and Piesse, A. 2006. Strategic Review of Panel and Cohort Studies. London: Longview, and www.longviewuk.com/
• Mayer, K. U. 2005. 'Life courses and life chances in a comparative perspective' in Svallfors, S. (ed.) Analyzing Inequality: Life Chances and Social Mobility in Comparative Perspective. Stanford: Stanford University Press.
• Menard, S. 2002. Longitudinal Research, 2nd Edition. London: Sage, Number 76 in Quantitative Applications in the Social Sciences Series.
• Moser, C. A. 1958. Survey Methods in Social Investigation. London: Heinemann.
• Pahl, R. and Pevalin, D. 2005. 'Between family and friends: a longitudinal study of friendship choice'. British Journal of Sociology 56(3): 433-450.
• Platt, L. 2005. Migration and Social Mobility: The Life Chances of Britain's Minority Ethnic Communities. Bristol: The Policy Press.
• Rose, D. 2000. 'Researching Social and Economic Change: The Uses of Household Panel Studies'. London: Routledge.
• Taris, T. W. 2000. A Primer in Longitudinal Data Analysis. London: Sage.
• Verbakel, E. and de Graaf, P. M. 2008. 'Resources of the Partner: Support or Restriction in the Occupational Career? Developments in the Netherlands Between 1940 and 2003'. European Sociological Review 24(1): 81-95.