Thinking About Longitudinal Data (or Cross-sectional v Longitudinal Analysis)
A glib claim that longitudinal data analysis is important because it permits insights into the processes of change is inadequate, and it certainly fails to convince many social science researchers who are concerned with substantive rather than methodological challenges.
What is required is an understanding of the limitations of cross-sectional analysis.
Cross-sectional V Longitudinal Data
Four Main Issues
• Age and Cohort Effects
• Direction of Causality
• State Dependence
• Residual Heterogeneity
Age And Cohort Effects
Should I buy a new car? An example...
Owner’s Experience Of Car Reliability Over The Last Twelve Months - Specific Model
What happens when these cars get older (ageing effects)?
The manufacturer tells me that there is a cohort effect: the more recently made cars are now much more reliable than the ones made five years ago. Could this be true?
Cross-sectional data are completely uninformative as to whether age or cohort effects (or a combination of each) provide correct explanations. We would need longitudinal data to find out!
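The confounding of age and cohort in a single survey can be sketched in a few lines. In one survey year, a car's age is fixed entirely by its year of manufacture, so the two carry the same information. The survey year, the model years and the function name below are hypothetical and chosen only for illustration.

```python
# Hypothetical illustration: within ONE survey year, age and cohort
# (year of manufacture) are two labels for the same thing.
SURVEY_YEAR = 2024  # assumed year of the cross-sectional survey


def car_age(year_made, survey_year=SURVEY_YEAR):
    """Age of a car in years at the time of the survey."""
    return survey_year - year_made


cohorts = [2019, 2020, 2021, 2022, 2023]  # assumed years of manufacture
ages = [car_age(c) for c in cohorts]

# Each cohort appears at exactly one age, so poor reliability among the
# 2019 cars could be due to being five years old OR to being built in 2019.
for cohort, age in zip(cohorts, ages):
    print(f"made {cohort}: observed only at age {age}")

# Following the same cohorts into a later year breaks the one-to-one link:
# each cohort is now also observed at a second age.
ages_next_wave = [car_age(c, SURVEY_YEAR + 1) for c in cohorts]
for cohort, a1, a2 in zip(cohorts, ages, ages_next_wave):
    print(f"made {cohort}: observed at ages {a1} and {a2}")
```

With only the first loop's information there is no comparison that holds cohort fixed while age varies; the second wave supplies it.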
Direction Of Causality
There is unequivocal evidence from cross-sectional data that, overall, the unemployed have poorer health.
This is consistent with both a) unemployment causing ill health and b) ill health causing unemployment
Ill Health ←?→ Unemployment
If we had a cross-sectional survey that asked how long people had been unemployed and also their level of health, generally, we would find a negative relationship.
Negative – Lower levels of health for people who had been unemployed for longer.
This is consistent with a) unemployment causing ill health (Unemployment → Ill Health).
HOWEVER...
If ill health causes unemployment… then people with comparatively modest levels of ill health will tend to recover more quickly and return to work.
This is consistent with b) ill health causing unemployment (Ill Health → Unemployment).
With increasing duration of unemployment, those with less severe ill health will be progressively under-represented while those with more severe ill health will be over-represented.
This is known as ‘sample selection bias’ and could therefore explain the cross-sectional picture of declining health with increasing duration of unemployment.
It is not possible to untangle this conundrum with cross-sectional data. Longitudinal data are required!
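The selection argument can be checked with a small simulation. This is a minimal sketch under stated assumptions, not an analysis of real data: ill health is fixed when a person becomes unemployed and never changes afterwards (so unemployment has no causal effect on health here), and the monthly chance of returning to work is assumed to be 0.5 × (1 − severity). All names and numbers are hypothetical.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

SURVEY_MONTH = 24          # month in which the cross-sectional survey is taken
ENTRANTS_PER_MONTH = 2000  # assumed number becoming unemployed each month


def still_unemployed(severity, months):
    """True if the person has not returned to work after `months` months.

    Assumed rule: each month the chance of returning to work is
    0.5 * (1 - severity), so milder ill health means a quicker return.
    """
    for _ in range(months):
        if random.random() < 0.5 * (1.0 - severity):
            return False
    return True


# (duration in months, severity) for everyone unemployed on the survey date
snapshot = []
for entry_month in range(SURVEY_MONTH):
    duration = SURVEY_MONTH - entry_month
    for _ in range(ENTRANTS_PER_MONTH):
        severity = random.random()  # ill health fixed at entry, 0 = mild
        if still_unemployed(severity, duration):
            snapshot.append((duration, severity))


def mean_severity(shortest, longest):
    """Average severity among those unemployed for shortest..longest months."""
    values = [s for d, s in snapshot if shortest <= d <= longest]
    return sum(values) / len(values)


short_term = mean_severity(1, 6)
long_term = mean_severity(12, 24)
print(f"mean severity, unemployed 1-6 months:   {short_term:.2f}")
print(f"mean severity, unemployed 12-24 months: {long_term:.2f}")
```

The long-term unemployed appear less healthy in the snapshot even though, by construction, nobody's health changed while unemployed.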
State Dependence
Past Behaviour → Current Behaviour
Young People Aged 19
APRIL: Unemployed / Employed
MAY: Employed
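With the same people observed in both months, the link between past and current behaviour can be tabulated directly. The sketch below uses ten invented records and a hypothetical helper, `transition_rate`; it only illustrates the comparison that repeated observations make possible.

```python
# Hypothetical panel: (status in April, status in May) for the same ten people.
panel = [
    ("employed", "employed"),
    ("employed", "employed"),
    ("employed", "employed"),
    ("employed", "employed"),
    ("employed", "employed"),
    ("employed", "unemployed"),
    ("unemployed", "employed"),
    ("unemployed", "unemployed"),
    ("unemployed", "unemployed"),
    ("unemployed", "unemployed"),
]


def transition_rate(records, april_status, may_status):
    """Share of people with `april_status` in April who have `may_status` in May."""
    may_outcomes = [may for april, may in records if april == april_status]
    return may_outcomes.count(may_status) / len(may_outcomes)


stay_employed = transition_rate(panel, "employed", "employed")
find_work = transition_rate(panel, "unemployed", "employed")
print(f"employed in May, given employed in April:   {stay_employed:.2f}")
print(f"employed in May, given unemployed in April: {find_work:.2f}")
```

A gap between the two rates is what state dependence would produce, but on its own it does not prove it: unmeasured differences between people could generate the same pattern.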
Residual Heterogeneity (Omitted Explanatory Variables)
The advantage of longitudinal data over cross-sectional data is that they not only facilitate analysis between cases but also facilitate analysis within cases.
A simplified view of a difficult concept!
CROSS-SECTIONAL
EXPLANATORY VARIABLES → OUTCOME (Person A, Person B)
My two hypothetical identical twin daughters – The Gayle sisters.
CROSS-SECTIONAL EXPLANATORY VARIABLES
              WENDY   BELOWNA
VARIABLE A    1       1
VARIABLE B    1       1
VARIABLE C    2       2
VARIABLE D    2       2
OUTCOME A
This is often called ‘between cases’ analysis.
There is no way of accounting for omitted explanatory variables in cross-sectional analysis.
LONGITUDINAL
EXPLANATORY VARIABLES at TIME POINT 1 and TIME POINT 2 (Person A, Person B)
This is often called ‘within cases’ analysis. There are techniques for accounting for omitted explanatory variables if we have data at more than one time point.
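One such technique is first-differencing, shown here as a minimal sketch on simulated data; it is one of several possible approaches and not necessarily the one intended in these slides. The assumptions are loud ones: each person has an unmeasured characteristic that does not change over time, it affects both the explanatory variable and the outcome, and the true effect of the explanatory variable is set to 1.0. All names and values are hypothetical.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

TRUE_EFFECT = 1.0  # assumed true effect of x on y
N_PEOPLE = 5000


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


x1, y1, x2, y2 = [], [], [], []
for _ in range(N_PEOPLE):
    omitted = random.gauss(0, 1)  # unmeasured, constant within a person
    for xs, ys in ((x1, y1), (x2, y2)):  # time point 1, then time point 2
        x = omitted + random.gauss(0, 1)
        y = TRUE_EFFECT * x + 2.0 * omitted + random.gauss(0, 0.5)
        xs.append(x)
        ys.append(y)

# Between cases: compare different people at time point 1 only.
between_estimate = ols_slope(x1, y1)

# Within cases: compare each person with themselves, so the constant
# omitted variable cancels out of the change scores.
dx = [b - a for a, b in zip(x1, x2)]
dy = [b - a for a, b in zip(y1, y2)]
within_estimate = ols_slope(dx, dy)

print(f"between-cases estimate: {between_estimate:.2f}")
print(f"within-cases estimate:  {within_estimate:.2f}")
```

Under these assumptions the between-cases estimate absorbs the omitted variable and overstates the effect, while the within-cases estimate lands close to the assumed value. The cancellation only works for omitted variables that stay constant over time.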
TIME 2
              WENDY   BELOWNA
VARIABLE A    1       1
VARIABLE B    1       1
VARIABLE C    2       2
VARIABLE D    2       2
OUTCOME       A       B
HOW GOOD HAVE THE EXPLANATORY VARIABLES BEEN AS FAR AS HELPING US TO UNDERSTAND THE OUTCOME?
TIME 2
              WENDY         BELOWNA
VARIABLE A    1             1
VARIABLE B    1             1
VARIABLE C    2             2
VARIABLE D    2             2
              Unexplained   Unexplained
OUTCOME
RESIDUAL HETEROGENEITY
WENDY
VARIABLE A    1
VARIABLE B    1
VARIABLE C    2
VARIABLE D    2
Unexplained = Unmeasured or Unmeasurable Variables (Omitted Explanatory Variables)
It is sometimes claimed that the main advantage of longitudinal data is that they facilitate improved control for the plethora of variables that are omitted from any analysis.
Because surveys fail to capture the detailed nature of social life there is, almost inevitably, considerable heterogeneity in response variables even amongst respondents who share the same characteristics across all of the explanatory variables.
The possibility of substantial variation between similar individuals due to unmeasured and possibly unmeasurable variables is known as ‘residual heterogeneity’.
BEWARE: We can begin to see why cross-sectional analysis might incorrectly estimate the effects of explanatory variables, and therefore result in misleading conclusions being drawn.
THINKING ABOUT CHANGE
• COHORT = A common group being studied.
• AGE = Amount of time since cohort was constituted.
• PERIOD = Moment of observation.
THREE YOUTH COHORT STUDIES
COHORT 1: AGE 16 17 18 19 20 21
COHORT 2: AGE 16 17 18 19
COHORT 3: AGE 16 17
We can study the effects of ‘age’ or ageing.
We can study the effects of cohort.
We can study the effects of period (for example, a period of low unemployment versus a period of high unemployment).
Beware – Age, Cohort and Period effects are often very hard to untangle – See the relevant literature to become frightened and confused!
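One reason for the difficulty follows directly from the definitions: the moment of observation equals the time the cohort was constituted plus the time elapsed since then, so period = cohort + age. The sketch below shows that two quite different sets of linear effects then give identical predictions. The cohort start years, the staggering of the three studies, the effect sizes and the `predict` helper are all hypothetical.

```python
# Hypothetical observations: (year the cohort was constituted, years since then).
# Three cohorts followed for six, four and two waves respectively.
observations = (
    [(2000, age) for age in range(6)]
    + [(2002, age) for age in range(4)]
    + [(2004, age) for age in range(2)]
)


def predict(cohort, age, effects):
    """Linear prediction from age, cohort and period effects."""
    period = cohort + age  # the identity that causes the trouble
    b_age, b_cohort, b_period = effects
    return b_age * age + b_cohort * cohort + b_period * period


effects_1 = (2.0, 0.5, 1.0)  # (age, cohort, period) effects, assumed values
shift = 3.0
# Add `shift` to the age and cohort effects and subtract it from the period
# effect: the changes cancel because age + cohort - period is always zero.
effects_2 = (2.0 + shift, 0.5 + shift, 1.0 - shift)

largest_gap = max(
    abs(predict(c, a, effects_1) - predict(c, a, effects_2))
    for c, a in observations
)
print(f"largest difference between the two sets of predictions: {largest_gap}")
```

Since the data cannot distinguish `effects_1` from `effects_2`, separating linear age, cohort and period effects requires additional assumptions beyond the data themselves.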
Longitudinal data are not a panacea – there are problems