Two-stage sampling
JF Boivin
Version 14 November 2007

1980s-1990s: Progress in use of administrative drug databases

Advantages
• Large
• Population-based
• Valid prescription data
• Long time periods

Disadvantages
• Missing data on certain outcomes
• Temporal sequence not always clear
  - Glucocorticoids → cataracts
  - Cataract surgery → glucocorticoids
• Lack of data on confounders

NSAIDs and breast cancer

Previous research
• Poor exposure data
  - Dose
  - Duration
  - Self-reports
• Small numbers
• Short follow-up
• Inadequate control of confounding

NSAIDs and breast cancer
• Cases: Saskatchewan cancer registry
• Controls: Saskatchewan population
• Drug exposure: 15 years of computerized information
• Missing:
  - Over-the-counter drugs
  - Other confounding factors:
    • Menarche
    • Menopause
    • Pregnancies
    • Obesity

Entire population (= truth)

Obese              E+       E−    Total
  cancer        2 000       40    2 040
  no cancer    10 000      100   10 100     OR = 0.5

Not obese          E+       E−    Total
  cancer          200      400      600
  no cancer    10 000   10 000   20 000     OR = 0.5

All                E+       E−    Total
  cancer        2 200      440    2 640
  no cancer    20 000   10 100   30 100     OR = 2.5
  Total                          32 740

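The odds ratios on this slide can be checked directly from the counts. The sketch below is only an illustration: the dictionary layout and the helper names `odds_ratio` and `collapse` are assumptions made for this example, not part of the original slides.

```python
# Counts from the "Entire population" slide, stored as (exposed, unexposed).
population = {
    "obese":     {"cancer": (2_000, 40), "no_cancer": (10_000, 100)},
    "not_obese": {"cancer": (200, 400),  "no_cancer": (10_000, 10_000)},
}


def odds_ratio(cases, controls):
    """Exposure odds ratio from (exposed, unexposed) counts."""
    a, b = cases      # exposed cases, unexposed cases
    c, d = controls   # exposed non-cases, unexposed non-cases
    return (a * d) / (b * c)


def collapse(table):
    """Add the strata together to obtain the crude ("All") table."""
    crude = {}
    for outcome in ("cancer", "no_cancer"):
        exposed = sum(stratum[outcome][0] for stratum in table.values())
        unexposed = sum(stratum[outcome][1] for stratum in table.values())
        crude[outcome] = (exposed, unexposed)
    return crude


# Stratum-specific odds ratios (obesity held fixed).
stratum_or = {
    name: odds_ratio(s["cancer"], s["no_cancer"]) for name, s in population.items()
}

# Crude odds ratio (obesity ignored).
crude = collapse(population)
crude_or = odds_ratio(crude["cancer"], crude["no_cancer"])

print(stratum_or)          # 0.5 in both strata
print(round(crude_or, 3))  # 2.525, shown as 2.5 on the slide
```

Within each obesity stratum the odds ratio is 0.5, while the collapsed table gives about 2.5: ignoring obesity reverses the direction of the association in this example.
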
Obese: not available
Not obese: not available

All (computerized databases)
                   E+       E−    Total
  cancer        2 200      440    2 640
  no cancer    20 000   10 100   30 100

What to do about missing confounder data?

Option #1
Do not conduct research on that topic

Option #2
Cohort or case-control study without data on confounder

Obese women        E+       E−
  cancer            ?        ?
  no cancer         ?        ?

Not obese women    E+       E−
  cancer            ?        ?
  no cancer         ?        ?

All women          E+       E−
  cancer        2 200      440
  no cancer    20 000   10 100
  Total                          32 740

Advantages
• Cheaper
• May be scientifically reasonable for certain questions

Option #3
Collect covariate data on a sample of the study subjects
• two-stage samples
• three-stage samples
• partial questionnaire
• case series only
• etc.

Two-stage sample
Sampling approaches:
• simple random
• balanced
• etc.

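To see why the choice of sampling approach matters, the sketch below computes the expected cell counts if the second stage were a simple random sample of the whole population shown earlier. The second-stage size of 1 000 is an assumption made for illustration, and the obesity counts are the population "truth", which would be unknown in a real study.

```python
# Population counts: (obesity stratum, outcome, exposure) -> number of women.
cells = {
    ("obese", "cancer", "E+"): 2_000,
    ("obese", "cancer", "E-"): 40,
    ("obese", "no cancer", "E+"): 10_000,
    ("obese", "no cancer", "E-"): 100,
    ("not obese", "cancer", "E+"): 200,
    ("not obese", "cancer", "E-"): 400,
    ("not obese", "no cancer", "E+"): 10_000,
    ("not obese", "no cancer", "E-"): 10_000,
}

N = sum(cells.values())    # 32 740 women in the first stage
n_second_stage = 1_000     # assumed second-stage sample size (illustrative)

# Under simple random sampling every woman has the same selection
# probability, so each expected count is proportional to the cell size.
expected_srs = {cell: n_second_stage * count / N for cell, count in cells.items()}

# List the cells from sparsest to fullest.
for cell, expected in sorted(expected_srs.items(), key=lambda kv: kv[1]):
    print(cell, round(expected, 1))
```

Under these assumptions only about one obese, unexposed case is expected in the sample, so a simple random second stage leaves the rarest cells nearly empty. That is the situation a balanced design is meant to avoid.
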
Two-stage balanced design

Obese              E+       E−
  cancer          227       23
  no cancer       125        2

Not obese          E+       E−
  cancer           23      227
  no cancer       125      248

All                     E+             E−
  cancer         250/2 200        250/440
  no cancer     250/20 000     250/10 100
  Total                                      32 740 (I)

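The second-stage counts on this slide can be reproduced as expected values, and the stratum-specific odds ratios can be recovered by weighting each sampled subject by the inverse of its sampling fraction. This is a minimal sketch under stated assumptions: the first-stage count for unexposed non-cases is taken as 10 100 (the value in the population tables), the obesity counts are the population "truth", and inverse-fraction weighting is only one simple estimator, not necessarily the method of the papers cited in these slides.

```python
# First-stage cells (outcome, exposure) known from the computerized databases.
first_stage = {
    ("cancer", "E+"): 2_200,
    ("cancer", "E-"): 440,
    ("no cancer", "E+"): 20_000,
    ("no cancer", "E-"): 10_100,
}

# Obese women in each first-stage cell (population truth, unknown in practice).
obese_in_cell = {
    ("cancer", "E+"): 2_000,
    ("cancer", "E-"): 40,
    ("no cancer", "E+"): 10_000,
    ("no cancer", "E-"): 100,
}

n_per_cell = 250  # balanced design: same number sampled from every cell

# Expected second-stage counts by obesity status within each cell.
sample = {}
for cell, total in first_stage.items():
    obese = n_per_cell * obese_in_cell[cell] / total
    sample[cell] = {"obese": obese, "not obese": n_per_cell - obese}

# Weight each sampled count by the inverse sampling fraction of its cell.
weighted = {
    cell: {k: v * first_stage[cell] / n_per_cell for k, v in counts.items()}
    for cell, counts in sample.items()
}


def odds_ratio(table, stratum):
    """Exposure odds ratio within one obesity stratum."""
    a = table[("cancer", "E+")][stratum]
    b = table[("cancer", "E-")][stratum]
    c = table[("no cancer", "E+")][stratum]
    d = table[("no cancer", "E-")][stratum]
    return (a * d) / (b * c)


for stratum in ("obese", "not obese"):
    rounded = {cell: round(counts[stratum]) for cell, counts in sample.items()}
    print(stratum, rounded)
    print("  unweighted OR:", round(odds_ratio(sample, stratum), 3))
    print("  weighted OR:  ", round(odds_ratio(weighted, stratum), 3))
```

In this sketch the unweighted second-stage tables give odds ratios of about 0.2, because the four cells were sampled with very different fractions. After weighting, both strata return to the population value of 0.5.
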
White JE. A two-stage design for the study of the relationship between a rare exposure and a rare disease. AJE 1982.
Cain KC, Breslow NE. Logistic regression analysis and efficient design for two-stage studies. AJE 1988.

Consent for interviews
Cases: 49%
Controls: 39%
(Sharpe et al., Saskatchewan study)

Other related sampling designs
• three-stage sampling
• partial questionnaire
• confounder data on cases only

Confounder data on cases only

Obese (medical record review)
                   E+       E−
  cancer        2 000       40
  no cancer         ?        ?

Not obese (medical record review)
                   E+       E−
  cancer          200      400
  no cancer         ?        ?

All (computerized databases)
                   E+       E−    Total
  cancer        2 200      440    2 640
  no cancer    20 000   10 100   30 100

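With obesity known for cases only, one quantity that can still be computed is how strongly obesity and exposure go together among the cases. The sketch below shows that arithmetic using the case counts from the slide. The variable names are assumptions for illustration, and the result does not by itself give an obesity-adjusted odds ratio.

```python
# Cases only: obesity status from medical record review, by exposure.
cases = {
    "obese":     {"E+": 2_000, "E-": 40},
    "not obese": {"E+": 200,   "E-": 400},
}

# Number of cases in each exposure group.
exposed_cases = cases["obese"]["E+"] + cases["not obese"]["E+"]      # 2 200
unexposed_cases = cases["obese"]["E-"] + cases["not obese"]["E-"]    # 440

# Prevalence of obesity among exposed and unexposed cases.
prev_obese_exposed = cases["obese"]["E+"] / exposed_cases
prev_obese_unexposed = cases["obese"]["E-"] / unexposed_cases

# Obesity-exposure odds ratio among cases.
case_only_or = (cases["obese"]["E+"] * cases["not obese"]["E-"]) / (
    cases["obese"]["E-"] * cases["not obese"]["E+"]
)

print(round(prev_obese_exposed, 3), round(prev_obese_unexposed, 3))
print(case_only_or)
```

In this example about 91% of exposed cases are obese against about 9% of unexposed cases, so obesity and exposure are strongly associated among cases. That pattern suggests obesity cannot be dismissed as a potential confounder here.
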