Precision and Validity: Selection Bias
Dr. Jørn Olsen, Epi 200B, January 26 and 28, 2010
Bias and confounding (Last, Dictionary)
- Bias: Deviation of results or inference from truth, or processes leading to such deviations. Any trend in the collection, analysis, interpretation, publication, or review of data that can lead to conclusions that are systematically different from the truth.
- Confounding: A situation in which the effects of two processes are not separated.
- Confounder, confounding factor, confounding variable: poor terms, since confounding is study specific. No variable is always a confounder.
- Selection bias: caused by the way subjects are selected into the study or by selective losses of subjects prior to data analysis.
- In a cohort study the first type of selection bias can often be described as selection leading to more or less confounding.
Selection Bias
- Selection as a design problem: healthy worker selection, Berkson bias
- Most problematic: non-responders in case-control studies, loss to follow-up
Survey Non-responders Smokers Non-smokers All N % % 400 200 40 20 40 33.3 66.6 60 40 20 80 100% 100%
Follow-up study: complete cohort

E    N      D
+    1000   200
-    1000   100

RR = 2.0, RD = 0.10
50% refuse to take part in the study

E    N     D
+    500   100
-    500   50

RR = 2.0, RD = 0.10
[Diagram: E, D, and selection S]
E    N      D
+    1000   200
-    500    50

PR = 2.0, RD = 0.10
[Diagram: paths E-D, E-S, D-S] is unlikely at baseline since they do not know D.
[Diagram: E, D, S, C] is more likely; C could be SES.
Most likely: [Diagram: S, E, D, C]
In cohort studies, selection may cause confounding, or perhaps more likely reduce confounding. Poor health and poor social conditions may correlate with selection. Conditioning on S would open an E-C path and induce confounding that was not present before.
Large cohorts seldom recruit more than 50%
- DNBC about 30%; half of GPs participated, 60% of the invited accepted the invitation
- Selection bias? Yes, if used as a survey
- But when making internal comparisons?
Table 2. RORs Based on Adjusted* ORs in the Source Population and Among Participants
Ref: Nohr et al. Epidemiology 2006; 17: 413-8
Internal comparison, counterfactual guidelines
- RR = 2 for this cohort
External validity, generalization
- For the source population?
- For all in the future?
- For other ethnic groups, etc.
- Selection bias in a cohort study is mainly related to loss to follow-up.

RCT of a pain killer, randomization:
  Drug, N = 100: 5 lost to follow-up
  Placebo, N = 100: 40 lost to follow-up

Reason to expect selection bias? Will “intention to treat” solve the problem? Not when estimating effect size, but it may be OK when testing H0.
Follow-up study

E    D     not D   All
+    150   9850    10,000
-    50    9950    10,000

RR = 3.0
Now 20% loss to follow-up among exposed and 10% among not exposed

E    D     not D   All
+    120   7880    8,000
-    45    8955    9,000

RR = 3.0
Suppose we got:

E    D     not D   All
+    140   7860    8,000
-    40    8960    9,000

RR = 3.9
How could this happen? When is it likely?
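The three follow-up tables can be checked with a few lines of arithmetic. The sketch below is only an illustration (the function name `risk_ratio` and the variable names are introduced here, not taken from the slides): it computes the RR for the complete cohort, for loss to follow-up that is unrelated to disease within each exposure group, and for the distorted table.

```python
def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk among the exposed divided by risk among the unexposed."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)

# Complete cohort: 150/10,000 exposed vs 50/10,000 unexposed.
rr_complete = risk_ratio(150, 10_000, 50, 10_000)

# 20% loss among exposed and 10% among unexposed, unrelated to disease:
# cases and non-cases are lost in the same proportion within each group.
rr_random_loss = risk_ratio(150 * 0.8, 10_000 * 0.8, 50 * 0.9, 10_000 * 0.9)

# Loss related to both exposure and disease (the "Suppose we got" table).
rr_selective_loss = risk_ratio(140, 8_000, 40, 9_000)

print(round(rr_complete, 1), round(rr_random_loss, 1), round(rr_selective_loss, 1))
```

The RR stays at 3.0 as long as the loss within each exposure group is unrelated to disease status; it moves only when the loss depends on both exposure and outcome.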
Source population

E    D   not D   Total
+    A   B       N1
-    C   D       N0

Study population

E    D   not D   All
+    a   b       n1
-    c   d       n0

Selection bias if $\frac{A/N_1}{C/N_0} \neq \frac{a/n_1}{c/n_0}$
- Does condom use protect against STDs?
- What is the source population for such a study?
- A case-control study samples cases from an STD clinic and controls from the catchment area of the clinic.
- Any problems with that?
Results could be like this:
Males with infected partners / No requirement for infected partners
Condom use (Yes, No) among cases and controls: 100, 600, 200, 600
OR = 0.5; OR = 1.0
Selection bias is often a problem in a case-control study
[Diagram: E, D, S]

Source population

E     D     not D
+     20    10
-     80    90
All   100   100

OR = (20/80) / (10/90) = 2.25

Responders

E     D    not D
+     20   5
-     40   45
All   60   50

OR = (20/40) / (5/45) = 4.50
Response rates

E    D      not D
+    100%   50%
-    50%    50%
Response rates
$OR_{\text{responders}} = OR_{\text{true}} \times OR_{\text{response rates}}$
$4.50 = 2.25 \times \frac{100/50}{50/50}$
When would we expect this pattern? When would we expect the opposite?
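The multiplicative relation between the true OR and the OR of the response rates can be verified directly. This is a minimal sketch under the cell counts and response rates given on the slides; `odds_ratio` and the tuple layout are names chosen here for illustration.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)

# Source population, in the order (a, b, c, d) described above.
source = (20, 10, 80, 90)
# Response proportions in the same cell order.
response = (1.0, 0.5, 0.5, 0.5)

# Responders in each cell: 20, 5, 40, 45.
responders = tuple(n * r for n, r in zip(source, response))

or_true = odds_ratio(*source)
or_response = odds_ratio(*response)
or_responders = odds_ratio(*responders)
print(or_true, or_response, or_responders)
```

If the OR of the response rates is 1 (for example, the same response rate in all four cells, or rates that differ only by exposure or only by disease status), the responders reproduce the true OR.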
Selections of relevance for designs
Berkson’s bias: diseases may be correlated in hospital patients but not in the population.
Population: 100,000
30% asthma: 30,000
10% bronchitis: 10,000
0.3 x 0.1 = 0.03: 3,000 have both diseases
[Venn diagram of the population: 100,000; 3,000 with both diseases; 10,000 with bronchitis]
Selections of relevance for designs
In the hospital, let’s assume 40% of asthma patients get hospitalized, and 60% of patients with bronchitis.

27,000 asthma only:         10,800 in hospital
7,000 bronchitis only:      4,200 in hospital
3,000 with both diseases:   2,280 in hospital (0.4 + 0.6 - 0.4 x 0.6 = 0.76)

Patients with both diseases are thus overrepresented in hospital data. The 2 diseases will look as if they are associated, but they are not; those with both diseases just have a higher probability of being hospitalized.
A “Berkson’s-like” bias could be seen for other factors that influence hospitalization rates or diagnostic probabilities.
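Selections of relevance for designs
[Venn diagram: asthma 30,000 (13,080 in hospital, i.e. 10,800 + 2,280); both diseases 3,000 (2,280 in hospital); bronchitis 10,000 (6,480 in hospital)]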
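The Berkson example can be reproduced numerically. The sketch below assumes, as the 0.76 calculation does, that the two routes to hospitalization act independently; variable names are chosen here for illustration. It compares the share of asthma patients who also have bronchitis in the population and in the hospital.

```python
population = 100_000
p_asthma, p_bronchitis = 0.30, 0.10
h_asthma, h_bronchitis = 0.40, 0.60   # hospitalization probability per disease

# Independent diseases in the population.
both = population * p_asthma * p_bronchitis            # about 3,000
asthma_only = population * p_asthma - both             # about 27,000
bronchitis_only = population * p_bronchitis - both     # about 7,000

# Assumption: either disease can lead to hospitalization, independently.
h_both = h_asthma + h_bronchitis - h_asthma * h_bronchitis   # 0.76

hosp_asthma_only = asthma_only * h_asthma              # about 10,800
hosp_bronchitis_only = bronchitis_only * h_bronchitis  # about 4,200
hosp_both = both * h_both                              # about 2,280

# Share of asthma patients who also have bronchitis.
share_population = both / (asthma_only + both)
share_hospital = hosp_both / (hosp_asthma_only + hosp_both)
print(round(share_population, 3), round(share_hospital, 3))
```

In the population the share equals the overall bronchitis prevalence (10%), as expected for independent diseases; among hospitalized asthma patients it is clearly higher, although nothing links the two diseases causally.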
Smoking, HBP, and CVD

Smoking    HBP      CVD +     CVD -
+ (100)    + (20)   6 (30%)   14
+ (100)    - (80)   8 (10%)   72
- (100)    + (20)   2 (10%)   18
- (100)    - (80)   4 (5%)    76

[Diagram: Smoking, HBP, CVD; the smoking-HBP link is marked "?"]
CVD risk is highest for those with high blood pressure and for smokers.
Estimates of the association between smoking and HBP before or after exclusion of patients with CVD.
OR: smoking exposure odds ratios for HBP
No exclusion of CVD (same smoking, HBP, and CVD counts as on the previous slide)
OR = (20/20) / (80/80) = 1
- Be careful when excluding diseases from the study if they are in the causal pathway, or if they are causally linked to the end point of your study.
Use CVD patients as controls and exclude them from the case group (same counts as above)
OR = (14/18) / (8/4) = 0.39
Use CVD patients as controls and include them in the case group (same counts as above)
OR = (20/20) / (8/4) = 0.50
Exclude CVD patients from the control group but not from the case group (same counts as above)
OR = (20/20) / (72/76) = 1.06
Exclude CVD patients from both groups (same counts as above)
OR = (14/18) / (72/76) = 0.82
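The exclusion scenarios can be recomputed from the eight cell counts of the smoking, HBP, and CVD tree. This is a sketch for checking the arithmetic, not part of the original slides; the dictionary layout and the helper names `group` and `exposure_or` are assumptions made here.

```python
# Counts from the tree, keyed by (smoker, hbp, cvd).
counts = {
    (True, True, True): 6,    (True, True, False): 14,
    (True, False, True): 8,   (True, False, False): 72,
    (False, True, True): 2,   (False, True, False): 18,
    (False, False, True): 4,  (False, False, False): 76,
}

def group(smoker, hbp, cvd_values):
    """People with the given smoking and HBP status whose CVD status is in cvd_values."""
    return sum(counts[(smoker, hbp, cvd)] for cvd in cvd_values)

def exposure_or(case_cvd, control_cvd):
    """Smoking exposure OR for HBP: cases are HBP+, controls are HBP-."""
    case_odds = group(True, True, case_cvd) / group(False, True, case_cvd)
    control_odds = group(True, False, control_cvd) / group(False, False, control_cvd)
    return case_odds / control_odds

ANY, NO_CVD, CVD_ONLY = (True, False), (False,), (True,)

scenarios = {
    "no exclusion": exposure_or(ANY, ANY),
    "CVD as controls, CVD excluded from cases": exposure_or(NO_CVD, CVD_ONLY),
    "CVD as controls, CVD kept in cases": exposure_or(ANY, CVD_ONLY),
    "CVD excluded from controls only": exposure_or(ANY, NO_CVD),
    "CVD excluded from both groups": exposure_or(NO_CVD, NO_CVD),
}
for name, value in scenarios.items():
    print(f"{name}: OR = {value:.2f}")
```

Only the analysis without exclusions recovers the null association between smoking and HBP that is built into the counts (20% HBP in both smokers and non-smokers).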
Using hospital controls to replace population controls is bias prone (this example is extreme, though). Controls should provide the exposure distribution in the population that gave rise to the cases.
Do not take into consideration diseases that follow this pattern: [Diagram: Smoking, HBP, CVD]
Only: [Diagram: smoking, HBP], and only if smoking is not causing CVD.
Exclusion of persons with an exposure-related condition from one group but not from the other introduces a threat to validity (although one of these estimates was close to 1). Exclusion of such cases from both groups can cause bias (unless the selection criteria are confounders).
Healthy worker selection
Is a conceptual problem when designing the study, a violation of the counterfactual ideal.
Indicates that SMR values for workers who perform physically demanding jobs tend to be less than 100. The reason is that the comparisons we make are biased. The population at large includes people with chronic diseases (and high mortality) who cannot perform a physically demanding job.
“The sick population effect” or “the stupid investigator effect”
[Figure: MR by age, with curves for the population and the exposed; SMR = 80]
Selection operates into the workforce at recruitment and out of the workforce over time.
Unemployment is associated with suicide risk: causal or bias? How can this be studied?
Selection Bias: Publication Bias
- Decision making depends upon the combined evidence (e.g. Cochrane reviews), not just one study.
- But is the source population for meta-analyses biased?
- Researchers may decide not to submit based on results
- Editors may decide to review or reject based on results
- Reviewers may decide to recommend publication based on results
- Editors may make final conclusions based on results
- All of this leads to a biased source population for reviews and meta-analyses
- Example: Panayiotis et al. JNCI 2005; 97: 1043-1055.
- Association between TP53 (tumor suppressor protein) and risk of death in patients with head and neck cancers
Selection Bias: Publication Bias
[Fig. 1]
[Fig. 2]
[Fig. 3]
External validity?
- In an etiologic study the aim is to formulate abstract hypotheses in relation to the factors under study.
- The hypotheses are abstract in the sense that they are not tied to a specific population but aim to formulate a general scientific theory.
- Internal validity
- External validity
Estrogen exposure (more than 0.3 mg estrogen/d for at least 6 months) and cancer of the endometrium (N Engl J Med 1978; 299: 1089-94).
- Cases: All post-menopausal gynaecological cancer patients at Yale-New Haven Medical Center 1974-1976.
- Controls: Mainly patients with cancer of the cervix (60) or the ovary (43), matched for age and race.
E     Cases   Controls
+     35      4
-     84      115
All   119     119

OR = 12.0 (95% c.l. 4.1-35.0); χ² = 29.5
Including all postmenopausal women with bleeding.
- Cases: Same cancer patients.
- Controls: Women with bleeding, but no cancer of the endometrium, matched for age and race.
E     Cases   Controls
+     44      23
-     105     126
All   149     149

OR = 2.3 (95% c.l. 1.3-4.1); χ² = 8.46
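The two estrogen tables can be recomputed as crude 2x2 tables. The sketch below assumes that the unlabeled statistic after each confidence interval is a chi-square value and uses the ordinary Pearson statistic without continuity correction; since the controls were matched, the crude result need not agree with the published figure to the last digit. The function name is introduced here for illustration.

```python
def odds_ratio_and_chi2(a, b, c, d):
    """a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    Returns the crude OR and the Pearson chi-square (1 df, uncorrected)."""
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return odds_ratio, chi2

# Controls drawn from patients with other gynaecological cancers.
or_cancer_controls, chi2_cancer_controls = odds_ratio_and_chi2(35, 4, 84, 115)

# Controls drawn from women with bleeding but no endometrial cancer.
or_bleeding_controls, chi2_bleeding_controls = odds_ratio_and_chi2(44, 23, 105, 126)

print(round(or_cancer_controls, 1), round(chi2_cancer_controls, 1))
print(round(or_bleeding_controls, 1), round(chi2_bleeding_controls, 1))
```

The same cases give an OR of about 12 or about 2.3 depending only on how the control group was selected, which is the point of the example.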
Horwitz et al. continued the discussion and presented new data in Lancet 1981; 2: 66-8. In the abstract they state (shortened and modified): “In this study, to determine the frequency with which endometrial cancer escapes detection, all necropsies on 8998 eligible women showed previously unsuspected endometrial cancer in 24 of them. The estimated rate of undetected cancer, 27/10,000, is two to five times higher than the detection rate of 5/10,000 noted by the Connecticut State Tumor Registry.”
Comments?
Two types of endometrial cancer: A (diagnosed), B (undetected)
A woman of 45 years of age would have a lifetime risk (until 80) of type A cancer of
5/10,000 x 35 = 175/10,000
Better: $1 - e^{-(5/10{,}000) \times 35} \approx 174/10{,}000$
The proportion of type B cases would be 27/(27 + 174) = 13.4%
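The lifetime-risk arithmetic can be checked as follows. This is a minimal sketch assuming a constant annual detection rate of 5/10,000 from age 45 to 80 and treating the necropsy figure of 27/10,000 as the proportion of women with undetected cancer; the variable names are chosen here.

```python
import math

rate_detected = 5 / 10_000           # assumed constant annual rate of diagnosed (type A) cancer
years = 80 - 45                      # follow-up from age 45 to age 80
share_with_undetected = 27 / 10_000  # type B, from the necropsy series

# Simple approximation: rate times time.
risk_linear = rate_detected * years

# Risk from a constant rate over the whole period.
risk_exponential = 1 - math.exp(-rate_detected * years)

# Proportion of all cancers (A + B) that are of type B.
proportion_type_b = share_with_undetected / (share_with_undetected + risk_exponential)

print(round(risk_linear * 10_000, 1),
      round(risk_exponential * 10_000, 1),
      round(proportion_type_b * 100, 1))
```

At rates this low the linear and exponential versions differ by less than 2 per 10,000, so the share of undetected cases is about 13% either way.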
- The most frequent and serious problem of selection bias in case-control studies is non-responders.
- An equal proportion of non-responding cases and controls is NOT a guarantee against selection bias.
- The question is whether there is an equal selection of exposed cases and exposed controls.
- The most serious selection problem in a follow-up study is loss to follow-up.
- “If in doubt, stay out”
Sensitivity Analysis
Cohort: 10 years of follow-up

Smoking   N      Loss to follow-up   End of follow-up   Lung cancer
+         1000   200                 800                80
-         1000   100                 900                10

RR = (80/800) / (10/900) = 9.0
Sensitivity approach: lung cancer risk among those lost to follow-up

Smokers          Non-smokers   RR    Comments
1/10             1/90          9.0   As for followed-up
1/10             2/90          8.2   Underestimate risk for non-smokers
0                1/90          7.3   Overestimate risk among smokers
0                2/90          6.6   Overestimate risk among smokers, underestimate risk for non-smokers
0 (worst case)   1.0           0.7   All non-smokers lost to follow-up get lung cancer
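The sensitivity table can be generated by adding the assumed cases among those lost to follow-up to the observed cases. The sketch below uses the cohort counts from the smoking example (80 cases among 800 smokers followed and 200 lost; 10 cases among 900 non-smokers followed and 100 lost); the function name and scenario list are introduced here. Exact arithmetic can differ from the tabulated values in the last digit when expected counts are rounded along the way.

```python
def rr_with_assumed_risks(risk_lost_smokers, risk_lost_nonsmokers):
    """RR in the full cohort if those lost to follow-up had the assumed risks."""
    smoker_cases = 80 + 200 * risk_lost_smokers          # observed + assumed among the 200 lost
    nonsmoker_cases = 10 + 100 * risk_lost_nonsmokers    # observed + assumed among the 100 lost
    return (smoker_cases / 1000) / (nonsmoker_cases / 1000)

# (risk among lost smokers, risk among lost non-smokers)
scenarios = [
    (1 / 10, 1 / 90),   # same risks as among those followed up
    (1 / 10, 2 / 90),
    (0, 1 / 90),
    (0, 2 / 90),
    (0, 1.0),           # worst case
]

results = [rr_with_assumed_risks(s, n) for s, n in scenarios]
for (s, n), rr in zip(scenarios, results):
    print(f"smokers {s:.3f}, non-smokers {n:.3f}: RR = {rr:.2f}")
```

Only the extreme worst case moves the RR below 1; under the less extreme assumptions the strong association remains.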
Selection Bias: Main Points
Selection of people into the study produces bias under the following conditions, among others.

A. Selection bias in the design
1. Cross-sectional study: The sampling strategy does not produce a representative sample of the target population.
2. Cohort study/case-control study: The not exposed are too far away from the counterfactual ideal. They do not provide the disease occurrence that would be expected had the exposed not been exposed, and stratification or statistical control will not be sufficient to produce unbiased estimates of effects. Examples: healthy worker selection and many other poorly designed studies.

B. Selection bias in the conduct of the study: non-responders, loss to follow-up
1. Cross-sectional study: The response rate may correlate with what you want to estimate, which would lead to a biased estimate of its prevalence. Risk of selection bias is high.
2. Cohort study: Non-response at baseline will usually not correlate directly with both the exposure and the (unknown) endpoint, but selection at baseline will often change the confounder structure (it will correlate with exposure). Loss to follow-up may correlate with both the exposure and the endpoint and lead to bias. Give higher priority to compliance with follow-up than to recruitment at baseline. Loss to follow-up will often cause bias in the randomized trial (intention-to-treat analysis).
3. Case-control study: Non-response may well correlate with both the exposure and the endpoint, since both are known at recruitment to the study. Keeping response rates high should be given high priority, and the specific aim of the study should not be disclosed (the IRB may not accept this procedure).

Selection bias is a serious problem and should be avoided if possible. Often this is not possible, and then its magnitude and possible impact should be investigated.
Steps to avoid bias related to non-responders
- Keep non-response as low as possible, especially in surveys and case-control studies
- Try to get some information on non-responders, at best for E and D, but also on confounders
- Analyse data according to the time of responding
- Do sensitivity analyses
- Do follow-up studies (incl. RCTs)
- So, the first concern in an etiologic study is that of VALIDITY (FREEDOM FROM BIAS, at least known bias).
- Internal validity: validity of inferences drawn in relation to the members of the study population.
- External validity: validity of the inferences as they extend outside the population.