
Comparing Two Population Proportions
• Goal: Compare two populations/treatments with respect to a nominal (binary) outcome
• Sampling Design: Independent vs Dependent Samples
• Methods based on large vs small samples
• Contingency tables used to summarize data
• Measures of Association: Absolute Risk, Relative Risk, Odds Ratio

Contingency Tables
• Tables representing all combinations of levels of explanatory and response variables
• Numbers in table represent counts of the number of cases in each cell
• Row and column totals are called marginal counts

2 x 2 Tables - Notation

                 Outcome Present   Outcome Absent                 Group Total
Group 1          $X_1$             $n_1 - X_1$                    $n_1$
Group 2          $X_2$             $n_2 - X_2$                    $n_2$
Outcome Total    $X_1 + X_2$       $(n_1 + n_2) - (X_1 + X_2)$    $n_1 + n_2$

Example - Firm Type/Product Quality

                        High Quality   Low Quality   Group Total
Not Integrated          33             55            88
Vertically Integrated   5              79            84
Outcome Total           38             134           172

• Groups: Not Integrated (Weave only) vs Vertically Integrated (Spin and Weave) Cotton Textile Producers
• Outcomes: High Quality (High Count) vs Low Quality (Low Count)
Source: Temin (1988)

Notation
• Proportion in Population 1 with the characteristic of interest: $p_1$
• Sample size from Population 1: $n_1$
• Number of individuals in Sample 1 with the characteristic of interest: $X_1$
• Sample proportion from Sample 1 with the characteristic of interest: $\hat{p}_1 = X_1 / n_1$
• Similar notation for Population/Sample 2

Example - Cotton Textile Producers
• $p_1$ - True proportion of all non-integrated firms that would produce High quality
• $p_2$ - True proportion of all vertically integrated firms that would produce High quality

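Notation (Continued)
• Parameter of Primary Interest: $p_1 - p_2$, the difference in the 2 population proportions with the characteristic (2 other measures given below)
• Estimator: $\hat{p}_1 - \hat{p}_2 = X_1/n_1 - X_2/n_2$
• Standard Error (and its estimate): $SE = \sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$, estimated by $\widehat{SE} = \sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$
• Pooled Estimated Standard Error when $p_1 = p_2 = p$: $\widehat{SE}_{pooled} = \sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}$, where $\hat{p} = (X_1 + X_2)/(n_1 + n_2)$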

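Cotton Textile Producers (Continued)
• Parameter of Primary Interest: $p_1 - p_2$, the difference in the 2 population proportions that produce High quality output
• Estimator: $\hat{p}_1 - \hat{p}_2 = 33/88 - 5/84 \approx 0.375 - 0.060 = 0.315$
• Standard Error (and its estimate): $\widehat{SE} = \sqrt{0.375(0.625)/88 + 0.060(0.940)/84} \approx 0.058$
• Pooled Estimated Standard Error when $p_1 = p_2 = p$: $\hat{p} = 38/172 \approx 0.221$, so $\widehat{SE}_{pooled} = \sqrt{0.221(0.779)(1/88 + 1/84)} \approx 0.063$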
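The estimator and both standard errors can be checked with a short Python sketch using the counts from the Temin (1988) table (33 of 88 non-integrated and 5 of 84 integrated firms producing high quality). The function name `diff_in_proportions` is illustrative only and does not come from the slides.

```python
import math

def diff_in_proportions(x1, n1, x2, n2):
    """Return (estimate, unpooled SE, pooled SE) for p1 - p2."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    estimate = p1_hat - p2_hat
    # Unpooled SE: each group keeps its own sample proportion.
    se_unpooled = math.sqrt(p1_hat * (1 - p1_hat) / n1
                            + p2_hat * (1 - p2_hat) / n2)
    # Pooled SE: assumes p1 = p2 = p and estimates p from both samples.
    p_pooled = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return estimate, se_unpooled, se_pooled

# Cotton textile data: group 1 = not integrated, group 2 = vertically integrated.
estimate, se_unpooled, se_pooled = diff_in_proportions(33, 88, 5, 84)
print(f"estimate = {estimate:.4f}")
print(f"unpooled SE = {se_unpooled:.4f}, pooled SE = {se_pooled:.4f}")
```

The pooled version is the one used in the significance test, since it is computed under the null hypothesis that the two proportions are equal.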

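Confidence Interval for $p_1 - p_2$ (Wilson’s Estimate)
• Method adds a success and a failure to each group to improve the coverage rate under certain conditions: $\tilde{p}_i = (X_i + 1)/(n_i + 2)$ for $i = 1, 2$
• The confidence interval is of the form: $(\tilde{p}_1 - \tilde{p}_2) \pm z_{\alpha/2} \sqrt{\tilde{p}_1(1-\tilde{p}_1)/(n_1+2) + \tilde{p}_2(1-\tilde{p}_2)/(n_2+2)}$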

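Example - Cotton Textile Production
95% Confidence Interval for $p_1 - p_2$: approximately (0.19, 0.42), using $\tilde{p}_1 = 34/90$ and $\tilde{p}_2 = 6/86$. The entire interval is positive, providing evidence that non-integrated producers are more likely to provide high quality output ($p_1 - p_2 > 0$)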
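A minimal sketch of the adjusted interval, assuming the "add one success and one failure to each group" rule described on the previous slide and a normal critical value. The function name `wilson_adjusted_ci` is an illustrative choice, not something named in the slides.

```python
import math
from statistics import NormalDist

def wilson_adjusted_ci(x1, n1, x2, n2, level=0.95):
    """Interval for p1 - p2 after adding 1 success and 1 failure per group."""
    p1_t = (x1 + 1) / (n1 + 2)
    p2_t = (x2 + 1) / (n2 + 2)
    se = math.sqrt(p1_t * (1 - p1_t) / (n1 + 2)
                   + p2_t * (1 - p2_t) / (n2 + 2))
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p1_t - p2_t
    return diff - z * se, diff + z * se

# Cotton textile data from the Temin (1988) table.
lower, upper = wilson_adjusted_ci(33, 88, 5, 84)
print(f"95% CI for p1 - p2: ({lower:.3f}, {upper:.3f})")
```

Because both endpoints are positive for these counts, the interval supports the conclusion that the non-integrated proportion is the larger one.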

Significance Tests for $p_1 - p_2$
• Deciding whether $p_1 = p_2$ can be done by interpreting “plausible values” of $p_1 - p_2$ from the confidence interval:
– If entire interval is positive, conclude $p_1 > p_2$ ($p_1 - p_2 > 0$)
– If entire interval is negative, conclude $p_1 < p_2$ ($p_1 - p_2 < 0$)
– If interval contains 0, do not conclude that $p_1 \neq p_2$
• Alternatively, we can conduct a significance test:
– $H_0$: $p_1 = p_2$ vs $H_a$: $p_1 \neq p_2$ (2-sided) or $H_a$: $p_1 > p_2$ (1-sided)
– Test Statistic: $z_{obs} = (\hat{p}_1 - \hat{p}_2) / \sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}$, where $\hat{p} = (X_1 + X_2)/(n_1 + n_2)$
– P-value: $2P(Z \geq |z_{obs}|)$ (2-sided) or $P(Z \geq z_{obs})$ (1-sided)

Example - Cotton Textile Production
Again, there is strong evidence that non-integrated firms are more likely to produce high quality output than integrated firms
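The test for the cotton textile data can be reproduced with the sketch below, which assumes the pooled standard error under $H_0$ and a standard normal reference distribution. The function name `two_proportion_z_test` is hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-sided p-value, upper one-sided p-value) for H0: p1 = p2."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se_pooled
    std_normal = NormalDist()
    p_upper = 1 - std_normal.cdf(z)            # Ha: p1 > p2
    p_two = 2 * (1 - std_normal.cdf(abs(z)))   # Ha: p1 != p2
    return z, p_two, p_upper

# Cotton textile data from the Temin (1988) table.
z_obs, p_two_sided, p_one_sided = two_proportion_z_test(33, 88, 5, 84)
print(f"z = {z_obs:.3f}")
print(f"two-sided p = {p_two_sided:.2e}, one-sided p = {p_one_sided:.2e}")
```

A statistic near 5 standard errors leaves a very small P-value under either alternative, which is what "strong evidence" refers to here.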

Measures of Association
• Absolute Risk (AR): $p_1 - p_2$
• Relative Risk (RR): $p_1 / p_2$
• Odds Ratio (OR): $o_1 / o_2$, where $o = p/(1-p)$
• Note that if $p_1 = p_2$ (No association between outcome and grouping variables):
– AR = 0
– RR = 1
– OR = 1

Relative Risk
• Ratio of the probability that the outcome characteristic is present for one group, relative to the other
• Sample proportions with characteristic from groups 1 and 2: $\hat{p}_1 = X_1/n_1$ and $\hat{p}_2 = X_2/n_2$

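Relative Risk
• Estimated Relative Risk: $RR = \hat{p}_1 / \hat{p}_2$
• 95% Confidence Interval for Population Relative Risk: $(RR \cdot e^{-1.96\sqrt{v}},\ RR \cdot e^{1.96\sqrt{v}})$, where $v = (1-\hat{p}_1)/X_1 + (1-\hat{p}_2)/X_2$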

Relative Risk
• Interpretation
– Conclude that the probability that the outcome is present is higher (in the population) for group 1 if the entire interval is above 1
– Conclude that the probability that the outcome is present is lower (in the population) for group 1 if the entire interval is below 1
– Do not conclude that the probability of the outcome differs for the two groups if the interval contains 1
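A sketch of the relative risk calculation and its interpretation rule, assuming the usual large-sample interval built on the log scale with variance $(1-\hat{p}_1)/X_1 + (1-\hat{p}_2)/X_2$. It is illustrated with the cotton textile counts from the Temin (1988) table because those counts are available in the text; the function name `relative_risk_ci` is hypothetical.

```python
import math

def relative_risk_ci(x1, n1, x2, n2, z=1.96):
    """Return (RR, lower, upper) using a normal interval for log(RR)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    rr = p1_hat / p2_hat
    # Estimated variance of log(RR).
    v = (1 - p1_hat) / x1 + (1 - p2_hat) / x2
    half_width = z * math.sqrt(v)
    return rr, rr * math.exp(-half_width), rr * math.exp(half_width)

# Group 1 = not integrated (33 of 88), group 2 = vertically integrated (5 of 84).
rr, rr_lower, rr_upper = relative_risk_ci(33, 88, 5, 84)
print(f"RR = {rr:.2f}, 95% CI = ({rr_lower:.2f}, {rr_upper:.2f})")

# Apply the interpretation rule: compare the whole interval with 1.
if rr_lower > 1:
    conclusion = "higher for group 1"
elif rr_upper < 1:
    conclusion = "lower for group 1"
else:
    conclusion = "no difference concluded"
print("Probability of the outcome:", conclusion)
```

The interval is asymmetric around the point estimate because it is symmetric on the log scale, which is typical for ratio measures.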

Example - Concussions in NCAA Athletes
• Units: Game exposures among college soccer players 1997-1999
• Outcome: Presence/Absence of a Concussion
• Group Variable: Gender (Female vs Male)
• Contingency table of case outcomes. Source: Covassin, et al (2003)

Example - Concussions in NCAA Athletes There is strong evidence that females have a higher risk of concussion

Odds Ratio
• Odds of an event is the probability it occurs divided by the probability it does not occur
• Odds ratio is the odds of the event for group 1 divided by the odds of the event for group 2
• Sample odds of the outcome for each group: $odds_i = \hat{p}_i/(1-\hat{p}_i) = X_i/(n_i - X_i)$ for $i = 1, 2$

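Odds Ratio
• Estimated Odds Ratio: $OR = odds_1/odds_2 = X_1(n_2 - X_2) / [X_2(n_1 - X_1)]$
• 95% Confidence Interval for Population Odds Ratio: $(OR \cdot e^{-1.96\sqrt{v}},\ OR \cdot e^{1.96\sqrt{v}})$, where $v = 1/X_1 + 1/(n_1 - X_1) + 1/X_2 + 1/(n_2 - X_2)$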

Odds Ratio
• Interpretation
– Conclude that the probability that the outcome is present is higher (in the population) for group 1 if the entire interval is above 1
– Conclude that the probability that the outcome is present is lower (in the population) for group 1 if the entire interval is below 1
– Do not conclude that the probability of the outcome differs for the two groups if the interval contains 1

Osteoarthritis in Former Soccer Players
• Units: 68 Former British professional football players and 136 age/sex matched controls
• Outcome: Presence/Absence of Osteoarthritis (OA)
• Data:
– Of $n_1 = 68$ former professionals, $X_1 = 9$ had OA, $n_1 - X_1 = 59$ did not
– Of $n_2 = 136$ controls, $X_2 = 2$ had OA, $n_2 - X_2 = 134$ did not
• The entire 95% confidence interval for the odds ratio lies above 1
Source: Shepard, et al (2003)
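The osteoarthritis counts above are enough to work the odds ratio through. The sketch below assumes the standard large-sample interval on the log scale, with variance equal to the sum of the reciprocals of the four cell counts; the function name `odds_ratio_ci` is illustrative.

```python
import math

def odds_ratio_ci(x1, n1, x2, n2, z=1.96):
    """Return (OR, lower, upper) using a normal interval for log(OR)."""
    a, b = x1, n1 - x1   # group 1: outcome present, outcome absent
    c, d = x2, n2 - x2   # group 2: outcome present, outcome absent
    odds_ratio = (a * d) / (b * c)
    # Estimated variance of log(OR): sum of reciprocal cell counts.
    v = 1 / a + 1 / b + 1 / c + 1 / d
    half_width = z * math.sqrt(v)
    return (odds_ratio,
            odds_ratio * math.exp(-half_width),
            odds_ratio * math.exp(half_width))

# Former professionals: 9 of 68 with OA; controls: 2 of 136 with OA.
or_hat, or_lower, or_upper = odds_ratio_ci(9, 68, 2, 136)
print(f"OR = {or_hat:.2f}, 95% CI = ({or_lower:.2f}, {or_upper:.2f})")
```

The interval is wide because only 2 controls had OA, so the $1/X_2$ term dominates the variance, yet its lower end still sits above 1.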

Fisher’s Exact Test
• Method of testing for association for 2 x 2 tables when one or both of the group sample sizes is small
• Measures (conditional on the group sizes and number of cases with and without the characteristic) the chances we would see differences of this magnitude or larger in the sample proportions, if there were no differences in the populations

Example – Echinacea Purpurea for Colds
• Healthy adults randomized to receive EP ($n_1 = 24$) or placebo ($n_2 = 22$, two were dropped)
• Among EP subjects, 14 of 24 developed cold after exposure to RV-39 (58%)
• Among Placebo subjects, 18 of 22 developed cold after exposure to RV-39 (82%)
• Out of a total of 46 subjects, 32 developed cold
• Out of a total of 46 subjects, 24 received EP
Source: Sperber, et al (2004)

Example – Echinacea Purpurea for Colds
• Conditional on 32 people developing colds and 24 receiving EP, the following table gives the outcomes that would have been as strong or stronger evidence that EP reduced risk of developing cold (1-sided test). P-value from SPSS is .079.

EP/Cold   Plac/Cold
14        18
13        19
12        20
11        21
10        22
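The one-sided P-value can be approximated by hand with the hypergeometric distribution: with both margins fixed, it is the total probability of the five tables listed above. The following is a minimal sketch under that assumption, not the SPSS procedure itself, and the function name `fisher_lower_tail` is hypothetical.

```python
from math import comb

def fisher_lower_tail(x1, n1, x2, n2):
    """P(group 1 event count <= x1) with all margins of the 2 x 2 table fixed."""
    total_events = x1 + x2
    n = n1 + n2
    # Smallest possible group 1 count given that group 2 holds at most n2 events.
    k_min = max(0, total_events - n2)
    denom = comb(n, total_events)
    return sum(comb(n1, k) * comb(n2, total_events - k)
               for k in range(k_min, x1 + 1)) / denom

# EP: 14 of 24 developed colds; placebo: 18 of 22 developed colds.
p_value = fisher_lower_tail(14, 24, 18, 22)
print(f"one-sided Fisher exact p-value = {p_value:.3f}")
```

The sum runs over EP cold counts from 10 to 14, matching the rows of the table, and rounds to the .079 reported from SPSS.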

Example - SPSS Output

McNemar’s Test for Paired Samples
• Common subjects being observed under 2 conditions (2 treatments, before/after, 2 diagnostic tests) in a crossover setting
• Two possible outcomes (Presence/Absence of Characteristic) on each measurement
• Four possibilities for each subject with respect to outcome:
– Present in both conditions
– Absent in both conditions
– Present in Condition 1, Absent in Condition 2
– Absent in Condition 1, Present in Condition 2

McNemar’s Test for Paired Samples

McNemar’s Test for Paired Samples
• $H_0$: Probability the outcome is Present is same for the 2 conditions
• $H_A$: Probabilities differ for the 2 conditions (Can also be conducted as 1-sided test)
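A sketch of the large-sample version of the test, assuming the standard statistic $z = (n_{12} - n_{21})/\sqrt{n_{12} + n_{21}}$ built from the two discordant counts. Here $n_{12}$ denotes pairs with the outcome present only under Condition 1 and $n_{21}$ pairs with it present only under Condition 2; this notation and the function name `mcnemar_z_test` are assumptions introduced here, and the counts in the example are hypothetical, not taken from any study cited in these slides.

```python
import math
from statistics import NormalDist

def mcnemar_z_test(n12, n21):
    """Large-sample McNemar test based on the discordant pairs only.

    n12: outcome present in Condition 1, absent in Condition 2
    n21: outcome absent in Condition 1, present in Condition 2
    """
    z = (n12 - n21) / math.sqrt(n12 + n21)
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Hypothetical discordant counts, for illustration only.
z_mcnemar, p_mcnemar = mcnemar_z_test(25, 10)
print(f"z = {z_mcnemar:.3f}, two-sided p = {p_mcnemar:.4f}")
```

Pairs that agree under both conditions do not enter the statistic, since they carry no information about which condition has the higher probability of the outcome.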

Example - Juveniles Tried as Adults
• Subjects - 2097 pairs of juveniles matched on prior criminal record and severity of current crime
• Condition: Adult vs Juvenile Court (one of each in pair)
• Outcome: Whether juvenile was re-arrested during follow-up
Source: Bishop et al (1996)

Example - Juveniles Tried as Adults
• $H_0$: Tendency for rearrest does not differ between juveniles tried as adults and those tried as juveniles
• $H_A$: Tendencies differ
Evidence that tendencies differ (higher risk of rearrest among juveniles tried in adult court)

Data Sources
• Temin, P. (1988). “Product Quality and Vertical Integration in the Early Cotton Textile Industry,” The Journal of Economic History, 48(4), pp 891-907.
• Covassin, T., C. B. Swanik, and M. L. Sachs (2003). “Sex Differences and the Incidence of Concussions Among Collegiate Athletes,” Journal of Athletic Training, 38(3), pp 238-244.
• Shepard, G. J., A. J. Banks, and W. G. Ryan (2003). “Ex-Professional Association Footballers Have an Increased Prevalence of Osteoarthritis of the Hip Compared with Age Matched Controls Despite Not Having Sustained Notable Hip Injuries,” British Journal of Sports Medicine, 37, pp 80-81.
• Sperber, S. J., L. P. Shah, R. D. Gilbert, et al (2004). “Echinacea purpurea for Prevention of Experimental Rhinovirus Colds,” Clinical Infectious Diseases, 38, pp 1367-1371.
• Bishop, D. M., C. E. Frazier, L. Lanza-Kaduce, and L. Winner (1996). “The Transfer of Juveniles to Criminal Court: Does it Make a Difference?” Crime & Delinquency, 42, pp 171-191.