Chapter 5: Two-Way Tables
Associations Between Categorical Variables

Associations between variables
• Quantitative variables → correlation [Ch 3] & regression [Ch 4]
• Categorical variables → two-way tables of frequency counts [Ch 5]

Two-Way Table of Counts (R-by-C tables)
Variables:
• EDUCATION = row variable (4 levels)
• AGE = column variable (3 levels)
This is a 4-by-3 table.

Marginal Distributions
• Row variable marginal totals: 27,858; 58,077; 44,465; 44,828
• Column variable marginal distribution (totals): 37,786; 81,435; 56,008

Marginal Percents
• Relative frequencies (%s) for each variable separately
• For descriptive purposes only; they do not address association
• Illustrative example (distribution of education level)
  – State: Describe the distribution of education levels in the population
  – Plan: Calculate marginal percents for the row variable “EDUCATION”

Marginal Percents Example
Step 3: “Solve” (marginal percent = row total / table total × 100%)
% not completing HS = 27,859 / 175,230 × 100% = 15.9%
% completing HS = 58,077 / 175,230 × 100% = 33.1%
% with 1-3 yrs college = 44,465 / 175,230 × 100% = 25.4%
% with 4+ yrs college = 44,828 / 175,230 × 100% = 25.6%
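
The arithmetic in Step 3 can be checked with a short Python sketch. The row totals and the table total are copied from the slide; the names `row_totals`, `table_total`, and `marginal_percents` are illustrative assumptions, not part of the original example.

```python
# Row totals for EDUCATION and the table total, as printed on the slide.
row_totals = {
    "Did not complete HS": 27_859,
    "Completed HS": 58_077,
    "1-3 yrs college": 44_465,
    "4+ yrs college": 44_828,
}
table_total = 175_230


def marginal_percents(totals, grand_total):
    """Express each marginal total as a percent of the table total."""
    return {level: 100 * count / grand_total for level, count in totals.items()}


percents = marginal_percents(row_totals, table_total)
for level, pct in percents.items():
    print(f"{level}: {pct:.1f}%")
```

Rounded to one decimal place, the results agree with the percents shown in Step 3.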

Marginal Percents Example
Step 4: “Conclude”
• 16% did not complete high school
• 33% completed high school
• 25% completed 1 to 3 years of college
• 26% completed 4+ years of college
These are merely descriptive statements.

Association
Use conditional proportions to determine associations:
• If the row variable is the explanatory variable → compare conditional row proportions
• If the column variable is the explanatory variable → compare conditional column proportions

Example: Association between AGE & EDUCATION
State: Is AGE associated with EDUCATION level?
Plan: Since AGE is the explanatory variable, calculate conditional column proportions. We do not need to calculate every conditional proportion (be selective). Let us calculate the proportion completing 4+ years of college by AGE.

Example: “Solve” & “Conclude”
Conclude: As age goes up, the % completing college goes down, so there is a negative association between age and college completion.

Direction of association
• No association: conditional percents are nearly equal at all levels of the explanatory variable
• Positive association: as the explanatory variable rises, conditional percents increase
• Negative association: as the explanatory variable rises, conditional percents decrease

Example: Gender bias?
State: Is ACCEPTANCE into UC Berkeley graduate school (response variable) associated with GENDER (explanatory variable)?

              Male  Female  Total
Accepted       198      88    286
Not accepted   162     112    274
Total          360     200    560

Plan: Since GENDER is the explanatory variable, calculate the percent accepted within each gender (acceptance “rates” by gender; these are column percents in this table), then compare the % accepted by GENDER.

Example: “Gender bias?”
Step 3: “Solve”

              Male  Female  Total
Accepted       198      88    286
Not accepted   162     112    274
Total          360     200    560

% of males accepted = 198 / 360 × 100% = 55%
% of females accepted = 88 / 200 × 100% = 44%
Conclude: Positive association between “maleness” and acceptance
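
As a rough check on the “Solve” step, the acceptance rates by gender can be computed directly from the counts in the table. This is a minimal sketch; the dictionary layout and the function name `acceptance_rate` are assumptions made for illustration.

```python
# Counts from the slide's two-way table, grouped by the explanatory variable (GENDER).
counts = {
    "Male": {"Accepted": 198, "Not accepted": 162},
    "Female": {"Accepted": 88, "Not accepted": 112},
}


def acceptance_rate(outcomes):
    """Percent accepted among all applicants of one gender (a conditional percent)."""
    total = sum(outcomes.values())
    return 100 * outcomes["Accepted"] / total


rates = {gender: acceptance_rate(outcomes) for gender, outcomes in counts.items()}
for gender, rate in rates.items():
    print(f"{gender}: {rate:.0f}% accepted")
```

Each rate is conditioned on gender, so the denominators are the gender totals (360 and 200), not the table total of 560.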

Simpson’s Paradox ≡ a lurking variable reverses the direction of the association
• Lurking variable: MAJOR applied to
  – Business school major (240 applicants)
  – Art school major (320 applicants)
• State: Does the lurking variable explain the association between maleness and acceptance?
• Plan: Subdivide (“stratify”) the data into subgroups according to the lurking variable MAJOR, then calculate acceptance rates by gender within subgroups

“Gender Bias” Data by MAJOR

All Applicants
          Success  Failure  Total
Male          198      162    360
Female         88      112    200
Total         286      274    560

Business School Applicants
          Success  Failure  Total
Male           18      102    120
Female         24       96    120
Total          42      198    240

Art School Applicants
          Success  Failure  Total
Male          180       60    240
Female         64       16     80
Total         244       76    320

Business School Applicants
          Success  Failure  Total
Male           18      102    120
Female         24       96    120
Total          42      198    240

% of males accepted = 18 / 120 × 100% = 15%
% of females accepted = 24 / 120 × 100% = 20%
Conclude: Negative association with maleness

Art School Applicants
          Success  Failure  Total
Male          180       60    240
Female         64       16     80
Total         244       76    320

% of males accepted = 180 / 240 × 100% = 75%
% of females accepted = 64 / 80 × 100% = 80%
Conclude: Negative association with maleness

Gender Bias Example Conclusion
• Overall: higher acceptance rate for men
• Within Business school: higher acceptance rate for women
• Within Art school: higher acceptance rate for women
• Therefore, the lurking variable (MAJOR) reversed the direction of the association (Simpson’s Paradox)
• Acceptance to grad school at UC Berkeley favored women after “controlling for” MAJOR
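
The reversal can be reproduced from the stratified counts. The sketch below computes acceptance rates within each major and then pools the two majors; the data layout and the names `by_major`, `rate`, and `pooled_rate` are illustrative assumptions, while the counts come from the slides' tables.

```python
# (accepted, applicants) for each gender within each major, from the stratified tables.
by_major = {
    "Business": {"Male": (18, 120), "Female": (24, 120)},
    "Art": {"Male": (180, 240), "Female": (64, 80)},
}


def rate(accepted, applicants):
    """Acceptance rate as a percent."""
    return 100 * accepted / applicants


# Acceptance rates within each stratum (major).
within = {
    major: {gender: rate(*cell) for gender, cell in groups.items()}
    for major, groups in by_major.items()
}


def pooled_rate(gender):
    """Acceptance rate for one gender after combining both majors."""
    accepted = sum(groups[gender][0] for groups in by_major.values())
    applicants = sum(groups[gender][1] for groups in by_major.values())
    return rate(accepted, applicants)


overall = {gender: pooled_rate(gender) for gender in ("Male", "Female")}

for major, rates in within.items():
    print(major, rates)
print("Overall", overall)
```

Within each major the female rate is higher (20% vs 15% in Business, 80% vs 75% in Art), yet the pooled rate is higher for men (55% vs 44%). The counts suggest why: 240 of the 360 men applied to the Art school, which accepted most applicants, compared with 80 of the 200 women.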

HIV vaccine boost (Exercise 5.6)
State: Do the data support that vaccine delivered by EP results in a higher proportion responding?
Plan = ?
Solution = ?
Conclusion = ?

Kidney Stones (Exercise 5.7)

Small Stones
          Open Surgery  Percutaneous
Success             81           234
Failure              6            36

Large Stones
          Open Surgery  Percutaneous
Success            192            55
Failure             71            25

(a) Find the % of kidney stones, combining the data for small and large stones, that were successfully removed for each of the two procedures. Which procedure had the higher overall success rate?
(b) What % of all small kidney stones were successfully removed? What % of all large kidney stones…? Which type of kidney stone is easier to treat?

Helicopter Evacuation: Lurking Variable / Simpson’s Paradox
• X = Helicopter or Road
• Y = Survived or Died
• Z = Accident Severity