Chapter 18 Cross-Tabulated Counts Part A 6/11/2021 1

Chapter 18, Part A:
• 18.1 Types of Samples
• 18.2 Naturalistic and Cohort Samples
• 18.3 Chi-Square Test of Association

Types of Samples
I. Naturalistic Samples ≡ simple random sample or complete enumeration of the population
II. Purposive Cohorts ≡ select a fixed number of individuals in each exposure group
III. Case-Control ≡ select a fixed number of diseased and non-diseased individuals

Naturalistic (Type I) Sample
Random sample of study base
• How did we study the relationship between CMV (the exposure) and restenosis (the disease) with a naturalistic sample?
• A population was identified and sampled.
• The sample was classified as CMV+ and CMV−.
• Disease occurrence (restenosis) was studied and compared in the groups.

Purposive Cohorts (Type II Sample)
Fixed numbers in exposure groups
• How would we study CMV and restenosis with a purposive cohort design?
• A population of CMV+ individuals would be identified.
  – From this population, select, say, 38 individuals.
• A population of CMV− individuals would be identified.
  – From this population, select, say, 38 individuals.
• Disease occurrence (restenosis) would be studied and compared among the groups.

Case-Control (Type III Sample)
Set number of cases and non-cases
• How would we study CMV and restenosis with a case-control design?
• A population of patients who experienced restenosis (cases) would be identified.
  – From this population, select, say, 38 individuals.
• A population of patients who did not restenose (controls) would be identified.
  – From this population, select, say, 38 individuals.
• The exposure (CMV) would be studied and compared among the groups.

Naturalistic Sample: Illustrative Example
• SRS, N = 585
• Cross-classify education level (categorical exposure) and smoking status (categorical disease)
• Tally the R-by-C table (“cross-tab”)

            Smoke?
Edu.        +      −    Tot
HS         12     38     50
JC         18     67     85
JC+        27     95    122
UG         32    239    271
Grad        5     52     57
Total      94    491    585

Cross-tabulation (cont.)

            Smoke?
Educ.       +      −    Tot
HS         12     38     50
JC         18     67     85
Some       27     95    122
UG         32    239    271
Grad        5     52     57
Total      94    491    585

The Tot column holds the row margins, the Total row holds the column margins, and the bottom-right cell (585) is the grand total.
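The margins in the cross-tab can be recomputed from the cell counts. The following is a minimal Python sketch, not part of the original slides; the variable names are illustrative, and the third education row (labelled both "JC+" and "Some" on the slides) is called "Some" here.

```python
# Cell counts from the slide: education level -> (smoke +, smoke -).
counts = {
    "HS":   (12, 38),
    "JC":   (18, 67),
    "Some": (27, 95),
    "UG":   (32, 239),
    "Grad": (5, 52),
}

# Row margins: number of people at each education level.
row_totals = {level: pos + neg for level, (pos, neg) in counts.items()}

# Column margins: total smokers and total non-smokers.
col_totals = (
    sum(pos for pos, _ in counts.values()),
    sum(neg for _, neg in counts.values()),
)

# Grand total N.
n_total = sum(row_totals.values())

# Print the table with its margins.
for level, (pos, neg) in counts.items():
    print(f"{level:<6}{pos:>6}{neg:>6}{row_totals[level]:>6}")
print(f"{'Total':<6}{col_totals[0]:>6}{col_totals[1]:>6}{n_total:>6}")
```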

Cross-tabulation of counts
For uniformity, we will always:
• put the exposure variable in rows
• put the disease variable in columns

Exposure / Disease relationship
Use conditional proportions to describe relationships between exposure and disease

Conditional Proportions
Exposure / Disease Relationship: R-by-2 Table

            +      −    Total
Grp 1      a1     b1     n1
Grp 2      a2     b2     n2
  ↓         ↓      ↓      ↓
Grp R      aR     bR     nR
Total      m1     m2     N

In naturalistic and cohort samples, use row percents! For group i, the row proportion with disease is ai / ni.

Example: prevalence of smoking by education
Lower education is associated with higher prevalence (a negative association between education and smoking).
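The prevalences behind this example are the row proportions of the education-by-smoking table. A minimal Python sketch, assuming the counts from the earlier cross-tab slide (names are illustrative, and the "JC+" row is called "Some"):

```python
# Cell counts from the cross-tab slide: education level -> (smoke +, smoke -).
counts = {
    "HS":   (12, 38),
    "JC":   (18, 67),
    "Some": (27, 95),
    "UG":   (32, 239),
    "Grad": (5, 52),
}

# Row proportion: smokers divided by the row total, i.e. the prevalence
# of smoking conditional on education level.
prevalence = {level: pos / (pos + neg) for level, (pos, neg) in counts.items()}

for level, p in prevalence.items():
    print(f"{level:<6}{100 * p:5.1f}%")
```

The two lowest education groups show prevalences above 20%, while the UG and Grad groups fall near or below 12%, which is the pattern the slide describes.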

Relative Risks
Let group 1 represent the least exposed group

Illustration: RRs
Note trend
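The slide's worked figures did not survive extraction, so the following Python sketch recomputes relative risks from the cross-tab counts. It assumes the standard definition (each group's prevalence divided by the prevalence in group 1) and assumes that the HS row plays the role of group 1; both choices are assumptions, not confirmed by the slide text.

```python
# Cell counts from the cross-tab slide: education level -> (smoke +, smoke -).
counts = {
    "HS":   (12, 38),
    "JC":   (18, 67),
    "Some": (27, 95),
    "UG":   (32, 239),
    "Grad": (5, 52),
}

# Prevalence (row proportion) of smoking in each education group.
prevalence = {level: pos / (pos + neg) for level, (pos, neg) in counts.items()}

# ASSUMPTION: HS is treated as the reference group ("group 1").
reference = "HS"

# Relative risk of each group compared with the reference group.
rr = {level: p / prevalence[reference] for level, p in prevalence.items()}

for level, value in rr.items():
    print(f"RR({level} vs {reference}) = {value:.2f}")
```

With this reference group, the RRs for UG and Grad are well below 1, which is consistent with the trend noted on the slide.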

k Levels of Disease
Efficacy of Echinacea example
Randomized controlled clinical trial: echinacea vs. placebo in treatment of URI
Exposure ≡ echinacea vs. placebo
Disease ≡ severity of illness
Source: JAMA 2003, 290(21), 2824-30

Row Percents for Echinacea Example
Echinacea group fared slightly worse than placebo group

Chi-Square Test of Association
A. H0: no association in population
   Ha: association in population
B. Test statistic

Observed

Degree   Smoke +   Smoke −    Tot
HS          12        38       50
JC          18        67       85
JC+         27        95      122
UG          32       239      271
Grad         5        52       57
Total       94       491      585

Expected

         Smoke +                    Smoke −                      Total
High S   (50 × 94) ÷ 585 = 8.034    (50 × 491) ÷ 585 = 41.966      50
JC       13.658                     71.342                         85
Some     19.603                     102.397                       122
UG       43.545                     227.455                       271
Grad     9.159                      47.841                         57
Total    94                         491                           585
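Each expected count follows the pattern shown in the first row: (row total × column total) ÷ N. A minimal Python sketch that reproduces the whole table under that rule (variable names are illustrative):

```python
# Observed counts: education level -> (smoke +, smoke -).
observed = {
    "HS":   (12, 38),
    "JC":   (18, 67),
    "Some": (27, 95),
    "UG":   (32, 239),
    "Grad": (5, 52),
}

# Margins of the observed table.
row_totals = {level: sum(cells) for level, cells in observed.items()}
col_totals = [sum(cells[j] for cells in observed.values()) for j in range(2)]
n_total = sum(row_totals.values())

# Expected count under no association: row total * column total / N.
expected = {
    level: tuple(row_totals[level] * col / n_total for col in col_totals)
    for level in observed
}

for level, (e_pos, e_neg) in expected.items():
    print(f"{level:<6}{e_pos:>9.3f}{e_neg:>9.3f}{row_totals[level]:>6}")
```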

Continuity Corrected Chi-Square
• Pearson’s (“uncorrected”) chi-square
• Yates’ continuity-corrected chi-square
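The formulas on this slide were lost in extraction. For reference, the standard textbook forms of the two statistics are given below; the symbols O_i (observed count in cell i), E_i (expected count in cell i), and the subscript c for "corrected" are notation chosen here, not necessarily the slide's own.

```latex
% Pearson's ("uncorrected") chi-square statistic, summed over all cells
X^2_{\text{stat}} = \sum_{i} \frac{(O_i - E_i)^2}{E_i}

% Yates' continuity-corrected chi-square statistic
X^2_{\text{stat},c} = \sum_{i} \frac{\left(\lvert O_i - E_i \rvert - \tfrac{1}{2}\right)^2}{E_i}
```

Both are referred to a chi-square distribution with (R − 1)(C − 1) degrees of freedom for an R-by-C table.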

Chi-Square Hand Calc.
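The hand calculation itself was an image and is not in the text. The Python sketch below redoes it for the education-by-smoking table, assuming the standard Pearson and Yates formulas; function and variable names are illustrative.

```python
# Observed counts, one row per education level: [smoke +, smoke -].
observed = [
    [12, 38],   # HS
    [18, 67],   # JC
    [27, 95],   # Some college
    [32, 239],  # UG
    [5, 52],    # Grad
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n_total = sum(row_totals)

x2_pearson = 0.0  # sum of (O - E)^2 / E
x2_yates = 0.0    # sum of (|O - E| - 0.5)^2 / E
for row, n_row in zip(observed, row_totals):
    for obs, n_col in zip(row, col_totals):
        exp = n_row * n_col / n_total
        x2_pearson += (obs - exp) ** 2 / exp
        x2_yates += (abs(obs - exp) - 0.5) ** 2 / exp

# Degrees of freedom for an R-by-C table: (R - 1)(C - 1).
df = (len(row_totals) - 1) * (len(col_totals) - 1)

print(f"Pearson X2 = {x2_pearson:.2f}, Yates X2 = {x2_yates:.2f}, df = {df}")
```

The Pearson statistic agrees with the value of 13.20 on 4 df used on the P-value slide.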

Chi-Square P-value
• X²stat = 13.20 with 4 df
• Table E: in the 4 df row, bracket the chi-square statistic and look up the right-tail (P-value) regions
• Example: X²stat falls between 11.14 (P = .025) and 13.28 (P = .01)
• .01 < P < .025

Right tail   0.975   0.25   0.20   0.15   0.10   0.05   0.025   0.01   0.005
df = 4        0.48   5.39   5.99   6.74   7.78   9.49   11.14  13.28   14.86

Illustration: X²stat = 13.20 with 4 df
The P-value = area under the curve (AUC) in the tail beyond X²stat
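The tail area can also be computed directly rather than bracketed from a table. The sketch below is an alternative to the Table E lookup, not the slides' method: it uses the closed-form right-tail probability that holds when the degrees of freedom are even, P(X² > x) = exp(−x/2) · Σ (x/2)^i / i! for i = 0 … df/2 − 1. The function name is hypothetical.

```python
import math

def chi2_right_tail_even_df(x, df):
    """Right-tail area of a chi-square distribution; valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here requires a positive even df")
    half = x / 2.0
    term = 1.0   # current term (x/2)^i / i!, starting at i = 0
    total = 1.0  # running sum of the terms
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

p_value = chi2_right_tail_even_df(13.20, 4)
print(f"P = {p_value:.4f}")
```

The result lies between .01 and .025, in agreement with the bracket obtained from the table.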

WinPEPI > Compare2 > F1
[Screenshots: input screen (row 5 not visible) and output]

Chi-Square, cont.
1. How the chi-square works. When observed values = expected values, the chi-square statistic is 0. As the differences between observed and expected values grow, evidence against H0 mounts.
2. Avoid chi-square tests in small samples. Do not use a chi-square test when more than 20% of the cells have expected values less than 5.
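The small-sample rule in point 2 can be checked mechanically before running the test. A minimal Python sketch, assuming the 20% / less-than-5 rule exactly as stated above; the function name is hypothetical.

```python
def fraction_small_expected(table, threshold=5.0):
    """Fraction of cells whose expected count is below the threshold."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n_total = sum(row_totals)
    # Expected count for every cell: row total * column total / N.
    expected = [r * c / n_total for r in row_totals for c in col_totals]
    small = sum(1 for e in expected if e < threshold)
    return small / len(expected)

# Education-by-smoking counts from the illustrative example.
observed = [[12, 38], [18, 67], [27, 95], [32, 239], [5, 52]]

frac = fraction_small_expected(observed)
chi_square_ok = frac <= 0.20  # rule: no more than 20% of cells below 5
print(f"{100 * frac:.0f}% of cells have expected counts below 5")
```

For the education-by-smoking data, the smallest expected count is 8.034, so no cell falls below 5 and the chi-square test is acceptable under this rule.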

Chi-Square, cont.
3. Supplement chi-squares with descriptive statistics. Chi-square statistics do not quantify effects.
4. For 2-by-2 tables, chi-square and z tests produce identical P-values.
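Point 4 can be checked numerically. The sketch below uses a hypothetical 2-by-2 table (the counts are made up for illustration and do not come from the slides) and compares the uncorrected Pearson chi-square with the two-sided z test for two proportions using the pooled standard error. The equivalence shown applies to that pairing; a continuity-corrected chi-square would give a different P-value.

```python
import math

# HYPOTHETICAL 2-by-2 table: rows are exposure groups, columns are disease +/-.
a, b = 30, 70   # group 1: disease +, disease -
c, d = 20, 80   # group 2: disease +, disease -
n1, n2 = a + b, c + d
n_total = n1 + n2

# Uncorrected Pearson chi-square (1 df).
col_totals = [a + c, b + d]
x2 = 0.0
for row, n_row in zip([[a, b], [c, d]], (n1, n2)):
    for obs, n_col in zip(row, col_totals):
        exp = n_row * n_col / n_total
        x2 += (obs - exp) ** 2 / exp
# Right-tail area of chi-square with 1 df: P(|Z| > sqrt(x2)).
p_chi = math.erfc(math.sqrt(x2 / 2.0))

# Two-sided z test for two proportions with pooled standard error.
p1, p2 = a / n1, c / n2
p_pooled = (a + c) / n_total
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_z = math.erfc(abs(z) / math.sqrt(2.0))

print(f"X2 = {x2:.4f}, z^2 = {z * z:.4f}")
print(f"P (chi-square) = {p_chi:.4f}, P (z test) = {p_z:.4f}")
```

The chi-square statistic equals z squared, which is why the two P-values coincide.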

Discussion and demo on power and sample size
• For estimation
• For testing
  – Power
  – Sample size