Class 11: Chi-Squared Test of Independence (EMBS Section 11.3)
Chi-squared GOF test
• One row (column) of Observed Counts
• One row (column) of Expected Counts determined based on H0
  – All categories are equally likely (Roulette Wheel, Soccer birth months)
  – Categories have specified p's (M&M colors)
  – #girls in 4 is binomial(n=4, p=0.5) (Denmark Fams)
  – Expected bin counts from NORMAL distribution (Lorex)
• Calculated chi-squared, dof, chidist, p-value, reject or not.
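As a reminder of the GOF recipe, here is a minimal Python sketch for the "all categories equally likely" case. The observed counts are hypothetical (they are not from any of the class data sets), and the p-value shortcut used here is only valid because the degrees of freedom happen to be 4.

```python
import math

# Hypothetical observed counts for 5 categories (NOT class data).
observed = [18, 22, 30, 14, 16]

# H0: all categories equally likely, so each expected count is n / k.
n = sum(observed)
k = len(observed)
expected = [n / k] * k

# Calculated chi-squared: sum of (O - E)^2 / E over the categories.
chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# GOF degrees of freedom: number of categories minus 1.
dof = k - 1

# Upper-tail probability for dof = 4 only: exp(-x/2) * (1 + x/2).
p_value = math.exp(-chi_squared / 2) * (1 + chi_squared / 2)

print(chi_squared, dof, p_value)
```

With these made-up counts the p-value is above 0.05, so H0 would not be rejected at the 5% level.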
Supermarket Survey
• A random sample of 160 employees of a national supermarket chain were asked about a proposed wage freeze.
• There were two categorical variables in the resulting 160-element data set.
  – JOB (Stacker, Sales, Admin)
  – RESPONSE (favorable, unfavorable, no comment)
The data set

ID    Job       Response
1     Stacker   Unfav
2     Stacker   Unfav
3     Admin     NC
.     .         .
158   Sales     NC
159   Sales     Fav
160   Stacker   Fav
To examine the relationship between 2 categorical variables, start with a contingency table

                 Response
Job        Fav   Unfav   NC   Total
Stacker      6      30    4      40
Sales       12      24   20      56
Admin       20      10   34      64
Total       38      64   58     160

Are RESPONSE and JOB independent?
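Tallying a contingency table from the raw records can be sketched as follows. Only the six records visible on the data set slide are used here, so the resulting counts are a toy illustration and will not match the full 160-employee table; the variable names are illustrative.

```python
from collections import Counter

# The six records shown on the data-set slide (ID, Job, Response).
records = [
    (1, "Stacker", "Unfav"),
    (2, "Stacker", "Unfav"),
    (3, "Admin", "NC"),
    (158, "Sales", "NC"),
    (159, "Sales", "Fav"),
    (160, "Stacker", "Fav"),
]

jobs = ["Stacker", "Sales", "Admin"]
responses = ["Fav", "Unfav", "NC"]

# Count how many records fall in each (Job, Response) cell.
counts = Counter((job, response) for _id, job, response in records)

# Arrange the cell counts as rows = Job, columns = Response.
table = [[counts[(job, resp)] for resp in responses] for job in jobs]

for job, row in zip(jobs, table):
    print(job, row, "Total", sum(row))
```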
H0: Response and Job are independent

                 Response
Job        Fav   Unfav   NC   Total
Stacker                          40
Sales                            56
Admin                            64
Total       38      64   58     160

What are the expected counts given H0?

Expected count for a cell = (row total × column total) / sample size    (11.9)
Calculate the Expected Counts under H0.

                 Response
Job        Fav    Unfav    NC     Total
Stacker     9.5    16.0    14.5     40
Sales      13.3    22.4    20.3     56
Admin      15.2    25.6    23.2     64
Total      38      64      58      160

Expected Counts if independent.
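The expected counts can be reproduced with a short Python sketch: each cell is its row total times its column total, divided by the sample size. The function name `expected_counts` is illustrative, not something from the course materials.

```python
# Observed counts from the supermarket survey.
# Rows: Stacker, Sales, Admin. Columns: Fav, Unfav, NC.
observed = [
    [6, 30, 4],
    [12, 24, 20],
    [20, 10, 34],
]

def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_counts(observed)

for job, row in zip(["Stacker", "Sales", "Admin"], expected):
    print(job, [round(e, 1) for e in row])
```

For example, the Stacker/Fav cell is 40 × 38 / 160 = 9.5.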
We know what to do now with our table of Observed and Expected Counts: calculate the chi-squared statistic, the sum of the distances (O − E)²/E over all cells.
Calculate the table of distances.

                 Response
Job        Fav     Unfav    NC      Total
Stacker    1.29    12.25    7.60
Sales      0.13     0.11    0.00
Admin      1.52     9.51    5.03
Total                               37.44
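A minimal Python sketch of the distance table, using the observed and expected counts from the supermarket survey. Each distance is (O − E)²/E, and the calculated chi-squared is their sum.

```python
# Observed and expected counts (rows: Stacker, Sales, Admin; columns: Fav, Unfav, NC).
observed = [[6, 30, 4], [12, 24, 20], [20, 10, 34]]
expected = [[9.5, 16.0, 14.5], [13.3, 22.4, 20.3], [15.2, 25.6, 23.2]]

# Distance for each cell: (O - E)^2 / E.
distances = [
    [(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
    for o_row, e_row in zip(observed, expected)
]

# Calculated chi-squared statistic: the sum of all the distances.
chi_squared = sum(sum(row) for row in distances)

for job, row in zip(["Stacker", "Sales", "Admin"], distances):
    print(job, [round(d, 2) for d in row])
print("chi-squared =", round(chi_squared, 2))
```

The two largest distances (Stacker/Unfav and Admin/Unfav) account for most of the total, which shows where the data depart most from independence.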
Get the p-value

                 Response
Job        Fav     Unfav    NC      Total
Stacker    1.29    12.25    7.60
Sales      0.13     0.11    0.00
Admin      1.52     9.51    5.03
Total                               37.44

Dof = (#rows-1)(#cols-1) = 2*2 = 4
P-value = CHIDIST(37.44, 4) = 1.46E-07
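Outside Excel, the same upper-tail probability can be sketched in plain Python. This is not how CHIDIST is implemented; it uses a closed-form expression that holds only when the degrees of freedom are even (as they are here), and the helper name `chi2_upper_tail_even_dof` is made up for illustration.

```python
import math

def chi2_upper_tail_even_dof(x, dof):
    """P(chi-squared with `dof` degrees of freedom > x), for EVEN dof only.

    Uses exp(-x/2) * sum_{k=0}^{dof/2 - 1} (x/2)^k / k!.
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this shortcut requires a positive, even dof")
    half = x / 2
    term = 1.0   # (x/2)^k / k! at k = 0
    total = 1.0
    for k in range(1, dof // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

rows, cols = 3, 3
dof = (rows - 1) * (cols - 1)

p_value = chi2_upper_tail_even_dof(37.44, dof)
print(dof, p_value)
```

The result agrees with the Excel value of about 1.46E-07, far below any usual significance level, so H0 (independence) is rejected.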
=CHITEST will do the last two steps
• =CHITEST(range containing the Os, range containing the Es)
• Calculates the chi-squared, compares it to the chidist using the appropriate dof, and reports the p-value.
• =CHITEST(for our data) = 1.46E-07
• CHITEST will also work for the GOF test!!
• So... you just have to calculate the Es.
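For readers working outside Excel, the following Python sketch mimics what CHITEST does: take the Os and the Es, compute the statistic, pick the dof from the shape of the range, and report the p-value. The function names `chitest` and `chi2_upper_tail` are illustrative, and the tail formulas are standard closed forms for integer dof, not Excel's actual implementation.

```python
import math

def chi2_upper_tail(x, dof):
    """Upper-tail chi-squared probability for a positive integer dof."""
    half = x / 2
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{k < dof/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, dof // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_j x^j / (1*3*...*(2j+1))
    term, total = 1.0, 0.0
    for j in range((dof - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total

def chitest(observed, expected):
    """P-value from tables of observed and expected counts (lists of rows)."""
    rows, cols = len(observed), len(observed[0])
    stat = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    # Two-way table: (rows-1)(cols-1). Single row or column (GOF): cells - 1.
    dof = (rows - 1) * (cols - 1) if rows > 1 and cols > 1 else max(rows, cols) - 1
    return chi2_upper_tail(stat, dof)

observed = [[6, 30, 4], [12, 24, 20], [20, 10, 34]]
expected = [[9.5, 16.0, 14.5], [13.3, 22.4, 20.3], [15.2, 25.6, 23.2]]
p_value = chitest(observed, expected)
print(p_value)
```

As on the slide, the only real work left to the analyst is producing the expected counts; passing a single row of Os and Es gives the GOF version.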
Excel Demo if time…
Statistically Significant?

May 13, 1999. Web posted at: 11:38 a.m. EDT (1538 GMT)

(CNN) -- Young children who sleep with a light on may have a substantially higher risk of developing nearsightedness as they get older, says a new study in the journal Nature. The collaborative study of 479 children by researchers at the University of Pennsylvania Medical Center and The Children's Hospital of Philadelphia found 55 percent (of the 100) children who slept with a room light on before age 2 had myopia, or nearsightedness, between ages 2 and 16. Of the (112) children who slept with a night-light before age 2, 34 percent were myopic, while just 10 percent of children who slept in darkness were nearsighted.
1. Create the Contingency Table of Observed Counts

               Myopic    Not    Total
Light            55       45     100
Night Light      38       74     112
Dark             27      240     267
Total           120      359     479

Earlier we would have asked P(Light | Myopic) = 55/120

Now we want to test
H0: Sleep Conditions and Subsequent Eyesight are independent
H0: P(M) is equal for all three sleeping conditions.

Expected Counts if Independent

               Myopic     Not
Light           25.05     74.95
Night Light     28.06     83.94
Dark            66.89    200.11

Distances

               Myopic     Not
Light           35.80     11.97
Night Light      3.52      1.18
Dark            23.79      7.95
Total                              84.21

Statistically Significant
P-value = chidist(84.21, 2) = 5.19E-19
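The whole myopia calculation can be checked with a short Python sketch. With dof = (3-1)(2-1) = 2, the upper-tail chi-squared probability reduces to exp(-x/2), which is what the p-value line below relies on; the variable names are illustrative.

```python
import math

# Observed counts. Rows: Light, Night Light, Dark. Columns: Myopic, Not.
observed = [[55, 45], [38, 74], [27, 240]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts if sleep condition and eyesight are independent.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Sum of distances (O - E)^2 / E.
chi_squared = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

dof = (len(observed) - 1) * (len(observed[0]) - 1)

# For dof = 2 only, the upper-tail probability is exp(-x/2).
p_value = math.exp(-chi_squared / 2)

print(round(chi_squared, 2), dof, p_value)
```

The myopic proportions by row (55/100, 38/112, 27/267) are the 55, 34, and 10 percent quoted in the article; the test says differences this large are essentially impossible under H0.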
Suppose we flip the contingency table?

               Myopic    Not    Total
Light            55       45     100
Night Light      38       74     112
Dark             27      240     267
Total           120      359     479

Calculated chi-squared = 84.21
P-value = 5.19E-19

           Light   Night Light   Dark   Total
Myopic       55        38          27     120
Not          45        74         240     359
Total       100       112         267     479

Calculated chi-squared =
P-value =
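One way to explore the flipped-table question is to run the same calculation on both orientations. The sketch below does that in Python; the helper name `chi_squared_stat` is illustrative. Because each expected count is row total × column total / n, swapping rows and columns leaves every cell's observed and expected count unchanged, and the dof (r-1)(c-1) is symmetric as well.

```python
import math

def chi_squared_stat(table):
    """Chi-squared statistic for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for o, c in zip(row, col_totals)
    )

# Original orientation: rows = sleep condition, columns = Myopic / Not.
original = [[55, 45], [38, 74], [27, 240]]

# Flipped orientation: rows = Myopic / Not, columns = sleep condition.
flipped = [list(col) for col in zip(*original)]

stat_original = chi_squared_stat(original)
stat_flipped = chi_squared_stat(flipped)

dof_original = (len(original) - 1) * (len(original[0]) - 1)
dof_flipped = (len(flipped) - 1) * (len(flipped[0]) - 1)

print(round(stat_original, 2), dof_original)
print(round(stat_flipped, 2), dof_flipped)
```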
Assignment 12
• Use the class data to test the independence of ATHLETE and HS STAT.
  – Class data columns: ID (1 to 69), HS Stat?, Athlete?
• Use the Denmark Family data to test independence of "Gender Mix of first 3" and "Have 4?"
  – Denmark Family data columns: ID, Child 1, Child 2, Child 3, Child 4, Famsize