Chi-Square Tests
- Slides: 14
Chi-Square Tests
• Categorical data
• 1 sample, compared to a theoretical distribution
  – Goodness-of-Fit Test
• 2+ samples, 2+ levels of response variable
  – Chi-square Test
Chi-Square Tests Slide #1
Chi-Square -- Examples
• Do the dominant plants in plots differ between two locations?
• Does the frequency of females differ among majors in the natural sciences, social sciences, and humanities?
• Does the occurrence of a food item in the stomachs of lake trout and chinook salmon differ?
Chi-square Slide #2
What do those examples have in common?
• A categorical response variable
  – dominant plant in a plot
  – sex of student (male or female)
  – occurrence of a food item (Y/N)
• Compare response frequencies among 2+ groups
  – between two locations
  – among three divisions
  – between lake trout and chinook salmon
Chi-square Slide #3
An Illustrative Example
• When Chinook Salmon were first introduced to Lake Superior there was concern that they would compete with native Lake Trout for Lake Herring. In a preliminary study, fisheries biologists classified the diets of 50 Lake Trout and 40 Chinook Salmon as containing Lake Herring or not. They found that 36 Lake Trout and 24 Chinook Salmon contained Lake Herring. Test (at the 10% level) whether the proportion that had Lake Herring differs between Lake Trout and Chinook Salmon.
Chi-square Slide #4
Observed Table
– Recall – “… the diets of 50 Lake Trout and 40 Chinook Salmon … found 36 Lake Trout and 24 Chinook Salmon contained Lake Herring”

                  Lake Herring   No Lake Herring   Total
Lake Trout             36              14            50
Chinook Salmon         24              16            40
Total                  60              30            90

Chi-square Slide #5
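The margins of the observed table follow directly from the four cell counts. A minimal sketch, assuming the predators are the rows and the diet outcome is the columns (the variable names are illustrative, not from the slides):

```python
# Cell counts from the example; the "no LH" cells are sample size minus "LH".
observed = {
    "Lake Trout": {"LH": 36, "no LH": 50 - 36},
    "Chinook Salmon": {"LH": 24, "no LH": 40 - 24},
}

# Row totals are the sample sizes; column totals pool both predators.
row_totals = {pred: sum(cells.values()) for pred, cells in observed.items()}
col_totals = {
    col: sum(cells[col] for cells in observed.values())
    for col in ("LH", "no LH")
}
grand_total = sum(row_totals.values())

print(row_totals)
print(col_totals)
print(grand_total)
```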
Observed Table
• If there is no difference between rows (i.e., Ho is true), then the total row could represent either row.
• Thus, the proportion of predators (regardless of type) that consumed Lake Herring is estimated to be 60/90, or 0.67.
Chi-square Slide #6
Expectations if Ho is true
• If there is no difference and the common proportion is estimated by 0.67, then how many …
  – LT do we expect to have LH = 50*0.67
  – LT … to not have LH = 50*0.33
  – CS … to have LH = 40*0.67
  – CS … to not have LH = 40*0.33
Chi-square Slide #7
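The four expectations can be sketched by applying the pooled proportion to each sample size. This sketch keeps the unrounded proportion 60/90 rather than 0.67, which is an assumption about how the later slide values (such as 33.3) were obtained; the names are illustrative.

```python
n_lt, n_cs = 50, 40              # sample sizes from the example
with_lh = 36 + 24                # predators containing Lake Herring
p_pooled = with_lh / (n_lt + n_cs)   # 60/90, shown as 0.67 on the slide

# Expected counts under Ho: each sample size times the common proportion.
expected = {
    ("LT", "LH"): n_lt * p_pooled,
    ("LT", "no LH"): n_lt * (1 - p_pooled),
    ("CS", "LH"): n_cs * p_pooled,
    ("CS", "no LH"): n_cs * (1 - p_pooled),
}

for cell, count in expected.items():
    print(cell, round(count, 1))
```

Multiplying by the rounded 0.67 instead would give 33.5 for the first cell, so rounding is best left to the final step.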
Create Expected Table
• LT to have LH = (50*60)/90 = 33.3
Chi-square Slide #8
Create Expected Table
• LT to NOT have LH = (50*30)/90 = 16.7

                  Lake Herring   No Lake Herring
Lake Trout            33.3            16.7
Chinook Salmon        26.7            13.3

• Each expected count is the product of its row total and its column total, divided by the table total.
Chi-square Slide #9
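The marginal-totals rule works for a table of any size. A minimal sketch, assuming the table is stored as a list of rows (the function name `expected_counts` is hypothetical):

```python
def expected_counts(observed):
    """Expected count per cell: row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]


# Rows: Lake Trout, Chinook Salmon. Columns: Lake Herring, no Lake Herring.
expected = expected_counts([[36, 14], [24, 16]])
rounded = [[round(e, 1) for e in row] for row in expected]
print(rounded)
```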
A New Test Statistic
• χ² = Σ (Observed − Expected)² / Expected, summed over all cells of the table
• df = (rows-1)*(cols-1)
Chi-Square Tests Slide #10
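As a sketch of the calculation, the statistic sums (observed − expected)² / expected over every cell, and the degrees of freedom come from the table dimensions. The function name is hypothetical, and no continuity correction is applied, which is an assumption about the course's convention.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, df) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / table_total
            statistic += (obs - exp) ** 2 / exp

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


stat, df = chi_square_statistic([[36, 14], [24, 16]])
print(round(stat, 2), df)
```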
Chi-Square Distribution
• Right-skewed (all values are positive)
• Less sharply skewed with increasing df
  – df is determined by the size of the table, not by n
• All p-values are “right-ofs”
  – no “one-tailed” tests with chi-square
• Examine HO – page 1
[Figure: chi-square density curves labeled Chi(3), Chi(10), and Chi(20); x-axis "Chi-square" from 0 to 50]
Chi-Square Tests Slide #11
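Two of these properties can be checked numerically with the standard library alone. The sketch below relies on two standard results that are not stated on the slides: the skewness of a chi-square distribution is sqrt(8/df), and for df = 1 only, the right-tail area beyond x equals erfc(sqrt(x/2)). The function names are illustrative.

```python
import math


def chi_square_skewness(df):
    """Skewness of a chi-square distribution; shrinks as df grows."""
    return math.sqrt(8 / df)


def right_tail_p_df1(chi_sq):
    """Right-of area P(X > chi_sq), valid for df = 1 only."""
    return math.erfc(math.sqrt(chi_sq / 2))


for df in (3, 10, 20):
    print(df, round(chi_square_skewness(df), 2))

# 3.84 is the familiar 5% cutoff for df = 1.
print(round(right_tail_p_df1(3.84), 3))
```

Tables larger than 2x2 have df > 1 and need a general chi-square tail function from a statistics package.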
Chi-Square Test
• Ho: “distribution of individuals into the levels is the same for each population”
• HA: “distribution of individuals into the levels is different for at least one pair of populations”
• Assume: expected count of at least 5 in each cell of the expected table
• Statistic: Observed frequency table
• Test Statistic: χ² = Σ (Observed − Expected)² / Expected
• df: (rows-1)*(columns-1)
• When: categorical variable, 2+ populations/groups
Chi-square Slide #12
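The assumption is checked on the expected table, not the observed one. A minimal sketch with hypothetical function names; the second table is made-up data used only to show a failing case.

```python
def expected_counts(observed):
    """Expected count per cell: row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]


def meets_assumption(observed, minimum=5):
    """True if every expected cell count is at least `minimum`."""
    return all(e >= minimum for row in expected_counts(observed) for e in row)


ok_example = meets_assumption([[36, 14], [24, 16]])   # predator diet table
ok_small = meets_assumption([[3, 1], [2, 4]])         # made-up small table
print(ok_example, ok_small)
```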
A Full Example
• When Chinook Salmon were first introduced to Lake Superior there was concern that they would compete with native Lake Trout for Lake Herring. In a preliminary study, fisheries biologists classified the diets of 50 Lake Trout and 40 Chinook Salmon as containing Lake Herring or not. They found that 36 Lake Trout and 24 Chinook Salmon contained Lake Herring. Test (at the 10% level) whether the proportion that had Lake Herring differs between Lake Trout and Chinook Salmon.
Chi-square Slide #13
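One way to work this example from start to finish is sketched below. It assumes no continuity correction (statistical software may apply one by default to 2x2 tables, which would change the numbers) and uses the df = 1 tail relation P(X > x) = erfc(sqrt(x/2)), so it applies to 2x2 tables only. All names are illustrative.

```python
import math

alpha = 0.10
observed = [[36, 14],   # Lake Trout: with LH, without LH
            [24, 16]]   # Chinook Salmon: with LH, without LH

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)
expected = [[r * c / table_total for c in col_totals] for r in row_totals]

# Assumption: every expected count is at least 5.
assumption_met = all(e >= 5 for row in expected for e in row)

# Test statistic and degrees of freedom.
stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Right-tail p-value; this formula is valid only because df = 1 here.
p_value = math.erfc(math.sqrt(stat / 2))
reject_ho = p_value < alpha

print(assumption_met, round(stat, 2), df, round(p_value, 2), reject_ho)
```

Under these assumptions the p-value exceeds 0.10, so Ho is not rejected: the sample does not show a difference between the two predators in the proportion containing Lake Herring.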
Another Full Example
• Modification -- the researchers recorded what the dominant food item was. Do the dominant food items in Lake Trout and Chinook Salmon differ at the 5% level?
• See R HO Page 2.
Chi-square Slide #14