Tutorial: Chi-Square Distribution Presented by: Nikki Natividad Course: BIOL 5081 - Biostatistics 2
Purpose
• To analyze discontinuous (categorical or binned) data, in which subjects are counted as they fall into categories
• We want to compare our observed data to what we expect to see. Are the differences due to chance, or due to an association?
• When can we use the Chi-Square Test?
  ◦ Testing the outcome of Mendelian crosses
  ◦ Testing independence: is one factor associated with another?
  ◦ Testing a population for expected proportions
Assumptions:
• 2 or more categories
• Independent observations
• A sample size of at least 10
• Random sampling
• All observations must be used
• For the test to be accurate, the expected frequency in each category should be at least 5
Conducting Chi-Square Analysis
1) Make a hypothesis based on your basic biological question
2) Determine the expected frequencies
3) Create a table with observed frequencies, expected frequencies, and chi-square values using the formula: (O − E)² / E
4) Find the degrees of freedom: (c − 1)(r − 1)
5) Find the chi-square statistic (the critical value) in the Chi-Square Distribution table
6) If the chi-square statistic from the table > your calculated chi-square value, you do not reject your null hypothesis
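Steps 3 and 4 can be sketched in a few lines of Python. This is a minimal illustration rather than the SAS workflow used in this tutorial; the function names and the example counts are hypothetical.

```python
def chi_square_table(observed, expected):
    """Return the per-category (O - E)^2 / E values and their sum."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return contributions, sum(contributions)


def degrees_of_freedom(n_rows, n_cols):
    """(r - 1)(c - 1) for a contingency table with r rows and c columns."""
    return (n_rows - 1) * (n_cols - 1)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
contributions, chi_square = chi_square_table([30, 10], [20, 20])
print(contributions, chi_square)
print(degrees_of_freedom(2, 3))
```

The calculated chi-square value is then compared with the value from the distribution table, as in step 6.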
Example 1: Testing for Proportions
HO: Horned lizards eat equal amounts of leaf cutter, carpenter and black ants.
HA: Horned lizards eat more of one species of ant than of the others.

               Leaf Cutter Ants   Carpenter Ants   Black Ants   Total
Observed              25                18              17         60
Expected              20                20              20         60
O − E                  5                −2              −3          0
(O − E)² / E           1.25              0.20            0.45   χ² = 1.90

χ² = sum of all (O − E)² / E = 1.25 + 0.20 + 0.45 = 1.90
Calculate degrees of freedom: with a single row of categories, df = c − 1 = 3 − 1 = 2
At a significance level of your choice (e.g. α = 0.05, or 95% confidence), look up the chi-square statistic on a chi-square distribution table.
Example 1: Testing for Proportions
χ² (α = 0.05, df = 2) = 5.991
Example 1: Testing for Proportions
Chi-square statistic from the table (critical value): χ² = 5.991
Our calculated value: χ² = 1.90
* If the chi-square statistic from the table > your calculated value, you do not reject your null hypothesis: the differences between observed and expected counts can be explained by chance. If your calculated value is the larger one, you reject the null hypothesis: there is a significant difference that is not due to chance.
5.991 > 1.90 ∴ We do not reject our null hypothesis.
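The horned lizard calculation can be reproduced with a short Python sketch. The counts and the critical value 5.991 come from the slides; the variable names are illustrative only.

```python
# Observed ant counts from Example 1: leaf cutter, carpenter, black.
observed = [25, 18, 17]

# Under HO the lizards eat equal amounts, so each expected count is total / 3.
total = sum(observed)
expected = [total / len(observed)] * len(observed)

# Per-category (O - E)^2 / E, then the sum.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

# One row of categories: df = number of categories - 1.
df = len(observed) - 1

# Critical value for alpha = 0.05 and df = 2, read from the chi-square table.
critical_value = 5.991
reject_null = chi_square > critical_value

print(f"chi-square = {chi_square:.2f}, df = {df}, reject HO: {reject_null}")
```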
SAS: Example 1 (annotations on the SAS program)
• Included to format the table
• Define your data
• Indicate what you want in your output
SAS: What does the p-value mean?
“The exact p-value for a nondirectional test is the sum of probabilities for the table having a test statistic greater than or equal to the value of the observed test statistic.”
• High p-value: if the null hypothesis is true, there is a high probability of a test statistic at least as large as the observed one. Do not reject the null hypothesis.
• Low p-value: if the null hypothesis is true, there is a low probability of a test statistic at least as large as the observed one. Reject the null hypothesis.
SAS: Example 1
High probability of a chi-square statistic at least as large as our calculated chi-square value. We do not reject our null hypothesis.
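For 2 degrees of freedom, the upper-tail probability of the chi-square distribution has the closed form p = exp(−χ²/2), so the p-value for Example 1 can be checked by hand. The sketch below assumes the asymptotic chi-square p-value (not an exact test) and is valid only for df = 2; the function name is hypothetical.

```python
import math


def p_value_df2(chi_square):
    """Upper-tail probability of a chi-square distribution with df = 2.

    Uses the closed form exp(-x / 2), which holds only for 2 degrees of freedom.
    """
    return math.exp(-chi_square / 2)


# Calculated value from Example 1.
p_example1 = p_value_df2(1.90)

# Sanity check: the table's critical value should give p close to alpha = 0.05.
p_at_critical = p_value_df2(5.991)

print(f"p(1.90) = {p_example1:.4f}, p(5.991) = {p_at_critical:.4f}")
```

A p-value of roughly 0.39 is well above 0.05, which agrees with the decision not to reject the null hypothesis.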
Example 2: Testing Association
HO: Gender and eye colour are not associated with each other.
HA: Gender and eye colour are associated with each other.
SAS options used:
• cellchi2 = displays how much each cell contributes to the overall chi-square value
• nocol = do not display column percentages
• norow = do not display row percentages
• chisq = display chi-square statistics
Example 2: More SAS Examples
Degrees of freedom: (2 − 1)(3 − 1) = 1 × 2 = 2
High probability (78.25%) of a chi-square statistic at least as large as our calculated chi-square value. We do not reject our null hypothesis.
Example 2: More SAS Examples
If there were an association, we could check which cells drive it by looking at how much each cell contributes to the overall chi-square value.
Limitations
• No expected frequency should be less than 1
• No more than 1/5 of the expected frequencies should be less than 5
  ◦ To correct for this, collect larger samples, or combine the categories with small expected frequencies until their combined value is 5 or more
• Yates Correction*
  ◦ When there is only 1 degree of freedom, the regular chi-square test should not be used
  ◦ Apply the Yates correction by subtracting 0.5 from the absolute value of each calculated O − E term, then continue as usual with the new corrected values
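The two rules on expected frequencies are easy to check before running a test. The helper below is a hypothetical sketch of that check, not part of SAS; its name and return convention are assumptions made for illustration.

```python
def expected_frequencies_ok(expected):
    """Check the two rules of thumb for expected frequencies.

    Rule 1: no expected frequency is below 1.
    Rule 2: no more than 1/5 of the expected frequencies are below 5.
    """
    below_one = [e for e in expected if e < 1]
    below_five = [e for e in expected if e < 5]
    return not below_one and len(below_five) <= len(expected) / 5


# Expected counts from the heart disease 2 x 2 table: all are at least 5.
print(expected_frequencies_ok([12.65, 9.35, 10.35, 7.65]))

# Hypothetical counts: two of five categories fall below 5, which is too many.
print(expected_frequencies_ok([4, 4, 20, 20, 20]))
```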
What do these mean?
Likelihood Ratio Chi-Square
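A minimal sketch of the likelihood ratio statistic, assuming the standard definition G² = 2 Σ O ln(O / E), applied to the horned lizard counts from Example 1. The variable names are illustrative, and the formula as written requires every observed count to be greater than zero.

```python
import math

# Observed and expected ant counts from Example 1.
observed = [25, 18, 17]
expected = [20, 20, 20]

# Likelihood ratio chi-square: G^2 = 2 * sum(O * ln(O / E)).
g_squared = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))

# Pearson chi-square on the same data, for comparison.
pearson = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(f"G^2 = {g_squared:.3f}, Pearson chi-square = {pearson:.3f}")
```

On these data the two statistics are close (about 1.84 and 1.90) and lead to the same decision at 2 degrees of freedom.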
Continuity-Adjusted Chi-Square Test
Mantel-Haenszel Chi-Square Test
QMH = (n − 1) r²
• r is the Pearson correlation coefficient between the row and column variables, so r² also measures their linear association
  ◦ http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0000000659.htm
• Tests the alternative hypothesis that there is a linear association between the row and column variables
• Follows a chi-square distribution with 1 degree of freedom
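As a sketch of QMH = (n − 1) r², the code below computes the Pearson correlation from a 2 x 2 table of counts (the heart disease and cholesterol counts used in the Yates slides). The integer scores 1 and 2 assigned to the rows and columns are an assumption made for illustration; any equally spaced scores give the same r².

```python
import math

# Rows: heart disease, no heart disease. Columns: high, low cholesterol.
table = [[15, 7], [8, 10]]
row_scores = [1, 2]  # assumed scores for the row categories
col_scores = [1, 2]  # assumed scores for the column categories

n = sum(sum(row) for row in table)
cells = [(row_scores[i], col_scores[j], table[i][j])
         for i in range(2) for j in range(2)]

# Count-weighted means of the row and column scores.
mean_r = sum(r * count for r, _, count in cells) / n
mean_c = sum(c * count for _, c, count in cells) / n

# Count-weighted sums of squares and cross-products.
ss_r = sum(count * (r - mean_r) ** 2 for r, _, count in cells)
ss_c = sum(count * (c - mean_c) ** 2 for _, c, count in cells)
ss_rc = sum(count * (r - mean_r) * (c - mean_c) for r, c, count in cells)

pearson_r = ss_rc / math.sqrt(ss_r * ss_c)
q_mh = (n - 1) * pearson_r ** 2

print(f"r = {pearson_r:.4f}, QMH = {q_mh:.4f}")
```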
Phi Coefficient
Contingency Coefficient
Cramer’s V
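The three measures of association can all be derived from the Pearson chi-square value. The sketch below assumes the usual textbook definitions: phi = √(χ²/n), contingency coefficient C = √(χ²/(χ² + n)), and Cramer's V = √(χ²/(n · min(r − 1, c − 1))). It uses the heart disease counts from the Yates slides and reports magnitudes only, without the sign some software attaches to phi for 2 x 2 tables.

```python
import math


def pearson_chi_square(table):
    """Uncorrected Pearson chi-square for an r x c table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected
    return chi_square, n


table = [[15, 7], [8, 10]]
chi_square, n = pearson_chi_square(table)
n_rows, n_cols = len(table), len(table[0])

phi = math.sqrt(chi_square / n)
contingency_c = math.sqrt(chi_square / (chi_square + n))
cramers_v = math.sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))

print(f"phi = {phi:.4f}, C = {contingency_c:.4f}, V = {cramers_v:.4f}")
```

For a 2 x 2 table, min(r − 1, c − 1) = 1, so Cramer's V equals the magnitude of phi.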
Yates & 2 x 2 Contingency Tables
HO: Heart Disease is not associated with cholesterol levels.
HA: Heart Disease is more likely in patients with a high cholesterol diet.

                     High Cholesterol   Low Cholesterol   Total
Heart Disease               15                  7            22
  Expected                  12.65               9.35         22
  Chi-Square                 0.44               0.59          1.03
No Heart Disease             8                 10            18
  Expected                  10.35               7.65         18
  Chi-Square                 0.53               0.72          1.25
TOTAL                       23                 17            40
Chi-Square Total                                              2.28

Calculate degrees of freedom: (c − 1)(r − 1) = 1 × 1 = 1
We need to use the YATES CORRECTION.
Yates & 2 x 2 Contingency Tables
HO: Heart Disease is not associated with cholesterol levels.
HA: Heart Disease is more likely in patients with a high cholesterol diet.

                     High Cholesterol   Low Cholesterol   Total
Heart Disease               15                  7            22
  Expected                  12.65               9.35         22
  Chi-Square                 0.27               0.37          0.64
No Heart Disease             8                 10            18
  Expected                  10.35               7.65         18
  Chi-Square                 0.33               0.45          0.78
TOTAL                       23                 17            40
Chi-Square Total                                              1.42

Corrected value for the first cell: (|15 − 12.65| − 0.5)² / 12.65 = 0.27
Chi-square distribution table (1 degree of freedom)
χ² (α = 0.05, df = 1) = 3.841
Yates & 2 x 2 Contingency Tables
HO: Heart Disease is not associated with cholesterol levels.
HA: Heart Disease is more likely in patients with a high cholesterol diet.
Chi-square statistic from the table (α = 0.05, df = 1): χ² = 3.841
Our Yates-corrected value: χ² = 0.27 + 0.37 + 0.33 + 0.45 = 1.42
3.841 > 1.42 ∴ We do not reject our null hypothesis.
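The uncorrected and Yates-corrected values for the heart disease table can be reproduced as follows. This is a sketch with a hypothetical function name; the counts and the critical value 3.841 are taken from the slides.

```python
def chi_square_2x2(table, yates=False):
    """Pearson chi-square for a 2 x 2 table, optionally with the Yates correction."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(table[i][j] - expected)
            if yates:
                # Subtract 0.5 from |O - E| before squaring.
                diff -= 0.5
            chi_square += diff ** 2 / expected
    return chi_square


# Rows: heart disease, no heart disease. Columns: high, low cholesterol.
table = [[15, 7], [8, 10]]
uncorrected = chi_square_2x2(table)
corrected = chi_square_2x2(table, yates=True)

critical_value = 3.841  # alpha = 0.05, df = 1, from the chi-square table
print(f"uncorrected = {uncorrected:.2f}, corrected = {corrected:.2f}")
print("reject HO:", corrected > critical_value)
```

Working from unrounded cell values gives a corrected total of about 1.41; the 1.42 on the slide comes from adding the four cell values after rounding each to two decimals. The decision is the same either way.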
Fisher’s Exact Test
• Left: Use this one-sided test when the alternative to independence is negative association between the variables. These observations tend to lie in the lower left and upper right cells of the table. Small p-value = likely negative association.
• Right: Use this one-sided test when the alternative to independence is positive association between the variables. These observations tend to lie in the upper left and lower right cells of the table. Small p-value = likely positive association.
• Two-Tail: Use this when there is no prior alternative.
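The three p-values can be sketched directly from the hypergeometric distribution of the upper left cell, with the row and column totals held fixed. The code below is an illustration using the heart disease counts, not a reproduction of the SAS output; the function name is hypothetical, and the two-tail rule used here (sum the probabilities of all tables no more likely than the observed one) is a common convention that is assumed rather than taken from the slides.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Left, right and two-tail Fisher p-values for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that the upper left cell equals x.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    # Values the upper left cell can take with the margins fixed.
    support = range(max(0, row1 - (n - col1)), min(row1, col1) + 1)
    p_observed = prob(a)

    left = sum(prob(x) for x in support if x <= a)
    right = sum(prob(x) for x in support if x >= a)
    # Small tolerance so tables tied with the observed one are included.
    two_tail = sum(prob(x) for x in support if prob(x) <= p_observed * (1 + 1e-9))
    return left, right, two_tail


left, right, two_tail = fisher_exact_2x2(15, 7, 8, 10)
print(f"left = {left:.4f}, right = {right:.4f}, two-tail = {two_tail:.4f}")
```

With HA stating that heart disease is more likely with high cholesterol (a positive association in this layout), the right-sided p-value is the one to read.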
HO: Heart Disease is not associated with cholesterol levels. HA: Heart Disease is more likely in patients with a high cholesterol diet.
Conclusion
• The Chi-square test is important in testing the association between variables and/or checking whether one’s expected proportions match the reality of one’s experiment
• There are multiple chi-square tests, each suited to a specific sample size, degrees of freedom, and number of categories
• We can use SAS to conduct Chi-square tests on our data by using the command proc freq
References
Chi-Square Test Descriptions:
  http://www.enviroliteracy.org/pdf/materials/1210.pdf
  http://129.123.92.202/biol1020/Statistics/Appendix%206%20%20The%20Chi-Square%20TEst.pdf
Ozdemir T and Eyduran E. 2005. Comparison of chi-square and likelihood ratio chi-square tests: power of test. Journal of Applied Sciences Research. 1(2): 242-244.
SAS Support website: http://www.sas.com/index.html “FREQ procedure”
YouTube Chi-square SAS Tutorial (user: mbate001): http://www.youtube.com/watch?v=ACbQ8FJTq7k