Monday, Dec. 2
Chi-square Goodness of Fit
Chi-square Test of Independence: Two Variables

[Figure: diagram of pea-color genotype crosses (gg, yy, yg, gy); two cells labeled 25%]

Chi Square Goodness of Fit

Pea Color    freq Observed    freq Expected
Yellow       158              150
Green         42               50
TOTAL        200              200

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_o - f_e)^2}{f_e}$$

d.f. = k - 1, where k = number of categories in the variable.
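As a minimal sketch of the goodness-of-fit arithmetic for the pea counts above, the statistic can be computed in a few lines of Python. The function name `chi_square_gof` is a hypothetical helper chosen for illustration; it is not something the slides define.

```python
def chi_square_gof(observed, expected):
    """Sum (fo - fe)^2 / fe over all categories (hypothetical helper)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Pea-color counts from the slide, in the order: yellow, green
observed = [158, 42]
expected = [150, 50]

chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1  # d.f. = k - 1, with k = number of categories

print(f"chi-square = {chi2:.2f}, d.f. = {df}")
```

With two categories, each contributes the same squared difference (8 squared), so the smaller expected count (green) dominates the sum.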

“… the general level of agreement between Mendel’s expectations and his reported results shows that it is closer than would be expected in the best of several thousand repetitions. The data have evidently been sophisticated systematically, and after examining various possibilities, I have no doubt that Mendel was deceived by a gardening assistant, who knew only too well what his principal expected from each trial made…” -- R. A. Fisher

Chi Square Goodness of Fit

Pea Color    freq Observed    freq Expected
Yellow       151              150
Green         49               50
TOTAL        200              200

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_o - f_e)^2}{f_e}$$

d.f. = k - 1, where k = number of categories in the variable.
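Fisher's complaint concerns agreement that is too close rather than too far. One way to illustrate that idea, as a sketch only, is to ask how often chance alone would produce a chi-square as small as the one for the counts above. The code assumes d.f. = 1, where chi-square is the square of a standard normal deviate, so the lower-tail probability can be written with `math.erf`. The helper names `chi_square_gof` and `chi2_cdf_1df` are hypothetical.

```python
import math


def chi_square_gof(observed, expected):
    """Sum (fo - fe)^2 / fe over all categories (hypothetical helper)."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


def chi2_cdf_1df(x):
    """P(chi-square <= x) for 1 d.f. only (assumption: d.f. = 1)."""
    return math.erf(math.sqrt(x / 2.0))


# Counts from the slide, in the order: yellow, green
chi2 = chi_square_gof([151, 49], [150, 50])

# Probability of a fit this close or closer if the expected ratio is true
lower_tail = chi2_cdf_1df(chi2)

print(f"chi-square = {chi2:.4f}, P(fit this close or closer) = {lower_tail:.3f}")
```

A single close fit like this is unremarkable on its own; Fisher's argument rests on the agreement being consistently close across many experiments.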

Peas to Kids: Another Example
Goodness of Fit

At my children’s school science fair last year, where participation was voluntary but strongly encouraged, I counted about 60 boys and 40 girls who had submitted entries. Since I would expect a 50:50 ratio if there were no gender preference for submission, does this observation deviate from expectation beyond chance level?

             Boys    Girls
Expected:    50      50
Observed:    60      40

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_o - f_e)^2}{f_e} = \frac{(60 - 50)^2}{50} + \frac{(40 - 50)^2}{50} = 4.00$$

For each of k categories, square the difference between the observed and the expected frequency, divide by the expected frequency, and sum over all k categories.

This value, chi-square, follows a distribution with known probability values, where the degrees of freedom are a function of the number of categories (not n). In this one-variable case, d.f. = k - 1.

Critical value of chi-square at α = .05, d.f. = 1 is 3.84, so reject H0.
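The decision above can be retraced in code. This is a sketch under stated assumptions: the critical value 3.84 is taken from the slide rather than computed, and the p-value line uses the fact that, for d.f. = 1 only, the upper-tail probability of chi-square equals `math.erfc(sqrt(x / 2))`. Variable names are illustrative.

```python
import math

# Science-fair counts from the slide
observed = {"Boys": 60, "Girls": 40}
expected = {"Boys": 50, "Girls": 50}

# Square each difference, divide by the expected count, and sum
chi2 = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)
df = len(observed) - 1  # d.f. = k - 1

CRITICAL_05_DF1 = 3.84  # critical value at alpha = .05, d.f. = 1 (from the slide)
reject_h0 = chi2 > CRITICAL_05_DF1

# Upper-tail p-value; this closed form is valid for d.f. = 1 only
p_value = math.erfc(math.sqrt(chi2 / 2.0))

print(f"chi-square = {chi2:.2f}, d.f. = {df}, reject H0: {reject_h0}")
print(f"approximate p-value = {p_value:.4f}")
```

Comparing the statistic with a tabled critical value and comparing the p-value with α are two views of the same decision.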

Chi-square Test of Independence

Are two nominal-level variables related, or are they independent of each other? Is race related to SES, or are they independent?

The expected frequency of any given cell is

$$f_e = \frac{\text{Row } n \times \text{Column } n}{\text{Total } n}$$

SES      White   Black   Total
Hi       12      3       15
Lo       16      16      32
Total    28      19      47

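$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(f_o - f_e)^2}{f_e}$$

at d.f. = (r - 1)(c - 1), where r is the number of rows and c is the number of columns.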

The expected frequency of any given cell is

$$f_e = \frac{\text{Row } n \times \text{Column } n}{\text{Total } n}$$

         White                    Black                    Total
Hi       (15 x 28)/47 = 8.94      (15 x 19)/47 = 6.06      15
Lo       (32 x 28)/47 = 19.06     (32 x 19)/47 = 12.94     32
Total    28                       19                       47
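The same expected frequencies can be reproduced from the marginal totals alone. The sketch below assumes the row order Hi, Lo and the column order White, Black; the variable names are illustrative.

```python
# Marginal totals from the slide
row_totals = [15, 32]   # Hi, Lo
col_totals = [28, 19]   # White, Black
n = sum(row_totals)     # total n = 47

# fe = (row n x column n) / total n, for every cell
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(["Hi", "Lo"], expected):
    print(label, [round(fe, 2) for fe in row])
```

Each row of expected frequencies sums back to its row total, which is a quick check on the arithmetic.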

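Please calculate:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(f_o - f_e)^2}{f_e}$$

Observed frequencies (expected frequencies in parentheses):

         White          Black          Total
Hi       12 (8.94)      3 (6.06)       15
Lo       16 (19.06)     16 (12.94)     32
Total    28             19             47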
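One way to check a hand calculation is the short sketch below, which applies the double sum to the observed table and derives the expected frequencies from the margins. The function name `chi_square_independence` is hypothetical, and no continuity correction is applied, matching the formula on the slide.

```python
def chi_square_independence(table):
    """Return (chi-square, d.f.) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n  # expected frequency
            chi2 += (fo - fe) ** 2 / fe

    df = (len(table) - 1) * (len(table[0]) - 1)  # (r - 1)(c - 1)
    return chi2, df


# Observed counts: rows Hi, Lo; columns White, Black
observed = [[12, 3], [16, 16]]

chi2, df = chi_square_independence(observed)
print(f"chi-square = {chi2:.2f}, d.f. = {df}")
```

The printed statistic can then be compared with the tabled critical value for the resulting degrees of freedom.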

Important assumptions:

  • Independent observations.
  • Observations are mutually exclusive.
  • Expected frequencies should be reasonably large:
      d.f. = 1: at least 5
      d.f. = 2: > 2
      d.f. > 3: all expected frequencies but one are greater than or equal to 5, and the one that is not is at least equal to 1.
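The expected-frequency assumption can be checked before running the test. The sketch below covers only the d.f. = 1 case (every expected frequency at least 5) and applies it to the race by SES table; the helper name `expected_frequencies` and the threshold constant are illustrative, and the other d.f. cases are deliberately left out.

```python
def expected_frequencies(table):
    """Expected cell counts, (row n x column n) / total n, for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


MIN_EXPECTED_DF1 = 5  # rule for d.f. = 1 as stated in the assumptions list

# Observed counts: rows Hi, Lo; columns White, Black (a 2 x 2 table, so d.f. = 1)
observed = [[12, 3], [16, 16]]

expected = expected_frequencies(observed)
smallest = min(fe for row in expected for fe in row)
meets_df1_rule = smallest >= MIN_EXPECTED_DF1

print(f"smallest expected frequency = {smallest:.2f}, rule met: {meets_df1_rule}")
```

Note that the check is on expected frequencies, not observed ones: the observed count of 3 in one cell does not by itself violate the rule.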