Chi-square: Test of Association or Independence

Chi-square (χ²)
• A chi-square (χ²) distribution is a probability distribution.
• The chi-square test is useful for making statistical inferences about categorical data with two or more categories.

χ²
• Qualitative, or categorical, data are frequently collected in medical investigations.
• For example, the variables assessed might include:
  ✓ sex,
  ✓ blood group,
  ✓ classification of disease,
  ✓ presence or absence of a certain disease, or
  ✓ whether the patient survived or not.

Chi-square…
• Definition: a statistic that measures the discrepancy between k observed frequencies $O_1, O_2, \ldots, O_k$ and the corresponding expected frequencies $e_1, e_2, \ldots, e_k$.
• Chi-square: $\chi^2 = \sum_{i=1}^{k} \frac{(O_i - e_i)^2}{e_i}$
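As a minimal sketch, the definition above can be computed directly in Python. The function name `chi_square_statistic` and the sample counts are illustrative assumptions and do not come from the slides.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - e_i)^2 / e_i over paired observed and expected counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Made-up counts, for illustration only.
observed = [10, 20]
expected = [15, 15]
statistic = chi_square_statistic(observed, expected)
print(round(statistic, 3))
```

Each term is divided by its own expected frequency before the terms are added, which is how the summation in the definition is meant to be read.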

Chi-square…
• The sampling distribution of the chi-square statistic is known as the chi-square distribution.
• As with t distributions, there is a different χ² distribution for each value of the degrees of freedom, but all of them share the following characteristics.

Characteristics
1. Every χ² distribution extends indefinitely to the right from 0.
2. Every χ² distribution has only one (right) tail.
3. As df increases, the χ² curves become more bell-shaped and approach the normal curve in appearance (but remember that a chi-square curve starts at 0, not at -∞).

The Chi-Square Distribution
• The chi-square distribution is asymmetric, and its values are never negative.
• Degrees of freedom are based on the contingency table and are calculated as (rows - 1) × (columns - 1).
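The degrees-of-freedom rule can be sketched in a few lines of Python. The helper name `degrees_of_freedom` is hypothetical, and the tables below are placeholders whose shapes are all that matter.

```python
def degrees_of_freedom(table):
    """df = (rows - 1) * (columns - 1) for a rectangular table of counts."""
    n_rows = len(table)
    n_cols = len(table[0])
    if any(len(row) != n_cols for row in table):
        raise ValueError("all rows must have the same number of columns")
    return (n_rows - 1) * (n_cols - 1)


# Placeholder tables: only the number of rows and columns is used.
df_2x2 = degrees_of_freedom([[1, 2], [3, 4]])
df_3x4 = degrees_of_freedom([[0] * 4 for _ in range(3)])
print(df_2x2, df_3x4)
```

The row and column of totals are not part of the table for this purpose; only the category cells are counted.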

Assumptions
• The observations must be independent.
• No more than 20% of the expected frequencies are less than 5.
• The values in the contingency table are counts, not percentages or proportions.
• No cell has an observed count of 0 or 1.
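Two of these assumptions can be checked mechanically from a table of counts. The sketch below is one possible way to do so; the function name, the returned keys, and the sample counts are assumptions made for illustration. Expected frequencies are computed as row total times column total divided by the grand total. Independence of the observations and the use of counts rather than percentages depend on the study design and cannot be verified from the table alone.

```python
def check_assumptions(table):
    """Check the expected-frequency and observed-count rules for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected frequency of a cell: (row total * column total) / grand total.
    expected = [r * c / n for r in row_totals for c in col_totals]
    share_small = sum(e < 5 for e in expected) / len(expected)
    observed = [count for row in table for count in row]
    return {
        "share_expected_below_5": share_small,
        "expected_ok": share_small <= 0.20,
        "observed_ok": all(count > 1 for count in observed),
    }


# Hypothetical counts, chosen so that the expected-frequency rule fails.
report = check_assumptions([[12, 3], [9, 6]])
print(report)
```

In this made-up table two of the four expected frequencies are 4.5, so half of the cells fall below 5 and the 20% rule is not met.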

Procedures of Hypothesis Testing
1. State the hypotheses:
   ➢ Null hypothesis, H0: there is no association between the two variables
   versus
   ➢ Alternative hypothesis, H1: there is an association between the two variables

Procedures…
2. Fix the level of significance and find the critical (tabulated) value: the critical value can be read from the chi-square distribution table at the α level of significance with df = (r - 1)(c - 1), where r = number of rows and c = number of columns. Hence, $\chi^2_{\text{tab}} = \chi^2_{\alpha,\,df}$.
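When no printed table is at hand, the critical value can be approximated numerically. The following is a sketch using only the Python standard library, not the method of the slides: `chi2_sf` evaluates the upper-tail probability with the closed-form expressions that exist for whole-number degrees of freedom, and `chi2_critical` finds the critical value by bisection. Both function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square >= x) for a positive integer df."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^j / j! for j = 0 .. df/2 - 1.
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: two-sided normal tail plus a finite series.
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + total


def chi2_critical(alpha, df):
    """Value c with P(chi-square >= c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # grow the bracket until it contains c
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


critical_1 = chi2_critical(0.05, 1)
print(round(critical_1, 3))
```

For α = 0.05 and df = 1 this agrees with the tabulated value 3.841 to three decimal places. A statistics library would normally be used in practice; the hand-written version is shown only to keep the sketch free of dependencies.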

Procedures…
3. Compute the test statistic: $\chi^2_{\text{calc}} = \sum_{i=1}^{k} \frac{(O_i - e_i)^2}{e_i}$
4. Decision rule:
   ➢ If $\chi^2_{\text{calc}} > \chi^2_{\text{tab}}$, reject the null hypothesis and conclude that there is an association between the two categorical variables,
   BUT
   ➢ if $\chi^2_{\text{calc}} \le \chi^2_{\text{tab}}$, do not reject the null hypothesis: the data do not provide sufficient evidence of an association.
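The decision rule amounts to a single comparison. In the sketch below the function name `decide` and the two statistics are made up for illustration; 3.841 is the tabulated critical value for α = 0.05 and df = 1. The non-rejection branch is worded as "fail to reject H0", the usual phrasing when the statistic does not exceed the critical value.

```python
def decide(chi2_calc, chi2_tab):
    """Reject H0 only when the computed statistic exceeds the critical value."""
    if chi2_calc > chi2_tab:
        return "reject H0"
    return "fail to reject H0"


# Made-up statistics compared with the critical value for alpha = 0.05, df = 1.
decision_a = decide(4.20, 3.841)
decision_b = decide(2.10, 3.841)
print(decision_a, "|", decision_b)
```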

Procedures…
• We can also use the P-value to test the hypothesis. This can be shown symbolically as:
  ❖ P-value = $P(\chi^2 \ge \chi^2_{\text{calc}})$
  ❖ To determine the P-value, look at the χ² distribution table in the row for df degrees of freedom and find where $\chi^2_{\text{calc}}$ falls in that row.
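Reading a P-value from a table gives a range rather than a single number. The sketch below mimics that lookup for df = 1 only. The table holds standard upper-tail critical values, while the names `CHI2_TABLE_DF1` and `bracket_p_value` and the statistic 4.5 are assumptions for illustration.

```python
# Upper-tail critical values of the chi-square distribution for df = 1,
# stored as (tail probability, critical value) pairs.
CHI2_TABLE_DF1 = [
    (0.10, 2.706),
    (0.05, 3.841),
    (0.025, 5.024),
    (0.02, 5.412),
    (0.01, 6.635),
    (0.001, 10.828),
]


def bracket_p_value(chi2_calc, table=CHI2_TABLE_DF1):
    """Return (lower, upper) bounds on the P-value as read from the table."""
    lower, upper = 0.0, 1.0
    for alpha, critical in table:
        if chi2_calc >= critical:
            # The statistic reaches this critical value, so P <= alpha.
            upper = min(upper, alpha)
        else:
            # The statistic falls short of it, so P > alpha.
            lower = max(lower, alpha)
    return lower, upper


# Made-up statistic, for illustration only.
bounds = bracket_p_value(4.5)
print(bounds)
```

A statistic of 4.5 lies between 3.841 and 5.024, so the lookup reports a P-value between 0.025 and 0.05.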

Interpreting Chi-Square
• The chi-square test tells us only whether or not the variables are independent.
• It does not tell us the pattern or nature of the relationship.

Example
• To study the association between smoking and symptoms of asthma, a study involved 150 individuals. The results are given in the following table:

                        Ever smoke
Symptoms of asthma    Yes     No    Total
Yes                    20     30       50
No                     22     78      100
Total                  42    108      150

Example…
• The question is: is there an association between smoking cigarettes and symptoms of asthma at the 0.05 level of significance?
Solution
1. Hypotheses
   ➢ H0: there is no association between smoking and symptoms of asthma
   ➢ H1: there is an association between smoking and symptoms of asthma

Example…
2. Critical value ($\chi^2_{\alpha,\,df}$): α = 0.05 and df = (2 - 1)(2 - 1) = 1, so $\chi^2_{0.05,\,1} = 3.841$.
3. Test statistic: $\chi^2_{\text{calc}} = \sum_{i=1}^{k} \frac{(O_i - e_i)^2}{e_i}$
   ➢ First compute the expected value for each observed value: for the i-th observed value ($O_i$), the expected value ($e_i$) is computed as $e_i = \frac{R_i \times C_i}{N}$, where $R_i$ and $C_i$ are the totals of the row and column containing cell i and N is the grand total.
• Let us do this for the given table.

Example…

                                        Ever smoke
                      Yes                           No
Symptoms of asthma    Observed  Expected            Observed  Expected             Total
Yes                   20        42×50/150 = 14      30        108×50/150 = 36         50
No                    22        42×100/150 = 28     78        108×100/150 = 72       100
Total                 42                            108                              150
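The expected frequencies in this table can be reproduced with a short sketch. The helper name `expected_frequencies` is hypothetical; the observed counts are those of the smoking and asthma example.

```python
def expected_frequencies(table):
    """Expected count per cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: symptoms of asthma (yes, no); columns: ever smoke (yes, no).
observed = [[20, 30], [22, 78]]
expected = expected_frequencies(observed)
print(expected)
```

The expected counts keep the same row and column totals as the observed counts, which is a quick way to check them by hand.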

Example…
Calculate
• $\chi^2_{\text{calc}} = \sum_{i=1}^{k} \frac{(O_i - e_i)^2}{e_i}$
• $= \frac{(20 - 14)^2}{14} + \frac{(22 - 28)^2}{28} + \frac{(30 - 36)^2}{36} + \frac{(78 - 72)^2}{72}$
• $= 2.57 + 1.29 + 1.00 + 0.50 = 5.36$
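The arithmetic can be checked with a few lines of Python. This is a sketch that takes the observed and expected counts from the example; the variable names are illustrative.

```python
# Cell order: (asthma yes, smoke yes), (asthma yes, smoke no),
#             (asthma no, smoke yes), (asthma no, smoke no).
observed = [20, 30, 22, 78]
expected = [14, 36, 28, 72]

# One contribution (O - e)^2 / e per cell, then the sum.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2_calc = sum(contributions)

print([round(c, 2) for c in contributions])
print(round(chi2_calc, 3))
```

Every cell differs from its expected count by exactly 6, so each numerator is 36. Carried to three decimal places the sum is about 5.357.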

Example…
4. Decision rule: since $\chi^2_{\text{calc}} > \chi^2_{\text{tab}}$, i.e. 5.36 > 3.841, the null hypothesis is rejected.
5. Conclusion: thus, there is an association between smoking and symptoms of asthma.

Example…
• P-value: P-value = $P(\chi^2 \ge \chi^2_{\text{calc}})$ with df degrees of freedom
• P-value = $P(\chi^2 \ge 5.36)$ with df = 1, read from the table
• 0.02 < P-value < 0.05 (the statistic falls between the tabulated values for these two levels)
• Since the P-value is less than the significance level of 0.05, the null hypothesis is rejected.
• What if the level of significance is 0.01?
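For df = 1 the P-value can also be computed rather than bracketed from a table. A chi-square variable with one degree of freedom is the square of a standard normal variable, so its upper tail equals a two-sided normal tail, which the standard library exposes through `math.erfc`. The function name `p_value_df1` is hypothetical, and the sketch does not apply to other degrees of freedom.

```python
import math


def p_value_df1(chi2_calc):
    """P(chi-square >= chi2_calc) for df = 1 only."""
    if chi2_calc <= 0:
        return 1.0
    # P(Z^2 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    return math.erfc(math.sqrt(chi2_calc / 2.0))


# Statistic of the smoking and asthma example, before rounding.
chi2_calc = 36 / 14 + 36 / 36 + 36 / 28 + 36 / 72
p_value = p_value_df1(chi2_calc)
print(round(p_value, 4))
```

The result is roughly 0.021, which lies inside the range read from the table.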

Example:
• Question: are the homicide rate and the volume of gun sales related for a sample of 25 cities?

                 Homicide rate
Gun sales      Low    High    Totals
High             8       5        13
Low              4       8        12
Totals          12      13    N = 25

• The bivariate table shows the relationship between homicide rate (columns) and gun sales (rows). This 2 × 2 table has 4 cells.

Practice it
