Virtual COMSATS Inferential Statistics
Lecture-21
Ossam Chohan
Assistant Professor, CIIT Abbottabad

Recap of last lecture
• In our last sessions, we worked on:
  – Overview of statistical inference.
  – Introduction to hypothesis testing.
  – Hypothesis testing for paired observations.
  – Hypothesis testing for variances.

Objective of lecture-21
• In this lecture, we will work through problems related to:
  – Test for independence.
  – Goodness of fit.
  – Test for homogeneity.
  – Fisher's exact test.

Test for Independence
• The first type of chi-square test we will examine is the chi-square test for independence of two variables.
• The second type of chi-square test is the goodness-of-fit test, which tests a claim about the form of the distribution of the whole population.

Discussion on Independence

Chi-square test for Independence
• Independence (statistical): the absence of association between two cross-tabulated variables.
• The chi-square test for independence of two variables uses a cross-classification table to examine the nature of the relationship between the variables.
• These tables are sometimes referred to as contingency tables.
• The test examines whether the observed pattern in the table is strong enough to show that the two variables are dependent on each other.

Chi Square Test of Independence
• A measure of association similar to the correlations we studied in the ITS course.
• Pearson and Spearman correlations are not applicable if the data are at the nominal level of measurement.
• Chi-square is used for nominal data placed in a contingency table.
  – A contingency table is a two-way table showing the contingency between two variables, where the variables have been classified into mutually exclusive categories and the cell entries are frequencies.

The Null Hypothesis
• The null hypothesis under investigation in the chi-square test of independence is that the two variables are independent (not related).
• For example, the null hypothesis may be that there is NO relationship between smoking and lung cancer, or that pollution and asthma are independent.

• The only limitation on the use of this test is that the sample size must be large enough to ensure that the expected number of cases in each category is five or more.
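The rule of five can be checked before the test is run. The sketch below is only an illustration: the function names `expected_counts` and `meets_rule_of_five` and both small tables are hypothetical, not taken from the lecture. It assumes the usual expected frequency for a cell under independence, (row total × column total) / n.

```python
def expected_counts(observed):
    """Expected cell frequencies under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def meets_rule_of_five(observed):
    """True when every expected cell frequency is at least 5."""
    return all(e >= 5 for row in expected_counts(observed) for e in row)


# Hypothetical tables, used only to illustrate the check.
small_sample = [[12, 3], [4, 1]]      # smallest expected count is 5 * 4 / 20 = 1
larger_sample = [[30, 20], [25, 25]]  # smallest expected count is 50 * 45 / 100 = 22.5

print(meets_rule_of_five(small_sample))
print(meets_rule_of_five(larger_sample))
```

Note that the check applies to the expected frequencies, not the observed ones: a table can contain a small observed count and still satisfy the rule.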

STEPS
1. Specification of hypotheses: The test for independence of two variables X and Y begins by assuming that there is no relationship between the two variables. The alternative hypothesis states that there is some relationship between them.
   H0: X and Y are independent
   H1: X and Y are dependent

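2. Level of significance: α will be given. If not, assume it to be 5%.
3. Test statistic: The chi-square statistic is defined as
   $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
   with
   $E_{ij} = \frac{(\text{total of row } i)(\text{total of column } j)}{n}$
   where $O_{ij}$ and $E_{ij}$ are the observed and expected frequencies in row i and column j, and n is the sample size.
4. Calculations
5. Critical region: Reject H0 if $\chi^2 > \chi^2_{\alpha}(\nu)$, where $\nu = (r-1)(c-1)$, r is the number of rows, and c is the number of columns.
6. Conclusion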
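The six steps can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the lecture's own solution: the function name `chi_square_independence`, the `CRITICAL_05` lookup, and the 2 x 2 table are all hypothetical. The critical values are the standard upper 5% points of the chi-square distribution, and expected frequencies are taken as (row total × column total) / n.

```python
# Upper 5% critical values of chi-square from standard tables, keyed by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency under H0
            chi2 += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Hypothetical 2 x 2 table, used only to illustrate the steps.
observed = [[20, 30], [30, 20]]
chi2, df = chi_square_independence(observed)  # steps 3 and 4
reject_h0 = chi2 > CRITICAL_05[df]            # step 5, with alpha = 0.05
print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {reject_h0}")
```

For a different level of significance, the lookup would need the corresponding column of the chi-square table.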

Coefficient of Contingency
• The contingency coefficient is a measure of the degree of relationship, association, or dependence of the classifications in the frequency table.
• The larger the value of this coefficient, the greater the degree of association.
• The value of the coefficient is never greater than 1.
• When C = 0, there is complete independence.
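The slide does not show a formula for the coefficient. A common definition, assumed here, is Pearson's contingency coefficient C = sqrt(χ² / (χ² + n)), where n is the sample size. The function name and the input values below are hypothetical and serve only as an illustration.

```python
import math


def contingency_coefficient(chi2, n):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n)) (assumed definition)."""
    return math.sqrt(chi2 / (chi2 + n))


# Hypothetical inputs: a chi-square statistic of 4.0 from a sample of 100.
c = contingency_coefficient(4.0, 100)
print(f"C = {c:.4f}")
```

Under this definition, C is 0 when the chi-square statistic is 0 and gets closer to 1, without reaching it, as the statistic grows, which agrees with the properties listed above.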

Problem-18
• Suppose that we wish to determine whether the opinions of the voting residents of the KPK province concerning new tax reforms are independent of their level of income. A random sample of 1000 registered voters from KPK is classified by income bracket (low, medium, or high) and by whether or not they favor the new tax reforms. The observed frequencies are presented in the following table.

Problem-18
Tax Reforms   Low   Medium   High   Total
In Favor      182   213      203    598
Against       154   138      110    402
Total         336   351      313    1000

Problem-18 Solution
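The worked solution did not survive in this transcript. The sketch below is one way to carry out the calculation for the Problem-18 table, assuming the 5% level of significance and the standard table value 5.991 for 2 degrees of freedom. The variable names are illustrative, and the result should be read as a check on a hand calculation rather than as the lecturer's solution.

```python
import math

# Observed frequencies from the Problem-18 table (columns: low, medium, high income).
observed = {
    "In favor": [182, 213, 203],
    "Against": [154, 138, 110],
}

rows = list(observed.values())
row_totals = [sum(r) for r in rows]
col_totals = [sum(c) for c in zip(*rows)]
n = sum(row_totals)

# Sum of (O - E)^2 / E over all cells, with E = row total * column total / n.
chi2 = sum(
    (o - rt * ct / n) ** 2 / (rt * ct / n)
    for r, rt in zip(rows, row_totals)
    for o, ct in zip(r, col_totals)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

critical_value = 5.991         # chi-square table value for alpha = 0.05, df = 2
p_value = math.exp(-chi2 / 2)  # upper-tail probability; this closed form holds only for df = 2
reject_h0 = chi2 > critical_value

print(f"chi-square = {chi2:.3f}, df = {df}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

With these data the statistic comes out near 7.88, which exceeds 5.991, so H0 is rejected at the 5% level: the sample suggests that opinion on the tax reforms is not independent of income level.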

Assessment Problem-15
• Four hundred and ninety-two hypertensive patients were categorized with respect to their eating habits. Hypertension levels were L1, L2, and L3, and eating habits were less than balanced diet, balanced diet, more than balanced diet, and abrupt diet. The data are presented in the following contingency table.

Assessment Problem-15 Cont…
Categories                L1    L2    L3    Total
Less than balanced diet   24    83    17    124
Balanced diet             11    62    28    101
More than balanced diet   32    121   34    187
Abrupt diet               10    26    44    80
Total                     77    292   123   492
Discuss the association between the two criteria of classification, i.e., the hypertension levels and eating habits.

Assessment Problem-16
A survey sample showing a cross-classification of gender by social class is given. If variable X is the gender of the respondent and variable Y is the social class of the respondent, use the chi-square test of independence to determine whether variables X and Y are independent of each other. Use the 0.05 level of significance.

Y (Social class)   X (gender)
                   Male (M)   Female (F)   Total
Upper middle (A)   33         29           62
Middle (B)         153        181          334
Working (C)        103        81           184
Lower (D)          16         14           30
Total              305        305          610

Assessment Problem-17
• The table gives the distribution of the opinions of 214 PC supporters and 53 Liberal supporters in the Edmonton study. Respondents were asked their view concerning the opinion "Unemployment is high because trade unions have priced their members out of a job." Respondents gave their answers on a 7-point scale, with 1 being strongly disagree and 7 being strongly agree. Use the data in this table to test whether political preference and opinion are independent of each other. Use the 0.01 level of significance.

Opinion   Political preference
          PC    Liberal   Total
1         9     3         12
2         7     5         12
3         7     11        18
4         28    3         31
5         51    12        63
6         54    7         61
7         58    12        70
Total     214   53        267

Special case of contingency tables
• The test statistic above has a special case when the contingency table has 2×2 dimensions.
• What about the degrees of freedom?
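The formula shown on this slide is not preserved in the transcript. A standard shortcut for a 2×2 table with cells a, b (first row) and c, d (second row) is χ² = n(ad − bc)² / [(a + b)(c + d)(a + c)(b + d)], and it is assumed here without a continuity correction. The function name and the cell counts are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut chi-square for the 2 x 2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Degrees of freedom for any 2 x 2 table: (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1.
df = (2 - 1) * (2 - 1)

# Hypothetical cell counts, used only to illustrate the formula.
chi2 = chi_square_2x2(20, 30, 30, 20)
print(f"chi-square = {chi2:.3f}, df = {df}")
```

The shortcut gives the same value as summing (O − E)² / E over the four cells, so it saves the step of computing expected frequencies by hand.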

Problem-19
On one of the islands, the hotel chain had facilities at two different locations. The following table gives responses to the single question, "Are you likely to choose this hotel again?" At the 0.05 level of significance, is there any evidence of a significant difference in guest satisfaction (as measured by the likelihood to return to the hotel) between the two hotels?

Choose hotel again   Beachcomber   Windsurfer   Total
Yes                  163           154          317
No                   64            108          172
Total                227           262          489

Problem-19 Solution
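The worked solution is missing from this transcript. The sketch below reproduces the calculation for the Problem-19 table, assuming no continuity correction and the standard table value 3.841 for 1 degree of freedom at the 0.05 level. Variable names are illustrative, and the output is meant as a check on a hand calculation.

```python
# Observed frequencies from the Problem-19 table (columns: Beachcomber, Windsurfer).
yes = [163, 154]
no = [64, 108]

col_totals = [y + m for y, m in zip(yes, no)]
row_totals = [sum(yes), sum(no)]
n = sum(row_totals)

# Sum of (O - E)^2 / E over the four cells, with E = row total * column total / n.
chi2 = 0.0
for row, row_total in zip([yes, no], row_totals):
    for o, col_total in zip(row, col_totals):
        e = row_total * col_total / n
        chi2 += (o - e) ** 2 / e

df = (2 - 1) * (2 - 1)
critical_value = 3.841  # chi-square table value for alpha = 0.05, df = 1
reject_h0 = chi2 > critical_value

# Sample proportion of guests at each hotel who would choose it again.
return_rates = [y / t for y, t in zip(yes, col_totals)]

print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {reject_h0}")
print(f"return rates: Beachcomber {return_rates[0]:.3f}, Windsurfer {return_rates[1]:.3f}")
```

With these data the statistic is about 9.05, which exceeds 3.841, so H0 is rejected at the 0.05 level: the sample gives evidence that the likelihood of returning differs between the two hotels.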

Assessment Problem-18
• A public opinion poll surveyed a simple random sample of 1000 voters. Respondents were classified by gender (male or female) and by voting preference (Republican, Democrat, or Independent). Results are shown in the contingency table below.
• Is there a gender gap? Do the men's voting preferences differ significantly from the women's preferences? Use a 0.05 level of significance.

Gender         Voting preference
               Republican   Democrat   Independent   Row total
Male           200          150        50            400
Female         250          300        50            600
Column total   450          450        100           1000