S 519 Evaluation of Information Systems Social Statistics

S 519: Evaluation of Information Systems Social Statistics Inferential Statistics Chapter 15: Chi-square

Last week
  • Linear regression
      • Slope
      • Intercept

This week
  • What is chi-square
  • CHIDIST
  • Nonparametric statistics

Parametric statistics
  • A main branch of statistics
      • Assumes the data follow a type of probability distribution (e.g. the normal distribution)
      • Makes inferences about the parameters of that distribution (e.g. its mean and standard deviation)
  • Assumption: the sample is large enough to represent the population (e.g. a sample size of around 30)
  • Not distribution-free: these methods require a probability distribution

Nonparametric statistics
  • Nonparametric statistics (distribution-free statistics)
      • Do not rely on the assumption that the data are drawn from a given probability distribution (the data model is not specified)
      • Widely used for studying populations that take on a ranked order (e.g. movie reviews from one to four stars, opinions about hotel rankings)
      • Suitable for ordinal data
      • Make fewer assumptions, so they can be applied in situations where less is known about the application
      • May need a larger sample than a parametric test to draw a conclusion with the same degree of confidence

Nonparametric statistics
  • Nonparametric statistics (distribution-free statistics)
  • Data with frequencies or percentages
      • The number of kids in different grades
      • The percentage of people receiving social security

One-sample chi-square
  • One-sample chi-square includes only one dimension
      • Whether the number of respondents is equally distributed across all levels of education
      • Whether the voting for the school voucher has a pattern of preference
  • Two-sample chi-square includes two dimensions
      • Whether preference for the school voucher is independent of political party affiliation and gender

Compute chi-square
  One-sample chi-square test
  • O: the observed frequency
  • E: the expected frequency
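The formula itself did not survive on this slide. Reconstructed from the definitions of O and E above and from the (O-E)^2/E column summed in the worked examples, it is presumably the standard one-sample statistic below. The index i and the upper limit r (number of categories, as in df = r - 1) are notation assumed here.

```latex
\chi^2 = \sum_{i=1}^{r} \frac{(O_i - E_i)^2}{E_i}
```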

Example
  Question: whether the number of respondents is equally distributed across all opinions
  One-sample chi-square

    Preference for School Voucher
    for   maybe   against   total
    23    17      50        90

Chi-square steps
  • Step 1: a statement of the null and research hypotheses
      • Null: there is no difference in the frequency or proportion in each category
      • Research: there is a difference in the frequency or proportion in each category

Chi-square steps
  • Step 2: setting the level of risk (or the level of significance or Type I error) associated with the null hypothesis
      • 0.05

Chi-square steps
  • Step 3: selection of the proper test statistic
      • Frequency data call for a nonparametric procedure: chi-square

Chi-square steps
  • Step 4: computation of the test statistic value (called the obtained value)

    Category   Observed (O)   Expected (E)   D (difference)   (O-E)^2   (O-E)^2/E
    for             23             30               7             49        1.63
    maybe           17             30              13            169        5.63
    against         50             30              20            400       13.33
    Total           90             90                                      20.60
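As a check on the arithmetic in Step 4, here is a minimal Python sketch that recomputes each column from the observed counts. The variable names are illustrative only, and the expected count assumes the null hypothesis of equal frequencies across the three categories.

```python
# Observed counts from the school-voucher example.
observed = {"for": 23, "maybe": 17, "against": 50}

total = sum(observed.values())      # 90 respondents in all
expected = total / len(observed)    # 30 per category under the null hypothesis

rows = []
for category, o in observed.items():
    diff = abs(o - expected)            # D (difference) column
    squared = diff ** 2                 # (O-E)^2 column
    contribution = squared / expected   # (O-E)^2/E column
    rows.append((category, o, expected, diff, squared, contribution))

# The obtained value is the sum of the last column.
chi_square = sum(row[5] for row in rows)

for category, o, e, d, sq, c in rows:
    print(f"{category:<8} O={o:>2} E={e:.0f} D={d:.0f} (O-E)^2={sq:.0f} (O-E)^2/E={c:.2f}")
print(f"chi-square = {chi_square:.2f}")  # chi-square = 20.60
```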

Chi-square steps
  • Step 5: determination of the value needed to reject the null hypothesis, using the appropriate table of critical values for the particular statistic
      • Table B5
      • df = r - 1 (r = number of categories)
      • If the obtained value > the critical value, reject the null hypothesis
      • If the obtained value < the critical value, do not reject the null hypothesis
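The Step 5 rule can be sketched in Python as follows. This is an illustration, not a replacement for the Table B5 lookup: the function names are hypothetical, and the critical-value shortcut only holds for df = 2, where the right-tail probability of chi-square is exp(-x/2).

```python
import math

def degrees_of_freedom(num_categories: int) -> int:
    # df = r - 1, where r is the number of categories.
    return num_categories - 1

def critical_value_df2(alpha: float) -> float:
    # Valid for df = 2 only: P(X > x) = exp(-x / 2), so x_crit = -2 * ln(alpha).
    return -2.0 * math.log(alpha)

def decide(obtained: float, critical: float) -> str:
    # Decision rule from Step 5.
    if obtained > critical:
        return "reject the null"
    return "do not reject the null"

df = degrees_of_freedom(3)
critical = critical_value_df2(0.05)
print(df, round(critical, 2))  # 2 5.99
```

For any other df, the critical value should come from the table (or a statistics library) rather than this shortcut.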

Chi-square steps
  • Step 6: a comparison of the obtained value and the critical value is made
      • Obtained value 20.6 vs. critical value 5.99

Chi-square steps
  • Steps 7 and 8: decision time
      • What is your conclusion, why, and how do you interpret it?

Another example
  • We’ll settle the age-old debate of whether people can actually detect their favorite cola based solely on taste. I blindfold 30 Coke lovers and have them sample 3 colas. Is there a true difference, or are the preference differences explainable by chance?

Hypothesis
  • Null: there are no preferences; the population is divided evenly among the brands
  • Alternate: there are preferences; the population is not divided evenly among the brands

Chance Model
  • df = C - 1 = 3 - 1 = 2, set α = .05
  • For df = 2, χ²-crit = 5.99

Calculate Chi-Square

    Category   Observed (O)   Expected (E)   D (difference)   (O-E)^2   (O-E)^2/E
    Coke            13             10               3              9        0.9
    Pepsi            9             10               1              1        0.1
    RC Cola          8             10               2              4        0.4
    Total           30             30                                       1.4

Decision and Conclusion
  • The obtained value (1.4) is below the critical value (5.99), so the null hypothesis is not rejected
  • Conclude that the data are consistent with preferences being evenly divided among the colas when the logos are removed
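The whole cola test can be run end to end in a few lines. This is a minimal sketch: the helper name is hypothetical, expected frequencies are assumed equal across brands (the null hypothesis), and the critical value is the 5.99 quoted in the slides for df = 2 and α = .05.

```python
def one_sample_chi_square(observed):
    """Chi-square statistic against equal expected frequencies."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

cola_counts = [13, 9, 8]   # Coke, Pepsi, RC Cola
obtained = one_sample_chi_square(cola_counts)

critical = 5.99            # from the slides: df = 2, alpha = .05
reject_null = obtained > critical

print(f"obtained = {obtained:.1f}, reject null: {reject_null}")
# obtained = 1.4, reject null: False
```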

Excel functions
  • CHIDIST(x, degrees of freedom)
  • CHIDIST(20.6, 2)
      • 3.36331E-05 < 0.05
  • CHIDIST(1.40, 2)
      • 0.496585308 > 0.05
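For readers without Excel, the two CHIDIST values can be reproduced with the Python standard library. The function below is a hypothetical helper, not an Excel or library API, and it uses a closed form for the right-tail chi-square probability that holds only when df is even. For general df, a library routine such as `scipy.stats.chi2.sf(x, df)` would be the usual choice.

```python
import math

def chidist_even_df(x: float, df: int) -> float:
    """Right-tail chi-square probability P(X > x); valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    k = df // 2
    # For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    series = sum(half ** i / math.factorial(i) for i in range(k))
    return math.exp(-half) * series

p_voucher = chidist_even_df(20.6, 2)   # school-voucher example
p_cola = chidist_even_df(1.40, 2)      # cola example

print(f"{p_voucher:.5e}")  # 3.36331e-05, below 0.05
print(f"{p_cola:.6f}")     # 0.496585, above 0.05
```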

More nonparametric statistics
  • Table 15.1 (p. 297)