IS 4800 Empirical Research Methods for Information Science

Class Notes, March 16, 2012
Instructor: Prof. Carole Hafner, 446 WVH
hafner@ccs.neu.edu
Tel: 617-373-5116
Course Web site: www.ccs.neu.edu/course/is4800sp12/

Outline
• Sampling and statistics (cont.)
• t test for paired samples
• t test for independent means
• Analysis of Variance
• Two-way Analysis of Variance

Relationship Between Population and Samples When a Treatment Had No Effect

Relationship Between Population and Samples When a Treatment Had An Effect

Sampling
Population: mean $\mu$
Sample of size N
Mean values from all possible samples of size N, aka the “distribution of means”: mean? variance?
$\mu_M = \mu$
$Z_M = (M - \mu)/\sigma_M$

Z tests and t-tests
t is like Z:
$Z = (M - \mu)/\sigma_M$
$t = (M - \mu)/S_M$
$\mu = 0$ for paired samples
We use a stricter criterion (t) instead of Z because $S_M$ is based on an estimate of the population variance, while $\sigma_M$ is based on a known population variance.
$S^2 = \sum (X - M)^2/(N - 1) = SS/(N - 1)$
$S_M^2 = S^2/N$

T-test with paired samples
Given info about the population of change scores ($\mu = 0$; $S^2$, the estimate of $\sigma^2$ from the sample, is $SS/df$ with $df = N - 1$) and the sample size we will be using (N),
we can compute the distribution of means: $S_M^2 = S^2/N$.
Now, given a particular sample of change scores of size N,
we compute its mean
and finally determine the probability that this mean occurred by chance.
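The paired-samples steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `paired_t` and the before/after scores are made up for the example and are not from the course materials.

```python
import math

def paired_t(before, after):
    """Return (t, df) for a paired-samples t test on change scores."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    changes = [a - b for a, b in zip(after, before)]
    n = len(changes)
    mean_change = sum(changes) / n                      # M, the sample mean
    ss = sum((x - mean_change) ** 2 for x in changes)   # SS, sum of squared deviations
    df = n - 1
    s2 = ss / df                                        # S^2 = SS/df, estimate of sigma^2
    s_m = math.sqrt(s2 / n)                             # S_M = sqrt(S^2/N)
    t = (mean_change - 0) / s_m                         # mu = 0 under the null hypothesis
    return t, df

# Hypothetical scores for five participants (illustrative only).
before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 14]
t, df = paired_t(before, after)
print(f"t({df}) = {t:.3f}")
```

With these made-up scores the change scores are 2, 3, 1, 3, 1, so M = 2, S² = 1, and t is about 4.47 with df = 4. A standard t table gives a two-tailed .05 cutoff of 2.776 for df = 4, so this sample would exceed the cutoff. The sketch does not guard against all change scores being identical, which would make S_M zero.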

t test for independent samples
Given two samples:
• Estimate population variances (assume same)
• Estimate variances of distributions of means
• Estimate variance of differences between means (mean = 0)
This is now your comparison distribution

Estimating the Population Variance
$S^2$ is an estimate of $\sigma^2$
$S^2 = SS/(N - 1)$ for one sample (take the square root for S)
For two independent samples, use the “pooled estimate”:
$S_{Pooled}^2 = (df_1/df_{Total}) S_1^2 + (df_2/df_{Total}) S_2^2$
$df_{Total} = df_1 + df_2 = (N_1 - 1) + (N_2 - 1)$
From this calculate the variance of sample means, needed to compute the t statistic:
$S_M^2 = S^2/N$
$S_{Difference}^2 = S_{Pooled}^2/N_1 + S_{Pooled}^2/N_2$

t test for independent samples, continued
Distribution of differences between means: this is your comparison distribution
NOT normal; it is a ‘t’ distribution
Shape changes depending on df: $df = (N_1 - 1) + (N_2 - 1)$
Compute $t = (M_1 - M_2)/S_{Difference}$
Determine if t is beyond the cutoff score for the test parameters (df, sig, tails) from a lookup table.
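As a rough sketch of the independent-samples procedure, the following Python computes the pooled variance estimate, the variance of the difference between means, and t. The helper names and the two small groups are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
import math

def sample_variance(xs):
    """S^2 = SS/(N - 1) for one sample."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def independent_t(g1, g2):
    """Return (t, df_total) using the pooled variance estimate."""
    n1, n2 = len(g1), len(g2)
    df1, df2 = n1 - 1, n2 - 1
    df_total = df1 + df2
    # Pooled estimate: each sample's variance weighted by its share of df.
    s2_pooled = (df1 / df_total) * sample_variance(g1) \
              + (df2 / df_total) * sample_variance(g2)
    # Variance of the distribution of differences between means.
    s2_difference = s2_pooled / n1 + s2_pooled / n2
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    t = (m1 - m2) / math.sqrt(s2_difference)
    return t, df_total

# Hypothetical data (illustrative only).
group1 = [5, 7, 9]
group2 = [2, 4, 6]
t, df_total = independent_t(group1, group2)
print(f"t({df_total}) = {t:.3f}")
```

Here both samples have S² = 4, so the pooled estimate is 4, the variance of the difference is 4/3 + 4/3, and t is about 1.84 with df = 4. That falls short of the two-tailed .05 cutoff of 2.776 for df = 4 in a standard t table.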

ANOVA: When to use
• Categorical IV, numerical DV (same as t-test)
• HOWEVER:
  – There are more than 2 levels of the IV, so:
  – $(M_1 - M_2)/S_M$ won’t work

ANOVA Assumptions
• Populations are normal
• Populations have equal variances
• More or less...

Basic Logic of ANOVA
• Null hypothesis
  – Means of all groups are equal.
• Test: do the means differ more than expected given the null hypothesis?
• Terminology
  – Group = Condition = Cell

Accompanying Statistics
• Experimental
  – Between-subjects
    • Single factor, N-level (for N>2)
      – One-way Analysis of Variance (ANOVA)
    • Two factor, two-level (or more!)
      – Factorial Analysis of Variance
      – AKA N-way Analysis of Variance (for N IVs)
      – AKA N-factor ANOVA
  – Within-subjects
    • Repeated-measures ANOVA (not discussed)
      – AKA within-subjects ANOVA

ANOVA: Single factor, N-level (for N>2)
• The Analysis of Variance is used when you have more than two groups in an experiment
  – The F-ratio is the statistic computed in an Analysis of Variance and is compared to critical values of F
  – The analysis of variance may be used with unequal sample sizes (weighted or unweighted means analysis)
  – When there are just 2 groups, ANOVA is equivalent to the t test for independent means
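The two-group equivalence can be checked numerically: with two groups, the F ratio equals the square of the pooled-variance t. The sketch below assumes the usual definitions (F as the between-group variance estimate divided by the within-group estimate); the function names and data are hypothetical.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sum_sq(xs):
    """Sum of squared deviations from the sample mean (SS)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

def t_independent(g1, g2):
    """Pooled-variance t for two independent samples."""
    df_total = (len(g1) - 1) + (len(g2) - 1)
    s2_pooled = (sum_sq(g1) + sum_sq(g2)) / df_total
    s_difference = math.sqrt(s2_pooled / len(g1) + s2_pooled / len(g2))
    return (mean(g1) - mean(g2)) / s_difference

def f_one_way(groups):
    """F = between-group variance estimate / within-group variance estimate."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum_sq(g) for g in groups)
    bdf = len(groups) - 1
    wdf = len(all_scores) - len(groups)
    return (ss_between / bdf) / (ss_within / wdf)

# Hypothetical data (illustrative only).
g1, g2 = [5, 7, 9], [2, 4, 6]
t = t_independent(g1, g2)
f = f_one_way([g1, g2])
print(f"t^2 = {t ** 2:.3f}, F = {f:.3f}")
```

For these made-up groups both values come out to 3.375, which is why the two tests lead to the same decision when there are only two groups.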

One-Way ANOVA – Assuming Null Hypothesis is True…
[Figure: within-group estimate of population variance and between-group estimate of population variance, for three groups with means M1, M2, M3]

Justification for F statistic

Calculating F

Example

Example

Using the F Statistic
• Use a table for F(BDF, WDF)
  – And also α
BDF = between-groups degrees of freedom = number of groups - 1
WDF = within-groups degrees of freedom = Σ df for all groups = N - number of groups
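A minimal Python sketch of the one-way computation follows, returning F together with BDF and WDF as defined above. It assumes the standard formulation (between-group sum of squares weighted by group size, which also handles unequal group sizes); the function name and scores are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, BDF, WDF) for a list of groups of scores."""
    k = len(groups)                                   # number of groups
    n_total = sum(len(g) for g in groups)             # N
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-group sum of squares, summed over all groups.
    ss_within = sum(sum((x - m) ** 2 for x in g)
                    for g, m in zip(groups, means))
    bdf = k - 1                                       # number of groups - 1
    wdf = n_total - k                                 # N - number of groups
    f = (ss_between / bdf) / (ss_within / wdf)
    return f, bdf, wdf

# Hypothetical scores for three groups (illustrative only).
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f, bdf, wdf = one_way_anova(groups)
print(f"F({bdf}, {wdf}) = {f:.2f}")
```

With these made-up scores the between-group sum of squares is 26 and the within-group sum of squares is 6, giving F(2, 6) = 13.00. A standard F table lists 5.14 as the .05 critical value for (2, 6) degrees of freedom, so this F would be beyond the cutoff.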

One-way ANOVA in SPSS

Data Mean

Analyze/Compare Means/One Way ANOVA…

SPSS Results… F(2, 21) = 9.442, p < .05

Factorial Designs
• Two or more nominal independent variables, each with two or more levels, and a numeric dependent variable.
• Factorial ANOVA teases apart the contribution of each variable separately.
• For N IVs, aka “N-way” ANOVA

Factorial Designs
• Adding a second independent variable to a single-factor design results in a FACTORIAL DESIGN
• Two components can be assessed
  – The MAIN EFFECT of each independent variable
    • The separate effect of each independent variable
    • Analogous to separate experiments involving those variables
  – The INTERACTION between independent variables
    • When the effect of one independent variable changes over levels of a second
    • Or, when the effect of one variable depends on the level of the other variable.
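To make the idea of an interaction concrete, here is a small Python sketch that takes the four cell means of a 2 x 2 design and computes the effect of factor B at each level of factor A. The factor labels (`a1`, `a2`, `b1`, `b2`) and the cell means are hypothetical, and the sketch only describes the pattern of means; whether an interaction is significant is decided by the factorial ANOVA, not by this arithmetic.

```python
def simple_effects(cell_means, a_levels, b_levels):
    """Effect of B (second level minus first) at each level of A."""
    b_first, b_second = b_levels
    return {a: cell_means[(a, b_second)] - cell_means[(a, b_first)]
            for a in a_levels}

# Hypothetical cell means for a 2 x 2 design (illustrative only).
cell_means = {
    ("a1", "b1"): 3.0, ("a1", "b2"): 5.0,
    ("a2", "b1"): 4.0, ("a2", "b2"): 4.0,
}
effects = simple_effects(cell_means, ("a1", "a2"), ("b1", "b2"))

# Main effect of B: the effect of B averaged over the levels of A.
main_effect_b = sum(effects.values()) / len(effects)

# Interaction contrast: how much the effect of B changes across levels of A.
interaction_contrast = effects["a1"] - effects["a2"]

print(effects, main_effect_b, interaction_contrast)
```

In this made-up pattern B raises the mean by 2 at level a1 and by 0 at level a2, so the effect of B depends on the level of A. A contrast of zero would correspond to parallel lines in an interaction plot.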

Example: Wait Time Sign in Student Center vs. No Sign (dependent variable: Satisfaction)

Example of an Interaction: Student Center Sign, 2 Genders x 2 Sign Conditions
[Figure: interaction plot; surviving labels: F, M, No Sign]

Two-way ANOVA in SPSS

Analyze/General Linear Model/Univariate

Results

Results

Degrees of Freedom
• df for between-group variance estimates for main effects
  – Number of levels - 1
• df for between-group variance estimates for interaction effect
  – Total num cells - df for both main effects - 1
  – e.g. 2 x 2 => 4 - (1 + 1) - 1 = 1
• df for within-group variance estimate
  – Sum of df for each cell = N - num cells
• Report: “F(bet-group, within-group) = F, Sig.”

Publication format
N = 24, 2 x 3 = 6 cells => df TrainingDays = 2, df within-group variance = 24 - 6 = 18 => F(2, 18) = 7.20, p < .05
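The degrees-of-freedom rules for a two-way design can be written out as a short Python helper. The function name `factorial_df` and the dictionary keys are made up for illustration; the arithmetic follows the rules stated on the Degrees of Freedom slide.

```python
def factorial_df(levels_a, levels_b, n_total):
    """Degrees of freedom for a two-way between-subjects factorial design."""
    cells = levels_a * levels_b
    df_a = levels_a - 1                          # main effect: number of levels - 1
    df_b = levels_b - 1
    df_interaction = cells - (df_a + df_b) - 1   # total cells - main-effect df - 1
    df_within = n_total - cells                  # N - number of cells
    return {"A": df_a, "B": df_b,
            "A*B": df_interaction, "within": df_within}

# The 2 x 3 design with N = 24 from the publication-format example.
df = factorial_df(2, 3, 24)
print(df)

# The 2 x 2 case from the Degrees of Freedom slide: 4 - (1 + 1) - 1 = 1.
df_2x2 = factorial_df(2, 2, 24)
```

For the 2 x 3, N = 24 case this yields 2 df for the three-level factor and 18 within-group df, matching the F(2, 18) in the publication-format example. The interaction df also equals the product of the two main-effect df, which is a quick consistency check.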

Reporting rule
• IF you have a significant interaction
• THEN
  – If 2 x 2 study: do not report main effects, even if significant
  – Else: must look at patterns of means in cells to determine whether to report main effects or not.
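One way to read the reporting rule is as a small decision procedure. The sketch below is an interpretation, not part of the course materials: the function name, its arguments, and the returned flag are hypothetical, and the p-values are illustrative. The "look at patterns of means" step cannot be automated here, so the function only signals that it is required.

```python
def effects_to_report(p_values, interaction, is_2x2, alpha=0.05):
    """Return (effects to report, whether cell means must be inspected).

    p_values maps effect names to Sig. values; `interaction` names the
    interaction term among those keys.
    """
    significant = [name for name, p in p_values.items() if p < alpha]
    if interaction not in significant:
        # No significant interaction: report whatever main effects are significant.
        return significant, False
    if is_2x2:
        # 2 x 2 study: report the interaction only, even if main effects are significant.
        return [interaction], False
    # Larger designs: main effects depend on the pattern of cell means.
    return [interaction], True

# Illustrative Sig. values for a hypothetical 2 x 2 study.
sig = {"A": 0.04, "B": 0.12, "A*B": 0.01}
report, inspect_means = effects_to_report(sig, "A*B", is_2x2=True)
print(report, inspect_means)

# Same values in a larger design: main effects need a look at the cell means.
report_big, inspect_big = effects_to_report(sig, "A*B", is_2x2=False)

# No significant interaction: significant main effects are reported as usual.
report_main, _ = effects_to_report({"A": 0.04, "B": 0.02, "A*B": 0.41},
                                   "A*B", is_2x2=True)
```

In the first call only the interaction is reported, even though the main effect of A has a Sig. value below .05.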

Results?
                         Sig.
TrainingDays             0.34
Trainer                  0.12
TrainingDays * Trainer   0.41
n.s.

Results?
                         Sig.
TrainingDays             0.34
Trainer                  0.12
TrainingDays * Trainer   0.02
Significant interaction between TrainingDays and Trainer, F(2, 22) = .584, p < .05

Results?
                         Sig.
TrainingDays             0.34
Trainer                  0.02
TrainingDays * Trainer   0.41
Main effect of Trainer, F(1, 22) = .001, p < .05

Results?
                         Sig.
TrainingDays             0.04
Trainer                  0.12
TrainingDays * Trainer   0.01
Significant interaction between TrainingDays and Trainer, F(2, 22) = .584, p < .05
Do not report TrainingDays as significant

Results?
                         Sig.
TrainingDays             0.04
Trainer                  0.02
TrainingDays * Trainer   0.41
Main effects for both TrainingDays, F(2, 22) = 7.20, p < .05, and Trainer, F(1, 22) = .001, p < .05

“Factorial Design”
• Not all cells in your design need to be tested
  – But if they are, it is a “full factorial design”, and you do a “full factorial ANOVA”
          Real-Time   Retrospective
Agent     ✓           ✓
Text      ✓           X

Higher-Order Factorial Designs
• More than two independent variables are included in a higher-order factorial design
  – As factors are added, the complexity of the experimental design increases
    • The number of possible main effects and interactions increases
    • The number of subjects required increases
    • The volume of materials and amount of time needed to complete the experiment increases
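The growth in complexity can be illustrated by enumerating the effects a full factorial analysis would test. The sketch below is a hypothetical helper (the names `list_effects` and `count_cells` and the factor labels are made up); it relies on the fact that k factors give k main effects plus one interaction for every combination of two or more factors, and that the number of cells is the product of the numbers of levels.

```python
from itertools import combinations

def list_effects(factors):
    """All main effects and interactions for a full factorial design."""
    effects = []
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            effects.append(" * ".join(combo))
    return effects

def count_cells(levels):
    """Number of cells = product of the number of levels of each factor."""
    cells = 1
    for n in levels:
        cells *= n
    return cells

two_way = list_effects(["A", "B"])          # 2 main effects + 1 interaction
three_way = list_effects(["A", "B", "C"])   # 3 main effects + 4 interactions
print(len(two_way), len(three_way))
print(count_cells([2, 3]), count_cells([2, 3, 2]))
```

Going from two factors to three raises the number of testable effects from 3 to 7, and adding a two-level factor to a 2 x 3 design doubles the cells from 6 to 12. In a between-subjects design each cell needs its own participants, which is where the increase in required subjects comes from.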