Effect Size and Power

► Two things mentioned previously:
  § P-values are heavily influenced by sample size (n)
  § Statistics Commandment #1: P-values are silent on the strength of the relationship between two variables
► Effect size is what tells you about this, and we will discuss it in more detail today
► Don’t forget, if you haven’t already, to read Cohen’s (1992) Power Primer
  § It’s only five pages long, simply worded, and the best article in statistics you’ll ever read

► P-values are heavily influenced by n
  § So heavily influenced, in fact, that with enough people almost anything is significant, even a trivially small difference
  § Ex: Data with two samples, and N = 10
  § Group 1 scores: 2, 4, 6, 8, 10 (mean = 6, s = 3.16)
  § Group 2 scores: 3, 5, 7, 9, 11 (mean = 7, s = 3.16)
  § t = -.5, p = .63
  § We would fail to reject Ho

► Take the same data, but multiply N by 20 (N = 200)
  § Group 1 mean still = 6, s still = 3.16
  § Group 2 mean still = 7, s still = 3.16
  § But now t = -2.46, p = .02
  § We would reject Ho
  § Data: the same ten scores repeated 20 times (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 2, 3, 4, 5, 6, 7, 8, 9, etc.)
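The comparison above can be reproduced with a short sketch. It assumes a pooled-variance independent-samples t-test and that the larger data set is the ten original scores repeated 20 times; the function name `pooled_t` is made up for illustration. The t value for N = 200 may differ slightly from the slide's figure, because the sample standard deviation changes a little when scores are repeated.

```python
import math
from statistics import mean, stdev


def pooled_t(a, b):
    """Independent-samples t statistic using a pooled variance estimate."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))


group1 = [2, 4, 6, 8, 10]
group2 = [3, 5, 7, 9, 11]

t_small = pooled_t(group1, group2)            # N = 10
t_large = pooled_t(group1 * 20, group2 * 20)  # N = 200, same scores repeated

print("N = 10:  t =", round(t_small, 2))
print("N = 200: t =", round(t_large, 2))
```

The mean difference is one point in both runs; only the sample size changes, and with it the size of t.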

► As I said before, with enough n, anything is significant
  § Because p-values don’t say anything about the size of your effect, you can have two groups that are almost identical (like in our example) that your statistics say are significantly different
  § P-values just say how likely a result at least this extreme would be if Ho were true; the results from big samples are stable, as we’d expect, so even tiny differences get detected

► Therefore, we need something to report in addition to p-values that is less influenced by n and can say something about the size of our IV’s effect
  § In the previous example, we have a low p-value, but our IV had little effect, because both of our groups (with it and without it) had almost the same mean score
► Jacob Cohen to the rescue!
  § Cohen and others have been pointing out this flaw in exclusively using p-based statistics for decades, and psychologists and medical researchers are only beginning to catch on; most research still only reports p-values

► Cohen (and others) championed the use of Effect Size statistics that provide us with this information and are not influenced by sample size
  § Effect Size: the strength of the effect that our IV had on our DV
► There is no one formula for effect size; depending on your data, there are many different formulas and many different statistics (see the Cohen article). They all take the general form:

  § Ex. The effect size estimate for the Independent-Samples T-Test is: d-hat = (mean 1 - mean 2) / s
  § This looks a lot like our formula for z, and is interpreted similarly
  § D-hat = the number of standard deviations mean 1 is from mean 2, just like z was interpreted as the number of standard deviations our score fell from the mean
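As a rough illustration of d-hat, the sketch below uses the summary statistics from the earlier two-group example. It assumes the standard deviation in the denominator is the pooled s of the two groups, which the slide does not spell out; `cohens_d` is a hypothetical helper name.

```python
import math


def cohens_d(mean1, mean2, s1, s2, n1, n2):
    """Standardized mean difference, using a pooled standard deviation (assumed)."""
    pooled_s = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_s


# Summary statistics from the example: same means and s, different n
d_small_n = cohens_d(6, 7, 3.16, 3.16, 5, 5)      # N = 10
d_large_n = cohens_d(6, 7, 3.16, 3.16, 100, 100)  # N = 200

print(round(d_small_n, 2), round(d_large_n, 2))
```

Unlike t, the estimate comes out the same at both sample sizes: the groups are about a third of a standard deviation apart either way.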

► Interpreting Effect Size:
  § How do we know when our effect size is large?
    ► 1. Prior Research: if previous research on an educational intervention for low-income kids only increased their grades in school by .5 standard deviations and yours does so by 1 standard deviation, you can say this is a large effect (twice as large, to be exact)
    ► 2. Theoretical Prediction: if we’re developing a treatment for Borderline Personality Disorder, the theory behind this disorder says that it’s stable across time and therefore difficult to treat, so we may only look for a medium effect size before we declare success

    ► 3. Practical Considerations: if our treatment has the potential to benefit a lot of people inexpensively, even if it only helps a little (i.e. a small effect), this may still be practically significant
      § E.g. the average effect size for using aspirin to treat heart disease is small, but since it is inexpensive and easily implemented, and can therefore help many people (even if only a little), this is an important finding
      § Fun Fact: the GRE predicts GPA in graduate school in psychology at an effect size of only r = .15 (which is small), but is still used because there are no better standardized tests available

    ► 4. Tradition/Convention: when your research is novel and exploratory in nature (i.e. there is little prior research or theory to guide your expectations), we need an alternative to these methods
      § Cohen has devised standard conventions for large, medium, and small effects for the various effect size statistics (see the Cohen article)
      § However, what is large for one effect size statistic IS NOT NECESSARILY large for another
        ► Ex. r = .5 corresponds to a large effect size, but d = .5 only corresponds to a medium effect
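To make the point about conventions concrete, here is a small sketch that labels a value according to Cohen's (1992) benchmarks for d (.20, .50, .80) and r (.10, .30, .50). Treating the benchmarks as hard cutoffs is a simplification made here for illustration, and the names `COHEN_CONVENTIONS` and `conventional_label` are invented for this example.

```python
# Cohen's (1992) conventional benchmarks: (small, medium, large)
COHEN_CONVENTIONS = {
    "d": (0.20, 0.50, 0.80),
    "r": (0.10, 0.30, 0.50),
}


def conventional_label(statistic, value):
    """Label an effect size by convention; cutoffs are a simplifying assumption."""
    small, medium, large = COHEN_CONVENTIONS[statistic]
    size = abs(value)
    if size >= large:
        return "large"
    if size >= medium:
        return "medium"
    if size >= small:
        return "small"
    return "below small"


print("r = .5 ->", conventional_label("r", 0.5))
print("d = .5 ->", conventional_label("d", 0.5))
print("r = .15 ->", conventional_label("r", 0.15))
```

The same number gets a different label depending on the statistic, which is why a bare value cannot be called large or small.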

► Take Home Messages:
  § 1. Interpreting effect size statistics requires detailed knowledge about your experiment
    ► Without any knowledge of how an effect size statistic was obtained, if someone asks: “Is an r = .25 a large effect?”, your answer should be: “It depends…”
  § 2. When reporting effect size, you CANNOT say: “My effect size was .05, and so was large”, because different effect size statistics have different conventions for small to large values
    ► Even David Barlow, a world-renowned expert on the treatment of anxiety disorders, made this mistake in his book The Clinical Handbook of Psychological Disorders

► Just like with too large a sample anything is significant, with too small a sample nothing is significant
  § This refers to the probability of a Type II Error (β): incorrectly failing to reject Ho (AKA incorrectly rejecting H1)
► How do we determine what sample size is neither too large nor too small?

  § We try to maximize power (1 - β), which is the complement of the Type II Error rate (β)
    ► Type II Error = incorrectly failing to reject Ho (when it is false); Power = correctly rejecting Ho (when it is false)
► How do we maximize power?
  § 1. Increase Type I Error (α)
    ► This is problematic for obvious reasons: we don’t want to trade one type of error for another if we can help it

  § 2. Increase Effect Size
    ► We accomplish this by trying to make our IV as potent as possible, or by choosing a weak control group
      § E.g. comparing our treatment to an alternative treatment will result in a lower effect size than if we compare it to no treatment
  § 3. Increase n or decrease s
    ► Remember: in our statistical tests we are dividing by the standard error (s/√n). Decreasing s makes this number smaller, as does increasing n. Dividing by a smaller number gives us a larger value of z or t, which results in an increased chance of rejecting Ho
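The three levers for power (α, effect size, and n) can be explored with a rough sketch. It assumes a two-sided comparison of two equal-sized groups and uses a normal approximation in place of the t distribution, so the numbers are approximate; `approx_power` is a made-up name.

```python
import math
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)             # critical value for the chosen alpha
    shift = abs(d) * math.sqrt(n_per_group / 2)   # expected size of the test statistic
    # Probability of landing beyond either critical value when the effect is real
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


baseline = approx_power(d=0.5, n_per_group=20)
higher_alpha = approx_power(d=0.5, n_per_group=20, alpha=0.10)
bigger_effect = approx_power(d=0.8, n_per_group=20)
bigger_n = approx_power(d=0.5, n_per_group=64)

for label, value in [("baseline", baseline), ("alpha = .10", higher_alpha),
                     ("d = .8", bigger_effect), ("n = 64 per group", bigger_n)]:
    print(f"{label}: power is about {value:.2f}")
```

Each change raises power relative to the baseline. Decreasing s works through the same route as a bigger effect, since d is the mean difference divided by s.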

► What is good power?
  § Statistical convention says that power = .8 is a good value that balances Type I and Type II Error
    ► Power = .80 means a 20% chance of making a Type II Error
  § Before we conduct our experiment, i.e. a priori, we need to do what is called a Power Analysis, which tells us what sample size will give us our needed power
    ► You can download a program called G*Power from the internet that does these calculations for you
    ► You type in the kind of test you’re doing (remember how tests can be more or less “powerful”), your alpha, the power you want, and the effect size you expect, and it gives you the sample size you’d need
    ► Other programs also do this, like Power and Precision, but G*Power is free
    ► Find it at: http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/
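For a feel of what an a priori power analysis does, here is a minimal sketch for a two-sided comparison of two equal-sized groups. It is not G*Power's procedure: it assumes a normal approximation, so an exact t-based calculation may call for a participant or two more per group. The function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-group comparison."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Inputs mirror what you would type into a power program:
# alpha, desired power, and the effect size you expect
for d in (0.2, 0.5, 0.8):
    print(f"expected d = {d}: about {n_per_group(d)} per group")
```

The smaller the effect you expect, the more people you need to detect it at the same alpha and power.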

► You can also do the calculations by hand (see the textbook)
  § However, understanding the concept of effect size and power is more important than knowing how to calculate it by hand, and since I don’t want to overwhelm you guys, you won’t be tested on these calculations (you can skip Secs. 8.3-8.5)

► What is good power?
  § Power Analysis
    ► Involves estimating a predicted effect size ahead of time
    ► Prediction based on interpretation guidelines:
      § Prior Research
      § Theory
      § Practical Considerations
      § Convention

► How does effect size add to the interpretation of study results over and above p-values?

  P-value significance | E.S. High | E.S. Low
  High (low p) | IV had a strong and reliable effect on DV | IV had a weak effect on DV, inflated by large n
  Low (high p) | IV had a strong effect on DV, but n was too low to detect it / IV had a strong effect of unknown reliability | IV had a weak effect on DV

► Retrospective Power
  § SPSS provides an estimate of power given the p-value and effect size obtained and the sample size used
  § It is tempting to interpret low power as an indication that too few subjects were used to detect the effect obtained
  § Recall, though, that this estimate is inferred directly from the p-value and e.s.; because these are what the power estimate is calculated from, it adds nothing to the interpretation of the p-value and e.s.
► Retrospective
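The redundancy of retrospective power can be seen in a short sketch. It is not SPSS's calculation: it assumes a two-sided z-test, for which "observed power" can be computed from the p-value alone. The function name `observed_power_from_p` is invented for this example.

```python
from statistics import NormalDist


def observed_power_from_p(p, alpha=0.05):
    """Observed power of a two-sided z-test, computed from the p-value alone."""
    z = NormalDist()
    z_obs = z.inv_cdf(1 - p / 2)        # size of the test statistic implied by p
    z_crit = z.inv_cdf(1 - alpha / 2)   # critical value for the chosen alpha
    return z.cdf(z_obs - z_crit) + z.cdf(-z_obs - z_crit)


# p-values from the earlier example (.63 and .02) plus the borderline case
for p in (0.63, 0.05, 0.02):
    print(f"p = {p}: observed power is about {observed_power_from_p(p):.2f}")
```

Under these assumptions a non-significant p always goes with low observed power and a significant p with higher observed power, so the power estimate tells you nothing the p-value had not already said.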