Lecture 7 Inferential Statistics: Hypothesis Testing

Preview
■ Jason decides to sue the city after he was involved in a near-fatal collision at Division St. and Trailridge Dr. Jason is claiming that his accident could have been prevented if the city had placed a stoplight at the intersection. It turns out that this intersection has had an abnormally high number of accidents in the past 5 years. The city is arguing that the number of accidents at this particular intersection is not abnormal and that there are no more accidents at this particular intersection than at others in town.

What does Jason’s story have to do with statistics?
■ The primary application of inferential stats is to help researchers interpret their data:
  (1) Are the differences in the data due to chance?
  (2) Are the differences in the data something more than chance?
■ In the example with Jason:
  (1) Is there an increased number of accidents due to external factors (the lack of a stoplight, not chance)?
  (2) Is the number of accidents at that particular intersection no more than chance?

■ Hypothesis testing is a statistical procedure that allows researchers to use sample data to draw inferences about the population of interest
  – Is the mean of my observed sample consistent with the known population mean, or did it come from some other distribution?

To Start…
Is the mean of my observed sample consistent with the known population mean, or did it come from some other distribution?
■ We are given the following problem:
  – There exists a sample of cars (of some kind)
  – They get a mean of 19 MPG
  – Are they midsize cars? (We can’t go look at them.)
■ We know:
  – A midsize car gets 18 MPG
  – Is 19 different enough from 18 to fall outside this distribution?
  – Or is it part of some other distribution?

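Example
■ Here’s what we know:
  – μ = 18
  – M = 19
  – σ_M = 0.4
■ $z = \frac{M - \mu}{\sigma_M} = \frac{19 - 18}{0.4} = +2.5$
■ p = .0062: only 0.62% of sample means fall at or above z = +2.5 (a percentile rank of about 99.38)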
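The z-score and tail probability above can be checked with a few lines of Python. This minimal sketch is not part of the original slides. It assumes Python 3.8+ for `statistics.NormalDist`, and `upper_tail_p` is a hypothetical helper name.

```python
from statistics import NormalDist

def upper_tail_p(sample_mean, pop_mean, std_error):
    """Return z and the proportion of sample means at or above z (upper tail)."""
    z = (sample_mean - pop_mean) / std_error
    p = 1 - NormalDist().cdf(z)  # area beyond z in the upper tail of the unit normal
    return z, p

z, p = upper_tail_p(sample_mean=19, pop_mean=18, std_error=0.4)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.50, p = 0.0062
```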

How do we decide: More than intuition
■ If the z-score falls outside the middle 95% of the curve, we conclude that the sample must be from some other distribution (recall yesterday’s p < .05 convention in psychology)
■ Main assumption: we assume that weird, unusual, or rare things don’t happen
■ If a score falls out into the extreme 5%, we conclude that it “must be” from some other distribution; less than 5% is rare enough

Hypothesis Testing
■ Hypothesis testing: a statistical procedure that allows researchers to use sample data to draw inferences about a population
  – Uses the concepts of:
    • z-scores
    • probability
    • the distribution of sample means

Basic Logic of Hypothesis Testing
(1) State a hypothesis about the population
  – Hypothesis: a prediction about the relationship between variables; how the IV affects the DV
  – e.g., People who prefer Häagen-Dazs ice cream will have a mean IQ that is higher than average, at 130
(2) Use the hypothesis to predict the characteristics the sample should have
(3) Obtain a random sample
  – Random sampling: all potential observations in the population have an equal chance of being selected
  – RANDOM: Survey every 100th house from the list of all addresses in Tucson
  – NOT RANDOM: Survey internet users about access to new technology in their schools
(4) Compare the sample data with the hypothesis
  – Use a statistical test (today we’ll continue to use z-tests, but keep in mind that other test statistics can be used in a similar fashion)

Unknown Population
[Figure: known population before treatment (μ = 24, σ = 4) → treatment → unknown population after treatment (μ = ?, σ = 4)]
■ One basic assumption: if the treatment has any effect, it is simply to add a constant amount to (or subtract a constant amount from) each individual’s score
  – No change in the shape of the distribution or in the standard deviation
■ The unknown population is just theoretical (we never administer a treatment to the entire population), but we do have a real sample that represents that population, so this is what we use
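This small Python sketch shows why an additive treatment leaves the standard deviation unchanged. The scores and the +5 treatment effect are hypothetical values chosen for illustration; the list happens to average 24, matching the figure. Only the idea of a constant, additive effect comes from the slide.

```python
from statistics import mean, pstdev

# Hypothetical scores from the known population (they average 24, like the figure's mu).
before = [18, 20, 22, 24, 24, 26, 28, 30]

# Hypothetical treatment that adds the same constant to every individual's score.
treatment_effect = 5
after = [score + treatment_effect for score in before]

print("means:", mean(before), mean(after))                    # the mean shifts by the treatment effect
print("standard deviations:", pstdev(before), pstdev(after))  # the spread does not change
```

If the treatment instead helped some people more than others, the spread would change too. That is exactly the situation the assumption rules out.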

Rules of Hypothesis Testing
(1) STATE THE HYPOTHESIS about the unknown population mean.
  – Null hypothesis, H0: the treatment has no effect; the IV has no effect on the DV.
  – Alternative hypothesis, H1: the treatment has an effect on the DV. The alternative hypothesis does not specify the direction of change. In some cases it might be useful to specify a direction (we’ll get to that).
  – NOTE: the null and alternative hypotheses are mutually exclusive and exhaustive. They can’t both be true.
■ Example: on average the population remembers 7 words in a particular situation, with SD = 2. You test a new intervention designed to enhance memory with 10 participants and find that they remember an average of 9 words.
  – $H_0: \mu_{\text{new intervention}} = 7$
  – $H_1: \mu_{\text{new intervention}} \neq 7$

Rules of Hypothesis Testing
(2) SET THE CRITERIA for a decision
  – Data will either support or refute the null hypothesis
  – The distribution gets divided into 2 sections:
    • Sample means that are likely if H0 is true
    • Sample means that are very unlikely if H0 is true
  – Must set the boundaries that separate the high-probability samples from the low-probability samples
    • The level of significance, or alpha level (α), defines the critical region
    • Convention says α = .05 or 5%, but other commonly used alpha levels are .01 (1%) and .001 (0.1%)
    • A z-score can mark the boundary set by alpha!
  * Because the extreme 5% can be split between 2 tails, there is 2.5% or .025 in each tail (2-tailed), so the boundaries are z = ±1.96
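The critical z-values for any alpha can be read from the unit normal table or computed directly. This is a minimal sketch assuming Python 3.8+ (`statistics.NormalDist.inv_cdf`); `critical_z` is a hypothetical helper name.

```python
from statistics import NormalDist

def critical_z(alpha, tails=2):
    """Boundary of the critical region for a z-test.

    Two-tailed: alpha is split evenly between both tails.
    One-tailed: all of alpha sits in a single tail.
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return NormalDist().inv_cdf(1 - alpha / tails)

for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(critical_z(alpha), 2))  # two-tailed boundaries
```

Splitting alpha across two tails is why the two-tailed boundary for α = .05 is 1.96 rather than the one-tailed 1.645.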

Rules of Hypothesis Testing
(3) COLLECT DATA and compute sample statistics
  – Select a random sample from the population
  – NOTE: it is important to collect the data after stating the hypothesis and establishing the criteria, in order to make an objective evaluation of the data
  – Compute a sample/test statistic (today we are illustrating hypothesis testing with a statistic we already know: z-scores)
    • We don’t know μ, so we make a hypothesis about its value and then plug it in to evaluate our hypothesis
■ Test statistic: the sample data are converted to a single statistic that is used to test the hypothesis
  $z = \frac{M - \mu}{\sigma_M}$, where $\sigma_M = \sigma / \sqrt{n}$ is the standard error

Rules of Hypothesis Testing
(3) The z-score could also be expressed in words to fit into the context of hypothesis testing and inferential stats:
  $z = \frac{\text{sample mean} - \text{hypothesized population mean}}{\text{standard error between } M \text{ and } \mu} = \frac{M - \mu}{\sigma_M}$
■ In other words, z = obtained difference / difference expected by chance
  – If z is near 0, the obtained difference is no greater than chance. This is consistent with the null, but it does not prove the null is true.
  – If z is greater than 1, the difference is larger than chance alone would typically produce
  – But given rule #2, we want a difference between 2 and 3 times larger than chance (e.g., beyond ±1.96 for α = .05)
■ Example: on average the population remembers 7 words in a particular situation, with SD = 2. You test a new intervention designed to enhance memory with 10 participants and find that they remember an average of 9 words.
  $z = \frac{9 - 7}{2 / \sqrt{10}} = \frac{2}{0.63} = 3.16$
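The word equation translates directly into code. Below is a sketch using the memory example’s numbers; `z_statistic` is an illustrative name, not a standard library function.

```python
import math

def z_statistic(sample_mean, hypothesized_mean, sigma, n):
    """Obtained difference divided by the difference expected by chance (the standard error)."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

# Memory example: mu = 7, sigma = 2, n = 10, M = 9
z = z_statistic(sample_mean=9, hypothesized_mean=7, sigma=2, n=10)
print(round(z, 2))  # 3.16
```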

Rules of Hypothesis Testing
(4) MAKE A DECISION
  – Use the sample statistic (z-score) calculated in step 3 to make a decision about the null hypothesis
    • Reject the null hypothesis: the sample data fall in the critical region. The data provide evidence that the treatment really has an effect.
    • Fail to reject the null hypothesis: the sample data do not fall in the critical region. The data are not convincing, so you conclude there is currently not enough evidence.
■ Example: on average the population remembers 7 words in a particular situation, with SD = 2. You test a new intervention designed to enhance memory with 10 participants and find that they remember an average of 9 words.
  – z = 3.16, which is beyond the boundary of z = ±1.96, so our data fall in the critical region (p < .05). We reject the null hypothesis! Our memory enhancement technique works! We say our result is statistically significant.
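Step 4 is a simple comparison against the boundary set in step 2. This is a minimal two-tailed sketch; the `decide` helper is hypothetical and assumes α = .05 unless told otherwise.

```python
from statistics import NormalDist

def decide(z, alpha=0.05):
    """Two-tailed decision rule: reject H0 when z falls in the critical region."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = .05
    if abs(z) > z_crit:
        return "reject H0"
    return "fail to reject H0"

print(decide(3.16))  # reject H0 (3.16 is beyond +/-1.96)
print(decide(1.50))  # fail to reject H0
```

Note that the function never returns "accept H0". Not reaching the critical region only means the evidence was insufficient.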

Results of Hypothesis Testing: Uncertainty and Error
■ Hypothesis testing is an inferential process, so it uses limited information from a sample to reach a general conclusion about a population.
■ Support for H1 is indirect:
  – We can’t prove the alternative hypothesis; we can only support it
  – It is easier to show that the null is false
■ Two types of errors:
  – Type I error
  – Type II error

Results of Hypothesis Testing: Uncertainty
■ If we reject the null hypothesis, do we accept that the alternative hypothesis is true?
  – Almost: if we reject the null, we have strong support that the alternative is true.
■ If we do not reject the null hypothesis, do we “accept” that the null is true?
  – NO!! There are lots of reasons for not rejecting the null hypothesis. If we fail to reject, we were only unable to find support for our alternative hypothesis.
    • Often researchers run the experiment again, changing a few small elements in order to make their test more sensitive.

Results of Hypothesis Testing: ERROR
■ Type I error: the null hypothesis is true, but the researcher rejects it.
  – The probability of a Type I error is equal to alpha
■ Example: on average the population remembers 7 words in a particular situation, with SD = 2. You test a new intervention designed to enhance memory with 10 participants and find that they remember an average of 9 words. It turns out that by chance we selected an extreme sample. Our sample has an average IQ of 130, so they are “smarter” than average. Prior research has shown that IQ is correlated with working memory, such that individuals with a higher IQ have a higher working memory. Our results are simply due to this confound of IQ.
■ Type I errors have serious implications
  – It is likely that the researcher will report or publish these results. Other researchers may try to build theories or develop other experiments based on these false results.
  – Fortunately, we structure the hypothesis test to make this relatively unlikely. And the researcher gets to choose alpha!!
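The claim that the probability of a Type I error equals α can be checked by simulation. The idea is to generate many samples from a population where H0 is true and count how often the test rejects anyway. This sketch makes several assumptions:
- scores are normally distributed, using the memory example’s μ = 7, σ = 2, n = 10;
- the trial count and random seed are arbitrary choices;
- `type_i_error_rate` is a hypothetical helper name.

The sketch does not model the IQ confound in the story, only the long-run rejection rate.

```python
import random
from statistics import NormalDist

def type_i_error_rate(mu=7.0, sigma=2.0, n=10, alpha=0.05, trials=20_000, seed=1):
    """Simulate experiments in which H0 is TRUE and return the proportion that reject it."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        # Every sample comes from the null population, so any rejection is a Type I error.
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        z = (sample_mean - mu) / standard_error
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

print(type_i_error_rate())            # close to .05
print(type_i_error_rate(alpha=0.01))  # close to .01
```

Because the null is true in every simulated experiment, the proportion of rejections should land near α: about 5% for α = .05 and about 1% for α = .01.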

Results of Hypothesis Testing: ERROR
■ Type II error: the null hypothesis is false, but the researcher fails to reject it.
  – Often happens when the treatment effect is small OR the variance is big
  – It is impossible to determine a single exact probability value for a Type II error; it depends on multiple factors.
  – Represented by the Greek letter beta, β
■ Consequences:
  – Not as serious as a Type I error
  – It only means that one particular experiment does not show evidence for the alternative hypothesis. 2 choices:
    • Accept this outcome and assume the effect is not worth pursuing
    • Repeat the experiment with improvements

Possible Outcomes of a Statistical Decision
Decision based on the data | H0 true | H0 false
Reject H0 | Type I error (α): experimenter’s false start | Correct decision (1 − β)
Fail to reject (“accept”) H0 | Correct decision (1 − α) | Type II error (β): miss

Let’s think about what happens as a result of our decision
■ What if we were looking to see if an individual were guilty of a crime?
  – Null hypothesis = the person is innocent; there is no crime.
  – Type I error: rejecting the null when it is true
    • We send an innocent person to prison (false alarm)
  – Type II error: not rejecting a false null hypothesis
    • We set a guilty person free (miss)

Let’s Try One
Is the mean of my observed sample consistent with the known population mean, or did it come from some other distribution?
■ We are in a sci-fi film
  • There is a sample of beings (n = 5).
  • On average they are 8 feet tall.
  • Are they human or nonhuman?
  • We know that humans in sci-fi movies average 5.5 feet tall, with SD = 1.5.
■ Hypothesis testing:
  (1) State hypotheses.
  (2) Set criterion.
  (3) Collect data (done) and compute a test statistic.
    • Is 8 different enough from 5.5 to be in some other distribution?
  (4) Make a decision.

Let’s Do One: Evidence
(1) $H_0: \mu_{\text{new beings}} = 5.5$; $H_1: \mu_{\text{new beings}} \neq 5.5$
(2) Set criterion: α = .05, so .05/2 = .025 in each tail (2-tailed). Critical z = ±1.96.
(3) Test statistic: $z = \frac{M - \mu}{\sigma_M} = \frac{8 - 5.5}{1.5 / \sqrt{5}} = \frac{2.5}{0.67} = 3.73$
(4) Decision: reject the null. The beings are not human.
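All four steps for the sci-fi problem fit in one small function. This is a sketch assuming Python 3.8+. `z_test` is a hypothetical helper, and it also reports the two-tailed p-value, which the slide does not compute.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu, sigma, n, alpha=0.05):
    """Two-tailed one-sample z-test following steps 2-4 of the slides."""
    standard_normal = NormalDist()                     # the unit normal distribution
    z_crit = standard_normal.inv_cdf(1 - alpha / 2)    # step 2: boundary of the critical region
    z = (sample_mean - mu) / (sigma / math.sqrt(n))    # step 3: test statistic
    p = 2 * (1 - standard_normal.cdf(abs(z)))          # two-tailed p-value
    return z, p, abs(z) > z_crit                       # step 4: reject H0?

z, p, reject = z_test(sample_mean=8, mu=5.5, sigma=1.5, n=5)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```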

In the Literature
■ Findings are said to be significant or statistically significant
  – The height of the beings in the film is significantly different from the height of human beings, z = 3.73, p < .05
  – There was no evidence that the height of the beings in the film was different from the height of human beings, z = 0.89, p > .05
  – APA style dictates that no 0 should precede the decimal point for values that cannot exceed 1, such as p
■ When using a statistical program, report the exact p value, e.g., z = 2.45, p = .0142
■ Scientific papers don’t report the null or alternative hypotheses, but they are an important logical part of hypothesis testing
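If you compute p yourself, a small formatter can apply the no-leading-zero rule. This sketch makes a few assumptions:
- the test is two-tailed;
- p is rounded to three decimals, and very small values are reported as p < .001;
- `apa_z` is a hypothetical name.

Check the current APA manual for the exact conventions your course expects.

```python
from statistics import NormalDist

def apa_z(z):
    """Format a two-tailed z result APA-style: leading zero on z, none on p."""
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    if p < 0.001:
        p_text = "p < .001"
    else:
        p_text = "p = " + f"{p:.3f}".lstrip("0")  # "0.014" becomes ".014"
    return f"z = {z:.2f}, {p_text}"

print(apa_z(2.45))  # z = 2.45, p = .014
print(apa_z(3.73))  # z = 3.73, p < .001
```

At three decimals, z = 2.45 prints as p = .014 rather than the four-decimal table value .0142.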

Try One…
■ A psychologist examined the effect of chronic alcohol abuse on memory. In this experiment a standardized memory test was used. Scores on this test for the general population form a normal distribution with μ = 50 and σ = 6. A sample of n = 22 alcohol abusers has a mean score of M = 47. Is there evidence for memory impairment among alcoholics? Use a criterion of α = .01.

Directional Hypothesis Tests
■ So far: 2-tailed, or non-directional, hypothesis tests (the most widely accepted procedure for hypothesis testing)
  – Two-tailed or nondirectional test: regions of rejection are located in both tails of the distribution; alpha is divided
■ 1-tailed, or directional, hypothesis tests:
  – One-tailed or directional test: the region of rejection is located in just one tail of the distribution; alpha is not divided
  – The statistical hypotheses (H0 and H1) specify either an increase or a decrease in the population mean score
  – The researcher must begin with a specific prediction about the direction of the treatment effect a priori!!

Hypothesis for Directional Tests
■ STATE THE HYPOTHESIS
  – Still about the unknown population mean
  – Null hypothesis, H0: no effect
  – Alternative hypothesis, H1: an effect in a particular direction
■ Example: A therapist is trying to find the best way to treat depression. She predicts that meditation will boost mood in depressed people. The average mood score for depressed people on a standardized test for depression is 25, with SD = 5. She takes a sample of 10 depressed people and teaches them to practice meditation. The mean mood score for this sample is 28.
  – $H_0: \mu_{\text{meditation}} \leq 25$
  – $H_1: \mu_{\text{meditation}} > 25$

Critical Region for Directional Tests
■ The critical region is located entirely in one tail of the distribution.
  – Good because… the test is more sensitive to finding an effect if the predicted direction is correct.
  – Bad because… the test is completely unable to find an effect if the predicted direction is wrong.
* The critical z is smaller (1.65 rather than 1.96 for α = .05) because we don’t have to divide our proportion (.05) in half to account for both tails

Test Statistic and Decision for Directional Hypothesis Testing
■ Steps 3 and 4 of hypothesis testing are the same in directional tests.
■ But… we still need to finish our example.
■ Example: A therapist is trying to find the best way to treat depression. She predicts that meditation will boost mood in depressed people. The average mood score for depressed people on a standardized test for depression is 25, with SD = 5. She takes a sample of 10 depressed people and teaches them to practice meditation. The mean mood score for this sample is 28.
  – $z = \frac{28 - 25}{5 / \sqrt{10}} = \frac{3}{1.58} = 1.90$
  – 1.90 is beyond the one-tailed critical value of +1.65, so we reject the null hypothesis. Meditation helps depressed people.
  – Please note: if we had decided to do a 2-tailed test, we would not have been able to reject the null (1.90 < 1.96). This is why we decide a priori…
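The sketch below runs the meditation data through both versions of the test, to show why the choice must be made a priori. `compare_tails` is a hypothetical helper. It assumes an upper-tailed prediction (an increase), as in the therapist’s hypothesis.

```python
import math
from statistics import NormalDist

def compare_tails(sample_mean, mu, sigma, n, alpha=0.05):
    """Decide the same data with an upper one-tailed test and with a two-tailed test."""
    z = (sample_mean - mu) / (sigma / math.sqrt(n))
    one_tailed_crit = NormalDist().inv_cdf(1 - alpha)      # all of alpha in the upper tail
    two_tailed_crit = NormalDist().inv_cdf(1 - alpha / 2)  # alpha split across both tails
    return {
        "z": round(z, 2),
        "one_tailed_reject": z > one_tailed_crit,
        "two_tailed_reject": abs(z) > two_tailed_crit,
    }

print(compare_tails(sample_mean=28, mu=25, sigma=5, n=10))
```

The same z = 1.90 clears 1.645 but not 1.96. Picking the tails after seeing the data would let the researcher choose whichever answer they prefer.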

Let’s Try One
■ You are testing a new diet drug. Americans eat an average of 2000 calories per day with an SD of 500. You want to see if the drug decreases the amount of calories consumed. You take a sample of 10 people who eat an average of 1700 calories per day. Did your diet drug work?

Answers
(1) $H_0: \mu_{\text{diet drug}} \geq 2000$; $H_1: \mu_{\text{diet drug}} < 2000$
(2) Set the alpha level at .05. Since this is a one-tailed test, we find the critical z-score with all .05 in the lower tail. Critical z = −1.65
(3) $z = \frac{1700 - 2000}{500 / \sqrt{10}} = \frac{-300}{158.11} = -1.90$
(4) −1.90 is beyond −1.65, so we reject the null hypothesis. Our diet drug works!

One vs. Two Tails
■ Some contend the two-tailed test is more rigorous and therefore more convincing
  – It requires more evidence to reject the null
■ Others feel that one-tailed tests are better because they are more sensitive
  – They test more precise, specific hypotheses
■ In general, a 2-tailed test should be used when there is no strong directional expectation OR when there are two competing predictions
  – e.g., one theory predicts an increase in scores following treatment while another predicts a decrease
■ Never switch to a 1-tailed test as a second attempt at significance after a 2-tailed test fails

Statistical Power
■ Power: the probability that the test will correctly reject a false null hypothesis (1 − β), i.e., the likelihood that we will obtain sample data in the critical region when the treatment really has an effect.
■ The more powerful our statistical test, the more readily we will detect a treatment effect when one really exists.

Statistical Power
■ Treatment size: the larger the treatment effect, the greater the power
■ Alpha level: reducing alpha (e.g., from .05 to .01) decreases power
■ One-tailed vs. two-tailed tests
  – When the effect is in the predicted direction, a one-tailed test places a larger proportion of the treatment distribution in the critical region, so it has more power
■ Sample size
  – The larger the sample, the smaller the standard error, so the null and treatment distributions overlap less and power increases
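Power for a z-test can be computed from the standard normal distribution once you assume a true population mean. In this sketch, treating 28 as the true mean under meditation is an assumption made purely for illustration. `power` is a hypothetical helper, and its one-tailed branch assumes the effect is in the predicted (upward) direction.

```python
import math
from statistics import NormalDist

def power(mu0, mu1, sigma, n, alpha=0.05, tails=2):
    """Probability that a z-test of H0: mu = mu0 rejects when the true mean is mu1."""
    phi = NormalDist().cdf
    # Center of the treatment distribution of sample means, in z units under H0.
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))
    if tails == 1:
        # Upper-tailed test: all of alpha sits above the mean.
        z_crit = NormalDist().inv_cdf(1 - alpha)
        return 1 - phi(z_crit - shift)
    if tails == 2:
        # Two-tailed test: alpha is split, and a sample can land in either tail.
        z_crit = NormalDist().inv_cdf(1 - alpha / 2)
        return (1 - phi(z_crit - shift)) + phi(-z_crit - shift)
    raise ValueError("tails must be 1 or 2")

# Hypothetical: suppose meditation really raises the mean mood score from 25 to 28 (sigma = 5).
for n in (10, 20, 40):
    print(f"n = {n}: power = {power(25, 28, 5, n):.2f}")
print(f"alpha = .01: power = {power(25, 28, 5, 10, alpha=0.01):.2f}")
print(f"one-tailed:  power = {power(25, 28, 5, 10, tails=1):.2f}")
```

Running it shows each factor on the slide:
- power rises with n and with the size of the effect;
- power falls when α is lowered to .01;
- power is higher for the one-tailed test when the prediction is correct.

When the true mean equals μ0, the two-tailed "power" collapses to α, the Type I error rate.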

Assumptions for the Hypothesis Test with Z-scores
■ Random sampling
  – To generalize the findings from our sample to the population, we need to select the sample randomly so we don’t add bias
■ Independent observations
  – There can’t be a consistent, predictable relationship between 2 data points
    • e.g., in a coin toss, even if you just tossed the coin 4 times and got heads every time, there is still a 50% chance of getting heads the next time
    • Gambler’s fallacy: believing that past independent outcomes change the odds of the next one

Assumptions for the Hypothesis Test with Z-scores
■ The value of σ is unchanged by the treatment
  – This assumes that the treatment effect is constant and additive (or subtractive)
  – So the mean may change, but the standard deviation should not
  – This is a theoretical ideal. In actual experiments the treatment effect may vary a bit across individuals.
■ Normal sampling distribution
  – To evaluate z-scores, we have to use the unit normal table, and that requires a distribution of sample means that is normal (this will change slightly with the other test statistics we will introduce in the following days)
■ Violating any of these assumptions can invalidate the results of our experiments!

Criticisms about Hypothesis Testing
(1) Hypothesis testing doesn’t tell you very much: just statistical significance and direction, not the size or location of the results.
  – It gives us all-or-nothing information
  – What really is the difference between a z-score of 1.88 (p ≈ .06) and 1.96 (p = .05)?
  – Some people respond that some criterion level has to be set, so while .05 is arbitrary, it is necessary.

Criticisms about Hypothesis Testing
(2) The idea of a null hypothesis is artificial.
  – Every treatment must have some effect.
  – This means you can never have a Type I error.
  – But because we are using an inferential process, it is impossible to prove the alternative hypothesis true. We can only show that the null hypothesis is very unlikely.

Criticisms about Hypothesis Testing
(3) Anything is statistically significant with a large enough N, because H0 is always false
(4) Statistical significance can arise spuriously (e.g., family-wise error; the tendency of measures from the same study to be related [“the crud factor”])
(5) Significance testing places too great an emphasis on Type I error and not enough on Type II error
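Criticism (3) is easy to demonstrate. This is a hypothetical example: μ = 100, σ = 15, and a sample mean only half a point higher. None of these numbers come from the slides, and `two_tailed_p` is an illustrative helper name. The same tiny difference goes from nowhere near significant to p < .05 just by increasing n.

```python
import math
from statistics import NormalDist

def two_tailed_p(sample_mean, mu, sigma, n):
    """Return the z-score and two-tailed p-value for a one-sample z-test."""
    z = (sample_mean - mu) / (sigma / math.sqrt(n))
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical numbers: a half-point difference on a scale with mu = 100, sigma = 15.
for n in (25, 400, 10_000):
    z, p = two_tailed_p(sample_mean=100.5, mu=100, sigma=15, n=n)
    print(f"n = {n:>6}: z = {z:5.2f}, p = {p:.4f}")
```

The standardized size of this difference (0.5 / 15 ≈ .03 SD) is the same at every sample size. Only the standard error shrinks.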

Criticisms about Hypothesis Testing
(6) Hypothesis testing rejects the null to conclude that a treatment has a significant effect, but that does not mean the treatment has a “substantial” effect.
■ Most important criticism… How do we address this?

Measuring Effect Size
To address the fact that significance doesn’t tell us whether an effect is “substantial,” we report effect size
■ Cohen’s d = mean difference / standard deviation: $d = \frac{M - \mu}{\sigma}$
■ Standardizes the mean difference in terms of standard deviation (like z-scores)
■ Diet drug example:
  – d = 300 / 500 = .6
  – (note: sample size is not taken into account)
■ Magnitude (Cohen’s conventions):
  – d ≈ .2: small effect (mean difference around .2 SD)
  – d ≈ .5: medium effect (mean difference around .5 SD)
  – d ≥ .8: large effect (mean difference greater than .8 SD)
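Cohen’s d is a one-line computation. Here is a sketch with the diet drug numbers; `cohens_d` is a hypothetical helper name. The sign shows the direction, and the magnitude (.6) is what gets compared with the benchmarks.

```python
def cohens_d(sample_mean, mu, sigma):
    """Mean difference in standard-deviation units; sample size plays no role."""
    return (sample_mean - mu) / sigma

d = cohens_d(sample_mean=1700, mu=2000, sigma=500)
print(d)  # -0.6 (a decrease of .6 SD)
```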

Homework
Chapter 8: 1, 2, 7, 8, 11, 16, 18, 25, 26, 30