Statistical Hypotheses Testing Stat 700 Lectures Hypothesis Testing

Statistical Hypotheses Testing Stat 700 Lectures Hypothesis Testing Week of 11/06/200 Hypotheses Testing


Overview of this Lecture
• The problem of hypotheses testing
• Elements and logic of hypotheses testing (hypotheses, decision rule, one- and two-tailed tests, significance level, Type I and Type II errors, power of test, implications of the decision, p-values)
• Steps in performing a hypotheses test
• Large-sample test for the population mean
• Two-sample tests for the population means
• Large-sample test for the population proportion
• Two-sample tests for the population proportions

The Problem of Hypotheses Testing
• Statement of the Problem: Given a population (equivalently, a distribution) with a parameter of interest, $\theta$ (which could be the mean, variance, standard deviation, proportion, etc.), we would like to choose between two complementary statements concerning $\theta$. These statements are called statistical hypotheses.
• The choice between these hypotheses is to be based on sample data taken from the population of interest. The ideal goal is to choose the hypothesis that is true in reality, using only the sample data.

Some Situations where Hypotheses Testing is Relevant
• Example: A drug manufacturer would like to compare a newly developed pill for eliminating migraine headaches with a standard drug. The comparison is made in terms of the mean time to cessation of headache after taking the pill.
• Let $\mu$ denote the mean time to headache cessation after taking the new pill. If $\mu_0$ is the mean time to headache cessation for the standard drug, then the manufacturer would like to decide between:
  Statement 1 (Null): $\mu \ge \mu_0$ (new drug is not better)
  Statement 2 (Alternative): $\mu < \mu_0$ (new drug is better)

Some Situations …
• Example: A medical researcher would like to compare the effectiveness of two treatments (for example, chemotherapy versus radiation) for a particular type of cancer, with effectiveness measured by the five-year survival rate of patients.
• If $p_1$ denotes the proportion of patients surviving five years among those treated with chemotherapy, and $p_2$ is the corresponding proportion among those treated with radiation, then the researcher would like to decide between:
  Statement 1: $p_1 \le p_2$;
  Statement 2: $p_1 > p_2$.

Some Situations …
• Example: The Food and Drug Administration would like to check that the amount of an active ingredient in a certain type of medication is as specified on the label. If $\mu$ is the mean amount of this ingredient, then the FDA would like to decide between the statements:
  Statement 1 (Null): $\mu = \mu_0$, where $\mu_0$ is the specified amount;
  Statement 2 (Alternative): $\mu \ne \mu_0$.
• This is an example of a two-sided hypothesis, since the alternative holds when either $\mu < \mu_0$ or $\mu > \mu_0$.

Elements and Logic of Statistical Hypotheses Testing
• Consider a population or distribution whose mean is $\mu$. To introduce the elements and discuss the logic of hypotheses testing, we consider the problem of deciding whether $\mu = \mu_0$, where $\mu_0$ is a pre-specified value, or $\mu \ne \mu_0$. This is the type of problem in which the FDA might be interested.
• The first step in hypotheses testing, which should be done before you gather your sample data, is to set up your statistical hypotheses: the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$).

The Statistical Hypotheses
• The null hypothesis, $H_0$, is usually the hypothesis that corresponds to the status quo, the standard, or the desired level/amount, or it represents the statement of “no difference.”
• The alternative hypothesis, $H_1$, is the complement of $H_0$, and is typically the statement that the researcher would like to prove or verify.
• These hypotheses are usually set up so that deciding in favor of $H_1$ when in fact $H_0$ is true is the outcome we most want to avoid.

An Analogy to Remember
• Setting the null and alternative hypotheses has an analog in the justice system, where the defendant is “presumed innocent” until “proven guilty.”
• In the court system, the null hypothesis corresponds to the defendant being innocent (this is the status quo, the standard, etc.). The alternative hypothesis is that the defendant is guilty. Note that it is very difficult to reject the null (convict the defendant), and only “a proof (based on good evidence) beyond a reasonable doubt” will warrant rejection of $H_0$.

The Hypotheses in our Problem
• For the problem we are considering, the appropriate hypotheses are:
  $H_0: \mu = \mu_0$
  $H_1: \mu \ne \mu_0$.
• A word of caution: It is not proper for a researcher to set up the hypotheses after seeing the sample data. Data may be used to generate hypotheses, but to test these generated hypotheses you should gather a new set of sample data!

Determine the Type of Sample Data that will be Gathered
• The second step is to determine what kind of sample data you will be gathering. Is it a simple random sample? A stratified sample?
• For the moment we will assume that a simple random sample of size $n$ will be obtained, so the data will be representable by $X_1, X_2, \ldots, X_n$, with $n \ge 30$.
• Also, determine whether you know the population standard deviation $\sigma$. We assume for the moment that we do.

The Decision Rule
• The decision rule is the procedure that states when the null hypothesis, $H_0$, will be rejected on the basis of the sample data.
• To specify the decision rule, one specifies a test statistic, which is a quantity that is computed from the sample data and whose sampling distribution under $H_0$ is known or can be determined. Such a statistic measures the agreement of the sample data with the null hypothesis specification.
• For our problem, a logical choice for the test statistic is the standardized sample mean, $Z_c$.

The Test Statistic: $Z_c = (\bar{X} - \mu_0)/(\sigma/\sqrt{n})$
• This is a reasonable choice since it measures how far the sample mean $\bar{X}$ is from the population mean $\mu_0$ specified under $H_0$, in units of the standard error $\sigma/\sqrt{n}$. The larger the value of $|Z_c|$, the stronger the indication that $H_0$ is not true.
• Furthermore, under $H_0$, by virtue of the Central Limit Theorem, the sampling distribution of $Z_c$ will be approximately standard normal.
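As a minimal sketch of the standardized sample mean $Z_c = (\bar{X} - \mu_0)/(\sigma/\sqrt{n})$, the following uses only Python's standard library. The helper name `z_statistic` and the input numbers are illustrative assumptions, not part of the lecture.

```python
from math import sqrt


def z_statistic(xbar: float, mu0: float, sigma: float, n: int) -> float:
    """Distance of the sample mean from the null value mu0, in standard errors."""
    if n <= 0 or sigma <= 0:
        raise ValueError("n and sigma must be positive")
    return (xbar - mu0) / (sigma / sqrt(n))


# Made-up inputs for illustration: sample mean 52, null mean 50, sigma 8, n = 64.
z_c = z_statistic(xbar=52.0, mu0=50.0, sigma=8.0, n=64)
print(z_c)
```

Here the standard error is 8/8 = 1, so the sample mean sits two standard errors above the null value.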

When to Reject $H_0$ and its Consequences
• Having decided which test statistic to use, the next step is to specify the precise situation in which to reject $H_0$.
• We have said that it is logical to reject $H_0$ if the absolute value of $Z_c$ is large. But how “large” is “large”? For the moment, let us specify a critical value, denoted by $C$, such that $H_0$ will be rejected if $|Z_c| > C$.
• Before deciding on the value of $C$, let us examine the consequences of our decision rule.

Possible Errors of Decision
• Remember at this stage that either $H_0$ is correct or $H_1$ is correct. Thus, there is a “true state of reality,” but this state is not known to us (otherwise we wouldn’t be performing a test).
• On the other hand, our decision on whether to reject $H_0$ will be based only on partial information, namely the sample data.
• We may therefore represent in a table the possible combinations of “states of reality” and “decision based on the sample.”

States of Reality and Decisions Made

  Decision                  $H_0$ is true        $H_0$ is false
  Reject $H_0$              Type I error         Correct decision
  Fail to reject $H_0$      Correct decision     Type II error

• In decision-making, there is therefore the possibility of committing an error, which could be either an error of Type I or an error of Type II.
• Which of these two types of error is more serious?

Assessing the Two Types of Errors
• From the table of states of reality and decisions, we have:
  Type I error: committed when $H_0$ is rejected when in reality it is true.
  Type II error: committed when $H_0$ is not rejected when in reality it is false.
• Just as in the court trial alluded to earlier, an error of Type I is considered to be the more serious type of error (“convicting an innocent man”).
• Therefore, we try to minimize the probability of committing a Type I error.

Setting the Probability of a Type I Error
• In trying to minimize the probability of a Type I error, however, we encounter an obstacle: for a fixed sample size, the probabilities of the Type I and Type II errors move in opposite directions.
• Thus, if we try to make the probability of a Type I error very, very small, the probability of a Type II error becomes quite large.
• As a compromise we therefore specify a maximum tolerable Type I error probability, called the significance level and denoted by $\alpha$, and choose the critical value $C$ such that the probability of a Type I error is (at most) equal to $\alpha$. The significance level $\alpha$ is conventionally set to 0.10, 0.05, or 0.01.
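The trade-off between the two error probabilities can be illustrated numerically. The sketch below assumes the two-sided test based on a standard normal statistic discussed in this lecture. The function name `type_ii_error` and the standardized shift of 2.5 standard errors are hypothetical choices made only for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def type_ii_error(alpha: float, shift: float) -> float:
    """P(fail to reject H0) when the true mean lies `shift` standard errors from the null value."""
    c = STD_NORMAL.inv_cdf(1 - alpha / 2)  # critical value for significance level alpha
    # Under the alternative the statistic is roughly Normal(shift, 1);
    # H0 is not rejected when the statistic falls in [-c, c].
    return STD_NORMAL.cdf(c - shift) - STD_NORMAL.cdf(-c - shift)


shift = 2.5  # hypothetical value of (true mean - null mean) / (sigma / sqrt(n))
betas = {alpha: type_ii_error(alpha, shift) for alpha in (0.10, 0.05, 0.01)}
for alpha, beta in betas.items():
    print(f"alpha = {alpha:.2f}  ->  P(Type II error) = {beta:.3f}")
```

With the sample size and the alternative held fixed, tightening the significance level from 0.10 to 0.01 raises the Type II error probability, which is the compromise the slide describes.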

Determining the Critical Value, $C$
• Let us now determine the critical value $C$ in our test. Recall that our test will reject $H_0$ if $|Z_c| > C$.
• By definition, P{Type I error} = P{reject $H_0$ | $H_0$ is true} = P{$|Z_c| > C$ | $H_0$ is true}.
• But, under $H_0$, $Z_c$ is distributed (approximately) as standard normal, so P{$|Z_c| > C$} = 2 P{$Z > C$}. If we want P{Type I error} = $\alpha$, then we should choose the critical value to be $C = z_{\alpha/2}$, which is the value such that P{$Z > z_{\alpha/2}$} = $\alpha/2$.
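The critical value can be looked up in a normal table or computed directly. A small sketch, assuming Python 3.8+ (for `statistics.NormalDist`); the helper name `critical_value` is an illustrative assumption:

```python
from statistics import NormalDist


def critical_value(alpha: float) -> float:
    """Return the point whose upper-tail probability under the standard normal is alpha/2."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(1 - alpha / 2)


# The three conventional significance levels mentioned in the lecture.
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: C = {critical_value(alpha):.3f}")
```

For a significance level of 0.05 this gives approximately 1.96, the value used in the corn-yield problem later in the lecture.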

The Resulting Decision Rule
• Given a significance level of $\alpha$, for testing the null hypothesis $H_0: \mu = \mu_0$ versus the alternative hypothesis $H_1: \mu \ne \mu_0$, the appropriate test statistic, under the assumptions that (a) $\sigma$ is known and (b) $n \ge 30$, is given by:
  $Z_c = (\bar{X} - \mu_0)/(\sigma/\sqrt{n})$,
  and $H_0$ is rejected if $|Z_c| > z_{\alpha/2}$.

Data Gathering and Making the Decision
• Having specified the final decision rule, the next step is to gather the sample data and to compute the sample mean and the value of $Z_c$.
• If $|Z_c| > z_{\alpha/2}$, then $H_0$ is rejected; otherwise, we say that we “fail to reject $H_0$.”
• Note: If $\sigma$ is not known, then (because $n$ is large) we could replace it in the formula for $Z_c$ by the sample standard deviation $S$.
• The final step is to make the relevant conclusion.

On the Conclusion that One Could Make
• The final step in performing a statistical test of hypotheses is to state the conclusion in terms of the particular study, that is, not simply to say that “$H_0$ is rejected” or “$H_0$ is not rejected.”
• When $H_0$ is rejected, either a correct decision has been made or an error of Type I has been committed. But since we have controlled the probability of committing a Type I error (set to $\alpha$, a level we can tolerate), we can conclude in this case that $H_0$ is not true, and hence decide in favor of $H_1$.

On Conclusions … continued
• On the other hand, if we did not reject $H_0$, then either we are making the correct decision or we are making a Type II error.
• However, since we did not control the Type II error probability (when we set the Type I error probability to be $\alpha$, we “closed our eyes to the probability of a Type II error”), if we do not reject $H_0$, we cannot conclude that $H_0$ is true. Rather, we can only say that we “failed to reject $H_0$ on the basis of the available data.”
• This is the basis of the saying: “you can never prove a theory, you can only disprove it.”

Recapitulation: Steps in Hypotheses Testing
• Step 1: Formulate your null and alternative hypotheses.
• Step 2: Determine the type of sample you will be getting with regard to sample size, knowledge of the standard deviation, etc.
• Step 3: Specify your level of significance.
• Step 4: State precisely your decision rule.
• Step 5: Gather your sample data and compute the test statistic.
• Step 6: Decide and make final conclusions.
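The steps above can be sketched as one small routine for the two-sided test of a mean with known standard deviation. This is an illustrative outline only: the function name `two_sided_z_test`, the returned dictionary keys, and the simulated data are assumptions, not material from the lecture.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def two_sided_z_test(sample, mu0, sigma, alpha=0.05):
    """Steps 3 to 6 for H0: mu = mu0 versus H1: mu != mu0, with sigma assumed known."""
    n = len(sample)
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # Step 4: reject when |Z_c| exceeds this value
    z_c = (fmean(sample) - mu0) / (sigma / sqrt(n))  # Step 5: compute the test statistic
    reject = abs(z_c) > critical                     # Step 6: make the decision
    return {"n": n, "z_c": z_c, "critical": critical, "reject_H0": reject}


# Hypothetical data, simulated only so that the sketch runs end to end.
random.seed(0)
sample = [random.gauss(10.5, 2.0) for _ in range(36)]
result = two_sided_z_test(sample, mu0=10.0, sigma=2.0, alpha=0.05)
print(result)
```

Steps 1 and 2 (choosing the hypotheses and the sampling design) happen before any data are collected, so they appear here only as the arguments `mu0`, `sigma`, and `alpha`.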

The p-Value Approach
• Another approach to making the decision in hypotheses testing is to compute the p-value associated with the observed value of the test statistic.
• By definition, the p-value is the probability, computed under $H_0$, of getting the observed value of the test statistic or a more extreme value.
• In our situation, the p-value would then be: p-value = P{$|Z| > |z_c|$}, where $z_c$ is the observed value of the test statistic.
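A short sketch of the p-value computation for a standard normal test statistic follows. The two-sided case is the one defined on this slide; the one-sided cases are included because one-tailed alternatives appear elsewhere in the lecture. The function name `p_value` and its `tail` argument are illustrative assumptions.

```python
from statistics import NormalDist


def p_value(z_c: float, tail: str = "two-sided") -> float:
    """p-value of an observed standard normal test statistic z_c."""
    phi = NormalDist().cdf
    if tail == "two-sided":  # H1: mu != mu0, extreme means large |Z|
        return 2 * (1 - phi(abs(z_c)))
    if tail == "upper":      # H1: mu > mu0, extreme means large Z
        return 1 - phi(z_c)
    if tail == "lower":      # H1: mu < mu0, extreme means small Z
        return phi(z_c)
    raise ValueError("tail must be 'two-sided', 'upper', or 'lower'")


print(round(p_value(1.96), 4))
print(round(p_value(1.96, tail="upper"), 4))
```

Which tail counts as "more extreme" is determined by the alternative hypothesis, so it must be fixed before looking at the data.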

Deciding Based on the p-Value
• If the p-value exceeds 0.10, then $H_0$ is not rejected and we say that the result is not significant.
• If the p-value is between 0.05 and 0.10, we usually say that the result is almost significant or tending towards significance.
• If the p-value is between 0.01 and 0.05, we reject $H_0$ and conclude that the result is significant.
• If the p-value is less than 0.01, then $H_0$ is rejected and we conclude that the result is highly significant.
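These conventions can be written as a simple lookup. In the sketch below the function name `describe_p_value` is hypothetical, and the treatment of values falling exactly on a boundary (assigned to the more significant category) is an assumption, since the slide does not say how ties are handled.

```python
def describe_p_value(p: float) -> str:
    """Map a p-value to the verbal labels used in the lecture."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p > 0.10:
        return "not significant"
    if p > 0.05:
        return "almost significant"
    if p > 0.01:
        return "significant"
    return "highly significant"


for p in (0.25, 0.07, 0.03, 0.004):
    print(p, "->", describe_p_value(p))
```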

On the Sensitivity of a Test
• Ideally, we would like our test procedure to always produce the correct decision. However, this is not possible if the decision is based only on sample data.
• To measure the sensitivity of a test under the alternative hypothesis, we can compute its power, which is the probability of rejecting $H_0$ under the alternative hypothesis.
• That is, Power of Test at $\mu_1$ = P{reject $H_0$ | $\mu = \mu_1$}. This function can be plotted and can be used to determine the appropriate sample size.

Some Concrete Problems
• Situation: The mean yield of corn in the US is about 120 bushels per acre. A survey of 40 farmers this year gives a sample mean yield of 123.8 bushels per acre. We want to know whether this is good evidence that the national mean this year is not 120 bushels per acre. Assume that the farmers surveyed are an SRS from the population of all commercial corn growers and that the standard deviation of the yield in this population is $\sigma = 10$ bushels per acre. Test $H_0: \mu = 120$ versus $H_1: \mu \ne 120$ at the 5% level of significance.
• Solution: $H_1$ is a two-sided hypothesis, so a two-tailed test is appropriate.

Solution … continued
• Because $H_1$ is two-sided and the level of significance is $\alpha = 0.05$, the appropriate decision rule is: Reject $H_0$ if $|Z_c| > z_{0.025} = 1.96$, where the test statistic is $Z_c = (\bar{X} - \mu_0)/(\sigma/\sqrt{n})$.
• From the given information, the value of this test statistic is $Z_c = (123.8 - 120)/(10/\sqrt{40}) = 2.4033$.
• Since this value is larger than the critical value of 1.96, our decision is to reject $H_0$ at the 5% significance level. We can therefore conclude at the 5% level that the mean yield of corn for this year is different from the usual mean yield of 120 bushels per acre.

P-value Approach Illustrated
• Recall that the p-value is the probability, under $H_0$, of getting the observed value of the test statistic or more extreme values. For our problem, we therefore have: p-value = P{$|Z| > 2.4033$} = 0.0162.
• Based on this value we could reject $H_0$ at the 5% level, but not at the 1% level.
• Another interpretation of the p-value of 0.0162 is that it is the smallest level of significance at which $H_0$ can be rejected. Let us also examine the power of our test.
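The numbers in the corn-yield problem can be checked with a few lines of standard-library Python. This is only a verification sketch; the variable names are illustrative, and the results should agree with the values quoted on the slides up to rounding.

```python
from math import sqrt
from statistics import NormalDist

# Values given in the corn-yield problem.
xbar, mu0, sigma, n = 123.8, 120.0, 10.0, 40

z_c = (xbar - mu0) / (sigma / sqrt(n))    # observed test statistic
p = 2 * (1 - NormalDist().cdf(abs(z_c)))  # two-sided p-value

print(f"z_c = {z_c:.4f}, p-value = {p:.4f}")
for alpha in (0.05, 0.01):
    decision = "reject H0" if p <= alpha else "fail to reject H0"
    print(f"alpha = {alpha}: {decision}")
```

Comparing the p-value with each significance level reproduces the statement that the null hypothesis is rejected at the 5% level but not at the 1% level.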

Power of the Test
• Let us denote by Power($\mu_1$) the power of the test when the true value of the mean is $\mu_1$. Thus, by the definition of power, Power($\mu_1$) = P{$|Z_c| > z_{\alpha/2}$ | $\mu = \mu_1$}.
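One way to turn the definition of power into a computable expression is sketched below. It assumes, as in the rest of the lecture, that $\sigma$ is known and $n$ is large, so that when the true mean is $\mu_1$ the statistic $Z_c$ is approximately normal with mean $\delta = (\mu_1 - \mu_0)/(\sigma/\sqrt{n})$ and variance 1. The symbols $\delta$ and $\Phi$ (the standard normal distribution function) are introduced here for convenience and are not taken from the slides.

```latex
\begin{align*}
\text{Power}(\mu_1)
  &= P\{|Z_c| > z_{\alpha/2} \mid \mu = \mu_1\} \\
  &= P\{Z_c > z_{\alpha/2} \mid \mu = \mu_1\} + P\{Z_c < -z_{\alpha/2} \mid \mu = \mu_1\} \\
  &= \bigl[1 - \Phi(z_{\alpha/2} - \delta)\bigr] + \Phi(-z_{\alpha/2} - \delta),
  \qquad \delta = \frac{\mu_1 - \mu_0}{\sigma/\sqrt{n}} .
\end{align*}
```

As a check, setting $\mu_1 = \mu_0$ gives $\delta = 0$ and the expression reduces to $\alpha/2 + \alpha/2 = \alpha$, the probability of a Type I error.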

Power … continued
• Substituting $\mu_0 = 120$, $\sigma = 10$, and $n = 40$ into the above expression, we can then calculate the value of Power($\mu_1$) for different values of $\mu_1$.
• The values of $\mu_1$ and Power($\mu_1$) can then be plotted. This plot is given in the next slide.
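A sketch of this calculation for the corn-yield setting follows. It assumes the usual large-sample normal approximation, under which the power at a true mean $\mu_1$ equals $1 - \Phi(z_{\alpha/2} - \delta) + \Phi(-z_{\alpha/2} - \delta)$ with $\delta = (\mu_1 - \mu_0)/(\sigma/\sqrt{n})$. The function name `power` and the grid of $\mu_1$ values are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def power(mu1, mu0=120.0, sigma=10.0, n=40, alpha=0.05):
    """Approximate power of the two-sided z-test when the true mean is mu1."""
    nd = NormalDist()
    c = nd.inv_cdf(1 - alpha / 2)            # critical value
    delta = (mu1 - mu0) / (sigma / sqrt(n))  # shift of the statistic, in standard errors
    return 1 - nd.cdf(c - delta) + nd.cdf(-c - delta)


# Tabulate the power on a hypothetical grid of true means around the null value 120.
for mu1 in range(114, 127, 2):
    print(f"mu1 = {mu1}: power = {power(mu1):.3f}")
```

The power equals the significance level at the null value and increases symmetrically as the true mean moves away from 120 in either direction, which is the shape one would expect the plotted power function to have.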

Plot of the Power Function
[Figure: the power of the test plotted against the true mean $\mu_1$.]

Problems …
• Situation: The Survey of Study Habits and Attitudes (SSHA) is a psychological test that measures the motivation, attitude toward school, and study habits of students. Scores range from 0 to 200. The mean score for US college students is about 115, and the standard deviation is about 30. A teacher who suspects that older students have better attitudes toward school gives the SSHA to 20 students who are at least 30 years of age. Their mean score is 135.2. Assume that $\sigma = 30$. Perform a test of $H_0: \mu = 115$ versus $H_1: \mu > 115$ using the p-value approach.
• Solution: To be done in class.
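The in-class solution is not part of these slides. One possible sketch of the calculation is given below, assuming the same standardized statistic as in the corn-yield problem and an upper-tail p-value because the alternative is one-sided. Since the sample size here is 20, the sketch also relies on the SSHA scores being approximately normally distributed; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values given in the SSHA problem.
xbar, mu0, sigma, n = 135.2, 115.0, 30.0, 20

z_c = (xbar - mu0) / (sigma / sqrt(n))  # observed test statistic
p_upper = 1 - NormalDist().cdf(z_c)     # upper-tail p-value, because H1 is mu > 115

print(f"z_c = {z_c:.3f}, one-sided p-value = {p_upper:.4f}")
```

Under these assumptions the p-value is well below 0.01, which by the conventions given earlier in the lecture would be called highly significant.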

Some Comments on Assumptions
• The testing procedure we developed here required two assumptions: (a) the sample size is at least 30; (b) the population standard deviation $\sigma$ is known.
• When the sample is large, assumption (b) is not crucial, since $\sigma$ can be replaced by $S$ in the formula for $Z_c$.
• When assumption (a) is not satisfied, we need to be able to assume that the population is normal, and we need to know the population standard deviation. If $\sigma$ is not known in that case, we will need to use the t-distribution, which will be discussed next week.

Concrete Problems for Testing Two Means
• Question of Interest: Does cocaine use by pregnant women cause their babies to have low birth weight?
• Hypotheses:
  $H_0$: The mean birth weight of babies of cocaine users is greater than or equal to the mean birth weight of babies of non-cocaine users. Symbolically, $\mu_1 \ge \mu_2$.
  $H_1$: $\mu_1 < \mu_2$.

Data of the Study
• Data Gathering Performed: Birth weights (measured in grams) of babies of women who tested positive for cocaine/crack during a drug-screening test were compared with the birth weights of babies of women who either tested negative or were not tested, a group called “other.” Below are the summary statistics for the two samples.

Problems … continued
• Study Question: Is the mean hemoglobin level among breast-fed babies higher than that among babies fed with standard baby formula without iron supplements?
• What are the appropriate hypotheses?
• Situation: A study of iron deficiency among infants compared samples of infants following different feeding regimens. One group contained breast-fed infants, while the children in another group were fed a standard baby formula without any iron supplements. A summary of the blood hemoglobin levels at 12 months of age is presented in the following table.

Summary of the Data from Study
• The appropriate test will be done in class. What conclusions could be made? What assumptions are needed for the test to be valid?
• What if the standard deviations that were provided were actually the sample standard deviations?
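The summary tables for the two-sample problems are not reproduced in these slides, so the study data cannot be analyzed here. As a generic sketch only, the standard large-sample statistic for comparing two means, $(\bar{X}_1 - \bar{X}_2)/\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$, can be computed as follows. This formula is an assumption about the test intended by the lecture, the function name `two_sample_z` is hypothetical, and the input numbers are placeholders invented for illustration rather than values from either study.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar1, sd1, n1, xbar2, sd2, n2):
    """Large-sample z statistic for the difference between two population means."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (xbar1 - xbar2) / standard_error


# Placeholder summary statistics (NOT the study data): two groups of 50 with sd 2.
z_c = two_sample_z(xbar1=10.0, sd1=2.0, n1=50, xbar2=9.0, sd2=2.0, n2=50)
p_upper = 1 - NormalDist().cdf(z_c)  # upper tail, for an alternative of the form mu_1 > mu_2

print(f"z_c = {z_c:.3f}, one-sided p-value = {p_upper:.4f}")
```

For large samples, sample standard deviations are commonly used in place of the population values in the same formula, which bears on the last question posed on the slide.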

Tests of a Population Proportion
• Situation: A peony plant with red petals was crossed with another plant having streaky petals. A geneticist states that 75% of the offspring resulting from this cross will have red flowers. To test this claim, 100 seeds from this cross were collected and germinated, and 58 plants had red petals.
• What hypotheses are being tested? Does the observed data contradict the geneticist’s claim? The test will be done in class.
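The in-class test is not included in the slides. A possible sketch, assuming the hypotheses $H_0: p = 0.75$ versus $H_1: p \ne 0.75$ and the usual large-sample statistic $(\hat{p} - p_0)/\sqrt{p_0(1 - p_0)/n}$ (neither of which is stated explicitly in the source), is:

```python
from math import sqrt
from statistics import NormalDist

# Values given in the peony problem.
p0, n, red = 0.75, 100, 58

p_hat = red / n                                # sample proportion of red-petaled plants
z_c = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # standard error evaluated under H0
p_two_sided = 2 * NormalDist().cdf(-abs(z_c))  # two-sided p-value

print(f"p_hat = {p_hat:.2f}, z_c = {z_c:.3f}, p-value = {p_two_sided:.5f}")
```

Note that the standard error uses the hypothesized proportion rather than the sample proportion, because the sampling distribution is evaluated under the null hypothesis.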

Testing Differences of Two Population Proportions
• Situation: A clinical trial examined the effectiveness of aspirin in the treatment of cerebral ischemia (stroke). Patients were randomized into treatment and control groups. The study was double-blind in the sense that neither the patients nor the physicians who evaluated the patients knew which patients received aspirin and which received the placebo tablet.
• After 6 months of treatment, the attending physicians evaluated each patient’s progress as either favorable or unfavorable.

Continued …
• Of the 78 patients in the aspirin group, 63 had favorable outcomes; 43 of the 77 control (placebo) patients had favorable outcomes.
• Source: William S. Fields, et al. (1977), “Controlled trial of aspirin in cerebral ischemia,” Stroke, 8, 301-315.
• What hypotheses are being tested? The hypotheses test will be performed in class. What conclusions could be made based on this data?

Another Problem
• Situation: Gastric freezing was once a recommended treatment for ulcers in the upper intestine. A randomized comparative experiment found that 28 of the 82 patients who were subjected to gastric freezing improved, while 30 of the 78 patients in the control group improved.
• Based on this information, test the hypothesis of “no difference” for the two populations. By the way, what will be the relevant populations in this study? The test will be done in class.
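Both two-proportion problems (the aspirin trial and the gastric-freezing experiment) can be explored with the same sketch. It assumes the common pooled large-sample z statistic for the null hypothesis of "no difference" and reports a two-sided p-value. The slides leave the exact test for class, so the formula, the function name `two_proportion_z`, and the choice of a two-sided alternative are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic and two-sided p-value for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion estimated under H0
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    return z, 2 * NormalDist().cdf(-abs(z))


# Counts taken from the two problems in the slides.
z_aspirin, p_aspirin = two_proportion_z(63, 78, 43, 77)  # aspirin vs. placebo
z_freeze, p_freeze = two_proportion_z(28, 82, 30, 78)    # gastric freezing vs. control

print(f"aspirin: z = {z_aspirin:.3f}, p-value = {p_aspirin:.4f}")
print(f"freezing: z = {z_freeze:.3f}, p-value = {p_freeze:.4f}")
```

Under these assumptions the aspirin data give a large positive statistic, while the gastric-freezing data give a statistic close to zero. For a one-sided alternative in the aspirin trial, the p-value would be half the two-sided value.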