9-4: Inferences from Two Dependent Populations, College Prep Stats


Learning Targets
In this section we develop methods for testing hypotheses and constructing confidence intervals involving the mean of the differences of the values from two dependent populations. With dependent samples, there is some relationship whereby each value in one sample is paired with a corresponding value in the other sample.

Learning Targets
Because the hypothesis test and confidence interval use the same distribution and standard error, they are equivalent in the sense that they result in the same conclusions. Consequently, the null hypothesis that the mean difference equals 0 can be tested by determining whether the confidence interval includes 0. There are no exact procedures for dealing with dependent samples, but the t distribution serves as a reasonably good approximation, so the following methods are commonly used.

Dependent Populations
• A specific situation in which each data value from one population is paired with a data value from the other population
  – Guarantees that the two samples are the same size
  – Considers the differences of the pairs to create a special new “difference population,” instead of looking at each population separately

Notation for Dependent Samples
$d$ = individual difference between the two values of a single matched pair
$\mu_d$ = mean value of the differences $d$ for the population of paired data
$\bar{d}$ = mean value of the differences $d$ for the paired sample data (equal to the mean of the $x - y$ values)
$s_d$ = standard deviation of the differences $d$ for the paired sample data
$n$ = number of pairs of data

Requirements
1. The sample data are dependent.
2. The samples are simple random samples.
3. Either or both of these conditions are satisfied: the number of pairs of sample data is large ($n > 30$), or the pairs of values have differences that are from a population having a distribution that is approximately normal.

Hypothesis Test Statistic for Matched Pairs
$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$$
where degrees of freedom = $n - 1$
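As a minimal sketch of the test statistic above, the following Python function computes t and the degrees of freedom from raw paired data using only the standard library. The function name `paired_t_statistic` and the three sample pairs are hypothetical, chosen only for illustration; they do not come from the slides.

```python
import math
import statistics


def paired_t_statistic(x, y, mu_d0=0.0):
    """Return (t, df) for paired samples x and y under H0: mu_d = mu_d0."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same number of values")
    d = [xi - yi for xi, yi in zip(x, y)]   # individual differences d = x - y
    n = len(d)                              # number of pairs
    d_bar = statistics.mean(d)              # sample mean of the differences
    s_d = statistics.stdev(d)               # sample standard deviation (n - 1 divisor)
    t = (d_bar - mu_d0) / (s_d / math.sqrt(n))
    return t, n - 1


# Hypothetical data: the differences are 1, 2, 3, so d_bar = 2 and s_d = 1.
t, df = paired_t_statistic([5, 7, 9], [4, 5, 6])
print(round(t, 4), df)
```

Working from the differences rather than from the two samples separately is the whole point of the matched-pairs design: once `d` is formed, the calculation is a one-sample t test on `d`.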

Test Statistic (tells you the distribution for the P-value)
• $\bar{d}$: the average difference between pairs (statistic)
• $\mu_d$: the average difference between pairs (parameter); comes from the hypotheses
• $s_d$: sample standard deviation of the differences
• $n$: the size of the two samples (remember they are each the same size)

P-values and Critical Values
Use a calculator or Table A-3 (t distribution).

Confidence Intervals for Matched Pairs
$$\bar{d} - E < \mu_d < \bar{d} + E \quad \text{where} \quad E = t_{\alpha/2} \frac{s_d}{\sqrt{n}}$$
Critical values of $t_{\alpha/2}$: Use a calculator or Table A-3 with $n - 1$ degrees of freedom.

Method
• The average difference between pairs (the parameter $\mu_d$) is taken to be 0 in the null hypothesis unless you are given a specific difference to consider.

How to find parts of test statistic

P-values and Critical Values
P-values: Use the tcdf feature on your calculator, with degrees of freedom (df) = n – 1.
• For right-tailed tests, type: tcdf(t test statistic, 9999, df)
• For left-tailed tests, type: tcdf(–9999, t test statistic, df)
• Don’t forget: if your test is two-sided, double your P-value.
Critical values: Use the invT feature on your calculator, with degrees of freedom (df) = n – 1.
• For right-tailed tests, α is in the right tail: invT(1 – α, df)
• For left-tailed tests, α is in the left tail: invT(α, df)
• For two-tailed tests, α is split evenly between the two tails: invT(α/2, df) (remember that this t* value is ±, not just the negative value that the calculator gives you)
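For readers without a graphing calculator, the tcdf and invT steps can be approximated in plain Python. This is a sketch under stated assumptions, not the calculator's own algorithm: the names `t_pdf`, `t_cdf`, `p_value`, and `inv_t` are hypothetical, the tail area is obtained by Simpson's rule on the t density, and the critical value is found by bisection. Where SciPy is installed, `scipy.stats.t` offers `cdf`, `sf`, and `ppf` for the same purpose.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) with Simpson's rule on [0, |t|] plus symmetry."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # area between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area


def p_value(t, df, tail):
    """tail is 'right', 'left', or 'two', matching the alternative hypothesis."""
    if tail == "right":
        return 1 - t_cdf(t, df)
    if tail == "left":
        return t_cdf(t, df)
    return 2 * (1 - t_cdf(abs(t), df))  # two-tailed: double the one-tail area


def inv_t(area, df):
    """t value with the given area to its left (assumes |t| < 100)."""
    lo, hi = -100.0, 100.0
    for _ in range(60):                 # bisection on the increasing CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(p_value(0.1873, 4, "two"), 4))   # two-tailed P-value, df = 4
print(round(inv_t(0.01, 8), 4))              # left-tailed critical value, df = 8
```

As on the calculator, `inv_t` returns the negative value for a left-tail area; for a two-tailed test, use ± the magnitude of `inv_t(alpha / 2, df)`.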

Example 1: Data Set 3 in Appendix B includes measured weights of college students in September and April of their freshman year. Table 9-1 lists a small portion of those sample values. (Here we use only a small portion of the available data so that we can better illustrate the method of hypothesis testing.) Use the sample data in Table 9-1 with a 0.05 significance level to test the claim that for the population of students, the mean change in weight from September to April is equal to 0 kg.

Example 1: Step 1: Requirements are satisfied: the samples are dependent, because the values are paired from each student; although this was a volunteer study, we proceed as if it were a simple random sample and deal with this in the interpretation; the calculator displays a histogram that is approximately normal.

Example 1: Weight gained = April weight – September weight. $\mu_d$ denotes the mean of the “April – September” differences in weight; the claim is $\mu_d = 0$ kg.
Step 2: $H_0: \mu_d = 0$ kg (original claim); $H_1: \mu_d \neq 0$ kg
Step 3: The significance level is $\alpha = 0.05$.
Step 4: Use the Student t distribution to calculate the test statistic.

Example 1: Step 4: find the test statistic (t = 0.1873) and the critical values.
Calculator: df = n – 1 = 4, and a total area of 0.05 in the two tails yields the two critical values t = ±2.7764.
Table A-3: df = n – 1 = 4, and a total area of 0.05 in the two tails yields the two critical values t = ±2.776.

Example 1: Step 5: find the P-value of the test statistic.
tcdf(0.1873, 9999, 4) = 0.4303
P-value = 0.4303 × 2 = 0.8606 > 0.05
Because the P-value is greater than the significance level, fail to reject the null hypothesis $H_0: \mu_d = 0$ kg.

Example 1: Step 5: (Critical Value Method) Because the test statistic does not fall in the critical region, we fail to reject the null hypothesis.

Example 1: Step 6: We conclude that there is not sufficient evidence to warrant rejection of the claim that for the population of students, the mean change in weight from September to April is equal to 0 kg. Based on the sample results listed in Table 9-1, there does not appear to be a significant weight gain from September to April.

Example 1: More Notes
The conclusion should be qualified with the limitations noted in the article about the study. The requirement of a simple random sample is not satisfied, because only Rutgers students were used. Also, the study subjects are volunteers, so there is a potential for a self-selection bias. In the article describing the study, the authors cited these limitations and stated that “Researchers should conduct additional studies to better characterize dietary or activity patterns that predict weight gain among young adults who enter college or enter the workforce during this critical period in their lives.”

Example 1: More Notes
The P-value method: Using technology, we can find the P-value of 0.8605. (Using Table A-3 with the test statistic of t = 0.1873 and 4 degrees of freedom, we can determine that the P-value is greater than 0.20.) We again fail to reject the null hypothesis, because the P-value is greater than the significance level of $\alpha = 0.05$.

How to Find Your P-Value
• $H_1$ contains <: tcdf(–9999, t test statistic, df)
• $H_1$ contains >: tcdf(t test statistic, 9999, df)
• $H_1$ contains ≠ and t is positive: double tcdf(t test statistic, 9999, df)
• $H_1$ contains ≠ and t is negative: double tcdf(–9999, t test statistic, df)

Example 2
• A sample of freshmen at Rutgers University was weighed in both August and May to determine whether the “Freshman 15” is true. The data values are paired by August weight and May weight. Test the claim (at the 0.05 significance level) that the amount of weight gained is equal to 15 pounds (6.8 kg).

Data for Example 2 (kg) August May 72 59 67 66 97 86 56 55 74 69 70 68 93 88 61 60 68 64 53 52 59 55 92 92 64 60 57 58 56 53 67 67 70 68 58 56 49 50 50 47 68 68 71 69 69 69

Example 2: Step 1: Requirement check passed. Weight gained = May weight – August weight. $\mu_d$ denotes the mean of the “May – August” differences in weight; the claim is $\mu_d = 15$ kg.
Step 2: If the original claim is not true, we have $\mu_d \neq 15$ kg. $H_0: \mu_d = 15$ kg (original claim); $H_1: \mu_d \neq 15$ kg
Step 3: The significance level is $\alpha = 0.05$.
Step 4: Find the test statistic.

Example 2: Step 5: find the P-value of the test statistic.
tcdf(–9999, –25.2426, 23) ≈ 0
P-value ≈ 0 × 2 = 0 < 0.05

Example 2: Step 5: Because the P-value is less than the significance level, reject the null hypothesis $H_0: \mu_d = 15$ kg.
Step 6: We conclude that there is sufficient evidence to reject the claim that the amount of weight gained is equal to 15 pounds.

Your Turn: The following table shows the weights of nine subjects before and after following a particular diet for two months. You wish to test the claim that the diet is effective in helping people lose weight at the 0.01 significance level.
a) State the null hypothesis and the alternative hypothesis. $H_0: \mu_d = 0$; $H_1: \mu_d < 0$

c) Find the critical value(s). invT(0.01, 8) = –2.8965
d) Find the test statistic.
e) Find the P-value. P-value = tcdf(–9999, –3.156, 8) = 0.0067. Reject $H_0$.
f) What is the conclusion? There is sufficient evidence to support the claim that the diet is effective in helping people lose weight.

Your Turn: A farmer has decided to use a new additive to grow his crops. He divided his farm into 10 plots and kept records of the corn yield (in bushels) before and after using the additive. The results are shown below. The farmer wants to test whether the new additive is effective at the 0.01 level of significance.
a) State the null hypothesis and the alternative hypothesis. $H_0: \mu_d = 0$; $H_1: \mu_d > 0$

c) Find the critical value(s). invT(1 – 0.01, 9) = 2.821
d) Find the test statistic.
e) Find the P-value. P-value = tcdf(5.0142, 9999, 9) ≈ 0.0004. Reject $H_0$.
f) What is the conclusion? There is sufficient evidence to support the claim that the new additive is effective.

Example 3
• Are best actresses younger than best actors?
• At the 0.05 significance level, test the claim that best actresses are younger than best actors. Each actress’s age at the time of winning the award is paired with the age of the best actor from the same year.

Data for Example 3
• Actress: 28, 32, 27, 26, 24, 25, 29, 41, 40, 27, 42, 33, 21, 35
• Actor: 62, 41, 52, 41, 34, 40, 56, 41, 39, 48, 56, 42, 62, 29
• Assume that the pairs of ages have differences that are from a population having a distribution that is approximately normal.

Example 3: Step 1: Requirements are satisfied. Age difference = actor age – actress age. $\mu_d$ denotes the mean of the age differences between actors and actresses; the claim is $\mu_d > 0$ yr.
Step 2: $H_0: \mu_d = 0$ yr; $H_1: \mu_d > 0$ yr (original claim)
Step 3: The significance level is $\alpha = 0.05$.
Step 4: Find the test statistic.


Example 3: Step 5: find the P-value of the test statistic.
tcdf(4.7121, 9999, 14) = 0.00017
P-value = 0.00017 < 0.05
Because the P-value is less than the significance level, reject the null hypothesis $H_0: \mu_d = 0$ yr.
Step 6: We conclude that there is sufficient evidence to support the claim that best actresses are younger than best actors at the time of winning the award.

Confidence Intervals for Matched Pairs

Example 4: Using the Example 1 data, construct a 95% confidence interval estimate of $\mu_d$, which is the mean of the “April – September” weight differences of college students in their freshman year.
$\bar{d} = 0.2$, $s_d = 2.4$, $n = 5$, $t_{\alpha/2} = 2.7764$
Find the margin of error, E.

Confidence Intervals for Matched Pairs
Construct the confidence interval: We have 95% confidence that the limits of –2.8 kg and 3.2 kg contain the true value of the mean weight change from September to April. In the long run, 95% of such samples will lead to confidence interval limits that actually do contain the true population mean of the differences.
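The margin of error and interval limits for Example 4 can be checked with a few lines of Python. This is a minimal sketch: the function name `paired_confidence_interval` is hypothetical, and the critical value is passed in as a parameter (taken from the calculator or Table A-3) rather than computed.

```python
import math


def paired_confidence_interval(d_bar, s_d, n, t_crit):
    """Return (E, lower, upper) for the mean difference mu_d."""
    margin = t_crit * s_d / math.sqrt(n)   # E = t_{alpha/2} * s_d / sqrt(n)
    return margin, d_bar - margin, d_bar + margin


# Summary statistics and critical value as stated in Example 4.
E, lower, upper = paired_confidence_interval(d_bar=0.2, s_d=2.4, n=5, t_crit=2.7764)
print(f"E = {E:.2f} kg, interval = ({lower:.1f}, {upper:.1f}) kg")

# The interval contains 0, which agrees with failing to reject H0: mu_d = 0.
contains_zero = lower < 0 < upper
print(contains_zero)
```

Checking whether the interval contains 0 is the confidence-interval counterpart of the two-tailed test at the same significance level, so both routes lead to the same decision for this data.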

Example 5: Using the paired data from the diet “Your Turn” problem (nine subjects), construct a 98% confidence interval estimate of $\mu_d$, which is the mean of the weight differences for a particular diet. Remember that your critical value must be POSITIVE!
$t_{\alpha/2}$ = invT(1 – 0.01, 8) = 2.8965
(–9.8889 – 9.0756, –9.8889 + 9.0756) = (–18.9645, –0.8133)

Your Turn
Is there more tobacco use than alcohol use in Disney movies?
• Below are paired data giving the number of seconds in which tobacco or alcohol is used in selected Disney films (both values in a pair come from the same movie). At the 0.05 significance level, test the claim that there is more tobacco use than alcohol use.
Tobacco (sec): 176 51 0 299 74 2 23 205 6 155
Alcohol (sec): 88 33 113 51 0 3 46 73 5 74
• Assume the requirements are all satisfied.

Your Turn
Step 1: Time difference = tobacco time – alcohol time. $\mu_d$ denotes the mean of the time differences between tobacco and alcohol, in seconds.
Step 2: $H_0: \mu_d = 0$ sec; $H_1: \mu_d > 0$ sec (original claim)
Step 3: The significance level is $\alpha = 0.05$.
Step 4: Find the test statistic.


Your Turn
Step 5: find the P-value of the test statistic.
tcdf(1.6258, 9999, 9) = 0.0692
P-value = 0.0692 > 0.05
Because the P-value is greater than the significance level, fail to reject the null hypothesis $H_0: \mu_d = 0$ sec.
Step 6: We conclude that there is not sufficient evidence to support the claim that there is more tobacco use than alcohol use in Disney movies.
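The test statistic for the Disney data can be reproduced from the raw pairs, and the decision can be cross-checked with the critical value method. This is a sketch only: the variable names are hypothetical, and the critical value 1.833 (right-tail area 0.05 with 9 degrees of freedom) is an assumption taken from a standard t table rather than from the slides.

```python
import math
import statistics

# Seconds of tobacco and alcohol use, paired by movie (data from the slide).
tobacco = [176, 51, 0, 299, 74, 2, 23, 205, 6, 155]
alcohol = [88, 33, 113, 51, 0, 3, 46, 73, 5, 74]

d = [tob - alc for tob, alc in zip(tobacco, alcohol)]   # tobacco - alcohol
n = len(d)                                              # number of pairs
d_bar = statistics.mean(d)                              # mean difference
s_d = statistics.stdev(d)                               # sample std. dev. of differences
t_stat = (d_bar - 0) / (s_d / math.sqrt(n))             # H0: mu_d = 0

T_CRITICAL = 1.833   # assumed table value: right-tail area 0.05, df = 9
reject_h0 = t_stat > T_CRITICAL                         # right-tailed test

print(f"d_bar = {d_bar}, s_d = {s_d:.4f}, t = {t_stat:.4f}, df = {n - 1}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

Because the statistic falls short of the critical value, the critical value method agrees with the P-value method: fail to reject the null hypothesis.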

Recap
In this section we have discussed:
• Requirements for inferences from matched pairs.
• Notation.
• Hypothesis test.
• Confidence intervals.

Homework
• P. 495: #14, 15, 18a