Parametric Statistical Inference, Instructor: Ron S. Kenett

  • Slides: 37

10/16/2021
Parametric Statistical Inference
Instructor: Ron S. Kenett
Email: ron@kpa.co.il
Course Website: www.kpa.co.il/biostat
Course textbook: Modern Industrial Statistics, Kenett and Zacks, Duxbury Press, 1998
(c) 2001, Ron S. Kenett, Ph.D.

Course Syllabus
• Understanding Variability
• Variability in Several Dimensions
• Basic Models of Probability
• Sampling for Estimation of Population Quantities
• Parametric Statistical Inference
• Computer Intensive Techniques
• Multiple Linear Regression
• Statistical Process Control
• Design of Experiments

Definitions
• Null Hypothesis
  • H0: Put here what is typical of the population, a term that characterizes “business as usual,” where nothing out of the ordinary occurs.
• Alternative Hypothesis
  • H1: Put here what is the challenge, the view of some characteristic of the population that, if it were true, would trigger some new action, some change in procedures that had previously defined “business as usual.”

The Logic of Hypothesis Testing
• Step 1. A claim is made.
  • A new claim is asserted that challenges existing thoughts about a population characteristic.
  • Suggestion: Form the alternative hypothesis first, since it embodies the challenge.

The Logic of Hypothesis Testing
• Step 2. How much error are you willing to accept?
  • Select the maximum acceptable error, α. The decision maker must elect how much error he/she is willing to accept in making an inference about the population. The significance level of the test is the maximum probability that the null hypothesis will be rejected incorrectly, a Type I error.

The Logic of Hypothesis Testing
• Step 3. If the null hypothesis were true, what would you expect to see?
  • Assume the null hypothesis is true. This is a very powerful statement. The test is always referenced to the null hypothesis.
  • Form the rejection region, the areas in which the decision maker is willing to reject the presumption of the null hypothesis.

The Logic of Hypothesis Testing
• Step 4. What did you actually see?
  • Compute the sample statistic. The sample provides a set of data that serves as a window to the population. The decision maker computes the sample statistic and calculates how far the sample statistic differs from the presumed distribution that is established by the null hypothesis.

The Logic of Hypothesis Testing
• Step 5. Make the decision.
  • The decision is a conclusion supported by evidence. The decision maker will:
    • reject the null hypothesis if the sample evidence is so strong, the sample statistic so unlikely, that the decision maker is convinced H1 must be true.
    • fail to reject the null hypothesis if the sample statistic falls in the nonrejection region. In this case, the decision maker is not concluding the null hypothesis is true, only that there is insufficient evidence to dispute it based on this sample.

The Logic of Hypothesis Testing
• Step 6. What are the implications of the decision for future actions?
  • State what the decision means in terms of the research program. The decision maker must draw out the implications of the decision. Is there some action triggered, some change implied? What recommendations might be extended for future attempts to test similar hypotheses?
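The six steps can be sketched in code as follows. This is only an illustration, assuming a two-tailed z-test with known σ; the function name `six_step_z_test`, the action labels, and all numbers are hypothetical and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def six_step_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed z-test laid out in the order of the six steps (illustrative)."""
    # Step 1: the claim. H1: mu != mu0, so H0: mu = mu0.
    # Step 2: alpha is the maximum accepted probability of a Type I error.
    # Step 3: if H0 is true, z is standard normal; reject when |z| > z_crit.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 4: what did we actually see? Compute the sample statistic.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Step 5: make the decision.
    reject = abs(z) > z_crit
    # Step 6: state the implication (hypothetical wording).
    action = "change procedure" if reject else "continue business as usual"
    return {"z": z, "z_crit": z_crit, "reject_H0": reject, "action": action}


# Hypothetical data: n = 25, sample mean 10.3, H0: mu = 10.0, sigma = 0.5.
result = six_step_z_test(sample_mean=10.3, mu0=10.0, sigma=0.5, n=25)
print(result)
```

Here z is about 3.0, which exceeds the critical value of about 1.96, so H0 is rejected at α = 0.05.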

Two Types of Errors
• Type I Error:
  • Saying you reject H0 when it really is true.
  • Rejecting a true H0.
• Type II Error:
  • Saying you do not reject H0 when it really is false.
  • Failing to reject a false H0.

What are acceptable error levels?
• Decision makers frequently use a 5% significance level.
  • Use α = 0.05.
  • An α-error means that we will decide to adjust the machine when it does not need adjustment.
  • This means, in the case of the robot welder, if the machine is running properly, there is only a 0.05 probability of our making the mistake of concluding that the robot requires adjustment when it really does not.
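A small simulation can make the meaning of α = 0.05 concrete. The sketch below assumes a process that really is on target (H0 true) and counts how often a two-tailed z-test would still signal an adjustment. The process mean, σ, sample size, number of trials, and seed are all hypothetical values chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulate_type_i_rate(mu0=50.0, sigma=2.0, n=25, alpha=0.05,
                         trials=5000, seed=1):
    """Fraction of samples drawn under a true H0 that are wrongly rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    false_alarms = 0
    for _ in range(trials):
        # The machine is running properly: every draw has mean mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            false_alarms += 1  # Type I error: needless adjustment
    return false_alarms / trials


rate = simulate_type_i_rate()
print(f"observed Type I error rate: {rate:.3f}")
```

The observed rate should land close to 0.05, with some run-to-run variation that depends on the number of trials.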

Three Types of Tests
• Nondirectional, two-tail test:
  • H1: pop parameter ≠ value
• Directional, right-tail test:
  • H1: pop parameter > value
• Directional, left-tail test:
  • H1: pop parameter < value
Always put hypotheses in terms of population parameters and have H0: pop parameter = value.

Two-tailed test
H0: pop parameter = value
H1: pop parameter ≠ value

Right-tailed test
H0: pop parameter = value
H1: pop parameter > value

Left-tailed test
H0: pop parameter = value
H1: pop parameter < value
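As an illustration of how the three alternatives translate into rejection regions, the sketch below assumes the test statistic is a standard normal z. The function name `rejection_region` and the tail labels are hypothetical choices for this example.

```python
from statistics import NormalDist


def rejection_region(alpha, tail):
    """Describe the z rejection region for a given alpha and tail type."""
    std = NormalDist()
    if tail == "two":
        # Split alpha equally between both tails.
        c = std.inv_cdf(1 - alpha / 2)
        return f"z < {-c:.3f} or z > {c:.3f}"
    if tail == "right":
        # All of alpha sits in the upper tail.
        return f"z > {std.inv_cdf(1 - alpha):.3f}"
    if tail == "left":
        # All of alpha sits in the lower tail.
        return f"z < {std.inv_cdf(alpha):.3f}"
    raise ValueError("tail must be 'two', 'right', or 'left'")


for tail in ("two", "right", "left"):
    print(tail, "->", rejection_region(0.05, tail))
```

At α = 0.05 the two-tailed cutoffs are ±1.960, while each one-tailed test uses 1.645 on the relevant side.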

Decision versus true state
                 H0 is true      H1 is true
Decision: H0     OK              Type II Error
Decision: H1     Type I Error    OK

What Test to Apply?
Ask the following questions:
• Are the data the result of a measurement (a continuous variable) or a count (a discrete variable)?
• Is σ known?
• What shape is the distribution of the population?
• What is the sample size?

Test of µ, σ Known, Population Normally Distributed
• Test Statistic: $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
• where
  • $\bar{x}$ is the sample statistic.
  • $\mu_0$ is the value identified in the null hypothesis.
  • $\sigma$ is known.
  • $n$ is the sample size.

Test of µ, σ Known, Population Not Normally Distributed
• If n > 30, Test Statistic: $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
• If n < 30, use a distribution-free test.

Test of µ, σ Unknown, Population Normally Distributed
• Test Statistic: $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
• where
  • $\bar{x}$ is the sample statistic.
  • $\mu_0$ is the value identified in the null hypothesis.
  • $\sigma$ is unknown and is estimated by the sample standard deviation $s$.
  • $n$ is the sample size.
  • degrees of freedom on t are n – 1.
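A minimal sketch of the one-sample t statistic, using only the Python standard library. The sample values are hypothetical, and because the standard library has no t distribution, the critical value 2.262 (two-tailed, α = 0.05, df = 9) is taken from a standard t table rather than computed.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return (t, degrees of freedom) for H0: mu = mu0."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation, n - 1 in the denominator
    t = (x_bar - mu0) / (s / sqrt(n))
    return t, n - 1


# Hypothetical measurements, n = 10.
sample = [9.8, 10.2, 10.4, 10.1, 9.9, 10.3, 10.5, 10.0, 10.2, 10.6]
t, df = one_sample_t(sample, mu0=10.0)

T_CRIT = 2.262  # t table value: two-tailed, alpha = 0.05, df = 9
reject = abs(t) > T_CRIT
print(f"t = {t:.3f}, df = {df}, reject H0: {reject}")
```

With these numbers t is about 2.449, which exceeds 2.262, so H0 would be rejected in a two-tailed test at α = 0.05.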

Test of µ, σ Unknown, Population Not Normally Distributed
• If n > 30, Test Statistic: $z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$, with $s$ as the estimate for $\sigma$
• If n < 30, use a distribution-free test.

Test of p, Sample Sufficiently Large
• If both n p > 5 and n(1 – p) > 5, Test Statistic: $z = \frac{p - p_0}{\sqrt{p_0 (1 - p_0) / n}}$
• where
  • $p$ = sample proportion
  • $p_0$ is the value identified in the null hypothesis.
  • $n$ is the sample size.
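The large-sample test of a proportion might be sketched as below. One assumption to flag: the slide does not say which proportion enters the size check, so this sketch checks n·p0 and n·(1 – p0) using the hypothesized value. The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0):
    """Two-tailed z-test of H0: p = p0 using the normal approximation."""
    # Assumption: the sample-size rule is checked with the hypothesized p0.
    if n * p0 <= 5 or n * (1 - p0) <= 5:
        raise ValueError("sample too small for the normal approximation")
    p = successes / n  # sample proportion
    z = (p - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 60 successes in 100 trials, H0: p = 0.5.
z, p_value = proportion_z_test(successes=60, n=100, p0=0.5)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

Here z is about 2.0 and the two-tailed p-value is about 0.0455, so H0 would be rejected at α = 0.05.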

Test of p, Sample Not Sufficiently Large
• If either n p < 5 or n(1 – p) < 5, convert the proportion to the underlying binomial distribution.
• Note there is no t-test on a population proportion.
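One way to read "convert the proportion to the underlying binomial distribution" is to compute an exact binomial tail probability. The sketch below assumes a right-tailed test (H1: p > p0); the function name and the numbers (3 defectives in 20 items, p0 = 0.05) are hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, n, p0):
    """Exact P(X >= successes) for X ~ Binomial(n, p0)."""
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(successes, n + 1)
    )


# n * p0 = 1 < 5, so the normal approximation is not appropriate here.
p_value = binomial_upper_tail(successes=3, n=20, p0=0.05)
print(f"exact right-tail p-value = {p_value:.4f}")
```

The exact p-value is about 0.0755, so at α = 0.05 this sample would not be enough to reject H0.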

Observed Significance Levels
A p-Value is:
• the exact level of significance of the test statistic.
• the smallest value α can be and still allow us to reject the null hypothesis.
• the amount of area left in the tail beyond the test statistic for a one-tailed hypothesis test or twice the amount of area left in the tail beyond the test statistic for a two-tailed test.
• the probability, assuming the null hypothesis is true, of getting a test statistic from another sample that is at least as far from the hypothesized mean as this sample statistic is.
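The tail-area definition can be sketched for a z statistic as follows. The function name `z_p_value` and the tail labels are hypothetical; the standard normal distribution is assumed as the reference distribution.

```python
from statistics import NormalDist


def z_p_value(z, tail):
    """Tail area beyond z under the standard normal distribution."""
    std = NormalDist()
    if tail == "right":
        return 1 - std.cdf(z)  # area above z
    if tail == "left":
        return std.cdf(z)  # area below z
    if tail == "two":
        return 2 * (1 - std.cdf(abs(z)))  # twice the one-tail area
    raise ValueError("tail must be 'two', 'right', or 'left'")


print(round(z_p_value(1.96, "two"), 4))
print(round(z_p_value(1.645, "right"), 4))
print(round(z_p_value(-1.645, "left"), 4))
```

Each of these three calls gives a p-value of roughly 0.05, matching the familiar critical values for α = 0.05.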

Several Samples
• Independent Samples:
  • Testing a company’s claim that its peanut butter contains less fat than that produced by a competitor.
• Dependent Samples:
  • Testing the relative fuel efficiency of 10 trucks that run the same route twice, once with the current air filter installed and once with the new filter.

Test of (µ1 – µ2), σ1 = σ2, Populations Normal
• Test Statistic: $t = \frac{[\bar{x}_1 - \bar{x}_2] - [\mu_1 - \mu_2]}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$
• where $s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$
• where degrees of freedom on t = $n_1 + n_2 - 2$
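The pooled-variance statistic can be sketched as below, assuming two independent samples. The fat-content readings are hypothetical, loosely echoing the peanut butter example, and the left-tail critical value of -1.860 (α = 0.05, df = 8) is taken from a standard t table.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample1, sample2, hypothesized_diff=0.0):
    """Return (t, df, pooled variance) for the equal-variance two-sample test."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (
        n1 + n2 - 2
    )
    diff = mean(sample1) - mean(sample2)
    t = (diff - hypothesized_diff) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2, sp2


# Hypothetical fat content (grams per serving) for two brands.
brand_a = [15.0, 16.0, 14.0, 15.0, 15.0]
brand_b = [17.0, 18.0, 16.0, 17.0, 17.0]
t, df, sp2 = pooled_t(brand_a, brand_b)

T_CRIT_LEFT = -1.860  # t table value: left tail, alpha = 0.05, df = 8
print(f"t = {t:.3f}, df = {df}, reject H0: {t < T_CRIT_LEFT}")
```

With these made-up readings t is about -4.47 on 8 degrees of freedom, well inside the left-tail rejection region.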

Example: Comparing Two Populations
H0: pop 1 = pop 2
H1: pop 1 ≠ pop 2
Hypothesis: The mean of population 1 is equal to the mean of population 2
Assumption: (1) Both distributions are normal (2) σ1 = σ2
Test Statistic: t distribution with df = n1 + n2 – 2

Example: Comparing Two Populations
Rejection Region: t distribution with df = n1 + n2 – 2

Test of (µ1 – µ2), σ1 ≠ σ2, Populations Normal, large n
• Test Statistic: $z = \frac{[\bar{x}_1 - \bar{x}_2] - [\mu_1 - \mu_2]_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
• with $s_1^2$ and $s_2^2$ as estimates for $\sigma_1^2$ and $\sigma_2^2$

Test of Dependent Samples (µ1 – µ2) = µd
• Test Statistic: $t = \frac{\bar{d}}{s_d / \sqrt{n}}$
• where
  • $d = x_1 - x_2$ is the difference within each pair, and $\bar{d} = \sum d / n$ is the average difference
  • $n$ = the number of pairs of observations
  • $s_d$ = the standard deviation of $d$
  • df = n – 1
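A sketch of the paired statistic, echoing the 10-truck air filter example. The fuel-efficiency figures are hypothetical, and the right-tail critical value 1.833 (α = 0.05, df = 9) is taken from a standard t table.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x1, x2):
    """Return (t, df) for H0: mean difference = 0, with d = x1 - x2."""
    d = [a - b for a, b in zip(x1, x2)]
    n = len(d)  # number of pairs
    t = mean(d) / (stdev(d) / sqrt(n))
    return t, n - 1


# Hypothetical miles per gallon for the same 10 trucks on the same route.
new_filter = [6.5, 6.5, 6.2, 6.7, 6.1, 6.5, 6.7, 6.0, 6.5, 6.4]
current_filter = [6.0, 6.2, 5.8, 6.1, 5.9, 6.0, 6.3, 5.7, 6.1, 6.0]
t, df = paired_t(new_filter, current_filter)

T_CRIT_RIGHT = 1.833  # t table value: right tail, alpha = 0.05, df = 9
print(f"t = {t:.2f}, df = {df}, reject H0: {t > T_CRIT_RIGHT}")
```

Pairing removes the truck-to-truck variation, which is why a small average gain of 0.4 produces a large t of about 10.95 here.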

Test of (p1 – p2), where n1 p1 > 5, n1(1 – p1) > 5, n2 p2 > 5, and n2(1 – p2) > 5
• Test Statistic: $z = \frac{p_1 - p_2}{\sqrt{\bar{p} (1 - \bar{p}) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$
• where
  • $p_1$ = observed proportion, sample 1
  • $p_2$ = observed proportion, sample 2
  • $n_1$ = sample size, sample 1
  • $n_2$ = sample size, sample 2
  • $\bar{p} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$
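The two-proportion statistic with a pooled proportion might be computed as in this sketch. The function name and the counts are hypothetical, and the standard error shown is the usual pooled form under H0: p1 = p2.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Return (z, pooled proportion) for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2  # observed sample proportions
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)  # pooled proportion
    z = (p1 - p2) / sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return z, p_bar


# Hypothetical counts: 60 of 200 in sample 1, 40 of 200 in sample 2.
z, p_bar = two_proportion_z(x1=60, n1=200, x2=40, n2=200)
print(f"pooled p = {p_bar:.2f}, z = {z:.3f}")
```

Here the pooled proportion is 0.25 and z is about 2.309, which exceeds 1.96 and would lead to rejecting H0 in a two-tailed test at α = 0.05.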

Test of Equal Variances
• Pooled-variances t-test assumes the two population variances are equal.
• The F-test can be used to test that assumption.
• The F-distribution is the sampling distribution of $s_1^2 / s_2^2$ that would result if two samples were repeatedly drawn from a single normally distributed population.

Test of σ1² = σ2²
• If $\sigma_1^2 = \sigma_2^2$, then $\sigma_1^2 / \sigma_2^2 = 1$. So the hypotheses can be worded either way.
• Test Statistic: $F = \frac{s_1^2}{s_2^2}$ or $\frac{s_2^2}{s_1^2}$, whichever is larger
• The critical value of the F will be $F(\alpha/2, \nu_1, \nu_2)$
• where
  • $\alpha$ = the specified level of significance
  • $\nu_1 = (n - 1)$, where $n$ is the size of the sample with the larger variance
  • $\nu_2 = (n - 1)$, where $n$ is the size of the sample with the smaller variance
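The "whichever is larger" rule and the matching degrees of freedom can be sketched as follows. The samples are hypothetical, and the critical value F(α/2, ν1, ν2) is left to an F table because the Python standard library has no F distribution.

```python
from statistics import variance


def f_statistic(sample1, sample2):
    """Return (F, nu1, nu2) with the larger sample variance in the numerator."""
    v1, v2 = variance(sample1), variance(sample2)
    n1, n2 = len(sample1), len(sample2)
    if v1 >= v2:
        return v1 / v2, n1 - 1, n2 - 1
    # Swap so nu1 always belongs to the sample with the larger variance.
    return v2 / v1, n2 - 1, n1 - 1


# Hypothetical samples; the second one has the larger variance.
a = [13.0, 14.0, 15.0, 14.0]
b = [10.0, 12.0, 14.0, 16.0, 18.0]
F, nu1, nu2 = f_statistic(a, b)
print(f"F = {F:.1f} with nu1 = {nu1}, nu2 = {nu2}")
```

Because the larger variance always goes on top, F is never below 1, which is why only the upper critical value F(α/2, ν1, ν2) needs to be looked up.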

Confidence Interval for (µ1 – µ2)
• The 100(1 – α)% confidence interval for the difference in two means:
  • Equal variances, populations normal: $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$
  • Unequal variances, large samples: $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
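A sketch of the large-sample (unequal variances) interval, using the standard normal quantile from the standard library. The function name and the summary statistics are hypothetical. The equal-variance version would follow the same pattern with a t table value and the pooled variance.

```python
from math import sqrt
from statistics import NormalDist


def mean_diff_ci_large(x_bar1, s1, n1, x_bar2, s2, n2, alpha=0.05):
    """Large-sample confidence interval for mu1 - mu2 (unequal variances)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    half_width = z * sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x_bar1 - x_bar2
    return diff - half_width, diff + half_width


# Hypothetical summary statistics for two samples of size 100.
low, high = mean_diff_ci_large(x_bar1=52.0, s1=6.0, n1=100,
                               x_bar2=50.0, s2=8.0, n2=100)
print(f"95% CI for mu1 - mu2: ({low:.3f}, {high:.3f})")
```

The interval here is roughly (0.040, 3.960). Since it excludes 0, it agrees with a two-tailed test at α = 0.05 that would reject equal means.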

Confidence Interval for (p1 – p2)
• The 100(1 – α)% confidence interval for the difference in two proportions: $(p_1 - p_2) \pm z_{\alpha/2} \sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}}$
• when sample sizes are sufficiently large.

Summary
Hypothesis: The mean of population 1 is equal to the mean of population 2
  Assumption: (1) Both distributions are normal (2) σ1 = σ2
  Test Statistic: t distribution with df = n1 + n2 – 2
Hypothesis: The standard deviation of population 1 is equal to the standard deviation of population 2
  Assumption: Both distributions are normal
  Test Statistic: F distribution with df1 = n1 – 1 and df2 = n2 – 1
Hypothesis: The proportion of errors in population 1 is equal to the proportion of errors in population 2
  Assumption: n1 p1 and n2 p2 > 5 (approximation by normal distribution)
  Test Statistic: Z, normal distribution