
Reminder: What is a sampling distribution? • The sampling distribution of a statistic is the distribution of all possible values of the statistic when all possible samples of a fixed size n are taken from the population. It is a theoretical idea — we do not actually build it. • The sampling distribution of a statistic is the probability distribution of that statistic.

Sampling distribution of x bar • We take many random samples of a given size n from a population with mean μ and standard deviation σ. • Some sample means will be above the population mean μ and some will be below; together they make up the sampling distribution of x bar. [Figure: histogram of some sample averages]

For any population with mean μ and standard deviation σ: • The mean, or center, of the sampling distribution of x bar is equal to the population mean μ. • The standard deviation of the sampling distribution of x bar is σ/√n, where n is the sample size. [Figure: sampling distribution of x bar, centered at μ with standard deviation σ/√n]
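
The two facts above can be checked by simulation. The sketch below is only an illustration: the Uniform(0, 1) population, the sample size n = 16, the number of samples, and the helper name `sample_means` are all assumptions chosen for the example, not part of the slides.

```python
import random
import statistics

def sample_means(n, num_samples, seed=0):
    """Draw num_samples samples of size n from Uniform(0, 1); return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.random() for _ in range(n))
        for _ in range(num_samples)
    ]

# Assumed population: Uniform(0, 1), so mu = 0.5 and sigma = sqrt(1/12).
mu = 0.5
sigma = (1 / 12) ** 0.5
n = 16

means = sample_means(n, num_samples=20000)
center = statistics.fmean(means)      # should be close to mu
spread = statistics.stdev(means)      # should be close to sigma / sqrt(n)
predicted_spread = sigma / n ** 0.5

print(f"mean of x bar: {center:.4f} (population mean {mu})")
print(f"sd of x bar:   {spread:.4f} (sigma/sqrt(n) = {predicted_spread:.4f})")
```

The simulated center and spread only approximate the theoretical values, because a finite number of samples is drawn rather than all possible samples.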

For normally distributed populations When a random variable is normally distributed, the sampling distribution of x bar for all possible samples of size n is also normally distributed. If the population is N(μ, σ), then the sample mean has a N(μ, σ/√n) distribution. [Figure: population distribution and sampling distribution of x bar]

• The distribution of Xbar tends to be normal, even if the population is not. If the size n of the SRS is large enough and the SRS is taken from any population with mean μ and standard deviation σ, then: – Xbar is approximately N(μ, σ/√n). • This fact is called the Central Limit Theorem.
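
One way to see the Central Limit Theorem at work is to sample from a clearly non-normal population and check whether the sample means follow the familiar normal proportions (about 68% within one standard deviation, about 95% within two). This is a minimal sketch: the exponential population, n = 50, and the number of samples are assumptions made for the illustration.

```python
import random
import statistics

rng = random.Random(1)

# Assumed population: Exponential with rate 1 (right skewed), mean 1 and sd 1.
mu, sigma = 1.0, 1.0
n = 50
num_samples = 10000
se = sigma / n ** 0.5   # standard deviation of X-bar

means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

# Proportion of sample means within 1 and 2 standard deviations of mu.
within_1 = sum(abs(m - mu) <= se for m in means) / num_samples
within_2 = sum(abs(m - mu) <= 2 * se for m in means) / num_samples

print(f"within 1 sd: {within_1:.3f} (normal: about 0.683)")
print(f"within 2 sd: {within_2:.3f} (normal: about 0.954)")
```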

[Figure: population distribution; distribution of X-bar for n = 10; distribution of X-bar for n = 25]

Application • Hypokalemia is diagnosed when blood potassium levels are low, below 3.5 mEq/dl. Let’s assume that we know a patient whose measured potassium levels vary daily according to a normal distribution N(μ = 3.8, σ = 0.2). • If only one measurement is made, what is the probability that this patient will be misdiagnosed as hypokalemic? z = (3.5 − 3.8)/0.2 = −1.5, P(z < −1.5) = 0.0668 ≈ 7%. • If instead measurements are taken on 4 separate days and averaged, what is the probability of such a misdiagnosis? The standard deviation of X-bar is 0.2/√4 = 0.1, so z = (3.5 − 3.8)/0.1 = −3, P(z < −3) = 0.0013 ≈ 0.1%. • Note: Be sure to standardize (z) using the standard deviation of the variable being standardized (X in the first case, X-bar in the second case)!
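
The two probabilities in this application can be reproduced with Python's standard library. This is a short sketch using `statistics.NormalDist`; the variable names are chosen for the example.

```python
from statistics import NormalDist

mu, sigma = 3.8, 0.2   # patient's daily potassium distribution
cutoff = 3.5           # hypokalemia threshold
n = 4                  # number of days averaged in the second case

# One measurement: X ~ N(mu, sigma)
p_single = NormalDist(mu, sigma).cdf(cutoff)

# Average of n measurements: X-bar ~ N(mu, sigma / sqrt(n))
p_mean = NormalDist(mu, sigma / n ** 0.5).cdf(cutoff)

print(f"P(misdiagnosis, 1 measurement):  {p_single:.4f}")
print(f"P(misdiagnosis, mean of {n} days): {p_mean:.4f}")
```

The only change between the two cases is the standard deviation passed to `NormalDist`, which mirrors the point about standardizing with the standard deviation of the variable in question.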

Income distribution Let’s consider the very large database of individual incomes from the Bureau of Labor Statistics as our population. It is strongly right skewed. – We take 1000 SRSs of 100 incomes, calculate the sample mean for each, and make a histogram of these 1000 means. – We also take 1000 SRSs of 25 incomes, calculate the sample mean for each, and make a histogram of these 1000 means. Which histogram corresponds to the samples of size 100, and which to the samples of size 25?
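
The experiment can be imitated in code. The sketch below does not use the Bureau of Labor Statistics data: it substitutes a lognormal distribution as a stand-in for a right-skewed income population, and the lognormal parameters and the helper name `srs_means` are hypothetical choices for the illustration. Since the standard deviation of the sample mean is σ/√n, the means from samples of size 100 should be about half as spread out as those from samples of size 25.

```python
import random
import statistics

rng = random.Random(2)

def srs_means(n, num_samples=1000):
    """Means of num_samples samples of size n from an assumed lognormal 'income' population."""
    return [
        statistics.fmean(rng.lognormvariate(10.5, 0.8) for _ in range(n))
        for _ in range(num_samples)
    ]

means_25 = srs_means(25)
means_100 = srs_means(100)

spread_25 = statistics.stdev(means_25)
spread_100 = statistics.stdev(means_100)
ratio = spread_25 / spread_100   # theory suggests about sqrt(100/25) = 2

print(f"sd of means, n = 25:  {spread_25:.0f}")
print(f"sd of means, n = 100: {spread_100:.0f}")
print(f"ratio: {ratio:.2f}")
```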

How large a sample size is required to achieve normality of X-bar? • It depends on the population distribution. More observations are required if the population distribution is far from normal. – A sample size of 25 is generally enough to obtain an approximately normal sampling distribution for X-bar when the population has strong skewness or even mild outliers. – A sample size of 40 will typically be enough to overcome extreme skewness and outliers and make Xbar look normal. • In many cases, n = 25 isn’t a huge sample. Thus, even for strange population distributions we can often assume an approximately normal sampling distribution for the sample mean and work with it to solve problems.
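
A rough way to watch the skewness fade as n grows is to compute the skewness of simulated sample means for several sample sizes. In this sketch the exponential population, the number of samples, and the helper `skewness` are assumptions for the illustration; a normal distribution has skewness 0, so values closer to 0 indicate a more symmetric sampling distribution.

```python
import random
import statistics

def skewness(values):
    """Population-style skewness: mean of cubed standardized values."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

rng = random.Random(3)
num_samples = 5000
skew_by_n = {}

# Assumed population: Exponential with rate 1, which is strongly right skewed.
for n in (1, 25, 40):
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]
    skew_by_n[n] = skewness(means)

for n, skew in skew_by_n.items():
    print(f"n = {n:>2}: skewness of X-bar is about {skew:.2f}")
```

How much residual skewness is acceptable depends on the problem, so the sample sizes of 25 and 40 are best read as rules of thumb rather than guarantees.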

• HW: Read section 5.2 thru p. 242; don’t worry too much about how the book derives the formulas... instead make sure you know the Central Limit Theorem and what’s found in the boxes on p. 337, 338, and 339 and in the Summary on page 346. • Do problems # 5.36-5.42, 5.44, 5.47, 5.48, 5.51, 5.53, 5.55, 5.66, 5.70, 5.73