Chapter 2 Describing the Location in a Distribution
Measures of Relative Standing
Used when
1) we want to know an individual’s location within the distribution, or
2) we want to compare the relative standing of individuals in different distributions.
Two measurements are used:
1) percentiles
2) z-scores
Remember, a percentile is the value such that p percent of the observations fall at or below that value. So in this list of data,
3 6 15 19 27 33 35 38
The 75th percentile is at what value? 34
The value 15 is at what percentile? About the 37th (3 of the 8 values, or 37.5%, fall at or below 15).
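The percentile questions above can be checked with a short Python sketch. The helper name `percentile_rank` is our own (hypothetical), and it uses the "at or below" definition given on this slide; other textbooks and software use slightly different conventions.

```python
data = [3, 6, 15, 19, 27, 33, 35, 38]

def percentile_rank(values, x):
    """Percent of observations that fall at or below x."""
    at_or_below = sum(1 for v in values if v <= x)
    return 100 * at_or_below / len(values)

# 3 of the 8 values are at or below 15
rank_of_15 = percentile_rank(data, 15)

# 6 of the 8 values are at or below 34, so 34 sits at the 75th percentile
rank_of_34 = percentile_rank(data, 34)

print(rank_of_15, rank_of_34)
```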
Usually we use a z-score to show relative standing within a distribution. A z-score is a standardized measurement. To standardize is to convert a value from its original units to standard deviation units. The basic z-score calculation is
z = (x – mean) ∕ standard deviation
The z-score tells us how many standard deviations our value lies from the mean, and in which direction. The set of standardized values (z-scores) for a distribution has a mean of zero and a standard deviation of one.
Looking at our prior list, 3 6 15 19 27 33 35 38, find the z-score for the value 19.
mean = 22, std dev = 13.3417, z = (19 – 22) ∕ 13.3417 = -0.2249
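As a rough check of this calculation, here is a minimal Python sketch using only the standard library. It assumes the sample standard deviation (dividing by n – 1), which is what reproduces the 13.3417 on the slide. It also confirms that the full set of z-scores has mean 0 and standard deviation 1.

```python
import statistics

data = [3, 6, 15, 19, 27, 33, 35, 38]

mean = statistics.mean(data)    # 22
sd = statistics.stdev(data)     # sample standard deviation (n - 1 in the denominator)

# z-score for the single value 19
z_19 = (19 - mean) / sd

# standardize every value in the list
z_scores = [(x - mean) / sd for x in data]

print(round(sd, 4), round(z_19, 4))
print(statistics.mean(z_scores), statistics.stdev(z_scores))
```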
Another z-score example
If Matt’s (age 16) blood glucose level was tested to be 107 while Ann’s (age 36) was 122, which one is the more typical reading? We need a little more information to answer this question.
Age      Mean Glucose Level   Standard Deviation
10-20    99                   4.5
20-30    102                  6.2
30-40    109                  8.7
Calculate Matt’s and Ann’s z-scores and determine which is “better”.
For Matt, z = (107 – 99) ∕ 4.5 = 1.7778
For Ann, z = (122 – 109) ∕ 8.7 = 1.4943
Which is “better”?
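One way to work this comparison is sketched below. The function name `z_score` is our own, and the code reads “more typical” as “closer to the mean of one’s own age group”, that is, the smaller absolute z-score. That reading is an assumption about what the question intends.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd

# Matt is 16, so he is compared with the 10-20 age group
z_matt = z_score(107, mean=99, sd=4.5)

# Ann is 36, so she is compared with the 30-40 age group
z_ann = z_score(122, mean=109, sd=8.7)

# the reading with the smaller |z| is closer to its group's mean
more_typical = "Ann" if abs(z_ann) < abs(z_matt) else "Matt"

print(round(z_matt, 4), round(z_ann, 4), more_typical)
```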
Chebyshev’s Inequality says that for any distribution, the percent of observations falling within k standard deviations of the mean is at least 100(1 – 1/k²). Note that k must be > 1.
So if k = 2.8 standard deviations, 100(1 – (1/7.84)) = 87.2%
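The bound is easy to evaluate for any k. The sketch below assumes the usual form of Chebyshev’s Inequality, 100(1 – 1/k²) percent; the function name `chebyshev_lower_bound` is our own.

```python
def chebyshev_lower_bound(k):
    """Minimum percent of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 100 * (1 - 1 / k**2)

bound_2_8 = chebyshev_lower_bound(2.8)   # k squared is 7.84
bound_2 = chebyshev_lower_bound(2)       # at least 75% within 2 standard deviations

print(round(bound_2_8, 1), bound_2)
```

Because the inequality holds for any distribution, the bound is usually much lower than the percent we actually observe in bell-shaped data.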
We have learned thus far to
- always plot our data: make a histogram, stemplot, or boxplot,
- look for patterns in our graph and describe it in terms of shape, center, and spread, and
- calculate numerical summaries to support our written/verbal descriptions.
Now we can introduce an important situation: we have many observations, and the graph of that data can be accurately approximated by a smooth curve (or a very set pattern). These curves are called density curves.
DENSITY CURVES
A density curve is a curve that
- is always on or above the horizontal axis, and
- has an area of exactly 1 underneath it.
It describes the overall pattern of the distribution, can have many different shapes and sizes, and does not describe outliers in the observations.
DENSITY CURVES
The area under the density curve over a range of values represents the proportion of the total observations that fall in that range. A total area of 1 represents all of the observations.
The median is the point with exactly one-half of the observations on either side, so the median of a density curve is the “equal-area” point of the distribution.
The mean is the “balance” point of the distribution. It is the point where the curve would balance if it were made of a solid material.
DENSITY CURVES
Since a density curve is an idealized graph, we use different symbols to represent its numerical calculations.
SYMBOL               ACTUAL DATA   DENSITY CURVE
Mean                 x bar         μ
Standard deviation   s             σ
                     (English)     (Greek)
Uniform Density Curves
NORMAL DISTRIBUTIONS
A density curve that has these three characteristics is a Normal curve, which describes a Normal distribution:
1) symmetric,
2) single-peaked, and
3) bell-shaped.
On a Normal curve the standard deviation can be easily located by eye: it is the distance from the mean to the point on either side where the curve changes from bending downward to bending upward.
The Normal curve is very important in statistics because
1) it is a good description of many sets of real-life data (when the sample size is large),
2) it is an excellent approximation of the end results of many different types of chance outcomes, and
3) many statistical inference procedures are based on the data having its characteristics.
Be careful: many curves are symmetric but not Normal, and others are single-peaked but not Normal. All three characteristics must be observed for a curve to be Normal.
The 68-95-99.7 Rule
All Normal distributions adhere to the 68-95-99.7 Rule (the Empirical Rule). The rule states that in a Normal distribution with a mean of μ and a standard deviation of σ,
1) approximately 68% of the observations fall within 1σ of μ,
2) approximately 95% of the observations fall within 2σ of μ, and
3) approximately 99.7% of the observations fall within 3σ of μ.
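The three percentages in the rule are rounded values, which we can confirm with Python’s standard-library `statistics.NormalDist` (available in Python 3.8 and later). The helper name `proportion_within` is our own. Because every Normal distribution standardizes to N(0, 1), checking the standard Normal curve is enough.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a Normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    # prints the percent within k standard deviations, to two decimals
    print(k, round(100 * proportion_within(k), 2))
```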
Normal Notation
We can shorten “a Normal distribution with a mean of μ and a standard deviation of σ” to N(μ, σ). This notation tells us that a Normal distribution is completely determined by two values, its mean and its standard deviation.
Write the Normal notation for these distributions.
Red curve: x = 12 and s = 4
N(50, 10)
not Normal
N(100, 15)
The Normal Curve Applet
The Standard Normal Distribution
Remember, z = (x – µ) ∕ σ
This is a linear transformation, so the shape of the distribution does not change. If we standardize a variable that has a Normal distribution, the new variable will also have a Normal distribution. We call the new distribution the standard Normal distribution.
The standard Normal distribution is the Normal distribution N(0, 1) (mean = 0, standard deviation = 1).
Standard Normal Calculations
If your data is Normally distributed and you calculate z = -1.84, what proportion of the data falls below that z value? 0.0329 or 3.29%
What proportion of the data falls above that z value? 1 – 0.0329 = 0.9671 or 96.71%
What proportion of the data falls above z = 2.09? 1 – 0.9817 = 0.0183 or 1.83%
What proportion of the data falls between z = -0.37 and z = 1.92? 0.9726 – 0.3557 = 0.6169 or 61.69%
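These Table A lookups can be reproduced in software. The sketch below uses `statistics.NormalDist` from the Python standard library as a stand-in for the table; its `cdf` method gives the area to the left of a z value.

```python
from statistics import NormalDist

Z = NormalDist()   # standard Normal distribution, N(0, 1)

below = Z.cdf(-1.84)                   # area to the left of z = -1.84
above_same = 1 - Z.cdf(-1.84)          # area to the right of z = -1.84
above = 1 - Z.cdf(2.09)                # area to the right of z = 2.09
between = Z.cdf(1.92) - Z.cdf(-0.37)   # area between the two z values

print(round(below, 4), round(above_same, 4), round(above, 4), round(between, 4))
```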
Standard Normal Calculations
A Normal distribution of test scores has µ = 78 and σ = 7. Find the proportion of scores that fall between 72 and 100.
z = (100 – 78) ∕ 7 = 3.1429 for 100 and z = (72 – 78) ∕ 7 = -0.8571 for 72
Using Table A, the proportion is 0.9992 – 0.1949 = 0.8043 or 80.43%
Remember to sketch the distribution of interest.
Standard Normal Calculations
You can also find proportions using your graphing calculator: DIST (2nd VARS), 2: normalcdf
If you want to know the proportion below your value, normalcdf(minimum, value, mean, standard deviation)
If you want to know the proportion above your value, normalcdf(value, maximum, mean, standard deviation)
If you want to know the proportion between two given values, normalcdf(low value, high value, mean, standard deviation)
For the last example, normalcdf(72, 100, 78, 7) = 0.8035 or 80.35%
The calculator answer differs slightly from the Table A answer because Table A rounds each z-score to two decimal places.
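For anyone working without a graphing calculator, a rough Python equivalent is sketched below. The function `normalcdf` here is our own imitation of the calculator command, built on the standard library’s `statistics.NormalDist`; it is not a built-in Python function. Where the calculator needs a very small or very large number for “minimum” or “maximum”, this sketch assumes we pass negative or positive infinity instead.

```python
from statistics import NormalDist

def normalcdf(low, high, mean=0.0, sd=1.0):
    """Proportion of a N(mean, sd) distribution that lies between low and high."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(high) - dist.cdf(low)

# proportion of test scores between 72 and 100 when scores are N(78, 7)
p_between = normalcdf(72, 100, 78, 7)

# proportion below 72 and proportion above 100
p_below = normalcdf(float("-inf"), 72, 78, 7)
p_above = normalcdf(100, float("inf"), 78, 7)

print(round(p_between, 4), round(p_below, 4), round(p_above, 4))
```

The three proportions add up to 1, which is a quick sanity check on any Normal calculation.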
Normal Distribution Calculations
Fairly Normal distributions of real-life observations are common in many areas.
Business
- returns on a diversified portfolio
- employee performance
Manufacturing
- product reliability
- quality management
People
- height
- test performance
Therefore we need to solve problems that involve Normal distributions.
Normal Distribution Calculations
The Steps Involved in Normal Distribution Problem Solving
1) State the problem in terms of the observed variable (x). Draw a sketch of the distribution and shade the area of interest.
2) Standardize the variable to restate the problem in terms of the standard Normal variable (z). Sketch the area of interest on the standard Normal curve.
3) Use Table A to find the required area (proportion).
4) State your conclusion in the context of the problem.
Normal Distribution Calculations
The army reports that the distribution of head circumference among male soldiers is approximately Normal with a mean of 22.8 inches and a standard deviation of 1.1 inches. Mark has a head circumference of 22.3 inches.
1) What proportion of male soldiers have a head circumference greater than Mark’s?
z = (22.3 – 22.8) ∕ 1.1 = -0.4545
Table A: 1 – 0.3264 = 0.6736 [calculator: 0.6753]
2) What percent of male soldiers have a head circumference between 19.8 and 23.4 inches?
z = 0.5455 for 23.4 and z = -2.7273 for 19.8
Table A: 0.7088 – 0.0032 = 0.7056 = 70.56% [calculator: 70.41%]
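The bracketed calculator values can be reproduced with the short sketch below, which again leans on `statistics.NormalDist` from the Python standard library. The variable names are our own. Software does not round z to two decimal places, so its answers match the bracketed values instead of the Table A values.

```python
from statistics import NormalDist

# head circumference of male soldiers, in inches: N(22.8, 1.1)
heads = NormalDist(mu=22.8, sigma=1.1)

# 1) proportion with a head circumference greater than Mark's 22.3 inches
z_mark = (22.3 - 22.8) / 1.1
p_greater = 1 - heads.cdf(22.3)

# 2) proportion with a head circumference between 19.8 and 23.4 inches
p_between = heads.cdf(23.4) - heads.cdf(19.8)

print(round(z_mark, 4), round(p_greater, 4), round(p_between, 4))
```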
Normal Distribution Calculations
IQ scores have a distribution that is fairly Normal. The mean of the distribution is 100.8 with a standard deviation of 15.4. If Kelly’s IQ score is at the 91st percentile, what is her score?
1) Find the z using the percentile. In the inner part of Table A look for the entry closest to 0.91. z = 1.34
2) Use the z formula to determine the IQ score (x).
z = (x – µ) ∕ σ
1.34 = (x – 100.8) ∕ 15.4
x = 100.8 + (1.34)(15.4) = 121.4
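Working backward from a percentile to a value can also be done in software. The sketch below shows both routes: un-standardizing the Table A value z = 1.34 by hand, and asking `statistics.NormalDist.inv_cdf` (Python standard library) for the 91st percentile directly. The variable names are our own.

```python
from statistics import NormalDist

mu, sigma = 100.8, 15.4

# route 1: use the Table A value z = 1.34 and solve z = (x - mu) / sigma for x
z_table = 1.34
x_table = mu + z_table * sigma

# route 2: let software find the value with 91% of the area to its left
iq = NormalDist(mu=mu, sigma=sigma)
x_software = iq.inv_cdf(0.91)

print(round(x_table, 1), round(x_software, 1))
```

The two routes agree to within a few hundredths of a point; the small gap comes from Table A rounding z to two decimal places.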
How can you tell if a distribution is Normal?
1) Graph your data. Make a histogram or stemplot. Look for the graph to be bell-shaped and roughly symmetric about the mean. Compare your data to the 68-95-99.7 Rule. Look for non-Normal features such as outliers, gaps, and/or strong skewness. Be careful with small sets of data; they rarely look Normal.
2) Make a Normal probability plot.
The basic idea of a Normal probability plot
1) Arrange the data values from least to greatest.
2) Record what percentile each value occupies.
3) Use Table A to find the z-scores at each of the recorded percentiles.
4) Plot the data value (x) on the vertical axis vs. its corresponding z-score on the horizontal axis.
If the distribution is approximately Normal, the plotted points will lie on a fairly straight line. Systematic deviations show a non-Normal distribution, while outliers appear as points far away from the overall pattern.
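The four steps above can be sketched in Python as follows. The function name `normal_plot_points` is our own. One detail is an assumption rather than something stated on the slide: the percentile for the i-th of n ordered values is taken as (i – 0.5)/n, so that the largest value is not assigned the 100th percentile, which has no finite z-score. Software packages use several slightly different conventions here. The data are the eight values from the earlier percentile example.

```python
from statistics import NormalDist

def normal_plot_points(values):
    """Return (z, x) pairs for a Normal probability plot of the given data."""
    ordered = sorted(values)                   # step 1: least to greatest
    n = len(ordered)
    std_normal = NormalDist()                  # stands in for Table A
    points = []
    for i, x in enumerate(ordered, start=1):
        percentile = (i - 0.5) / n             # step 2: assumed plotting position
        z = std_normal.inv_cdf(percentile)     # step 3: z-score at that percentile
        points.append((z, x))                  # step 4: z horizontal, x vertical
    return points

points = normal_plot_points([3, 6, 15, 19, 27, 33, 35, 38])
for z, x in points:
    print(round(z, 2), x)
```

Plotting these pairs (by hand or with any graphing tool) and checking whether they fall near a straight line is the visual test described above.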
Normal Probability Plot
Make a Normal probability plot of this data.
1 1 2 2 2 3 3 4 4 4 5 5 6 7 7 8 8 9
mean = 4.45, s = 2.3946, Q1 = 2.5, M = 4, Q3 = 6.5
Is the data distributed Normally? Explain.