
ESSENTIAL STATISTICS, 2e
William Navidi and Barry Monk
©McGraw-Hill Education. All rights reserved. Authorized only for instructor use in the classroom. No reproduction or further distribution permitted without the prior written consent of McGraw-Hill Education.

Measures of Spread
Section 3.2

Objectives
1. Compute the range of a data set
2. Compute the variance of a population and a sample
3. Compute the standard deviation of a population and a sample
4. Approximate the standard deviation with grouped data
5. Use the Empirical Rule to summarize data that are unimodal and approximately symmetric
6. Use Chebyshev’s Inequality to describe a data set
7. Compute the coefficient of variation

Objective 1: Compute the range of a data set

The Range
The range of a data set is the difference between the largest value and the smallest value. The average monthly temperatures, in degrees Fahrenheit, for San Francisco are listed. The largest value is 63 and the smallest is 51, so the range of the temperatures is 63 − 51 = 12 degrees Fahrenheit.
Although the range is easy to compute, it is not often used in practice, because it depends on only two values from the data set (the largest and the smallest) and ignores all the others.
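Here is a minimal Python sketch of the range, assuming the data are a plain list of numbers. The function name `data_range` and both lists are hypothetical. They were chosen only so that the largest value is 63 and the smallest is 51, and they are not the San Francisco temperatures. The second list changes every interior value, yet the range stays the same.

```python
def data_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)

# Hypothetical values (not the actual San Francisco data); max 63, min 51.
temps = [51, 55, 58, 60, 63]
print(data_range(temps))  # 12

# Changing the interior values does not change the range,
# because only the two extreme values enter the calculation.
temps_shifted = [51, 52, 52, 53, 63]
print(data_range(temps_shifted))  # 12
```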

Objective 2: Compute the variance of a population and a sample

Variance
When a data set has a small amount of spread, like the San Francisco temperatures, most of the values are close to the mean. When a data set has a larger amount of spread, more of the data values are far from the mean. The variance measures how far the values in a data set are from the mean, on the average. More precisely, it is built from the squared deviations of the values from the mean. The variance is computed slightly differently for populations and samples. The population variance is presented first.

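Population Variance
Let N be the number of values in the population and μ the population mean. The population variance, denoted σ², is the average of the squared deviations from μ:
$\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$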
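The population variance can be sketched in Python as follows, assuming the whole population is available as a list. The function `population_variance` and the four-value population are hypothetical. The code subtracts the mean from each value, squares each deviation, adds the squares, and divides by N.

```python
def population_variance(values):
    """Average squared deviation from the population mean (divide by N)."""
    n = len(values)
    mu = sum(values) / n                 # population mean
    return sum((x - mu) ** 2 for x in values) / n

# Hypothetical population of N = 4 values with mean 4.
pop = [2, 4, 4, 6]
print(population_variance(pop))  # 2.0
```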

Example: Population Variance
Compute the population variance for the San Francisco temperatures.


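Sample Variance
Let n be the number of values in the sample and x̄ the sample mean. The sample variance, denoted s², divides the sum of squared deviations by n − 1 rather than by n:
$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$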
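Here is a matching sketch for the sample variance, again with a hypothetical function name and data. The only change from the population version is the divisor n − 1. For the same four values, the population formula would give 8/4 = 2, while the sample formula gives the slightly larger 8/3.

```python
def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    xbar = sum(values) / n               # sample mean
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

# Hypothetical sample of n = 4 values; sum of squared deviations is 8.
sample = [2, 4, 4, 6]
print(sample_variance(sample))  # 2.6666666666666665 (that is, 8/3)
```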

• ©Mc. Graw-Hill Education.

Example: Sample Variance

Objective 3: Compute the standard deviation of a population and a sample

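Standard Deviation
The standard deviation is the square root of the variance. The population standard deviation is $\sigma = \sqrt{\sigma^2}$, and the sample standard deviation is $s = \sqrt{s^2}$. The variance is in squared units, but the standard deviation is in the same units as the data.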
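Each standard deviation is just the square root of the corresponding variance, so a sketch can compute the sum of squared deviations once and take two square roots. The helper `std_devs` and the data are hypothetical illustrations.

```python
import math

def std_devs(values):
    """Return (population SD sigma, sample SD s) as square roots of the variances."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    sigma = math.sqrt(ss / n)                  # population standard deviation
    s = math.sqrt(ss / (n - 1))                # sample standard deviation
    return sigma, s

# Hypothetical data; if the values were in degrees, sigma and s would be
# in degrees too, while the variances would be in squared degrees.
sigma, s = std_devs([2, 4, 4, 6])
print(sigma)  # 1.4142135623730951
```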

Example: Standard Deviation

Standard Deviation on the TI-84 PLUS
The following steps compute the standard deviation for both sample data and population data on the TI-84 PLUS calculator. Enter the data into L1 in the data editor. Run the 1-Var Stats command (the same command used for means and medians), selecting L1 as the location of the data. In the output, Sx is the sample standard deviation and σx is the population standard deviation.
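Outside the calculator, Python's standard-library `statistics` module offers a comparable calculation. `statistics.stdev` divides by n − 1, matching Sx, and `statistics.pstdev` divides by n, matching σx. The wrapper `one_var_stats` is a hypothetical helper that returns only a few of the quantities the calculator displays, and the list is made-up data.

```python
import statistics

def one_var_stats(data):
    """Hypothetical helper mimicking part of the TI-84 1-Var Stats output."""
    return {
        "mean": statistics.mean(data),
        "Sx": statistics.stdev(data),        # sample standard deviation (n - 1)
        "sigma_x": statistics.pstdev(data),  # population standard deviation (n)
        "n": len(data),
    }

L1 = [2, 4, 4, 6]  # hypothetical list, as if typed into L1
result = one_var_stats(L1)
print(result)
```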

Standard Deviation and Resistance
Recall that a statistic is resistant if its value is not affected much by extreme values (large or small) in the data set. The standard deviation is not resistant: extreme values can change it substantially.
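Here is a small Python illustration of non-resistance using hypothetical data. Replacing the largest value with an extreme one leaves the median, a resistant statistic, unchanged. The sample standard deviation, by contrast, grows nearly tenfold.

```python
import statistics

# Hypothetical data set, then the same data with its largest value
# replaced by an extreme value.
original = [10, 12, 13, 14, 16]
with_outlier = [10, 12, 13, 14, 60]

for label, data in [("original", original), ("with outlier", with_outlier)]:
    print(label,
          "median:", statistics.median(data),
          "sample SD:", round(statistics.stdev(data), 2))
```

With these values the median is 13 in both cases, while the sample standard deviation rises from about 2.24 to about 21.41.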

Objective 4: Approximate the standard deviation using grouped data

Approximating the Standard Deviation
Sometimes we don’t have access to the raw data in a data set, but we are given a frequency distribution. In these cases we can approximate the standard deviation using the following steps.
Step 1: Compute the midpoint of each class and approximate the mean of the frequency distribution.
Step 2: For each class, subtract the mean from the class midpoint to obtain (Midpoint − Mean).
Step 3: For each class, square the difference obtained in Step 2 to obtain (Midpoint − Mean)², and multiply by the frequency to obtain (Midpoint − Mean)² × (Frequency).

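Approximating the Standard Deviation (Continued)
Step 4: Add the products (Midpoint − Mean)² × (Frequency) over all classes.
Step 5: Divide the sum from Step 4 by n − 1 to obtain the approximate sample variance (divide by N instead if the data are a population), then take the square root to obtain the approximate standard deviation.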
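Steps 1 through 5 can be collected into one Python function. This sketch assumes that class midpoints and frequencies are supplied as two parallel lists. The name `grouped_std` and its `sample` flag are hypothetical. Applied to the text-message frequency distribution in this section, it should reproduce the mean of 137 and s ≈ 77.30142.

```python
import math

def grouped_std(midpoints, freqs, sample=True):
    """Approximate mean and standard deviation from a frequency distribution,
    using class midpoints in place of the raw values."""
    n = sum(freqs)
    # Step 1: approximate the mean from the midpoints.
    mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
    # Steps 2-3: (Midpoint - Mean)^2 x (Frequency) for each class.
    products = [(m - mean) ** 2 * f for m, f in zip(midpoints, freqs)]
    # Step 4: add the products over all classes.
    total = sum(products)
    # Step 5: divide by n - 1 (sample) or N (population), then take the square root.
    divisor = n - 1 if sample else n
    return mean, math.sqrt(total / divisor)

# Text-message frequency distribution from the example in this section.
midpoints = [25, 75, 125, 175, 225, 275]
freqs = [10, 5, 13, 11, 7, 4]
mean, s = grouped_std(midpoints, freqs)
print(mean, round(s, 5))  # 137.0 77.30142
```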

Example: Standard Deviation for Grouped Data
The following table presents the number of text messages sent via cell phone by a sample of 50 high school students. Approximate the standard deviation of the number of messages sent.

Number of Messages Sent    Frequency
0 – 49                     10
50 – 99                    5
100 – 149                  13
150 – 199                  11
200 – 249                  7
250 – 299                  4

Solution: Step 1
Compute the midpoint of each class.

Number of Messages Sent    Class Midpoint
0 – 49                     25
50 – 99                    75
100 – 149                  125
150 – 199                  175
200 – 249                  225
250 – 299                  275

Solution: Step 2
For each class, subtract the mean from the class midpoint to obtain (Midpoint − Mean). Recall that the mean was calculated earlier to be 137.

Number of Messages Sent    Class Midpoint    Midpoint − Mean
0 – 49                     25                −112
50 – 99                    75                −62
100 – 149                  125               −12
150 – 199                  175               38
200 – 249                  225               88
250 – 299                  275               138

Solution: Step 3
For each class, square the difference obtained in Step 2 to obtain (Midpoint − Mean)², and multiply by the frequency to obtain (Midpoint − Mean)² × (Frequency).

Number of Messages Sent    Frequency    Midpoint − Mean    (Midpoint − Mean)² × Frequency
0 – 49                     10           −112               125440
50 – 99                    5            −62                19220
100 – 149                  13           −12                1872
150 – 199                  11           38                 15884
200 – 249                  7            88                 54208
250 – 299                  4            138                76176

Solution: Step 4
Add the products (Midpoint − Mean)² × (Frequency) over all classes.

Frequency    (Midpoint − Mean)² × Frequency
10           125440
5            19220
13           1872
11           15884
7            54208
4            76176
Sum          292800

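Solution: Step 5
Divide the sum from Step 4 by n − 1 = 50 − 1 = 49 to approximate the sample variance, then take the square root:
$s^2 \approx \dfrac{292800}{49} \approx 5975.51, \qquad s \approx \sqrt{5975.51} \approx 77.30$
The approximate standard deviation is 77.30.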

Grouped Data on the TI-84 PLUS
The following procedure computes the mean and standard deviation for grouped data in a frequency distribution. Enter the midpoint for each class into L1 and the corresponding frequencies into L2. Next, select 1-Var Stats followed by L1, comma, L2.
Note: If your calculator supports Stat Wizards, enter L1 in the List field and L2 in the FreqList field.

Example: Grouped Data on the TI-84 PLUS

Class Midpoint    Frequency
25                10
75                5
125               13
175               11
225               7
275               4

In the TI-84 PLUS output for the last example, the value of s (displayed as Sx) is the approximate sample standard deviation. In this example s = 77.30142, so the approximate standard deviation is 77.30142.
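The frequency list in L2 tells the calculator to count each midpoint as many times as its frequency. The same idea can be sketched in Python by expanding the midpoints into a list of 50 values and passing it to `statistics.stdev` from the standard library. The variable names are illustrative.

```python
import statistics

midpoints = [25, 75, 125, 175, 225, 275]  # L1: class midpoints
freqs = [10, 5, 13, 11, 7, 4]             # L2: frequencies

# Repeat each midpoint according to its frequency (50 values in all).
expanded = [m for m, f in zip(midpoints, freqs) for _ in range(f)]

print(len(expanded))                         # 50
print(round(statistics.stdev(expanded), 5))  # 77.30142
```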

Objective 5: Use the Empirical Rule to summarize data that are unimodal and approximately symmetric

Bell-Shaped Histogram
Many histograms have a single mode near the center of the data and are approximately symmetric. Such histograms are often referred to as bell-shaped.

The Empirical Rule
When a data set has a bell-shaped histogram, it is often possible to use the standard deviation to provide an approximate description of the data using a rule known as the Empirical Rule:
• Approximately 68% of the data will be within one standard deviation of the mean.
• Approximately 95% of the data will be within two standard deviations of the mean.
• All, or almost all, of the data will be within three standard deviations of the mean.
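To see the Empirical Rule at work, the sketch below simulates a hypothetical bell-shaped data set with Python's `random.gauss`. It uses mean 50 and standard deviation 10, an arbitrary choice. It then counts the share of values within one, two, and three standard deviations of the mean. The helper `share_within` is hypothetical. With a fixed seed, the shares should come out near 68%, 95%, and just under 100%.

```python
import random
import statistics

def share_within(data, k):
    """Fraction of values within k standard deviations of the mean."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)
    return sum(mu - k * sigma <= x <= mu + k * sigma for x in data) / len(data)

# Hypothetical bell-shaped data: 10,000 draws from a normal distribution.
random.seed(1)
data = [random.gauss(50, 10) for _ in range(10_000)]

for k in (1, 2, 3):
    print(k, round(share_within(data, k), 3))
```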

Example: The Empirical Rule
The following table presents the U.S. Census Bureau projection for the percentage of the population aged 65 and over for each state and the District of Columbia. Use the Empirical Rule to describe the data.
The histogram of these data is approximately bell-shaped, so the Empirical Rule applies. We may use the TI-84 PLUS calculator, or other technology, to compute the population mean and standard deviation.


Objective 6: Use Chebyshev’s Inequality to describe a data set

Any Data Set
When a distribution is bell-shaped, we use the Empirical Rule to approximate the proportion of data within one, two, or three standard deviations of the mean. Chebyshev’s Inequality, by contrast, holds for any data set, whatever the shape of its histogram.

Chebyshev’s Inequality
In any data set, the proportion of the data that is within K standard deviations of the mean is at least $1 - \dfrac{1}{K^2}$. Setting K = 2 or K = 3 gives the following results:
• At least 3/4, or 75%, of the data are within two standard deviations of the mean.
• At least 8/9, or about 89%, of the data are within three standard deviations of the mean.
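Here is a Python sketch of Chebyshev's bound, with a check on a hypothetical, strongly skewed data set in which one value, 40, sits far from the rest. The function name `chebyshev_bound` is illustrative. Even though the data are far from bell-shaped, the observed proportions within two and three standard deviations meet the bound.

```python
import statistics

def chebyshev_bound(k):
    """Lower bound 1 - 1/k^2 on the proportion within k SDs of the mean.
    Informative only for k > 1; for k <= 1 the bound is 0 or negative."""
    return 1 - 1 / k**2

print(chebyshev_bound(2))  # 0.75
print(chebyshev_bound(3))  # about 0.889 (8/9)

# Check the bound on a hypothetical, strongly skewed data set.
data = [1, 1, 1, 2, 2, 3, 4, 6, 10, 40]
mu = statistics.mean(data)
sigma = statistics.pstdev(data)
for k in (2, 3):
    share = sum(abs(x - mu) <= k * sigma for x in data) / len(data)
    print(k, share, share >= chebyshev_bound(k))
```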

Example: Chebyshev’s Inequality
As part of a public health study, systolic blood pressure was measured for a large group of people. The mean was 120 and the standard deviation was 10. What information does Chebyshev’s Inequality provide about these data?
Solution: We compute the following:
120 − 2(10) = 100 and 120 + 2(10) = 140
120 − 3(10) = 90 and 120 + 3(10) = 150
We conclude:
• At least 3/4 (75%) had systolic blood pressures between 100 and 140.
• At least 8/9 (about 89%) had systolic blood pressures between 90 and 150.
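The interval arithmetic in this example can be sketched with a hypothetical helper, `chebyshev_intervals`. For each K it returns the Chebyshev bound and the interval from the mean minus K standard deviations to the mean plus K standard deviations.

```python
def chebyshev_intervals(mean, sd, ks=(2, 3)):
    """For each k, return (k, lower bound on proportion, (mean - k*sd, mean + k*sd))."""
    return [(k, 1 - 1 / k**2, (mean - k * sd, mean + k * sd)) for k in ks]

# Blood pressure example: mean 120, standard deviation 10.
for k, bound, (low, high) in chebyshev_intervals(120, 10):
    print(f"At least {bound:.0%} between {low} and {high}")
# At least 75% between 100 and 140
# At least 89% between 90 and 150
```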

Objective 7: Compute the coefficient of variation

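Coefficient of Variation
The coefficient of variation (CV) is the standard deviation divided by the mean: CV = σ/μ for a population and CV = s/x̄ for a sample. It measures spread relative to the mean and has no units, so it can be used to compare the spread of data sets measured in different units or with very different means.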
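Here is a Python sketch of the coefficient of variation, assuming positive-valued data so that the mean is not zero. The function and the two samples are hypothetical. The second sample is the first multiplied by 100, so its standard deviation and mean both grow by the same factor, and the coefficient of variation is unchanged.

```python
import statistics

def coefficient_of_variation(data, sample=True):
    """Standard deviation divided by the mean (assumes the mean is nonzero)."""
    sd = statistics.stdev(data) if sample else statistics.pstdev(data)
    return sd / statistics.mean(data)

# Hypothetical samples on very different scales.
small = [2, 4, 4, 6]          # mean 4
large = [200, 400, 400, 600]  # same shape, 100 times larger
print(coefficient_of_variation(small), coefficient_of_variation(large))
```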

Example: Coefficient of Variation

You Should Know...
• How to compute the range of a data set
• The notation for population variance, population standard deviation, sample variance, and sample standard deviation
• How to compute the variance and the standard deviation for populations and samples
• How to use the TI-84 PLUS calculator to compute the variance and standard deviation for populations and samples
• How to approximate the standard deviation for grouped data
• How to use the Empirical Rule to describe a bell-shaped data set
• How to use Chebyshev’s Inequality to describe any data set
• How to compute and interpret the coefficient of variation