Chapter 3 Displaying and Summarizing Quantitative Data
Copyright © 2014, 2012, 2009 Pearson Education, Inc.

Objectives
The student will be able to:
8. Appropriately display quantitative data using a frequency distribution, histogram, relative frequency histogram, and stem-and-leaf display.
9. Describe a distribution in terms of shape, center, and spread.
10. Describe any anomalies or extraordinary features revealed by the display of a variable.
11. Compute and apply the concepts of mean and median to a set of data.
12. Compute and apply the concepts of the standard deviation and IQR to a set of data.
13. Select a suitable measure of center/spread for a variable based on information about its distribution.
14. Create a five-number summary of a variable.
15. Construct a boxplot by hand and using technology, using fences to identify possible outliers.
16. Use the 1.5 IQR rule to identify possible outliers.

3.1 Displaying Quantitative Variables

Histograms
Histogram: A chart that displays quantitative data
• Great for seeing the distribution of the data
Figure: A histogram of tsunami-generating earthquakes
• Most tsunami-generating earthquakes have magnitudes between 6.5 and 8.
• The Japan and Sumatra quakes (9.0 and 9.1) are rare.
• Quakes under 5 rarely cause tsunamis.
• Quakes between 7.0 and 7.5 are the most common cause of tsunamis.

Choosing the Bin Width
• Different bin widths tell different stories.
• Choose the width that best shows the important features.
• Presentations can feature two histograms that present the same data in different ways.
• A gap in the histogram means that there were no occurrences in that range.

Relative Frequency Histograms
• Relative Frequency Histogram: The vertical axis represents the relative frequency, the frequency divided by the total.
• The horizontal axis is the same as the horizontal axis for the frequency histogram.
• The shape of the relative frequency histogram is the same as that of the frequency histogram.
• Only the scale of the y-axis is different.

Practice
Suppose a basketball player scored the following number of points in his last 15 games:
4, 4, 3, 4, 7, 16, 12, 15, 6, 8, 5, 9, 8, 25, 11
Fill in the following frequency (and relative frequency) distribution.

Bin   | Frequency | Relative Frequency
1-6   |           |
7-12  |           |
13-18 |           |
19-24 |           |
25-30 |           |
Total | 15        | 100%

Practice (continued)
• What percentage of games did the player score 12 points or less?
• What percentage of games did the player score between 7 and 18 points (inclusive, i.e. 7 <= points <= 18)?
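A hand-built frequency table can be checked with a few lines of code. The sketch below is one possible way to do it in Python, using the 15 scores and the five bins from the practice slide; the function name frequency_table is an illustrative choice, not part of the course materials.

```python
points = [4, 4, 3, 4, 7, 16, 12, 15, 6, 8, 5, 9, 8, 25, 11]
bins = [(1, 6), (7, 12), (13, 18), (19, 24), (25, 30)]

def frequency_table(data, bins):
    """Return one (label, frequency, relative frequency) row per bin."""
    rows = []
    for low, high in bins:
        count = sum(1 for x in data if low <= x <= high)
        rows.append((f"{low}-{high}", count, count / len(data)))
    return rows

table = frequency_table(points, bins)
for label, freq, rel in table:
    print(f"{label:>6} {freq:>3} {rel:6.1%}")

# "12 points or less" and "between 7 and 18" combine whole bins,
# so they can also be counted directly from the raw scores.
share_12_or_less = sum(1 for x in points if x <= 12) / len(points)
share_7_to_18 = sum(1 for x in points if 7 <= x <= 18) / len(points)
```

The relative frequencies are just the frequencies divided by 15, so they add up to 100%.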

Using the TI to make a histogram
First enter data in L1: STAT -> EDIT -> ENTER
Turn STAT PLOT on:
• [2nd] [Y=] opens the STAT PLOT menu.
• With the cursor on 1:, press ENTER.
• With the cursor on On for Plot 1, press ENTER.
• Select the type of plot desired.
• If raw data is in L1, set Xlist: L1, Freq: 1. If frequencies are in L2, set Xlist: L1, Freq: L2.
• ZOOM -> ZoomStat to display the data.
Example: Let's make a histogram of the following dataset:
23 15 34 67 68 78 93 85 34 12 42 93 34 67 67 67 23 32 15 78
• If we want a histogram that groups the data using the classes 10-19, 20-29, 30-39, etc., we must adjust the window (to be discussed in class or see separate technology instructions).

Histograms and StatCrunch
• Enter the data.
• Graphics → Histogram
• Click on the data variable and Next.
• Select Frequency or Relative Frequency.
• Put in a starting value and/or bin width if desired.
• Click Next twice, and type in labels.
• Click Create Graph.

Stem-and-Leaf Displays
Stem-and-Leaf: Shows both the shape of the distribution and all of the individual values
• Not as visually pleasing as a histogram; more technical looking
• Can only be used for small collections of data
• The first column (stems) represents the leftmost digit.
• The second column (leaves) shows the remaining digit(s).

Constructing a Stem-and-Leaf Display
• First, cut each data value into leading digits (“stems”) and trailing digits (“leaves”).
• Use the stems to label the bins.
• Use only one digit for each leaf: either round or truncate the data values to one decimal place after the stem.
• Write a key at the bottom: 8|3 indicates 83 or 830
Examples: make stem-and-leaf displays for number of siblings, heights, and number of college semesters completed using our class data set.
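The construction steps can be sketched in Python as follows. This is a minimal illustration that assumes two-digit whole numbers (tens digit as stem, ones digit as leaf) and reuses the 20-value dataset from the TI histogram example; the function name stem_and_leaf is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each tens-digit stem to its sorted ones-digit leaves."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    # Keep empty stems so gaps in the data stay visible.
    lo, hi = min(stems), max(stems)
    return {stem: stems.get(stem, []) for stem in range(lo, hi + 1)}

data = [23, 15, 34, 67, 68, 78, 93, 85, 34, 12,
        42, 93, 34, 67, 67, 67, 23, 32, 15, 78]
display = stem_and_leaf(data)
for stem, leaves in display.items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
print("Key: 8|5 indicates 85")
```

Printing the empty stem 5 makes the gap between the 40s and the 60s show up, just as an empty bin would in a histogram.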

Stem and Leaf with StatCrunch
• Enter the data.
• Graphics → Stem and Leaf
• Click on the variable name and Next.
• Select Outlier Trimming Type and Create Graph!

Dotplots
Dotplot: Displays dots to describe the shape of the distribution
• There were 30 races with a winning time of 122 seconds.
• Good for smaller data sets
• Visually more appealing than stem-and-leaf
• In StatCrunch: Graphics → Dotplot

Think Before You Draw
• Is the variable quantitative? Is the answer to the survey question or result of the experiment a number whose units are known?
• Histograms, stem-and-leaf diagrams, and dotplots can only display quantitative data.
• Bar and pie charts display categorical data.

3.2 Shape

What is the Shape of the Distribution?
1. Does the histogram have a single, central hump or several separated humps?
2. Is the histogram symmetric?
3. Do any unusual features stick out?

Modes
• A Mode of a histogram is a hump or high-frequency bin.
• One mode → Unimodal
• Two modes → Bimodal
• 3 or more → Multimodal
Figure panels: Unimodal, Bimodal, Multimodal

Uniform Distributions
• Uniform Distribution: All the bins have the same frequency, or at least close to the same frequency.
• The histogram for a uniform distribution will be flat.

Symmetry
• The histogram for a symmetric distribution will look the same on the left and the right of its center.
Figure panels: Symmetric, Not Symmetric, Symmetric

Skew
• A histogram is skewed right if the longer tail is on the right side of the mode.
• A histogram is skewed left if the longer tail is on the left side of the mode.
Figure panels: Skewed Right, Skewed Left

Outliers
• An Outlier is a data value that is far above or far below the rest of the data values.
• An outlier is sometimes just an error in the data collection.
• An outlier can also be the most important data value.
  • Income of a CEO
  • Temperature of a person with a high fever
  • Elevation at Death Valley

Summary - Shape of the Distribution
1. Does the histogram have a single, central hump or several separated humps?
  • Unimodal, bimodal, multimodal, uniform
2. Is the histogram symmetric?
  • Symmetric, skewed left, skewed right
3. Do any unusual features stick out?
  • Outliers, gaps

Example
The histogram shows the amount of money spent by a credit card company’s customers. Describe and interpret the distribution.
• The distribution is unimodal. Customers most commonly spent a small amount of money.
• The distribution is skewed right. Many customers spent only a small amount and a few were spread out at the high end.
• There is an outlier at around $7000. One customer spent much more than the rest of the customers.

3.3 Center

The Median
Median: The center of the data values
• Half of the data values are to the left of the median and half are to the right of the median.
• For symmetric distributions, the median is directly in the middle.

Calculating the Median: Odd Sample Size
• First order the numbers.
• If there is an odd number of numbers, n, the median is at position (n + 1)/2.
• Find the median of the numbers: 2, 4, 5, 6, 7, 9, 9.
  • The median is the fourth number: 6
  • Note that there are 3 numbers to the left of 6 and 3 to the right.

Calculating the Median: Even Sample Size
• First order the numbers.
• If there is an even number of numbers, n, the median is the average of the two middle numbers, at positions n/2 and n/2 + 1.
• Find the median of the numbers: 2, 2, 4, 6, 7, 8.
  • The median is the average of the third and the fourth numbers: (4 + 6)/2 = 5
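The two position rules (odd and even sample size) can be combined into one short function. The following is a sketch in Python using the example lists from the slides; it sorts first, since the rule only works on ordered data.

```python
def median(values):
    """Median by the position rule: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1)/2, which is index n // 2 counting from 0.
        return ordered[mid]
    # Even n: average the values at positions n/2 and n/2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_median = median([2, 4, 5, 6, 7, 9, 9])
even_median = median([2, 2, 4, 6, 7, 8])
unsorted_median = median([315, 8, 2, 49, 97])
print(odd_median, even_median, unsorted_median)
```

The third call uses an unsorted list to show why ordering matters: the middle entry as written is 2, but the median of the sorted values is 49.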

3.4 Spread

Spread
• Locating the center is only part of the story.
• Are the data all near the center or are they spread out?
• Is the highest value much higher than the lowest value?
• To describe data, we must discuss both the center and the spread.

Range
• The range is the difference between the maximum and minimum values.
  Range = Maximum - Minimum
• The ages of the guests at your dinner party are: 16, 18, 23, 27, 35, 74
  • The range is: 74 - 16 = 58
• The range is sensitive to outliers. A single high or low value will affect the range significantly.

Percentiles and Quartiles
• Percentiles divide the data into one hundred groups.
• The nth percentile is the data value such that n percent of the data lies below that value.
• For large data sets, the median is the 50th percentile.
• The median of the lower half of the data is the 25th percentile and is called the first quartile (Q1).
• The median of the upper half of the data is the 75th percentile and is called the third quartile (Q3).

Using the TI to calculate summary statistics
To enter raw data in L1:
• STAT -> EDIT [1]
• With the cursor on L1, press [CLEAR] to delete old values.
• Fill the list with individual values.
To calculate summary statistics:
• STAT -> CALC [1] [L1] [ENTER]
• (L1 is found by pressing [2nd] [1])
• Scroll down to find the median, quartiles, min, and max.
To enter a frequency distribution, enter the values in L1 and the frequency counts in L2. To calculate summary statistics use:
• STAT -> CALC [1] [L1] [ , ] [L2] [ENTER]

Using StatCrunch to calculate summary statistics
• Load the data.
• Stat → Summary Stats → Columns (select column)

StatCrunch, Q1, Median, and Q3
• Enter the data.
• Stat → Summary Stats → Columns
• Click on the variable and then Calculate.

The Interquartile Range
• The Interquartile Range (IQR) is the difference between the upper quartile and the lower quartile:
  IQR = Q3 - Q1
• The IQR measures the range of the middle half of the data.
• Example: If Q1 = 23 and Q3 = 44, then IQR = 44 - 23 = 21
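To see the IQR next to the range, the sketch below computes both for the dinner-party ages from the Range slide. It takes Q1 and Q3 as the medians of the lower and upper halves; that is one common convention, assumed here, and other software may use slightly different quartile rules. The second list, with 74 replaced by 40, is a made-up variation for comparison.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves.

    Assumption: for odd n the overall median is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[-half:])

ages = [16, 18, 23, 27, 35, 74]
q1, q3 = quartiles(ages)
iqr = q3 - q1
data_range = max(ages) - min(ages)

# Hypothetical variation: the 74-year-old guest is replaced by a 40-year-old.
ages_no_outlier = [16, 18, 23, 27, 35, 40]
q1_b, q3_b = quartiles(ages_no_outlier)
iqr_b = q3_b - q1_b
range_b = max(ages_no_outlier) - min(ages_no_outlier)
print(iqr, data_range, iqr_b, range_b)
```

Changing the single extreme value cuts the range by more than half while the IQR stays the same.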

The Interquartile Range (continued)
• The Interquartile Range for tsunami-causing earthquakes is 0.9.
• The picture below shows the meaning of the IQR.

Benefits and Drawbacks of the IQR
• The Interquartile Range is not sensitive to outliers.
• The IQR provides a reasonable summary of the spread of the distribution.
• The IQR shows where typical values are, except for the case of a bimodal distribution.
• The IQR is not great for a general audience since most people do not know what it is.

3.5 Boxplots and 5-Number Summaries

5-Number Summary
• The 5-Number Summary provides a numerical description of the data. It consists of
  • Minimum
  • First Quartile (Q1)
  • Median
  • Third Quartile (Q3)
  • Maximum
• The list to the right shows the 5-Number Summary for the tsunami data.

Interpreting the 5-Number Summary
• The smallest tsunami-causing earthquake had magnitude 3.7.
• The largest tsunami-causing earthquake had magnitude 9.1.
• The middle half of tsunami-causing earthquakes is between 6.7 and 7.6.
• Half of tsunami-causing earthquakes have magnitudes below 7.2 and half are above 7.2.
• A tsunami-causing earthquake less than 6.7 is small.
• A tsunami-causing earthquake more than 7.6 is large.

Boxplots
• A Boxplot is a chart that displays the 5-Number Summary and the outliers.
• The box shows the Interquartile Range.
• The dashed lines are called fences; outside the fences lie the outliers.
• Above and below the box are the whiskers, which display the most extreme data values within the fences.
• The line inside the box shows the median.

Finding the Fences
• The lower fence is defined by
  Lower Fence = Q1 - 1.5 × IQR
• The upper fence is defined by
  Upper Fence = Q3 + 1.5 × IQR
• Tsunami Example: Q1 = 6.7, Q3 = 7.6, IQR = 7.6 - 6.7 = 0.9
  • Lower Fence = 6.7 - 1.5 × 0.9 = 5.35
  • Upper Fence = 7.6 + 1.5 × 0.9 = 8.95
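The fence formulas translate directly into code. The sketch below is a small Python illustration using the tsunami quartiles from the slide; the function name fences and the short list of magnitudes tested against the fences (3.7, 7.2, 9.0, 9.1, all quoted elsewhere in this chapter) are illustrative choices.

```python
def fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower, upper = fences(6.7, 7.6)
print(round(lower, 2), round(upper, 2))

# Check a few magnitudes mentioned in the chapter against the fences.
flagged = [m for m in (3.7, 7.2, 9.0, 9.1) if m < lower or m > upper]
print(flagged)
```

Rounding is applied only when printing, since decimal values such as 6.7 are not stored exactly in floating point.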

Identifying Outliers
Use the 1.5*IQR rule to identify potential outliers:
• Values above Q3 + 1.5*IQR
• Values below Q1 - 1.5*IQR
If there are any clear outliers and you are reporting the mean and standard deviation, report them with the outliers present and with the outliers removed. The differences may be quite revealing.
Note: The median and IQR are not likely to be affected by the outliers.

Practice (by hand)
• Suppose a basketball player scored the following number of points in his last 15 games: 4, 4, 3, 4, 7, 16, 12, 15, 6, 8, 5, 9, 8, 25, 11.
• Construct a (modified) boxplot for these scores.
• What were your fences?
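After working the boxplot by hand, the numbers can be checked with a short script. This Python sketch assumes quartiles are the medians of the lower and upper halves with the overall median left out of both; a different quartile convention could shift Q1 and Q3 slightly.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[-half:])
    return ordered[0], q1, median(ordered), q3, ordered[-1]

points = [4, 4, 3, 4, 7, 16, 12, 15, 6, 8, 5, 9, 8, 25, 11]
minimum, q1, med, q3, maximum = five_number_summary(points)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in points if x < lower_fence or x > upper_fence]
# Whiskers end at the most extreme values that are still inside the fences.
inside = [x for x in points if lower_fence <= x <= upper_fence]
whiskers = (min(inside), max(inside))
print(minimum, q1, med, q3, maximum, lower_fence, upper_fence, outliers, whiskers)
```

In a modified boxplot the upper whisker stops at the largest score inside the fences, and any flagged score is drawn as a separate point.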

Using the TI to make boxplots
Similar to plotting histograms. Enter the data in lists, then turn on stat plots:
• STAT PLOT -> Plot 1 -> ENTER
• Select On -> ENTER
• Select the first boxplot pictured (this is a modified boxplot, which indicates outliers, rather than a standard boxplot, whose whiskers extend to the maximum and minimum).
• Xlist (L1 or L2)
• Freq (will be 1 if all data is entered; may be another list if using a frequency table)
Use Plot 2 to display another data set.
ZOOM -> 9 (ZoomStat)
Use TRACE to explore the boxplot.
Weekly Salaries of Mooseburgers Employees:
123 136 144 150 110 131 140 160 120 130

StatCrunch and Boxplots
• Enter the data and go to Graphics → Boxplot.
• Click on the variable and Next.
• Check “Use fences to identify outliers.” Then Next.
• Type in labels and click on Create Graph.

3.6 The Center of Symmetric Distributions: The Mean

The Mean
• The Mean is what most people think of as the average.
• Add up all the numbers and divide by the number of numbers.
• Recall that Σ means “Add them all.”
• In StatCrunch, the mean is listed in the Summary Statistics.

The Mean is the “Balancing Point”
• If you put your finger on the mean, the histogram will balance perfectly.

Mean vs. Median
• For symmetric distributions, the mean and the median are equal.
  • The balancing point is at the center.
• For skewed distributions, the tail “pulls” the mean toward it more than it does the median.
• The mean is more sensitive to outliers than the median.

The Mean Is Attracted to the Outlier
• The mean is larger than the median since it is “pulled” to the right by the outlier.
• The median is a better measure of the center for data that is skewed.
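The pull of an outlier on the mean can be seen numerically. The sketch below reuses the dinner-party ages from the Range slide (16, 18, 23, 27, 35, 74) as a stand-in for the pictured data, which is an assumption on our part, and compares the mean and median with and without the 74-year-old guest using Python's standard statistics module.

```python
from statistics import mean, median

ages = [16, 18, 23, 27, 35, 74]
ages_without_74 = [a for a in ages if a != 74]

mean_all, median_all = mean(ages), median(ages)
mean_trimmed, median_trimmed = mean(ages_without_74), median(ages_without_74)

# How far each measure moves when the outlier is removed.
mean_shift = mean_all - mean_trimmed
median_shift = median_all - median_trimmed
print(round(mean_all, 1), median_all, round(mean_trimmed, 1), median_trimmed)
```

The mean moves by several years when the outlier is dropped, while the median moves by only two.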

Why Use the Mean?
• Although the median is a better measure of the center for skewed data, the mean weighs in large and small values better.
• The mean is easier to work with.
• For symmetric data, statisticians would rather use the mean.

3.7 The Spread of Symmetric Distributions: The Standard Deviation

The Variance
• The variance is a measure of how far the data is spread out from the mean.
• The difference from the mean is: y - ȳ.
• To make it positive, square it.
• Then find the average of all of these squared distances, except instead of dividing by n, divide by n - 1.
• Use s² to represent the variance.
• The variance will mostly be used to find the standard deviation s, which is the square root of the variance.

Standard Deviation
• The variance’s units are the square of the original units.
• Taking the square root of the variance gives the standard deviation, which will have the same units as y.
• The standard deviation is a number that is close to the average distance of the y values from the mean.
• If data values are close to the mean (less spread out), then the standard deviation will be small.
• If data values are far from the mean (more spread out), then the standard deviation will be large.
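The recipe for the variance and standard deviation (subtract the mean, square, add up, divide by n - 1, take the square root) can be written out step by step. The sketch below is a Python illustration on a small made-up list of five quiz scores; the scores and the function name sample_variance are hypothetical, and the result is compared with the standard library's statistics.variance as a cross-check.

```python
from math import sqrt
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    y_bar = sum(values) / n
    return sum((y - y_bar) ** 2 for y in values) / (n - 1)

scores = [8, 10, 10, 10, 12]      # hypothetical quiz scores with mean 10
s2 = sample_variance(scores)      # variance, in squared points
s = sqrt(s2)                      # standard deviation, in points
library_s2 = statistics.variance(scores)

no_spread = sample_variance([10, 10, 10, 10, 10])
print(s2, round(s, 3), library_s2, no_spread)
```

A list in which every value equals the mean has variance 0, which matches the idea that a standard deviation of 0 means no spread at all.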

The Standard Deviation and Histograms
• Order the histograms below from smallest standard deviation to largest standard deviation.
Figure panels: A, B, C
Answer: C, A, B

The Standard Deviation (cont.)
A class has been divided into groups of five students each. Each group completed an independent study project and then took an individual 20-point pop quiz. Their scores are reported by group:
1 2 3 4 5 6
10 8 0 0 0 4 10 10 10 8 2 6 10 10 10 8 10 10 10 12 18 14 10 12 20 20 20 18
• Note that all groups had a mean of 10.
• Notice that the SD for group 1 is 0.
• What are the other standard deviations?

Recall… Using the TI to calculate summary statistics
To enter raw data in L1:
• STAT -> EDIT [1]
• With the cursor on L1, press [CLEAR] to delete old values.
• Fill the list with individual values.
To calculate summary statistics:
• STAT -> CALC [1] [L1] [ENTER]
• (L1 is found by pressing [2nd] [1])
• Scroll down to find the median, quartiles, min, and max.
To enter a frequency distribution, enter the values in L1 and the frequency counts in L2. To calculate summary statistics use:
• STAT -> CALC [1] [L1] [ , ] [L2] [ENTER]

Recall… Using StatCrunch to calculate summary statistics
• Load the data.
• Stat → Summary Stats → Columns (select column)

3.8 Summary: What to Tell About a Quantitative Variable

What to Tell
• Histogram, Stem-and-Leaf, Boxplot
  • Describe modality, symmetry, outliers
• Center and Spread
  • Median and IQR if not symmetric
  • Mean and Standard Deviation if symmetric
  • For unimodal symmetric data, expect IQR > s. If not, check for errors.
• Unusual Features
  • For multiple modes, possibly split the data into groups.
  • When there are outliers, report the mean and standard deviation with and without the outliers.

Example: Fuel Efficiency
The car owner has checked the fuel efficiency each time he filled the tank. How would you describe the fuel efficiency?
• Plan: Summarize the distribution of the car’s fuel efficiency.
• Variable: mpg for 100 fill-ups, quantitative
• Mechanics: Show a histogram
  • Fairly symmetric
  • Low outlier

Fuel Efficiency Continued
• Which to report?
  • The mean and median are close.
  • Report the mean and standard deviation.
• Conclusion
  • Distribution is unimodal and symmetric.
  • Mean is 22.4 mpg.
  • Low outlier may be investigated, but it has limited effect on the mean.
  • s = 2.45; from one filling to the next, fuel efficiency differs from the mean by an average of about 2.45 mpg.

What Can Go Wrong?
• Don’t make a histogram for categorical data.
• Don’t look for shape, center, and spread for a bar chart.
• Choose a bin width appropriate for the data.

What Can Go Wrong? Continued
• Do a reality check.
  • Don’t blindly trust your calculator. For example, a mean student age of 193 years old is nonsense.
• Sort before finding the median and percentiles.
  • 315, 8, 2, 49, 97 does not have a median of 2.
• Don’t worry about small differences in the quartile calculation.
• Don’t compute numerical summaries for a categorical variable.
  • The mean Social Security number is meaningless.

What Can Go Wrong? Continued
• Don’t report too many decimal places.
  • Citing the mean fuel efficiency as 22.417822453 is going overboard.
• Don’t round in the middle of a calculation.
• For multiple modes, think about separating groups.
  • Heights of people → Separate men and women
• Beware of outliers; the mean and standard deviation are sensitive to outliers.
• Use a histogram or dotplot to ensure that the mean and standard deviation really do describe the data.

Practice
Recall: Suppose a basketball player scored the following number of points in his last 15 games:
4, 4, 3, 4, 7, 16, 12, 15, 6, 8, 5, 9, 8, 25, 11
• Describe the shape of the distribution (modality, skew, and unusual features).
• What measures of center or spread would be most appropriate for this data set?

Practice #26
A meteorologist preparing a talk about global warming compiled a list of weekly low temperatures (in degrees Fahrenheit) he observed at his south Florida home last year. The coldest temperature for any week was 36°F, but he inadvertently recorded the Celsius value of 2 degrees. Assuming he correctly listed all the other temperatures, explain how this error will affect these summary statistics:
• Measures of center: mean and median
• Measures of spread: range, IQR, and standard deviation

Practice
The table displays the heights (in inches) of 130 members of a choir.
a) Find the median and IQR.
b) Find the mean and standard deviation.
c) Display these data with a histogram.
d) Write a few sentences describing the distribution.

Height | Count
60 | 2
61 | 6
62 | 9
63 | 7
64 | 5
65 | 20
66 | 18
67 | 7
68 | 12
69 | 5
70 | 11
71 | 8
72 | 9
73 | 4
74 | 2
75 | 4
76 | 1
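Summary statistics for a frequency table like this one can be found by expanding the table into one value per person, which mirrors entering heights in L1 and counts in L2 on the TI. The sketch below is one way to do that in Python with the standard statistics module; quartiles are taken as medians of the lower and upper halves, which is an assumed convention.

```python
from statistics import mean, median, stdev

height_counts = {60: 2, 61: 6, 62: 9, 63: 7, 64: 5, 65: 20, 66: 18, 67: 7,
                 68: 12, 69: 5, 70: 11, 71: 8, 72: 9, 73: 4, 74: 2, 75: 4, 76: 1}

# Expand the table into a sorted list with one entry per choir member.
heights = [h for h, count in sorted(height_counts.items()) for _ in range(count)]

half = len(heights) // 2
med = median(heights)
q1 = median(heights[:half])    # median of the lower half
q3 = median(heights[-half:])   # median of the upper half
iqr = q3 - q1

mean_height = mean(heights)
sd_height = stdev(heights)     # sample standard deviation (divides by n - 1)
print(len(heights), med, q1, q3, iqr, round(mean_height, 2), round(sd_height, 2))
```

The length of the expanded list is a quick reality check that the counts were entered correctly.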

Practice
During his 20 seasons in the NHL, Wayne Gretzky scored 50% more points than anyone who ever played professional hockey. Here are the number of games he played during each season:
79, 80, 80, 74, 80, 79, 64, 78, 73, 78, 74, 45, 81, 48, 80, 82, 70
a) Create a stem-and-leaf display, using split stems.
b) Describe the shape of the distribution.
c) Describe the center and spread of the distribution.
d) What unusual features do you see? What might explain this?

Example: weights of pennies (grams)
2.57 2.56 3.14 3.03 3.13 2.47 2.43 3.11 3.06 2.48 2.51 2.50 3.07 3.01 2.45 2.50 3.13 3.08 2.51 3.12 3.10 3.08 2.46 2.44 2.47 2.54 3.09 3.13 2.56 2.49
- Create a histogram using bins which are 0.10 grams wide (use StatCrunch). Be sure to label your axes. What can be said about the distribution?
- In fact we have TWO different distributions here because in the early 1980s the mint changed from copper to zinc. Let's separate our data into two groups.
- If we want to compare the two distributions, would it be more appropriate to use mean and sd as measures of center and spread, or median and IQR?
- Calculate the median, quartiles, and IQR for the data (separated by group). Calculate the mean and sd (using your calculator or StatCrunch).
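Separating the two groups can be done by picking a cutoff in the gap between the two humps. The sketch below is a Python illustration; the cutoff of 2.8 grams is an assumption chosen because no weight in the list falls near it, not a value given in the course materials.

```python
from statistics import mean, median, stdev

weights = [2.57, 2.56, 3.14, 3.03, 3.13, 2.47, 2.43, 3.11, 3.06, 2.48,
           2.51, 2.50, 3.07, 3.01, 2.45, 2.50, 3.13, 3.08, 2.51, 3.12,
           3.10, 3.08, 2.46, 2.44, 2.47, 2.54, 3.09, 3.13, 2.56, 2.49]

CUTOFF = 2.8  # assumed split point, chosen inside the gap in the data

lighter = [w for w in weights if w < CUTOFF]
heavier = [w for w in weights if w >= CUTOFF]

for name, group in (("lighter", lighter), ("heavier", heavier)):
    print(name, len(group),
          round(median(group), 3), round(mean(group), 3), round(stdev(group), 3))
```

Summarizing each group on its own avoids reporting a single mean that falls in the gap where no penny actually weighs in.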

Using StatCrunch and/or your TI
• Pick one of our class variables from our class survey data set.
• Create a histogram with appropriately sized bins.
• Describe the distribution.
• Calculate the median, quartiles, and interquartile range.
• Calculate the mean and standard deviation.
• Decide which measure of center and spread is most appropriate for the data, and why.