Exploring Data Chapter 1


Patterns from Histogram A
Center: the value that divides the observations roughly in half
Spread (variability): the extent of the data from smallest to largest value

Histogram A example
center: 35, spread: 25 to 45
[Histogram; horizontal axis from 0 to 100]
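As a minimal sketch of how center and spread could be read off numerically, the Python below takes the median as the center and the smallest and largest values as the spread. The data list is hypothetical (it is not from the slides) and was chosen only so that the result matches the example's center of 35 and spread of 25 to 45.

```python
from statistics import median

# Hypothetical observations (not from the slides), chosen for illustration.
data = [25, 28, 31, 33, 34, 35, 36, 37, 39, 42, 45]

center = median(data)            # value that splits the observations roughly in half
spread = (min(data), max(data))  # extent from smallest to largest value

print(f"center: {center}, spread: {spread[0]} to {spread[1]}")
```

The median is one reasonable reading of "the value that divides the observations roughly in half"; on a real histogram the center is usually estimated by eye.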

Histogram A example
center, spread
[Histogram; horizontal axis from 0 to 100]

Patterns from Histogram B
Center: the value that divides the observations roughly in half
Spread (variability): the extent of the data from smallest to largest value
Shape: overall appearance of distribution

Histogram B examples (each histogram drawn on a horizontal axis from 0 to 100)
skewed right
skewed left
symmetrical, mound shaped
uniform
bimodal
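To make the idea of shape concrete, here is a small sketch that counts observations in class intervals of width 10 and prints a text histogram. The data are hypothetical (not from the slides) and were made up to give a right-skewed picture: most values are low, with a tail stretching toward larger values.

```python
# Hypothetical right-skewed observations on a 0 to 100 scale (not from the slides).
data = [5, 8, 12, 14, 15, 17, 21, 22, 24, 26, 33, 35, 41, 48, 57, 72]

width = 10
counts = [0] * 10  # class intervals [0,10), [10,20), ..., [90,100)
for x in data:
    counts[min(int(x // width), 9)] += 1  # a value of exactly 100 goes in the last class

# One row per class interval; the length of each bar is the frequency.
for i, c in enumerate(counts):
    print(f"{i * width:3d}-{i * width + width:<3d} {'*' * c}")
```

Reading the bars from top to bottom corresponds to reading a histogram from left to right, so a long run of short bars at the bottom indicates a right skew.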

Patterns from Histogram C
Center: the value that divides the observations roughly in half
Spread (variability): the extent of the data from smallest to largest value
Shape: overall appearance of distribution
Unusual features: gaps/clusters and outliers

Histogram C examples (each histogram drawn on a horizontal axis from 0 to 100)
roughly symmetrical with gaps at 30 and 40
uniform with possible outlier at 5
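One way to look for gaps programmatically is to find empty class intervals that lie between occupied ones. The sketch below does this for hypothetical data (not from the slides) using classes of width 10; both the data and the class width are assumptions made for illustration.

```python
# Hypothetical observations (not from the slides).
data = [12, 15, 21, 24, 26, 28, 41, 43, 45, 47, 52, 55, 71]

width = 10
counts = [0] * 10  # class intervals [0,10), [10,20), ..., [90,100)
for x in data:
    counts[min(int(x // width), 9)] += 1

# A gap is an empty class lying between the first and last occupied classes.
occupied = [i for i, c in enumerate(counts) if c > 0]
gaps = [(i * width, (i + 1) * width)
        for i in range(occupied[0], occupied[-1] + 1)
        if counts[i] == 0]

print("gaps:", gaps)
```

In this made-up data the single value 71 sits beyond an empty class, which is the kind of observation one might flag as a possible outlier.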

Displaying Distributions with Graphs
categorical versus quantitative
categorical: bar graphs, pie charts
quantitative: dotplots, histograms, stemplots, boxplots

Frequency Distributions
A frequency distribution is a table that displays the categories, frequencies, relative frequencies and/or cumulative relative frequencies. The frequency for a particular category is the number of observed responses that fall into that category. The corresponding relative frequency is the fraction or proportion of observed responses in the category. The cumulative relative frequency is the fraction or proportion of observed responses in all categories up to and including the current one.
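The three quantities defined above can be sketched in a few lines of Python. The responses below are hypothetical (not from the slides), and listing the categories in alphabetical order is an arbitrary choice for the cumulative column.

```python
from collections import Counter

# Hypothetical categorical responses (not from the slides).
responses = ["bus", "car", "car", "walk", "car", "bus", "bike", "car", "walk", "car"]

counts = Counter(responses)
n = len(responses)

# Rows of (category, frequency, relative frequency, cumulative relative frequency).
table = []
running = 0
for category in sorted(counts):
    freq = counts[category]
    running += freq  # responses in all categories so far, including this one
    table.append((category, freq, freq / n, running / n))

print(f"{'category':<10}{'freq':>6}{'rel freq':>10}{'cum rel freq':>14}")
for category, freq, rel, cum in table:
    print(f"{category:<10}{freq:>6}{rel:>10.2f}{cum:>14.2f}")
```

The last cumulative relative frequency always comes out to 1, which is a quick check that no response was dropped.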

Creating Histograms
The difficulty with continuous data is that there are no natural categories, so we must define our own categories, or class intervals. The quantity √n, where n is the number of observations, often gives a rough estimate of an appropriate number of intervals. Alternatively, Sturges's Rule takes 1 + log2(n) classes, rounded to the nearest integer.
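The formulas on this slide did not survive extraction, so the sketch below rests on an assumption: that the two rules meant are the square-root rule (about √n classes) and Sturges's Rule in its usual form (1 + log2 n classes), where n is the number of observations. The function names and the value n = 30 are hypothetical.

```python
import math

def sqrt_rule(n: int) -> int:
    """Rough number of class intervals: square root of n, rounded (assumed rule)."""
    return round(math.sqrt(n))

def sturges_rule(n: int) -> int:
    """Sturges's Rule in its usual form: 1 + log2(n), rounded to the nearest integer."""
    return round(1 + math.log2(n))

n = 30  # hypothetical number of observations
print("square-root rule:", sqrt_rule(n))
print("Sturges's Rule:  ", sturges_rule(n))
```

Both results are only starting points; the final choice of class intervals usually also favors round, easy-to-read boundaries.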

Example

Exit  Name                    Miles    Exit  Name                    Miles
1     Ohio Gateway            *        16    Carlisle                25
1A    New Castle              8        17    Gettysburg Pike         9.8
2     Beaver Valley           3.4      18    Harrisburg West Shore   5.9
3     Cranberry               15.6     19    Harrisburg East         5.4
4     Butler Valley           10.7     20    Lebanon-Lancaster       19
5     Allegheny Valley        8.6      21    Reading                 19.1
6     Pittsburgh              8.9      22    Morgantown              12.8
7     Irwin                   10.8     23    Downingtown             13.7
8     New Stanton             8.1      24    Valley Forge            14.3
9     Donegal                 15.2     25    Norristown              6.8
10    Somerset                19.2     26    Fort Washington         5.4
11    Bedford                 35.6     27    Willow Grove            4.4
12    Breezewood              15.9     28    Philadelphia            8.4
13    Fort Littleton          18.1     29    Delaware Valley         6.4
14    Willow Hill             9.1      30    Delaware Valley         6.4
15    Blue Mountain           12.7
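As an illustration of defining class intervals for continuous data, the sketch below groups the 30 distances from the table (the first exit has no distance, marked *) into classes of width 5 miles. The class width is an illustrative choice, not one stated in the slides.

```python
# Distances in miles from the table above, in exit order (the starred entry is omitted).
miles = [8, 3.4, 15.6, 10.7, 8.6, 8.9, 10.8, 8.1, 15.2, 19.2,
         35.6, 15.9, 18.1, 9.1, 12.7, 25, 9.8, 5.9, 5.4, 19,
         19.1, 12.8, 13.7, 14.3, 6.8, 5.4, 4.4, 8.4, 6.4, 6.4]

width = 5                 # assumed class width, chosen for round boundaries
counts = [0] * 8          # class intervals [0,5), [5,10), ..., [35,40)
for x in miles:
    counts[int(x // width)] += 1  # a boundary value such as 25 goes in the upper class

for i, c in enumerate(counts):
    print(f"{i * width:2d} to <{(i + 1) * width:2d}  {'*' * c}")
```

With these classes the bulk of the distances falls between 5 and 20 miles, and the 35.6-mile stretch stands apart from the rest.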

Stemplots
Median: 62
Spread: from 22 to 91
Fairly symmetrical
No unusual features
[Stemplot; digits as extracted, row layout lost: 9 1 8 4 7 2 6 8 5 2 4 7 3 5 2 2 5 3 5 5 1 9 6 8 5 6]
Key: 2|2 means 22 wpm

Alfred Hitchcock Stemplot
[Stemplot; digits as extracted, row layout lost: 13 0 6 2 12 0 0 0 6 8 11 9 6 6 3 1 7 10 5 8 3 8 8 1 3 9 8 1]
Key: 8|1 means 81 minutes

Split Stemplot
As with a histogram, we want to avoid having too many data points in a small range.
Example: the ages at which a sample of 35 American mothers first gave birth
[Stemplot; stems from top to bottom: 4, 3, 2, 1; leaf digits as extracted, row layout lost: 0 1 2 3 0 0 1 1 1 2 3 3 4 4 6 7 8 8 4 6 6 6 7 7 8 8 8 9 9 9]
Key: 1|4 means 14 years old

Split Stemplot
A split stemplot typically breaks each stem into High (5-9) and Low (0-4).
[Split stemplot; stems from top to bottom: 4L, 3H, 3L, 2H, 2L, 1H, 1L; leaf digits as extracted, row layout lost: 0 1 2 3 0 0 1 1 1 2 3 3 4 4 6 7 8 8 4 6 6 6 7 7 8 8 8 9 9 9]
Key: 1|4 means 14 years old
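The High/Low split can be sketched as follows. The ages are hypothetical (they are not the sample from the slides), and the row labels such as 1L and 1H simply follow the notation used above.

```python
from collections import defaultdict

# Hypothetical ages (not the sample from the slides).
ages = [14, 16, 17, 18, 19, 19, 20, 21, 21, 23, 24, 26, 27, 28, 30, 31, 33, 36, 40]

rows = defaultdict(list)
for age in sorted(ages):
    stem, leaf = divmod(age, 10)
    half = "L" if leaf <= 4 else "H"  # Low holds leaves 0-4, High holds leaves 5-9
    rows[(stem, half)].append(leaf)

lines = []
# Largest stem on top, with the High half above the Low half.
for stem in range(max(ages) // 10, min(ages) // 10 - 1, -1):
    for half in ("H", "L"):
        leaves = rows.get((stem, half), [])
        lines.append(f"{stem}{half} | " + " ".join(str(d) for d in leaves))

print("\n".join(lines))
print("Key: 1|4 means 14 years old")
```

This version prints every half-stem between the smallest and largest stem, including empty ones, so that gaps remain visible; an empty row at the very top or bottom could be dropped for display.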

Back to Back Stemplots
[Back-to-back stemplot; digits as extracted, row layout lost: 1 6 0 5 4 9 4 4 1 6 7 6 9 6 1 9 3 8 6 3 3 5 4 2 5 2 4 6 3 8 1 0]
Key: 4 | 1 means 41

Babe Ruth vs. Roger Maris
Generally, we can see that Babe Ruth hit more home runs than Roger Maris. Ruth's center, at 46 home runs, is higher than Maris's, at 24.5. Maris has an outlier at 61, while Ruth has no outliers. Ruth has a wider spread, from 22 to 60, than Maris, from 8 to 39 if we exclude the outlier. Both distributions are fairly symmetrical.
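The comparison above can be checked with a short sketch that prints a back-to-back stemplot and the two medians. The season totals used here are an assumption: they are the home-run counts commonly quoted for this comparison (Ruth's 15 seasons with the Yankees and Maris's 10 American League seasons), since the slide's own data did not survive extraction. They do agree with the medians and ranges stated above.

```python
from statistics import median

# Assumed season home-run totals (not recoverable from the slides themselves).
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
maris = [8, 13, 14, 16, 23, 26, 28, 33, 39, 61]

def leaves(data, stem):
    """Sorted leaf digits of the values in data that share the given stem."""
    return sorted(x % 10 for x in data if x // 10 == stem)

print(f"{'Ruth':>14} |   | Maris")
for stem in range(0, 7):
    # Ruth's leaves are reversed so that they increase outward from the stem.
    left = " ".join(str(d) for d in reversed(leaves(ruth, stem)))
    right = " ".join(str(d) for d in leaves(maris, stem))
    print(f"{left:>14} | {stem} | {right}")

print("Ruth median:", median(ruth), "range:", min(ruth), "to", max(ruth))
print("Maris median:", median(maris), "range:", min(maris), "to", max(maris))
```

Sharing one column of stems puts both players on the same scale, which is what makes the difference in center and the isolated 61 easy to see.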