HISTOGRAMS: Representing Data

Why use a Histogram?
• When there is a lot of data
• When the data is
  • Continuous: a mass, height, volume, time, etc.
  • Presented in a Grouped Frequency Distribution, often in groups or classes that are UNEQUAL

Histograms look like this...
• NO GAPS between bars
• Continuous data

Bars may be different in width
• Determined by the Grouped Frequency Distribution

So we use FREQUENCY DENSITY
FREQUENCY DENSITY = Frequency / Class width
AREA is proportional to FREQUENCY, NOT height, because of UNEQUAL classes!

Grouped Frequency Distribution

Classes: Speed, km/h    Frequency
0 < v ≤ 40              80
40 < v ≤ 50             15
50 < v ≤ 60             25
60 < v ≤ 90             90
90 < v ≤ 110            30

These classes are well defined: there are no gaps!

Drawing
• Sensible scales
• Bases of rectangles correctly aligned
  • Plot the Class Boundaries carefully
• Heights of rectangles need to be correct
  • Frequency Density

Frequency Densities

Speed, km/h     Class width    Frequency    Frequency Density
0 < v ≤ 40      40             80           2.0
40 < v ≤ 50     10             15           1.5
50 < v ≤ 60     10             25           2.5
60 < v ≤ 90     30             90           3.0
90 < v ≤ 110    20             30           1.5
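As a minimal sketch (not part of the original slides), the frequency densities in the speed table can be reproduced in Python. The function name `frequency_density` and the tuple layout are illustrative assumptions.

```python
def frequency_density(frequency, class_width):
    """Frequency density = frequency / class width."""
    if class_width <= 0:
        raise ValueError("class width must be positive")
    return frequency / class_width


# (lower bound, upper bound, frequency) for each speed class, in km/h
speed_classes = [(0, 40, 80), (40, 50, 15), (50, 60, 25), (60, 90, 90), (90, 110, 30)]

densities = [frequency_density(f, hi - lo) for lo, hi, f in speed_classes]

# The area of each bar (width x height) gives back the class frequency
areas = [(hi - lo) * d for (lo, hi, _), d in zip(speed_classes, densities)]

for (lo, hi, f), d in zip(speed_classes, densities):
    print(f"{lo} < v <= {hi}: width {hi - lo}, frequency {f}, density {d}")
```

The `areas` list is included only to show that bar area, not bar height, carries the frequency.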

Frequency = Width × Height
Frequency = 40 × 2.0 = 80
[Histogram: Frequency Density (0 to 3.0) against Speed (km/h) (0 to 120)]

Grouped Frequency Distribution

GAPS! Need to adjust to continuous:

Time taken (nearest minute)    Freq
5-9                            14
10-19                          9
20-29                          18
30-39                          3
40-59                          5

No gaps, ready to graph:

Classes: Speed, km/h    Frequency
0 < v ≤ 40              80
40 < v ≤ 50             15
50 < v ≤ 60             25
60 < v ≤ 90             90
90 < v ≤ 110            30

Adjusting Classes

Time taken (nearest minute)    Class boundaries    Freq    Class width
5-9                            4½ to 9½            14      5
10-19                          9½ to 19½           9       10
20-29                          19½ to 29½          18      10
30-39                          29½ to 39½          3       10
40-59                          39½ to 59½          5       20
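The adjustment from rounded class limits to continuous class boundaries can be sketched as follows. This is an illustration under the assumption that times were recorded to the nearest whole minute; `class_boundaries` and its `precision` parameter are hypothetical names, not from the slides.

```python
def class_boundaries(lower_limit, upper_limit, precision=1):
    """Return the continuous boundaries of a class whose values were
    rounded to the nearest `precision` (here, the nearest minute)."""
    half = precision / 2
    return lower_limit - half, upper_limit + half


# Class limits as stated in the table (nearest minute)
time_limits = [(5, 9), (10, 19), (20, 29), (30, 39), (40, 59)]

boundaries = [class_boundaries(lo, hi) for lo, hi in time_limits]
widths = [upper - lower for lower, upper in boundaries]

# After adjusting, each class ends exactly where the next begins (no gaps)
no_gaps = all(boundaries[i][1] == boundaries[i + 1][0]
              for i in range(len(boundaries) - 1))

for (lo, hi), (lower, upper), w in zip(time_limits, boundaries, widths):
    print(f"{lo}-{hi}: {lower} to {upper}, width {w}")
```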

Frequency Density

Time taken (nearest minute)    Freq    Class width    Frequency Density
5-9                            14      5              2.8
10-19                          9       10             0.9
20-29                          18      10             1.8
30-39                          3       10             0.3
40-59                          5       20             0.25

Drawing
• Sensible scales
• Bases correctly aligned
  • Plot the Class Boundaries
• Heights correct
  • Frequency Density

[Histogram: Frequency Density (0 to 3.0) against Time (mins), horizontal axis marked at 4.5, 9.5, 19.5, 29.5, 39.5, 49.5 and 59.5]

Estimating a Frequency
• Imagine we want to estimate the number of people with a time between 12 and 25 mins
• Because we have rounded to the nearest minute with our classes, we...
• Consider the interval from 11.5 to 25.5

[Histogram: Frequency Density against Time (mins), with the interval from 11.5 to 25.5 marked]
Frequency = FD × Width
Frequency = 0.9 × 8 = 7.2
Frequency = 1.8 × 6 = 10.8
Total Frequency = 18
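The estimate for the interval 11.5 to 25.5 can be checked with a short Python sketch. It assumes, as the area method does, that values are spread evenly within each class; `estimate_frequency` is an illustrative name rather than anything from the slides.

```python
def estimate_frequency(classes, start, end):
    """Estimate the frequency between `start` and `end` as the area
    under the histogram over that interval.

    `classes` holds (lower boundary, upper boundary, frequency) tuples.
    """
    total = 0.0
    for lower, upper, frequency in classes:
        # Width of the part of this class that lies inside [start, end]
        overlap = min(end, upper) - max(start, lower)
        if overlap > 0:
            density = frequency / (upper - lower)
            total += density * overlap
    return total


# Time classes with adjusted boundaries (minutes)
time_classes = [(4.5, 9.5, 14), (9.5, 19.5, 9), (19.5, 29.5, 18),
                (29.5, 39.5, 3), (39.5, 59.5, 5)]

estimate = estimate_frequency(time_classes, 11.5, 25.5)
print(round(estimate, 1))  # 0.9 x 8 + 1.8 x 6
```

Only the 10-19 and 20-29 classes overlap the interval, which is why just two rectangles contribute to the total.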

We can estimate the Mode

Time taken (nearest minute)    Freq    CF
5-9                            14      14
10-19                          9       23
20-29                          18      41
30-39                          3       44
40-59                          5       49

Mode is therefore in this class

[Histogram: Frequency Density (0 to 3.0) against Time (mins), with the modal class marked]
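A hedged sketch of how a modal class could be picked out in code. It assumes the common convention that, with unequal class widths, the modal class is the one with the greatest frequency density (the tallest bar); the class highlighted on the original slide is not recoverable from this transcript, so this is an illustration of that convention only. `modal_class` is a hypothetical helper name.

```python
def modal_class(classes):
    """Return the (lower, upper, frequency) class with the greatest
    frequency density. Assumption: with unequal widths, the modal class
    is the tallest bar, not the class with the largest raw frequency."""
    return max(classes, key=lambda c: c[2] / (c[1] - c[0]))


# Time classes with adjusted boundaries (minutes)
time_classes = [(4.5, 9.5, 14), (9.5, 19.5, 9), (19.5, 29.5, 18),
                (29.5, 39.5, 3), (39.5, 59.5, 5)]

by_density = modal_class(time_classes)
by_frequency = max(time_classes, key=lambda c: c[2])

print("greatest frequency density:", by_density)
print("greatest raw frequency:", by_frequency)
```

The two selections differ for this data, which is why the choice of convention needs to be stated.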

…and the other one?

Speed, km/h     Frequency
0 < v ≤ 40      80
40 < v ≤ 50     15
50 < v ≤ 60     25
60 < v ≤ 90     90
90 < v ≤ 110    30

• No adjustments required: class widths friendly
  • No ½ values
• Simpler to plot
• Estimation from the EXACT values given
  • No adjustment required
  • Estimate 15 to 56 would use 15 and 56!
• Appear LESS OFTEN in the exam

Why use frequency density for the vertical axis of a Histogram?
• The effect of unequal class sizes on the histogram can lead to misleading ideas about the data distribution
• The vertical axis is Frequency Density

Example: Misprediction of Grade Point Average (GPA)

The following table displays the differences between predicted GPA and actual GPA. Positive differences result when predicted GPA > actual GPA.

Class Interval      Frequency    Class width
-2.0 to < -0.4      23           1.6
-0.4 to < -0.2      55           0.2
-0.2 to < -0.1      97           0.1
-0.1 to < 0         210          0.1
0 to < 0.1          189          0.1
0.1 to < 0.2        139          0.1
0.2 to < 0.4        116          0.2
0.4 to < 2.0        171          1.6
Total               1000

[Frequency histogram, with annotations "17.1% of data" and "2.3% of data"]

The frequency histogram considerably exaggerates the incidence of overpredicted and underpredicted values. The areas of the two most extreme rectangles are much too large!

Example: Density Histogram of Misprediction of GPA

Class Interval      Frequency    Class width    Frequency Density
-2.0 to < -0.4      23           1.6            14
-0.4 to < -0.2      55           0.2            275
-0.2 to < -0.1      97           0.1            970
-0.1 to < 0         210          0.1            2100
0 to < 0.1          189          0.1            1890
0.1 to < 0.2        139          0.1            1390
0.2 to < 0.4        116          0.2            580
0.4 to < 2.0        171          1.6            107

To avoid a misleading histogram like the one on the last slide, display the data with frequency density.
Frequency = (rectangle height) × (class width) = area of rectangle
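A short Python sketch of the GPA example, using the frequencies and class widths from the table. It is an illustration rather than the original author's working; the variable names are assumptions, and densities are rounded to whole numbers as in the table.

```python
# (class interval, frequency, class width) from the GPA table
gpa_classes = [
    ("-2.0 to < -0.4", 23, 1.6),
    ("-0.4 to < -0.2", 55, 0.2),
    ("-0.2 to < -0.1", 97, 0.1),
    ("-0.1 to < 0", 210, 0.1),
    ("0 to < 0.1", 189, 0.1),
    ("0.1 to < 0.2", 139, 0.1),
    ("0.2 to < 0.4", 116, 0.2),
    ("0.4 to < 2.0", 171, 1.6),
]

total = sum(f for _, f, _ in gpa_classes)

# Frequency density = frequency / class width, rounded as in the table
densities = [round(f / w) for _, f, w in gpa_classes]

# Share of the data in the two extreme (widest) classes
lowest_share = 100 * gpa_classes[0][1] / total
highest_share = 100 * gpa_classes[-1][1] / total

for (label, f, w), d in zip(gpa_classes, densities):
    print(f"{label}: frequency {f}, width {w}, density {d}")
```

Plotted by raw frequency, the widest class (171) looks larger than the "0.1 to < 0.2" class (139); plotted by density, its bar is far lower (107 against 1390), which matches its small share of the data per unit of GPA difference.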

[Density histogram: vertical axis labelled "Frequency density × 10⁻³"]

Principles of Excellent Graphs
• The graph should not distort the data.
• The graph should not contain unnecessary things (sometimes referred to as chart junk).
• The scale on the vertical axis should begin at zero.
• All axes should be properly labelled.
• The graph should contain a title.
• The simplest possible graph should be used for a given set of data.

Graphical Errors: Chart Junk
[Figure: Bad Presentation and Good Presentation of the Minimum Wage. Values shown: 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80. Axes: $ (0, 2, 4) against year (1960, 1970, 1980, 1990).]

Graphical Errors: No Relative Basis
[Figure: A's received by students. Bad Presentation: Freq. (0 to 300) for FD, UG, GR, SR. Good Presentation: % (0% to 30%) for FD, UG, GR, SR.]
FD = Foundation, UG = UG Dip, GR = Grad Dip, SR = Senior

Graphical Errors: Compressing the Vertical Axis
[Figure: Quarterly Sales ($) for Q1 to Q4. Bad Presentation: vertical axis from 0 to 200. Good Presentation: vertical axis from 0 to 50.]

Graphical Errors: No Zero Point on the Vertical Axis
[Figure: Monthly Sales ($) for J, F, M, A, M, J. Bad Presentation: vertical axis marked 36, 39, 42, 45 with no zero point. Good Presentations: vertical axis marked 0, 36, 39, 42, 45.]
Graphing the first six months of sales