2: Frequency distributions Stemplot, frequency tables, histograms 3/4/2021 Frequency Distributions 1
Stem-and-leaf plots (stemplots)
Analyses start by exploring data with pictures.
My favorite technique is the stemplot: a histogram-like display of data points.
"You can observe a lot by looking." (Yogi Berra)

Illustrative example: sample.sav
A simple random sample (SRS) of AGE (in years). Data as an ordered array (n = 10):
  05 11 21 24 27 28 30 42 50 52
Divide each data point into:
  Stem values: the first one or two digits
  Leaf values: the next digit
In this example:
  Stem values: tens place
  Leaf values: ones place
  e.g., 21 has a stem value of 2 and a leaf value of 1

Stemplot (cont.)
Draw a stem-like axis from the lowest to the highest stem:
  0|
  1|
  2|1
  3|
  4|
  5|
  ×10  (axis multiplier; important!)
Place each leaf next to its stem; here the value 21 has been plotted.

Continue plotting, then rearrange the leaves in rank order:
  0|5
  1|1
  2|1478
  3|0
  4|2
  5|02
  ×10
For discussion, let's rotate the plot.
Rotated stemplot:
      8
      7
      4     2
  5 1 1 0 2 0
  -----------
  0 1 2 3 4 5  (×10)

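The plotting steps can be sketched in Python. This is a minimal illustration, not the SPSS procedure: the function name `stemplot` is hypothetical, and it assumes non-negative integers with the tens place as stem and the ones place as leaf.

```python
def stemplot(values):
    """Return stemplot lines: stem = tens place and up, leaf = ones digit."""
    rows = {}
    for v in sorted(values):            # sorting puts leaves in rank order
        stem, leaf = divmod(v, 10)      # e.g. 21 -> stem 2, leaf 1
        rows.setdefault(stem, []).append(str(leaf))
    # List every stem from lowest to highest, including empty ones.
    return [f"{s}|{''.join(rows.get(s, []))}"
            for s in range(min(rows), max(rows) + 1)]

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
lines = stemplot(ages)
print("\n".join(lines))
print("x10")  # axis multiplier
```

Empty stems are kept on purpose so that gaps in the data stay visible on the axis.
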
Interpreting frequency distributions
Central location
  Gravitational center: mean
  Middle value: median
Spread
  Range and inter-quartile range
  Standard deviation and variance (next week)
Shape
  Symmetry
  Modality
  Kurtosis

Mean = arithmetic average
"Eye-ball method": visualize where the plot would balance.
Arithmetic method: total divided by n.
      8
      7
      4     2
  5 1 1 0 2 0
  -----------
  0 1 2 3 4 5
         ^
     Grav. center
The eye-ball method balances around 25 to 30.
Actual arithmetic average = 29.0

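As a quick check of the arithmetic method, here is a short Python sketch using the ten ages from the illustrative example (the variable names are only for illustration):

```python
ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]

total = sum(ages)   # sum of all observations
n = len(ages)       # number of observations
mean = total / n    # arithmetic average = total divided by n

print(total, n, mean)
```

The result agrees with the eye-ball estimate of a balance point somewhere between 25 and 30.
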
Middle point = median
Count from the top of the ordered array to a depth of (n + 1) ÷ 2.
For the illustrative data:
  n = 10
  Depth of median = (10 + 1) ÷ 2 = 5.5
  The median therefore lies halfway between the 5th and 6th values (27 and 28), i.e., 27.5.

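The depth rule can be written out as a small Python sketch. The helper name `median_by_depth` is hypothetical; when the depth ends in .5, the sketch averages the two values on either side of it, which is the usual convention.

```python
import math

def median_by_depth(values):
    """Return (depth, median) using the (n + 1) / 2 depth rule."""
    ordered = sorted(values)
    depth = (len(ordered) + 1) / 2          # 1-based position in the ordered array
    below = ordered[math.floor(depth) - 1]  # value at or just below the depth
    above = ordered[math.ceil(depth) - 1]   # value at or just above the depth
    return depth, (below + above) / 2

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
depth, med = median_by_depth(ages)
print(depth, med)
```

For an odd n the depth is a whole number, so `below` and `above` are the same value and the average leaves it unchanged.
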
Spread = variability
The easiest way to describe spread is to state the range, e.g., "from 5 to 52" (not the best way).
A better way is to divide the data into a low group and a high group:
  Quartile 1 = median of the low group
  Quartile 3 = median of the high group

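A minimal Python sketch of the low-group/high-group idea, applied to the ten ages. The function name `quartiles` is hypothetical, and one detail is an assumption: when n is odd, this sketch leaves the middle value out of both groups (other conventions include it in both).

```python
from statistics import median

def quartiles(values):
    """Q1 and Q3 as the medians of the low and high halves (needs n >= 2)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    low, high = ordered[:half], ordered[-half:]  # middle value dropped if n is odd
    return median(low), median(high)

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
q1, q3 = quartiles(ages)
iqr = q3 - q1  # inter-quartile range
print(q1, q3, iqr)
```

Here the low group is 05 11 21 24 27 and the high group is 28 30 42 50 52, so the quartiles are the middle values of those two groups.
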
Shape = visual pattern
Look at the skyline silhouette of the plot:
  Symmetry
  Mounds
  Outliers (if any)
When n is small, it is too difficult to describe shape accurately.
[Figure: skyline silhouette drawn with X marks above the stem axis 0 to 5]

What to look for in shape
Idealized shape = density curve
Look for:
  General pattern
  Symmetry
  Outliers

Symmetrical shapes

Asymmetrical shapes

Modality (no. of peaks)

Kurtosis (steepness of peak)
  Mesokurtic (medium)
  Platykurtic (flat): skinny tails
  Leptokurtic (steep): fat tails
Kurtosis can NOT be easily judged by eye.

Second example (n = 8)
Data: 1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42
Truncate the extra digit (e.g., 1.47 → 1.4):
  Stem = ones place
  Leaves = tenths place
  Do not plot the decimal point
  1|4
  2|03
  3|4779
  4|4
  (×1)
Center: between 3.4 and 3.7
Spread: 1.4 to 4.4
Shape: mound, no outliers

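The truncation step is easy to get wrong with ordinary rounding, so here is a hedged Python sketch. It uses the standard `decimal` module to truncate (not round) to one decimal place; reading the data as strings is an assumption made here to avoid floating-point surprises.

```python
from decimal import Decimal, ROUND_DOWN

data = ["1.47", "2.06", "2.36", "3.43", "3.74", "3.78", "3.94", "4.42"]

# Truncate the extra digit: 1.47 -> 1.4 (ROUND_DOWN drops it, never rounds up).
truncated = sorted(Decimal(x).quantize(Decimal("0.1"), rounding=ROUND_DOWN)
                   for x in data)

rows = {}
for t in truncated:
    stem = int(t)                 # ones place
    leaf = int((t - stem) * 10)   # tenths place
    rows.setdefault(stem, []).append(str(leaf))

lines = [f"{s}|{''.join(rows[s])}" for s in sorted(rows)]
print("\n".join(lines))
print("(x1)")
```

Note that 3.78 contributes a leaf of 7, not 8: the hundredths digit is dropped rather than rounded.
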
Third example (pollution.sav)
The regular stemplot is too squished:
  1|4789
  2|223466789
  3|000123445678
  (×1)
Split-stem plot:
  1|4
  1|789
  2|2234
  2|66789
  3|00012344
  3|5678
  (×1)
The first 1 on the stem holds leaves 0 to 4; the second 1 holds leaves 5 to 9.
Note the negative skew.

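Splitting stems is mechanical once the regular stemplot exists. The sketch below starts from the regular stemplot rows of this example (typed in by hand, not read from pollution.sav) and splits each stem into leaves 0 to 4 and leaves 5 to 9; the function name `split_stems` is hypothetical.

```python
regular = {1: "4789", 2: "223466789", 3: "000123445678"}

def split_stems(rows):
    """Split each stem in two: first copy takes leaves 0-4, second takes 5-9."""
    lines = []
    for stem in sorted(rows):
        low = "".join(d for d in rows[stem] if d in "01234")
        high = "".join(d for d in rows[stem] if d in "56789")
        lines.append(f"{stem}|{low}")
        lines.append(f"{stem}|{high}")
    return lines

split = split_stems(regular)
print("\n".join(split))
print("(x1)")
```
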
How many stem values?
Start with between 4 and 12 stem values.
Then use trial and error to draw out the shape and find the most informative plot (use judgment).

Body weight (n = 53)
Data range from 100 to 260 lbs.
A 100 lb. multiplier seems too broad (only two stem values).
A 100 lb. multiplier with split stem values is still too broad (only 4 stem values).
Try a 10 lb. stem multiplier.

Body weight (n = 53)
  10|0166
  11|009
  12|0034578
  13|00359
  14|08
  15|00257
  16|555
  17|000255
  18|000055567
  19|245
  20|3
  21|025
  22|0
  23|
  24|
  25|
  26|0
  (×10)
10|0 means "100".
Shape: positive skew, high outlier (260)
Location: median = 165
Spread: from 100 to 260

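The summary figures can be checked by reading the values back off the stemplot. This Python sketch re-types the plot rows by hand (an assumption; the raw data file is not shown) and applies the (n + 1) ÷ 2 depth rule.

```python
rows = """10|0166 11|009 12|0034578 13|00359 14|08 15|00257 16|555
17|000255 18|000055567 19|245 20|3 21|025 22|0 23| 24| 25| 26|0""".split()

weights = []
for row in rows:
    stem, leaves = row.split("|")
    # Stem multiplier is x10, so stem 10 with leaf 6 is 106 lbs.
    weights.extend(int(stem) * 10 + int(d) for d in leaves)

weights.sort()
n = len(weights)
depth = (n + 1) // 2            # n is odd here, so the depth is a whole number
median = weights[depth - 1]     # convert 1-based depth to 0-based index

print(n, depth, median, weights[0], weights[-1])
```
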
Quintuple split: body weight data (n = 53)
  1*|0000111
  1t|222222233333
  1f|4455555
  1s|666777777
  1.|888888888999
  2*|0111
  2t|2
  2f|
  2s|6
  (×100)
Codes:
  * for leaves 0 and 1
  t for leaves two and three
  f for leaves four and five
  s for leaves six and seven
  . for leaves eight and nine
Example: 2t|2 (×100) means a value of about 220 (the ones digit is not shown).

Frequency counts (SPSS plot)
Age of participants. SPSS provides frequency counts with the stemplot:
  Frequency    Stem &  Leaf
       2.00        3 .  0
       9.00        4 .  0000
      28.00        5 .  0000000
      37.00        6 .  000000000
      54.00        7 .  00000000000000
      85.00        8 .  000000000000000000000
      94.00        9 .  000000000000000000000000
      81.00       10 .  00000000000000000000
      90.00       11 .  00000000000000000000000
      57.00       12 .  00000000000000
      43.00       13 .  00000000000
      25.00       14 .  000000
      19.00       15 .  00000
      13.00       16 .  000000
       8.00       17 .  0000
       9.00 Extremes    (>=18)
  Stem width: 1
  Each leaf: 2 case(s)
"3 . 0" means 3.0 years.
Because of the large n, each leaf represents 2 observations.

Frequency tables
Frequency = count
Relative frequency = proportion or %
Cumulative frequency = % less than or equal to the current value
  AGE   | Freq  Rel.Freq  Cum.Freq
  ------+-------------------------
    3   |    2     0.3%      0.3%
    4   |    9     1.4%      1.7%
    5   |   28     4.3%      6.0%
    6   |   37     5.7%     11.6%
    7   |   54     8.3%     19.9%
    8   |   85    13.0%     32.9%
    9   |   94    14.4%     47.2%
   10   |   81    12.4%     59.6%
   11   |   90    13.8%     73.4%
   12   |   57     8.7%     82.1%
   13   |   43     6.6%     88.7%
   14   |   25     3.8%     92.5%
   15   |   19     2.9%     95.4%
   16   |   13     2.0%     97.4%
   17   |    8     1.2%     98.6%
   18   |    6     0.9%     99.5%
   19   |    3     0.5%    100.0%
  ------+-------------------------
  Total |  654   100.0%

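The relative and cumulative columns follow directly from the counts. The Python sketch below recomputes them from the Freq column of the table; the dictionary is typed in by hand, and the rounding to one decimal place is chosen here to match the table's display.

```python
counts = {3: 2, 4: 9, 5: 28, 6: 37, 7: 54, 8: 85, 9: 94, 10: 81, 11: 90,
          12: 57, 13: 43, 14: 25, 15: 19, 16: 13, 17: 8, 18: 6, 19: 3}

total = sum(counts.values())
running = 0          # running count of observations <= current age
table = []
for age in sorted(counts):
    running += counts[age]
    rel = 100 * counts[age] / total   # relative frequency, in %
    cum = 100 * running / total       # cumulative frequency, in %
    table.append((age, counts[age], round(rel, 1), round(cum, 1)))
    print(f"{age:>3} | {counts[age]:>4} {rel:6.1f}% {cum:6.1f}%")

print(f"Total | {total}")
```

Accumulating the raw counts and dividing once, rather than adding rounded percentages, keeps the cumulative column from drifting.
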
Class intervals
When data are sparse, group them into class intervals.
Classes can be uniform or non-uniform.

Uniform class intervals
Create 4 to 12 class intervals.
Set an end-point convention: include the left boundary and exclude the right boundary.
  e.g., the first class interval includes 0 and excludes 10 (0 to 9.99 years of age)
Tally the frequencies.
Calculate the relative frequency.
Calculate the cumulative frequency.

Here's the age data in sample.sav:
  Class         Freq   Rel. Freq. (%)   Cum. Freq. (%)
  0-9.99           1               10               10
  10-19.99         1               10               20
  20-29.99         4               40               60
  30-39.99         1               10               70
  40-49.99         1               10               80
  50-59.99         2               20              100
  Total           10              100               --

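The tallying procedure for uniform class intervals can be sketched in Python as follows. The interval width of 10 and the range 0 to 60 come from the table; the variable names are illustrative only. Integer division implements the "include left boundary, exclude right boundary" convention.

```python
ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
width = 10

# Tally: age // width * width is the left boundary of the class containing age,
# so 10 falls in [10, 20) and 9.99 would fall in [0, 10).
tally = {}
for age in ages:
    left = (age // width) * width
    tally[left] = tally.get(left, 0) + 1

n = len(ages)
running = 0
rows = []
for left in range(0, 60, width):
    freq = tally.get(left, 0)
    running += freq
    rows.append((left, left + width, freq, 100 * freq / n, 100 * running / n))

for left, right, freq, rel, cum in rows:
    print(f"[{left:>2}, {right:>2})  {freq:>2}  {rel:5.1f}%  {cum:5.1f}%")
```

Iterating over every interval in the range, rather than only the ones that received a tally, makes empty classes show up with a frequency of 0.
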
Histogram: for quantitative data
Bars are contiguous.

Bar chart: for categorical data
Bars are discrete.