Chapter 3: Frequency Distributions
September 21
In Chapter 3:
• 3.1 Stemplots
• 3.2 Frequency Tables
• 3.3 Additional Frequency Charts
Stemplots
“You can observe a lot by looking.” – Yogi Berra
• Start by exploring the data with Exploratory Data Analysis (EDA)
• A popular univariate EDA technique is the stem-and-leaf plot (stemplot)
• The stem of the stemplot is a number line (axis)
• Each leaf represents a data point
Stemplot: Illustration
• 10 ages (data sequenced as an ordered array): 05 11 21 24 27 28 30 42 50 52
• Draw the stem to cover the range 5 to 52:
    0|
    1|
    2|
    3|
    4|
    5|
    (× 10 axis multiplier)
• Divide each data point into a stem-value (the tens place, in this example) and a leaf-value (the ones place, in this example)
• Place leaves next to their stem value
• Example of a leaf: 21 is plotted as leaf 1 on stem 2, i.e. 2|1
Stemplot illustration continued …
• Plot all data points in rank order:
    0|5
    1|1
    2|1478
    3|0
    4|2
    5|02
    (× 10)
• Here is the plot turned horizontally (rotated stemplot):
          8
          7
          4        2
    5  1  1  0  2  0
    ----------------
    0  1  2  3  4  5
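The construction steps above can be sketched in Python. This is a minimal illustration, assuming nonnegative whole-number data with the tens place as the stem and the ones place as the leaf (the × 10 multiplier of this example); stemplot() is a hypothetical helper name, not a library function.

```python
from collections import defaultdict


def stemplot(data):
    """Stemplot lines for nonnegative integers: stem = tens place, leaf = ones place."""
    leaves = defaultdict(list)
    for x in sorted(data):            # rank order, so each stem's leaves come out sorted
        stem, leaf = divmod(x, 10)    # e.g. 21 -> stem 2, leaf 1
        leaves[stem].append(str(leaf))
    first, last = min(leaves), max(leaves)
    # Keep every stem in the range, including stems with no leaves
    return [f"{s}|{''.join(leaves[s])}" for s in range(first, last + 1)]


ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
for line in stemplot(ages):
    print(line)
print("(× 10)")
```

The loop prints the same six lines as the stemplot above. Empty stems are kept on purpose, because gaps on the axis are part of the shape.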
Interpreting Distributions
• Shape
• Central location
• Spread
Shape
• “Shape” refers to the distributional pattern
• Here’s the silhouette of our data:
          X
          X
          X        X
    X  X  X  X  X  X
    ----------------
    0  1  2  3  4  5
• Mound-shaped, symmetrical, no outliers
• Do not “over-interpret” plots when n is small
Shape (cont.)
Consider this large data set of IQ scores. A density curve is superimposed on the graph.
Examples of Symmetrical Shapes
Examples of Asymmetrical Shapes
Modality (number of peaks)
Kurtosis (steepness)
• Mesokurtic (medium)
• Platykurtic (flat), skinny tails
• Leptokurtic (steep), fat tails
• Kurtosis cannot be easily judged by eye
Gravitational Center (Mean)
• Gravitational center ≡ arithmetic mean
• “Eye-ball method”: visualize where the plot would balance on a seesaw; here, around 30 (takes practice)
• Arithmetic method: sum the values and divide by n
          8
          7
          4        2
    5  1  1  0  2  0
    ----------------
    0  1  2  3  4  5
                 ^
           Grav. center
    sum = 290, n = 10, mean = 290 / 10 = 29
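The arithmetic method is short enough to check by computer. A small Python sketch using the 10 ages from this example; mean() is our own helper, and the standard library's statistics.mean is shown alongside as a cross-check.

```python
import statistics


def mean(values):
    """Arithmetic mean: sum the values and divide by n."""
    return sum(values) / len(values)


ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
print(sum(ages), len(ages), mean(ages))   # 290 10 29.0
print(statistics.mean(ages))              # same value from the standard library
```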
Central location: Median
• Ordered array: 05 11 21 24 27 28 30 42 50 52
• The median has depth (n + 1) ÷ 2
• n = 10, so the median’s depth = (10 + 1) ÷ 2 = 5.5 → falls between 27 and 28
• When n is even, average the two adjacent values: Median = (27 + 28) ÷ 2 = 27.5
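The depth rule can be written out step by step. A minimal Python sketch, assuming a non-empty list of numbers; median_by_depth() is a hypothetical name, and statistics.median from the standard library is printed as a cross-check.

```python
import statistics


def median_by_depth(values):
    """Locate the median by its depth (n + 1) / 2 in the ordered array."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    depth = (n + 1) / 2                  # n = 10 -> depth 5.5
    if depth.is_integer():               # n odd: the median is a single data point
        return ordered[int(depth) - 1]   # depths count from 1, list indexes from 0
    # n even: the depth ends in .5, so average the two adjacent values
    below = ordered[int(depth) - 1]      # depth 5.5 -> 5th value (27)
    above = ordered[int(depth)]          # -> 6th value (28)
    return (below + above) / 2


ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
print(median_by_depth(ages))      # 27.5
print(statistics.median(ages))    # 27.5
```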
Spread: Range
• For now, report the range (minimum and maximum values)
• The range of the current data is 5 to 52
• The range is the easiest but not the best way to describe spread (better methods are described later)
Stemplot – Second Example
• Data: 1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42
• Stem = ones place
• Leaves = tenths place
• Truncate the extra digit (e.g., 1.47 → 1.4)
    1|4
    2|03
    3|4779
    4|4
    (× 1)
• Center: median between 3.4 and 3.7 (the 4th and 5th values, since the depth is (8 + 1) ÷ 2 = 4.5)
• Spread: 1.4 to 4.4
• Shape: mound, no outliers
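Truncation is the step that is easiest to get wrong by hand or by computer. A minimal Python sketch, assuming nonnegative data, stem = ones place and leaf = tenths place; decimal_stemplot() is a hypothetical helper. Values are converted to whole tenths with floor (truncation, not rounding), after a round() that strips tiny binary floating-point error.

```python
import math
from collections import defaultdict


def decimal_stemplot(data):
    """Stemplot with stem = ones place and leaf = tenths place (× 1).

    Extra digits are truncated, not rounded. Assumes nonnegative data.
    """
    leaves = defaultdict(list)
    for x in sorted(data):
        # Express x in whole tenths. round(..., 9) removes tiny binary floating-point
        # error before floor() truncates, e.g. 1.47 -> 14.7 -> 14 (that is, 1.4).
        tenths = math.floor(round(x * 10, 9))
        stem, leaf = divmod(tenths, 10)
        leaves[stem].append(str(leaf))
    return [f"{s}|{''.join(leaves[s])}" for s in range(min(leaves), max(leaves) + 1)]


data = [1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42]
for line in decimal_stemplot(data):
    print(line)
```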
Third Illustrative Example (n = 25)
• Data: 14, 17, 18, 19, 22, 22, 23, 24, 26, 26, 27, 28, 29, 30, 30, 30, 31, 32, 33, 34, 34, 35, 36, 37, 38
• Regular stemplot:
    1|4789
    2|223466789
    3|000123445678
    (× 10)
• Too squished to see the shape
Third Illustration: Split Stem
• Split each stem-value into two lines: the first “1” holds leaves 0 to 4, and the second “1” holds leaves 5 to 9
• Split-stem plot:
    1|4
    1|789
    2|2234
    2|66789
    3|00012344
    3|5678
    (× 10)
• Negative skew is now evident
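Splitting stems only changes which line a leaf goes on. A minimal Python sketch, assuming nonnegative integers with stem = tens and leaf = ones; split_stemplot() is a hypothetical helper, and the data are the 25 values read off the stemplots above. Each stem becomes two consecutive line numbers (2 × stem for leaves 0 to 4, 2 × stem + 1 for leaves 5 to 9).

```python
from collections import defaultdict


def split_stemplot(data):
    """Split-stem plot: each stem gets two lines, leaves 0-4 first and leaves 5-9 second.

    Assumes nonnegative integers with stem = tens place and leaf = ones place (× 10).
    """
    lines = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        half = 0 if leaf <= 4 else 1            # first or second line of this stem
        lines[2 * stem + half].append(str(leaf))
    out = []
    # Walk every line between the first and last so empty half-stems still appear
    for line_no in range(min(lines), max(lines) + 1):
        stem = line_no // 2
        out.append(f"{stem}|{''.join(lines[line_no])}")
    return out


data = [14, 17, 18, 19, 22, 22, 23, 24, 26, 26, 27, 28, 29,
        30, 30, 30, 31, 32, 33, 34, 34, 35, 36, 37, 38]
for line in split_stemplot(data):
    print(line)
```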
How many stem-values?
• Start with between 4 and 12 stem-values
• Then use trial and error with different stem multipliers and splits, and keep the plot that shows the shape most clearly
Fourth Example: n = 53 body weights
Data range from 100 to 260 lbs:
• × 100 axis multiplier: only two stem-values (1 × 100 and 2 × 100); too few
• × 100 axis multiplier with split stem: 4 stem-values; might be OK (?)
• × 10 axis multiplier: 17 stem-values (10 through 26), shown in the next stemplot
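The trial-and-error step can start by counting how many stem lines each candidate multiplier and split would produce for data from 100 to 260. A Python sketch, assuming nonnegative integer data and a multiplier of at least 10; count_stem_values() and its parameters are hypothetical names. The leaf is the digit just below the stem, and each stem is divided into 1, 2 or 5 lines.

```python
def count_stem_values(lo, hi, multiplier, splits=1):
    """Number of stem lines needed to cover data from lo to hi.

    multiplier: axis multiplier (× 100 means stem 1 stands for 100); the leaf is the
                next digit down, so a × 100 plot uses the tens digit as its leaf.
    splits:     lines per stem (1 = regular, 2 = split stem, 5 = quintuple split).
    Assumes nonnegative integer data and a multiplier of at least 10.
    """
    if splits not in (1, 2, 5):
        raise ValueError("splits must be 1, 2 or 5")
    leaf_unit = multiplier // 10
    leaves_per_line = 10 // splits

    def line_index(x):
        stem, leaf = divmod(x // leaf_unit, 10)
        return stem * splits + leaf // leaves_per_line

    return line_index(hi) - line_index(lo) + 1


for multiplier, splits in [(100, 1), (100, 2), (10, 1)]:
    print(f"× {multiplier}, {splits} line(s) per stem: "
          f"{count_stem_values(100, 260, multiplier, splits)} stem-values")
```

For 100 to 260 this gives 2, 4 and 17 stem-values for the three options listed above.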
Fourth Stemplot Example (n = 53)
    10|0166
    11|009
    12|0034578
    13|00359
    14|08
    15|00257
    16|555
    17|000255
    18|000055567
    19|245
    20|3
    21|025
    22|0
    23|
    24|
    25|
    26|0
    (× 10)
• Shape: positive skew, high outlier (260)
• Central location: depth of the median L(M) = (53 + 1) / 2 = 27; the 27th value is the first leaf on stem 16, so Median = 165
• Spread: from 100 to 260
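A stemplot keeps every data value, so the center can be checked by turning the plot back into numbers. A Python sketch that transcribes the stemplot above into a dictionary (rows is our own name) and takes the value at depth (n + 1) / 2.

```python
# Fourth stemplot (n = 53), transcribed from the slide: stem -> leaves (× 10 multiplier)
rows = {
    10: "0166", 11: "009", 12: "0034578", 13: "00359", 14: "08",
    15: "00257", 16: "555", 17: "000255", 18: "000055567", 19: "245",
    20: "3", 21: "025", 22: "0", 23: "", 24: "", 25: "", 26: "0",
}

# Rebuild the values: stem 16, leaf 5 -> 16.5 × 10 = 165 lbs
values = sorted(stem * 10 + int(leaf) for stem, leaves in rows.items() for leaf in leaves)

n = len(values)
depth = (n + 1) // 2          # n is odd here, so the depth is a whole number: 27
median = values[depth - 1]    # depths count from 1, list indexes from 0
print(n, depth, median)       # 53 27 165
```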
Quintuple-Split Stem Values
    1*|0000111
    1t|222222233333
    1f|4455555
    1s|666777777
    1.|888888888999
    2*|0111
    2t|2
    2f|
    2s|6
    (× 100)
Codes for stem values:
• * for leaves 0 and 1
• t for leaves two and three
• f for leaves four and five
• s for leaves six and seven
• . for leaves eight and nine
For example, 120 is: 1t|2 (× 100)
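The letter codes amount to grouping leaf digits in pairs. A Python sketch, assuming nonnegative integers on a × 100 axis, with lower digits truncated; quintuple_entry() and CODES are hypothetical names. The line code is looked up as leaf // 2.

```python
# Line codes for a quintuple-split stem: each line holds two leaf digits
CODES = ["*", "t", "f", "s", "."]   # leaves 0-1, 2-3, 4-5, 6-7, 8-9


def quintuple_entry(x, multiplier=100):
    """Stem line and leaf for x on a quintuple-split stemplot, e.g. 120 -> '1t|2'.

    Assumes nonnegative integers; digits below the leaf are truncated.
    """
    leaf_unit = multiplier // 10              # × 100: the leaf is the tens digit
    stem, leaf = divmod(x // leaf_unit, 10)
    return f"{stem}{CODES[leaf // 2]}|{leaf}"


for w in (100, 120, 148, 185, 260):
    print(w, "->", quintuple_entry(w))
```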
SPSS Stemplot, n = 654 (ages in years)
    Frequency | Stem
         2.00 |    3
         9.00 |    4
        28.00 |    5
        37.00 |    6
        54.00 |    7
        85.00 |    8
        94.00 |    9
        81.00 |   10
        90.00 |   11
        57.00 |   12
        43.00 |   13
        25.00 |   14
        19.00 |   15
        13.00 |   16
         8.00 |   17
         9.00 | Extremes (>=18)
    Stem width: 1
    Each leaf: 2 case(s)
• The Frequency column gives the count for each stem; every leaf is 0 (e.g., 3.0 means 3.0 years)
• Because n is large, each leaf represents 2 observations
Frequency Table
• Frequency ≡ count
• Relative frequency ≡ proportion
• Cumulative [relative] frequency ≡ proportion less than or equal to the current value

      AGE | Freq | Rel. Freq | Cum. Freq
    ------+------+-----------+----------
        3 |    2 |      0.3% |      0.3%
        4 |    9 |      1.4% |      1.7%
        5 |   28 |      4.3% |      6.0%
        6 |   37 |      5.7% |     11.6%
        7 |   54 |      8.3% |     19.9%
        8 |   85 |     13.0% |     32.9%
        9 |   94 |     14.4% |     47.2%
       10 |   81 |     12.4% |     59.6%
       11 |   90 |     13.8% |     73.4%
       12 |   57 |      8.7% |     82.1%
       13 |   43 |      6.6% |     88.7%
       14 |   25 |      3.8% |     92.5%
       15 |   19 |      2.9% |     95.4%
       16 |   13 |      2.0% |     97.4%
       17 |    8 |      1.2% |     98.6%
       18 |    6 |      0.9% |     99.5%
       19 |    3 |      0.5% |    100.0%
    ------+------+-----------+----------
    Total |  654 |    100.0% |
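The relative and cumulative columns follow directly from the counts. A minimal Python sketch using the AGE counts from the table; frequency_table() is a hypothetical helper. Cumulative percentages are computed from the running count and then rounded, rather than by adding up the rounded relative percentages, so rounding error does not accumulate.

```python
# AGE counts from the frequency table above (n = 654)
counts = {3: 2, 4: 9, 5: 28, 6: 37, 7: 54, 8: 85, 9: 94, 10: 81, 11: 90,
          12: 57, 13: 43, 14: 25, 15: 19, 16: 13, 17: 8, 18: 6, 19: 3}


def frequency_table(counts):
    """Rows of (value, freq, relative %, cumulative %), percentages rounded to 0.1."""
    n = sum(counts.values())
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        rel = 100 * counts[value] / n
        cum = 100 * running / n          # proportion <= current value, as a percent
        rows.append((value, counts[value], round(rel, 1), round(cum, 1)))
    return rows


for age, freq, rel, cum in frequency_table(counts):
    print(f"{age:>5} | {freq:>4} | {rel:>8.1f}% | {cum:>8.1f}%")
```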
Class Intervals
• When data are sparse, group the data into class intervals
• Class intervals can be uniform or nonuniform
• Use an end-point convention so that each data point falls into exactly one interval: include the lower boundary, exclude the upper boundary
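Class Intervals Freq Table
Data: 05 11 21 24 27 28 30 42 50 52

      Class | Freq | Relative Freq (%) | Cumulative Freq (%)
    --------+------+-------------------+--------------------
        0–9 |    1 |                10 |                  10
      10–19 |    1 |                10 |                  20
      20–29 |    4 |                40 |                  60
      30–39 |    1 |                10 |                  70
      40–49 |    1 |                10 |                  80
      50–59 |    2 |                20 |                 100
    --------+------+-------------------+--------------------
      Total |   10 |               100 |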
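The end-point convention and the table above can be sketched together. A minimal Python sketch, assuming integer data and uniform class intervals of equal width; class_interval_table() and its parameters lower, width and k are hypothetical names. Integer division (x - lower) // width places each value in exactly one half-open interval that includes its lower boundary and excludes its upper boundary, so 30 lands in 30–39, not 20–29.

```python
def class_interval_table(data, lower, width, k):
    """Frequency table for k uniform class intervals [lower + i*width, lower + (i+1)*width).

    End-point convention: include the lower boundary, exclude the upper boundary.
    Assumes integer data, so [20, 30) can be labeled 20-29.
    """
    freq = [0] * k
    for x in data:
        i = (x - lower) // width          # index of the interval containing x
        if not 0 <= i < k:
            raise ValueError(f"{x} falls outside the class intervals")
        freq[i] += 1
    n = len(data)
    rows, running = [], 0
    for i, f in enumerate(freq):
        running += f
        lo = lower + i * width
        label = f"{lo}-{lo + width - 1}"
        rows.append((label, f, 100 * f / n, 100 * running / n))
    return rows


ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
for label, f, rel, cum in class_interval_table(ages, lower=0, width=10, k=6):
    print(f"{label:>6} {f:>2} {rel:5.0f}% {cum:5.0f}%")
```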
Histogram
For quantitative measurements only. Bars touch.
Bar Chart
For categorical and ordinal measurements, and for continuous data grouped into non-uniform class intervals. Bars do not touch.