Chapter 2 Describing Distributions with Numbers Numerical Summaries
Numerical Summaries of: • Central location – mean – median • Spread – Range – Quartiles – Standard Deviation / variance • Shape measures not covered
Arithmetic Mean • Most common measure of central location • Notation (“xbar”): $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$, where n is the sample size and ∑ is the summation symbol
Example: Sample Mean Data: Metabolic rates, calories / day: 1792 1666 1362 1614 1460 1867 1439. The values sum to 11,200, so xbar = 11,200 / 7 = 1600 calories / day
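The sample mean for the metabolic-rate data can be checked with a short Python sketch. The function name `sample_mean` is introduced here only for illustration; it does not come from the slides.

```python
# Metabolic rates (calories/day) from the slide
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]


def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by n."""
    n = len(values)
    return sum(values) / n


xbar = sample_mean(rates)
print(sum(rates), len(rates), xbar)  # 11200 7 1600.0
```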
Median (M) • Half the values are less than the median, half are greater • If n is odd, the median is the middle ordered value • If n is even, the median is the average of the two middle ordered values
Examples: Median • Example 1: 2 4 6 Median = 4 • Example 2: 2 4 6 8 Median = 5 (average of 4 and 6) • Example 3: 6 2 4 Median ≠ 2 (values must be ordered first: 2 4 6, Median = 4)
Example: Median The location of the median in the ordered array: L(M) = (n + 1) / 2 Data = metabolic rates in slide 4 (n = 7), so L(M) = (7 + 1) / 2 = 4 Ordered array: 1362 1439 1460 1614 1666 1792 1867 The median is the 4th ordered value: 1614 6/17/2021 Chapter 2 7
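The median rule (order the values, then take the middle value or the average of the two middle values) can be sketched in Python as follows. The helper names `median_location` and `median` are assumptions made for this illustration.

```python
def median_location(n):
    """Location of the median in the ordered array: L(M) = (n + 1) / 2."""
    return (n + 1) / 2


def median(values):
    """Middle ordered value if n is odd, else the average of the two middle values."""
    ordered = sorted(values)  # the values must be ordered first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
print(median_location(len(rates)))  # 4.0
print(median(rates))                # 1614
print(median([6, 2, 4]))            # 4
print(median([2, 4, 6, 8]))         # 5.0
```

A fractional location such as 2.5 means the median lies halfway between the 2nd and 3rd ordered values.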
The Median is robust to outliers This data set: 1362 1439 1460 1614 1666 1792 1867 has median 1614 and mean 1600. This similar data with a high outlier: 1362 1439 1460 1614 1666 1792 9867 still has median 1614 but now has mean 2742.9
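A quick way to see this robustness is to compute both summaries for both data sets. The sketch below uses Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import fmean, median

original = [1362, 1439, 1460, 1614, 1666, 1792, 1867]
with_outlier = [1362, 1439, 1460, 1614, 1666, 1792, 9867]

for label, data in (("original", original), ("with outlier", with_outlier)):
    # The median depends only on the middle ordered value;
    # the mean uses every value, so one extreme value shifts it.
    print(label, "median =", median(data), "mean =", round(fmean(data), 1))
# original median = 1614 mean = 1600.0
# with outlier median = 1614 mean = 2742.9
```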
The skew pulls the mean • The average salary at a high tech firm is $250K / year • The median salary is $60K • What does this tell you? • Answer: There are some very highly paid executives, but most of the workers make modest salaries, i.e., there is a positive skew to the distribution
Spread = Variability • Amount of spread around the center! • Statistical measures of spread – Range – Inter-Quartile Range – Standard deviation
Range and IQR • Range = maximum – minimum • Easy, but NOT as good as the… • Quartiles & Inter-Quartile Range (IQR) – Quartile 1 (Q1) cuts off the bottom 25% of the data (“25th percentile”) – Quartile 2 (Q2) cuts off two-quarters of the data – same as the Median! – Quartile 3 (Q3) cuts off three-quarters of the data (“75th percentile”)
Obtaining Quartiles • Order data • Find the median • Look at the lower half of the data set – Find the “median” of this lower half – This is Q1 • Look at the upper half of the data set – Find the “median” of this upper half – This is Q3
Example: Quartiles Consider these 10 ages: 05 11 21 24 27 | 28 30 42 50 52 (the median, 27.5, falls between 27 and 28) Bottom half: 05 11 21 24 27, so the median of the bottom half (Q1) = 21 Top half: 28 30 42 50 52, so the median of the top half (Q3) = 42
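The median-of-halves procedure can be sketched in Python as below. The function name `quartiles` is an assumption for this illustration. Dropping the median from both halves when n is odd is inferred from the n = 53 example, where each half has 26 values. Statistical software often uses other quartile conventions and may return slightly different values.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves method.

    When n is odd, the median itself is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]      # bottom half of the ordered data
    upper = ordered[n - half:]  # top half of the ordered data
    return median(lower), median(ordered), median(upper)


ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
q1, m, q3 = quartiles(ages)
print(q1, m, q3)  # 21 27.5 42
```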
Example 2: Quartiles, n = 53 Median = 165 L(M) = (53 + 1) / 2 = 27
Example 2: Quartiles, n = 53 Bottom half has n* = 26 L(Q1) = (26 + 1) / 2 = 13.5 from the bottom Q1 = avg(127, 128) = 127.5
Example 2: Quartiles, n = 53 Top half has n* = 26 L(Q3) = 13.5 from the top! Q3 = avg(185, 185) = 185
Example 2 Quartiles Q1 = 127.5 Q2 = 165 Q3 = 185 "5 point summary" = {Min, Q1, Median, Q3, Max} = {100, 127.5, 165, 185, 260} [Figure: stem-and-leaf plot of the 53 values, stems 10 to 26]
Inter-quartile Range (IQR) • Q1 = 127.5 • Q3 = 185 Inter-Quartile Range (IQR) = Q3 − Q1 = 185 − 127.5 = 57.5 “spread of middle 50%”
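The 5-point summary, range, and IQR can be computed together. The full 53-value data set is not reproduced in these notes, so this sketch reuses the 10 ages from the earlier quartile example. The function name `five_number_summary` is an assumption for illustration.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves quartiles."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])      # median of the bottom half
    q3 = median(ordered[n - half:])  # median of the top half
    return ordered[0], q1, median(ordered), q3, ordered[-1]


ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
minimum, q1, m, q3, maximum = five_number_summary(ages)
iqr = q3 - q1                   # spread of the middle 50%
data_range = maximum - minimum  # maximum minus minimum

print((minimum, q1, m, q3, maximum))  # (5, 21, 27.5, 42, 52)
print(iqr, data_range)                # 21 47

# IQR for Example 2, from the quartiles quoted on the slide
print(185 - 127.5)                    # 57.5
```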
Simple Boxplot: the 5-point summary shown graphically [Figure: boxplot marking min, Q1, M, Q3, and max; axis: Weight, 100 to 275]
Boxplots are useful for comparing groups
Standard Deviation & Variance • Most popular measures of spread • Each data value has a deviation, defined as: $\text{deviation}_i = x_i - \bar{x}$
Example: Deviations Metabolic data (n = 7)
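The deviations for the metabolic data, each value minus the mean, can be listed with a few lines of Python. This is a minimal sketch; the variable names are illustrative.

```python
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

xbar = sum(rates) / len(rates)          # sample mean, 1600.0
deviations = [x - xbar for x in rates]  # one deviation per observation

for x, d in zip(rates, deviations):
    print(x, d)

# Deviations from the mean always sum to zero, which is why
# they are squared before being summed to measure spread.
print(sum(deviations))  # 0.0
```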
Variance • Find the mean • Find the deviation of each value • Square the deviations • Sum the squared deviations • Divide by (n − 1)
Data: Metabolic rates, n = 7 1792 1666 1362 1614 1460 1867 1439
“Sum of Squares”
Obs            Deviation                Squared deviation
1792           1792 − 1600 = 192        (192)² = 36,864
1666           1666 − 1600 = 66         (66)² = 4,356
1362           1362 − 1600 = −238       (−238)² = 56,644
1614           1614 − 1600 = 14         (14)² = 196
1460           1460 − 1600 = −140       (−140)² = 19,600
1867           1867 − 1600 = 267        (267)² = 71,289
1439           1439 − 1600 = −161       (−161)² = 25,921
SUMS 11,200    0                        214,870
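Variance $s^2 = \frac{\text{Sum of Squares}}{n - 1} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{214{,}870}{7 - 1} \approx 35{,}811.67$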
Standard Deviation Square root of variance: $s = \sqrt{s^2} = \sqrt{35{,}811.67} \approx 189.24$ calories / day
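The variance steps (mean, deviations, squares, sum, divide by n − 1) and the final square root can be checked with the following sketch. It cross-checks the hand calculation against `statistics.variance` and `statistics.stdev`, which also use the n − 1 divisor. The variable names are illustrative.

```python
import math
from statistics import stdev, variance

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
n = len(rates)

xbar = sum(rates) / n                            # step 1: the mean
squared_devs = [(x - xbar) ** 2 for x in rates]  # steps 2-3: deviations, squared
sum_of_squares = sum(squared_devs)               # step 4: sum of squares
s2 = sum_of_squares / (n - 1)                    # step 5: divide by n - 1
s = math.sqrt(s2)                                # standard deviation

print(sum_of_squares, round(s2, 2), round(s, 2))  # 214870.0 35811.67 189.24

# Cross-check against the standard library (sample variance and SD)
assert math.isclose(s2, variance(rates))
assert math.isclose(s, stdev(rates))
```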
Standard Deviation Direct Formula: $s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
Use calculator to check work! I’m supporting the TI-30 XIIS only. TI-30 XIIS sequence: • On > CLEAR > 2nd > STAT > Scroll > Clear Data > Enter • 2nd > STAT > 1-VAR or 2-VAR • DATA > “enter data” • STATVAR key
Choosing Summary Statistics • Use the mean and standard deviation to describe symmetrical distributions & distributions free of outliers • Use the median and quartiles (IQR) to describe distributions that are skewed or have outliers
Example: Number of Books Read n = 52 L(M) = (52 + 1) / 2 = 26.5, so the median M is the average of the 26th and 27th ordered values
Example: Books read, n = 52 5-point summary: 0, 1, 3, 5.5, 99 Highly asymmetric distribution [Figure: plot of the distribution; axis: Number of books, 0 to 100] The mean (“xbar” = 7.06) and standard deviation (s = 14.43) give false impressions of location and spread for this distribution and are considered inappropriate. Use the median and 5-point summary instead.