Other Numerical Measures Median Mode Range Percentiles Quartiles

Other Numerical Measures
  • Median
  • Mode
  • Range
  • Percentiles
  • Quartiles, Interquartile range
BUS 304 – Data Characterization

Median
  • The middle value: the value that divides the data in half, with equal numbers of observations above and below.
Steps:
  1. Put your data in an ordered array (sort).
  2. If n (or N) is odd, the median is the middle number, i.e. the $\frac{n+1}{2}$-th number.
  3. If n (or N) is even, the median is the average of the two middle numbers, i.e. the average of the $\frac{n}{2}$-th and the $\left(\frac{n}{2}+1\right)$-th numbers.
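The steps for the median can be sketched in a few lines of Python. This is only an illustration: the function name `median_of` and the two sample lists are hypothetical and do not come from the slides.

```python
def median_of(values):
    """Median by the sort-then-pick-the-middle rule."""
    ordered = sorted(values)            # step 1: ordered array
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # odd n: the (n+1)/2-th number, which is index n//2 when counting from 0
        return ordered[mid]
    # even n: average of the n/2-th and (n/2 + 1)-th numbers
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_sample = [7, 1, 3, 9, 5]    # hypothetical data, n = 5
even_sample = [7, 1, 3, 9]      # hypothetical data, n = 4

print(median_of(odd_sample))
print(median_of(even_sample))
```

Python's standard library offers `statistics.median`, which follows the same odd/even rule, so the hand-written version is mainly useful for seeing the steps.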

Sensitivity to outliers
  [Figure: three dot plots on an axis from 0 to 10, labeled Median = 3, Median = 2.5, and Median = 3]
  The median is not affected by extreme values.

Mode
  • The value that occurs most often
  • The mode is not affected by extreme values either.
Steps:
  1. Put your data in an ordered array (sort).
  2. Find the data value(s) that repeat most frequently.
Examples:
  [Figure: three dot plots, labeled Mode = 5, No mode, and Mode = 5 and 9]
  [Figure: categories Boston, Austin, San Diego, Los Angeles] Mode = San Diego
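A minimal Python sketch of the mode, covering the cases of one mode, several modes, no mode, and categorical data. The function name `modes_of` and all sample data are hypothetical; treating "no value repeats" as "no mode" is an assumption made here to match the "No mode" case.

```python
from collections import Counter


def modes_of(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)            # tally how often each value occurs
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # assumption: if nothing repeats, report "no mode"
        return []
    return sorted(v for v, c in counts.items() if c == top)


one_mode = [1, 3, 5, 5, 5, 8, 14]                 # hypothetical data
two_modes = [2, 5, 5, 5, 7, 9, 9, 9, 12]          # hypothetical data
no_mode = [0, 1, 2, 3, 4, 5, 6]                   # hypothetical data
cities = ["Boston", "San Diego", "Austin", "San Diego", "Los Angeles"]

for sample in (one_mode, two_modes, no_mode, cities):
    print(modes_of(sample) or "No mode")
```

Because the mode only counts occurrences, it works for category labels such as city names, where a mean or median would not make sense.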

Find Mode and Median from Frequency Table
Below is a frequency table showing the number of days the teams take to finish their projects.

  Days to Complete   Frequency   Relative Frequency
         5               4               ?
         6              12               ?
         7               8               ?
         8               6               ?
         9               4               ?
        10               2               ?

  • Find the mean, median and mode.
  • Create a histogram and locate the mean, median and mode.
  • Describe the shape of the histogram, and find the relationship between mean, median and mode.
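One way to check the hand calculation for this exercise is a short Python sketch. The day values and frequencies are taken from the table; the variable names are illustrative, and expanding the table into a full list of observations is just one convenient approach, not the only one.

```python
# Frequency table from the slide: days to complete -> number of teams
days = [5, 6, 7, 8, 9, 10]
frequency = [4, 12, 8, 6, 4, 2]

total = sum(frequency)                               # number of teams, n
relative_frequency = [f / total for f in frequency]  # fills the "?" column

# Mean: each value weighted by how often it occurs
mean = sum(d * f for d, f in zip(days, frequency)) / total

# Mode: the value with the largest frequency (first one if there is a tie)
mode = days[frequency.index(max(frequency))]

# Median: rebuild the ordered array, then take the middle value(s)
expanded = [d for d, f in zip(days, frequency) for _ in range(f)]
mid = total // 2
if total % 2 == 1:
    median = expanded[mid]
else:
    median = (expanded[mid - 1] + expanded[mid]) / 2

for d, f, r in zip(days, frequency, relative_frequency):
    print(f"{d:>4} {f:>10} {r:>10.3f}")
print("mean:", mean, "median:", median, "mode:", mode)
```

Comparing the three printed values with the shape of the histogram is a quick way to test the relationship between mean, median and mode.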

Shape of a distribution
  • Left-Skewed: Mean < Median < Mode (longer tail extends to the left)
  • Symmetric: Mean = Median = Mode
  • Right-Skewed: Mode < Median < Mean (longer tail extends to the right)
Note that the mean is the measure most affected by extreme values, so it typically lies closer to the tail than the other two measures.

Measures of center location
  • Mean
  • Median
  • Mode
  • The mean is generally used, unless extreme values (outliers) exist;
  • the next most common is the median, since it is not sensitive to extreme values;
  • the mode is sometimes used when one value has a very large frequency.
Think of the example of house prices.

Range
  • Simplest measure of variation
  • Describes how widely the data are spread
  • Formula: Range = Maximum Value - Minimum Value
Example:
  [Figure: dot plot on an axis from 0 to 14] Range = 14 - 1 = 13

Disadvantage of Range
  • Ignores the way in which data are distributed
      [Figure: two dot plots on an axis from 7 to 12] Range = 12 - 7 = 5 for each
  • Sensitive to outliers
      1, 1, 1, 2, 2, 3, 3, 4, 5      Range = 5 - 1 = 4
      1, 1, 1, 2, 2, 3, 3, 4, 120    Range = 120 - 1 = 119
Range is affected the most by outliers.
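The outlier example can be reproduced with a short Python sketch. The two data lists are the ones on the slide; the function name `value_range` is hypothetical.

```python
def value_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)


without_outlier = [1, 1, 1, 2, 2, 3, 3, 4, 5]
with_outlier = [1, 1, 1, 2, 2, 3, 3, 4, 120]   # only the last value differs

print(value_range(without_outlier))  # 4
print(value_range(with_outlier))     # 119
```

Changing a single observation moves the range from 4 to 119, because the range depends only on the two most extreme values.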

Break

Other measures
  1. Percentiles: the p-th percentile is the value below which p% of the data fall.
     e.g. if the 60th percentile is 1240 (SAT score), then 60% of students scored less than 1240. Correspondingly, 40% of students scored 1240 or higher.
How to find a percentile?
  The p-th percentile in an ordered array of n values is the value in the i-th position, where
  $$ i = \frac{p}{100}\,(n + 1) $$

Example
  • Find the 80th percentile from the annual income data
  • Steps:
      1. Sort the data.
      2. Find the location for the 80th percentile:
      3. Find the 81st person's income.
  • Think: what does this income mean?
  • Exercise 1: find the value such that 30% of people have that income or higher.
  • Exercise 2: find the value such that 30% of people have an income less than it.
  • Exercise 3: find the value such that 50% of people have an income less than it. What is this measure also called?

Quartiles
  • The 25th, 50th, and 75th percentiles
  • Called the first, second, and third quartiles, respectively
  • Written as Q1, Q2, Q3, respectively
  • The quartiles split the ranked data into 4 equal groups.

      25%  |  25%  |  25%  |  25%
           Q1      Q2      Q3

Example: Find the first quartile in the data sample:
  22 12 14 16 17 16 13 20 18
Median = the 50th percentile = the second quartile

Interquartile Range
  • Recall:
      • Range? Disadvantage of range?
  • Interquartile Range:
      Interquartile Range = Q3 - Q1
Example:
  12 13 14 16 16 17 18 20 22
  Q1 = 13.5, Q3 = 19
  Interquartile range = Q3 - Q1 = 19 - 13.5 = 5.5
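The quartile and interquartile range figures in this example can be reproduced with a small Python sketch. Two assumptions are made here and are not stated on the slides: the position of the p-th percentile is taken as i = p/100 × (n + 1), and a fractional position is resolved by interpolating linearly between the two neighbouring ordered values. Other textbooks round the position instead, so results can differ slightly on other data. The function name `percentile` is hypothetical.

```python
def percentile(data, p):
    """p-th percentile, assuming the location rule i = p/100 * (n + 1)."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("percentile of an empty data set is undefined")
    i = p / 100 * (n + 1)        # 1-based position in the ordered array
    i = min(max(i, 1), n)        # keep the position inside 1..n
    lower = int(i)               # whole part of the position
    frac = i - lower             # fractional part of the position
    if frac == 0:
        return float(ordered[lower - 1])
    # assumption: interpolate between the lower-th and (lower + 1)-th values
    below, above = ordered[lower - 1], ordered[lower]
    return below + frac * (above - below)


sample = [22, 12, 14, 16, 17, 16, 13, 20, 18]   # data from the slides

q1 = percentile(sample, 25)
q2 = percentile(sample, 50)      # the median
q3 = percentile(sample, 75)
iqr = q3 - q1

print("Q1:", q1, "Q2:", q2, "Q3:", q3, "IQR:", iqr)
```

For the nine values above, the positions are 2.5, 5 and 7.5, so Q1 and Q3 fall halfway between two neighbouring values. Unlike the range, the interquartile range ignores the lowest and highest quarters of the data, which makes it far less sensitive to a single extreme value.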

Summary
  • Understand and compute the following two sets of data measures:
      • Measures of central tendency
          • Mean, Median, and Mode
      • Measures of variation
          • Range, Variance, and Standard deviation
  • Other ways to describe data:
      • Percentiles, Quartiles, Interquartile range