Central Tendency and Variability: The two most essential features of a distribution
Questions • Define – Mean – Median – Mode • What is the effect of distribution shape on measures of central tendency? • When might we prefer one measure of central tendency to another?
Questions (2) • Define – Range – Average Deviation – Variance – Standard Deviation • When might we prefer one measure of variability to another? • What is a z score? • What is the point of Tchebycheff’s inequality?
Variables have distributions • A variable is something that changes or has different values (e.g., anger). • A distribution is a collection of measures, usually across people. • Distributions of numbers can be summarized with numbers (called statistics or parameters).
Central Tendency refers to the Middle of the Distribution
Variability is about the Spread
1. Central Tendency: Mode, Median, & Mean • The mode – the most frequently occurring score. For grouped data, the midpoint of the most populous class interval. Distributions can be bimodal or multimodal.
Median • Score that separates the top 50% from the bottom 50% • With an even number of scores, the median is halfway between the two middle scores. – 1 2 3 4 | 5 6 7 8 – Median is 4.5 • With an odd number of scores, the median is the middle number – 1 2 3 4 5 6 7 – Median is 4
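A minimal Python sketch of the mode for ungrouped scores, using the standard library. The two score lists are made up for illustration and are not from the slides.

```python
from statistics import mode, multimode

# Made-up scores (illustrative only).
scores = [2, 3, 3, 4, 4, 4, 5, 6]           # one clear mode
bimodal = [1, 2, 2, 2, 3, 5, 6, 6, 6, 7]    # two values tie for most frequent

single_mode = mode(scores)       # the most frequently occurring score
all_modes = multimode(bimodal)   # every score tied for most frequent

print(single_mode)  # 4
print(all_modes)    # [2, 6]
```

`multimode` returns all tied values, which is the safer call when a distribution might be bimodal or multimodal.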
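The two median cases on this slide can be checked with a short Python sketch using the standard library; the variable names are illustrative.

```python
from statistics import median

even_scores = [1, 2, 3, 4, 5, 6, 7, 8]   # even N: average the two middle scores
odd_scores = [1, 2, 3, 4, 5, 6, 7]       # odd N: take the middle score

median_even = median(even_scores)
median_odd = median(odd_scores)

print(median_even)  # 4.5
print(median_odd)   # 4
```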
Mean • Sum of scores divided by the number of people. Population mean is $\mu$ (mu) and sample mean is $\bar{X}$ (X-bar). • We calculate the sample mean by: $\bar{X} = \frac{\sum X}{N}$ • We calculate the population mean by: $\mu = \frac{\sum X}{N}$
Deviation from the mean • Deviation score – deviation from the mean: $x = X - \bar{X}$. Deviations sum to zero. • Raw scores (mean = 9): 7 8 8 9 9 9 10 10 11 • Deviation scores: -2 -1 -1 0 0 0 1 1 2
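A short Python sketch of deviation scores, assuming the nine raw scores on this slide are 7, 8, 8, 9, 9, 9, 10, 10, 11 (mean 9). It shows that the deviations sum to zero.

```python
from statistics import mean

# Raw scores as read from the slide (assumed reconstruction of the dot plot).
raw_scores = [7, 8, 8, 9, 9, 9, 10, 10, 11]

sample_mean = mean(raw_scores)                       # 9
deviations = [x - sample_mean for x in raw_scores]   # x = X - X-bar

print(deviations)       # [-2, -1, -1, 0, 0, 0, 1, 1, 2]
print(sum(deviations))  # 0
```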
Comparison of mean, median and mode • Mode – Good for nominal variables – Good if you need to know most frequent observation – Quick and easy • Median – Good for “bad” distributions – Good for distributions with arbitrary ceiling or floor
Comparison of mean, median & mode • Mean – Used for inference as well as description; best estimator of the parameter – Based on all data in the distribution – Generally preferred except for “bad” distribution. Most commonly used statistic for central tendency.
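One way to see why the median is preferred for a "bad" distribution is a small sketch with one extreme score. The income figures below are hypothetical (in thousands) and chosen only to make the point.

```python
from statistics import mean, median

# Hypothetical incomes in thousands; the last value is an extreme score.
incomes = [30, 32, 35, 38, 40, 42, 45, 400]

mean_income = mean(incomes)       # pulled toward the extreme score
median_income = median(incomes)   # stays near the bulk of the scores

print(mean_income)    # 82.75
print(median_income)  # 39.0
```

The mean uses every score, so a single extreme value moves it a long way; the median depends only on the middle of the ordering.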
Best Guess interpretations • Mean – average of signed errors will be zero. • Mode – will be exactly right more often than any other guess. • Median – smallest average absolute error.
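The three "best guess" properties can be checked numerically. This is a sketch on made-up scores, picked so that the mean, median, and mode all differ; the helper function names are illustrative.

```python
from statistics import mean, median, mode

scores = [1, 1, 2, 4, 12]  # made-up scores (mean 4, median 2, mode 1)

def mean_signed_error(guess):
    """Average of (score - guess), keeping the sign."""
    return sum(x - guess for x in scores) / len(scores)

def mean_absolute_error(guess):
    """Average of |score - guess|."""
    return sum(abs(x - guess) for x in scores) / len(scores)

def exact_hits(guess):
    """How many scores the guess matches exactly."""
    return sum(1 for x in scores if x == guess)

guesses = {"mean": mean(scores), "median": median(scores), "mode": mode(scores)}
for name, guess in guesses.items():
    print(name, guess, mean_signed_error(guess),
          mean_absolute_error(guess), exact_hits(guess))
```

For these scores the mean has zero average signed error, the median has the smallest average absolute error, and the mode matches the most scores exactly.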
Expectation • Discrete and continuous variables • Mean is expected value either way • Discrete: $E(X) = \sum_x x\,p(x)$ • Continuous: $E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx$ (The integral looks bad but just means take the average)
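A minimal sketch of expectation as a probability-weighted average. The fair die and the uniform variable on [0, 1] are assumed examples, not from the slides, and the continuous case is approximated with a midpoint sum rather than solved analytically.

```python
from fractions import Fraction

# Discrete: a fair six-sided die (assumed example), each face has p = 1/6.
faces = [1, 2, 3, 4, 5, 6]
p = Fraction(1, 6)
expected_die = sum(x * p for x in faces)   # sum of x * p(x)

# Continuous: uniform density f(x) = 1 on [0, 1] (assumed example).
# Approximate the integral of x * f(x) dx with a midpoint sum over n slices.
n = 1000
dx = 1.0 / n
expected_uniform = sum(((i + 0.5) * dx) * 1.0 * dx for i in range(n))

print(expected_die)                 # 7/2
print(round(expected_uniform, 6))   # 0.5
```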
Influence of Distribution Shape
Review • What is central tendency? • Mode • Median • Mean
2. Variability aka Dispersion • 4 Statistics: Range, Average Deviation, Variance, & Standard Deviation • Range = high score minus low score. – 12 14 14 16 16 18 20 – range = 20 - 12 = 8 • Average Deviation – mean of absolute deviations from the median: $AD = \frac{\sum |X - Mdn|}{N}$ • Note the difference between this definition and the undergrad text: deviation from the median vs. the mean.
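A Python sketch of the range and the average deviation for the seven scores on this slide. It follows the slide's definition (absolute deviations from the median, not the mean); the variable names are illustrative.

```python
from statistics import median

scores = [12, 14, 14, 16, 16, 18, 20]

# Range: high score minus low score.
score_range = max(scores) - min(scores)

# Average deviation: mean absolute deviation from the median.
mdn = median(scores)
average_deviation = sum(abs(x - mdn) for x in scores) / len(scores)

print(score_range)        # 8
print(mdn)                # 16
print(average_deviation)  # 2.0
```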
Variance • Population Variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$ • Where $\sigma^2$ means population variance, $\mu$ means population mean, and the other terms have their usual meaning. • The variance is equal to the average squared deviation from the mean. • To compute, take each score and subtract the mean. Square the result. Find the average over scores. Ta da! The variance.
Computing the Variance (N=5)
          X    Mean   Deviation   Squared deviation
          5     15      -10          100
         10     15       -5           25
         15     15        0            0
         20     15        5           25
         25     15       10          100
Total:   75               0          250
Mean:    15                           50
Variance is 50
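The worked example (scores 5, 10, 15, 20, 25) can be reproduced step by step in Python. This sketch uses the population formula, dividing by N, and cross-checks against the standard library's `pvariance`.

```python
from statistics import mean, pvariance

scores = [5, 10, 15, 20, 25]

mu = mean(scores)                                   # 15
deviations = [x - mu for x in scores]               # subtract the mean
squared_deviations = [d ** 2 for d in deviations]   # square the result
variance = sum(squared_deviations) / len(scores)    # average over scores

print(deviations)          # [-10, -5, 0, 5, 10]
print(squared_deviations)  # [100, 25, 0, 25, 100]
print(variance)            # 50.0
print(pvariance(scores))   # 50
```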
Standard Deviation • Variance is average squared deviation from the mean. • To return to original, unsquared units, we just take the square root of the variance. This is the standard deviation. • Population formula: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
Standard Deviation • Sometimes called the root-mean-square deviation from the mean. This name says how to compute it from the inside out. • Find the deviation (difference between the score and the mean). • Find the deviations squared. • Find their mean. • Take the square root.
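The "inside out" reading of root-mean-square deviation maps directly onto code. This is a sketch using the scores 5, 10, 15, 20, 25 and the population formula (divide by N); the function name is illustrative.

```python
import math
from statistics import mean, pstdev

def root_mean_square_deviation(scores):
    mu = mean(scores)
    deviations = [x - mu for x in scores]                    # deviation
    squared = [d ** 2 for d in deviations]                   # square
    mean_square = sum(squared) / len(squared)                # mean
    return math.sqrt(mean_square)                            # root

scores = [5, 10, 15, 20, 25]
sd = root_mean_square_deviation(scores)

print(round(sd, 2))              # 7.07
print(round(pstdev(scores), 2))  # 7.07
```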
Computing the Standard Deviation (N=5)
          X    Mean   Deviation   Squared deviation
          5     15      -10          100
         10     15       -5           25
         15     15        0            0
         20     15        5           25
         25     15       10          100
Total:   75               0          250
Mean:    15                           50
Variance is 50
SD = Sqrt(50) ≈ 7.07
Example: Age Distribution
Review • Range • Average deviation • Variance • Standard Deviation
Standard or z score • A z score indicates distance from the mean in standard deviation units. Formula: $z = \frac{X - \mu}{\sigma}$ • Converting to standard or z scores does not change the shape of the distribution. Z-scores are not normalized: standardizing does not make a distribution normal.
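A minimal sketch of converting scores to z scores, using the example scores 5, 10, 15, 20, 25 and the population SD. It also checks that the z scores have mean 0 and SD 1 while keeping the same relative spacing.

```python
from statistics import mean, pstdev

scores = [5, 10, 15, 20, 25]

mu = mean(scores)        # 15
sigma = pstdev(scores)   # population SD, about 7.07

z_scores = [(x - mu) / sigma for x in scores]

print([round(z, 2) for z in z_scores])   # [-1.41, -0.71, 0.0, 0.71, 1.41]
print(round(mean(z_scores), 6))          # 0.0
print(round(pstdev(z_scores), 6))        # 1.0
```

The scores were equally spaced before standardizing and the z scores are still equally spaced: only the location and scale change, not the shape.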
Tchebycheff’s Inequality (1) • General form: $P(|X - \mu| \ge b) \le \frac{\sigma^2}{b^2}$ • Suppose we know mean height in inches is 66 and SD is 4 inches. We assume nothing about the shape of the distribution of height. What is the probability of finding people taller than 74 inches? (Note that b is a deviation from the mean; in this case 74 - 66 = 8.) Also, 74 inches is 2 SDs above the mean; therefore, z = 2. [If we assume height is normally distributed, p is much smaller. But we will get to that later.]
Tchebycheff (2) • Z-score form: $P(|z| \ge k) \le \frac{1}{k^2}$ • Probability of a z score from any distribution being k or more SDs from the mean is at most $1/k^2$. • Z-scores from the worst distributions are rarely more than 5 or less than -5. • For symmetric, unimodal distributions, |z| is rarely more than 3. • For the problem in the previous slide: $P(|z| \ge 2) \le \frac{1}{2^2} = .25$, so at most 25% of people are 8 or more inches from the mean, and therefore at most 25% are taller than 74 inches.
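The height problem can be worked as a short Python sketch. The function name `tchebycheff_bound` is illustrative, and the normal-distribution comparison is an added assumption used only to show how much smaller p becomes when normality is assumed.

```python
from statistics import NormalDist

def tchebycheff_bound(k):
    """Upper bound on P(|z| >= k) for any distribution: 1 / k^2 (capped at 1)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return min(1.0, 1.0 / k ** 2)

mean_height, sd_height, cutoff = 66, 4, 74

k = (cutoff - mean_height) / sd_height   # z = 2.0
bound = tchebycheff_bound(k)             # holds for any shape of distribution

# Assumption for comparison only: height is normally distributed.
normal_tail = 1 - NormalDist(mean_height, sd_height).cdf(cutoff)

print(k)                      # 2.0
print(bound)                  # 0.25
print(round(normal_tail, 4))  # 0.0228
```

The bound of .25 covers both tails and makes no assumption about shape, which is why it is so much looser than the normal-theory value.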
Review • Z-score in words • Z-score in symbols • Meaning of Tchebycheff’s theorem
Median House Price Data • Find data • Show Univariate • Show plots