Chapter 4 Variability

Variability
• In statistics, our goal is to measure the amount of variability for a particular set of scores (a distribution).
• If all the scores are the same, there is no variability.
• If the differences between scores are small, variability is small.
• If the differences between scores are large, variability is large.

• Variability provides a quantitative measure of the degree to which scores in a distribution are spread out or clustered together.
• Goal: to describe how spread out the scores are in a distribution.

Copyright © 2002 Wadsworth Group. Wadsworth is an imprint of the Wadsworth Group, a

Variability (cont.)
• Variability will serve two purposes:
  • Describe the distribution
    • Close together
    • Spread out over a large distance
  • Measure how well an individual score (or group of scores) represents the entire distribution

Variability (cont.)
• Variability provides information about how much error to expect when you are using a sample to represent a population.
• Three measures of variability:
  • Range
  • Interquartile range
  • Standard deviation

Range
• The range is the difference between the upper real limit of the largest (maximum) X value and the lower real limit of the smallest (minimum) X value.
• Range is the most obvious way to describe how spread out the scores are.
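The definition of the range can be sketched in a few lines of Python. This is only an illustration: the function name `range_with_real_limits`, the `unit` parameter, and the scores are hypothetical, and the sketch assumes scores are measured in whole units so that each real limit lies half a unit beyond the score.

```python
def range_with_real_limits(scores, unit=1.0):
    """Range = upper real limit of the max score - lower real limit of the min score."""
    half = unit / 2  # assumption: real limits extend half a measurement unit
    upper_real_limit = max(scores) + half
    lower_real_limit = min(scores) - half
    return upper_real_limit - lower_real_limit


scores = [3, 7, 12, 8, 5]  # hypothetical whole-number scores
score_range = range_with_real_limits(scores)
print(score_range)  # 10.0  (12.5 - 2.5)
```

Only the maximum and minimum enter the calculation, so changing any of the other scores leaves the result unchanged.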

Range (cont.)
• Problem: The range is completely determined by the two extreme values and ignores the other scores in the distribution.
• It often does not give an accurate description of the variability for the entire distribution.
• It is considered a crude and unreliable measure of variability.

Interquartile Range and Semi-Interquartile Range
• Divide the distribution into four equal parts
  • Q1, Q2, Q3
• The interquartile range is defined as the distance between the first quartile and the third quartile.

[Figure: a distribution divided by Q1, Q2, and Q3 into sections of 25% each, with the interquartile range and semi-interquartile range marked.]

Interquartile Range (cont.)
• When the interquartile range is used to describe variability, it is commonly transformed into the semi-interquartile range.
• The semi-interquartile range is one-half of the interquartile range.
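As a rough sketch, the interquartile and semi-interquartile ranges can be computed with Python's standard `statistics.quantiles`. The scores are hypothetical, and the `"inclusive"` quartile method is an assumption: textbooks and software use several quartile conventions, so other methods can give slightly different values for the same data.

```python
import statistics

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical scores

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

interquartile_range = q3 - q1                       # distance from Q1 to Q3
semi_interquartile_range = interquartile_range / 2  # one-half of that distance

print(q1, q3)                    # 3.0 7.0
print(interquartile_range)       # 4.0
print(semi_interquartile_range)  # 2.0
```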

Interquartile Range (cont.)
• Because the semi-interquartile range is derived from the middle 50% of a distribution, it is less likely to be influenced by extreme scores and therefore gives a better and more stable measure of variability than the range.

Interquartile Range (cont.)
• It does not take into account the distances between individual scores.
• It does not give a complete picture of how scattered or clustered the scores are.

Standard Deviation
• Most commonly used
• Most important measure of variability
• Standard deviation uses the mean of the distribution as a reference point and measures variability by considering the distance between each score and the mean.

Standard Deviation (cont.)
• Are the scores clustered or scattered?
• The standard deviation describes the average (typical) distance from the mean.

Standard Deviation (cont.)
• The goal of standard deviation is to measure the standard, or typical, distance from the mean.
• Deviation is the distance and direction from the mean: deviation score = $X - \mu$

Standard Deviation (cont.)
• Step 1: Determine the deviation, or distance from the mean, for each individual score.
  • If $\mu = 50$ and $X = 53$, then deviation score = $X - \mu = 53 - 50 = +3$

Standard Deviation (cont.)
  • If $\mu = 50$ and $X = 45$, then deviation score = $X - \mu = 45 - 50 = -5$

Standard Deviation (cont.)
• Step 2: Calculate the mean of the deviation scores.
  • Add the deviation scores
  • Divide by N

Standard Deviation (cont.)

  $X$    $X - \mu$
   8       +5
   1       -2
   3        0
   0       -3

• Deviation scores must add up to zero: $\sum (X - \mu) = 0$
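The table can be checked with a short Python sketch. The helper name `deviation_scores` is hypothetical. The scores are the four X values from the table, whose mean is 3.

```python
def deviation_scores(scores):
    """Return each score's signed distance from the mean (X - mu)."""
    mu = sum(scores) / len(scores)
    return [x - mu for x in scores]


scores = [8, 1, 3, 0]  # X values from the table; mu = 12 / 4 = 3
deviations = deviation_scores(scores)
print(deviations)       # [5.0, -2.0, 0.0, -3.0]
print(sum(deviations))  # 0.0
```

With a non-integer mean, floating-point rounding can leave the sum a tiny distance from zero, so a tolerance-based comparison is safer than testing for exact equality.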

Standard Deviation (cont.)
• Step 3: Square each deviation score.
  • Why? The average of the deviation scores will not work as a measure of variability.
  • Why? They always add up to zero.

Standard Deviation (cont.)
• Step 3 (cont.): Using the squared values, you can now compute the mean squared deviation.
  • This is called variance.
  • Variance = mean squared deviation

Standard Deviation (cont.)
• By squaring the deviation scores:
  • You get rid of the + and – signs
  • You get a measure of variability based on squared distances
  • This is useful for some inferential statistics
  • Note: This squared distance is not the best descriptive measure of variability

Standard Deviation (cont.)
• Step 4: Make a correction for squaring the distances by taking the square root.
  • Standard deviation = $\sqrt{\text{variance}}$
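Steps 1 through 4 can be strung together in a minimal Python sketch. The function name `population_sd_steps` is hypothetical. The scores 8, 1, 3, 0 are the ones from the deviation-score table earlier in the deck.

```python
import math


def population_sd_steps(scores):
    """Walk through the four steps and return the intermediate results."""
    n = len(scores)
    mu = sum(scores) / n
    deviations = [x - mu for x in scores]     # Step 1: deviation scores
    deviation_sum = sum(deviations)           # Step 2: sums to zero, so its mean is useless
    squared = [d ** 2 for d in deviations]    # Step 3: square each deviation
    variance = sum(squared) / n               #         mean squared deviation
    sd = math.sqrt(variance)                  # Step 4: undo the squaring
    return deviations, deviation_sum, variance, sd


scores = [8, 1, 3, 0]
deviations, deviation_sum, variance, sd = population_sd_steps(scores)
print(deviation_sum)  # 0.0
print(variance)       # 9.5
print(round(sd, 3))   # 3.082
```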

Sum of Squared Deviations (SS)
• Variance = mean squared deviation = $\frac{SS}{N}$
• Definitional formula: $SS = \sum (X - \mu)^2$

Sum of Squared Deviations (SS): Definitional Formula

  $X$    $X - \mu$    $(X - \mu)^2$
   1       -1            1
   0       -2            4
   6       +4           16
   1       -1            1

$\sum X = 8$, $\mu = 2$, $SS = \sum (X - \mu)^2 = 22$

• Computational formula: $SS = \sum X^2 - \frac{(\sum X)^2}{N}$

Computational Formula for SS

  $X$    $X^2$
   1       1
   0       0
   6      36
   1       1

$\sum X = 8$, $\sum X^2 = 38$

$SS = \sum X^2 - \frac{(\sum X)^2}{N} = 38 - \frac{(8)^2}{4} = 38 - \frac{64}{4} = 38 - 16 = 22$
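The agreement between the two formulas can be checked with a brief Python sketch. The function names `ss_definitional` and `ss_computational` are hypothetical. The scores are the four X values used in the worked example.

```python
def ss_definitional(scores):
    """SS = sum of (X - mu)^2."""
    mu = sum(scores) / len(scores)
    return sum((x - mu) ** 2 for x in scores)


def ss_computational(scores):
    """SS = sum of X^2 minus (sum of X)^2 / N."""
    n = len(scores)
    return sum(x * x for x in scores) - sum(scores) ** 2 / n


scores = [1, 0, 6, 1]
print(ss_definitional(scores))   # 22.0
print(ss_computational(scores))  # 22.0
```

The computational form needs only the running totals of X and X squared, which is why it is convenient when the mean is not a whole number.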

Definitional vs. Computational?
• The definitional formula is the most direct way of calculating the sum of squares.
• However, if you have numbers with decimals, it can become cumbersome.
• The computational formula is most commonly used.

Formulas
• Variance = $\frac{SS}{N}$
• Standard deviation = $\sqrt{\text{variance}} = \sqrt{\frac{SS}{N}}$

Formulas (cont.)
• Variance and standard deviation are parameters of a population and are identified with the Greek letter $\sigma$ (sigma).
• Population standard deviation = $\sigma = \sqrt{\frac{SS}{N}}$
• Population variance = $\sigma^2 = \frac{SS}{N}$
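As a cross-check on the population formulas, here is a minimal sketch using Python's standard `statistics` module, whose `pvariance` and `pstdev` functions divide by N. The scores are the four values from the SS example (SS = 22, N = 4). The variable names are illustrative.

```python
import math
import statistics

scores = [1, 0, 6, 1]
N = len(scores)
mu = sum(scores) / N
SS = sum((x - mu) ** 2 for x in scores)  # 22.0

sigma_squared = SS / N            # population variance
sigma = math.sqrt(sigma_squared)  # population standard deviation

print(sigma_squared)    # 5.5
print(round(sigma, 3))  # 2.345

# The library functions for population parameters should agree.
print(math.isclose(sigma_squared, statistics.pvariance(scores)))  # True
print(math.isclose(sigma, statistics.pstdev(scores)))             # True
```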

Example (pg. 94)

  $X$    $X - M$    $(X - M)^2$
   1       -4          16
   6        1           1
   4       -1           1
   3       -2           4
   8        3           9
   7        2           4
   6        1           1

$\sum X = 35$, $n = 7$, $M = 35/7 = 5$, $SS = \sum (X - M)^2 = 36$

Degrees of Freedom
• Degrees of freedom are used for sample variance, where $n$ is the number of scores in the sample.
• With a sample of $n$ scores, the first $n - 1$ scores are free to vary, but the final score is restricted.
• As a result, the sample is said to have $n - 1$ degrees of freedom.
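The restriction on the final score can be illustrated with a small Python sketch. It assumes the usual reasoning that the sample mean M is treated as already known, so the scores must sum to n times M. The function name `final_score` and the numbers are hypothetical.

```python
def final_score(free_scores, sample_mean, n):
    """Return the only value the last score can take once the sample mean is fixed."""
    if len(free_scores) != n - 1:
        raise ValueError("expected exactly n - 1 freely chosen scores")
    # The n scores must sum to n * M, so the last one is whatever is left over.
    return n * sample_mean - sum(free_scores)


# Hypothetical sample of n = 3 scores with M = 5: pick any two scores freely...
last = final_score([2, 9], sample_mean=5, n=3)
print(last)  # 4  (the third score has no freedom left)
```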

Degrees of Freedom
• Degrees of freedom, or df, for sample variance are defined as $df = n - 1$, where $n$ is the number of scores in the sample.
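The slides define only $df = n - 1$. The sketch below assumes the usual next step, dividing SS by df to obtain the sample variance, and applies it to the seven scores from the page 94 example (SS = 36). The variable names are illustrative, and `statistics.variance` is the standard-library function that uses the same n − 1 divisor.

```python
import math
import statistics

sample = [1, 6, 4, 3, 8, 7, 6]  # scores from the page 94 example
n = len(sample)
M = sum(sample) / n                     # sample mean, 5.0
SS = sum((x - M) ** 2 for x in sample)  # 36.0

df = n - 1                              # degrees of freedom
sample_variance = SS / df               # assumption: SS divided by df
sample_sd = math.sqrt(sample_variance)

print(df)                   # 6
print(sample_variance)      # 6.0
print(round(sample_sd, 3))  # 2.449

# The standard library's sample variance uses the same n - 1 divisor.
print(math.isclose(sample_variance, statistics.variance(sample)))  # True
```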