Skewness and Kurtosis (Divergence from normality)
Lecture 8, 15/09/2016
Prof. K. K. Achary, YRC

Symmetry and asymmetry
• A data set is symmetric about a central value if the observations are distributed symmetrically about that value.
• In a symmetrically distributed data set, the frequency is highest at the central value and decreases in the same pattern on either side of it. The three measures (mean, median and mode) coincide.
• A symmetric distribution is one where the left-hand and right-hand sides of the distribution are roughly equally balanced around the mean.

• Symmetry can be observed in a stem-and-leaf chart, a frequency table or a histogram.
• In a symmetric data set the mean, median and mode are equal or lie very close together. The median is equidistant from the two quartiles.
• The histogram below shows a typical symmetric distribution.
• A distribution that lacks symmetry is called a skewed distribution.

• Data sets showing departure from symmetry are asymmetric, or skewed.
• In a skewed distribution or data set, the frequency curve has a long tail. Skewness is right-tailed, or positive, if the tail extends to the right, i.e. towards larger values. Frequencies become smaller as the values increase.
• Skewness is left-tailed, or negative, if the tail extends to the left, i.e. towards smaller values. Frequencies become larger as the values increase.
• The following frequency curve shows positive skewness.
• Exercise: draw the frequency curve of a negatively skewed distribution.

Positively skewed distribution

• In a positively skewed distribution, the mean is typically greater than the median, and the median is closer to the first quartile.
• In a negatively skewed distribution, the mean is typically smaller than the median, and the median is closer to the third quartile.
• An important comment: the relative positions of the mean, median and mode in skewed distributions are often given as follows:
  - For a positively skewed distribution, mean > median > mode.
  - For a negatively skewed distribution, mean < median < mode.
• This relationship is not always true!

• Is the following data set symmetric, skewed right or skewed left?
  27; 28; 30; 32; 34; 38; 41; 42; 43; 44; 46; 53; 56; 62
• Answer: the statistics of the data set are
  - mean: 41.14
  - first quartile: 31.75
  - median: 41.5
  - third quartile: 47.75

• We can conclude that the data set is left-skewed (negatively skewed) for two reasons.
• The mean is less than the median. The difference between the mean and the median is very small, so this is not a very strong reason.
• A better reason is that the median is closer to the upper quartile than to the lower quartile.
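The two checks used in this answer can be reproduced with a few lines of Python. This is only a sketch: the mean and median are computed from the data, while the quartiles are taken as stated on the slide, because different quartile conventions in software can give slightly different values.

```python
import statistics

# Data set from the worked example
data = [27, 28, 30, 32, 34, 38, 41, 42, 43, 44, 46, 53, 56, 62]

mean = statistics.mean(data)
median = statistics.median(data)

# Quartiles as stated on the slide (assumption: other quartile
# conventions may return slightly different numbers)
q1, q3 = 31.75, 47.75

lower_gap = median - q1   # distance from the median down to Q1
upper_gap = q3 - median   # distance from the median up to Q3

print(f"mean = {mean:.2f}, median = {median}")
print(f"median - Q1 = {lower_gap}, Q3 - median = {upper_gap}")
print("mean < median:", mean < median)
print("median closer to Q3 than to Q1:", upper_gap < lower_gap)
```

Both comparisons point the same way here, which is why the example concludes that the data are left-skewed.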

• The following data set:
  11.2; 5; 9.4; 14.9; 4.4; 18.8; −0.4; 10.5; 8.3; 17.8
• The statistics of the data set are
  - mean: 9.99
  - first quartile: 6.65
  - median: 9.95
  - third quartile: 13.05

• Note that the different ways of determining whether the data are skewed right or left give contradictory indications.
  - The mean is slightly greater than the median. This would indicate that the data set is skewed right.
  - The median is slightly closer to the third quartile than to the first quartile. This would indicate that the data set is skewed left.
• Since these differences are so small, and since they contradict each other, we conclude that the data set is symmetric.

Measures of skewness
• Karl Pearson's coefficient of skewness = (mean − mode) / S.D. If it is positive, the data are positively skewed.
• Bowley's coefficient = (Q3 + Q1 − 2M) / (Q3 − Q1), where M is the median.
• How to interpret this?
• In a moderately skewed distribution, (mean − mode) is approximately 3(mean − median).
• These measures are free from the unit of measurement. One more measure, β₁, based on the third central moment and the variance, is also used.
• HW: Find the nature of skewness of the systolic BP data for three groups of individuals.
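As a rough illustration, the quartile-based coefficient can be applied to the two worked data sets from the earlier slides, using the quartiles and medians stated there. The function names are made up for this sketch. The second function substitutes 3(mean − median) for (mean − mode), following the approximate relation above, and assumes the population standard deviation (`statistics.pstdev`), since the slide does not say which S.D. is meant.

```python
import statistics


def bowley_skewness(q1, median, q3):
    """Quartile-based coefficient: (Q3 + Q1 - 2M) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


def pearson_median_skewness(values):
    """Pearson's coefficient with (mean - mode) replaced by 3(mean - median).

    Assumption: the population standard deviation is used as S.D.
    """
    mean = statistics.mean(values)
    median = statistics.median(values)
    return 3 * (mean - median) / statistics.pstdev(values)


first = [27, 28, 30, 32, 34, 38, 41, 42, 43, 44, 46, 53, 56, 62]
second = [11.2, 5, 9.4, 14.9, 4.4, 18.8, -0.4, 10.5, 8.3, 17.8]

# Quartiles and medians as stated in the worked examples
bowley_first = bowley_skewness(31.75, 41.5, 47.75)
bowley_second = bowley_skewness(6.65, 9.95, 13.05)

pearson_first = pearson_median_skewness(first)
pearson_second = pearson_median_skewness(second)

print(f"first data set:  Bowley = {bowley_first:.4f}, Pearson = {pearson_first:.4f}")
print(f"second data set: Bowley = {bowley_second:.4f}, Pearson = {pearson_second:.4f}")
```

For the first data set both coefficients come out negative. For the second, the two have opposite signs and are both close to zero, which matches the earlier conclusion that the contradicting indications are too small to matter.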

• Rules of thumb for interpreting skewness:
  - If the skewness measure is between −½ and +½, the distribution is approximately symmetric; if the measure is equal to zero, the distribution is symmetric.
  - If skewness is between −1 and −½ or between +½ and +1, the distribution is moderately skewed.
  - If skewness is less than −1 or greater than +1, the distribution is highly skewed.
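These rules translate directly into a small helper. The function name is hypothetical, and the handling of the boundary values is an assumption, since the rules do not say which band ±½ and ±1 belong to: here ±½ counts as approximately symmetric and ±1 as moderately skewed.

```python
def interpret_skewness(sk):
    """Classify a skewness coefficient using the rules of thumb.

    Assumption: boundary values go to the milder category
    (|sk| == 0.5 is 'approximately symmetric', |sk| == 1 is
    'moderately skewed').
    """
    size = abs(sk)
    if size == 0:
        return "symmetric"
    if size <= 0.5:
        return "approximately symmetric"
    direction = "positively" if sk > 0 else "negatively"
    if size <= 1:
        return f"moderately {direction} skewed"
    return f"highly {direction} skewed"


for value in (0, -0.22, 0.7, -1.4):
    print(value, "->", interpret_skewness(value))
```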

• Caution:
• This is an interpretation of the data you actually have. When you have data for the whole population, that's fine.
• But when you have a sample, the sample skewness doesn't necessarily apply to the whole population. In that case the question is: from the sample skewness, can you conclude anything about the population skewness? (Inference is difficult!)

Visual aids to know skewness
• A histogram gives a fairly good idea of the nature of skewness in your data.
• A stem-and-leaf plot also helps.
• It is important to understand the nature of skewness in your data, because the inference techniques differ for skewed data and normal (symmetric) data.
• Parametric inference relies largely on the assumption of normality in your data. The presence of asymmetry is an indication of non-normality.

Kurtosis
• Central tendency, variability and shape are the important characteristics of a data set. The shape of a distribution is described by skewness and kurtosis.
• While skewness describes asymmetry in shape, kurtosis typically describes the 'peakedness' of a data set or distribution.

• Based on the extent of peakedness, kurtosis is categorised into three types.
  - Mesokurtic distribution: the ideal or benchmark distribution, i.e. the normal distribution. The peakedness of other distributions is compared with this distribution.
  - Leptokurtic distribution: a distribution which is more peaked than mesokurtic.
  - Platykurtic distribution: a distribution which is flatter than mesokurtic.

• The ratio of the fourth central moment to the square of the variance is used as a coefficient of kurtosis, denoted β₂ = μ₄ / μ₂².
• For a mesokurtic distribution, β₂ = 3.
• For a leptokurtic distribution, β₂ > 3.
• For a platykurtic distribution, β₂ < 3.
• A normal distribution is symmetric and mesokurtic.
• The following graph shows the three curves.
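A minimal sketch of this coefficient, computed from raw data as the fourth central moment divided by the squared variance. The function names and both small data sets are made up purely for illustration. The benchmark value 3 for a normal (mesokurtic) distribution is the standard one and is treated as an assumption here.

```python
def central_moment(values, k):
    """k-th central moment, dividing by n (population form)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n


def kurtosis_coefficient(values):
    """Fourth central moment divided by the square of the variance."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


def kurtosis_type(coefficient, benchmark=3.0):
    """Compare with the normal-distribution benchmark (assumed to be 3)."""
    if coefficient > benchmark:
        return "leptokurtic"
    if coefficient < benchmark:
        return "platykurtic"
    return "mesokurtic"


# Illustrative values only, not taken from the lecture
flat = [1, 2, 3, 4, 5]
heavy_tailed = [-10, -1, 0, 0, 0, 0, 1, 10]

for name, sample in (("flat", flat), ("heavy-tailed", heavy_tailed)):
    coefficient = kurtosis_coefficient(sample)
    print(f"{name}: coefficient = {coefficient:.3f} ({kurtosis_type(coefficient)})")
```

With real data the coefficient will almost never equal the benchmark exactly, so in practice one would compare against it with some tolerance rather than test for equality.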

• In a leptokurtic distribution, more observations cluster around the mean and the spread may be less.
• In a platykurtic distribution, the observations are less concentrated around the mean and hence the spread may be more.
• Some remarks:
  - Describing kurtosis in terms of peakedness alone is not correct. The tails of the distribution should also be taken into consideration.

Remark
• If we consider the graphs of three normal curves with a common mean (= 0) and variances of 2, 0.5 and 1.0, the curve with variance 0.5 looks more peaked and the curve with variance 2 looks less peaked than the curve with variance 1. But all three curves represent normal distributions and hence all are mesokurtic. We have to be very careful when comparing the kurtosis of distributions with different variances.
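This remark can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library to get the peak height of each curve. It then approximates the kurtosis coefficient (fourth central moment over squared variance) on a grid of evenly spaced quantiles. The grid size and the helper name are arbitrary choices for the illustration, and the grid gives only an approximation, so the coefficient comes out close to 3 rather than exactly 3.

```python
from math import sqrt
from statistics import NormalDist


def kurtosis_coefficient(values):
    """Fourth central moment divided by the square of the variance."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2


# Evenly spaced probabilities strictly between 0 and 1
n = 10_000
probs = [(i + 0.5) / n for i in range(n)]

results = {}
for variance in (0.5, 1.0, 2.0):
    dist = NormalDist(mu=0.0, sigma=sqrt(variance))
    grid = [dist.inv_cdf(p) for p in probs]   # deterministic "sample"
    peak = dist.pdf(0.0)                      # height of the curve at the mean
    results[variance] = (peak, kurtosis_coefficient(grid))
    print(f"variance {variance}: peak = {peak:.4f}, "
          f"coefficient = {results[variance][1]:.3f}")
```

The peak heights differ with the variance, but the coefficient is the same for all three curves, because it is unchanged when the data are rescaled.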

• Kurtosis as a descriptive measure of data is usually not discussed much in research applications. Since most data analysis focuses on the normality assumption, researchers tend to ignore kurtosis.
• But skewness and kurtosis are both very important for understanding departure from normality.
• The kurtosis of any distribution is studied in relation to a normal distribution.