Intro to Descriptive Statistics GTECH 201 Lecture 12

  • Slides: 27

Topics for Today
  • Measures of Central Tendency
      • Mean, Median, Mode
      • Sample and Population Mean
      • Weighted Means
      • Selecting Appropriate Measures of Central Tendency
  • Measures of Dispersion
      • Variance
      • Standard Deviation

Descriptive vs. Inferential
  • Descriptive Statistics
      • Methods for organizing and summarizing information
  • Inferential Statistics
      • Methods for drawing and measuring the reliability of conclusions about a population based on information obtained from a sample of the population

Looking at This Data Set… Student Performance in Class Tests


Overview
  • Mean
  • Median
  • Mode
  • Sample and Population Mean
  • Weighted Means
  • Selecting Appropriate Measures of Central Tendency
  • Applying these measures

Mean
  • The mean of a set of n observations is the arithmetic average
  • Mean of n observations x_1, x_2, x_3, …, x_n is $\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$
  • In Excel, =AVERAGE(insert range)

Median
  • The data value that is exactly in the middle of an ordered list if the number of pieces of data is odd
  • The mean of the two middle pieces of data in an ordered list if the number of pieces of data is even
  • The median is a typical value; it is the midpoint of the observations when they are arranged in ascending or descending order

Mode
  • The most frequent data value, i.e., any value having the highest frequency among the observations
  • In Excel, you use the functions =MEDIAN(insert range) and =MODE(insert range)
  • Unimodal, bimodal, and multimodal data sets
  • Outliers
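As a rough cross-check outside Excel, the three measures can be sketched with Python's standard `statistics` module. The scores below are made-up illustrative values, not the class data set from the lecture, and the variable names are only for this example.

```python
import statistics

# Hypothetical test scores (illustrative only, not the lecture's data set)
scores = [70, 85, 85, 90, 60, 85, 70]

mean_score = statistics.mean(scores)      # arithmetic average, like =AVERAGE
median_score = statistics.median(scores)  # middle of the ordered list, like =MEDIAN
modes = statistics.multimode(scores)      # list of every most-frequent value

print(mean_score, median_score, modes)
```

`multimode` returns a list, so a bimodal or multimodal data set shows all of its modes rather than just one.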

Sample and Population Means
  • Mean of a data set
      • Population mean if the data set includes the entire population
      • Sample mean if the data set is only a sample of the population

Weighted Means
  • To calculate the mean when your information is available only in the form of summary data

    C Interval     Freq
    25 – 29.9      4
    30 – 34.9      5
    35 – 39.9      12
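One common way to get a mean from summary data like this is to weight each interval's midpoint by its frequency. The sketch below assumes that approach and assumes the midpoint is taken between the printed limits (for example 27.45 for 25 – 29.9); the slide itself does not say which convention the lecture used.

```python
# Intervals and frequencies from the slide's summary data
intervals = [(25.0, 29.9), (30.0, 34.9), (35.0, 39.9)]
freqs = [4, 5, 12]

# Assumption: each interval is represented by the midpoint of its printed limits
midpoints = [(low + high) / 2 for low, high in intervals]

# Weighted mean = sum(frequency * midpoint) / sum(frequency)
weighted_mean = sum(f * m for f, m in zip(freqs, midpoints)) / sum(freqs)

print(round(weighted_mean, 2))
```

Because the interval with the largest frequency (35 – 39.9) carries the most weight, the result sits closer to that interval than a plain average of the three midpoints would.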

Skewed Distributions


Skewed Distributions
  • When there is one mode and the distribution is symmetric
      • mean, median, mode are the same
  • Positive skew
      • mean moves towards the positive tail
      • median also pulls towards the positive tail
  • Negative skew
      • mean moves towards the negative tail
      • median also moves towards the negative tail
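A small numeric illustration of positive skew, using made-up values chosen so that a few large observations form a tail on the right. This is only a sketch; the data are hypothetical.

```python
import statistics

# Hypothetical positively skewed data: mostly small values, a few large ones
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

mode = statistics.mode(data)      # most frequent value
median = statistics.median(data)  # midpoint of the ordered list
mean = statistics.mean(data)      # pulled furthest towards the long tail

print(mode, median, mean)
```

Here the mode is smallest, the median is larger, and the mean is largest, which is the ordering the slide describes for a positive skew.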

Selecting Appropriate Measures
  • Mean
      • affected by extreme values
      • includes all observations, therefore comprehensive (useful for interval/ratio data)
  • Median
      • not affected by the number of observations
      • reveals typical situations (used for ordinal data)
  • Mode
      • useful for nominal variables
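To see why the mean is described as sensitive to extreme values, the sketch below adds one extreme observation to a small hypothetical data set and compares how far the mean and the median move.

```python
import statistics

# Hypothetical values; the second list adds a single extreme observation
typical = [30, 32, 35, 36, 40]
with_outlier = typical + [400]

mean_shift = statistics.mean(with_outlier) - statistics.mean(typical)
median_shift = statistics.median(with_outlier) - statistics.median(typical)

print(mean_shift, median_shift)
```

The mean shifts by roughly 61 units while the median moves by only half a unit, which is one reason to prefer the median when a data set contains outliers.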

Other Useful Calculations
  • In addition to the sum of the data, Σx, we need to be able to calculate:

Variability or Spread
  • Mean and the median - limits
  • Range – coarse measure of variability
  • Percentiles
      • kth percentile is the point at which k percent of the numbers fall below it and the rest fall above it
      • 25th percentile (lower quartile)
      • 50th percentile (median)
      • 75th percentile (upper quartile)
  • Interquartile range (difference between the 25th percentile value and the 75th percentile value)

Describing the Spread
  • A five number summary
      • Median
      • Quartiles
      • Extremes
  • Variance and Standard Deviation
      • Measures spread about the mean
      • Standard deviation cannot be discussed without the mean

Calculating Percentiles
  • In the list of twelve observations: 2 4 7 11 11 14 16 16 24 29 3
  • Compute the median and the 25th and 75th percentiles
  • The lower quartile is the median of the 6 observations that fall below the median
  • The upper quartile is the median of the 6 observations that fall above the median

Five Number Summary
  • Median = 11
  • Lower Quartile = 9
  • Upper Quartile = 16
  • Extremes are 2 and 29
  • Can compute the range = 27
  • In a symmetric distribution, the lower and upper quartiles are equally distant from the median
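The "median of each half" rule for quartiles can be sketched as a small function. The function name and the twelve observations below are hypothetical (they are not the lecture's list), and leaving the middle value out of both halves for an odd count is an assumption, since the slides only cover an even count.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    Quartiles are the medians of the lower and upper halves of the
    ordered list. For an odd count the middle value is left out of
    both halves (an assumption; the slides only show an even count).
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # observations below the median
    upper = ordered[-half:]   # observations above the median
    return (ordered[0], statistics.median(lower), statistics.median(ordered),
            statistics.median(upper), ordered[-1])

# Hypothetical list of twelve observations (not the lecture's data)
obs = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23]
low, q1, med, q3, high = five_number_summary(obs)

data_range = high - low   # distance between the extremes
iqr = q3 - q1             # interquartile range

print(low, q1, med, q3, high, data_range, iqr)
```

Other software may use a different percentile convention, so quartiles from Excel or a statistics package can differ slightly from this rule.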

Variance
  • Is the mean of the squares of the deviations of the observations from their mean
  • Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
  • Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

Example
  • The heights, in inches, of the five starting players on a men’s college basketball team are: 67 72 76 76 84
  • Compute the mean and standard deviation.
  • $\bar{x} = 75$

Standard Deviation
  • Standard deviation is the positive square root of the variance
  • Variance in our basketball example: $s^2 = 39$
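The basketball figures can be checked with Python's standard `statistics` module. This is only a sketch; it treats the five heights as a sample (dividing by n - 1), which is what reproduces the variance of 39 given in the lecture.

```python
import math
import statistics

heights = [67, 72, 76, 76, 84]  # inches, from the basketball example

mean_height = statistics.mean(heights)          # arithmetic average
sample_var = statistics.variance(heights)       # divides by n - 1
sample_sd = statistics.stdev(heights)           # positive square root of sample_var
population_var = statistics.pvariance(heights)  # divides by N, shown for comparison

print(mean_height, sample_var, round(sample_sd, 3), population_var)
```

The squared deviations sum to 156, so dividing by n - 1 = 4 gives 39, while dividing by N = 5 would give the smaller population variance.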

Formulas – Standard Deviation
  • Standard deviation of a sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
  • Standard deviation of a population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

Example (Continued)


Short Cut – Simpler Formula
  • Standard deviation of a sample: $s = \sqrt{\frac{\sum x^2 - (\sum x)^2 / n}{n - 1}}$
  • $\sum x^2$: sum of the squares of the data values, i.e., you square each data value and then sum those squared values
  • $(\sum x)^2$: square of the sum of the data values, i.e., you sum all the data values and then square that sum
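A quick sketch of the short-cut computation applied to the basketball heights from the earlier example. It uses the standard computational form of the sample variance, (Σx² - (Σx)²/n) / (n - 1); the variable names are only for illustration.

```python
import math

heights = [67, 72, 76, 76, 84]  # basketball example data from the lecture
n = len(heights)

sum_of_squares = sum(x * x for x in heights)  # square each value, then sum
square_of_sum = sum(heights) ** 2             # sum the values, then square

shortcut_var = (sum_of_squares - square_of_sum / n) / (n - 1)
shortcut_sd = math.sqrt(shortcut_var)

print(sum_of_squares, square_of_sum, shortcut_var, round(shortcut_sd, 3))
```

The short cut avoids computing each deviation from the mean, yet it gives the same variance of 39 as the definition.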

Example (using the short cut)


Interpreting Std. Deviation
  • s and s² will be small when all the data are close together
  • The deviations from the mean
      • Will be both positive and negative
      • Sum will always be 0
  • s is always 0 or a positive number
  • s = 0 means no spread; as the value of s increases, the spread of the data increases
  • The units of s are the same as the original observations
  • s is heavily influenced by outliers
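These properties can be checked numerically. The sketch below reuses the basketball heights from the lecture; the all-equal list and the version with 84 replaced by 104 are hypothetical, added only to illustrate zero spread and the effect of an outlier.

```python
import statistics

heights = [67, 72, 76, 76, 84]  # basketball example data from the lecture
mean_height = statistics.mean(heights)

# Deviations from the mean are both negative and positive and sum to 0
deviations = [x - mean_height for x in heights]
total_deviation = sum(deviations)

# Hypothetical data with identical values: no spread, so s = 0
no_spread = statistics.stdev([5, 5, 5, 5])

# Hypothetical outlier: replacing 84 with 104 inflates s considerably
sd_original = statistics.stdev(heights)
sd_outlier = statistics.stdev([67, 72, 76, 76, 104])

print(deviations, total_deviation, no_spread, sd_original, sd_outlier)
```

Changing a single observation more than doubles s here, which is why the standard deviation should be read with care when outliers are present.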

Coefficient of Variation
  • CV is the standard deviation expressed as a percent of the mean
  • $CV = \frac{s}{\bar{x}} \times 100\%$
  • CV is useful when comparing different sets of data where sample size and standard deviation are different
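A minimal sketch of the CV calculation, using the basketball heights from the lecture. The helper name `coefficient_of_variation` and the centimetre conversion are illustrative assumptions; the conversion is there to show that CV does not depend on the unit of measurement.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percent of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

heights_in = [67, 72, 76, 76, 84]            # basketball example, inches
heights_cm = [h * 2.54 for h in heights_in]  # same data in centimetres

cv_in = coefficient_of_variation(heights_in)
cv_cm = coefficient_of_variation(heights_cm)

print(round(cv_in, 2), round(cv_cm, 2))
```

Both calls give about 8.33 percent, because rescaling the data multiplies the standard deviation and the mean by the same factor.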