Univariate Descriptive Statistics
Heibatollah Baghi and Mastee Badii
George Mason University
Objectives
• Define measures of central tendency and dispersion.
• Select the appropriate measures to use for a particular dataset.
How to Summarize Data?
• Graphs may be useful, but the information they offer is often inexact.
• A frequency distribution provides many details, but often we want to condense a distribution further.
Two Characteristics of Distributions
1. Measures of Central Tendency.
2. Measures of Variability or Scatter.
Measures of Central Tendency: Mean
• The mean describes the center or the balance point of a frequency distribution.
• The sample mean: M = ΣX / n
• Calculate the mean value for the following data: 23, 24, 25, 26, 27, 28.
• M = 153 / 6 = 25.5
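The arithmetic for the mean can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative and not part of the slides.

```python
from statistics import mean

scores = [23, 24, 25, 26, 27, 28]  # data from the slide

# Sum of the values divided by the number of values.
m = mean(scores)
print(m)  # 25.5
```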
Measures of Central Tendency: Mode
• The most frequent value or category in a distribution.
• Calculate the mode for the following set of values: 20, 21, 22, 22, 23, 24.
• Mode = 22
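A short sketch of the same calculation, using the standard `statistics` module. The second list is a made-up example, added only to show what happens when two values tie for most frequent.

```python
from statistics import mode, multimode

values = [20, 21, 22, 22, 23, 24]  # data from the slide
slide_mode = mode(values)
print(slide_mode)  # 22

# multimode returns every value tied for most frequent,
# which is useful for bimodal data (hypothetical example).
tied = multimode([1, 1, 2, 3, 3])
print(tied)  # [1, 3]
```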
Measures of Central Tendency: Median
• The middle value of a set of ordered numbers.
• Calculate for an even number of cases: 21, 22, 23, 24, 26, 27, 28, 29. Median = (24 + 26) / 2 = 25
• Calculate for an odd number of cases with no duplicates: 22, 23, 24, 25, 26, 27, 28. Median = 25
• The median changes when values at the center repeat.
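Both median cases can be reproduced with `statistics.median`, which averages the two middle values when the number of cases is even. A minimal sketch with illustrative variable names:

```python
from statistics import median

even_n = [21, 22, 23, 24, 26, 27, 28, 29]  # 8 cases
odd_n = [22, 23, 24, 25, 26, 27, 28]       # 7 cases

median_even = median(even_n)  # average of the two middle values, 24 and 26
median_odd = median(odd_n)    # the single middle value

print(median_even)  # 25.0
print(median_odd)   # 25
```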
Comparison of Measures of Central Tendency
• Mode: most frequently occurring value. Used with nominal, ordinal, and (sometimes) interval/ratio-level data.
• Median: exact center of rank-ordered data (when N is odd) or average of the two middle values (when N is even). Used with ordinal-level data and interval/ratio-level data (particularly when skewed).
• Mean: arithmetic average (sum of Xs / N). Used with interval/ratio-level data.
Comparison of Measures of Central Tendency in Normal Distribution
• Mean, median and mode are the same.
• Shape is symmetric.
Comparison of Measures of Central Tendency in Bimodal Distribution
• Mean and median are the same (when the distribution is symmetric).
• The two modes are different from the mean and median.
Comparison of Measures of Central Tendency in Negatively Skewed Distributions
• Mean, median and mode are different.
• Mode > Median > Mean
• Outliers pull the mean away from the median.
Comparison of Measures of Central Tendency in Positively Skewed Distributions
• Mean, median and mode are different.
• Mean > Median > Mode
• Outliers pull the mean away from the median.
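The ordering of the three measures under skew can be illustrated with small made-up datasets. The values below are hypothetical, chosen only so that one list has a long right tail and the other is its mirror image.

```python
from statistics import mean, median, mode

# Hypothetical data with a long right tail (positively skewed).
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]
# Mirror image of the same data (negatively skewed).
left_skewed = [20 - x for x in right_skewed]

pos = (mean(right_skewed), median(right_skewed), mode(right_skewed))
neg = (mean(left_skewed), median(left_skewed), mode(left_skewed))

print(pos)  # mean > median > mode
print(neg)  # mode > median > mean
```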
Comparison of Measures of Central Tendency in Uniform Distribution
• Mean and median are the same point; every value is equally frequent, so no single value stands out as the mode.
Comparison of Measures of Central Tendency in J-shape Distribution
• Mode to extreme right.
• Mean to the right of median.
Measures of Variability or Scatter
• Reporting only an average without an accompanying measure of variability may misrepresent a set of data.
• Two datasets can have the same average but very different variability.
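To make the point concrete, here is a minimal sketch with two hypothetical datasets (not from the slides) that share a mean but differ widely in spread.

```python
from statistics import mean, stdev

# Hypothetical datasets: same average, very different variability.
tight = [48, 49, 50, 51, 52]
spread = [10, 30, 50, 70, 90]

print(mean(tight), mean(spread))    # both means equal 50
print(stdev(tight), stdev(spread))  # sample standard deviations differ
```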
Measures of Variability or Scatter: Range
• The difference between the highest and lowest score.
• Easy to calculate.
• Highly unstable.
• Calculate the range for the data: 110, 120, 130, 140, 150, 160, 170, 180, 190.
• Range = 190 - 110 = 80
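The range calculation is a one-liner in Python; the sketch below uses illustrative variable names.

```python
data = [110, 120, 130, 140, 150, 160, 170, 180, 190]  # data from the slide

# Range: highest score minus lowest score.
data_range = max(data) - min(data)
print(data_range)  # 80
```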
Measures of Variability or Scatter: Semi-Interquartile Range
• Half of the difference between the first quartile (Q1, the 25th percentile) and the third quartile (Q3, the 75th percentile).
• SQR = (Q3 - Q1) / 2
• More stable than the range.
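A sketch of the semi-interquartile range, assuming the quartile convention used by default in `statistics.quantiles` (the "exclusive" method). The slides do not say which convention they use, and other conventions can give slightly different quartiles. The data are borrowed from the range example.

```python
from statistics import quantiles

data = [110, 120, 130, 140, 150, 160, 170, 180, 190]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(data, n=4)

sqr = (q3 - q1) / 2
print(q1, q3, sqr)  # 125.0 175.0 25.0
```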
Measures of Variability: Sample Variance
• The sum of squared differences between observations and their mean [SS = Σ(X - M)²] divided by n - 1.
• Sample variance: standard deviation squared.
• Formula for sample variance: s² = Σ(X - M)² / (n - 1)
Measures of Variability or Scatter: Standard Deviation
• The square root of the variance.
• Calculate the standard deviation for the data: 110, 120, 130, 140, 150, 160, 170, 180, 190.
Calculating Standard Deviation
• Sample Sum of Squares: SS = Σ(X - M)²
• Sample Variance: s² = SS / (n - 1)
• Sample Standard Deviation: s = √(SS / (n - 1))
• SS is the key to many statistics.
Calculating Standard Deviation
Data (n = 10; the value 150 appears twice), M = 150

Data     X - M    (X - M)²
110      -40      1600
120      -30       900
130      -20       400
140      -10       100
150        0         0
150        0         0
160       10       100
170       20       400
180       30       900
190       40      1600
Total      0      6000 (SS)

N - 1 = 9
Sample Variance = 6000 / 9 ≈ 667
Standard Deviation = √667 ≈ 25.8
SS is the key to many statistics.
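The worked table can be reproduced step by step. This minimal sketch uses the ten values shown in the table (150 appears twice) and compares the hand calculation with `statistics.stdev`; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

data = [110, 120, 130, 140, 150, 150, 160, 170, 180, 190]

m = mean(data)                         # 150
deviations = [x - m for x in data]     # the X - M column
ss = sum(d ** 2 for d in deviations)   # sum of squares: 6000
variance = ss / (len(data) - 1)        # 6000 / 9, about 667
sd = sqrt(variance)                    # about 25.8

print(sum(deviations), ss)             # 0 6000
print(round(variance), round(sd, 1))   # 667 25.8
```

The result agrees with `stdev(data)`, which applies the same n - 1 divisor.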
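Formula Variations
Sum of squares
• Defining formula: SS = Σ(X - M)²
• Calculating formula: SS = ΣX² - (ΣX)² / n
Variance
• Defining formula: s² = Σ(X - M)² / (n - 1)
• Calculating formula: s² = [ΣX² - (ΣX)² / n] / (n - 1)
Standard deviation
• Defining formula: s = √[Σ(X - M)² / (n - 1)]
• Calculating formula: s = √{[ΣX² - (ΣX)² / n] / (n - 1)}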
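The formulas on this slide did not survive extraction, so the sketch below assumes the standard textbook forms: the defining formula SS = Σ(X - M)² and the calculating formula SS = ΣX² - (ΣX)² / n. It checks that both give the same sum of squares for the ten values in the worked table.

```python
data = [110, 120, 130, 140, 150, 150, 160, 170, 180, 190]
n = len(data)
m = sum(data) / n

# Defining formula: squared deviations from the mean.
ss_defining = sum((x - m) ** 2 for x in data)

# Calculating formula: needs only the sum of X and the sum of X squared.
ss_calculating = sum(x * x for x in data) - sum(data) ** 2 / n

print(ss_defining, ss_calculating)  # 6000.0 6000.0
```

The calculating form avoids computing each deviation, which made it convenient for hand calculation; the two forms are algebraically identical.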
Comparison of Measures of Variability and Scatter
• In a normal distribution, the range is approximately 6 standard deviations.
• The standard deviation partitions the data in a normal distribution.
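How the standard deviation partitions a normal distribution can be sketched with `statistics.NormalDist`. The code computes the share of a standard normal distribution lying within 1, 2 and 3 standard deviations of the mean, which also shows why a span of about 6 standard deviations covers nearly all of the data.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Proportion of the distribution within k standard deviations of the mean.
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within ±{k} SD: {share:.1%}")
# within ±1 SD: 68.3%
# within ±2 SD: 95.4%
# within ±3 SD: 99.7%
```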
Standardized Scores: Z Scores
• The mean and standard deviation are used to compute standard scores: Z = (X - M) / s
• Calculate the Z score for a blood pressure of 140 if the sample mean is 110 and the standard deviation is 10.
• Z = (140 - 110) / 10 = 3
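The Z score formula translates directly into a small function. The function name `z_score` is an illustrative choice, not something from the slides.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mean) / sd

# Blood pressure example from the slide.
z = z_score(140, mean=110, sd=10)
print(z)  # 3.0
```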
Value of Z Scores
• Allow comparison of an observed distribution to an expected distribution.
[Figure: observed versus expected distribution]
Take Home Lesson
• Measures of central tendency and variability can describe the distribution of data.