Summary
• Five-number summary, percentiles, mean
• Box plot, modified box plot
• Robust statistics – mean (not robust) vs. median and trimmed mean (robust)
• Outliers
• Measures of variability: range, IQR, average absolute deviation, variance and standard deviation
• The average signed deviation of the data values from the mean is always zero.
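The measures of variability in the summary can be checked on a small dataset. The sketch below uses five made-up values (an assumption, not data from the lecture) and population formulas (dividing by n). Quartile conventions differ between tools, so the IQR from `statistics.quantiles` may not match a hand calculation exactly.

```python
import statistics

# Hypothetical data, chosen only for illustration.
data = [2, 4, 4, 6, 9]
mean = statistics.mean(data)

# Signed deviations from the mean always average to zero.
deviations = [x - mean for x in data]
avg_deviation = sum(deviations) / len(data)

# Average absolute deviation: mean distance from the mean, ignoring sign.
avg_abs_deviation = sum(abs(d) for d in deviations) / len(data)

# Range and interquartile range.
data_range = max(data) - min(data)
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Population variance and standard deviation (divide by n).
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)
```

Because the signed deviations cancel out, the absolute or squared deviations are what actually measure spread.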
Standard deviation – empirical rule
• Population (census) vs. sample • Parameter (population) vs. statistic (sample)
Bias, sampling
SRS (simple random sample)
• Sampling with replacement
  • Generates independent samples.
  • Two sample values are independent if what we get on the first one doesn't affect what we get on the second.
• Sampling without replacement
  • Deliberately avoids choosing any member of the population more than once.
  • This type of sampling is not independent, but it is more common.
  • The error is small as long as (1) the sample is large and (2) the sample size is no more than 10% of the population size.
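The two sampling schemes can be sketched with Python's standard `random` module. The population of 100 numbered members and the sample size of 10 are assumptions chosen for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical population of 100 members

# With replacement: the same member may be drawn more than once,
# and each draw is independent of the others.
with_replacement = random.choices(population, k=10)

# Without replacement: every member appears at most once.
without_replacement = random.sample(population, k=10)

# Here the sample is 10% of the population, the rule-of-thumb limit.
sampling_fraction = len(without_replacement) / len(population)
```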
Bessel’s game

Sample   Sample average   Sample variance (n-1)   Sample variance (n)
0, 2     1
0, 4     2                8                       4
2, 0     1
2, 4     3                2                       1
4, 0     2                8                       4
4, 2     3                2                       1
0, 0     0
2, 2     2                0                       0
4, 4     4                0                       0
average
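The game can be replayed in code. The sketch below assumes, as the table suggests, that the population is {0, 2, 4} and that every ordered sample of size 2 is drawn with replacement. It then averages both versions of the sample variance over all nine samples and compares them with the population variance.

```python
from itertools import product
from statistics import mean, pvariance, variance

population = [0, 2, 4]

# All ordered samples of size 2 drawn with replacement (9 in total).
samples = list(product(population, repeat=2))

avg_of_sample_means = mean(mean(s) for s in samples)
avg_var_n_minus_1 = mean(variance(s) for s in samples)  # divide by n - 1
avg_var_n = mean(pvariance(s) for s in samples)         # divide by n

population_variance = pvariance(population)  # 8/3
```

Dividing by n - 1 gives an average equal to the population variance (8/3), while dividing by n gives only 4/3, so it underestimates the spread.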
Histogram revision
• Distribution – the pattern of values in the data
• Histogram – visualizing the distribution
• We can see
  • whether the data tend to be close to a particular value
  • whether the data vary a lot or a little about the most common values
  • whether that variation tends to be more above or below the common values
  • whether there are unusually large or small values in the data
Life expectancy data – histogram • Use the interactive histogram applet to generate a histogram with a bin size of 10, starting at 40.
Life expectancy data – histogram (figure: frequency vs. life expectancy)
Making conclusions from a histogram • What can you tell about the life expectancy data? • How many modes? • Where is the mode? • Symmetric, left-skewed or right-skewed? • Outliers – yes or no?
Making conclusions from a histogram • Where are the mode, the median and the mean?
Five-number summary

Min.    Q1      Median   Q3      Max.
47.79   64.67   73.24    76.65   83.39

• What is the position of the mean and the median?
• Symmetric, left-skewed or right-skewed?
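One way to approach the skewness question from the five-number summary alone is to compare the spread on each side of the median. The sketch below is a rule of thumb, not a formal test; the variable names are illustrative.

```python
# Five-number summary of the life expectancy data from the slide.
minimum, q1, median, q3, maximum = 47.79, 64.67, 73.24, 76.65, 83.39

iqr = q3 - q1                    # spread of the middle 50%
lower_half_spread = median - q1  # distance from Q1 to the median
upper_half_spread = q3 - median  # distance from the median to Q3

# A longer lower half and a longer lower tail suggest left skew,
# in which case the mean is usually pulled below the median.
suggests_left_skew = (
    lower_half_spread > upper_half_spread
    and median - minimum > maximum - median
)
```

Here the lower half of the box (about 8.57) is much longer than the upper half (about 3.41), which points to a left-skewed distribution.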
STANDARDIZING
Playing chess • Pretend I am a chess player. • Which of the following tells you the most about how good I am? 1. My rating is 1800. 2. I am in 8110th place among the world's competitive chess players. 3. I am ranked higher than 88% of competitive chess players.
Distribution • We should use relative frequencies: convert all absolute frequencies to proportions. • Figure: distribution of scores in one particular year
Height data – absolute frequencies • Source: http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights
Height data – relative frequencies
Height data – relative frequencies • What proportion of values is between 170 cm and 173.75 cm? 30%
Height data – relative frequencies What proportion of values is between 170 cm and 175 cm? We can’t tell for certain.
• How should we modify the data or the histogram to show more detail? 1. Add more values to the dataset 2. Increase the bin size 3. Use a smaller bin size
Height data – relative frequencies What proportion of values is between 170 cm and 175 cm? 36%
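The effect of bin size can be sketched as follows. The heights are simulated (the mean of 172.7 cm and standard deviation of 4.8 cm are assumptions, and the SOCR dataset itself is not reproduced), so the resulting proportions will not match the 30% and 36% on the slides. `relative_frequencies` is a helper written for this illustration.

```python
import random

random.seed(1)
# Hypothetical heights in cm, simulated for illustration only.
heights = [random.gauss(172.7, 4.8) for _ in range(1000)]

def relative_frequencies(data, start, width, n_bins):
    """Return the proportion of values in each bin [edge, edge + width)."""
    counts = [0] * n_bins
    for x in data:
        i = int((x - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return [c / len(data) for c in counts]

# Wide bins (3.75 cm): edges at 170 and 173.75, but 175 falls inside
# a bin, so the share between 170 and 175 cannot be read off the bars.
wide = relative_frequencies(heights, start=155, width=3.75, n_bins=10)

# Narrow bins (1.25 cm): edges at both 170 and 175, so the share is
# the sum of the four bars between them (bins 12 to 15).
narrow = relative_frequencies(heights, start=155, width=1.25, n_bins=30)
share_170_175 = sum(narrow[12:16])
```

A question about an interval can only be answered from the histogram when both endpoints coincide with bin edges, which is why a smaller bin size helps.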
Decreasing bin size • Check out what happens with the smallest bin size for Physics Test Scores from http://quarknet.fnal.gov/cosmics/histo.shtml.
Normal distribution • Recall the empirical rule: 68-95-99.7
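The three percentages of the empirical rule can be recovered from the normal distribution itself. This sketch uses `statistics.NormalDist` from the Python standard library; the dictionary name `within` is illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Proportion of values within k standard deviations of the mean.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
# within[1] is about 0.68, within[2] about 0.95, within[3] about 0.997
```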
Empirical rule (figure: axis ticks 0 to 6 shown against -3 to +3 standard deviations)
Z • Z is the number of standard deviations a value lies away from the mean. • If the Z-value is 1, what percentage of values is less than that value? Approximately 84%.
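The figure of roughly 84% can be checked in two ways, assuming the data follow a normal distribution: directly from the cumulative distribution function, and by reasoning from the empirical rule.

```python
from statistics import NormalDist

# Proportion of a normal distribution lying below Z = 1.
below_one = NormalDist().cdf(1)  # about 0.84

# The same figure from the empirical rule: half of the data lies below
# the mean, plus half of the 68% within one standard deviation.
from_empirical_rule = 0.50 + 0.68 / 2
```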
Who is more popular? Let’s demonstrate the importance of Z-scores with the following example.
Who is more popular? • s.d. = 36, Z = -3.53 • s.d. = 60, Z = -2.57
Standardizing
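Formula • $Z = \frac{x - \mu}{\sigma}$, where $x$ is the original value, $\mu$ is the mean and $\sigma$ is the standard deviation
Quiz • What does a negative Z-score mean? 1. The original value is negative. 2. The original value is less than the mean. 3. The original value is less than 0. 4. The original value minus the mean is negative.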
Quiz II • If we standardize a distribution by converting every value to a Z-score, what will be the new mean of this standardized distribution? • If we standardize a distribution by converting every value to a Z-score, what will be the new standard deviation of this standardized distribution?
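Both quiz questions can be explored numerically. The sketch below standardizes a short list of made-up test scores (an assumption, not lecture data) using the population standard deviation; `standardize` is a helper written for this illustration.

```python
from statistics import mean, pstdev

def standardize(values):
    """Convert each value to a Z-score: (value - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(x - mu) / sigma for x in values]

scores = [52, 61, 70, 75, 92]  # hypothetical test scores
z_scores = standardize(scores)

new_mean = mean(z_scores)  # 0, up to floating-point rounding
new_sd = pstdev(z_scores)  # 1, up to floating-point rounding
```

Subtracting the mean shifts the centre to 0 and dividing by the standard deviation rescales the spread to 1, whatever the original units were.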
Standard normal distribution
Meaning of relative frequencies • Data: 5, 2, 3, 2, 4, 1, 3, 4, 3, 3, 3, 4, 4, 5, 3, 1, 3, 2
Histogram of these data
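The relative frequencies behind such a histogram can be computed by counting each value and dividing by the number of observations. The sketch uses the 18 values from the slide; the variable names are illustrative.

```python
from collections import Counter

data = [5, 2, 3, 2, 4, 1, 3, 4, 3, 3, 3, 4, 4, 5, 3, 1, 3, 2]

# Absolute frequencies: how many times each value occurs.
counts = Counter(data)

# Relative frequencies: each count as a proportion of all 18 values.
relative = {value: counts[value] / len(data) for value in sorted(counts)}

# Relative frequencies always add up to 1.
total = sum(relative.values())
```

The value 3 occurs 7 times out of 18, so its bar has a height of about 0.39.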
Probability density function (PDF)
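With a density function, probability corresponds to the area under the curve rather than to the height of the curve. The sketch below approximates that area for the standard normal distribution with a simple midpoint rule; `area_under_pdf` is a helper written for this illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution

def area_under_pdf(dist, a, b, steps=10_000):
    """Approximate the area under dist.pdf between a and b (midpoint rule)."""
    width = (b - a) / steps
    return sum(dist.pdf(a + (i + 0.5) * width) for i in range(steps)) * width

height_at_mean = z.pdf(0)  # curve height (about 0.4), not a probability
area_within_1sd = area_under_pdf(z, -1, 1)  # about 0.68
```

The area between -1 and +1 matches the 68% of the empirical rule.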
Standard normal distribution