Aditi Vaishali Thapar
thapar.9@osu.edu

JOHN GLENN COLLEGE OF PUBLIC AFFAIRS

• Sampling Terms: An Example
• Measurement
• Descriptive Statistics:
   • Measures of Central Tendency
   • Measures of Dispersion
   • The Normal Distribution
• Inferential Statistics:
   • Correlation vs. Causation
   • Hypothesis testing
   • P-values
   • Standard Error
   • Confidence Intervals and Z-Scores

Population vs. Sample
• Population
   • The entire group of people or things about which we want information
• Sample
   • It is unlikely that we will be able to collect data for the entire population
   • A representative portion of the population about which data is collected

Statistics vs. Parameters
• Parameters
   • Summarise data for an entire population
• Statistics
   • Summarise data for a sample
• Unit of Analysis: Entity that is being analyzed in a study
• Variable: A characteristic of the unit of analysis
Image source: https://www.cliffsnotes.com/study-guides/statistics/sampling/populations-samples-parameters-and-statistics

What is the demographic information for students who attend statistics boot camp?
• Population:
   • All students who attend statistics boot camp
• Sample:
   • 20 randomly selected students at statistics boot camp
• Unit of Analysis:
   • The individual (i.e. the student)
• Variables:
   • Age, gender, income, race, etc.
• Parameter:
   • Average age, income, etc. of all students at statistics boot camp
• Statistics:
   • Average age, income, etc. of the 20 randomly selected students at boot camp
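
The boot camp example can be sketched in a few lines of Python. This is only an illustration: the population of 200 attendees and their ages are made-up values, and the variable names are hypothetical. It shows that a parameter and a statistic are the same kind of summary, computed over the population and over the sample respectively.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: ages of 200 boot camp attendees (made-up values).
population_ages = [random.randint(21, 45) for _ in range(200)]

# Parameter: a summary computed over the entire population.
parameter_mean_age = statistics.mean(population_ages)

# Sample: 20 students drawn at random, without replacement.
sample_ages = random.sample(population_ages, 20)

# Statistic: the same summary computed over the sample only.
statistic_mean_age = statistics.mean(sample_ages)

print(f"Parameter (population mean age): {parameter_mean_age:.2f}")
print(f"Statistic (sample mean age):     {statistic_mean_age:.2f}")
```

In practice the parameter is usually unknown, which is why the statistic is used to estimate it.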

• Nominal
   • Numerical values just "name" the attribute uniquely
   • No ordering of the cases is implied
   • Example: Numbers on football/basketball jerseys
• Ordinal
   • Attributes can be rank-ordered numerically
   • Distances between attributes do not have any meaning
   • Example: Coding educational attainment as
        0 = less than high school
        1 = high school degree
        2 = college degree
        3 = Masters, Ph.D., etc.

• Interval
   • The distance between attributes does have meaning
   • Example: When measuring temperature, the distance between 30 °F and 40 °F is the same as that between 70 °F and 80 °F.
• Ratio
   • There is always an absolute zero that is meaningful,
   • i.e. you can construct a meaningful fraction/ratio
Source: http://www.socialresearchmethods.net/kb/measlevl.php
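
A small sketch may help separate interval from ratio scales. It reuses the Fahrenheit example above; the conversion to kelvins is an illustration added here (it is not on the slide), chosen because the Kelvin scale has a true zero. The helper name `fahrenheit_to_kelvin` is hypothetical.

```python
def fahrenheit_to_kelvin(deg_f: float) -> float:
    """Convert degrees Fahrenheit (interval scale) to kelvins (ratio scale)."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15


# Differences are meaningful on an interval scale: both gaps are 10 degrees.
diff_low = 40 - 30
diff_high = 80 - 70

# Ratios are not: 80 F divided by 40 F gives 2, but 80 F is not "twice as hot".
naive_ratio = 80 / 40

# On a scale with an absolute zero the ratio is meaningful, and much smaller.
kelvin_ratio = fahrenheit_to_kelvin(80) / fahrenheit_to_kelvin(40)

print(diff_low, diff_high, naive_ratio, round(kelvin_ratio, 3))
```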

Central tendencies tell us where most of the data lie
• Mean: also known as the average
   • Add up all the values for your variable, then divide by the total number of values
• Median: The middle score for a set of data that has been arranged in order of magnitude
• Mode: The most frequent value in the dataset

The choice of measure depends on both the type of variable and the distribution of the data
• Mode: Typically used when we have categorical data (e.g. gender, race, educational attainment)
• Mean: When we want the average value of a variable, UNLESS our data is skewed
• Median: When we have skewed data and/or outliers
Question: What measure of central tendency would you use to calculate the average salary for a group of 10 people where 9 people earn $1 and 1 person earns $100?
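
The salary question can be checked directly. The sketch below uses Python's standard `statistics` module on the ten salaries from the question; the variable names are illustrative.

```python
import statistics

# Nine people earn $1 and one person earns $100.
salaries = [1] * 9 + [100]

mean_salary = statistics.mean(salaries)      # pulled upwards by the single outlier
median_salary = statistics.median(salaries)  # middle of the ordered values
mode_salary = statistics.mode(salaries)      # most frequent value

print(f"mean = {mean_salary}, median = {median_salary}, mode = {mode_salary}")
```

The mean comes out at $10.90, a figure nobody in the group actually earns, while the median of $1 describes the typical person. This is why the median is preferred for skewed data.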

Dispersion describes the spread of the data
• Range
   • | Maximum - Minimum |
• Variance
   • The average squared distance of the observations in the sample dataset from the mean
• Standard Deviation
   • Square root of the variance
   • A low standard deviation tells us that data points tend to be close to the mean
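
For reference, the usual sample formulas behind these definitions are written out below. The symbols are assumptions, since the slide gives no notation: $x_i$ are the observations, $\bar{x}$ is the sample mean and $n$ is the sample size. The $n-1$ denominator is the common convention for a sample; for a full population, $n$ is used instead.

```latex
\text{Range} = \lvert x_{\max} - x_{\min} \rvert
\qquad
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
\qquad
s = \sqrt{s^2}
```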

• Question: Given the data below on test scores, what are the sample size (N), mean, median, mode, range, standard deviation and variance?

   6  10  8  7  6  6  0  4  9  3
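
One way to check the answers to this exercise is with Python's standard `statistics` module. The sketch assumes the sample versions of variance and standard deviation (denominator N - 1), which is what `statistics.variance` and `statistics.stdev` compute.

```python
import statistics

scores = [6, 10, 8, 7, 6, 6, 0, 4, 9, 3]

n = len(scores)                                 # sample size: 10
mean_score = statistics.mean(scores)            # 59 / 10 = 5.9
median_score = statistics.median(scores)        # average of the two middle values: 6
mode_score = statistics.mode(scores)            # 6 appears three times
score_range = max(scores) - min(scores)         # 10 - 0 = 10
sample_variance = statistics.variance(scores)   # 78.9 / 9, about 8.77
sample_sd = statistics.stdev(scores)            # square root of the variance, about 2.96

print(n, mean_score, median_score, mode_score, score_range)
print(round(sample_variance, 2), round(sample_sd, 2))
```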

• The normal distribution is a symmetric, bell-shaped distribution that is completely described by the mean and the standard deviation
   • The mean describes the centre of the curve
   • The standard deviation determines the spread of the curve (how wide or narrow it is)
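
Because the mean and standard deviation fully describe a normal distribution, they are enough to compute any probability under the curve. The sketch below uses `statistics.NormalDist` from the Python standard library; the mean of 70 and standard deviation of 10 are made-up values, and `share_within` is a hypothetical helper.

```python
from statistics import NormalDist

# Hypothetical test-score distribution: mean 70, standard deviation 10.
dist = NormalDist(mu=70, sigma=10)


def share_within(d: NormalDist, k: float) -> float:
    """Probability of falling within k standard deviations of the mean."""
    return d.cdf(d.mean + k * d.stdev) - d.cdf(d.mean - k * d.stdev)


within_1sd = share_within(dist, 1)  # roughly 68% of values
within_2sd = share_within(dist, 2)  # roughly 95% of values

# A z-score expresses a value as a number of standard deviations from the mean.
z_score = (85 - dist.mean) / dist.stdev

print(round(within_1sd, 3), round(within_2sd, 3), z_score)
```

The shares within one and two standard deviations are the same for every normal distribution, whatever its mean and standard deviation.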

Central Limit Theorem: As the sample size grows larger, the sampling distribution of the mean approaches a normal distribution
• What does this theorem tell us?
   • A sample with more observations gives us a truer picture of the actual population
   • Making assumptions based on samples that are "too small" may make for an unreliable analysis
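
The theorem above (commonly known as the central limit theorem) can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is an exponential distribution with mean 1 (deliberately skewed), each experiment is repeated 2,000 times, and `sample_means` is a hypothetical helper.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible


def sample_means(sample_size: int, repetitions: int = 2000) -> list:
    """Means of repeated samples drawn from a skewed (exponential) population."""
    means = []
    for _ in range(repetitions):
        sample = [random.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return means


means_small = sample_means(5)    # sample means based on 5 observations each
means_large = sample_means(100)  # sample means based on 100 observations each

# Larger samples give sample means that cluster more tightly around the true mean (1.0).
spread_small = statistics.stdev(means_small)
spread_large = statistics.stdev(means_large)

print(f"n = 5:   spread of sample means = {spread_small:.3f}")
print(f"n = 100: spread of sample means = {spread_large:.3f}")
```

Plotting a histogram of `means_large` would show a roughly bell-shaped curve even though the underlying population is strongly skewed.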

• Correlation: A single number that describes the degree of relationship between two variables
• The value of the correlation coefficient ranges from -1 to 1
• If the correlation coefficient is positive, the two variables move together
   • Example: Education and salary (as the level of education increases, so does salary)
• If the correlation coefficient is negative, the two variables have an inverse relationship
   • Example: Education and unemployment rate (as the level of education increases, the unemployment rate decreases)
• If the correlation coefficient is zero, the two variables do not have a linear relationship
   • Example: The weather and salary
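
A minimal sketch of how a correlation coefficient is computed, assuming the Pearson coefficient is the one meant here (the slide does not name it). The education, salary and unemployment figures are made-up values chosen only to mirror the examples above, and `pearson_r` is a hypothetical helper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    if variation_x == 0 or variation_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return covariation / math.sqrt(variation_x * variation_y)


# Hypothetical data for five people.
years_of_education = [10, 12, 14, 16, 18]
salary_thousands = [30, 38, 45, 60, 72]
unemployment_rate = [9.0, 7.5, 6.0, 4.0, 3.0]

r_salary = pearson_r(years_of_education, salary_thousands)         # close to +1
r_unemployment = pearson_r(years_of_education, unemployment_rate)  # close to -1

print(round(r_salary, 3), round(r_unemployment, 3))
```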

• Causation is a much stronger relationship than just correlation
Image source: https://www.dreamstime.com/royalty-free-stock-images-causation-correlation-difference-explained-image37881989; https://xkcd.com/925/

Hypothesis testing is used to compare our observed statistic to another statistic or to a parameter.
• But what does that really mean?
   • You're testing whether your results are valid by calculating the probability that your results are a product of chance.
• The null hypothesis (H0) is the hypothesis that we are trying to disprove. Usually, the null hypothesis is a statement of no effect or no difference
• The alternative hypothesis (H1) describes the relationship as we expect it to be
• Tests can be either one-tailed or two-tailed

One-tailed test example: A researcher claims that individuals aged 17 have an average body temperature higher than the commonly accepted average of 98.6 °F.
H0: Individuals aged 17 have an average body temperature that is not greater than 98.6 °F
    average temp <= 98.6 °F
H1: Individuals aged 17 have an average body temperature that is greater than 98.6 °F
    average temp > 98.6 °F
Because H1 points in one direction only, this is a one-tailed test. A two-tailed version would test H0: average temp = 98.6 °F against H1: average temp ≠ 98.6 °F.

One-tailed test example: A researcher claims that consuming a drug she developed increases student performance on exams. The average student test score is 87.
H0: The drug will have no effect on average student test scores (i.e. they stay constant)
    average test score = 87
H1: The drug will increase average student test scores (i.e. they go up)
    average test score > 87

The p-value is the probability of obtaining a result at least as extreme as the one observed, assuming that the null hypothesis is true.
• There are multiple significance levels (1%, 5% and 10%) that we use as thresholds to test the validity of our claims
• The most frequently used significance level is 5% (0.05)
• If the p-value obtained is higher than the 0.05 threshold, we say that our finding is not statistically significant
   • Therefore, we cannot reject our null hypothesis.
• If the p-value obtained is lower than the 0.05 threshold, we say that our finding is statistically significant
   • Therefore, we can reject our null hypothesis in favour of the alternative hypothesis.
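
To make the decision rule concrete, here is a sketch of a one-tailed z-test for a claim such as "the drug raises the average test score above 87". Only the value 87 comes from the slides. The sample mean, the population standard deviation and the sample size are made-up numbers, and the test assumes a known population standard deviation and a normally distributed sample mean.

```python
from statistics import NormalDist

population_mean = 87.0  # average test score under the null hypothesis
population_sd = 6.0     # hypothetical, assumed known
sample_size = 36        # hypothetical number of students who took the drug
sample_mean = 89.0      # hypothetical average score observed in the sample

# How many standard errors the sample mean lies above the null value.
standard_error = population_sd / sample_size ** 0.5
z_score = (sample_mean - population_mean) / standard_error

# One-tailed p-value: probability of a z-score at least this large if H0 is true.
p_value = 1.0 - NormalDist().cdf(z_score)

alpha = 0.05  # the 5% significance level
reject_null = p_value < alpha

print(f"z = {z_score:.2f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

With these made-up numbers the p-value is about 0.023, below the 0.05 threshold, so the null hypothesis would be rejected. A two-tailed test on the same data would double the p-value.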

Standard error is how far the sample mean is likely to be from the population mean.
• How does this differ from the standard deviation?
   • Standard deviation is the degree to which individuals within the sample differ from the sample mean.
• Calculated using: standard error = sample standard deviation / √(sample size)
• Example: if we only sample 5 universities to examine the impact of ownership on test scores, how likely is it that the true average test score is close to the average in our sample?
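
The difference between the two quantities can be seen in a short sketch. It assumes the usual formula for the standard error of the mean (sample standard deviation divided by the square root of the sample size) and uses made-up average test scores for 5 universities.

```python
import math
import statistics

# Hypothetical average test scores from 5 sampled universities.
university_scores = [82.0, 75.0, 90.0, 68.0, 85.0]

sample_size = len(university_scores)
sample_sd = statistics.stdev(university_scores)  # spread of individual universities

# Standard error of the mean: how precisely the sample mean estimates the population mean.
standard_error = sample_sd / math.sqrt(sample_size)

# With the same spread but 50 universities, the standard error would be much smaller.
standard_error_n50 = sample_sd / math.sqrt(50)

print(f"SD = {sample_sd:.2f}, SE (n=5) = {standard_error:.2f}, SE (n=50) = {standard_error_n50:.2f}")
```

The standard deviation describes the data, while the standard error describes the estimate, and only the latter shrinks as the sample grows.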

Question: You want to investigate the impact of a college degree on income. You therefore sample 20 people who have a college degree (Group A) and 20 people who do not (Group B), and obtain the following statistics. What are the 95% confidence intervals for each group? How can we interpret the results?

          Mean     Min      Max       SE       Variance
Group A   70,000   20,000   130,000   25,000   200
Group B   68,000   0        200,000   15,000   400
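
One way to approach this question is sketched below. It assumes a z-based interval (mean ± 1.96 × SE), in line with the "Confidence Intervals and Z-Scores" topic in the outline, and takes the SE column of the table at face value. With only 20 people per group, a t-based interval would be somewhat wider. The helper `confidence_interval` is hypothetical.

```python
from statistics import NormalDist


def confidence_interval(mean: float, standard_error: float, level: float = 0.95):
    """Z-based confidence interval: mean plus or minus z * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    margin = z * standard_error
    return mean - margin, mean + margin


# Mean and SE values taken from the table in the question.
group_a = confidence_interval(70_000, 25_000)  # college degree
group_b = confidence_interval(68_000, 15_000)  # no college degree

# Two intervals overlap if each one starts before the other ends.
intervals_overlap = group_a[0] <= group_b[1] and group_b[0] <= group_a[1]

print(f"Group A: {group_a[0]:,.0f} to {group_a[1]:,.0f}")
print(f"Group B: {group_b[0]:,.0f} to {group_b[1]:,.0f}")
print(f"Intervals overlap: {intervals_overlap}")
```

Under these assumptions the intervals are roughly 21,000 to 119,000 for Group A and 38,600 to 97,400 for Group B. They overlap almost entirely, so this sample does not give clear evidence that average income differs between the two groups.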
