Introduction to Statistics for the Social Sciences
SBS 200 - Lecture Section 001, Fall 2019
Social Sciences Room 100, 10:00 - 10:50, Mondays, Wednesdays & Fridays
November 18
A note on doodling
Schedule of readings
Before next exam (November 22):
Please read chapters 1 - 11 in the OpenStax textbook
Please read Chapters 2, 3, and 4 in Plous
Chapter 2: Cognitive Dissonance
Chapter 3: Memory and Hindsight Bias
Chapter 4: Context Dependence
Labs continue this week
This lab builds on the work we did in our very first lab, but now we are using the correlation for prediction. This is called regression analysis.
Project 4: Two Correlations, Two Regression Analyses
We refer to the predicted variable as the dependent variable (Y) and the predictor variable as the independent variable (X).
Why are we finding the regression line? How would we use it?
Regression coefficient (slope)
Correlation coefficient (“r”)
Description includes:
• Both variables
• Strength (weak, moderate, strong)
• Direction (positive, negative)
• Estimated value (actual number)
r = -0.91
This shows a strong negative relationship (r = -0.91) between the distance that a golf ball is hit and the accuracy of the drive.
Five steps to hypothesis testing
Step 1: Identify the research problem (hypothesis)
Describe the null and alternative hypotheses
For correlation, the null is that r = 0 (no relationship)
Step 2: Decision rule
• Alpha level? (α = .05 or .01)
• Critical statistic (e.g. critical r) value from table?
• Degrees of freedom = (n - 2), that is, df = # pairs - 2
Step 3: Calculations
Step 4: Make decision whether or not to reject null hypothesis
If observed r is bigger than critical r (ignoring its sign), then reject null
Step 5: Conclusion - tie findings back in to research problem
Finding a statistically significant correlation
The result is “statistically significant” if:
• the observed correlation is larger than the critical correlation: we want our r to be big if we want it to be significantly different from zero (either negative or positive, but just far away from zero)
• the p value is less than 0.05 (which is our alpha): we want our “p” to be small
• we reject the null hypothesis
• then we have support for our alternative hypothesis
Five steps to hypothesis testing
Problem 1: Is there a relationship between:
• Price
• Square Feet
We measured 150 homes recently sold
Five steps to hypothesis testing
Step 1: Identify the research problem (hypothesis)
Is there a relationship between the cost of a home and the size of the home?
Describe the null and alternative hypotheses
• null is that there is no relationship (r = 0.0)
• alternative is that there is a relationship (r ≠ 0.0)
Step 2: Decision rule - find critical r (from table)
• Alpha level? (α = .05)
• Degrees of freedom = (n - 2), that is, df = # pairs - 2
• 150 pairs - 2 = 148 degrees of freedom
Critical r value from table
α = .05
df = # pairs - 2 = 148
Critical value r(148) = 0.195
Five steps to hypothesis testing Step 3: Calculations
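The calculation in Step 3 is normally done by software, but it can be sketched by hand. The block below is a minimal illustration of the usual Pearson correlation formula, not necessarily the tool used in lab. The function name pearson_r and the five homes are hypothetical and are NOT the 150 homes measured for Problem 1.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp_xy / math.sqrt(ss_x * ss_y)

# Hypothetical data for illustration only (not the lecture's data set)
square_feet = [1100, 1500, 1800, 2400, 3000]
price = [150000, 200000, 210000, 310000, 350000]

r = pearson_r(square_feet, price)
print(f"r = {r:.3f}")
```

The function divides by zero if either variable never changes, which matches the fact that a correlation is undefined when one variable is constant.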
Step 3: Calculations
r = 0.726965
Critical value r(148) = 0.195
Observed correlation r(148) = 0.726965
Step 4: Make decision whether or not to reject null hypothesis
If observed r is bigger than critical r then reject null
Yes, we reject the null: 0.727 > 0.195
Conclusion: Yes, we reject the null. The observed r is bigger than the critical r (0.727 > 0.195).
Yes, this is significantly different from zero: something is going on.
These data suggest a strong positive correlation between home prices and home size. This correlation was large enough to reach significance, r(148) = 0.73; p < 0.05
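The decision rule from Steps 2 through 4 can be written out as a short sketch. The numbers (150 pairs, critical r of 0.195, observed r of 0.726965) come from the home price example. The function names are hypothetical, and comparing the absolute value of r is an assumption based on the remark that r can be "either negative or positive but just far away from zero".

```python
def degrees_of_freedom(n_pairs):
    """df for a correlation: number of pairs minus 2."""
    return n_pairs - 2

def reject_null(r_observed, r_critical):
    """Reject the null (r = 0) when the observed r is farther from zero
    than the critical r. The sign of r is ignored (assumption: two-tailed test)."""
    return abs(r_observed) > r_critical

n_pairs = 150          # homes recently sold
r_critical = 0.195     # critical value read from the table, alpha = .05
r_observed = 0.726965  # result of Step 3

df = degrees_of_freedom(n_pairs)
decision = reject_null(r_observed, r_critical)
print(f"r({df}) = {r_observed:.2f}; reject null: {decision}")
# prints: r(148) = 0.73; reject null: True
```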
Correlation matrix: Table showing correlations for all possible pairs of variables
Remember, Correlation = “r”

            Education   Age      IQ       Income
Education   1.0**       0.41*    0.38*    0.65**
Age         0.41*       1.0**    -0.02    0.52*
IQ          0.38*       -0.02    1.0**    0.27*
Income      0.65**      0.52*    0.27*    1.0**

* p < 0.05   ** p < 0.01
Correlation matrix: Table showing correlations for all possible pairs of variables (each pair shown once)

            Education   Age      IQ
Age         0.41*
IQ          0.38*       -0.02
Income      0.65**      0.52*    0.27*

* p < 0.05   ** p < 0.01
Correlation matrices
Correlation of X with X
Correlation of Y with Y
Correlation of Z with Z
Does this correlation reach statistical significance?
Correlation of X with Y; p value for correlation of X with Y
Correlation of X with Z; p value for correlation of X with Z
Correlation of Y with Z; p value for correlation of Y with Z
Correlation matrices: What do we care about?
Correlation matrices
Finding multiple correlations with a single analysis
We measured the following characteristics of 150 homes recently sold:
• Price
• Square Feet
• Number of Bathrooms
• Lot Size
• Median Income of Buyers
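As a rough sketch of what "multiple correlations with a single analysis" means, the block below computes r for every pair of variables and stores the results as a matrix. The helper names and the five rows of home data are hypothetical, invented for illustration; they are not the 150 homes from the lecture, and statistical software would normally produce this table directly.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp_xy / math.sqrt(ss_x * ss_y)

def correlation_matrix(data):
    """Return {(name_a, name_b): r} for every pair of variables in data."""
    names = list(data)
    # A variable correlated with itself is always 1.0 (the diagonal)
    matrix = {(name, name): 1.0 for name in names}
    for a, b in combinations(names, 2):
        r = pearson_r(data[a], data[b])
        matrix[(a, b)] = r
        matrix[(b, a)] = r  # the matrix is symmetric
    return matrix

# Hypothetical values for five homes (illustration only)
homes = {
    "price": [150000, 200000, 210000, 310000, 350000],
    "square_feet": [1100, 1500, 1800, 2400, 3000],
    "bathrooms": [1, 2, 2, 3, 3],
    "lot_size": [5000, 4800, 7500, 6000, 9000],
    "median_income": [48000, 61000, 58000, 90000, 87000],
}

matrix = correlation_matrix(homes)
names = list(homes)
for row in names:
    cells = " ".join(f"{matrix[(row, col)]:6.2f}" for col in names)
    print(row.ljust(14), cells)
```

Five variables give 25 cells, but only 10 of them are distinct pairs; the rest are the diagonal and mirror images.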
Correlation matrices: What do we care about?
Critical value r(148) = 0.195
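One way to read a correlation matrix against a critical value is to keep only the pairs whose r is farther from zero than that value. The sketch below reuses the Education / Age / IQ / Income correlations from the earlier table. Applying the 0.195 cutoff to them is an assumption made purely for illustration, since that cutoff was looked up for the 148 degrees of freedom of the home data and the sample size behind the earlier table is not given. The function name is hypothetical.

```python
def significant_pairs(correlations, r_critical):
    """Keep the pairs whose correlation is farther from zero than r_critical."""
    return {pair: r for pair, r in correlations.items() if abs(r) > r_critical}

# Off-diagonal values from the Education / Age / IQ / Income matrix
correlations = {
    ("Education", "Age"): 0.41,
    ("Education", "IQ"): 0.38,
    ("Education", "Income"): 0.65,
    ("Age", "IQ"): -0.02,
    ("Age", "Income"): 0.52,
    ("IQ", "Income"): 0.27,
}

# ASSUMPTION: 0.195 is borrowed from the home example (df = 148)
flagged = significant_pairs(correlations, r_critical=0.195)
for (a, b), r in flagged.items():
    print(f"{a} with {b}: r = {r}")
```

Under this assumed cutoff, every pair except Age with IQ is kept, which agrees with the asterisks in the earlier table.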
Regression: Using the correlation to predict the value of one variable based on its relationship with the other variable
The predicted variable goes on the “Y” axis and is called the dependent variable.
The predictor variable goes on the “X” axis and is called the independent variable.
[Figure: Yearly Income (Y) by Expenses per year (X). If you spend this much, you probably make this much.]
Jay Z Buys Beyonce a $20 million Heart-Shaped Island for her 29th Birthday
Dustin spends $12 for his Birthday
[Figure: Yearly Income (Y) by Expenses per year (X). Jay Z spent this much, so Jay Z probably makes this much. Dustin spent this much, so Dustin probably makes this much.]
Simple Regression
Assumptions Underlying Linear Regression
• For each value of X, there is a group of Y values.
• These Y values are normally distributed.
• The means of these normal distributions of Y values all lie on the straight line of regression.
• The standard deviations of these normal distributions are equal.
Simple Regression - The prediction line, what is it good for?
Prediction line
• makes the relationship easier to see (even if specific observations - dots - are removed)
• identifies the center of the cluster of (paired) observations
• identifies the central tendency of the relationship (kind of like a mean)
• can be used for prediction
• should be drawn to provide a “best fit” for the data
• should be drawn to provide maximum predictive power for the data
• should be drawn to provide minimum predictive error
Simple Regression
Correlation: Independent and dependent variables
When used for prediction (regression), we refer to the predicted variable as the dependent variable and the predictor variable as the independent variable.
What are we predicting?
[Figure: two examples, each with its Dependent Variable and Independent Variable labeled]
Simple Regression
What are we predicting?
Positive correlation: as values on one variable go up, so do values for the other variable
Negative correlation: as values on one variable go up, the values for the other variable go down
• Yearly income by expenses per year: Positive Correlation [Figure: Yearly Income (Y) by Expenses per year (X)]
• Temperatures by time spent outside in Tucson in summer: Negative Correlation [Figure: Temperature (Y) by Time outside (X)]
• Height by average driving speed: Zero Correlation [Figure: Height (Y) by Average Speed (X)]
• Amount spent on advertising by sales in the month: Positive Correlation [Figure: Amount of sales (Y) by Amount spent on advertising (X)]
Simple Regression - What do we need to define a line?
Slope = “b” (also “b1”): How steep the line is
Y-intercept = “a” (also “b0”): Where the line crosses the Y axis
[Figure: Yearly Income (Y) by Expenses per year (X)]
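A line that gives "minimum predictive error" is usually found by ordinary least squares. The sketch below assumes that is the fit intended here, and shows how the slope b and the Y-intercept a could be computed from paired data. The function name fit_line and the income and expense figures are hypothetical, made up so that the answer is easy to check by eye.

```python
def fit_line(xs, ys):
    """Least-squares line Y' = a + b*X. Returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: cross-products of deviations divided by squared deviations of X
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp_xy / ss_x
    # Intercept: the line must pass through the point (mean of X, mean of Y)
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data in thousands of dollars, built so income = 5 + 2 * expenses
expenses = [10, 20, 30, 40]
income = [25, 45, 65, 85]

a, b = fit_line(expenses, income)
print(f"a = {a}, b = {b}")
# prints: a = 5.0, b = 2.0
```

Here b = 2.0 says predicted income rises by 2 for every 1 added to expenses, and a = 5.0 is the predicted income when expenses are zero.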
Interpreting regression equation
Prediction line: Y’ = a + b1X1
The expected cost for dinner as predicted by the number of people:
Cost = 15.22 + 19.96 Persons
Y-intercept (a) = 15.22; Slope (b1) = 19.96
If “Persons” = 4, what is the prediction for “Cost”?
Cost = 15.22 + 19.96 (4)
Cost = 15.22 + 79.84 = 95.06
If “Persons” = 1, what is the prediction for “Cost”?
Cost = 15.22 + 19.96 (1)
Cost = 15.22 + 19.96 = 35.18
[Figure: Cost (Y) by People (X). If People = 4, Cost will be about 95.06.]
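The two worked predictions can be reproduced with a few lines of code. The intercept 15.22 and slope 19.96 are taken from the dinner cost equation; the function and constant names are hypothetical.

```python
INTERCEPT = 15.22  # a: predicted cost when Persons = 0
SLOPE = 19.96      # b1: added cost for each additional person

def predict_cost(persons, a=INTERCEPT, b=SLOPE):
    """Predicted dinner cost from the regression line Cost = a + b * Persons."""
    return a + b * persons

for persons in (1, 4):
    print(f"Persons = {persons}: predicted Cost = {predict_cost(persons):.2f}")
# prints:
# Persons = 1: predicted Cost = 35.18
# Persons = 4: predicted Cost = 95.06
```

The same function works for any number of people, although predictions far outside the range of the original data should be treated with caution.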