Introduction to Statistics for the Social Sciences SBS
Introduction to Statistics for the Social Sciences SBS 200 - Lecture Section 001, Fall 2019. Social Sciences Room 100, 10:00 - 10:50, Mondays, Wednesdays & Fridays. November 13
A note on doodling
Schedule of readings
Before next exam (November 22):
Please read Chapters 1 - 11 in the OpenStax textbook
Please read Chapters 2, 3, and 4 in Plous
Chapter 2: Cognitive Dissonance
Chapter 3: Memory and Hindsight Bias
Chapter 4: Context Dependence
No Labs this week
Project 4 - Two Correlation Analyses - Two Regression Analyses
This lab builds on the work we did in our very first lab, but now we are using the correlation for prediction. This is called regression analysis.
We refer to the predicted variable as the dependent variable (Y) and the predictor variable as the independent variable (X). Why are we finding the regression line? How would we use it?
Figure labels: regression coefficient (slope); correlation coefficient (“r”)
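One way to make the two questions above concrete is a small sketch of the least-squares regression line: fit it to paired scores, then use it to predict Y for a new X. The data (hours studied and exam scores) and the function names `fit_line` and `predict` are invented for illustration and are not from the lecture.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line predicting y from x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-products of deviations, and sum of squared deviations of x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx                  # the regression coefficient
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted value of the dependent variable (Y) for a given X."""
    return intercept + slope * x


# Hypothetical data: X = hours studied, Y = exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]

slope, intercept = fit_line(hours, scores)
print(f"predicted score = {intercept:.1f} + {slope:.1f} * hours")
print(f"prediction for 6 hours: {predict(6, slope, intercept):.1f}")
```

With these made-up numbers the slope is 5.6, so each extra hour of study predicts about 5.6 more points on the exam.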
Correlation: Measure of how two variables co-occur; it can also be used for prediction
• Range between -1 and +1
• The closer to zero, the weaker the relationship and the worse the prediction
• Positive or negative
Remember, we’ll call the correlation “r”
Correlation: Range between -1 and +1
+1.00 perfect relationship = perfect predictor
+0.80 strong relationship = good predictor
+0.20 weak relationship = poor predictor
 0.00 no relationship = very poor predictor
-0.20 weak relationship = poor predictor
-0.80 strong relationship = good predictor
-1.00 perfect relationship = perfect predictor
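The scale above can be tried out with a short sketch that computes r directly from the standard Pearson formula. The function name `pearson_r` and the two small data sets are assumptions made up for this example.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired observations."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: scores rise with hours in the first set, fall in the second
hours = [1, 2, 3, 4, 5]
scores_up = [52, 58, 61, 70, 74]
scores_down = [74, 70, 61, 58, 52]

r_up = pearson_r(hours, scores_up)
r_down = pearson_r(hours, scores_down)
print(f"r = {r_up:+.2f}")    # r = +0.99
print(f"r = {r_down:+.2f}")  # r = -0.99
```

Both values sit near the ends of the scale, so either set would be a good predictor; only the direction differs.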
The more closely the dots approximate a straight line, the stronger the relationship is.
Perfect correlation = +1.00 or -1.00
• One variable perfectly predicts the other
• No variability in the scatterplot
• The dots approximate a straight line
Scatterplot examples:
• Time in the house by time outside of house (negative correlation)
• Speed (mph) and time to finish race (negative correlation)
• Percent correct on exam by number correct on exam (positive correlation)
• Height in inches and height in feet (positive correlation)
Positive correlation (Remember, Correlation = “r”)
• as values on one variable go up, so do values for the other variable
• pairs of observations tend to occupy similar relative positions
• higher scores on one variable tend to co-occur with higher scores on the second variable
• lower scores on one variable tend to co-occur with lower scores on the second variable
• scatterplot shows clusters of points from lower left to upper right
Negative correlation (Remember, Correlation = “r”)
• as values on one variable go up, values for the other variable go down
• pairs of observations tend to occupy dissimilar relative positions
• higher scores on one variable tend to co-occur with lower scores on the second variable
• lower scores on one variable tend to co-occur with higher scores on the second variable
• scatterplot shows clusters of points from upper left to lower right
Zero correlation
• as values on one variable go up, values for the other variable go... anywhere
• pairs of observations tend to occupy seemingly random relative positions
• scatterplot shows no apparent slope
Correlation does not imply causation
Is it possible that they are causally related? Yes, but the correlational analysis does not answer that question.
What if it’s a perfect correlation? Isn’t that causal? No, it feels more compelling, but is neutral about causality.
Scatterplot example: Number of Birthdays by Number of Birthday Cakes
Positive correlation: as values on one variable go up, so do values for the other variable
Negative correlation: as values on one variable go up, the values for the other variable go down
Example: Number of bathrooms in a city and number of crimes committed (positive correlation)
Linear vs curvilinear relationship
A linear relationship is a relationship that can be described best with a straight line
A curvilinear relationship is a relationship that can be described best with a curved line
http://www.ruf.rice.edu/~lane/stat_sim/reg_by_eye/index.html
http://argyll.epsb.ca/jreed/math9/strand4/scatterPlot.htm
Let’s estimate the correlation coefficient for each of the following (Remember, Correlation = “r”): r = +.98, r = .20
Let’s estimate the correlation coefficient for each of the following: r = +.83, r = -.63
Let’s estimate the correlation coefficient for each of the following: r = +.04, r = -.43
Correlation: The more closely the dots approximate a straight line, the stronger the relationship is.
• Perfect correlation = +1.00 or -1.00
• One variable perfectly predicts the other
• No variability in the scatterplot
• The dots approximate a straight line
This shows a strong positive relationship (r = +0.97) between the price of the house and its eventual sales price.
Description includes: both variables; strength (weak, moderate, strong); direction (positive, negative); estimated value (actual number)
This shows a moderate negative relationship (r = -0.48) between the amount of pectin in orange juice and its sweetness.
Description includes: both variables; strength (weak, moderate, strong); direction (positive, negative); estimated value (actual number)
This shows a strong negative relationship (r = -0.91) between the distance that a golf ball is hit and the accuracy of the drive.
Description includes: both variables; strength (weak, moderate, strong); direction (positive, negative); estimated value (actual number)
This shows a moderate positive relationship (r = +0.61) between the length of stay in a hospital and the number of services provided.
Description includes: both variables; strength (weak, moderate, strong); direction (positive, negative); estimated value (actual number)
Summary of the four examples: r = +0.97, r = -0.48, r = -0.91, r = +0.61
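The description pattern used in these examples (both variables, strength, direction, estimated value) can be sketched as a small helper. The cutoffs of 0.3 and 0.7 are an assumption chosen only so that the labels match the examples in this lecture; they are not an official rule, and the function names are hypothetical.

```python
def strength(r):
    """Label the strength of r. The 0.3 and 0.7 cutoffs are assumed, not official."""
    size = abs(r)
    if size >= 0.7:
        return "strong"
    if size >= 0.3:
        return "moderate"
    return "weak"


def describe(r, x_name, y_name):
    """Build a one-sentence description: variables, strength, direction, value."""
    if r == 0:
        return f"This shows no linear relationship (r = 0.00) between {x_name} and {y_name}"
    direction = "positive" if r > 0 else "negative"
    return (f"This shows a {strength(r)} {direction} relationship "
            f"(r = {r:+.2f}) between {x_name} and {y_name}")


print(describe(-0.48, "the amount of pectin in orange juice", "its sweetness"))
print(describe(0.61, "the length of stay in a hospital",
               "the number of services provided"))
```

The first call prints a sentence beginning "This shows a moderate negative relationship (r = -0.48)", matching the orange juice example.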
Example scatterplot: Height of Mothers (inches) by Height of Daughters (inches)
• Both axes are labeled and have real numbers listed
• Variable name is listed clearly on each axis
This shows the strong positive (r = +0.8) relationship between the heights of daughters (in inches) and the heights of their mothers (in inches).
Description includes: both variables; strength (weak, moderate, strong); direction (positive, negative); estimated value (actual number)
Break into groups of 2 or 3. Each person hands in their own worksheet. Be sure to list your name and the names of all others in your group. Use examples that are different from those in lecture. You have 12 minutes (approximately 2 minutes per example).
1. Describe one positive correlation. Draw a scatterplot (label axes)
2. Describe one negative correlation. Draw a scatterplot (label axes)
3. Describe one zero correlation. Draw a scatterplot (label axes)
4. Describe one perfect correlation (positive or negative). Draw a scatterplot (label axes)
5. Describe one curvilinear relationship. Draw a scatterplot (label axes)
Hand in correlation worksheet