# Social Statistics: Correlation

This week: What is correlation?


## Correlation Coefficients

- The relationship between two variables: how the value of one variable changes when the value of another variable changes.
- A correlation coefficient is a numerical index that reflects the relationship between two variables. Range: -1 to +1.
- Bivariate correlation (for two variables)

## Correlation Coefficients

- Parametric
  - Pearson product-moment correlation (named for its inventor, Karl Pearson)
- Non-parametric
  - Spearman’s rank correlation
  - Kendall tau rank correlation coefficient
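To make the parametric/non-parametric split concrete, here is a minimal Python sketch (plain Python 3, no external libraries; all function names are our own). Spearman's rho is computed as Pearson's r on the ranks of the scores, with tied scores sharing their average rank, and Kendall's tau-a counts concordant and discordant pairs. Tau-a assumes there are no ties; with ties, tau-b is the usual choice.

```python
import math


def pearson(xs, ys):
    """Pearson r from mean deviations (used here as a helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson r computed on the ranks of X and Y."""
    return pearson(average_ranks(xs), average_ranks(ys))


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs (no ties assumed)."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]  # always rising, but curved
print(round(pearson(x, y), 3), spearman_rho(x, y), kendall_tau_a(x, y))  # 0.981 1.0 1.0
```

Because y = x² always rises with x, both rank-based coefficients reach 1, while Pearson's r, which measures closeness to a straight line, is slightly lower.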

## Pearson correlation coefficient

- For two variables that are continuous in nature: height, age, test score, income
- But not for discrete or categorical variables: race, political affiliation, social class, rank
- $r_{xy}$ is the correlation between variable X and variable Y

## Types of correlation coefficients

- Direct correlation (positive correlation): both variables change in the same direction.
- Indirect correlation (negative correlation): the two variables change in opposite directions.

## Types of correlation coefficients

Below is a correlation report for different currency exchange rates on November 13, 2014 (source: Bloomberg Terminal).

-0.8 and 0.5: which is stronger?

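## Pearson product-moment correlation coefficient

$$
r_{xy} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}
$$

where

- $r_{xy}$ is the correlation coefficient between X and Y
- $n$ is the size of the sample
- $X$ is the individual’s score on the X variable
- $Y$ is the individual’s score on the Y variable
- $XY$ is the product of each X score times its corresponding Y score
- $X^2$ is the individual X score, squared
- $Y^2$ is the individual Y score, squared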
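The raw-score formula translates directly into code: compute the five sums, then combine them. A minimal Python sketch; the name `pearson_r` is our own and not part of any library.

```python
import math


def pearson_r(xs, ys):
    """Pearson r via the raw-score formula:
    (n*SumXY - SumX*SumY) / sqrt((n*SumX2 - SumX**2) * (n*SumY2 - SumY**2))
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists of at least 2 scores")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # sum of XY
    sum_x2 = sum(x * x for x in xs)              # sum of X squared
    sum_y2 = sum(y * y for y in ys)              # sum of Y squared
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when a variable has no variance")
    return numerator / denominator


print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0
```

For the tiny example, the numerator is 3(28) - 6(12) = 12 and the denominator is the square root of 6 × 24 = 144, so r = 12 / 12 = 1.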

## Exercise

Calculate the Pearson correlation coefficient for US school enrollment (unit: thousands) at selected time points over the previous 50 years. (Source: United States Census Bureau)

| Year | G9-12 Public | G9-12 Private | College-Public | College-Private |
|------|--------------|---------------|----------------|-----------------|
| 1965 | 11610 | 1400 | 3970 | 1951 |
| 1970 | 13336 | 1311 | 6428 | 2153 |
| 1975 | 14304 | 1300 | 8836 | 2350 |
| 1980 | 13231 | 1339 | 9457 | 2640 |
| 1985 | 12388 | 1362 | 9479 | 2768 |
| 1990 | 11341 | 1136 | 10845 | 2974 |
| 1995 | 12502 | 1163 | 11092 | 3169 |
| 2000 | 13517 | 1264 | 11753 | 3560 |
| 2005 | 14909 | 1349 | 13022 | 4466 |

1. Select two columns of data. Are they correlated?
2. What does this correlation mean?
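One way to check an answer to the exercise outside Excel is this Python sketch. It uses the mean-deviation form of Pearson's r, which is algebraically equivalent to the raw-score formula, on the two college columns. The dictionary keys are our own labels for the Census columns.

```python
import math

# US enrollment in thousands, 1965 to 2005 in 5-year steps (exercise data)
enrollment = {
    "G9-12 Public": [11610, 13336, 14304, 13231, 12388, 11341, 12502, 13517, 14909],
    "G9-12 Private": [1400, 1311, 1300, 1339, 1362, 1136, 1163, 1264, 1349],
    "College-Public": [3970, 6428, 8836, 9457, 9479, 10845, 11092, 11753, 13022],
    "College-Private": [1951, 2153, 2350, 2640, 2768, 2974, 3169, 3560, 4466],
}


def pearson_deviation(xs, ys):
    """Pearson r in mean-deviation form: Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_deviation(enrollment["College-Public"], enrollment["College-Private"])
print(f"College-Public vs College-Private: r = {r:.2f}")
```

This prints r = 0.89, a very strong direct correlation. Both college series rise steadily from 1965 to 2005, so much of this correlation reflects a shared upward trend over time. Swapping in other keys checks the remaining column pairs.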

## Using Excel to calculate

- CORREL function: `=CORREL(array1, array2)`
- Or PEARSON function: `=PEARSON(array1, array2)`

[Figure: line chart of US enrollment (thousands) by year, 1955 to 2015, for G9-12 Public, G9-12 Private, College-Public, and College-Private]

## Visualizing a correlation

- Scatterplot or scattergram

| X | Y |
|---|---|
| 2 | 3 |
| 4 | 2 |
| 5 | 6 |
| 6 | 5 |
| 4 | 3 |
| 7 | 6 |
| 8 | 5 |
| 5 | 4 |
| 6 | 4 |
| 7 | 5 |
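As a worked example, take X = 2, 4, 5, 6, 4, 7, 8, 5, 6, 7 and Y = 3, 2, 6, 5, 3, 6, 5, 4, 4, 5 as ten paired scores and plug their sums into the raw-score Pearson formula:

$$
\begin{aligned}
n &= 10,\quad \sum X = 54,\quad \sum Y = 43,\quad \sum XY = 247,\quad \sum X^2 = 320,\quad \sum Y^2 = 201\\
r_{xy} &= \frac{10(247) - (54)(43)}{\sqrt{\left[10(320) - 54^2\right]\left[10(201) - 43^2\right]}}
        = \frac{2470 - 2322}{\sqrt{(3200 - 2916)(2010 - 1849)}}
        = \frac{148}{\sqrt{284 \times 161}} \approx 0.692
\end{aligned}
$$

A value of about 0.69 is a strong direct correlation: in the scatterplot the points drift upward from left to right, but they do not fall on one straight line.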


## Direct (positive) correlation

[Figure: scatterplot, both axes from 0 to 10]

- r = 1: a perfect direct (or positive) correlation
- In real-life data, correlations of 0.7 to 0.8 are about the highest you are likely to see

## Indirect (or negative) correlation

[Figure: scatterplot, both axes from 0 to 10]

- Strength and direction are both important

## Excel Scatterplot

Four sets of data with the same correlation of 0.816

## Linear correlation

- Linear correlation means that the relationship between X and Y follows a straight line.
- Curvilinear correlation: for example, age and memory

## More than 2 variables?

How to calculate the correlation coefficients?

1. CORREL()
2. Correlation in the Data Analysis toolset

income education attitude 74190 13 80931 12 81314 11 73089 11 62023 11 61217 10 84526 11 87251 11 62659 12 76450 10 70512 12 78858 9 78628 13 86212 14 74962 9 58828 11 61471 10 78621 12 60071 9 vote 1 3 4 5 4 5 6 7 8 8 9 8 7 8 1 2 2 2 1 1 2 2 4 5 5 4

## More than 2 variables?

Correlation matrix:

| | Income | Education | Attitude | Vote |
|---|---|---|---|---|
| Income | 1.00 | | | |
| Education | 0.35 | 1.00 | | |
| Attitude | -0.19 | -0.21 | 1.00 | |
| Vote | -0.51 | -0.20 | 0.55 | 1.00 |

## Excel

- Data Analysis tool: Correlation

## Meaning of correlation coefficient

- Correlation value: -finite number to +finite number
- Correlation coefficient value: -1.00 to +1.00

| Absolute value of $r_{xy}$ | Interpretation |
|---|---|
| 0.8 to 1.0 | Very strong relationship (share most things in common) |
| 0.6 to 0.8 | Strong relationship (share many things in common) |
| 0.4 to 0.6 | Moderate relationship (share something in common) |
| 0.2 to 0.4 | Weak relationship (share a little in common) |
| 0.0 to 0.2 | Weak or no relationship (share very little or nothing in common) |
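The interpretation table can be encoded as a small helper. This is a sketch under one assumption: the table does not say which band a boundary value such as 0.8 belongs to, so the code puts each boundary in the higher band. `describe_strength` is a hypothetical name.

```python
def describe_strength(r):
    """Label a correlation using the rule-of-thumb bands in the table.

    Assumption: each band includes its lower bound (0.8 counts as very strong).
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient lies between -1 and +1")
    size = abs(r)  # strength ignores the sign; the sign only gives the direction
    if size >= 0.8:
        return "Very strong relationship"
    if size >= 0.6:
        return "Strong relationship"
    if size >= 0.4:
        return "Moderate relationship"
    if size >= 0.2:
        return "Weak relationship"
    return "Weak or no relationship"


for r in (-0.8, 0.5, 0.692):
    direction = "direct" if r > 0 else "indirect" if r < 0 else "none"
    print(r, direction, describe_strength(r))
```

Under this reading, -0.8 is a very strong indirect relationship while 0.5 is only moderate.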

## Coefficient of determination

- Coefficient of determination: the percentage of variance in one variable that is accounted for by the variance in the other variable.
- It equals the square of the correlation coefficient, $r_{xy}^2$.
- For example, if the correlation between GPA and studying time is 0.70, then $r^2 = 0.49$: 49% of the variance in GPA can be explained by the variance in studying time.

## Coefficient of nondetermination

- The amount of unexplained variance, $1 - r_{xy}^2$, is called the coefficient of nondetermination (coefficient of alienation).

| Correlation ($r$) | Determination ($r^2$) | Nondetermination ($1 - r^2$) |
|---|---|---|
| 0 | 0 | 1 |
| 0.5 | 0.25 | 0.75 |
| 0.9 | 0.81 | 0.19 |

## Ice cream and crime

- In a small town in Greece, the local police found a direct correlation between ice cream and crime.

## Correlation vs. causality

- Correlation represents the association between two or more variables.
- Correlation by itself says nothing about causality: two variables can be correlated without one causing the other.
- Ice cream and crime are correlated, but ice cream does not cause crime.

## Correlation vs. causality

Summer is when people get together. More specifically, casual drinkers and drug users are more likely to go to bars or parties on weekends and evenings, as opposed to a Tuesday morning. These people in the social mix, flooding the city’s streets and neighborhood bars, feed the peak times for murder, experts say.