S 519 Evaluation of Information Systems Social Statistics

S 519: Evaluation of Information Systems Social Statistics Ch 5: Correlation

This week
• What is correlation?
• How to compute?
• How to interpret?

Correlation Coefficients
• The relationship between two variables
  • How the value of one variable changes when the value of another variable changes
• A correlation coefficient is a numerical index that reflects the relationship between two variables.
  • Range: -1 ~ +1
  • Bivariate correlation (for two variables)

Correlation Coefficients
• Parametric
  • Pearson product-moment correlation (named for inventor Karl Pearson)
• Non-parametric
  • Spearman’s rank correlation
  • Kendall tau rank correlation coefficient

Pearson correlation coefficient
• For two variables that are continuous in nature
  • Height, age, test score, income
• But not for discrete or categorical variables
  • Race, political affiliation, social class, rank
• rxy is the correlation between variable X and variable Y

Types of correlation coefficients
• Direct correlation (positive correlation):
  • The two variables change in the same direction
• Indirect correlation (negative correlation):
  • The two variables change in opposite directions
• See Table 5.1 (S-p 112)
• -0.70 and +0.5: which is stronger?

Pearson product-moment correlation coefficient
  rxy   the correlation coefficient between X and Y
  n     the size of the sample
  X     the individual’s score on the X variable
  Y     the individual’s score on the Y variable
  XY    the product of each X score times its corresponding Y score
  X²    the individual X score, squared
  Y²    the individual Y score, squared
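Assuming this slide uses the standard raw-score (computational) form of the Pearson coefficient, which is the form that matches the symbols defined here, the formula can be written as:

```latex
r_{xy} = \frac{n\sum XY - \sum X \sum Y}
              {\sqrt{\left[\,n\sum X^{2} - \left(\sum X\right)^{2}\right]
                     \left[\,n\sum Y^{2} - \left(\sum Y\right)^{2}\right]}}
```

Each sum runs over all n individuals in the sample.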

Exercise
• Calculate the Pearson correlation coefficient

  X   Y
  2   3
  4   2
  5   6
  6   5
  4   3
  7   6
  8   5
  5   4
  6   4
  7   5

1. Are variable X and variable Y correlated?
2. What does this correlation mean?
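One way to check the exercise is to apply the raw-score (computational) form of the Pearson formula directly. The sketch below is a minimal version, assuming the first ten numbers on the slide are the X scores and the last ten are the Y scores; `pearson_r` is an illustrative helper name, not something from the course materials.

```python
import math

# Exercise data, assuming the first ten values are X and the last ten are Y
x = [2, 4, 5, 6, 4, 7, 8, 5, 6, 7]
y = [3, 2, 6, 5, 3, 6, 5, 4, 4, 5]


def pearson_r(xs, ys):
    """Pearson correlation via the raw-score formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = pearson_r(x, y)
print(round(r, 3))  # 0.692
```

Under that reading of the data, r is about 0.69, a direct (positive) correlation. Excel's CORREL function should give the same value for the same two columns.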

Using Excel to calculate
• CORREL function
• Or PEARSON function

Visualizing a correlation
• Scatterplot or scattergram of the X and Y values

  X   Y
  2   3
  4   2
  5   6
  6   5
  4   3
  7   6
  8   5
  5   4
  6   4
  7   5

Visualizing a correlation

Direct (positive) correlation
[Scatterplot: both axes run from 0 to 9]
• r = 1: a perfect direct (or positive) correlation
• In real-life cases, 0.7 or 0.8 could be the highest you will see

Indirect (or negative) correlation
[Scatterplot: both axes run from 0 to 9]
• Strength and direction are important

Excel Scatterplot
• Four sets of data with the same correlation of 0.816

Linear correlation
• Linear correlation means that the X and Y values fall along a straight line
• Curvilinear correlation
  • Age and memory

More than 2 variables?
• How to calculate the correlation coefficient?
  1. CORREL()
  2. Correlation in the Data Analysis toolset

Sample data (variables: income, education, attitude, vote), in the order the values appear on the slide:
  74190 80931 81314 73089 62023 61217 84526 87251 62659 76450 70512 78858 78628 86212 74962 58828 61471 78621 60071
  13 12 11 11 11 10 11 11 12 10 12 9 13 14 9 11 10 12 9
  1 3 4 5 4 5 6 7 8 8 9 8 7 8 1 2 2 2 1 1 2 2 4 5 5 4

More than 2 variables?
• Correlation matrix

             Income   Education   Attitude   Vote
  Income      1.00      0.35       -0.19     0.51
  Education             1.00       -0.21     0.43
  Attitude                          1.00     0.55
  Vote                                       1.00
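A correlation matrix is just the pairwise Pearson coefficient computed for every pair of variables. The sketch below shows the idea in plain Python. The data values are made up for illustration (they are not the course data set), and `pearson_r` and `correlation_matrix` are hypothetical helper names.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation via the raw-score formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


def correlation_matrix(columns):
    """Return {row_name: {col_name: r}} for every pair of variables."""
    names = list(columns)
    return {
        a: {b: pearson_r(columns[a], columns[b]) for b in names}
        for a in names
    }


# Made-up illustrative data, NOT the values from the slide
data = {
    "income": [52, 61, 45, 70, 66],
    "education": [12, 14, 11, 16, 15],
    "vote": [1, 2, 1, 3, 2],
}

matrix = correlation_matrix(data)
for a in data:
    # One row per variable, coefficients shown to two decimals
    print(a.ljust(10), " ".join(f"{matrix[a][b]:6.2f}" for b in data))
```

The diagonal is always 1.00 (each variable correlates perfectly with itself) and the matrix is symmetric, which is why only the upper triangle needs to be shown.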

Excel
• Data Analysis tool: Correlation

Meaning of Correlation coefficient
• Correlation value:
  • - finite number ~ + finite number
• Correlation coefficient value:
  • -1.00 ~ +1.00

  rxy value    Interpretation
  0.8 ~ 1.0    Very strong relationship (share most things in common)
  0.6 ~ 0.8    Strong relationship (share many things in common)
  0.4 ~ 0.6    Moderate relationship (share something in common)
  0.2 ~ 0.4    Weak relationship (share a little in common)
  0.0 ~ 0.2    Weak or no relationship (share very little or nothing in common)
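The interpretation bands can be expressed as a small lookup. This is a sketch under two assumptions that the slide does not state: the bands apply to the absolute value of rxy (the sign gives direction only), and a value sitting exactly on a shared boundary, such as 0.6, goes to the stronger band. The function name `interpret` is illustrative.

```python
def interpret(r):
    """Map a correlation coefficient to the slide's strength labels."""
    strength = abs(r)  # assumption: strength ignores the sign
    if strength > 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    # Assumption: boundary values fall into the stronger band
    if strength >= 0.8:
        return "Very strong relationship"
    if strength >= 0.6:
        return "Strong relationship"
    if strength >= 0.4:
        return "Moderate relationship"
    if strength >= 0.2:
        return "Weak relationship"
    return "Weak or no relationship"


for value in (-0.70, 0.5, 0.1):
    print(value, "->", interpret(value))
```

Under these assumptions, -0.70 is labeled as a stronger relationship than +0.5, because only the size of the coefficient determines strength.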

Coefficient of determination
• Coefficient of determination:
  • The percentage of variance in one variable that is accounted for by the variance in the other variable
  • = the square of the correlation coefficient
• Example: if the correlation between GPA and studying time is 0.70, then 0.70² = 0.49, so 49% of the variance in GPA can be explained by the variance in studying time

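Coefficient of nondetermination
• The amount of unexplained variance is called the coefficient of nondetermination (coefficient of alienation)

  correlation   determination   interpretation
  0             0               none of the variance is explained
  0.5           0.25            25% of the variance is explained, 75% is unexplained
  0.9           0.81            81% of the variance is explained, 19% is unexplained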
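The two coefficients follow directly from the correlation: the coefficient of determination is the square of r, and the unexplained share is whatever remains. A minimal sketch, with illustrative function names:

```python
def coefficient_of_determination(r):
    """Share of variance explained: the square of the correlation."""
    return r ** 2


def coefficient_of_nondetermination(r):
    """Share of variance left unexplained: 1 minus r squared."""
    return 1 - r ** 2


for r in (0.0, 0.5, 0.7, 0.9):
    explained = coefficient_of_determination(r)
    unexplained = coefficient_of_nondetermination(r)
    print(f"r = {r:.1f}: explained = {explained:.2f}, unexplained = {unexplained:.2f}")
```

Because r is squared, a correlation of 0.5 explains only a quarter of the variance, so moderate correlations account for less shared variance than their size might suggest.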

Ice cream and crime
• In a small town in Greece, the local police found a direct correlation between ice cream and crime

Correlation vs. causality
• A correlation represents the association between two or more variables
• By itself it says nothing about causality (two correlated variables need not have a cause-and-effect relationship)
  • Ice cream and crime are correlated, but ice cream does not cause crime