Correlation
• Correlation measures the strength of the linear association between two quantitative variables.
• Get the correlation coefficient (r) from your calculator or computer.
• r always takes a value between -1 and +1.
• Correlation has no units.
• Correlation is not a resistant measure: it can be dramatically affected by one or more outlying observations.
• Changing the units of either variable (for example, centimetres to inches) has no effect on the correlation coefficient.
[Figure: example scatterplots labelled r = -1 (points fall exactly on a straight line), r = -0.7, r = -0.4, r = 0 (no linear relationship; uncorrelated), r = 0.3, r = 0.8 and r = 1 (points fall exactly on a straight line).]
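The properties listed above can be checked with a short Python sketch. It assumes a hand-written helper, `pearson_r` (a hypothetical name, not something from these slides), that applies the standard Pearson formula: the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations. The toy data are made up purely for illustration.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient r for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations; the units cancel in the ratio below
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data, chosen only for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)                             # about 0.775
r_line = pearson_r(x, [2 * v + 1 for v in x])   # points exactly on a line: r = 1
r_neg_line = pearson_r(x, [-3 * v for v in x])  # exact line, negative slope: r = -1

# A change of units (a conversion like Celsius -> Fahrenheit) leaves r unchanged
x_rescaled = [1.8 * v + 32 for v in x]
r_rescaled = pearson_r(x_rescaled, y)

# One outlying point can change r dramatically (here it even flips the sign)
r_outlier = pearson_r(x + [6], y + [-20])

print(f"r = {r:.3f}, rescaled r = {r_rescaled:.3f}, with outlier r = {r_outlier:.3f}")
```

Note that a conversion with a negative factor would flip the sign of r while leaving its size unchanged; ordinary unit changes use positive factors.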

What can go wrong?
• Use correlation only if you have two quantitative variables.
• There is an association between gender and weight, but there isn't a correlation between gender and weight, because gender is a categorical variable, not a quantitative one.
• Use correlation only if there is a linear relationship.
• Beware of outliers.
• Always plot the data before looking at the correlation.
[Figure: two scatterplots, labelled r = 0 and r = 0.9, each captioned "No linear relationship, but there is a relationship!"]
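The warning about non-linear relationships can be checked numerically. This minimal Python sketch again assumes a hand-written helper `pearson_r` (a hypothetical name) implementing the standard Pearson formula. Hypothetical points lying exactly on the parabola y = x² give r = 0, even though y is completely determined by x, which is why the data should be plotted first.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient r for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical points lying exactly on the curve y = x**2
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

# The relationship is perfect but not linear, so r comes out as 0
r_curve = pearson_r(x, y)
print(f"r for points on y = x^2: {r_curve:.3f}")
```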

Causality

Correlation and Causality
Often we can examine a set of data and discover that there is a relationship between two or more variables. However, this does not necessarily imply causality, that is, a “cause and effect” relationship.
For example: it was discovered that during the 20th century there was a strong negative relationship between fashionable skirt lengths and the value of shares listed on the New York Stock Exchange. As skirt lengths became shorter, share prices went up; as skirt lengths became longer, share values dropped.
Does this mean that short skirts are good for the economy? Or does a thriving economy cause women to wear shorter skirts? No: this was a coincidental correlation.

When two variables are found to be correlated, there are three possible explanations:
1. One variable may cause the other to change, e.g. smoking has been found to increase the risk of developing lung cancer.
2. The variables may both be affected by a third factor, e.g. more people wear warm clothing in winter, and more people also get the flu in winter. This did not imply a causal relationship between warm clothing and flu; both were explained by the colder weather.
3. There may be no causal relationship at all: it could be just a coincidence.

Statisticians have three rules to help determine whether there is a causal link between two variables:
1. The variables must be strongly correlated.
2. The “cause” must occur before the “effect”.
3. There must be no other possible explanation.
This is why statistical surveys try to eliminate other possible causes. For example, to discover whether taller people are richer than shorter people, researchers would select samples of tall and short people who were “the same” in all other ways: same occupation, health, education, age, marital status, etc.

Look at this data from Utopia, a fictitious nation.

Month   Ice creams sold   Drownings
Jan     45600             12
Feb     43000             8
Mar     32040             5
Apr     21900             3
May     16700             1
Jun     15870             2
Jul     18700             0
Aug     16700             1
Sep     18900             2
Oct     34700             4
Nov     41900             7
Dec     45300             10

Correlation coefficient: r = 0.940923596
Does the number of ice creams sold cause the number of drownings to increase? What is the underlying variable?
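The quoted correlation coefficient can be reproduced from the Utopia table. This is a minimal Python sketch that assumes the same hypothetical `pearson_r` helper (the standard Pearson formula, written by hand); the monthly figures are the ones given on this slide.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient r for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Monthly figures for Utopia, January to December, from the slide's table
ice_creams_sold = [45600, 43000, 32040, 21900, 16700, 15870,
                   18700, 16700, 18900, 34700, 41900, 45300]
drownings = [12, 8, 5, 3, 1, 2, 0, 1, 2, 4, 7, 10]

r = pearson_r(ice_creams_sold, drownings)
print(f"r = {r:.4f}")
```

A strong r like this one only measures how closely the two columns move together; on its own it says nothing about which variable, if either, causes the other. On Python 3.10 or later, `statistics.correlation` from the standard library should give the same value.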