SCATTER DIAGRAMS CCEA GCSE Statistics


Investigating Correlation
Scatter diagrams are used to investigate if two variables are correlated:
  • Positive Correlation: an increase in one variable is matched by an increase in the other variable
  • Negative Correlation: an increase in one variable is matched by a decrease in the other variable
  • Zero Correlation: no relationship between the two variables
The strength of the correlation is indicated by how closely the points conform to a straight line.

Measuring Correlation
Correlation ranges from perfect positive to perfect negative correlation. The strength of the correlation can be represented by a statistic known as the Product Moment Correlation Coefficient (PMCC), which ranges from +1 (perfect positive correlation) to -1 (perfect negative correlation). The PMCC (denoted by the letter r) is explained in more detail elsewhere in the course notes.
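To make the +1 to -1 range concrete, here is a minimal Python sketch that computes r from the standard textbook definition, r = Sxy / sqrt(Sxx × Syy). The function name pmcc and the three small datasets are made up for illustration and do not come from the course notes.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product Moment Correlation Coefficient r for paired data (hypothetical helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return s_xy / sqrt(s_xx * s_yy)

# Made-up illustrative data, not taken from the slides
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]     # points lie exactly on y = 2x
falling = [10, 8, 6, 4, 2]    # points lie exactly on y = 12 - 2x
scattered = [3, 1, 4, 1, 3]   # no linear trend

r_up = pmcc(x, rising)
r_down = pmcc(x, falling)
r_flat = pmcc(x, scattered)
print(r_up, r_down, r_flat)
```

The first two datasets lie exactly on a straight line, so r comes out at the ends of the range; the third has no linear trend, so r is approximately zero.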

Positive Correlation
[Diagrams: examples of positive correlation]

Zero/No Correlation
[Diagram: example of zero/no correlation]

Negative Correlation
[Diagrams: examples of negative correlation]

Outliers
Even when strong correlation exists, there may be a data point which does not fit the overall trend. This data point is called an outlier. Outliers can sometimes occur randomly but are usually due to an error in the measurement process used to collect the data.

Outliers
In the diagram we see strong positive correlation (r = +0.92), but point A is clearly an outlier. Point B is perfectly in line with the general trend in the dataset and so is not an outlier.
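The slides identify the outlier by eye. One possible numerical check, sketched below in Python, is to recompute r with each point left out in turn and see which omission strengthens the correlation most. This leave-one-out approach is an assumption for illustration, not a method given in the course notes; the helper names and the data (including the position of the stand-in for point A) are invented.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product Moment Correlation Coefficient r (hypothetical helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

def r_without_each_point(xs, ys):
    """Return a list whose i-th entry is r computed with point i left out."""
    results = []
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        results.append(pmcc(rest_x, rest_y))
    return results

# Made-up data: eight points close to y = 2x, plus one point (the last)
# that sits well above the trend and plays the role of point A.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 3]
ys = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 16.1, 14.0]

r_all = pmcc(xs, ys)
r_left_out = r_without_each_point(xs, ys)

# The point whose removal gives the strongest correlation is the prime suspect
suspect = max(range(len(xs)), key=lambda i: abs(r_left_out[i]))
print("r with all points:", round(r_all, 3))
print("suspect point:", (xs[suspect], ys[suspect]),
      "r without it:", round(r_left_out[suspect], 3))
```

A check like this only flags a candidate. Whether the point really is a measurement error still has to be judged from how the data was collected.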

Outliers
Outliers need to be identified during the modelling process and removed from the dataset. If outliers are not removed they could adversely affect the fitting process. The line of best fit in the previous diagram ignored the outlier and hence was an excellent fit for all the other points. If the outlier were included, the resulting line of best fit would be flatter and not such a good fit overall.
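The effect of an outlier on the line of best fit can be sketched in Python. The slides do not say how the line is obtained, so this example assumes an ordinary least-squares regression line of y on x; the function name and the data are invented. The outlier here is placed at a high x value with a low y value, which pulls the line flatter. An outlier in a different position could tilt the line the other way.

```python
def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line of y on x.

    Hypothetical helper; assumes the xs are not all equal.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    gradient = s_xy / s_xx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept

# Made-up data: eight points close to y = 2x, then an outlier at (9, 4)
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 16.1, 4.0]

# Fit once with every point, and once with the final point (the outlier) removed
slope_with, intercept_with = least_squares_line(xs, ys)
slope_without, intercept_without = least_squares_line(xs[:-1], ys[:-1])

print("with outlier:    y = %.3fx + %.3f" % (slope_with, intercept_with))
print("without outlier: y = %.3fx + %.3f" % (slope_without, intercept_without))
```

With this data the gradient is noticeably smaller when the outlier is included, which matches the flatter line described above.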