2.4: Cautions about Regression and Correlation
Cautions: Regression & Correlation • Correlation measures only linear association. • Extrapolation often produces unreliable predictions. • Correlation and least-squares regression are not resistant. • Lurking variables can make a correlation or regression misleading.
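The first caution can be checked numerically. The sketch below is illustrative only: the data set and the helper name `correlation` are assumptions, not part of the slides. It builds points that lie exactly on the curve y = x^2, so the relationship is perfect but not linear, and computes r for them.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r, computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: y is completely determined by x, but along a parabola.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = correlation(xs, ys)
print(f"r = {r:.3f}")  # r is essentially zero despite the exact relationship
```

A value of r near zero here says only that there is no linear association. It does not say that x and y are unrelated.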
Residual Plots • A residual plot is a scatterplot of the regression residuals (i.e., errors) against the explanatory variable. • Residual plots make patterns in the original scatterplot of data more apparent. – If the regression catches the overall pattern of the data, there should be no evident pattern to the residuals.
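As a rough illustration of what a residual plot reveals, the sketch below fits a least-squares line to made-up curved data and lists the residuals in order of x. The data and the helper name `least_squares` are assumptions chosen for the example. Plotting `residuals` against `xs` would give the residual plot itself.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical curved data: a straight line cannot catch this pattern.
xs = [0, 1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

intercept, slope = least_squares(xs, ys)

# Residual = observed y minus predicted y.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

for x, e in zip(xs, residuals):
    print(f"x = {x}: residual = {e:+.2f}")
```

The residuals are positive at both ends and negative in the middle. That U shape is the kind of evident pattern that signals the line has missed the overall form of the data.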
Outliers & Influential Data Points • Remember, an outlier is an observation that lies outside the overall pattern of the other observations. • In a least-squares regression, does an outlier have to have a large residual?
Outliers & Influential Data Points • Points that are outliers in the y direction have large regression residuals. • Other outliers need not have large residuals.
Outliers & Influential Data Points • An observation is influential if removing it would markedly change the result of the regression. • Outliers in the x direction of a scatterplot are often influential in least-squares regression.
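The following sketch illustrates both points with invented numbers (the data and the helper name `least_squares` are assumptions for the example). Five points show no linear trend, and a sixth point lies far out in the x direction. The code fits the line with and without that point and compares slopes and residuals.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data: five points with no trend, plus one outlier in x.
base_x = [1, 2, 3, 4, 5]
base_y = [2, 3, 2, 3, 2]
outlier = (15, 10)

all_x = base_x + [outlier[0]]
all_y = base_y + [outlier[1]]

_, slope_without = least_squares(base_x, base_y)
intercept_with, slope_with = least_squares(all_x, all_y)

# Residuals from the line fitted to all six points.
residuals = [y - (intercept_with + slope_with * x) for x, y in zip(all_x, all_y)]
outlier_residual = residuals[-1]
largest_other = max(abs(e) for e in residuals[:-1])

print(f"slope without outlier: {slope_without:.3f}")
print(f"slope with outlier:    {slope_with:.3f}")
print(f"outlier residual:      {outlier_residual:+.3f}")
print(f"largest other |resid|: {largest_other:.3f}")
```

In this example the single point pulls the line toward itself, so the slope changes markedly while the point's own residual stays smaller than that of at least one ordinary point. Looking at residuals alone would not flag it as unusual.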
Lurking Variable • A lurking variable is a variable that is not among the explanatory and response variables, yet may influence the interpretation of the relationships among those variables. • Association does not imply causation! – A lurking variable may have a cause-and-effect relationship with both the x and y variables, creating a strong association between x and y even when neither one causes the other.
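A small constructed example may make this concrete. In the sketch below, the variable `z` plays the role of a lurking variable. The data are invented, and the helper name `correlation` is an assumption. Both x and y are shifted by the same amount when z is high, but within each level of z they are arranged so that x and y are unrelated.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r, computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical noise pattern in which x and y are unrelated to each other.
noise_x = [-1, -1, 1, 1]
noise_y = [-1, 1, -1, 1]

# The lurking variable z takes two levels and raises both x and y.
low_x = [0 + e for e in noise_x]
low_y = [0 + e for e in noise_y]
high_x = [10 + e for e in noise_x]
high_y = [10 + e for e in noise_y]

r_all = correlation(low_x + high_x, low_y + high_y)
r_low = correlation(low_x, low_y)
r_high = correlation(high_x, high_y)

print(f"r ignoring z:       {r_all:.3f}")
print(f"r within low z:     {r_low:.3f}")
print(f"r within high z:    {r_high:.3f}")
```

Ignoring z, the correlation between x and y is strong. Within either level of z it vanishes, which suggests the overall association comes from z and not from any effect of x on y.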