Logical Line Fitting: One Step in the EDA Process

Logical Line Fitting: One Step in the EDA Process
by Shannon Guerrero, Northern Arizona University
NCTM 2008 Annual Meeting & Exposition, Salt Lake City, UT, April 2008

EDA (Exploratory Data Analysis)
  • Mostly graphical approach to data analysis
  • Emphasizes uncovering the underlying structure of the data, extracting important variables, detecting outliers/anomalies, testing underlying assumptions, and maximizing insight into the data set
  • Graph the data, graph the data
  • Focus on sense-making rather than theory

Why curve fitting?
  • Applications in data analysis & algebra
  • “Analyses of the relationships between two sets of measurement data are central in high school mathematics” (p. 328 NCTM PSSM)
  • Modeling, prediction, symbolic representation, correlation, regression, residuals

“Line of Best Fit”
  • Explains the relationship between two variables with a straight line that “best fits” the data
  • Line may pass through some, none, or all of the points
  • Used to predict future values from existing values (interpolate vs. extrapolate)
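
The interpolate vs. extrapolate distinction can be illustrated with a short Python sketch. The function name `predict`, the line y = 2x + 1, and the observed x-values are all made up for illustration; they are not from the presentation.

```python
def predict(x, slope, intercept, observed_xs):
    """Predict y from a fitted line and say whether x lies inside
    the range of the observed data (interpolation) or outside it
    (extrapolation)."""
    inside = min(observed_xs) <= x <= max(observed_xs)
    kind = "interpolation" if inside else "extrapolation"
    return slope * x + intercept, kind


# Hypothetical fitted line y = 2x + 1, built from x-values 1 through 5.
observed_xs = [1, 2, 4, 5]
print(predict(3, 2.0, 1.0, observed_xs))   # x = 3 is inside the data range
print(predict(10, 2.0, 1.0, observed_xs))  # x = 10 is outside the data range
```

Predictions labeled as extrapolation deserve more caution, since the data give no evidence that the linear pattern continues outside the observed range.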

Outliers
  • An observation that lies outside the overall pattern of a distribution
  • For one variable, a convenient definition is a point that falls more than 1.5 times the IQR above the 3rd quartile or below the 1st quartile
  • Examine outliers carefully and understand their appearance in your data set
  • Need to decide what to do with outliers: include or discard?
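
The 1.5 × IQR rule can be sketched in a few lines of Python. The helper name `iqr_outliers` and the data are made up for illustration. Quartile conventions differ between textbooks, calculators, and software, so the choice of the standard library's "inclusive" method here is an assumption, and other conventions may place the fences slightly differently.

```python
import statistics


def iqr_outliers(values):
    """Return the values that fall more than 1.5 * IQR above the
    3rd quartile or below the 1st quartile."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Made-up data set with one suspicious value.
data = [2, 3, 4, 5, 6, 7, 8, 9, 30]
print(iqr_outliers(data))  # [30]
```

The rule only flags candidates; whether to include or discard a flagged point is still a judgment about the data.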

Curve Fitting vs. Regression
  • Power of curve fitting is often lost as we revert right to regression calculations
  • Curve fitting is more general and an approximation
  • The equation found (using either method) can help uncover the underlying structure of the data, predict future values from past ones, model causal relationships, and maximize insight into a data set

Linear Regression
  • Statistical approach to finding the relationship between two variables
  • Least squares regression attempts to minimize the sum of the squared residuals (residual: the difference between an observed value and the value given by the model)
  • Assumption: for a fixed value of x, the value of y is normally distributed, with equal variance across x
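
As a rough illustration of what a calculator's regression command does, the least squares slope and intercept can be computed directly from the standard closed-form formulas. This is a minimal sketch: the function name `least_squares_line` and the data are made up for illustration, not taken from the presentation.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line that minimizes the sum
    of squared residuals for the points (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = least_squares_line(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

Comparing this result with a line fitted by eye on a scatterplot is one way to connect logical line fitting to the regression calculation.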

r² and residuals
  • Residual: the difference between an observed value and the value predicted by the regression line
  • A residual plot is a scatterplot of the regression residuals against the explanatory variable; it helps us assess the fit of the regression line
  • r² is another way to assess how well the line fits the data (the closer to 1, the better the fit)
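
Both quantities can be computed by hand for a small data set. The sketch below assumes a line has already been fitted. The data, the helper names `residuals` and `r_squared`, and the line y = 0.6x + 2.2 (the least squares line for this made-up data) are illustrative only. It computes r² as 1 minus the ratio of the residual sum of squares to the total sum of squares, which matches the squared correlation when the line is the least squares line.

```python
def residuals(xs, ys, slope, intercept):
    """Observed y minus the y predicted by the line, for each point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


def r_squared(xs, ys, slope, intercept):
    """1 - (sum of squared residuals) / (total sum of squares about the mean)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum(e ** 2 for e in residuals(xs, ys, slope, intercept))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up data and its least squares line y = 0.6x + 2.2.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
for x, e in zip(xs, residuals(xs, ys, 0.6, 2.2)):
    print(f"x = {x}: residual = {e:+.2f}")  # plot these against x for a residual plot
print(f"r^2 = {r_squared(xs, ys, 0.6, 2.2):.2f}")
```

A residual plot with no visible pattern suggests a line is a reasonable model; a curved pattern suggests it is not, even when r² is fairly high.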