Regression Part II

Regression analysis
• Regression analysis answers three questions about the relationships among variables:
  – Does an association (relationship) exist between two or more variables?
  – How strong is the relationship (correlation)?
  – Is it possible to predict (estimate) the value of the dependent variable if we know the values of the independent variables?

Why bivariate regression is preferable to just correlation
• It fits our theories better - sets us up to work towards causation
• Allows for prediction - not simply a measure of association
• Can control for other potential causes of Y
• More flexible
  – number and type of Xs
  – functional form - can examine nonlinear relationships

Review from last week
• A regression model can best be expressed as an equation: Y = α + βX

• α = the constant or Y intercept (where the regression line crosses the Y axis)
• β = the regression coefficient or slope
  – β is the change in Y for a one-unit increase in X
• Both α and β are referred to as the regression coefficients
  – These values are constant. For any particular straight line, they remain the same for every value of X and Y

• Y = the predicted value of the dependent variable
• X = the independent variable
• X and Y are variables
  – They change with each individual case in the sample/population.

Linear Regression Model
• In practice, the parameters of the linear regression function are unknown.
• The linear regression model uses a method called least squares to estimate the population parameters based on sample data.
• For any single fixed value of X, the equation estimates the mean of Y for all subjects in the population having that value of X.
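
The least-squares estimates described above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own procedure: the function name `fit_line` and the five data points are made up for the example, and `a` and `b` stand for the sample estimates of the intercept and slope.

```python
def fit_line(xs, ys):
    """Return (a, b): least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares, both around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope: change in y per one-unit increase in x
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    return a, b


# Hypothetical sample data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(f"Y = {a:.2f} + {b:.2f} X")
```

For any chosen X, `a + b * X` is then the estimated mean of Y among cases with that value of X.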

• Models cannot perfectly represent the real world.
• We add e (the error term).
• One of the goals of developing a regression model is to minimize e
  – so that your model works as well as possible

Interpreting the regression equation
• r² - the coefficient of determination (the square of the correlation coefficient r)
  – how well does the equation fit the data
  – the proportion of the variation in the dependent variable that is associated with or explained by the independent variable
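
As a rough sketch of the "proportion of variation explained" idea, r² can be computed as one minus the ratio of unexplained to total variation in Y. The function name `r_squared` and the data are assumptions made up for this illustration.

```python
def r_squared(xs, ys):
    """Proportion of the variation in ys explained by a least-squares line on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Total variation in Y around its mean.
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    # Variation left over after the line has made its predictions.
    ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_resid / ss_total


# Hypothetical sample data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r2 = r_squared(xs, ys)
print(f"r squared = {r2:.2f}")
```

With these made-up numbers the line accounts for 60% of the variation in Y; the remaining 40% is unexplained.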

Interpreting the regression equation
• β - the regression coefficient
• The size of β is determined by the scale on which the original variable is measured.
  – You cannot compare unstandardized β's
  – a large β may simply reflect the units in which the original variable was measured

Interpreting the regression equation
• In addition to the value of β (a measure of the magnitude of the relationship between two variables), we are also interested in whether or not the relationship is statistically significant.
• We determine this using a t test.
• H0: β = 0
• t can be found by dividing the sample slope (b) by the standard error of b
• Remember the |t| > 2 rule of thumb
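
The t calculation above can be sketched as follows. The function name `slope_t_statistic` and the data are invented for illustration; the standard error uses the usual bivariate formula, with the residual variance computed on n - 2 degrees of freedom.

```python
import math


def slope_t_statistic(xs, ys):
    """Return (b, se_b, t) for the bivariate least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual variance: two coefficients were estimated, so divide by n - 2.
    ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt(ss_resid / (n - 2) / sxx)
    # t for H0: slope = 0 is the sample slope divided by its standard error.
    return b, se_b, b / se_b


# Hypothetical sample data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b, se_b, t = slope_t_statistic(xs, ys)
print(f"b = {b:.2f}, SE(b) = {se_b:.3f}, t = {t:.2f}")
```

Note that the rule of thumb is meant for reasonably large samples. With only five cases, as in this toy data set, the exact critical value from a t table is well above 2.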

Using the regression equation for prediction
• Meier/Brudney 17.17
• Presidential election; key issue inflation
• Is the inflation rate related to the % of vote received by the candidate of the party in power?
• If inflation is 8%, predict the vote for the incumbent’s party
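
Once the intercept and slope have been estimated, prediction is just a matter of plugging X into the equation. The sketch below shows the mechanics only: the coefficients are placeholders invented for this example, not the values from the Meier/Brudney problem, and the function name `predict_vote` is likewise hypothetical.

```python
def predict_vote(inflation_pct, a, b):
    """Predicted % of the vote for the incumbent's party: Y = a + b * X."""
    return a + b * inflation_pct


# PLACEHOLDER coefficients, invented for illustration only.
# Substitute the intercept and slope estimated from the actual data.
a_hypothetical = 55.0   # predicted vote share when inflation is 0%
b_hypothetical = -0.9   # change in vote share per 1-point rise in inflation

predicted = predict_vote(8.0, a_hypothetical, b_hypothetical)
print(f"Predicted vote at 8% inflation: {predicted:.1f}%")
```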

Multiple linear regression
• In social science, most of the phenomena in which we are interested are best explained by examining multiple causes or independent variables.

• Often the independent variables are not only related to the dependent variable but also to each other.
• In order to tease out the effect of a single IV, we control for or hold constant all other independent variables (at their mean value).
• b represents the effect of that IV on the DV while controlling for all other IVs.
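
To illustrate what "controlling for" does, the sketch below fits a regression with two correlated IVs and compares the result with the bivariate slope. Everything here is an assumption made for the example: the function names are hypothetical, and the data were constructed so that Y = 1 + 2·X1 + 3·X2 exactly.

```python
def centered_sum(us, vs):
    """Sum of cross-products of us and vs around their means."""
    mu, mv = sum(us) / len(us), sum(vs) / len(vs)
    return sum((u - mu) * (v - mv) for u, v in zip(us, vs))


def fit_two_ivs(x1, x2, ys):
    """Least-squares (a, b1, b2) for y = a + b1*x1 + b2*x2."""
    s11, s22, s12 = centered_sum(x1, x1), centered_sum(x2, x2), centered_sum(x1, x2)
    s1y, s2y = centered_sum(x1, ys), centered_sum(x2, ys)
    det = s11 * s22 - s12 ** 2  # zero only if x1 and x2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    n = len(ys)
    a = sum(ys) / n - b1 * sum(x1) / n - b2 * sum(x2) / n
    return a, b1, b2


# Made-up data: x2 is correlated with x1, and y = 1 + 2*x1 + 3*x2 exactly.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
ys = [9, 8, 19, 18, 29]

a, b1, b2 = fit_two_ivs(x1, x2, ys)
# Slope of y on x1 alone, ignoring x2.
bivariate_b1 = centered_sum(x1, ys) / centered_sum(x1, x1)
print(f"controlling for x2: b1 = {b1:.2f}; ignoring x2: b1 = {bivariate_b1:.2f}")
```

Because X1 and X2 move together in this data set, the bivariate slope for X1 (5) absorbs part of the effect of X2. Once X2 is held constant, the slope for X1 drops to 2.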

F-ratio
• Another measure of statistical significance
• Tests the multiple regression equation as a whole
• Is the ratio between explained and unexplained variance in the model
• Its p-value indicates the probability that a fit this good could have occurred by chance
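
The ratio of explained to unexplained variance can be sketched as below. The function name `f_ratio` is hypothetical and the numbers are made up: the predicted values come from the least-squares line 2.2 + 0.6X fitted to X = 1, ..., 5, and `k` is the number of independent variables.

```python
def f_ratio(ys, predicted, k):
    """F = (explained SS / k) / (residual SS / (n - k - 1))."""
    n = len(ys)
    mean_y = sum(ys) / n
    # Variation in Y that the regression equation accounts for.
    ss_explained = sum((p - mean_y) ** 2 for p in predicted)
    # Variation in Y that the equation leaves unexplained.
    ss_resid = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    return (ss_explained / k) / (ss_resid / (n - k - 1))


# Hypothetical observed values and the matching least-squares predictions.
ys = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]
f = f_ratio(ys, predicted, k=1)
print(f"F = {f:.2f}")
```

The probability attached to F is then read from an F table (or software) with k and n - k - 1 degrees of freedom. With a single IV, F is simply the square of the slope's t statistic.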

Example
• Study by Greg Lewis: regression equation to predict entry grade into the US civil service