Univariate Linear Regression • Chapter Eight – Basic Problem – Definition of Scatterplots – What to check for

Basic Empirical Situation • Unit of data. • Two interval (or ratio) scales measured for each unit. – Example: observational study, independent variable is score of student on first exam in AMS 315, dependent variable is score on final exam. – Objective is to assess the strength of the association between score on first exam and final.

Examining the scatterplot • Regression techniques ASSUME – 1. Linear regression function – 2. Independent errors of measurement – 3. Constant error variance – 4. Normal distribution of errors. • If assumptions 1 and 3 are met, the scatterplot is a football-shaped cloud of points.

How to use a scatterplot • Look at it! • Check whether a linear regression function appears reasonable (pencil test). • Check whether there is a “horn”-shaped pattern in the scatterplot (homoscedasticity violated). • Check for outliers or other unusual patterns.

Ordinary Least Squares Line • Residual – ASSUME intercept is $a$ and slope is $b$ – ASSUME dependent variable value is $y_1$ and independent variable value is $x_1$ – Residual $r_1(a, b) = y_1 - a - b x_1$ • Choose slope $b$ and intercept $a$ so that the sum of the squared residuals is as small as possible.
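The least squares criterion can be checked numerically. The sketch below is illustrative only: the exam scores are made-up data, and the function names `sse` and `ols_fit` are hypothetical. It computes the closed-form intercept and slope, then confirms that moving either coefficient away from those values increases the sum of squared residuals.

```python
def sse(a, b, xs, ys):
    """Sum of squared residuals r_i(a, b) = y_i - a - b * x_i."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


def ols_fit(xs, ys):
    """Closed-form least squares intercept a and slope b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical scores: first exam (x) and final exam (y).
xs = [55, 62, 70, 74, 81, 88, 93]
ys = [58, 60, 75, 70, 84, 85, 95]

a, b = ols_fit(xs, ys)
best = sse(a, b, xs, ys)

# Any other (intercept, slope) pair gives a larger sum of squares.
for da, db in [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert sse(a + da, b + db, xs, ys) > best

print(f"intercept={a:.3f}, slope={b:.3f}, SSE={best:.3f}")
```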

OLS Estimate for the Slope • The solution is always the same; you should memorize the following.

OLS Estimate of the Slope • $b = r \, s_Y / s_X$ • The correlation coefficient is $r$. • The standard deviation of the y data is $s_Y$. • The standard deviation of the x data is $s_X$. • There are other formulas as well that are useful for solving specific distributional problems.
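As a quick check, the slope computed as the correlation coefficient times the ratio of standard deviations agrees with the slope computed from sums of squares and cross products. This is a minimal sketch with made-up data; the helper names are hypothetical, and sample standard deviations (divisor n - 1) are assumed.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation with divisor n - 1."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def slope_from_r(xs, ys):
    """Slope as r * s_Y / s_X."""
    return correlation(xs, ys) * sample_sd(ys) / sample_sd(xs)


def slope_from_sums(xs, ys):
    """Slope as sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical scores: first exam (x) and final exam (y).
xs = [55, 62, 70, 74, 81, 88, 93]
ys = [58, 60, 75, 70, 84, 85, 95]

print(slope_from_r(xs, ys), slope_from_sums(xs, ys))
```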

Point Slope Form of the Regression Line • Memorize the following formula:
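One standard way to write the point-slope form, assuming the slope estimate $b = r \, s_Y / s_X$ and writing $\bar{x}$, $\bar{y}$ for the sample means and $\hat{y}$ for the fitted value (notation chosen here for illustration):

```latex
\hat{y} - \bar{y} = b\,(x - \bar{x}),
\qquad\text{equivalently}\qquad
\hat{y} = \bar{y} + r\,\frac{s_Y}{s_X}\,(x - \bar{x}).
```

In this form the fitted line visibly passes through the point of means $(\bar{x}, \bar{y})$.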

Univariate Linear Regression Model • Value of dependent variable on i-th unit is $Y_i$ and independent variable is $x_i$. • There are three quantities to be estimated: $\beta_0$, $\beta_1$, and $\sigma$. These are the intercept, slope, and standard deviation of error.

Four Assumptions of Univariate Linear Regression • Regression function is linear. • Observations have independent errors. • Variance of error is the same for all observations. • Errors are normally distributed.

Implication of Assumptions • Each $Y_i$ is normally distributed with expected value $\beta_0 + \beta_1 x_i$ and variance $\sigma^2$. • The most important question is whether the data indicate that the slope is different from zero. • From these facts, we can derive the distribution of the OLS estimate of the slope.
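The four assumptions can be collected into a single model statement. In the sketch below, the $Z_i$ are taken to be independent standard normal random variables, so that $\sigma Z_i$ plays the role of the error term:

```latex
Y_i = \beta_0 + \beta_1 x_i + \sigma Z_i,
\qquad Z_1, \dots, Z_n \ \text{independent } N(0, 1),
\qquad\text{so}\qquad
Y_i \sim N\!\left(\beta_0 + \beta_1 x_i,\ \sigma^2\right).
```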

OLS estimate of the slope • The estimate given in the last class is the most practical and interpretable estimate. • There is another formula that gives exactly the same result but is easier to work with:
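A common candidate for this alternative formula, writing $b$ for the OLS slope estimate and $\bar{x}$ for the mean of the $x_i$ (the notation is an assumption):

```latex
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})\, Y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.
```

It matches $r \, s_Y / s_X$ because $\sum_{i} (x_i - \bar{x})\,\bar{Y} = 0$, so replacing $Y_i$ by $Y_i - \bar{Y}$ in the numerator changes nothing.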

Using the new formula • The estimate is a linear combination of the $Y_i$, which are normally distributed. • Therefore, the distribution of the estimate is normal. • If only we knew its expected value and variance!

Using the new formula • The estimate can be rewritten • where
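A sketch of the usual rewriting, with the weights called $w_i$ here (a label chosen for illustration) and $b$ denoting the OLS slope estimate:

```latex
b = \sum_{i=1}^{n} w_i Y_i,
\qquad\text{where}\qquad
w_i = \frac{x_i - \bar{x}}{\sum_{j=1}^{n} (x_j - \bar{x})^2}.
```

The weights depend only on the $x$ values and satisfy $\sum_i w_i = 0$, $\sum_i w_i x_i = 1$, and $\sum_i w_i^2 = 1 / \sum_j (x_j - \bar{x})^2$.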

Using the new formula • When we write in what the model is, we get
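Assuming the model is written as $Y_i = \beta_0 + \beta_1 x_i + \sigma Z_i$ with independent standard normal $Z_i$, and the estimate as $b = \sum_i w_i Y_i$ with $w_i = (x_i - \bar{x}) / \sum_j (x_j - \bar{x})^2$, the substitution works out as follows:

```latex
b = \sum_{i=1}^{n} w_i \left(\beta_0 + \beta_1 x_i + \sigma Z_i\right)
  = \beta_0 \sum_{i=1}^{n} w_i + \beta_1 \sum_{i=1}^{n} w_i x_i + \sigma \sum_{i=1}^{n} w_i Z_i
  = \beta_1 + \sigma \sum_{i=1}^{n} w_i Z_i.
```

The last step uses $\sum_i w_i = 0$ and $\sum_i w_i x_i = 1$.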

Expected value of estimated slope • Expectation is a linear operator. • We apply the standard calculations to the previous formula to find:
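With the estimate in the form $b = \beta_1 + \sigma \sum_i w_i Z_i$, where the $w_i$ are fixed weights depending only on the $x$ values and each $Z_i$ has mean zero (assumed notation), linearity gives:

```latex
E(b) = \beta_1 + \sigma \sum_{i=1}^{n} w_i\, E(Z_i) = \beta_1.
```

That is, the OLS estimate of the slope is unbiased when the model is correct.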

Variance of the OLS estimate of the slope • The formula for the variance of the sum of two random variables generalizes. The general result is:
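For constants $a_1, \dots, a_n$ and random variables $W_1, \dots, W_n$ (generic symbols used only for this statement), the variance of a linear combination is:

```latex
\operatorname{Var}\!\left(\sum_{i=1}^{n} a_i W_i\right)
  = \sum_{i=1}^{n} a_i^2 \operatorname{Var}(W_i)
  + 2 \sum_{i < j} a_i a_j \operatorname{Cov}(W_i, W_j).
```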

Variance of the OLS estimate of the slope • We apply this formula to the last term in the formula for the OLS estimate of the slope:
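Assuming the estimate has been written as $b = \beta_1 + \sigma \sum_i w_i Z_i$ with fixed weights $w_i$, the constant $\beta_1$ contributes nothing to the variance, and the linear-combination rule applied to the last term gives:

```latex
\operatorname{Var}(b)
  = \operatorname{Var}\!\left(\sigma \sum_{i=1}^{n} w_i Z_i\right)
  = \sigma^2 \left[ \sum_{i=1}^{n} w_i^2 \operatorname{Var}(Z_i)
  + 2 \sum_{i < j} w_i w_j \operatorname{Cov}(Z_i, Z_j) \right].
```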

Variance of the OLS estimate of the slope • Remember that the $Z_i$ are independent standard normal random variables. • That is, each variance is one. • Each covariance is zero.

Variance of the OLS estimate of the slope • Then, the variance of the OLS estimate of the slope is given by:
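With unit variances and zero covariances for the $Z_i$, and weights $w_i = (x_i - \bar{x}) / \sum_j (x_j - \bar{x})^2$ (assumed notation), only the sum of squared weights remains:

```latex
\operatorname{Var}(b) = \sigma^2 \sum_{i=1}^{n} w_i^2
  = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.
```

The variance shrinks as the $x$ values are spread further from their mean.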

Summary of Results • When the model is correct, the distribution of the OLS estimate of the slope is normal with expected value $\beta_1$ and variance $\sigma^2 / \sum_{i=1}^{n} (x_i - \bar{x})^2$.
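A small simulation can make this result concrete. The sketch below assumes the model $Y_i = \beta_0 + \beta_1 x_i + \sigma Z_i$ with standard normal $Z_i$; the parameter values, the $x$ values, and the seed are all hypothetical. It generates many data sets, computes the OLS slope for each, and compares the simulated mean and variance with $\beta_1$ and $\sigma^2 / \sum_i (x_i - \bar{x})^2$.

```python
import random
import statistics


def ols_slope(xs, ys):
    """OLS slope as sum((x - x_bar) * y) / sum((x - x_bar)^2)."""
    x_bar = sum(xs) / len(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sum((x - x_bar) * y for x, y in zip(xs, ys)) / sxx


rng = random.Random(315)  # fixed seed so the run is reproducible

# Hypothetical true parameters and fixed design points.
beta0, beta1, sigma = 20.0, 0.8, 5.0
xs = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
x_bar = sum(xs) / len(xs)
sxx = sum((x - x_bar) ** 2 for x in xs)

slopes = []
for _ in range(20000):
    # Draw one data set from the model and record its OLS slope.
    ys = [beta0 + beta1 * x + sigma * rng.gauss(0.0, 1.0) for x in xs]
    slopes.append(ols_slope(xs, ys))

sim_mean = statistics.fmean(slopes)
sim_var = statistics.variance(slopes)
theory_var = sigma ** 2 / sxx

print(f"simulated mean {sim_mean:.4f} vs beta1 {beta1}")
print(f"simulated variance {sim_var:.5f} vs theory {theory_var:.5f}")
```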

Tests of Hypotheses and Confidence Intervals • ASSUME $\sigma^2$ is known. • Then you can test a null hypothesis and find a confidence interval for $\beta_1$ using the same procedures as before. • These results are most useful for designing studies. • We will focus on this next class.
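With $\sigma$ known, the slope estimate can be standardized and referred to the standard normal distribution. The following is a minimal sketch, not a prescribed procedure: the function name `slope_z_test`, the data, and the value of $\sigma$ are all hypothetical.

```python
import math
from statistics import NormalDist


def slope_z_test(xs, ys, sigma, beta1_null=0.0, level=0.95):
    """Two-sided z-test and confidence interval for the slope, sigma known."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * y for x, y in zip(xs, ys)) / sxx
    se = sigma / math.sqrt(sxx)  # standard error of the slope estimate
    z = (b - beta1_null) / se
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return b, z, p_two_sided, (b - z_crit * se, b + z_crit * se)


# Hypothetical scores and a hypothetical known error standard deviation.
xs = [55, 62, 70, 74, 81, 88, 93]
ys = [58, 60, 75, 70, 84, 85, 95]
b, z, p, (lower, upper) = slope_z_test(xs, ys, sigma=5.0)

print(f"slope={b:.3f}, z={z:.2f}, p={p:.4g}, 95% CI=({lower:.3f}, {upper:.3f})")
```

Because the standard error $\sigma / \sqrt{\sum_i (x_i - \bar{x})^2}$ depends only on $\sigma$ and the $x$ values, it can be computed before any responses are collected, which is what makes the known-variance case useful for design.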

Tests of Hypotheses and Confidence Intervals • ASSUME $\sigma^2$ is unknown. • Estimate $\sigma^2$ by MSE. • Use a t-test. • Also have an F-test:
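For the null hypothesis $\beta_1 = 0$, the usual statistics can be written as follows, with $a$ and $b$ the OLS intercept and slope and MSR the regression mean square (the notation is an assumption):

```latex
\mathrm{MSE} = \frac{1}{n-2} \sum_{i=1}^{n} \left(Y_i - a - b x_i\right)^2,
\qquad
t = \frac{b}{\sqrt{\mathrm{MSE} \big/ \sum_{i=1}^{n} (x_i - \bar{x})^2}},
\qquad
F = \frac{\mathrm{MSR}}{\mathrm{MSE}},
\quad \mathrm{MSR} = b^2 \sum_{i=1}^{n} (x_i - \bar{x})^2.
```

In univariate regression these two statistics are linked by $F = t^2$.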

Tests of Hypotheses and Confidence Intervals • For t-test, degrees of freedom come from MSE. – Degrees of freedom is $n-2$. – Alternatives can be right, left, or two-sided. • For F-test, one numerator and $n-2$ denominator degrees of freedom. – Test is always right-sided. – With respect to coefficients, F-test is a two-sided test about the coefficients.
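The relationship between the two tests can be seen numerically. The sketch below uses made-up data and a hypothetical function name, `slope_tests`, and tests the null hypothesis of zero slope. It computes MSE on n - 2 degrees of freedom, the t statistic, and the F statistic, and shows that F equals the square of t, which is why a right-sided F-test corresponds to a two-sided t-test.

```python
import math


def slope_tests(xs, ys):
    """t and F statistics for H0: slope = 0, with sigma^2 estimated by MSE."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar

    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    df_error = n - 2
    mse = sse / df_error            # estimate of sigma^2

    t = b / math.sqrt(mse / sxx)    # slope divided by its standard error
    msr = b ** 2 * sxx              # regression sum of squares on 1 df
    f = msr / mse
    return {"slope": b, "mse": mse, "t": t, "f": f, "df": (1, df_error)}


# Hypothetical scores: first exam (x) and final exam (y).
xs = [55, 62, 70, 74, 81, 88, 93]
ys = [58, 60, 75, 70, 84, 85, 95]
result = slope_tests(xs, ys)

print(result)
```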

Additional Tests and Confidence Intervals • Can get confidence interval for $\beta_0$. • Can get confidence interval for the value of the regression function at a specific argument.
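Both intervals take the form estimate plus or minus a t critical value (on $n-2$ degrees of freedom) times a standard error. The usual standard errors, with $a$ and $b$ the OLS intercept and slope and $x_0$ the argument of interest (assumed notation), are:

```latex
\mathrm{SE}(a) = \sqrt{\mathrm{MSE}\left(\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right)},
\qquad
\mathrm{SE}(a + b x_0) = \sqrt{\mathrm{MSE}\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right)}.
```

The first is the special case $x_0 = 0$ of the second.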

Prediction Intervals • Covered in a later lecture.

Next Class • Design issues in two independent sample studies. • Design issues in regression analysis.