Simple Linear Regression (Statistics 515 Lecture)

  • Slides: 17

Example for Illustration
The human body takes in more oxygen when exercising than when it is at rest. To deliver oxygen to the muscles, the heart must beat faster. Heart rate is easy to measure, but measuring oxygen uptake requires elaborate equipment. If oxygen uptake (VO2) can be accurately predicted from heart rate (HR), the predicted values may replace measured values for various research purposes. Unfortunately, not all human bodies are the same, so no single prediction equation works for all people. Researchers can, however, measure both HR and VO2 for one person under varying exercise conditions and calculate a regression equation for predicting that person’s oxygen uptake from heart rate.
3/11/2021 Simple Linear Regression

Data From An Individual
Goals in this illustration:
  • Scatterplot: is the relationship linear or not?
  • Obtain the best-fitting line using least squares.
  • Test whether the model is significant or not.
  • Obtain a confidence interval for the regression coefficient.
  • Obtain predictions.

The Scatterplot
[Figure: scatterplot of oxygen uptake (VO2) against heart rate (HR) for the individual]

Simple Linear Regression Model
1. Conditional on $X = x$, the response variable $Y$ has mean equal to $\mu(x) = \alpha + \beta x$.
2. $\alpha$ is the y-intercept, while $\beta$ is the slope of the regression line; $\beta$ can be interpreted as the change in the mean value of $Y$ per unit change in the independent variable.
3. For each $X = x$, the conditional distribution of $Y$ is normal with mean $\mu(x)$ and variance $\sigma^2$.
4. $Y_1, Y_2, \ldots, Y_n$ are independent of each other.
Shorthand: $Y_i = \alpha + \beta x_i + \varepsilon_i$ with $\varepsilon_i$ IID $N(0, \sigma^2)$.
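The shorthand form of the model can be read as a recipe for generating data. The Python sketch below simulates responses under that recipe. The function name `simulate_slr` and every numeric value (the x-values, intercept, slope, and standard deviation) are made up for illustration; they are not the lecture's HR/VO2 data.

```python
import random


def simulate_slr(xs, alpha, beta, sigma, seed=0):
    """Draw Y_i = alpha + beta * x_i + e_i with e_i IID N(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical design points and parameter values (illustration only).
xs = [90, 100, 110, 120, 130]
ys = simulate_slr(xs, alpha=-1.0, beta=0.02, sigma=0.1)

for x, y in zip(xs, ys):
    print(x, round(y, 3))
```

Setting `sigma=0.0` removes the noise, so every simulated point falls exactly on the line alpha + beta * x, which is a quick way to check the function.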

Least-Squares (LS) Regression
One of the goals in regression analysis is to estimate the parameters $\alpha$, $\beta$, and $\sigma^2$ of the regression model. Denote by $\hat{y} = a + bx$ the estimate of the regression line, so that $a$ estimates $\alpha$ and $b$ estimates $\beta$. Then for the observed values of $X$, which are $x_1, x_2, \ldots, x_n$, we may obtain the predicted values of the response variable $Y$ for each of these $X$-values. These are: $\hat{y}_i = a + b x_i$, $i = 1, 2, \ldots, n$.

Predicted Values
A good estimate of the regression line should produce predicted values that are close to the actual observed values of the response variable. That is, the set of deviations $y_i - \hat{y}_i$ (observed minus predicted), $i = 1, 2, \ldots, n$, should ideally be close (if not equal) to zero. These deviations between observed and predicted values are also called residuals.

Principle of Least-Squares (LS)
In least-squares regression, the best-fitting regression line is the one that makes the sum of the squared deviations, or residuals, as small as possible. Thus, the regression coefficients $a$ and $b$ are chosen to minimize the quantity
$$Q(a, b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2.$$
Using calculus, the values of $a$ and $b$ that minimize this quantity are given by:

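Least-Squares Solution
$$b = \frac{S_{XY}}{S_{XX}}, \qquad a = \bar{y} - b\,\bar{x},$$
where $\bar{x}$ and $\bar{y}$ are the sample means, $S_{XX} = \sum_{i=1}^{n} (x_i - \bar{x})^2$, and $S_{XY} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$.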
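As a minimal sketch, the closed-form least-squares solution can be computed directly, assuming the standard formulas b = S_XY / S_XX and a = y-bar - b * x-bar. The function names `fit_line` and `sum_sq_residuals` and the five data points are hypothetical, chosen so the arithmetic is easy to follow by hand; they are not the lecture's data.

```python
def fit_line(xs, ys):
    """Return (a, b): the least-squares intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


def sum_sq_residuals(a, b, xs, ys):
    """Sum of squared residuals for the candidate line y = a + b * x."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


# Toy data (illustration only): x-bar = 3, y-bar = 4, S_XX = 10, S_XY = 6.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
print(a, b)  # approximately 2.2 and 0.6

# Nudging either coefficient away from the solution increases the criterion.
q_min = sum_sq_residuals(a, b, xs, ys)
print(q_min < sum_sq_residuals(a + 0.1, b, xs, ys))
print(q_min < sum_sq_residuals(a, b - 0.1, xs, ys))
```

The final two comparisons are a numerical sanity check only; they probe two nearby lines and do not replace the calculus argument.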

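Estimating the Variance
With predicted values $\hat{y}_i = a + b x_i$:
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad SYY = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad SSR = SYY - SSE,$$
$$\hat{\sigma}^2 = MSE = \frac{SSE}{n-2}.$$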

Interpretations of Quantities
  • SSE: measures the variation not explained by the predictor variable.
  • SSR: measures the amount of variation explained by the predictor variable.
  • SYY: total variation in the Y-values. This is partitioned into SSR and SSE, so SYY = SSR + SSE.
  • $R^2 = SSR/SYY$: coefficient of determination; indicates the proportion of variation in the Y-values explained by the predictor variable.
  • $MSE = SSE/(n-2)$: the mean-squared error. This provides an unbiased estimate of the common variance $\sigma^2$.
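The quantities above can be computed in a few lines once the line is fitted. The sketch below does so for a small made-up data set (not the lecture's data); the function name `decompose_variation` is hypothetical, and the fit assumes the standard least-squares formulas b = S_XY / S_XX and a = y-bar - b * x-bar.

```python
def decompose_variation(xs, ys):
    """Return SSE, SSR, SYY, R2 and MSE for the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    fitted = [a + b * x for x in xs]

    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained
    ssr = sum((f - y_bar) ** 2 for f in fitted)          # explained
    syy = sum((y - y_bar) ** 2 for y in ys)              # total
    return {
        "SSE": sse,
        "SSR": ssr,
        "SYY": syy,
        "R2": ssr / syy,
        "MSE": sse / (n - 2),
    }


# Toy data (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
q = decompose_variation(xs, ys)
print(q)  # SSE ~ 2.4, SSR ~ 3.6, SYY = 6, R2 ~ 0.6, MSE ~ 0.8
```

Here SSR is computed from the fitted values rather than as SYY - SSE, so comparing `q["SSR"] + q["SSE"]` with `q["SYY"]` checks the partition numerically.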

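Sampling Distributions of Estimators
$$b \sim N\!\left(\beta,\ \frac{\sigma^2}{S_{XX}}\right), \qquad a \sim N\!\left(\alpha,\ \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{S_{XX}}\right]\right),$$
where $S_{XX} = \sum_{i=1}^{n} (x_i - \bar{x})^2$. To estimate these variances, $\sigma^2$ is replaced by the MSE.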
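A small simulation can make the idea of a sampling distribution concrete. The sketch below assumes the standard result that, under the model, the slope estimator b is normal with mean beta and variance sigma^2 / S_XX; it repeatedly generates data from a model with made-up parameter values and looks at the spread of the resulting slopes. The function names and all numbers are hypothetical.

```python
import random
import statistics


def slope_estimate(xs, ys):
    """Least-squares slope b = S_XY / S_XX."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / s_xx


def simulate_slopes(xs, alpha, beta, sigma, reps, seed=0):
    """Refit the slope on `reps` data sets drawn from the model."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        ys = [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]
        slopes.append(slope_estimate(xs, ys))
    return slopes


# Made-up design and parameters: S_XX = 10, so sigma^2 / S_XX = 0.1.
xs = [1, 2, 3, 4, 5]
slopes = simulate_slopes(xs, alpha=2.0, beta=0.5, sigma=1.0, reps=10000)

print(statistics.mean(slopes))      # should be near beta = 0.5
print(statistics.variance(slopes))  # should be near sigma^2 / S_XX = 0.1
```

The agreement is only approximate because the simulation uses a finite number of replications.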

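Testing Hypothesis
  • To test the null hypothesis $H_0: \beta = \beta_0$ versus $H_1: \beta \neq \beta_0$, we use the t-statistic given by
$$T_c = \frac{b - \beta_0}{\sqrt{MSE / S_{XX}}},$$
which follows a t-distribution with $n-2$ degrees of freedom under the null hypothesis. Thus, we reject $H_0$ if $|T_c| > t_{n-2;\,\alpha/2}$, where the $\alpha$ in the subscript is the significance level, not the intercept. Similarly, for testing $H_0: \alpha = \alpha_0$, we use
$$T_c = \frac{a - \alpha_0}{\sqrt{MSE\left[\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{XX}}\right]}}.$$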
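The slope test can be sketched as follows, assuming the usual standard error sqrt(MSE / S_XX) for b. The data are made up for illustration, the function name `slope_t_test` is hypothetical, and the critical value 3.182 is the two-sided 5% point of the t-distribution with 3 degrees of freedom, read from a t-table rather than computed.

```python
import math


def slope_t_test(xs, ys, beta0=0.0):
    """Return (t statistic, degrees of freedom) for H0: beta = beta0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)
    se_b = math.sqrt(mse / s_xx)  # estimated standard error of the slope
    return (b - beta0) / se_b, n - 2


# Toy data (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

T_CRIT = 3.182  # t-table value for 3 df, two-sided 5% level

t_c, df = slope_t_test(xs, ys)
reject = abs(t_c) > T_CRIT
print(round(t_c, 4), df, reject)
```

With only five points the test has little power, so a visibly positive slope can still fail to reach significance at the 5% level.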

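Confidence Interval for Mean and Predicting the Value of Y of a new Unit
Estimate of Mean and Predicted Value at $x_0$: $\hat{y}(x_0) = a + b x_0$.
Variance: for the estimated mean, $\sigma^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}}\right]$; for the prediction error of a new unit, $\sigma^2\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}}\right]$. In both, $\sigma^2$ is estimated by the MSE.
CI for $\mu(x_0)$: $\hat{y}(x_0) \pm t_{n-2;\,\alpha/2} \sqrt{MSE\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}}\right]}$.
CI for $Y(x_0)$: $\hat{y}(x_0) \pm t_{n-2;\,\alpha/2} \sqrt{MSE\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}}\right]}$.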
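The two intervals differ only by the extra "1 +" inside the square root, which accounts for the variability of a single new observation. The sketch below computes both under the standard formulas, using made-up data and a critical value supplied by the caller (3.182 is the t-table value for 3 degrees of freedom at the two-sided 5% level). The function name `intervals_at` is hypothetical.

```python
import math


def intervals_at(xs, ys, x0, t_crit):
    """Return (fit, CI for the mean, interval for a new Y) at x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    mse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)

    fit = a + b * x0
    leverage = 1 / n + (x0 - x_bar) ** 2 / s_xx
    half_mean = t_crit * math.sqrt(mse * leverage)        # mean response
    half_pred = t_crit * math.sqrt(mse * (1 + leverage))  # new observation
    return (
        fit,
        (fit - half_mean, fit + half_mean),
        (fit - half_pred, fit + half_pred),
    )


# Toy data (illustration only); x0 = 3 equals the sample mean of x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
fit, ci_mean, pi_new = intervals_at(xs, ys, x0=3, t_crit=3.182)
print(fit, ci_mean, pi_new)
```

Both intervals are narrowest at x0 equal to the sample mean of x and widen as x0 moves away from it, and the interval for a new observation is always the wider of the two.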

Results of Regression Analysis (using Minitab)
[Figure: Minitab regression output with callouts for the prediction line, the p-value for regression, the p-value, the coefficient of determination, and (MSR)/(MSE)]

Fitted Line on the Scatterplot
[Figure: scatterplot with the fitted regression line]

Confidence Interval for Mean and Prediction Interval
  • For predicting the mean value
  • For predicting the value of the response