Chapter 12 Simple Linear Regression and Correlation
Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc.

12.1 The Simple Linear Regression Model

Linear Relationship
The simplest deterministic mathematical relationship between two variables x and y is a linear relationship, $y = \beta_0 + \beta_1 x$. The set of pairs (x, y) for which $y = \beta_0 + \beta_1 x$ determines a straight line with slope $\beta_1$ and y-intercept $\beta_0$.

Terminology
The variable whose value is fixed by the experimenter, denoted x, is the independent (predictor, explanatory) variable. For a fixed x, the second variable will be a random variable Y with observed value y, referred to as the dependent (response) variable.

The Simple Linear Regression Model
There exist parameters $\beta_0$, $\beta_1$, and $\sigma^2$ such that for any fixed value of x, the dependent variable is related to x through the model equation
$Y = \beta_0 + \beta_1 x + \varepsilon$,
where $\varepsilon$ is a random variable (called the random deviation) with $E(\varepsilon) = 0$ and $V(\varepsilon) = \sigma^2$.
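A short simulation can make the model equation concrete: each observed y is the value on the true regression line plus an independent random deviation. The sketch below is a minimal Python illustration that assumes normally distributed deviations. The function name `simulate_slr` and the parameter values are hypothetical and were chosen only for illustration.

```python
import random


def simulate_slr(xs, beta0, beta1, sigma, seed=None):
    """Return one simulated y for each fixed x under Y = beta0 + beta1*x + eps,
    where the eps values are independent Normal(mean 0, standard deviation sigma)."""
    rng = random.Random(seed)  # private generator so results are reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical parameter values, for illustration only.
beta0, beta1, sigma = -1.0, 2.5, 1.0
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = simulate_slr(xs, beta0, beta1, sigma, seed=12)
for x, y in zip(xs, ys):
    mean_y = beta0 + beta1 * x  # point on the true regression line at this x
    print(f"x = {x:.1f}  E(Y) = {mean_y:.2f}  y = {y:.2f}  deviation = {y - mean_y:+.2f}")
```

With sigma = 0, every deviation is zero, so the simulated points fall exactly on the true regression line.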

Linear Regression Model
[Figure: the true regression line, with an observed point $(x_1, y_1)$ at $x = x_1$.]

Distribution of $\varepsilon$
[Figure: normal density curve of $\varepsilon$; Normal, mean = 0, standard deviation $\sigma$.]

Distribution of Y for Different Values of x
[Figure: distributions of Y at $x_1$, $x_2$, and $x_3$.]

12.2 Estimating Model Parameters

Principle of Least Squares
The vertical deviation of the point $(x_i, y_i)$ from the line $y = b_0 + b_1 x$ is $y_i - (b_0 + b_1 x_i)$. The sum of squared vertical deviations from the points to the line is
$f(b_0, b_1) = \sum_{i=1}^{n} [y_i - (b_0 + b_1 x_i)]^2$.
The least-squares estimates $\hat\beta_0$ and $\hat\beta_1$ are the values of $b_0$ and $b_1$ that minimize this sum.

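Principle of Least Squares
The least-squares (regression) line for the data is given by $y = \hat\beta_0 + \hat\beta_1 x$, where
$\hat\beta_1 = \frac{S_{xy}}{S_{xx}}$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$,
with $S_{xy} = \sum (x_i - \bar x)(y_i - \bar y) = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}$ and $S_{xx} = \sum (x_i - \bar x)^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$.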

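Ex. Find the equation of the least-squares line for the data below.
x | y | xy | $x^2$
1 | 2 | 2 | 1
2 | 3 | 6 | 4
3 | 7 | 21 | 9
Sum: 6 | 12 | 29 | 14
$\hat\beta_1 = \frac{29 - (6)(12)/3}{14 - (6)^2/3} = \frac{5}{2} = 2.5$ and $\hat\beta_0 = \frac{12}{3} - 2.5 \cdot \frac{6}{3} = -1$, so the least-squares line is $y = -1 + 2.5x$.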
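The hand computation above can be checked in code. This is a minimal Python sketch that assumes the data are given as two equal-length lists. `least_squares` and `sum_sq_dev` are illustrative helper names and do not come from any library. `sum_sq_dev` is the function $f(b_0, b_1)$ from the principle of least squares, so the sketch also confirms that moving the estimates away from $(-1, 2.5)$ increases the sum of squared deviations.

```python
def least_squares(xs, ys):
    """Return (b0, b1) via b1 = Sxy / Sxx and b0 = ybar - b1 * xbar,
    using the computational (sum-based) forms of Sxy and Sxx."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    s_xy = sum_xy - sum_x * sum_y / n
    s_xx = sum_x2 - sum_x ** 2 / n
    b1 = s_xy / s_xx
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


def sum_sq_dev(b0, b1, xs, ys):
    """f(b0, b1): sum of squared vertical deviations from the line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


xs, ys = [1, 2, 3], [2, 3, 7]  # data from the example
b0, b1 = least_squares(xs, ys)
print(b0, b1)  # -1.0 2.5
best = sum_sq_dev(b0, b1, xs, ys)
# Nudging either coefficient away from the estimates should increase f.
for d0, d1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    print(d0, d1, sum_sq_dev(b0 + d0, b1 + d1, xs, ys) > best)  # True each time
```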

Fitted Values and Residuals
The fitted (predicted) values $\hat y_1, \dots, \hat y_n$ are obtained by substituting $x_1, \dots, x_n$ into the equation of the estimated regression line: $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$. The residuals $y_i - \hat y_i$ are the vertical deviations from the estimated line.

Error Sum of Squares
The error sum of squares, denoted SSE, is
$SSE = \sum (y_i - \hat y_i)^2 = \sum [y_i - (\hat\beta_0 + \hat\beta_1 x_i)]^2$,
and the estimate of $\sigma^2$ is
$\hat\sigma^2 = s^2 = \frac{SSE}{n-2}$.

Computational Formula
A computational formula for SSE is
$SSE = \sum y_i^2 - \hat\beta_0 \sum y_i - \hat\beta_1 \sum x_i y_i$.
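The fitted values, residuals, and both forms of SSE can be computed directly for the example data (1, 2), (2, 3), (3, 7). This is a minimal sketch. `fit_and_residuals` is a hypothetical helper name, and the line is re-estimated from the deviation forms of $S_{xy}$ and $S_{xx}$.

```python
def fit_and_residuals(xs, ys):
    """Fit the least-squares line and return (b0, b1, fitted values, residuals)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - xbar) ** 2 for x in xs)
    s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = ybar - b1 * xbar
    fitted = [b0 + b1 * x for x in xs]                     # y-hat_i
    residuals = [y - yh for y, yh in zip(ys, fitted)]     # y_i - y-hat_i
    return b0, b1, fitted, residuals


xs, ys = [1, 2, 3], [2, 3, 7]
b0, b1, fitted, residuals = fit_and_residuals(xs, ys)
n = len(xs)
sse_def = sum(e ** 2 for e in residuals)                  # definition of SSE
sse_comp = (sum(y * y for y in ys) - b0 * sum(ys)
            - b1 * sum(x * y for x, y in zip(xs, ys)))    # computational formula
s2 = sse_def / (n - 2)                                    # estimate of sigma^2
print(fitted)                  # [1.5, 4.0, 6.5]
print(residuals)               # [0.5, -1.0, 0.5]
print(sse_def, sse_comp, s2)   # 1.5 1.5 1.5
```

The computational formula is convenient by hand. However, it subtracts large, nearly equal quantities, so in floating-point arithmetic with large values the definition $\sum (y_i - \hat y_i)^2$ is usually the safer choice.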

Total Sum of Squares
The total sum of squares, denoted SST, is
$SST = S_{yy} = \sum (y_i - \bar y)^2 = \sum y_i^2 - \frac{(\sum y_i)^2}{n}$.

Coefficient of Determination
The coefficient of determination, denoted by $r^2$, is given by
$r^2 = 1 - \frac{SSE}{SST}$.
It is interpreted as the proportion of observed y variation that can be explained by the simple linear regression model.

Regression Sum of Squares
$SSR = SST - SSE$
The regression sum of squares is interpreted as the amount of total variation that is explained by the model. We have
$r^2 = 1 - \frac{SSE}{SST} = \frac{SST - SSE}{SST} = \frac{SSR}{SST}$.
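For the example data, SST, SSE, SSR, and $r^2$ can be computed as shown below. This is a minimal sketch that hard-codes the fitted line $y = -1 + 2.5x$ from the least-squares example rather than re-estimating it. It also checks that the two expressions for $r^2$ agree.

```python
xs, ys = [1, 2, 3], [2, 3, 7]
n = len(ys)
ybar = sum(ys) / n
sst = sum((y - ybar) ** 2 for y in ys)               # total variation in y
fitted = [-1.0 + 2.5 * x for x in xs]                # least-squares line from the example
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained variation
ssr = sst - sse                                      # variation explained by the model
r2 = 1 - sse / sst
print(sst, sse, ssr, round(r2, 4))   # 14.0 1.5 12.5 0.8929
print(abs(r2 - ssr / sst) < 1e-12)   # True: 1 - SSE/SST equals SSR/SST
```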

12.3 Inferences About the Slope Parameter $\beta_1$

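1. The mean of $\hat\beta_1$ is $E(\hat\beta_1) = \mu_{\hat\beta_1} = \beta_1$.
2. The variance and standard deviation of $\hat\beta_1$ are $V(\hat\beta_1) = \sigma^2_{\hat\beta_1} = \frac{\sigma^2}{S_{xx}}$ and $\sigma_{\hat\beta_1} = \frac{\sigma}{\sqrt{S_{xx}}}$, where $S_{xx} = \sum (x_i - \bar x)^2$. Replacing $\sigma$ by $s$ gives the estimated standard deviation $s_{\hat\beta_1} = \frac{s}{\sqrt{S_{xx}}}$.
3. $\hat\beta_1$ has a normal distribution.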

T Variable
The assumptions of the simple linear regression model imply that the standardized variable
$T = \frac{\hat\beta_1 - \beta_1}{S/\sqrt{S_{xx}}} = \frac{\hat\beta_1 - \beta_1}{S_{\hat\beta_1}}$
has a t distribution with $n-2$ df.

Confidence Interval
A $100(1-\alpha)\%$ CI for the slope $\beta_1$ of the true regression line is
$\hat\beta_1 \pm t_{\alpha/2, n-2} \cdot s_{\hat\beta_1}$.

Hypothesis-Testing Procedures
Null hypothesis: $H_0: \beta_1 = \beta_{10}$
Test statistic value: $t = \frac{\hat\beta_1 - \beta_{10}}{s_{\hat\beta_1}}$

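Hypothesis-Testing Procedures
Alternative Hypothesis | Rejection Region for Approx. Level $\alpha$ Test
$H_a: \beta_1 > \beta_{10}$ | $t \ge t_{\alpha, n-2}$
$H_a: \beta_1 < \beta_{10}$ | $t \le -t_{\alpha, n-2}$
$H_a: \beta_1 \ne \beta_{10}$ | either $t \ge t_{\alpha/2, n-2}$ or $t \le -t_{\alpha/2, n-2}$
A P-value based on $n-2$ df can be calculated as in Chapters 8 and 9.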
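The CI and the t statistic for the slope can be sketched in Python. This minimal illustration assumes the caller supplies the critical value $t_{\alpha/2, n-2}$ from a t table, because the standard library has no t quantile function. `slope_inference` is a hypothetical helper name. The value $t_{.025,1} = 12.706$ is the tabled critical value for a 95% interval with 1 df.

```python
import math


def slope_inference(xs, ys, t_crit, beta10=0.0):
    """Return (b1, s_b1, CI for beta1, t statistic for H0: beta1 = beta10).
    t_crit is t_{alpha/2, n-2}, looked up by the caller."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - xbar) ** 2 for x in xs)
    s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = ybar - b1 * xbar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))           # estimate of sigma
    se_b1 = s / math.sqrt(s_xx)            # s_{beta1-hat}
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    t_stat = (b1 - beta10) / se_b1
    return b1, se_b1, ci, t_stat


# Example data; t_{.025,1} = 12.706 for a 95% interval with n - 2 = 1 df.
b1, se_b1, ci, t_stat = slope_inference([1, 2, 3], [2, 3, 7], t_crit=12.706)
print(f"b1 = {b1}, s_b1 = {se_b1:.4f}, CI = ({ci[0]:.3f}, {ci[1]:.3f}), t = {t_stat:.4f}")
```

With only three points there is 1 df, so the 95% interval is very wide and includes 0. Correspondingly, $|t| \approx 2.89$ falls short of 12.706, and $H_0: \beta_1 = 0$ is not rejected at level .05 against the two-sided alternative.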

Hypothesis-Testing
The model utility test is the test of $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \ne 0$, in which case the test statistic value is the t ratio
$t = \frac{\hat\beta_1}{s_{\hat\beta_1}}$.

ANOVA Table
Source of Variation | df | Sum of Squares | Mean Square | f
Regression | 1 | SSR | SSR | $f = \frac{SSR}{SSE/(n-2)}$
Error | $n-2$ | SSE | $s^2 = \frac{SSE}{n-2}$ |
Total | $n-1$ | SST | |
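The ANOVA quantities for the example data can be checked numerically, along with the fact that the f ratio of the model utility test equals the square of its t ratio. This minimal sketch uses the same three data points. The layout of the printed table is only illustrative.

```python
import math

xs, ys = [1, 2, 3], [2, 3, 7]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
s_xx = sum((x - xbar) ** 2 for x in xs)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / s_xx
b0 = ybar - b1 * xbar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
sst = sum((y - ybar) ** 2 for y in ys)
ssr = sst - sse
mse = sse / (n - 2)               # error mean square, s^2
f = (ssr / 1) / mse               # regression mean square / error mean square
t = b1 / math.sqrt(mse / s_xx)    # model utility t ratio, b1 / s_b1

print(f"{'Source':<11}{'df':>4}{'SS':>8}{'MS':>8}{'f':>8}")
print(f"{'Regression':<11}{1:>4}{ssr:>8.3f}{ssr:>8.3f}{f:>8.3f}")
print(f"{'Error':<11}{n - 2:>4}{sse:>8.3f}{mse:>8.3f}")
print(f"{'Total':<11}{n - 1:>4}{sst:>8.3f}")
print(f"t^2 = {t ** 2:.6f}, f = {f:.6f}")  # the two values agree
```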

12.4 Inferences Concerning $\mu_{Y \cdot x^*}$ and the Prediction of Future Y Values

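Let $\hat Y = \hat\beta_0 + \hat\beta_1 x^*$, where $x^*$ is some fixed value of x.
1. The mean of $\hat Y$ is $E(\hat Y) = \beta_0 + \beta_1 x^*$, so $\hat Y$ is an unbiased estimator of $\mu_{Y \cdot x^*}$.
2. Variance and standard deviation:
$V(\hat Y) = \sigma^2_{\hat Y} = \sigma^2 \left[\frac{1}{n} + \frac{(x^* - \bar x)^2}{S_{xx}}\right]$ and $\sigma_{\hat Y} = \sigma \sqrt{\frac{1}{n} + \frac{(x^* - \bar x)^2}{S_{xx}}}$.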

2. (continued) Replacing $\sigma$ by $s$ gives the estimated standard deviation
$s_{\hat Y} = s \sqrt{\frac{1}{n} + \frac{(x^* - \bar x)^2}{S_{xx}}}$.
3. $\hat Y$ has a normal distribution.

T Variable
The variable
$T = \frac{\hat\beta_0 + \hat\beta_1 x^* - (\beta_0 + \beta_1 x^*)}{S_{\hat Y}}$
has a t distribution with $n-2$ df.

Confidence Interval
A $100(1-\alpha)\%$ CI for $\mu_{Y \cdot x^*}$, the expected value of Y when $x = x^*$, is
$\hat\beta_0 + \hat\beta_1 x^* \pm t_{\alpha/2, n-2} \cdot s_{\hat Y}$.

Prediction Interval
A future value of Y is not a parameter but instead a random variable; its interval of plausible values is referred to as a prediction interval (PI).

Prediction Interval
A $100(1-\alpha)\%$ PI for a future Y observation to be made when $x = x^*$ is
$\hat\beta_0 + \hat\beta_1 x^* \pm t_{\alpha/2, n-2} \cdot s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar x)^2}{S_{xx}}}$.
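The CI for the mean response and the PI for a future observation have the same center and differ only by the extra 1 under the square root. The sketch below computes both at a chosen $x^*$ and assumes the caller supplies $t_{\alpha/2,n-2}$ from a t table. `mean_ci_and_pi` is a hypothetical helper name, and $x^* = 2.5$ is an arbitrary illustrative value.

```python
import math


def mean_ci_and_pi(xs, ys, x_star, t_crit):
    """Return (y_hat, CI for mu_{Y.x*}, PI for a future Y) at x = x_star.
    t_crit is t_{alpha/2, n-2}, looked up by the caller."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / s_xx
    b0 = ybar - b1 * xbar
    s = math.sqrt(sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    y_hat = b0 + b1 * x_star
    factor = 1 / n + (x_star - xbar) ** 2 / s_xx
    half_ci = t_crit * s * math.sqrt(factor)      # t * s_{Y-hat}
    half_pi = t_crit * s * math.sqrt(1 + factor)  # also accounts for the new Y's own variance
    return y_hat, (y_hat - half_ci, y_hat + half_ci), (y_hat - half_pi, y_hat + half_pi)


y_hat, ci, pi = mean_ci_and_pi([1, 2, 3], [2, 3, 7], x_star=2.5, t_crit=12.706)
print(f"y_hat = {y_hat}, CI = ({ci[0]:.2f}, {ci[1]:.2f}), PI = ({pi[0]:.2f}, {pi[1]:.2f})")
```

Both intervals are narrowest at $x^* = \bar x$ and widen as $x^*$ moves away from it, because of the $(x^* - \bar x)^2$ term.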

12.5 Correlation

Sample Correlation Coefficient
The sample correlation coefficient, denoted r, of the n pairs $(x_1, y_1), \dots, (x_n, y_n)$ is
$r = \frac{S_{xy}}{\sqrt{S_{xx}}\sqrt{S_{yy}}} = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sqrt{\sum (x_i - \bar x)^2}\sqrt{\sum (y_i - \bar y)^2}}$.

Ex. Find the correlation coefficient for the points used in the least-squares example: (1, 2), (2, 3), (3, 7).
Here $S_{xy} = 29 - (6)(12)/3 = 5$, $S_{xx} = 14 - (6)^2/3 = 2$, and $S_{yy} = 62 - (12)^2/3 = 14$, so
$r = \frac{5}{\sqrt{2}\sqrt{14}} = 0.9449$.

Properties of r
Important properties of r:
1. The value of r does not depend on which of the two variables under study is labeled x and which is labeled y.
2. The value of r is independent of the units in which x and y are measured.
3. $-1 \le r \le 1$.

Properties of r
4. $r = 1$ if and only if all $(x_i, y_i)$ pairs lie on a straight line with positive slope, and $r = -1$ if and only if all $(x_i, y_i)$ pairs lie on a straight line with negative slope.
5. The square of the sample correlation coefficient gives the value of the coefficient of determination that would result from fitting the simple linear regression model.
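The example value of r and several of the listed properties can be checked numerically. This is a minimal sketch. `sample_corr` is a hypothetical helper, and the rescaled and exactly linear data sets are made up for the checks. On Python 3.10 or later, the standard library's `statistics.correlation` computes the same Pearson coefficient.

```python
import math


def sample_corr(xs, ys):
    """Sample correlation r = Sxy / (sqrt(Sxx) * sqrt(Syy))."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    s_xx = sum((x - xbar) ** 2 for x in xs)
    s_yy = sum((y - ybar) ** 2 for y in ys)
    return s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))


xs, ys = [1, 2, 3], [2, 3, 7]
r = sample_corr(xs, ys)
print(round(r, 4))                                          # 0.9449
# Property 1: swapping the roles of x and y leaves r unchanged.
print(math.isclose(sample_corr(ys, xs), r))                 # True
# Property 2: a change of units (positive rescaling and/or shift) leaves r unchanged.
print(math.isclose(sample_corr([2.54 * x for x in xs], [y + 10 for y in ys]), r))  # True
# Property 4: points exactly on a line with negative slope give r = -1.
print(math.isclose(sample_corr(xs, [5 - 2 * x for x in xs]), -1.0))  # True
# Property 5: r^2 equals the coefficient of determination, 1 - SSE/SST = 25/28 here.
print(math.isclose(r ** 2, 25 / 28))                        # True
```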

Different Values of r
[Figure: scatterplot panels labeled "r near 1", "r near 0, no relationship", "r near -1", and "r near 0, nonlinear relationship".]

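The Population Correlation Coefficient
$\rho = \rho(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$,
where
$\operatorname{Cov}(X, Y) = \sum_x \sum_y (x - \mu_X)(y - \mu_Y)\, p(x, y)$ or $\operatorname{Cov}(X, Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y)\, f(x, y)\, dx\, dy$,
depending on whether (X, Y) is discrete or continuous.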

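Estimator
The estimator of $\rho$ is the sample correlation coefficient computed from the random pairs $(X_i, Y_i)$:
$\hat\rho = R = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sqrt{\sum (X_i - \bar X)^2}\sqrt{\sum (Y_i - \bar Y)^2}}$.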

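Assumption
The joint probability distribution of (X, Y) is specified by
$f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_1}{\sigma_1}\right)^2 - 2\rho\frac{(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2} + \left(\frac{y-\mu_2}{\sigma_2}\right)^2\right]\right\}$ for $-\infty < x < \infty$, $-\infty < y < \infty$,
which is called the bivariate normal probability distribution.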
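One way to see that the estimator R targets $\rho$ is to simulate pairs from a bivariate normal distribution and compare the sample correlation with $\rho$. The sketch below uses the standard construction $X = \mu_1 + \sigma_1 Z_1$, $Y = \mu_2 + \sigma_2(\rho Z_1 + \sqrt{1-\rho^2}\, Z_2)$, where $Z_1$ and $Z_2$ are independent standard normal variables. The helper names, sample size, and parameter values are hypothetical.

```python
import math
import random


def sample_bivariate_normal(n, rho, mu1=0.0, mu2=0.0, sigma1=1.0, sigma2=1.0, seed=None):
    """Draw n (x, y) pairs from a bivariate normal with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)  # independent N(0, 1)
        x = mu1 + sigma1 * z1
        y = mu2 + sigma2 * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        pairs.append((x, y))
    return pairs


def sample_corr(pairs):
    """The estimator R evaluated on observed pairs."""
    n = len(pairs)
    xbar = sum(x for x, _ in pairs) / n
    ybar = sum(y for _, y in pairs) / n
    s_xy = sum((x - xbar) * (y - ybar) for x, y in pairs)
    s_xx = sum((x - xbar) ** 2 for x, _ in pairs)
    s_yy = sum((y - ybar) ** 2 for _, y in pairs)
    return s_xy / math.sqrt(s_xx * s_yy)


pairs = sample_bivariate_normal(20000, rho=0.5, mu1=10, sigma1=2, seed=1)
print(round(sample_corr(pairs), 3))  # should land close to rho = 0.5
```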

Testing for the Absence of Correlation
When $H_0: \rho = 0$ is true, the test statistic
$T = \frac{R\sqrt{n-2}}{\sqrt{1-R^2}}$
has a t distribution with $n-2$ df.

Hypothesis-Testing
Alternative Hypothesis | Rejection Region for Approx. Level $\alpha$ Test
$H_a: \rho > 0$ | $t \ge t_{\alpha, n-2}$
$H_a: \rho < 0$ | $t \le -t_{\alpha, n-2}$
$H_a: \rho \ne 0$ | either $t \ge t_{\alpha/2, n-2}$ or $t \le -t_{\alpha/2, n-2}$
A P-value based on $n-2$ df can be calculated as described previously.
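The t statistic for $H_0: \rho = 0$ can be computed from r alone. This minimal sketch applies it to the worked example's $r = 5/\sqrt{28} \approx 0.9449$ with n = 3. `corr_t_stat` is a hypothetical helper name. The critical value is taken from a t table because the standard library has no t quantile function.

```python
import math


def corr_t_stat(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2); compare with the t distribution, n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 5 / math.sqrt(2 * 14)  # from the example: Sxy = 5, Sxx = 2, Syy = 14
t = corr_t_stat(r, n=3)
t_crit = 12.706            # t_{.025,1} from a t table (two-sided test, alpha = .05)
print(f"t = {t:.4f}; reject H0 at level .05: {abs(t) >= t_crit}")  # t = 2.8868; ... False
```

The value, about 2.887, equals the model utility t ratio $\hat\beta_1 / s_{\hat\beta_1}$ for the same data. Testing $\rho = 0$ here is therefore equivalent to testing $\beta_1 = 0$.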

Other Inferences Concerning $\rho$
When $(X_1, Y_1), \dots, (X_n, Y_n)$ is a sample from a bivariate normal distribution, the rv
$V = \frac{1}{2}\ln\left(\frac{1+R}{1-R}\right)$
has approximately a normal distribution with mean and variance
$\mu_V = \frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right)$, $\sigma_V^2 = \frac{1}{n-3}$.

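The test statistic for testing $H_0: \rho = \rho_0$ is
$z = \frac{v - \frac{1}{2}\ln\left(\frac{1+\rho_0}{1-\rho_0}\right)}{1/\sqrt{n-3}}$.
Alternative Hypothesis | Rejection Region for Level $\alpha$ Test
$H_a: \rho > \rho_0$ | $z \ge z_\alpha$
$H_a: \rho < \rho_0$ | $z \le -z_\alpha$
$H_a: \rho \ne \rho_0$ | either $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$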

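CI for $\rho$
A $100(1-\alpha)\%$ CI for $\rho$ is
$\left(\frac{e^{2c_1}-1}{e^{2c_1}+1}, \frac{e^{2c_2}-1}{e^{2c_2}+1}\right)$,
where $c_1$ and $c_2$ are the left and right endpoints, respectively, of the CI $v \pm \frac{z_{\alpha/2}}{\sqrt{n-3}}$ for $\mu_V$.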
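The z test and the CI for $\rho$ can be sketched with the Python standard library. This sketch assumes Python 3.8 or later for `statistics.NormalDist`. `fisher_z_inference` is a hypothetical helper name, and r = 0.6 with n = 28 are made-up inputs. The three-point example cannot be used here because $n - 3 = 0$.

```python
import math
from statistics import NormalDist


def fisher_z_inference(r, n, rho0=0.0, alpha=0.05):
    """Approximate z test of H0: rho = rho0 and a 100(1 - alpha)% CI for rho,
    based on V = 0.5 * ln((1 + R) / (1 - R)) ~ approx. Normal(mu_V, 1 / (n - 3))."""
    if n <= 3:
        raise ValueError("need n > 3 so that the variance 1/(n - 3) is defined")
    v = 0.5 * math.log((1 + r) / (1 - r))
    mu0 = 0.5 * math.log((1 + rho0) / (1 - rho0))  # mean of V when rho = rho0
    z = (v - mu0) * math.sqrt(n - 3)               # (v - mu0) / (1 / sqrt(n - 3))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    c1 = v - z_crit / math.sqrt(n - 3)             # CI for mu_V
    c2 = v + z_crit / math.sqrt(n - 3)

    def back(c):
        # Map an endpoint back to the rho scale: (e^{2c} - 1) / (e^{2c} + 1).
        return (math.exp(2 * c) - 1) / (math.exp(2 * c) + 1)

    return z, (back(c1), back(c2))


# Hypothetical inputs: r = 0.6 observed from n = 28 pairs.
z, (lo, hi) = fisher_z_inference(r=0.6, n=28)
print(f"z = {z:.3f}, 95% CI for rho: ({lo:.3f}, {hi:.3f})")
```

The transform $V$ is the inverse hyperbolic tangent of $R$, and $(e^{2c}-1)/(e^{2c}+1) = \tanh(c)$, so `math.atanh` and `math.tanh` give the same results.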