Decomposition of Sum of Squares
• The total sum of squares (SS) in the response variable is $SSTO = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$.
• The total SS can be decomposed into two main sources: error SS and regression SS.
• The error SS is $SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2$.
• The regression SS is $SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$. It is the amount of variation in the Y’s that is explained by the linear relationship of Y with X.
STA 302/1001 - week 4
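The three sums of squares can be computed directly from their definitions. The following is a minimal sketch on made-up toy data (the numbers and variable names are illustrative assumptions, not taken from the course), using the usual least squares estimates b1 = Sxy / Sxx and b0 = Ybar - b1 * Xbar.

```python
# Hypothetical toy data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

ssto = sum((yi - y_bar) ** 2 for yi in y)               # total SS
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error SS
ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # regression SS

print(f"SSTO = {ssto:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
# prints: SSTO = 6.0000, SSR = 3.6000, SSE = 2.4000
```

On this toy data the regression and error sums of squares add up to the total, as the decomposition says they should.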

Claims
• First, SSTO = SSR + SSE, that is, $\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$.
• Proof: ….
• An alternative decomposition is …
• Proof: Exercises.
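The proof of SSTO = SSR + SSE is elided on the slide. One standard argument, which may differ from the one given in lecture, adds and subtracts the fitted value and shows that the cross term vanishes. Here $e_i = Y_i - \hat{Y}_i$ denotes the residual and $\hat{Y}_i = b_0 + b_1 X_i$ the fitted value.

```latex
\begin{align*}
\sum_{i=1}^{n} (Y_i - \bar{Y})^2
  &= \sum_{i=1}^{n} \bigl[(Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})\bigr]^2 \\
  &= \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2
   + \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2
   + 2 \sum_{i=1}^{n} e_i (\hat{Y}_i - \bar{Y}) \\
  &= SSE + SSR + 2 \Bigl[ (b_0 - \bar{Y}) \sum_{i=1}^{n} e_i
   + b_1 \sum_{i=1}^{n} e_i X_i \Bigr] \\
  &= SSE + SSR ,
\end{align*}
% The last step uses the normal equations of least squares:
% \sum e_i = 0 and \sum e_i X_i = 0.
```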

Analysis of Variance Table
• The decomposition of SS discussed above is usually summarized in an analysis of variance (ANOVA) table as follows:

Source        df      SS      MS                   F
Regression    1       SSR     MSR = SSR/1          MSR/MSE
Error         n-2     SSE     MSE = SSE/(n-2)
Total         n-1     SSTO

• Note that the MSE is $s^2$, our estimate of $\sigma^2$.
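As a rough illustration, the ANOVA table for a simple linear regression can be assembled from the sums of squares and their degrees of freedom (1 for regression, n - 2 for error, n - 1 for total). The helper name `anova_table` and the toy data below are hypothetical, chosen only for this sketch.

```python
def anova_table(x, y):
    """Return ANOVA rows (source, df, SS, MS) for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ssto = sum((yi - y_bar) ** 2 for yi in y)
    ssr = ssto - sse
    return [
        ("Regression", 1, ssr, ssr / 1),
        ("Error", n - 2, sse, sse / (n - 2)),  # this MS is MSE = s^2
        ("Total", n - 1, ssto, None),
    ]

# Hypothetical toy data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

table = anova_table(x, y)
print(f"{'Source':<12}{'df':>4}{'SS':>10}{'MS':>10}")
for source, df, ss, ms in table:
    ms_text = "" if ms is None else f"{ms:.4f}"
    print(f"{source:<12}{df:>4}{ss:>10.4f}{ms_text:>10}")
```

The mean square in the Error row is the quantity the slide calls $s^2$.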

Coefficient of Determination
• The coefficient of determination is $R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}$.
• It must satisfy $0 \le R^2 \le 1$.
• $R^2$ gives the proportion (often quoted as a percentage) of variation in the Y’s that is explained by the regression line.

Claim
• $R^2 = r^2$, that is, the coefficient of determination is the square of the correlation coefficient.
• Proof: …
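The claim can be checked numerically. The sketch below, on made-up toy data, computes the sample correlation r from its definition and R² as 1 - SSE/SSTO, then compares the two. It is a numerical check for simple linear regression with an intercept, not a proof.

```python
import math

# Hypothetical toy data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)  # this is SSTO
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Sample correlation coefficient between X and Y.
r = sxy / math.sqrt(sxx * syy)

# Coefficient of determination from the sums of squares.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - sse / syy

print(f"r^2 = {r ** 2:.6f}, R^2 = {r_squared:.6f}")
```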

Important Comments about $R^2$
• It is a useful measure but…
• There is no absolute rule about how big it should be.
• It is not resistant to outliers.
• It is not meaningful for models with no intercept.
• It is not useful for comparing models unless they have the same Y and one set of predictors is a subset of the other.

ANOVA F Test
• The ANOVA table gives us another test of $H_0: \beta_1 = 0$.
• The test statistic is $F^* = \frac{MSR}{MSE}$, which under $H_0$ has an F distribution with 1 and n − 2 degrees of freedom.
• Derivations …
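In simple linear regression the ANOVA F statistic MSR/MSE equals the square of the t statistic for the slope, so the two tests of β1 = 0 agree. The sketch below checks this on made-up toy data. The standard library has no F distribution, so a p-value is not computed here; in practice the statistic would be compared with an F(1, n - 2) critical value from a table or statistical software.

```python
import math

# Hypothetical toy data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ssto = sum((yi - y_bar) ** 2 for yi in y)
ssr = ssto - sse

msr = ssr / 1        # regression mean square, 1 df
mse = sse / (n - 2)  # error mean square, n - 2 df
f_stat = msr / mse

# t statistic for H0: beta_1 = 0, using se(b1) = sqrt(MSE / Sxx).
t_stat = b1 / math.sqrt(mse / sxx)

print(f"F = {f_stat:.4f}, t^2 = {t_stat ** 2:.4f}")
```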

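Prediction of Mean Response
• Very often, we want to use the estimated regression line to make predictions about the mean of the response for a particular X value (assumed to be fixed).
• We know that the least squares line $\hat{Y} = b_0 + b_1 X$ is an estimate of $E(Y \mid X) = \beta_0 + \beta_1 X$.
• Now we can pick a point X = x* (within the range of X values covered by the regression line); then $\hat{Y}^* = b_0 + b_1 x^*$ is an estimate of $E(Y \mid X = x^*) = \beta_0 + \beta_1 x^*$.
• Claim: $Var(\hat{Y}^*) = \sigma^2 \left[ \frac{1}{n} + \frac{(x^* - \bar{X})^2}{S_{XX}} \right]$, where $S_{XX} = \sum_{i=1}^{n} (X_i - \bar{X})^2$.
• Proof: …
• This is the variance of the estimate of E(Y | X = x*).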
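The proof is not reproduced in this transcript. One common route, sketched here under the usual model assumptions (independent errors with constant variance $\sigma^2$, X values treated as fixed), rewrites the fitted value around $\bar{X}$ and uses the fact that $\bar{Y}$ and $b_1$ are uncorrelated. The symbol $S_{XX} = \sum_{i=1}^{n} (X_i - \bar{X})^2$ is notation assumed for this sketch.

```latex
\begin{align*}
\hat{Y}^* &= b_0 + b_1 x^* = \bar{Y} + b_1 (x^* - \bar{X}),
  \qquad \text{since } b_0 = \bar{Y} - b_1 \bar{X}, \\
Var(\hat{Y}^*) &= Var(\bar{Y}) + (x^* - \bar{X})^2 \, Var(b_1)
  + 2 (x^* - \bar{X}) \, Cov(\bar{Y}, b_1) \\
  &= \frac{\sigma^2}{n} + (x^* - \bar{X})^2 \frac{\sigma^2}{S_{XX}} + 0 \\
  &= \sigma^2 \left[ \frac{1}{n} + \frac{(x^* - \bar{X})^2}{S_{XX}} \right].
\end{align*}
```

The variance is smallest at $x^* = \bar{X}$ and grows as $x^*$ moves away from the centre of the data.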

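Confidence Interval for E(Y | X = x*)
• For a given x, x*, a 100(1 − α)% CI for the mean value of Y is $\hat{Y}^* \pm t_{n-2,\,1-\alpha/2} \; s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{X})^2}{S_{XX}}}$, where $\hat{Y}^* = b_0 + b_1 x^*$, $s = \sqrt{MSE}$ and $S_{XX} = \sum_{i=1}^{n} (X_i - \bar{X})^2$.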
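A minimal sketch of this interval, assuming the standard form fit ± t × s × sqrt(1/n + (x* - Xbar)² / Sxx) with the t quantile on n - 2 degrees of freedom. The function name `mean_response_ci` and the toy data are hypothetical. Because the standard library has no t distribution, the critical value is passed in by the caller (here 3.182, the 0.975 quantile of t with 3 degrees of freedom, read from a t table).

```python
import math

def mean_response_ci(x, y, x_star, t_crit):
    """Return (fit, standard error, (lower, upper)) for E(Y | X = x_star)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)  # MSE, the estimate of sigma^2
    fit = b0 + b1 * x_star
    se = math.sqrt(s2 * (1 / n + (x_star - x_bar) ** 2 / sxx))
    return fit, se, (fit - t_crit * se, fit + t_crit * se)

# Hypothetical toy data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

T_CRIT = 3.182  # t quantile for a 95% interval with n - 2 = 3 df (from a table)
fit, se, (lower, upper) = mean_response_ci(x, y, x_star=3.0, t_crit=T_CRIT)
print(f"fit = {fit:.3f}, se = {se:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```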

Example
• Consider the smoking and cancer data.
• Suppose we wish to predict the mean mortality index when the smoking index is 101, that is, when x* = 101….

Prediction of New Observation
• Suppose we want to predict a particular value of Y* when X = x*.
• The predicted value of a new point measured when X = x* is $\hat{Y}^* = b_0 + b_1 x^*$.
• Note that the above predicted value is the same as the estimate of E(Y | X = x*).
• The predicted value has two sources of variability. One is due to the regression line being estimated by $b_0 + b_1 X$. The second is due to ε*, i.e., points don’t fall exactly on the line.
• To calculate the variance of the prediction error, we look at the difference $Y^* - \hat{Y}^*$.
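The two sources of variability can be made explicit. The sketch below assumes the new observation $Y^* = \beta_0 + \beta_1 x^* + \varepsilon^*$ is independent of the data used to fit the line, and uses the standard result $Var(\hat{Y}^*) = \sigma^2 [1/n + (x^* - \bar{X})^2 / S_{XX}]$ with $S_{XX} = \sum_{i=1}^{n} (X_i - \bar{X})^2$.

```latex
\begin{align*}
Var(Y^* - \hat{Y}^*)
  &= Var(Y^*) + Var(\hat{Y}^*)
     && \text{(independence of } Y^* \text{ and } \hat{Y}^*) \\
  &= \sigma^2 + \sigma^2 \left[ \frac{1}{n}
     + \frac{(x^* - \bar{X})^2}{S_{XX}} \right] \\
  &= \sigma^2 \left[ 1 + \frac{1}{n}
     + \frac{(x^* - \bar{X})^2}{S_{XX}} \right].
\end{align*}
```

The leading 1 inside the bracket is the contribution of ε*, and it does not shrink as n grows.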

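Prediction Interval for New Observation
• A 100(1 − α)% prediction interval for Y* when X = x* is $\hat{Y}^* \pm t_{n-2,\,1-\alpha/2} \; s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{X})^2}{S_{XX}}}$, where $s = \sqrt{MSE}$ and $S_{XX} = \sum_{i=1}^{n} (X_i - \bar{X})^2$.
• This is not a confidence interval; CIs are for parameters, and here we are predicting the value of a random variable.
• The prediction interval is wider than the CI for E(Y | X = x*).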
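To see the difference in width, the sketch below computes both intervals at the same x*, assuming the standard forms: the CI uses s × sqrt(1/n + (x* - Xbar)² / Sxx) and the prediction interval uses s × sqrt(1 + 1/n + (x* - Xbar)² / Sxx). The function name `ci_and_pi`, the toy data and the tabulated t value are assumptions made for illustration.

```python
import math

def ci_and_pi(x, y, x_star, t_crit):
    """Return the fit, both standard errors, the CI and the PI at x_star."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)  # MSE
    fit = b0 + b1 * x_star
    h = 1 / n + (x_star - x_bar) ** 2 / sxx
    se_mean = math.sqrt(s2 * h)        # for E(Y | X = x_star)
    se_pred = math.sqrt(s2 * (1 + h))  # for a new observation Y*
    return {
        "fit": fit,
        "se_mean": se_mean,
        "se_pred": se_pred,
        "ci": (fit - t_crit * se_mean, fit + t_crit * se_mean),
        "pi": (fit - t_crit * se_pred, fit + t_crit * se_pred),
    }

# Hypothetical toy data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

result = ci_and_pi(x, y, x_star=4.0, t_crit=3.182)  # t(0.975; 3 df) from a table
ci_width = result["ci"][1] - result["ci"][0]
pi_width = result["pi"][1] - result["pi"][0]
print(f"CI width = {ci_width:.3f}, PI width = {pi_width:.3f}")
```

The prediction interval is always the wider of the two because its standard error carries the extra s² term for the new observation's own error.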

Dummy Variable Regression
• A dummy (or indicator) variable takes two values: 0 or 1.
• It indicates which category an observation is in.
• Example…
• Interpretation of the regression coefficients in a dummy variable regression…
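The slide's own example and interpretation are elided. As a hedged illustration of the usual 0/1 coding, the sketch below fits a simple regression on a dummy predictor using made-up toy data: the intercept comes out as the mean response of the category coded 0, and the slope as the difference between the two category means.

```python
# Hypothetical toy data: d = 0 for one category, d = 1 for the other.
d = [0, 0, 0, 1, 1, 1]
y = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0]

n = len(d)
d_bar = sum(d) / n
y_bar = sum(y) / n

# Ordinary least squares with the dummy as the only predictor.
sdd = sum((di - d_bar) ** 2 for di in d)
sdy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
b1 = sdy / sdd
b0 = y_bar - b1 * d_bar

# Group means, for comparison with the fitted coefficients.
group0 = [yi for di, yi in zip(d, y) if di == 0]
group1 = [yi for di, yi in zip(d, y) if di == 1]
mean0 = sum(group0) / len(group0)
mean1 = sum(group1) / len(group1)

print(f"b0 = {b0:.3f} (mean of group 0 = {mean0:.3f})")
print(f"b1 = {b1:.3f} (difference in means = {mean1 - mean0:.3f})")
```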