Lecture 1b: Inferences in Regression and Correlation Analysis
732G21/732A35/732G28

Normal error linear model
Formal statement: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $i = 1, \dots, n$
• $Y_i$ is the i-th response value
• $\beta_0$, $\beta_1$ are the model parameters (regression parameters): intercept and slope
• $X_i$ is the i-th predictor value
• $\varepsilon_i$ are i.i.d. normally distributed random variables with expectation zero and variance $\sigma^2$

Overview
Inference about regression coefficients and response:
• Interval estimates and tests concerning coefficients
• Confidence interval for Y (mean response)
• Prediction interval for Y (new observation)
• ANOVA table

Inferences about slope
• After fitting the data, we may obtain a regression line with a small estimated slope, e.g. 0.00005
• Is 0.00005 significant, or is it just random variation (hence, no linear dependence between Y and X)?
How to decide?
◦ Use hypothesis testing (later)
◦ Derive a confidence interval for $\beta_1$. If 0 does not fall within this interval, there is dependence

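Inferences about slope
Estimated slope $b_1$ is a random variable (look at the formula): $b_1 = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sum (X_i - \bar X)^2}$
Properties of $b_1$:
• Normally distributed (show)
• $E(b_1) = \beta_1$
• Variance: $\sigma^2\{b_1\} = \frac{\sigma^2}{\sum (X_i - \bar X)^2}$
Further: the test statistic $(b_1 - \beta_1)/s\{b_1\}$ is distributed as $t(n-2)$
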
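The properties of $b_1$ listed above can be checked by simulation. The sketch below is an illustration only: the true parameters, the predictor values 1..10 and the seed are assumptions made for this example and are not taken from the lecture.

```python
import random
import statistics

def slope(xs, ys):
    """Least-squares slope b1 = Sxy / Sxx."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical true model (assumption): Y = 2 + 0.5 X + eps, eps ~ N(0, 1)
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0
xs = [float(x) for x in range(1, 11)]      # fixed predictor values 1..10
sxx = sum((x - statistics.fmean(xs)) ** 2 for x in xs)

rng = random.Random(0)                     # fixed seed for reproducibility
b1_draws = []
for _ in range(2000):
    # Draw a fresh sample from the normal error model and refit the slope
    ys = [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for x in xs]
    b1_draws.append(slope(xs, ys))

mean_b1 = statistics.fmean(b1_draws)       # should be close to BETA1
var_b1 = statistics.variance(b1_draws)     # should be close to SIGMA**2 / sxx
theory_var = SIGMA ** 2 / sxx
print(f"mean of b1: {mean_b1:.4f} (beta1 = {BETA1})")
print(f"var of b1:  {var_b1:.5f} (theory = {theory_var:.5f})")
```

A histogram of `b1_draws` would also show the bell shape expected from the normality property.
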
t statistics
• See Table B.2 (p. 1317)
• Example: one-sided interval at 95%, 15 observations (n − 2 = 13 degrees of freedom): t(0.95; 13) = 1.771

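Inferences about slope
• Confidence interval for $\beta_1$ (show…): $b_1 \pm t(1-\alpha/2;\, n-2)\, s\{b_1\}$
• If the variance in the data is unknown, estimate it by MSE: $s^2\{b_1\} = \frac{MSE}{\sum (X_i - \bar X)^2}$
Example: compute a confidence interval for the slope, Salary dataset
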
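A minimal sketch of the confidence interval for the slope. The five data points are a made-up toy sample (not the Salary dataset), and the critical value t(0.975; 3) = 3.182 is read from a t table rather than computed.

```python
import math

# Toy data (hypothetical, NOT the Salary dataset)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b1 = sxy / sxx                     # estimated slope
b0 = y_bar - b1 * x_bar            # estimated intercept
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
mse = sse / (n - 2)                # estimate of sigma^2
s_b1 = math.sqrt(mse / sxx)        # estimated standard error of b1

T_CRIT = 3.182                     # t(0.975; n-2 = 3), taken from a t table
lower = b1 - T_CRIT * s_b1
upper = b1 + T_CRIT * s_b1
print(f"b1 = {b1:.3f}, s(b1) = {s_b1:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

For this toy sample the interval contains zero, so a linear dependence cannot be claimed at the 5% level.
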
From previous lecture

About hypothesis testing
Often we have a sample and test a hypothesis about a parameter value λ0 at some significance level α. How to do it?
• Step 1: Find and compute an appropriate test function T = T(sample, λ0)
• Step 2: Plot the test function's distribution and mark a critical area that depends on α
• Step 3: If T is in the critical area, reject H0 (accept H1); otherwise do not reject H0

Inferences about slope
• Test $H_0: \beta_1 = 0$ against $H_a: \beta_1 \neq 0$
• Step 1: compute $t^* = b_1 / s\{b_1\}$
• Step 2: plot the distribution $t(n-2)$, mark the points $\pm t(1-\alpha/2;\, n-2)$ and the critical area
• Step 3: locate $t^*$ and reject $H_0$ if it is in the critical area
Example: test the hypothesis for the Salary dataset
• Manually; also compute P-values
• By Minitab

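The test steps above can be sketched as follows. The data are a made-up toy sample (not the Salary dataset), the critical value comes from a t table, and the helper `two_sided_p` is a hypothetical function written for this example that integrates the t density numerically in place of a table or Minitab.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t_star, df, steps=2000):
    """Two-sided P-value via Simpson's rule on [0, |t*|] (steps must be even)."""
    a = abs(t_star)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central_mass = 2 * total * h / 3       # P(-|t*| < T < |t*|)
    return 1 - central_mass

# Toy data (hypothetical, NOT the Salary dataset)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b0 = y_bar - b1 * x_bar
mse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)

# Step 1: test statistic for H0: beta1 = 0
t_star = b1 / math.sqrt(mse / sxx)
# Step 2: critical points +/- t(0.975; 3), taken from a t table
T_CRIT = 3.182
# Step 3: decision and P-value
reject = abs(t_star) > T_CRIT
p_value = two_sided_p(t_star, n - 2)
print(f"t* = {t_star:.3f}, reject H0: {reject}, P-value = {p_value:.3f}")
```
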
Inferences about intercept
Sometimes we need to know whether $\beta_0 = 0$. Do confidence intervals and hypothesis testing in the same way, using the formulas below.
Properties of $b_0$:
• Normally distributed (show)
• $E(b_0) = \beta_0$
• Variance (show…): $\sigma^2\{b_0\} = \sigma^2 \left[ \frac{1}{n} + \frac{\bar X^2}{\sum (X_i - \bar X)^2} \right]$
Further: the test statistic $(b_0 - \beta_0)/s\{b_0\}$ is distributed as $t(n-2)$

Inference about model parameters
• If the error distribution is not normal: slight departures are OK; otherwise the results hold only asymptotically
• Spacing of the X values affects the variance (larger spacing, smaller variance)
Example: test $\beta_0 = 0$ for the Salary data

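A sketch of the same procedure for the intercept, again on a made-up toy sample (not the Salary data) and with the critical value t(0.975; 3) = 3.182 read from a t table.

```python
import math

# Toy data (hypothetical, NOT the Salary dataset)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b0 = y_bar - b1 * x_bar
mse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)

# Estimated standard error of b0: sqrt(MSE * (1/n + x_bar^2 / Sxx))
s_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / sxx))

T_CRIT = 3.182                      # t(0.975; n-2 = 3), taken from a t table
t_star = b0 / s_b0                  # test statistic for H0: beta0 = 0
reject = abs(t_star) > T_CRIT
lower, upper = b0 - T_CRIT * s_b0, b0 + T_CRIT * s_b0
print(f"b0 = {b0:.3f}, s(b0) = {s_b0:.4f}, t* = {t_star:.3f}, reject H0: {reject}")
print(f"95% CI for beta0: ({lower:.3f}, {upper:.3f})")
```
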
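Expected response
Estimate at $X = X_h$ ($X_h$ is any value): $\hat Y_h = b_0 + b_1 X_h$
Properties of $\hat Y_h$, the estimate of $E(Y_h)$:
• Normally distributed (show)
• $E(\hat Y_h) = E(Y_h)$
• Variance: $\sigma^2\{\hat Y_h\} = \sigma^2 \left[ \frac{1}{n} + \frac{(X_h - \bar X)^2}{\sum (X_i - \bar X)^2} \right]$
Further: the test statistic $(\hat Y_h - E(Y_h))/s\{\hat Y_h\}$ is distributed as $t(n-2)$
Confidence interval: $\hat Y_h \pm t(1-\alpha/2;\, n-2)\, s\{\hat Y_h\}$
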
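A sketch of the confidence interval for the mean response at a few values of $X_h$. The data are a made-up toy sample, `fit` and `mean_response_ci` are hypothetical helper names, and the critical value is read from a t table.

```python
import math

# Toy data (hypothetical, NOT the Salary dataset)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
T_CRIT = 3.182                      # t(0.975; n-2 = 3), taken from a t table

def fit(xs, ys):
    """Return b0, b1, MSE, x_bar and Sxx for simple linear regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    mse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    return b0, b1, mse, x_bar, sxx

def mean_response_ci(xs, ys, x_h, t_crit):
    """Point estimate and confidence limits for E(Y_h) at X = x_h."""
    b0, b1, mse, x_bar, sxx = fit(xs, ys)
    y_hat = b0 + b1 * x_h
    s_mean = math.sqrt(mse * (1 / len(xs) + (x_h - x_bar) ** 2 / sxx))
    return y_hat, y_hat - t_crit * s_mean, y_hat + t_crit * s_mean

for x_h in (3.0, 4.0, 5.0):
    y_hat, lower, upper = mean_response_ci(xs, ys, x_h, T_CRIT)
    print(f"X_h = {x_h}: estimate {y_hat:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

The interval is narrowest at $X_h = \bar X$ and widens as $X_h$ moves away from the mean of the predictor.
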
Prediction of new observation
Make a plot…
• Point estimate
• Confidence interval: we estimate the position of the mean in the population with $X = X_h$
• Prediction interval: we estimate the position of an individual observation in the population with $X = X_h$

Prediction of new observation
• When the parameters are unknown, the mean $E(Y_h)$ may have more than one possible location
• New observation = mean + random error, so the prediction interval should be wider

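Prediction of new observation
Further: the test statistic $(Y_{h(new)} - \hat Y_h)/s\{pred\}$ is distributed as $t(n-2)$
Prediction interval: $\hat Y_h \pm t(1-\alpha/2;\, n-2)\, s\{pred\}$
• How to estimate $s\{pred\}$? The new observation is $b_0 + b_1 X_h + \varepsilon$. Hence $\sigma^2\{pred\} = \sigma^2\{\hat Y_h\} + \sigma^2$
• Standard error (show): $s^2\{pred\} = MSE \left[ 1 + \frac{1}{n} + \frac{(X_h - \bar X)^2}{\sum (X_i - \bar X)^2} \right]$
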
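A sketch comparing the half-widths of the confidence and prediction intervals at one $X_h$. The data are a made-up toy sample, `half_widths` is a hypothetical helper name, and the critical value is read from a t table.

```python
import math

# Toy data (hypothetical, NOT the Salary dataset)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
T_CRIT = 3.182                      # t(0.975; n-2 = 3), taken from a t table

def half_widths(xs, ys, x_h, t_crit):
    """Return (y_hat, CI half-width, PI half-width) at X = x_h."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    mse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    leverage = 1 / n + (x_h - x_bar) ** 2 / sxx
    s_mean = math.sqrt(mse * leverage)          # s{Y_hat_h}
    s_pred = math.sqrt(mse * (1 + leverage))    # s{pred}: adds the error variance
    return b0 + b1 * x_h, t_crit * s_mean, t_crit * s_pred

y_hat, ci_half, pi_half = half_widths(xs, ys, 4.0, T_CRIT)
print(f"estimate at X_h = 4: {y_hat:.2f}")
print(f"confidence interval: +/- {ci_half:.3f}")
print(f"prediction interval: +/- {pi_half:.3f}")
```

The only difference between the two standard errors is the extra 1 inside the bracket, which accounts for the random error of the new observation.
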
Prediction of new observation
Example:
• Calculate confidence and prediction intervals for a 35-year-old person
• Compare with the output in Minitab

Analysis of Variance approach
• Total sum of squares: $SSTO = \sum (Y_i - \bar Y)^2$
• Error sum of squares: $SSE = \sum (Y_i - \hat Y_i)^2$
• Regression sum of squares: $SSR = \sum (\hat Y_i - \bar Y)^2$

Degrees of freedom
• SSTO has n − 1 (the deviations $Y_i - \bar Y$ sum to zero)
• SSE has n − 2 (two model parameters are estimated)
• SSR has 1 (the fitted values lie on the regression line, which gives 2 degrees; the deviations $\hat Y_i - \bar Y$ sum to zero, which removes 1)
n − 1 = (n − 2) + 1
SSTO = SSE + SSR
Important: MSxx = SSxx / degrees_of_freedom

Analysis of Variance table
ANOVA table:
Source of variation   SS     df      MS
Regression            SSR    1       MSR = SSR/1
Error                 SSE    n-2     MSE = SSE/(n-2)
Total                 SSTO   n-1

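A sketch that fills in the ANOVA table for a made-up toy sample (not the Salary dataset) and checks the decomposition SSTO = SSE + SSR numerically.

```python
# Toy data (hypothetical, NOT the Salary dataset)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)

x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in xs]

ssto = sum((y - y_bar) ** 2 for y in ys)                  # total
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))       # error
ssr = sum((f - y_bar) ** 2 for f in fitted)               # regression

msr = ssr / 1            # regression mean square, 1 degree of freedom
mse = sse / (n - 2)      # error mean square, n-2 degrees of freedom

print(f"{'Source':<12}{'SS':>8}{'df':>5}{'MS':>8}")
print(f"{'Regression':<12}{ssr:>8.3f}{1:>5}{msr:>8.3f}")
print(f"{'Error':<12}{sse:>8.3f}{n - 2:>5}{mse:>8.3f}")
print(f"{'Total':<12}{ssto:>8.3f}{n - 1:>5}")
```
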
Analysis of Variance approach
Expected mean squares:
• $E(MSE) = \sigma^2$
• $E(MSR) = \sigma^2 + \beta_1^2 \sum (X_i - \bar X)^2$
• E(MSE) does not depend on the slope, whether or not the slope is zero
• E(MSR) = E(MSE) when the slope is zero
• If MSR is much larger than MSE, the slope is not zero; if they are approximately the same, the slope can be zero

Hypothesis testing using ANOVA
• Test $H_0: \beta_1 = 0$ against $H_a: \beta_1 \neq 0$
• Test statistic: F* = MSR/MSE, use F(1, n−2) (see p. 1320)
Decision rules:
• If F* > F(1−α; 1, n−2), conclude Ha
• If F* ≤ F(1−α; 1, n−2), conclude H0
Note: the F test and the t test about $\beta_1$ are equivalent

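The equivalence of the F test and the two-sided t test can be checked numerically: F* equals the square of t*. The sketch uses a made-up toy sample (not the Salary dataset) and the table value F(0.95; 1, 3) = 10.13.

```python
import math

# Toy data (hypothetical, NOT the Salary dataset)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)

x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b0 = y_bar - b1 * x_bar

ssr = sum((b0 + b1 * x - y_bar) ** 2 for x in xs)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
msr, mse = ssr / 1, sse / (n - 2)

f_star = msr / mse                       # ANOVA test statistic
t_star = b1 / math.sqrt(mse / sxx)       # t statistic for H0: beta1 = 0

F_CRIT = 10.13                           # F(0.95; 1, 3), taken from an F table
conclude_ha = f_star > F_CRIT
print(f"F* = {f_star:.3f}, (t*)^2 = {t_star ** 2:.3f}, conclude Ha: {conclude_ha}")
```

The critical values match in the same way: F(0.95; 1, 3) is the square of t(0.975; 3) = 3.182, up to table rounding.
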
Hypothesis testing using ANOVA
General approach:
• Full model (linear): $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
• Reduced model (constant): $Y_i = \beta_0 + \varepsilon_i$

Hypothesis testing using ANOVA
• It is known (why?..) that SSE(F) ≤ SSE(R). A large difference means the models differ; a small difference means they can be the same
• Test statistic: $F^* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \div \frac{SSE(F)}{df_F}$
• For the univariate linear model, this is equivalent to F* = MSR/MSE
• F* follows the $F(df_R - df_F,\, df_F)$ distribution (plot the critical area..)
• Test rule: if $F^* > F(1-\alpha;\, df_R - df_F,\, df_F)$, reject H0

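A sketch of the general linear test for the full (linear) model against the reduced (constant) model. The data are a made-up toy sample and `sse_full` and `sse_reduced` are hypothetical helper names chosen for this example.

```python
# Toy data (hypothetical, NOT the Salary dataset)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)

def sse_full(xs, ys):
    """SSE of the full model Y = b0 + b1 X, fitted by least squares."""
    m = len(xs)
    x_bar, y_bar = sum(xs) / m, sum(ys) / m
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

def sse_reduced(ys):
    """SSE of the reduced model Y = b0; least squares gives b0 = mean(Y)."""
    y_bar = sum(ys) / len(ys)
    return sum((y - y_bar) ** 2 for y in ys)

sse_f, df_f = sse_full(xs, ys), n - 2      # full model: two parameters
sse_r, df_r = sse_reduced(ys), n - 1       # reduced model: one parameter

f_star = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
print(f"SSE(F) = {sse_f:.3f}, SSE(R) = {sse_r:.3f}, F* = {f_star:.3f}")
```

Here SSE(R) equals SSTO, so the numerator is SSR/1 and the statistic reduces to MSR/MSE.
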
Hypothesis testing using ANOVA
Example, for the Salary dataset:
• Compose the ANOVA table and compare with Minitab
• Perform the F test and compare with Minitab

Measures of linear association
• Coefficient of determination: $R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}$
• Coefficient of correlation: $r = \pm\sqrt{R^2}$, with the sign of $b_1$
Limitations:
• A high R does not mean a good fit
• A low R does not mean that X and Y are not related
Example: for the Salary dataset, compute $R^2$ and compare with Minitab

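A sketch computing $R^2$ and r, plus an illustration of the second limitation: a response that depends exactly on X through a parabola still gives $R^2 = 0$ for the linear model. Both datasets are made up for this example and `r_squared` is a hypothetical helper name.

```python
import math

def r_squared(xs, ys):
    """Return (R^2, r) for simple linear regression of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ssto = sum((y - y_bar) ** 2 for y in ys)
    r2 = max(1 - sse / ssto, 0.0)            # guard against rounding below 0
    r = math.copysign(math.sqrt(r2), b1)     # r takes the sign of the slope
    return r2, r

# Toy data (hypothetical, NOT the Salary dataset)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
r2_toy, r_toy = r_squared(xs, ys)
print(f"toy data: R^2 = {r2_toy:.3f}, r = {r_toy:.3f}")

# Exact but nonlinear dependence: Y = (X - 3)^2
ys_quad = [(x - 3.0) ** 2 for x in xs]
r2_quad, r_quad = r_squared(xs, ys_quad)
print(f"parabola: R^2 = {r2_quad:.3f}")
```
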
Reading
• Chapter 2 up to page 78