Statistics and Data Analysis
Professor William Greene
Stern School of Business
IOMS Department of Economics

Part 17: Regression Residuals

Part 17: The Linear Regression Model

Regression Modeling
• Theory behind the regression model
• Computing the regression statistics
• Interpreting the results
• Application: Statistical Cost Analysis

A Linear Regression Predictor: Box Office = -14.36 + 72.72 Buzz

Data and Relationship
• We suggested that the relationship between box office sales and internet buzz is Box Office = -14.36 + 72.72 Buzz.
• Box Office is not exactly equal to -14.36 + 72.72 × Buzz.
• How do we reconcile the equation with the data?
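The mismatch can be made concrete with a few lines of Python. This is a minimal sketch that uses only the two coefficients of the suggested line; the function names are our own, and the Buzz and Box Office values are hypothetical numbers for illustration, not observations from the movie data.

```python
# Suggested line from the slides: Box Office = -14.36 + 72.72 * Buzz
INTERCEPT = -14.36
SLOPE = 72.72


def line_value(buzz: float) -> float:
    """Box Office value that the suggested line assigns to a given Buzz."""
    return INTERCEPT + SLOPE * buzz


def gap_from_line(box_office: float, buzz: float) -> float:
    """Observed Box Office minus the line's value: the part the equation misses."""
    return box_office - line_value(buzz)


# Hypothetical movie: made-up numbers, not taken from the course data set
buzz, box_office = 0.5, 30.0
print("value on the line:", round(line_value(buzz), 2))
print("observed minus line:", round(gap_from_line(box_office, buzz), 2))
```

For real movies this gap is almost never zero, which is exactly the discrepancy the slide asks us to reconcile.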

Modeling the Underlying Process
• A model that explains the process that produces the data that we observe:
  - Observed outcome = the sum of two parts
  - (1) Explained: the regression line
  - (2) Unexplained (noise): the remainder
• Regression model
  - Internet Buzz is not the only thing that explains Box Office, but it is the only variable in the equation.
  - The “model” is the statement that part (1) is the same process from one observation to the next.

The Population Regression
• THE model:
  - (1) Explained: Explained Box Office = α + β Buzz
  - (2) Unexplained: The rest is “noise,” ε. Random ε has certain characteristics.
• Model statement:
  - Box Office = α + β Buzz + ε
  - Box Office is related to Buzz, but is not exactly equal to α + β Buzz.
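To see how the two parts combine into what we observe, here is a small simulation sketch. The parameter values are hypothetical, the function name is our own, and drawing ε from a normal distribution is our assumption; the model itself only says ε is random with certain characteristics.

```python
import random


def simulate_outcomes(xs, alpha, beta, sigma, seed=0):
    """Draw y_i = alpha + beta * x_i + eps_i for each x_i.

    eps_i ~ Normal(0, sigma^2) is an illustrative assumption; the model only
    needs the noise to be random with mean zero and variance sigma^2.
    """
    rng = random.Random(seed)                    # seeded so the example is reproducible
    explained = [alpha + beta * x for x in xs]   # part (1): the regression line
    noise = [rng.gauss(0.0, sigma) for _ in xs]  # part (2): the unexplained remainder
    observed = [f + e for f, e in zip(explained, noise)]
    return observed, explained, noise


# Hypothetical parameter values, chosen only to illustrate the decomposition
buzz = [0.2, 0.4, 0.6, 0.8]
observed, explained, noise = simulate_outcomes(buzz, alpha=-14.0, beta=72.0, sigma=10.0)
for x, y, f in zip(buzz, observed, explained):
    print(f"Buzz={x:.1f}  observed={y:7.2f}  on the line={f:7.2f}")
```

Setting `sigma=0.0` makes every observed value fall exactly on the line; any positive `sigma` scatters the data around it.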

The Data Include the Noise

What explains the noise?
What explains the variation in fuel bills?

Noisy Data?
What explains the variation in milk production other than the number of cows?

Assumptions
• (Regression) The equation linking “Box Office” and “Buzz” is stable: E[Box Office | Buzz] = α + β Buzz.
• Another sample of movies, say from 2012, would obey the same fundamental relationship.

Model Assumptions
• $y_i = \alpha + \beta x_i + \varepsilon_i$
  - $\alpha + \beta x_i$ is the “regression function.”
  - $\varepsilon_i$ is the “disturbance.” It is the unobserved random component.
• The disturbance is random noise:
  - Mean zero: the regression is the mean of $y_i$, and $\varepsilon_i$ is the deviation from the regression.
  - Variance $\sigma^2$.
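A quick way to see what “mean zero, variance σ²” implies is to simulate many outcomes at one fixed x: their average settles near the regression value α + βx, and the disturbances average near zero with variance near σ². This is a sketch under assumptions: the parameter values and function name are hypothetical, and the normal distribution for ε is our choice for illustration.

```python
import random
import statistics


def draws_at_fixed_x(x, alpha, beta, sigma, n, seed=1):
    """Simulate n outcomes y = alpha + beta * x + eps at a single fixed x.

    eps ~ Normal(0, sigma^2) is an illustrative assumption.
    """
    rng = random.Random(seed)
    return [alpha + beta * x + rng.gauss(0.0, sigma) for _ in range(n)]


# Hypothetical values for the parameters and the fixed x
alpha, beta, sigma, x = 2.0, 3.0, 1.5, 4.0
ys = draws_at_fixed_x(x, alpha, beta, sigma, n=100_000)
regression_value = alpha + beta * x          # E[y | x] under the model
eps = [y - regression_value for y in ys]     # disturbances: deviations from the regression

print("average y:", round(statistics.fmean(ys), 3), "vs alpha + beta*x =", regression_value)
print("average eps:", round(statistics.fmean(eps), 3))
print("variance of eps:", round(statistics.pvariance(eps), 3), "vs sigma^2 =", sigma ** 2)
```

The averages are only close to, not equal to, the model values; with fewer draws the sampling noise would be larger.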

We will use the data to estimate α and β.
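A minimal sketch of how the estimates a and b can be computed, assuming they are the usual least squares estimates: b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b x̄. The function name `least_squares` is our own, and the data are hypothetical, not the movie sample.

```python
def least_squares(xs, ys):
    """Least squares intercept a and slope b for the line y = a + b * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x does not vary, so the slope cannot be estimated")
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept: the fitted line passes through (x_bar, y_bar)
    return a, b


# Hypothetical data, not the movie sample
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
a, b = least_squares(xs, ys)
print(f"a = {a:.2f}, b = {b:.2f}")  # prints a = 1.09, b = 1.97
```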

We also want to estimate $\sigma = \sqrt{E[\varepsilon_i^2]}$, using the residuals $e = y - a - b\,\text{Buzz}$.

Standard Deviation of the Residuals
• The standard deviation of $\varepsilon_i = y_i - \alpha - \beta x_i$ is $\sigma = \sqrt{E[\varepsilon_i^2]}$ (the mean of $\varepsilon_i$ is zero).
• The sample statistics a and b estimate α and β, and the residual $e_i = y_i - a - b x_i$ estimates $\varepsilon_i$.
• Use $s_e = \sqrt{\frac{1}{N-2}\sum_i e_i^2}$ to estimate σ. Why N - 2? Because two parameters (α and β) were estimated. This is the same reason N - 1 is used when computing a sample variance, where one parameter (the mean) was estimated.
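The residual standard deviation can be sketched directly from the formula above, assuming a and b come from a least squares fit. The helper names `fit_line` and `residual_std_dev` are our own, and the data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least squares intercept a and slope b."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def residual_std_dev(xs, ys):
    """Return s_e = sqrt(sum(e_i^2) / (N - 2)) and the residuals e_i = y_i - a - b*x_i."""
    n = len(xs)
    if n <= 2:
        raise ValueError("need more than 2 observations: 2 degrees of freedom go to a and b")
    a, b = fit_line(xs, ys)
    residuals = [y - a - b * x for x, y in zip(xs, ys)]
    sse = sum(e * e for e in residuals)   # sum of squared residuals
    return math.sqrt(sse / (n - 2)), residuals


# Hypothetical data, not the movie sample
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
s_e, residuals = residual_std_dev(xs, ys)
print("residuals:", [round(e, 2) for e in residuals])
print("s_e:", round(s_e, 4))
```

With N = 5 points the divisor is 3; the least squares residuals also sum to zero, mirroring the mean-zero assumption on the disturbances.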

Residuals

Summary: Regression Computations

Using $s_e$ to identify outliers
• Recall the empirical rule: about 95% of observations lie within the mean ± 2 standard deviations.
• The plot on this slide shows the bounds $(a + bx) \pm 2s_e$.
• The marked point is 2.2 standard deviations from the regression.
• Only 3.2% of the 62 observations (2 points) lie outside the bounds. (We will refine this later.)
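The ±2 s_e rule can be sketched as a small function that flags points whose residual lies outside the bounds. This assumes a least squares fit; the function name and the multiplier parameter `k` are our own, and the data are hypothetical: a perfect line y = 2x with one point pushed up by 8.

```python
import math


def flag_outliers(xs, ys, k=2.0):
    """Flag observations outside (a + b*x) +/- k * s_e.

    Returns (s_e, indices of flagged points, share of points flagged).
    """
    n = len(xs)
    if n <= 2:
        raise ValueError("need more than 2 observations")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    s_e = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    flagged = [i for i, e in enumerate(residuals) if abs(e) > k * s_e]
    return s_e, flagged, len(flagged) / n


# Hypothetical data: y = 2x exactly, except one point pushed up by 8
xs = [float(i) for i in range(10)]
ys = [2.0 * x for x in xs]
ys[5] += 8.0
s_e, flagged, share = flag_outliers(xs, ys)
print("s_e:", round(s_e, 3), "flagged:", flagged, "share outside:", share)
```

Note that the outlier itself inflates s_e, which widens the bounds; this is one reason the rule is only a first screen.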

Linear Regression: Sample Regression Line

Results to Report

The Reported Results

Estimated equation

Estimated coefficients a and b

S = $s_e$ = estimated standard deviation of ε

Square of the sample correlation between x and y ($R^2$)
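In a regression with one x variable and an intercept, the squared sample correlation between x and y equals 1 - SSE/SST, the share of the variation in y that the fitted line explains. A sketch that computes both quantities, assuming a least squares fit; the function name is our own and the data are hypothetical.

```python
import math


def r_squared_two_ways(xs, ys):
    """Return (r^2, 1 - SSE/SST) for a simple least squares regression of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)   # total sum of squares, SST
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)            # sample correlation between x and y
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return r * r, 1.0 - sse / syy


# Hypothetical data, not the movie sample
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
r2_from_corr, r2_from_fit = r_squared_two_ways(xs, ys)
print(round(r2_from_corr, 6), round(r2_from_fit, 6))
```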

N - 2 = degrees of freedom; N - 1 = sample size minus 1

Sum of squared residuals, $\sum_i e_i^2$

$S^2 = s_e^2$
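The reported sums of squares and degrees of freedom fit together in the standard decomposition SST = SSR + SSE, with N - 1 = 1 + (N - 2) degrees of freedom and S² = SSE / (N - 2). A sketch under the assumption of a least squares fit with an intercept; the dictionary keys and function name are our own labels, not the output format of any particular package, and the data are hypothetical.

```python
def variance_decomposition(xs, ys):
    """Split the variation in y into the part explained by the line and the residual part."""
    n = len(xs)
    if n <= 2:
        raise ValueError("need more than 2 observations")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    fitted = [a + b * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)               # total, N - 1 df
    ssr = sum((f - y_bar) ** 2 for f in fitted)           # regression, 1 df
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # residual, N - 2 df
    return {
        "SST": sst, "df_total": n - 1,
        "SSR": ssr, "df_regression": 1,
        "SSE": sse, "df_residual": n - 2,
        "S^2": sse / (n - 2),                             # S^2 = s_e^2
    }


# Hypothetical data, not the movie sample
table = variance_decomposition([1.0, 2.0, 3.0, 4.0, 5.0], [3.1, 4.9, 7.2, 8.8, 11.0])
for key, value in table.items():
    print(f"{key:>14}: {value}")
```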

The Model
• Constructed to provide a framework for interpreting the observed data
  - What is the meaning of the observed relationship (assuming there is one)?
• How it’s used
  - Prediction: What reason is there to assume that we can use sample observations to predict outcomes?
  - Testing relationships

A Cost Model
• Data: Electricity.mpj
• Total cost in $Million
• Output in Million KWH
• N = 123 American electric utilities
• Model: Cost = α + β KWH + ε

Cost Relationship

Sample Regression

Interpreting the Model: Cost = 2.44 + 0.00529 Output + e
• Cost is in $Million; Output is in Million KWH.
• Fixed cost = cost when output = 0, so fixed cost = $2.44 million.
• Marginal cost = change in cost / change in output = 0.00529 × ($Million/Million KWH) = 0.00529 $/KWH = 0.529 cents/KWH.
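The unit bookkeeping behind the fixed and marginal cost figures can be checked in a few lines. This sketch uses only the estimated coefficients from the slide; the constant and function names are our own, and the 1,000 million KWH output in the check is a hypothetical value.

```python
# Estimated cost line from the slides: Cost = 2.44 + 0.00529 * Output
# Units: Cost in $ million, Output in million KWH.
FIXED_COST_MILLION_DOLLARS = 2.44
SLOPE_MILLION_DOLLARS_PER_MILLION_KWH = 0.00529


def predicted_cost(output_million_kwh: float) -> float:
    """Predicted total cost in $ million for a given output in million KWH."""
    return FIXED_COST_MILLION_DOLLARS + SLOPE_MILLION_DOLLARS_PER_MILLION_KWH * output_million_kwh


def marginal_cost_cents_per_kwh(slope: float) -> float:
    """Convert a slope in $ million per million KWH to cents per KWH."""
    dollars_per_kwh = slope * 1_000_000 / 1_000_000   # the two "million" factors cancel
    return dollars_per_kwh * 100                       # 100 cents per dollar


print("fixed cost ($ million):", predicted_cost(0.0))
print("extra cost of one more million KWH ($):",
      round(SLOPE_MILLION_DOLLARS_PER_MILLION_KWH * 1_000_000))
print("marginal cost (cents/KWH):",
      round(marginal_cost_cents_per_kwh(SLOPE_MILLION_DOLLARS_PER_MILLION_KWH), 3))
```

Another way to read the slope: each additional million KWH adds about $5,290 to total cost.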

Summary
• Linear regression model
  - Assumptions of the model
  - Residuals and disturbances
• Estimating the parameters of the model
  - Regression parameters
  - Disturbance standard deviation
  - Computation of the estimated model