Linear Regression
McGraw-Hill/Irwin. Copyright © 2015 by The McGraw-Hill Companies, Inc. All rights reserved.
Regression Analysis
Regression Analysis: Examples
- Assuming a linear relationship between the size of a home, measured in square feet, and the cost to heat the home in January, how does the cost vary relative to the size of the home?
- In a study of automobile fuel efficiency, assuming a linear relationship between miles per gallon and the weight of a car, how does the fuel efficiency vary relative to the weight of a car?
Regression Analysis: Variables
Regression Analysis – Example
LEAST SQUARES PRINCIPLE: Determining a regression equation by minimizing the sum of the squares of the vertical distances between the actual Y values and the predicted values of Y.
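A minimal sketch of the least squares principle in Python, assuming the data arrive as two paired lists of X and Y values. The sample values below are hypothetical (the 15 Copier Sales of America observations are not reproduced in these slides), and the function names are illustrative. The slope is the sum of cross-products of deviations divided by the sum of squared X deviations, and the intercept forces the line through the point of means.

```python
from statistics import mean


def least_squares(x, y):
    """Return (a, b) for the line Y-hat = a + bX that minimizes sum((Y - Y-hat)**2)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(x), mean(y)
    # Slope: sum of cross-products of deviations over the sum of squared X deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    # The least squares line always passes through (x_bar, y_bar).
    a = y_bar - b * x_bar
    return a, b


def sum_squared_errors(x, y, a, b):
    """Sum of squared vertical distances between actual and predicted Y."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical sample: sales calls (X) and copiers sold (Y).
calls = [20, 40, 30, 60, 10]
sold = [30, 60, 40, 60, 30]
a, b = least_squares(calls, sold)
print(f"Y-hat = {a:.4f} + {b:.4f}X")
```

Any other choice of a or b gives a larger value of `sum_squared_errors` for the same data, which is exactly what "least squares" means.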
Regression Analysis – Example
Recall the example involving Copier Sales of America. The sales manager gathered information on the number of sales calls made and the number of copiers sold for a random sample of 15 sales representatives. Use the least squares method to determine a linear equation to express the relationship between the two variables. In this example, the number of sales calls is the independent variable, X, and the number of copiers sold is the dependent variable, Y. What is the expected number of copiers sold by a representative who made 20 calls?
Regression Analysis – Example
Descriptive statistics: [table not reproduced]
Correlation coefficient: r = 0.865
Regression Analysis – Example
Step 1: Find the slope (b) of the line: $b = r\left(\frac{s_y}{s_x}\right)$.
Step 2: Find the y-intercept (a): $a = \bar{Y} - b\bar{X}$.
Step 3: Create the regression equation.
Number of Copiers Sold = 19.9632 + 0.2608(Number of Sales Calls)
Step 4: What is the predicted number of copiers sold if someone makes 20 sales calls?
Number of Copiers Sold = 19.9632 + 0.2608(20) = 25.1792
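The four steps can be sketched in Python using the summary-statistic form of the least squares formulas, $b = r(s_y/s_x)$ and $a = \bar{Y} - b\bar{X}$. The function names are illustrative, not from any library; Step 4 plugs in the coefficients reported for the copier example.

```python
def slope_from_summary(r, s_x, s_y):
    """Step 1: slope b = r * (s_y / s_x)."""
    return r * (s_y / s_x)


def intercept_from_summary(x_bar, y_bar, b):
    """Step 2: intercept a = Y-bar - b * X-bar."""
    return y_bar - b * x_bar


def predict(a, b, x):
    """Steps 3 and 4: evaluate the regression equation Y-hat = a + bX."""
    return a + b * x


# Step 4 with the coefficients from the Copier Sales of America example.
a, b = 19.9632, 0.2608
print(round(predict(a, b, 20), 4))  # 25.1792
```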
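Regression Analysis ANOVA (Excel) – Example
[Excel regression output, with the intercept (a) and slope (b) marked]
Number of Copiers Sold = 19.9800 + 0.2606(Number of Sales Calls)
The Excel coefficients differ slightly from the hand-calculated values because of rounding.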
Regression Analysis: Testing the Significance of the Slope – Example
Step 1: State the null and alternate hypotheses.
$H_0: \beta = 0$ (the slope of the regression equation is 0)
$H_1: \beta \neq 0$ (the slope of the regression equation is not 0)
Step 2: Select a level of significance. We select the .05 level of significance.
Step 3: Identify the test statistic. To test a hypothesis about the slope of a regression equation, we use the t-statistic, $t = \frac{b - 0}{s_b}$, where $s_b$ is the standard error of the slope. For this analysis, there will be n − 2 degrees of freedom.
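A sketch of the Step 3 test statistic computed from raw data, assuming $s_b = s_{y\cdot x}/\sqrt{\sum (X - \bar{X})^2}$ and n − 2 degrees of freedom. The data are hypothetical and `slope_t_statistic` is an illustrative name.

```python
import math
from statistics import mean


def slope_t_statistic(x, y):
    """Return (t, df) for testing H0: beta = 0 against H1: beta != 0."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    # Standard error of estimate, using n - 2 degrees of freedom.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))
    # Standard error of the slope, then t = (b - 0) / s_b.
    s_b = s_yx / math.sqrt(sxx)
    return (b - 0) / s_b, n - 2


calls = [20, 40, 30, 60, 10]  # hypothetical data
sold = [30, 60, 40, 60, 30]
t, df = slope_t_statistic(calls, sold)
print(f"t = {t:.3f} with {df} degrees of freedom")
```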
Regression Analysis: Testing the Significance of the Slope – Example
Step 4: Formulate a decision rule. Reject $H_0$ if:
$t > t_{\alpha/2,\,n-2}$ or $t < -t_{\alpha/2,\,n-2}$
$t > t_{0.025,\,13}$ or $t < -t_{0.025,\,13}$
$t > 2.160$ or $t < -2.160$
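The critical value in Step 4 can be checked numerically. The sketch below is stdlib-only Python: it integrates the Student's t density with Simpson's rule and inverts the resulting CDF by bisection. The helper names (`t_pdf`, `t_cdf`, `t_critical`, `reject_h0`) are hypothetical, and this is a teaching approximation rather than a production routine.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule over [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(alpha, df):
    """Two-tailed critical value t_{alpha/2, df}, found by bisection on [0, 50]."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def reject_h0(t_stat, alpha, df):
    """Decision rule: reject H0 if t > t_{alpha/2, df} or t < -t_{alpha/2, df}."""
    crit = t_critical(alpha, df)
    return t_stat > crit or t_stat < -crit


print(round(t_critical(0.05, 13), 3))  # 2.16
print(round(t_critical(0.10, 13), 3))  # 1.771
```

With 13 degrees of freedom the two-tailed .05 critical value is about 2.160; 1.771 corresponds to a two-tailed .10 test (equivalently, a one-tailed .05 test). Where SciPy is available, `scipy.stats.t.ppf(0.975, 13)` returns the same critical value.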
Regression Analysis: Testing the Significance of the Slope – Example
Step 5: Take a sample, calculate the ANOVA (Excel), and arrive at a decision.
Decision: Reject the null hypothesis that the slope of the regression equation is equal to zero.
Regression Analysis: Testing the Significance of the Slope – Example
Step 6: Interpret the result. For the regression equation that predicts the number of copier sales based on the number of sales calls, the data indicate that the slope (0.2606) is not equal to zero. Therefore, the slope can be interpreted and used to relate the dependent variable (number of copier sales) to the independent variable (number of sales calls). In fact, the value of the slope indicates that for an increase of 1 sales call, the number of copiers sold is predicted to increase by 0.2606. If a salesperson increases their number of sales calls by 10, the number of copiers sold is predicted to increase by 2.606. As in correlation analysis, note that this statistical analysis does not provide any evidence of a causal relationship; another type of study is needed to test that hypothesis.
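Because the regression equation is linear, the predicted change in Y for a change in X is the slope times that change, whatever the starting value of X. A minimal Python check using the Excel coefficients reported for the example; the `predict` helper is illustrative.

```python
A, B = 19.9800, 0.2606  # Excel intercept and slope for the copier example


def predict(calls):
    """Predicted copiers sold for a given number of sales calls."""
    return A + B * calls


for start in (20, 50, 100):
    change = predict(start + 10) - predict(start)
    print(start, round(change, 4))  # 2.606 for every starting value
```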
Regression Analysis: The Standard Error of Estimate
- The standard error of estimate measures the scatter, or dispersion, of the observed values around the line of regression for a given value of X.
- The standard error of estimate is important in the calculation of confidence and prediction intervals.
- Formula used to compute the standard error: $s_{y\cdot x} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$
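A sketch of the standard error of estimate in Python, fitting the least squares line first and then applying $s_{y\cdot x} = \sqrt{\sum (Y - \hat{Y})^2/(n - 2)}$. The data are hypothetical, and the function name is illustrative.

```python
import math
from statistics import mean


def standard_error_of_estimate(x, y):
    """s_{y.x} = sqrt(sum((Y - Y-hat)**2) / (n - 2)) for the least squares line."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least three observations so that n - 2 > 0")
    x_bar, y_bar = mean(x), mean(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    # Squared vertical distances between observed Y and the regression line.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))


calls = [20, 40, 30, 60, 10]  # hypothetical data
sold = [30, 60, 40, 60, 30]
print(round(standard_error_of_estimate(calls, sold), 4))
```

A value of 0 would mean every point lies on the line; larger values mean more scatter around it.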
Regression Analysis ANOVA: The Standard Error of Estimate – Example
Recall the example involving Copier Sales of America. The sales manager determined the least squares regression equation. Determine the standard error of estimate as a measure of how well the values fit the regression line.
Regression Analysis: Coefficient of Determination
The coefficient of determination ($r^2$) is the proportion of the total variation in the dependent variable (Y) that is explained or accounted for by the variation in the independent variable (X). It is the square of the coefficient of correlation.
- It ranges from 0 to 1.
- It does not provide any information on the direction of the relationship between the variables.
Regression Analysis: Coefficient of Determination – Example
- The coefficient of determination, $r^2$, is 0.748. It can be computed as the correlation coefficient squared: $(0.865)^2$.
- The coefficient of determination is expressed as a proportion or percent; we say that 74.8 percent of the variation in the number of copiers sold is explained, or accounted for, by the variation in the number of sales calls.
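The arithmetic in the first bullet, as a short Python check using the correlation coefficient reported for the example:

```python
r = 0.865  # correlation coefficient for sales calls vs. copiers sold
r_squared = r ** 2
print(round(r_squared, 3))  # 0.748
print(f"{r_squared:.1%} of the variation in copiers sold is explained")  # 74.8%
```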
Coefficient of Determination
The coefficient of determination can also be computed from its definition: divide the Regression Sum of Squares (SSR, the variation in the dependent variable explained by the regression equation) by the Total Sum of Squares (SS Total, the total variation in the dependent variable).
$r^2 = \frac{SSR}{SS\ Total}$
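A sketch of the definition-based calculation, assuming raw paired data; the observations below are hypothetical and the function name is illustrative. For a least squares line, SSR / SS Total should equal the square of the correlation coefficient computed from the same data.

```python
from statistics import mean


def r_squared_from_sums(x, y):
    """r^2 = SSR / SS Total for the least squares line."""
    x_bar, y_bar = mean(x), mean(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    y_hat = [a + b * xi for xi in x]
    ss_total = sum((yi - y_bar) ** 2 for yi in y)  # total variation in Y
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)   # variation explained by the regression
    return ssr / ss_total


calls = [20, 40, 30, 60, 10]  # hypothetical data
sold = [30, 60, 40, 60, 30]
print(round(r_squared_from_sums(calls, sold), 3))
```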
Regression Analysis: Computing Interval Estimates for Y
A regression equation is used to predict or estimate the population value of the dependent variable, Y, for a given X. In general, estimates of population parameters are subject to sampling error. Recall that confidence intervals account for sampling error by providing an interval estimate of a population parameter. In regression analysis, interval estimates are also used to provide a complete picture of the point estimate of Y for a given X by computing an interval estimate that accounts for sampling error. In regression analysis, there are two types of intervals:
- A confidence interval reports the interval estimate for the mean value of Y for a given X.
- A prediction interval reports the interval estimate for an individual value of Y for a particular value of X.
Regression Analysis: Computing Interval Estimates for Y
Assumptions underlying linear regression:
- For each value of X, the Y values are normally distributed.
- The means of these normal distributions of Y values all lie on the regression line.
- The standard deviations of these normal distributions are equal.
- The Y values are statistically independent. This means that in the selection of a sample, the Y values chosen for a particular X value do not depend on the Y values for any other X values.
Regression Analysis: Computing Interval Estimates for Y – Example
We return to the Copier Sales of America illustration. Determine a 95 percent confidence interval for the population mean number of copiers sold by all sales representatives who make 50 sales calls.
*Note the values of "a" and "b" differ from the Excel values due to rounding.
The point estimate is 19.9632 + 0.2608(50) = 33.0032. Thus, the 95% confidence interval for all sales representatives who make 50 calls is from 27.3942 up to 38.6122. To interpret, let's round the values. For all sales representatives who make 50 calls, the predicted mean number of copiers sold is 33, and the interval estimate for that mean ranges from 27 to 39 copiers.
Regression Analysis: Computing Interval Estimates for Y – Example
Comments on calculation: The confidence interval for the mean of Y at a given X is $\hat{Y} \pm t\, s_{y\cdot x}\sqrt{\frac{1}{n} + \frac{(X - \bar{X})^2}{\sum (X - \bar{X})^2}}$. The critical value of t is 2.160, based on a two-tailed test with n − 2 = 15 − 2 = 13 degrees of freedom. The only new value is $\sum (X - \bar{X})^2$. Note that the width of the interval, or the margin of error when predicting the dependent variable, is related to the standard error of the estimate.
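A sketch of the confidence interval formula in Python, computed from raw paired data. The data are hypothetical, so the critical value passed in is $t_{0.025,3} = 3.182$ for their n − 2 = 3 degrees of freedom rather than the example's 2.160; `confidence_interval` is an illustrative name.

```python
import math
from statistics import mean


def confidence_interval(x, y, x0, t_crit):
    """Interval for the mean of Y at X = x0:
    Y-hat +/- t * s_yx * sqrt(1/n + (x0 - x_bar)**2 / sum((X - x_bar)**2)).
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    # Standard error of estimate with n - 2 degrees of freedom.
    s_yx = math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    y_hat = a + b * x0
    margin = t_crit * s_yx * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat - margin, y_hat + margin


calls = [20, 40, 30, 60, 10]  # hypothetical data
sold = [30, 60, 40, 60, 30]
print(confidence_interval(calls, sold, 50, 3.182))
```

The term $(X - \bar{X})^2$ makes the interval narrowest at the mean of X and wider as X moves away from it.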
Regression Analysis: Computing Interval Estimates for Y – Example
We return to the Copier Sales of America illustration. Determine a 95 percent prediction interval for an individual sales representative, such as Sheila Baker, who makes 50 sales calls.
Thus, the prediction interval for the number of copiers sold by an individual salesperson, such as Sheila Baker, who makes 50 sales calls is from 17.442 up to 48.5644 copiers. Rounding these results, the predicted number of copiers sold will be between 17 and 49. This interval is quite large. It is much larger than the confidence interval for all sales representatives who made 50 calls. It is logical, however, that there should be more variation in the sales estimate for an individual than for the mean of a group.
Regression Analysis: Computing Interval Estimates for Y – Example
Comments on calculation: The prediction interval for an individual Y at a given X is $\hat{Y} \pm t\, s_{y\cdot x}\sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{\sum (X - \bar{X})^2}}$. The critical value of t is 2.160, based on a two-tailed test with n − 2 = 15 − 2 = 13 degrees of freedom. As for the confidence interval, the only new value is $\sum (X - \bar{X})^2$. Note that the width of the interval, or the margin of error when predicting the dependent variable, is related to the standard error of the estimate. Also note that the prediction interval is wider because 1 is added to the sum under the square root sign.
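The confidence and prediction interval formulas share everything except the 1 under the square root. The sketch below computes both margins of error from summary quantities; the argument values in the example call are hypothetical, since the example's $s_{y\cdot x}$ and $\sum (X - \bar{X})^2$ are not reproduced in these slides. It then checks that the reported intervals at 50 calls are both centered on the point estimate $19.9632 + 0.2608(50) = 33.0032$.

```python
import math


def interval_margins(t_crit, s_yx, n, x0, x_bar, sxx):
    """Margins of error at X = x0 for the confidence and prediction intervals.

    sxx is sum((X - x_bar)**2); the prediction interval adds 1 under the root.
    """
    core = 1 / n + (x0 - x_bar) ** 2 / sxx
    ci_margin = t_crit * s_yx * math.sqrt(core)
    pi_margin = t_crit * s_yx * math.sqrt(1 + core)
    return ci_margin, pi_margin


# Hypothetical summary values, for illustration only.
ci_m, pi_m = interval_margins(2.160, 5.0, 15, 50, 60, 1000.0)
print(round(ci_m, 4), round(pi_m, 4))

# Consistency check on the reported intervals at X = 50.
y_hat_50 = 19.9632 + 0.2608 * 50
ci = (27.3942, 38.6122)
pi = (17.442, 48.5644)
print(round(y_hat_50, 4), round(sum(ci) / 2, 4), round(sum(pi) / 2, 4))  # 33.0032 33.0032 33.0032
```

Because only the 1 differs, the prediction margin is always larger than the confidence margin, no matter how large the sample is.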
Regression Analysis: Computing Interval Estimates for Y
[Figure: prediction intervals and confidence intervals]
Regression Analysis: Transforming Non-linear Relationships
One of the assumptions of regression analysis is that the relationship between the dependent and independent variables is LINEAR. Sometimes, two variables have a NON-LINEAR relationship. When this occurs, the data can be transformed to create a linear relationship. Regression analysis is then applied to the transformed variables.
Regression Analysis: Transforming Non-linear Relationships
In this case, the dependent variable, sales, is transformed to log(sales). The graph shows that the relationship between log(sales) and price is linear. Now regression analysis can be used to create the regression equation between log(sales) and price.
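A sketch of the transformation in Python, assuming a base-10 logarithm (the slides do not state the base; the natural log works the same way). The price and sales values are hypothetical, and the function names are illustrative. The least squares fit is applied to log(sales), and predictions are converted back to the original sales scale with $10^{\hat{Y}}$.

```python
import math
from statistics import mean


def fit_log_linear(price, sales):
    """Regress log10(sales) on price; return (a, b) for log10(Y-hat) = a + b * price."""
    logs = [math.log10(s) for s in sales]
    p_bar, l_bar = mean(price), mean(logs)
    b = (sum((p - p_bar) * (l - l_bar) for p, l in zip(price, logs))
         / sum((p - p_bar) ** 2 for p in price))
    a = l_bar - b * p_bar
    return a, b


def predict_sales(a, b, price):
    """Back-transform the prediction from the log scale to the original sales scale."""
    return 10 ** (a + b * price)


price = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical prices
sales = [800, 500, 320, 200, 125]  # hypothetical sales, falling non-linearly
a, b = fit_log_linear(price, sales)
print(f"log10(sales) = {a:.4f} + {b:.4f}(price)")
print(round(predict_sales(a, b, 3.5)))
```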