Every day is a new beginning in life. Every moment is a time for self-vigilance.
Simple Linear Regression
• Scatterplot
• Regression equation
• Correlation
Example: Computer Repair
A company markets and repairs small computers. How fast (Time) an electronic component (Computer Unit) can be repaired is very important to the efficiency of the company. The variables in this example are Time and Units.
Hmm… How long will it take me to repair this unit?
Goal: to predict the length of repair Time for a given number of computer Units.
Computer Repair Data

Units   Minutes
  1        23
  2        29
  3        49
  4        64
  4        74
  5        87
  6        96
  6        97
  7       109
  8       119
  9       149
  9       145
 10       154
 10       166
Graphical Summary of Two Quantitative Variables
Scatterplot of the response variable against the explanatory variable
• What is the overall (average) pattern?
• What is the direction of the pattern?
• How much do data points vary from the overall (average) pattern?
• Any potential outliers?
Summary for Computer Repair Data
Scatterplot (Time vs Units)
Some Simple Conclusions
• Time is linearly related to computer Units.
• (The length of) Time increases as (the number of) Units increases.
• Data points are close to the line.
• No potential outliers.
Numerical Summary of Two Quantitative Variables
• Regression equation
• Correlation
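Review: Math Equation for a Line
• Y: the response variable
• X: the explanatory variable
$Y = b_0 + b_1 X$
[Figure: a line plotted with X on the horizontal axis and Y on the vertical axis; $b_0$ is the intercept and $b_1$ is the rise in Y for a one-unit increase in X.]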
Regression Equation
• The regression line models the relationship between X and Y on average.
• The math equation of a regression line is called the regression equation.
The Predicted Y Value
• We use the regression line to estimate the average Y value for a specified X value, and we use this estimate to predict the Y value we might observe at this X value in the near future.
• This predicted Y value, denoted $\hat{y}$ and pronounced “y hat,” is the Y value on the regression line. So,
$\hat{y} = b_0 + b_1 x$ (regression equation)
The Usage of Regression Equation
• Predict the value of Y for a given X value.
E.g., we wish to predict a lady's weight (WT) from her height (HT).
** What is X? Y?
** Suppose $b_0 = -205$ and $b_1 = 5$:
** For ladies with HT of 60", the predicted WT is $b_0 + b_1 \times 60 = -205 + 5 \times 60 = 95$ pounds, the (estimated) average WT of all ladies with HT of 60".
The Usage of Regression Equation
E.g., how long will it take to repair 3 computer units?
** Suppose $b_0 = 4.16$ and $b_1 = 15.51$:
** The predicted time $= 4.16 + 15.51 \times 3 = 50.69$
** It will take about 50.69 minutes.
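The coefficients quoted above are consistent with a least-squares fit to the 14 repair records in the Computer Repair Data table. The slides do not say how $b_0$ and $b_1$ were obtained, so the least-squares formulas in this sketch are an assumption, and the function name `fit_line` is purely illustrative.

```python
# Computer repair data from the slides: number of units and repair minutes.
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]


def fit_line(x, y):
    """Return (b0, b1) of the least-squares line y = b0 + b1 * x.

    Assumption: ordinary least squares; the slides do not name the method.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx             # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1


b0, b1 = fit_line(units, minutes)
predicted_3 = b0 + b1 * 3      # predicted repair time for 3 units

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"predicted time for 3 units = {predicted_3:.2f} minutes")
```

Rounded to two decimals, the fitted intercept, slope, and prediction agree with the 4.16, 15.51, and 50.69 used on the slide.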
Examples of the Predicted Y
• The predicted WT for a given HT
• The predicted repair time for a given # of units
The Limitation of the Regression Equation
• The regression equation cannot be used to predict the Y value for X values that are (far) beyond the range in which the data were observed.
E.g., given HT of 40", the regression equation gives a WT of $-205 + 5 \times 40 = -5$ pounds!!
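A minimal sketch of this limitation, using the coefficients quoted on the slides. The helper names `predict_weight` and `predict_in_range` are hypothetical. Refusing to predict outside the observed range is one possible safeguard, not something the slides prescribe. The range 1 to 10 for units is read from the repair data table.

```python
def predict_weight(height_in, b0=-205.0, b1=5.0):
    """Predicted weight (pounds) for a height in inches, slide coefficients."""
    return b0 + b1 * height_in


def predict_in_range(x, b0, b1, x_min, x_max):
    """Predict y = b0 + b1 * x only when x lies inside [x_min, x_max]."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} is outside the observed range [{x_min}, {x_max}]"
        )
    return b0 + b1 * x


print(predict_weight(60))   # a height like those in the slide example
print(predict_weight(40))   # extrapolation: the result is a negative weight

# Repair example: units in the data table run from 1 to 10.
print(predict_in_range(3, 4.16, 15.51, 1, 10))
try:
    predict_in_range(25, 4.16, 15.51, 1, 10)
except ValueError as err:
    print("refused:", err)
```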
The Unpredicted Part
• The value $y - \hat{y}$ is the part the regression equation (model) cannot catch, and it is called the “residual.”
[Figure: the residual marked as the vertical gap between a data point and the regression line.]
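As an illustration, the residuals for the computer repair data can be computed as observed minus predicted time. This sketch assumes the rounded coefficients 4.16 and 15.51 quoted on the slides, so the residuals are approximate.

```python
# Computer repair data from the slides: number of units and repair minutes.
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

# Rounded coefficients quoted on the slides (assumed here, not refitted).
b0, b1 = 4.16, 15.51

# Predicted value on the regression line for each observed x.
predicted = [b0 + b1 * x for x in units]

# Residual = observed y minus predicted y.
residuals = [y - y_hat for y, y_hat in zip(minutes, predicted)]

for x, y, y_hat, e in zip(units, minutes, predicted, residuals):
    print(f"units={x:2d}  observed={y:3d}  predicted={y_hat:7.2f}  residual={e:6.2f}")
```

A positive residual means the repair took longer than the line predicts. A negative one means it took less time.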
Correlation between X and Y
• X and Y might be related to each other in many ways: linear or curved.
Examples of Different Levels of Correlation
[Figure: four scatterplots]
• r = .98: strong linearity
• r = .71: moderate linearity
• r = -.09: nearly uncorrelated
• r = .00: curved relationship
Correlation Coefficient of X and Y
• A measurement of the strength of the “LINEAR” association between X and Y
• With $S_x$ the standard deviation of the data values in X and $S_y$ the standard deviation of the data values in Y, the correlation coefficient of X and Y is:
$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{S_x} \right) \left( \frac{y_i - \bar{y}}{S_y} \right)$
Correlation Coefficient of X and Y
• $-1 \le r \le 1$
• The magnitude of r measures the strength of the linear association of X and Y, which is the overall closeness of the points to a line.
• The sign of r indicates the direction of the association: “-” negative association, “+” positive association
** Visit the previous 4 plots again.
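A short sketch of the computation for the computer repair data. It assumes the standard sample form of r: the average product of standardized X and Y values, with divisor n - 1 for both the standard deviations and the final sum. The function names are illustrative.

```python
import math

# Computer repair data from the slides: number of units and repair minutes.
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]


def sample_sd(values):
    """Sample standard deviation (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(x, y):
    """Correlation coefficient as the sum of products of standardized values,
    divided by n - 1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_x, s_y = sample_sd(x), sample_sd(y)
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y))
    return total / (n - 1)


r = correlation(units, minutes)
print(f"r = {r:.4f}")
```

The result is positive and close to 1, matching the earlier reading of the scatterplot: Time increases with Units and the points lie close to a line.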
Correlation Coefficient
• The value of r is almost 0: the best line to fit the data points is nearly horizontal, so the value of X barely changes our prediction of Y.
• The value of r is almost 1: a line fits the data points almost perfectly.
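One way to see why r near 0 goes with a flat line, assuming the regression line is the least-squares line (the slides do not state the fitting method): the slope and intercept can then be written in terms of r, $S_x$, $S_y$ and the two means.

```latex
b_1 = r \, \frac{S_y}{S_x}, \qquad b_0 = \bar{y} - b_1 \bar{x}
\quad\Longrightarrow\quad
r = 0 \;\text{ gives }\; b_1 = 0 \;\text{ and }\; \hat{y} = \bar{y} \;\text{ for every } x .
```

Under that assumption, when r is 0 the prediction is the mean of Y regardless of the value of X.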
Correlation does not Prove Causation
Four ways to interpret an observed association:
• Causation
• There might be causation, but other variables contribute as well
• The association is explained by how other variables affect X and Y
• Y is causing a change in X
Table for Computing Mean, St. Deviation, and Corr. Coef.
[Table: one row for each observation i = 1, 2, …, n, plus a Total row]
Example: Computer Repair Time
Exercise
(1) Fill in the following table, then compute the mean and st. deviation of Y and X
(2) Compute the corr. coef. of Y and X
(3) Draw a scatterplot

i
1  -.3  .09  .1  -.9  .81  .27
2  -.2  .04  .4  -.6  .36  .12
3  -.1  .01  .7
4  .1  .01  1.2
5  .2  .04  1.6
6  .3  .09  2.0
Total  0  *  .1  6.0  *
The Influence of Outliers
• The slope becomes larger (toward the outlier)
• The size of r becomes smaller
The Influence of Outliers
• The slope becomes clear (toward outliers)
• The size of r becomes larger (more linear: from 0.159 to 0.935)
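Both effects can be reproduced with small made-up data sets. The numbers below are hypothetical and are not the data behind the slide plots. `pearson_r` is an illustrative helper that uses the covariance form of r, which is algebraically equivalent to the standardized-product form.

```python
import math


def pearson_r(x, y):
    """Correlation coefficient of two equal-length lists."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical case 1: a tight linear pattern, then one point far above it.
x1 = [1, 2, 3, 4, 5, 6]
y1 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
r1_before = pearson_r(x1, y1)
r1_after = pearson_r(x1 + [7], y1 + [20])      # outlier at (7, 20)

# Hypothetical case 2: a shapeless cloud, then one point far from the cloud.
x2 = [1, 1, 2, 2, 3, 3]
y2 = [1, 3, 2, 3, 1, 2]
r2_before = pearson_r(x2, y2)
r2_after = pearson_r(x2 + [10], y2 + [10])     # outlier at (10, 10)

print(f"case 1: r = {r1_before:.3f} without the outlier, {r1_after:.3f} with it")
print(f"case 2: r = {r2_before:.3f} without the outlier, {r2_after:.3f} with it")
```

In the first case a single point off the line weakens an otherwise strong linear association. In the second, a single distant point creates a strong apparent linear association out of a cloud with almost none.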