Chapter 11 Correlation and Regression
© The McGraw-Hill Companies, Inc., 2000

Outline
• 11-1 Introduction
• 11-2 Scatter Plots
• 11-3 Correlation
• 11-4 Regression

Objectives
• Draw a scatter plot for a set of ordered pairs.
• Find the correlation coefficient.
• Find the equation of the regression line.

Correlation
A correlation exists between two variables when one of them is related to the other in some way.

• Def.: The coefficient of correlation (r) is a numerical measure of the strength of the linear relationship between two variables.
• Values of r are always between -1 and 1; i.e., between 0 and 1 in absolute value.
• r = 0 means no linear correlation; r = ±1 means perfect correlation; both are rare.

Definition of correlation
• Two variables are said to be correlated if they tend to vary together in some direction.
• If both variables tend to increase or decrease together, the correlation is said to be direct or positive. Example: the length of an iron bar increases as the temperature increases.
• If one variable tends to increase as the other decreases, the correlation is said to be negative or inverse. Example: the volume of a gas (at constant temperature) decreases as the pressure increases.

Range of Values for the Correlation Coefficient
[Figure: scale of r labeled "Strong negative relationship", "No linear relationship", and "Strong positive relationship"]

Scatter Plots
• A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable, x, and the dependent variable, y.

Correlation
A relationship between two variables.

Explanatory (Independent) Variable x    Response (Dependent) Variable y
Hours of Training                       Number of Accidents
Shoe Size                               Height
Cigarettes smoked per day               Lung Capacity
Score on SAT                            Grade Point Average
Height                                  IQ

What type of relationship exists between the two variables, and is the correlation significant?

Scatter Plots and Types of Correlation
x = hours of training, y = number of accidents
[Figure: scatter plot of Accidents against Hours of Training]
Negative correlation: as x increases, y decreases.

Scatter Plots and Types of Correlation
x = SAT score, y = GPA
[Figure: scatter plot of GPA against Math SAT]
Positive correlation: as x increases, y increases.

Scatter Plots and Types of Correlation
x = height, y = IQ
[Figure: scatter plot of IQ against Height]
No linear correlation.

No Relationship
[Figure: scatter plot of y against x showing no relationship]

Positive Correlation

N    Ht. (in.)   Wt. (lbs.)
1       60         102
2       62         120
3       63         130
4       65         150
5       65         120
6       68         145
7       69         175
8       70         170
9       72         185
10      74         210

When one variable increases, the other also increases. Example of a positive correlation.

Negative Correlation

N    Study Time (minutes)   # Errors
1          90                  25
2         100                  28
3         130                  20
4         150                  20
5         180                  15
6         200                  12
7         220                  13
8         300                  10
9         350                   8
10        400                   6

When one variable increases, the other decreases. Example of a negative correlation.

Perfect Positive Correlation (r = 1)

N    Cars sold      $
1       10        1000
2       15        1500
3       20        2000
4       25        2500
5       30        3000
6       35        3500
7       40        4000
8       45        4500

Notice the straight line. When r = +1 or -1, all the points fall on a line.

Correlation Coefficient
A measure of the strength and direction of a linear relationship between two variables. The range of r is from -1 to 1.
• If r is close to -1, there is a strong negative correlation.
• If r is close to 0, there is no linear correlation.
• If r is close to 1, there is a strong positive correlation.

Notation for the Linear Correlation Coefficient
n        number of pairs of data present.
Σ        denotes the addition of the items indicated.
Σx       denotes the sum of all x values.
Σx²      indicates that each x score should be squared and then those squares added.
(Σx)²    indicates that the x scores should be added and the total then squared.
Σxy      indicates that each x score should first be multiplied by its corresponding y score; after obtaining all such products, find their sum.
r        represents the linear correlation coefficient for a sample.
ρ        represents the linear correlation coefficient for a population.
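
The difference between Σx² and (Σx)² is easy to miss, so a minimal Python sketch may help. It uses the absences and final-grade values from the Application slide that follows; the variable names are illustrative only and are not part of the slides.

```python
# Absences (x) and final grades (y) from the Application slide.
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]

n = len(x)                                  # number of data pairs
sum_x = sum(x)                              # Σx: add the x values
sum_x_sq = sum(v * v for v in x)            # Σx²: square each x, then add
sq_sum_x = sum(x) ** 2                      # (Σx)²: add the x values, then square
sum_xy = sum(a * b for a, b in zip(x, y))   # Σxy: multiply each pair, then add

print(n, sum_x, sum_x_sq, sq_sum_x, sum_xy)  # 7 57 579 3249 3751
```

Note that Σx² = 579 while (Σx)² = 3249; the two quantities are not interchangeable in the formulas.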

Application
[Figure: scatter plot of Final Grade against Absences]

Absences (x)       8    2    5   12   15    9    6
Final Grade (y)   78   92   90   58   43   74   81

Computation of r

         x      y      xy     x²      y²
1        8     78     624     64    6084
2        2     92     184      4    8464
3        5     90     450     25    8100
4       12     58     696    144    3364
5       15     43     645    225    1849
6        9     74     666     81    5476
7        6     81     486     36    6561
Sum     57    516    3751    579   39898

• The value of r that is computed represents the correlation coefficient of the sample. Since there are 7 ordered pairs, n = 7.
• Substituting the sums for the absences and final-grade data gives r ≈ -0.975. Have students interpret this result.
• Since r is close to -1, there is a strong negative correlation. As the number of absences increases, grades tend to decrease.
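
The computation can be checked with a short Python sketch. It applies the sums form of the correlation coefficient to the absences and final-grade data; the function name pearson_r is a label chosen here, not something from the slides.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient from the computational (sums) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

absences = [8, 2, 5, 12, 15, 9, 6]
grades = [78, 92, 90, 58, 43, 74, 81]
r = pearson_r(absences, grades)
print(round(r, 3))  # -0.975
```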

Scatter Plots - Example
• Construct a scatter plot for the data obtained in a study of age and systolic blood pressure of six randomly selected subjects.
• The data are given on the next slide.

Scatter Plots - Example
[Table: age and blood pressure data for the six subjects]

Scatter Plots - Example
[Figure: scatter plot of the age and blood pressure data] Positive relationship

Formula for the Correlation Coefficient r

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$

where n is the number of data pairs.
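
As a worked instance of the formula, the sums from the absences and final-grade computation (n = 7, Σx = 57, Σy = 516, Σxy = 3751, Σx² = 579, Σy² = 39898) can be substituted directly. The arithmetic below is offered as a check and is not taken from the slides:

```latex
r = \frac{7(3751) - (57)(516)}{\sqrt{\left[7(579) - 57^2\right]\left[7(39898) - 516^2\right]}}
  = \frac{-3155}{\sqrt{(804)(13030)}}
  \approx \frac{-3155}{3236.68}
  \approx -0.975
```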

Correlation Coefficient Example (Verify)
• Compute the correlation coefficient for the age and blood pressure data.

Def.: Regression
The relation between the expected value of the dependent variable and the independent variable is called the regression relation.
Types of regression:
I. Simple regression: when the dependent variable depends on a single independent variable, the relation is called simple (or two-variable) regression.
II. Multiple regression: when the dependent variable depends on two or more independent variables, the relation is called multiple regression.

A regression line can be used to show mathematically how variables are related. To determine the equation of a line, we need to find its slope and y-intercept.
• Example: Pizza House builds restaurants near college campuses.
• Before building another one, it plans to use X = student enrollment (in 1000s) to estimate Y = quarterly sales (in $1000s).
• A sample of 6 existing restaurants is chosen.

Resulting data pairs are shown below.

X     4     6     9    11    12    15
Y    95   155   140   210   250   260

         X      Y      XY     X²       Y²
         4     95     380     16     9025
         6    155     930     36    24025
         9    140    1260     81    19600
        11    210    2310    121    44100
        12    250    3000    144    62500
        15    260    3900    225    67600
SUM     57   1110   11780    623   226850

Formulas for the Regression Line y = a + bx

$$ a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2} \qquad b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} $$

where a is the y-intercept and b is the slope of the line.
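
A minimal Python sketch of these formulas, applied to the Pizza House enrollment and sales data, is given below. The helper name fit_line is hypothetical, and the resulting values of a and b are computed here rather than quoted from the slides.

```python
def fit_line(x, y):
    """Return (a, b) for the line y = a + bx using the sums formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_x2 = sum(p * p for p in x)
    denom = n * sum_x2 - sum_x ** 2            # shared denominator
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    b = (n * sum_xy - sum_x * sum_y) / denom
    return a, b

enrollment = [4, 6, 9, 11, 12, 15]       # X, in thousands of students
sales = [95, 155, 140, 210, 250, 260]    # Y, in thousands of dollars
a, b = fit_line(enrollment, sales)
print(round(a, 3), round(b, 3))  # 41.043 15.153
```

Under this fit, each additional 1000 students is associated with roughly $15,153 more in predicted quarterly sales.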

Regression
The scatter plot for the age and blood pressure data displays a linear pattern. We can model this relationship with a straight line, called the line of best fit or the regression line. The equation of the line is y = a + bx.

Find the regression line from the following data.


Example
• Find the equation of the regression line for the age and blood pressure data.
• Substituting into the formulas gives a = 81.048 and b = 0.964 (verify). Hence, y = 81.048 + 0.964x.
• Note that a represents the intercept and b the slope of the line.

Example
[Figure: scatter plot of the age and blood pressure data with the regression line]
y = 81.048 + 0.964x

Using the Regression Line to Predict
• The regression line can be used to predict a value for the dependent variable (y) for a given value of the independent variable (x).
• Caution: Use x values within the experimental region when predicting y values.

Example
• Use the equation of the regression line to predict the blood pressure for a person who is 50 years old.
• Since y = 81.048 + 0.964x, then y = 81.048 + 0.964(50) = 129.248 ≈ 129.
• Note that the value of 50 is within the range of x values.
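
The prediction and the caution about staying within the observed x range can be sketched in Python as follows. The function predict and its optional x_min and x_max bounds are assumptions made for illustration; the slides do not list the age range in text form, so no bounds are passed in the usage line.

```python
def predict(x, a=81.048, b=0.964, x_min=None, x_max=None):
    """Predict y = a + bx, rejecting x outside [x_min, x_max] when bounds are given."""
    if x_min is not None and x < x_min:
        raise ValueError("x is below the observed range")
    if x_max is not None and x > x_max:
        raise ValueError("x is above the observed range")
    return a + b * x

pressure = predict(50)
print(round(pressure, 3))  # 129.248
```

Passing the smallest and largest observed ages as x_min and x_max would make the function refuse to extrapolate.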

Prediction Interval - Example

A    1    $62
B    2    $78
C    3    $70
D    4    $90
E    4    $93
F    6    $103

The Line of Regression
• Once you know there is a significant linear correlation, you can write an equation describing the relationship between the x and y variables. This equation is called the line of regression or least-squares line.
• The equation of a line may be written as y = mx + b, where m is the slope of the line and b is the y-intercept.

The line of regression is: $$ \hat{y} = mx + b $$
The slope m is: $$ m = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} $$
The y-intercept is: $$ b = \bar{y} - m\bar{x} = \frac{\sum y}{n} - m\,\frac{\sum x}{n} $$

Application
[Figure: scatter plot of Final Grade against Absences]
Write the equation of the line of regression with x = number of absences and y = final grade. Calculate m and b.

         x      y      xy     x²      y²
1        8     78     624     64    6084
2        2     92     184      4    8464
3        5     90     450     25    8100
4       12     58     696    144    3364
5       15     43     645    225    1849
6        9     74     666     81    5476
7        6     81     486     36    6561
Sum     57    516    3751    579   39898

The line of regression is: ŷ = -3.924x + 105.667

The Line of Regression
m = -3.924 and b = 105.667
The line of regression is: ŷ = -3.924x + 105.667
[Figure: scatter plot of Final Grade against Absences with the line of regression drawn]
Note that the point (x̄, ȳ) = (8.143, 73.714) is on the line.
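
The slope, the intercept, and the claim about the mean point can be checked with a short Python sketch. The variable names are illustrative, and the intercept is computed as ȳ minus m times x̄, which is one standard way to obtain it.

```python
x = [8, 2, 5, 12, 15, 9, 6]          # absences
y = [78, 92, 90, 58, 43, 74, 81]     # final grades
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(p * q for p, q in zip(x, y))
sum_x2 = sum(p * p for p in x)

m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
x_bar, y_bar = sum_x / n, sum_y / n                           # sample means
b = y_bar - m * x_bar                                         # y-intercept

print(round(m, 3))                        # -3.924
print(round(x_bar, 3), round(y_bar, 3))   # 8.143 73.714
# The mean point lies on the line: m * x_bar + b equals y_bar up to rounding.
print(abs(m * x_bar + b - y_bar) < 1e-9)  # True
```

At full precision the intercept comes out near 105.668; the slide's 105.667 is what results when the rounded values m = -3.924, x̄ = 8.143, and ȳ = 73.714 are used.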

Multiple Regression
More Explanatory Variables

Absence    IQ    Grade
   8      115     78
   2      135     92
   5      126     90
  12      110     58
  15      105     43
   9      120     74
   6      125     81

Minitab Output
Regression Analysis
The regression equation is
Grade = 52.7 - 2.65 absence + 0.357 IQ

Predictor      Coef     StDev       T       P
Constant     52.720    86.110    0.61   0.573
Absence      -2.652     2.111   -1.26   0.277
IQ            0.357     0.580    0.62   0.571

S = 4.603    R-Sq = 95.4%    R-Sq(adj) = 93.2%
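
Reading the output as columns Coef, StDev, T, and P, each T value should equal the coefficient divided by its standard deviation. The Python sketch below checks that reading against the numbers shown; it does not refit the model, and the dictionary layout is an assumption made for illustration.

```python
# (coefficient, standard deviation, reported T) as read from the Minitab output.
rows = {
    "Constant": (52.720, 86.110, 0.61),
    "Absence": (-2.652, 2.111, -1.26),
    "IQ": (0.357, 0.580, 0.62),
}

computed = {}
for name, (coef, st_dev, reported_t) in rows.items():
    computed[name] = round(coef / st_dev, 2)   # T = Coef / StDev
    print(name, computed[name], reported_t)
```

All three ratios agree with the reported T values to two decimal places, which supports this assignment of the flattened numbers to columns.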

Interpretation
The regression equation is Grade = 52.7 - 2.65 absence + 0.357 IQ.
• When absences and IQ are both 0, the predicted grade is 52.7.
• If IQ is held constant, each additional absence decreases the predicted grade by 2.65 points.
• If the number of absences is held constant, each one-point increase in IQ increases the predicted grade by 0.357 points.
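
The "held constant" reading of the coefficients can be illustrated by evaluating the stated equation at nearby inputs. This is a minimal Python sketch that uses the rounded coefficients from the regression equation; predicted_grade is a hypothetical helper name, and the starting point (8 absences, IQ 115) is the first row of the multiple regression data.

```python
def predicted_grade(absences, iq):
    """Evaluate the stated equation Grade = 52.7 - 2.65*absence + 0.357*IQ."""
    return 52.7 - 2.65 * absences + 0.357 * iq

base = predicted_grade(8, 115)
one_more_absence = predicted_grade(9, 115)   # IQ held constant
one_more_iq_point = predicted_grade(8, 116)  # absences held constant

print(round(one_more_absence - base, 3))   # -2.65
print(round(one_more_iq_point - base, 3))  # 0.357
```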