Regression analysis: linear regression and logistic regression

  • Slides: 32
Regression analysis Linear regression Logistic regression

Relationship and association 2

Straight line 3

Best straight line? 4

Best straight line! Least squares estimation 5
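As a rough illustration of least squares estimation, the sketch below computes the intercept b0 and slope b1 that minimise the sum of squared vertical distances between the points and the line. The function name and the data are hypothetical and are not taken from the slides.

```python
def least_squares_line(x, y):
    """Return (b0, b1) for the line y = b0 + b1*x that minimises
    the sum of squared vertical residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept: the line passes through the means
    return b0, b1


# Hypothetical data lying exactly on y = 1 + 2x
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
b0, b1 = least_squares_line(x, y)
print(b0, b1)
```

With real measurements the points scatter around the line, and b0 and b1 are the values that make that scatter as small as possible in the squared sense.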

Simple linear regression 6
1. Is the association linear?

Simple linear regression 7
1. Is the association linear?
2. Describe the association: what are b0 and b1?
   BMI = -12.6 kg/m^2 + 0.35 kg/m^3 * Hip

Simple linear regression 8
1. Is the association linear?
2. Describe the association
3. Is the slope significantly different from 0? Help SPSS!!!
Coefficients (dependent variable: BMI), Model 1
              B (unstandardized)   Std. Error   Beta (standardized)   t        Sig.
(Constant)    -12.581              2.331                              -5.396   0.000
Hip           0.345                0.023        0.565                 15.266   0.000
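The t column in the SPSS table is the coefficient divided by its standard error. A minimal sketch using the rounded values shown in the table (the function name is an assumption):

```python
def t_statistic(b, std_error):
    """t = estimated coefficient / its standard error."""
    return b / std_error


# Rounded values as displayed in the SPSS coefficients table
t_constant = t_statistic(-12.581, 2.331)
t_hip = t_statistic(0.345, 0.023)
print(round(t_constant, 2), round(t_hip, 2))
```

This gives about -5.40 and 15.0. SPSS reports -5.396 and 15.266 because it divides the unrounded estimates, so small differences from the displayed table are expected.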

Simple linear regression 9
1. Is the association linear?
2. Describe the association
3. Is the slope significantly different from 0?
4. How good is the fit? How far are the data points from the line on average?

The Correlation Coefficient, r 10
[Figure: four example scatter plots with R = 0, R = 1, R = 0.7 and R = -0.5]

r^2: Goodness of fit 11
How much of the variation can be explained by the model?
[Figure: four example scatter plots with R^2 = 0, R^2 = 1, R^2 = 0.5 and R^2 = 0.2]
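For a simple linear regression with one independent variable, the goodness of fit R^2 equals the squared Pearson correlation coefficient r. The sketch below computes both for a small hypothetical data set; the function name and the numbers are assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, not from the slides
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
r_squared = r ** 2  # share of the variation in y explained by the line
print(round(r, 3), round(r_squared, 3))
```

Here r is 0.8, so the fitted line explains about 64% of the variation in y.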

Multiple linear regression 12
Could waist measure describe some of the variation in BMI?
BMI = 1.3 kg/m^2 + 0.42 kg/m^3 * Waist
Or even better:

Multiple linear regression 13
Adding age: adj R^2 = 0.352
Coefficients (dependent variable: BMI), Model 1
              B        Std. Error   Beta     t        Sig.    95.0% CI for B (lower, upper)
(Constant)    -9.001   2.449                 -3.676   0.000   -13.813, -4.190
Waist         0.168    0.043        0.201    3.923    0.000   0.084, 0.252
Hip           0.252    0.031        0.411    8.012    0.000   0.190, 0.313
Age           -0.064   0.018        -0.126   -3.492   0.001   -0.101, -0.028
Adding thigh: adj R^2 = 0.352?
Coefficients (dependent variable: BMI), Model 1
              B        Std. Error   Beta     t        Sig.    95.0% CI for B (lower, upper)
(Constant)    3.581    1.784                 2.007    0.045   0.075, 7.086
Waist         0.168    0.043        0.201    3.923    0.000   0.084, 0.252
Age           -0.064   0.018        -0.126   -3.492   0.001   -0.101, -0.028
Thigh         0.252    0.031        0.411    8.012    0.000   0.190, 0.313
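The adjusted R^2 quoted above penalises R^2 for the number of independent variables, so adding a variable only raises it if the fit improves enough. A minimal sketch of the usual formula; the function name and the example numbers are hypothetical, since the slides give neither the sample size nor the unadjusted R^2.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1),
    where n is the number of observations and p the number of predictors."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical numbers, not taken from the slides
adj = adjusted_r_squared(r_squared=0.5, n_obs=100, n_predictors=3)
print(round(adj, 4))
```

With many observations and few predictors the adjustment is small; it matters most when the sample is small relative to the number of variables.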

Assumptions 14
1. The dependent variable must be metric continuous
2. Independent variables must be continuous or ordinal
3. Linear relationship between the dependent and all independent variables
4. Residuals must have a constant spread
5. Residuals are normally distributed
6. Independent variables are not perfectly correlated with each other

Non-parametric correlation 15

Ranked Correlation 16
Kendall's tau and Spearman's rs
The correlation coefficient lies between -1 and 1, where -1 is perfect inverse correlation, 0 means no correlation, and 1 means perfect correlation.
Pearson is the correlation method for normal data.
Remember the assumptions:
1. The dependent variable must be metric continuous
2. Independent variables must be continuous or ordinal
3. Linear relationship between the dependent and all independent variables
4. Residuals must have a constant spread
5. Residuals are normally distributed

Kendall's tau: an example 17

Kendall's tau: an example 18
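The data of the slide example did not survive in this transcript, so the sketch below uses hypothetical ranks. It computes Kendall's tau by counting concordant and discordant pairs, assuming there are no ties (in that case tau-a and the tau_b reported by SPSS coincide). The function name is an assumption.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables order the pair the same way
        elif product < 0:
            discordant += 1   # the variables order the pair in opposite ways
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical ranks, not the data from the slide
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(round(kendall_tau_a(x, y), 3))
```

Of the six pairs, five are concordant and one is discordant, giving tau = 4/6.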

Spearman: the same example 19
d^2: 1 4 9 1 1 1 9 9 1 16
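From the ten squared rank differences listed on the slide, Spearman's rs can be computed with the textbook shortcut rs = 1 - 6*sum(d^2) / (n*(n^2 - 1)). This shortcut assumes there are no tied ranks, which the transcript does not confirm; the function name is an assumption.

```python
def spearman_from_squared_differences(d_squared):
    """Spearman's rs from squared rank differences (no-ties formula)."""
    n = len(d_squared)
    return 1.0 - 6.0 * sum(d_squared) / (n * (n ** 2 - 1))


# Squared rank differences as listed on the slide
d_squared = [1, 4, 9, 1, 1, 1, 9, 9, 1, 16]
rs = spearman_from_squared_differences(d_squared)
print(round(rs, 3))
```

The sum of d^2 is 52 and n is 10, so rs = 1 - 312/990, about 0.685, which agrees with the Spearman's rho that SPSS reports for this example.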

Correlation in SPSS 20

Correlation in SPSS 21
Correlations between a and b (N = 10)
Method             Correlation coefficient   Sig. (2-tailed)
Pearson            0.685*                    0.029
Kendall's tau_b    0.511*                    0.040
Spearman's rho     0.685*                    0.029
*. Correlation is significant at the 0.05 level (2-tailed).

Logistic regression 22

Logistic Regression 23
  • What if the dependent variable is categorical, and especially binary?
  • Use some interpolation method
  • Linear regression cannot help us.

The sigmoidal curve 24

The sigmoidal curve 25
  • The intercept basically just 'shifts' the curve along the input variable

The sigmoidal curve 26
  • The intercept basically just 'shifts' the curve along the input variable
  • Large regression coefficient → risk factor strongly influences the probability

The sigmoidal curve 27
  • The intercept basically just 'shifts' the curve along the input variable
  • Large regression coefficient → risk factor strongly influences the probability
  • Positive regression coefficient → risk factor increases the probability
  • Logistic regression uses maximum likelihood estimation, not least squares estimation
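The curve on these slides can be sketched as p = 1 / (1 + e^-(b0 + b1*x)), with intercept b0 and regression coefficient b1. The function name and the numbers below are assumptions chosen only to show how the two parameters act.

```python
import math


def logistic(x, b0, b1):
    """Probability from the logistic (sigmoidal) curve with intercept b0
    and regression coefficient b1."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


# The curve crosses p = 0.5 where b0 + b1*x = 0, i.e. at x = -b0/b1,
# so the intercept moves the crossing point along the x axis.
print(logistic(0.0, b0=0.0, b1=1.0))    # crossing at x = 0
print(logistic(2.0, b0=-2.0, b1=1.0))   # crossing moved to x = 2

# A larger coefficient makes the probability change faster with x;
# a negative coefficient makes the probability fall as x grows.
print(logistic(1.0, b0=0.0, b1=1.0))
print(logistic(1.0, b0=0.0, b1=2.0))
print(logistic(1.0, b0=0.0, b1=-1.0))
```

The first two calls both return 0.5; the remaining three show the probability at x = 1 rising with a larger positive coefficient and dropping below 0.5 with a negative one.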

Does age influence the diagnosis? 28
Continuous independent variable
Variables in the Equation (step 1; variable entered: Age)
            B        S.E.    Wald      df   Sig.    Exp(B)   95% C.I. for Exp(B) (lower, upper)
Age         0.109    0.010   108.745   1    0.000   1.115    1.092, 1.138
Constant    -4.213   0.423   99.097    1    0.000   0.015
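The Exp(B) column is the odds ratio: e raised to the coefficient B. A minimal sketch using the rounded age coefficient from the table; the approximate 95% interval uses B ± 1.96 × S.E., which is the usual Wald interval and is assumed here rather than confirmed by the slides.

```python
import math


def odds_ratio_with_ci(b, std_error, z=1.96):
    """Return (odds ratio, lower, upper) using exp(B) and exp(B ± z*S.E.)."""
    return (
        math.exp(b),
        math.exp(b - z * std_error),
        math.exp(b + z * std_error),
    )


# Rounded values for Age as displayed in the SPSS table
odds_ratio, lower, upper = odds_ratio_with_ci(0.109, 0.010)
print(round(odds_ratio, 3), round(lower, 3), round(upper, 3))
```

The odds ratio comes out at about 1.115, meaning the odds increase by roughly 11.5% per additional year of age. The interval is close to the 1.092 to 1.138 reported by SPSS; the small difference comes from the rounded inputs.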

Does previous intake of OCP influence the diagnosis? 29
Categorical independent variable
Variables in the Equation (step 1; variable entered: OCP)
            B        S.E.    Wald     df   Sig.    Exp(B)   95% C.I. for Exp(B) (lower, upper)
OCP(1)      -0.311   0.180   2.979    1    0.084   0.733    0.515, 1.043
Constant    0.233    0.123   3.583    1    0.058   1.263

Odds ratio 30

Multiple logistic regression 31
Variables in the Equation (step 1; variables entered: Age, BMI, OCP)
            B        S.E.    Wald      df   Sig.    Exp(B)   95% C.I. for Exp(B) (lower, upper)
Age         0.123    0.011   115.343   1    0.000   1.131    1.106, 1.157
BMI         0.083    0.019   18.732    1    0.000   1.087    1.046, 1.128
OCP         0.528    0.219   5.808     1    0.016   1.695    1.104, 2.603
Constant    -6.974   0.762   83.777    1    0.000   0.001

Predicting the diagnosis by logistic regression 32
What is the probability that the tumor of a 50-year-old woman who has been using OCP and has a BMI of 26 is malignant?
z = -6.974 + 0.123*50 + 0.083*26 + 0.528*1 = 1.862
p = 1/(1 + e^(-1.862)) = 0.866
Variables in the Equation (step 1; variables entered: Age, BMI, OCP)
            B        S.E.    Wald      df   Sig.    Exp(B)   95% C.I. for Exp(B) (lower, upper)
Age         0.123    0.011   115.343   1    0.000   1.131    1.106, 1.157
BMI         0.083    0.019   18.732    1    0.000   1.087    1.046, 1.128
OCP         0.528    0.219   5.808     1    0.016   1.695    1.104, 2.603
Constant    -6.974   0.762   83.777    1    0.000   0.001
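The prediction can be sketched in a few lines, taking the B values from the Variables in the Equation table (constant -6.974, Age 0.123, BMI 0.083, OCP 0.528). The function and parameter names are assumptions.

```python
import math

# Coefficients B as displayed in the SPSS table
CONSTANT = -6.974
B_AGE = 0.123
B_BMI = 0.083
B_OCP = 0.528


def predicted_probability(age, bmi, ocp):
    """Return (z, p): the linear predictor z and the probability p = 1/(1 + e^-z).
    ocp is 1 for previous OCP use and 0 otherwise."""
    z = CONSTANT + B_AGE * age + B_BMI * bmi + B_OCP * ocp
    p = 1.0 / (1.0 + math.exp(-z))
    return z, p


z, p = predicted_probability(age=50, bmi=26, ocp=1)
print(round(z, 3), round(p, 3))
```

With these table values z is 1.862 and the predicted probability is about 0.866.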