Relationship between two continuous variables: correlations and linear regression. Both variables are continuous. Correlation: larger values of one variable correspond to larger (or smaller) values of the other. r measures the strength of the relationship; it ranges from +1 to −1: zero means no relationship, ±1 means the points lie exactly on a straight line. p measures statistical significance, i.e. whether r differs significantly from zero. Parametric (Pearson) correlation assumes a normal distribution of both variables.
We start calculating Pearson's r from the covariance:

cov(x, y) = Σ(x_i − x̄)(y_i − ȳ) / (n − 1)

Its value depends on the units of measurement, which is not convenient, so let's rescale it by the standard deviations:

r = cov(x, y) / (s_x · s_y)

and the result is between −1 and +1.
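As a minimal sketch of the two steps above, the covariance can be computed first and then rescaled by the standard deviations; the data values are made up for illustration, not from the lecture.

```python
import numpy as np

# Hypothetical illustrative data (not from the lecture)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

n = len(x)
# Covariance: mean cross-product of deviations (sample version, n - 1)
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)
# Rescale by the standard deviations so the result lands in [-1, +1]
r = cov_xy / (x.std(ddof=1) * y.std(ddof=1))
```

The rescaled value agrees with `np.corrcoef`, which computes Pearson's r directly.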
[Scatterplot panels illustrating different correlation strengths: r = 0.96, r = −0.96, r = 0.53, r = −0.83, r = 0.43, r = −1]
Non-parametric correlation relies on ranks, so single observations far from the rest do not disturb it. The usual choice is Spearman's (rank) correlation. Its power is lower, but there are also real differences: what should we think about a non-linear relationship? It also suits ordinal variables. A philosophical aspect: we can describe the same thing differently in mathematical terms!
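A small sketch of the rank idea: Spearman's correlation is just Pearson's r computed on the ranks, so a monotone but non-linear relationship with one extreme point still gives a perfect rank correlation. The data below are invented for illustration and have no ties.

```python
import numpy as np

def rankdata(a):
    # Simple ranking (this illustrative data has no ties)
    order = np.argsort(a)
    ranks = np.empty(len(a))
    ranks[order] = np.arange(1, len(a) + 1)
    return ranks

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([1, 4, 9, 16, 25, 36, 49, 200], dtype=float)  # monotone, one point far away

r_pearson = np.corrcoef(x, y)[0, 1]                          # pulled down by the outlier
r_spearman = np.corrcoef(rankdata(x), rankdata(y))[0, 1]     # ranks are perfectly monotone
```

Here the single distant observation weakens Pearson's r but leaves Spearman's untouched.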
We report the result: "between ... there was a correlation (r = , N = , p = )", or if non-parametric, ". . . (r_s = ; N = , p = )". Correlation is symmetrical and dimensionless. To approximate the relationship by a function, we use regression. The least-squares method minimizes the (squared) residuals and yields predicted values. The fitted line has two parameters: intercept and slope (b). The slope has a unit; its value depends on the units of the axes.
[Example plot: eggs laid vs weight (kg), y = 2.04x − 1.2]
[Example plot: wool production (kg) vs hours basked, y = −0.195x + 7.1]
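The least-squares fit can be sketched directly from the deviation sums; the measurements and variable roles below are hypothetical, chosen only to show how slope and intercept come out of the formulas.

```python
import numpy as np

# Hypothetical measurements (illustrative, not the lecture's data)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # predictor, e.g. weight
y = np.array([1.1, 2.9, 4.8, 7.2, 9.0])   # response, e.g. eggs laid

# Least squares: slope b = cov(x, y) / var(x); intercept a = mean(y) - b * mean(x)
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

predicted = a + b * x
residuals = y - predicted   # the quantities whose squares the method minimizes
```

With an intercept in the model, the residuals sum to zero, and the result matches `np.polyfit(x, y, 1)`.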
The test follows the path of ANOVA: F = MS_model / MS_error, with SS_total = SS_model + SS_error and R² = SS_model / SS_total, i.e. the model accounts for ...% of the variance. There are two ways to express the strength of the relationship: the slope and R². p does not measure the strength of the relationship.
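The ANOVA decomposition can be sketched for a simple regression (df_model = 1, df_error = n − 2); the data are the same hypothetical values as above, made up for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 4.8, 7.2, 9.0])
n = len(x)

# Least-squares fit
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
pred = a + b * x

ss_total = np.sum((y - y.mean()) ** 2)   # total variation in y
ss_error = np.sum((y - pred) ** 2)       # unexplained variation
ss_model = ss_total - ss_error           # SS_total = SS_model + SS_error

r2 = ss_model / ss_total                 # share of variance the model accounts for
f = (ss_model / 1) / (ss_error / (n - 2))  # F = MS_model / MS_error
```

Note how R² summarizes strength while F (and its p) summarize significance; the two answer different questions.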
Presenting results: "weight depended on length (b = . . . , R² = . . . , df = . . . , F = . . . , p < 0.001)"; equation: length = 3.78 * temperature + 47.6. Report also the standard error of the slope. An intercept of zero means proportionality: if x changes k times, then y also changes k times. Regression is not symmetrical!
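For reporting, the standard error of the slope can be sketched as SE(b) = sqrt(MS_error / S_xx); this is the textbook formula for simple regression, applied here to the same invented data as before.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 4.8, 7.2, 9.0])
n = len(x)

sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / sxx
a = y.mean() - b * x.mean()

# Residual mean square with n - 2 degrees of freedom
ms_error = np.sum((y - (a + b * x)) ** 2) / (n - 2)
se_b = np.sqrt(ms_error / sxx)   # standard error of the slope, same unit as b
```

The ratio b / SE(b) is the t statistic used to test whether the slope differs from zero.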
Assumptions of regression analysis:
- residuals should be normally distributed;
- the variance of the residuals must be independent of the values of x (otherwise the data are heteroscedastic);
- no other dependence on x.
The distribution of the x variable itself is not important. Transformations may be used, but do not forget them when writing the equation. Regression through the origin is also possible.
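A rough sketch of checking the second assumption: fit the line, then compare the residual spread at small and large x. The simulated data (a hypothetical example with homoscedastic errors) should show similar spread in both halves; real analyses would normally inspect a residual plot instead.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1, size=50)   # simulated homoscedastic errors

# Least-squares fit and residuals
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
res = y - (a + b * x)

# Crude heteroscedasticity check: residual spread in the lower vs upper half of x
low = res[x < x.mean()]
high = res[x >= x.mean()]
ratio = low.std(ddof=1) / high.std(ddof=1)   # should be near 1 for this data
```

If the spread grew systematically with x, a transformation of y (remembered when writing the equation!) would be one remedy.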