Multivariate Data Summary
Linear Regression and Correlation
Pearson’s correlation coefficient r.
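Pearson's r is usually computed as $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$, where $S_{xy} = \sum_i (x_i-\bar{x})(y_i-\bar{y})$, $S_{xx} = \sum_i (x_i-\bar{x})^2$ and $S_{yy} = \sum_i (y_i-\bar{y})^2$. Below is a minimal Python sketch of that textbook formula. The function name `pearson_r` and the data are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: points lying exactly on an increasing straight line.
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # → 1.0
```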
Slope and Intercept of the Least Squares line
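The least squares line $y = a + bx$ has the standard slope $b = S_{xy}/S_{xx}$ and intercept $a = \bar{y} - b\bar{x}$, built from the same sums of squares as r. A minimal sketch, using a hypothetical helper `least_squares_line` and made-up points that lie exactly on $y = 2 + 3x$:

```python
def least_squares_line(x, y):
    """Return (intercept a, slope b) of the least squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b

a, b = least_squares_line([0, 1, 2, 3], [2, 5, 8, 11])  # points on y = 2 + 3x
print(a, b)  # → 2.0 3.0
```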
Scatter Plot Patterns (figure: example scatter plots with r = 0.0, r = +0.9, r = +0.7 and r = +1.0)
Non-Linear Patterns: If the pattern is non-linear, r can take any value between -1 and +1, depending on how well a straight line can be fitted to the pattern.
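To illustrate the non-linear case, the sketch below computes r for the exact relationship $y = x^2$ on two hypothetical sets of x values. When the x values are symmetric about 0, no straight line fits and r is 0 even though y is completely determined by x; on one side of the parabola a line fits fairly well and r is close to 1. The helper `pearson_r` repeats the textbook formula $S_{xy}/\sqrt{S_{xx}S_{yy}}$ so the block runs on its own.

```python
import math

def pearson_r(x, y):
    """Pearson's r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# y = x**2 is an exact (but curved) relationship.
x_sym = [-2, -1, 0, 1, 2]   # symmetric around 0: no straight line fits
x_pos = [0, 1, 2, 3, 4]     # one side of the parabola: a line fits fairly well
print(pearson_r(x_sym, [v ** 2 for v in x_sym]))  # 0.0
print(pearson_r(x_pos, [v ** 2 for v in x_pos]))  # about 0.96
```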
The Coefficient of Determination
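An Important Identity in Statistics: (Total variability in Y) = (variability in Y explained by X) + (variability in Y unexplained by X), that is, $\sum_i (y_i - \bar{y})^2 = \sum_i (\hat{y}_i - \bar{y})^2 + \sum_i (y_i - \hat{y}_i)^2$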
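It can also be shown that $r^2 = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}$ = the proportion of the variability in Y explained by X = the coefficient of determination, where $\hat{y}_i$ is the fitted value from the least squares line.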
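A quick numerical check of the identity and of $r^2$ as the proportion of explained variability. This is a sketch on hypothetical data; `variance_decomposition` is an illustrative helper that fits the least squares line and returns the three sums of squares along with r.

```python
import math

def variance_decomposition(x, y):
    """Return (SS_total, SS_explained, SS_unexplained, r) for a least squares fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    y_hat = [a + b * xi for xi in x]                       # fitted values
    ss_total = s_yy                                        # total variability in Y
    ss_explained = sum((yh - y_bar) ** 2 for yh in y_hat)  # explained by X
    ss_unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual
    r = s_xy / math.sqrt(s_xx * s_yy)
    return ss_total, ss_explained, ss_unexplained, r

# Hypothetical data, for illustration only.
sst, ssr, sse, r = variance_decomposition([1, 2, 3, 4, 5], [2, 3, 5, 4, 6])
print(sst, ssr + sse)     # equal up to rounding
print(r ** 2, ssr / sst)  # equal up to rounding: the coefficient of determination
```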
Categorical Data: Techniques for summarizing, displaying and graphing
The frequency table and the bar graph
Suppose we have collected data on a categorical variable X having k categories (1, 2, …, k). To construct the frequency table, we simply count, for each category i of X, the number of cases falling in that category ($f_i$). To plot the bar graph, we simply draw a bar of height $f_i$ above each category i of X.
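The two steps can be sketched in Python with `collections.Counter`. The responses are hypothetical, and the "bar graph" is drawn sideways in text (one `#` per case) rather than as a real plot, purely to keep the example dependency-free.

```python
from collections import Counter

def frequency_table(data, categories):
    """Count f_i, the number of cases in each category i (zero if absent)."""
    counts = Counter(data)
    return {c: counts.get(c, 0) for c in categories}

def text_bar_graph(freq, unit=1):
    """Draw a horizontal bar of length f_i (one '#' per `unit` cases) per category."""
    for category, f in freq.items():
        print(f"{category:>3} | {'#' * (f // unit)} {f}")

# Hypothetical responses on a categorical variable X with k = 4 categories.
responses = [1, 3, 2, 1, 4, 1, 2, 3, 1, 2]
table = frequency_table(responses, categories=[1, 2, 3, 4])
text_bar_graph(table)
```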
Example
In this example, data have been collected for n = 34,188 subjects.
• The purpose of the study was to determine the relationship between the use of antidepressants, mood medication, anxiety medication, stimulants and sleeping pills.
• In addition, the study was interested in examining the effects of the independent variables (gender, age, income, education and role) on both the individual use and the multiple use of the medications.
The variables were: 1. Antidepressant use, 2. Mood medication use, 3. Anxiety medication use, 4. Stimulant use, 5. Sleeping pill use, 6. Gender, 7. Age, 8. Income, 9. Education and 10. Role (i. Parent, worker, partner; …; v. Worker only; vi. Parent only; vii. Partner only; viii. No roles). All variables were measured on a categorical scale.
Frequency Table for Age
Bar Graph for Age
Frequency Table for Role
Bar Graph for Role
The pie chart
• An alternative to the bar chart
• Draw a circle (a pie)
• Divide the circle into segments, with the area of each segment proportional to $f_i$ (or, equivalently, to $p_i = f_i/n$)
Example
• In this study the population consists of individuals who received a head injury (n = 22,540).
• The variable is the mechanism that caused the head injury (Inj. Mech), with categories:
  – MVA (Motor vehicle accident)
  – Falls
  – Violence
  – Other VA (Other vehicle accidents)
  – Accidents (industrial accident)
  – Other (all other mechanisms for head injury)
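A sketch of the pie chart arithmetic: each segment's share is $p_i = f_i/n$, so its angle is $360^\circ \times p_i$. The helper name and the frequencies are hypothetical; a plotting library would take these shares (or the raw $f_i$) directly.

```python
def pie_segments(freq):
    """Return {category: (p_i, angle_i)} with p_i = f_i / n and angle_i = 360 * p_i."""
    n = sum(freq.values())
    return {c: (f / n, 360 * f / n) for c, f in freq.items()}

# Hypothetical frequencies for illustration only.
segments = pie_segments({"A": 50, "B": 30, "C": 20})
for category, (p, angle) in segments.items():
    print(f"{category}: p = {p:.2f}, angle = {angle:.1f} degrees")
```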
Graphical and Tabular Display of Categorical Data. • The frequency table • The bar graph • The pie chart
The frequency table
The bar graph
The pie chart
Multivariate Categorical Data
The two-way frequency table and the χ² statistic: techniques for examining dependence between two categorical variables
Situation
• We have two categorical variables R and C. The number of categories of R is r; the number of categories of C is c.
• We observe n subjects from the population and count $x_{ij}$ = the number of subjects for which R = i and C = j.
• R = rows, C = columns.
Example: Both systolic blood pressure (C) and serum cholesterol (R) were measured for a sample of n = 1237 subjects. The categories for blood pressure are: <127, 127-146, 147-166, 167+. The categories for cholesterol are: <200, 200-219, 220-259, 260+.
Table: Two-way frequency table (rows: serum cholesterol; columns: systolic blood pressure)
Serum Cholesterol | <127 | 127-146 | 147-166 | 167+ | Total
<200 | 117 | 121 | 47 | 22 | 307
200-219 | 85 | 98 | 43 | 20 | 246
220-259 | 119 | 209 | 68 | 43 | 439
260+ | 67 | 99 | 46 | 33 | 245
Total | 388 | 527 | 204 | 118 | 1237
Example: This comes from the drug use data. The two variables are 1. Age (C) and 2. Antidepressant Use (R), measured for a sample of n = 33,957 subjects.
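Counting $x_{ij}$ from raw records can be sketched as below, assuming each subject is stored as a hypothetical `(R value, C value)` pair. The row totals $R_i$ and column totals $C_j$ are just the row and column sums of the resulting table.

```python
from collections import Counter

def two_way_table(pairs, row_categories, col_categories):
    """x[i][j] = number of subjects with R = row_categories[i] and C = col_categories[j]."""
    counts = Counter(pairs)  # counts each (r, c) combination; missing ones count as 0
    return [[counts[(r, c)] for c in col_categories] for r in row_categories]

# Hypothetical subjects, each recorded as (value of R, value of C).
subjects = [("yes", "young"), ("no", "young"), ("no", "old"),
            ("yes", "old"), ("no", "old"), ("no", "young")]
x = two_way_table(subjects, ["yes", "no"], ["young", "old"])
row_totals = [sum(row) for row in x]        # R_i
col_totals = [sum(col) for col in zip(*x)]  # C_j
print(x, row_totals, col_totals)
```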
Two-way Frequency Table Percentage antidepressant use vs Age
The χ² statistic for measuring dependence between two categorical variables. Define $E_{ij} = \frac{R_i C_j}{n}$ = expected frequency in the (i, j)th cell in the case of independence.
Rows \ Columns | 1 | 2 | 3 | 4 | 5 | Total
1 | $x_{11}$ | $x_{12}$ | $x_{13}$ | $x_{14}$ | $x_{15}$ | $R_1$
2 | $x_{21}$ | $x_{22}$ | $x_{23}$ | $x_{24}$ | $x_{25}$ | $R_2$
3 | $x_{31}$ | $x_{32}$ | $x_{33}$ | $x_{34}$ | $x_{35}$ | $R_3$
4 | $x_{41}$ | $x_{42}$ | $x_{43}$ | $x_{44}$ | $x_{45}$ | $R_4$
Total | $C_1$ | $C_2$ | $C_3$ | $C_4$ | $C_5$ | $n$
Rows \ Columns | 1 | 2 | 3 | 4 | 5 | Total
1 | $E_{11}$ | $E_{12}$ | $E_{13}$ | $E_{14}$ | $E_{15}$ | $R_1$
2 | $E_{21}$ | $E_{22}$ | $E_{23}$ | $E_{24}$ | $E_{25}$ | $R_2$
3 | $E_{31}$ | $E_{32}$ | $E_{33}$ | $E_{34}$ | $E_{35}$ | $R_3$
4 | $E_{41}$ | $E_{42}$ | $E_{43}$ | $E_{44}$ | $E_{45}$ | $R_4$
Total | $C_1$ | $C_2$ | $C_3$ | $C_4$ | $C_5$ | $n$
Justification: Under independence, the proportion in column j for row i equals the overall proportion in column j: $\frac{E_{ij}}{R_i} = \frac{C_j}{n}$, which gives $E_{ij} = \frac{R_i C_j}{n}$.
Equivalently, the proportion in row i for column j equals the overall proportion in row i: $\frac{E_{ij}}{C_j} = \frac{R_i}{n}$.
Multivariate Categorical data The two-way frequency table
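The two-way frequency table
Rows \ Columns | 1 | 2 | 3 | 4 | 5 | Total
1 | $x_{11}$ | $x_{12}$ | $x_{13}$ | $x_{14}$ | $x_{15}$ | $R_1$
2 | $x_{21}$ | $x_{22}$ | $x_{23}$ | $x_{24}$ | $x_{25}$ | $R_2$
3 | $x_{31}$ | $x_{32}$ | $x_{33}$ | $x_{34}$ | $x_{35}$ | $R_3$
4 | $x_{41}$ | $x_{42}$ | $x_{43}$ | $x_{44}$ | $x_{45}$ | $R_4$
Total | $C_1$ | $C_2$ | $C_3$ | $C_4$ | $C_5$ | $n$
An Example: Two-way frequency table (rows: serum cholesterol; columns: systolic blood pressure)
Serum Cholesterol | <127 | 127-146 | 147-166 | 167+ | Total
<200 | 117 | 121 | 47 | 22 | 307
200-219 | 85 | 98 | 43 | 20 | 246
220-259 | 119 | 209 | 68 | 43 | 439
260+ | 67 | 99 | 46 | 33 | 245
Total | 388 | 527 | 204 | 118 | 1237
Measuring Dependence: The χ² statistic
$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(x_{ij} - E_{ij})^2}{E_{ij}}$, where
$E_{ij}$ = expected frequency in the (i, j)th cell in the case of independence, and
$x_{ij}$ = observed frequency in the (i, j)th cell.
Expected frequencies $E_{ij}$
Rows \ Columns | 1 | 2 | 3 | 4 | 5 | Total
1 | $E_{11}$ | $E_{12}$ | $E_{13}$ | $E_{14}$ | $E_{15}$ | $R_1$
2 | $E_{21}$ | $E_{22}$ | $E_{23}$ | $E_{24}$ | $E_{25}$ | $R_2$
3 | $E_{31}$ | $E_{32}$ | $E_{33}$ | $E_{34}$ | $E_{35}$ | $R_3$
4 | $E_{41}$ | $E_{42}$ | $E_{43}$ | $E_{44}$ | $E_{45}$ | $R_4$
Total | $C_1$ | $C_2$ | $C_3$ | $C_4$ | $C_5$ | $n$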
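A minimal sketch of the χ² computation, taking $E_{ij} = R_i C_j / n$ and assuming every row and column total is positive (otherwise some $E_{ij}$ would be zero). The function name and the two small tables are hypothetical: one whose rows are proportional, where χ² is 0, and one with strong dependence.

```python
def chi_square(observed):
    """Pearson's chi-square: sum over cells of (x_ij - E_ij)**2 / E_ij,
    with E_ij = R_i * C_j / n (all row and column totals assumed positive)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, x_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n  # expected count under independence
            chi2 += (x_ij - e_ij) ** 2 / e_ij
    return chi2

print(chi_square([[10, 20], [20, 40]]))  # rows proportional: 0.0
print(chi_square([[10, 0], [0, 10]]))    # strong dependence: 20.0
```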
Example: studying the relationship between Systolic Blood pressure and Serum Cholesterol In this example we are interested in whether Systolic Blood pressure and Serum Cholesterol are related or whether they are independent. Both were measured for a sample of n = 1237 cases
Observed frequencies (rows: serum cholesterol; columns: systolic blood pressure)
Serum Cholesterol | <127 | 127-146 | 147-166 | 167+ | Total
<200 | 117 | 121 | 47 | 22 | 307
200-219 | 85 | 98 | 43 | 20 | 246
220-259 | 119 | 209 | 68 | 43 | 439
260+ | 67 | 99 | 46 | 33 | 245
Total | 388 | 527 | 204 | 118 | 1237
Expected frequencies (rows: serum cholesterol; columns: systolic blood pressure)
Serum Cholesterol | <127 | 127-146 | 147-166 | 167+ | Total
<200 | 96.29 | 130.79 | 50.63 | 29.29 | 307
200-219 | 77.16 | 104.80 | 40.57 | 23.47 | 246
220-259 | 137.70 | 187.03 | 72.40 | 41.88 | 439
260+ | 76.85 | 104.38 | 40.40 | 23.37 | 245
Total | 388 | 527 | 204 | 118 | 1237
In the case of independence, the distribution across a row is the same for each row, and the distribution down a column is the same for each column.
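The expected table can be reproduced from the margins alone via $E_{ij} = R_i C_j / n$. The sketch below uses the row and column totals of the blood pressure / cholesterol data and then checks the statement that, under independence, every row has the same distribution across the columns (each row profile equals $C_j / n$). The helper name `expected_frequencies` is hypothetical.

```python
def expected_frequencies(row_totals, col_totals):
    """E_ij = R_i * C_j / n, the expected count under independence."""
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Margins of the blood pressure / cholesterol table (n = 1237).
row_totals = [307, 246, 439, 245]  # serum cholesterol: <200, 200-219, 220-259, 260+
col_totals = [388, 527, 204, 118]  # systolic BP: <127, 127-146, 147-166, 167+
E = expected_frequencies(row_totals, col_totals)
for row in E:
    print(" ".join(f"{e:7.2f}" for e in row))

# Under independence every row has the same distribution across the columns.
row_profiles = [[e / sum(row) for e in row] for row in E]
```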
Standardized residuals and the χ² statistic
Example: This comes from the drug use data. The two variables are 1. Role (C) and 2. Antidepressant Use (R), measured for a sample of n = 33,957 subjects.
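A sketch of the standardized residuals, assuming the common definition $(x_{ij} - E_{ij})/\sqrt{E_{ij}}$ (often called the Pearson residual), whose squares sum to χ². It uses the blood pressure / cholesterol counts with the (220-259, <127) cell taken as 119, the value implied by that table's row and column totals; the helper name `standardized_residuals` is hypothetical.

```python
import math

def standardized_residuals(observed):
    """Return residuals (x_ij - E_ij) / sqrt(E_ij) and chi-square = sum of their squares."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(observed):
        res_row = []
        for j, x_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n  # expected count under independence
            res_row.append((x_ij - e_ij) / math.sqrt(e_ij))
        residuals.append(res_row)
    chi2 = sum(res ** 2 for row in residuals for res in row)
    return residuals, chi2

# Blood pressure / cholesterol counts (rows: cholesterol, columns: blood pressure).
observed = [[117, 121, 47, 22],
            [85, 98, 43, 20],
            [119, 209, 68, 43],
            [67, 99, 46, 33]]
residuals, chi2 = standardized_residuals(observed)
for row in residuals:
    print(" ".join(f"{res:6.2f}" for res in row))
print(f"chi-square = {chi2:.2f}")
```

Large positive residuals flag cells with more subjects than independence would predict, and large negative ones flag cells with fewer.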
Two-way Frequency Table Percentage antidepressant use vs Role
Calculation of χ²: the raw data and the expected frequencies
The residuals and the calculation of χ²
Example
• In this example there are n = 57,407 individuals who had been victimized twice by crimes.
• Rows = crime of first victimization
• Cols = crime of second victimization
Next Topic: Brief introduction to Statistical Packages