Examining the Relationship Between Two Variables (Bivariate Analyses)
What type of analysis?
• We have two variables, X and Y, and we are interested in describing how a response (Y) is related to an explanatory variable (X).
• What graphical displays do we use to show the relationship between X and Y?
• What statistical analyses do we use to summarize, describe, and make inferences about the relationship?
Type of Displays
                          X is Continuous    X is Ordinal or Nominal
Y is Continuous           Scatterplot        Comparative Boxplot
Y is Ordinal or Nominal   Logistic Plot      2-D Mosaic Plot
Fit Y by X in JMP
In the lower left corner of the Fit Y by X dialog box you will see this graphic, which is the same as the more stylized version on the previous slide.
[Graphic: axes labeled "Y Variable/Response Data Type" and "X Variable/Predictor Data Type"]
Type of Analyses
Y is Continuous, X is Continuous:
• Correlation and Regression
Y is Continuous, X is Ordinal or Nominal (parametric or nonparametric):
• If X has k = 2 levels, use a Two-Sample t-Test or Wilcoxon Rank Sum Test.
• If X has k > 2 levels, use Oneway ANOVA or the Kruskal-Wallis Test.
Y is Ordinal or Nominal, X is Continuous:
• If Y has 2 levels, use Logistic Regression.
• If Y has more than 2 levels, use Polytomous Logistic Regression.
Y is Ordinal or Nominal, X is Ordinal or Nominal:
• If both X and Y have two levels, use Fisher’s Exact Test, RR/OR, and Risk Difference/AR.
• If either X or Y has more than two levels, use a Chi-square Test.
• For dependent samples, use McNemar’s Test.
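The decision rules on this slide can be written as a small lookup. The sketch below is only an illustration of that logic: the function name, the "continuous"/"categorical" labels, and the `dependent` flag are assumptions made for the example, not part of JMP.

```python
def choose_display_and_analysis(x_type, y_type, x_levels=None, y_levels=None,
                                dependent=False):
    """Return (display, analysis) for a bivariate pairing.

    x_type, y_type: "continuous" or "categorical" (ordinal or nominal).
    x_levels, y_levels: number of categories of a categorical variable.
    dependent: True for paired/dependent samples (categorical X and Y only).
    """
    if y_type == "continuous" and x_type == "continuous":
        return ("Scatterplot", "Correlation and regression")

    if y_type == "continuous" and x_type == "categorical":
        if x_levels is None or x_levels < 2:
            raise ValueError("x_levels must be at least 2 for a categorical X")
        if x_levels == 2:
            return ("Comparative boxplot",
                    "Two-sample t-test or Wilcoxon rank sum test")
        return ("Comparative boxplot", "Oneway ANOVA or Kruskal-Wallis test")

    if y_type == "categorical" and x_type == "continuous":
        if y_levels is None or y_levels < 2:
            raise ValueError("y_levels must be at least 2 for a categorical Y")
        if y_levels == 2:
            return ("Logistic plot", "Logistic regression")
        return ("Logistic plot", "Polytomous logistic regression")

    if y_type == "categorical" and x_type == "categorical":
        if dependent:
            return ("2-D mosaic plot", "McNemar's test")
        if x_levels == 2 and y_levels == 2:
            return ("2-D mosaic plot",
                    "Fisher's exact test, RR/OR, risk difference/AR")
        return ("2-D mosaic plot", "Chi-square test")

    raise ValueError("types must be 'continuous' or 'categorical'")


# Birthweight (continuous) vs. smoking status (two levels)
print(choose_display_and_analysis("categorical", "continuous", x_levels=2))
```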
Fit Y by X in JMP
[Graphic: grid crossing Y (continuous, nominal/ordinal) with X (continuous, nominal/ordinal)]
Example: Low Birthweight Study (Note: this is not the NC one)
List of Variables
• id – ID # for infant & mother
Continuous
• headcir – head circumference (in.)
• leng – length of infant (in.)
• weight – birthweight (lbs.)
• gest – gestational age (weeks)
• mage – mother’s age
• mnocig – mother’s cigarettes/day
• mheight – mother’s height (in.)
• mppwt – mother’s pre-pregnancy weight (lbs.)
• fage – father’s age
• fedyrs – father’s education (yrs.)
• fnocig – father’s cigarettes/day
• fheight – father’s height
Nominal
• lowbwt – low birth weight indicator (1 = yes, 0 = no)
• mage35 – mother’s age over 35? (1 = yes, 0 = no)
• smoker – mother smoked during preg. (1 = yes, 0 = no)
• Smoker – mother’s smoking status (Smoker or Non-smoker)
• Low Birth Weight – birth weight (Low, Normal)
Example: Low Birthweight Study (Birthweight vs. Gestational Age)
Y = birthweight (lbs.), Continuous
X = gestational age (weeks), Continuous
Regression and Correlation Analysis from Fit Y by X
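For readers without JMP at hand, the two summaries this analysis reports (the correlation and the fitted line) can be sketched in plain Python. The function name and the data values below are made up for illustration and are not from the study.

```python
import math

def pearson_and_least_squares(x, y):
    """Return (r, slope, intercept) for paired continuous data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)        # Pearson correlation
    slope = sxy / sxx                     # least-squares slope
    intercept = mean_y - slope * mean_x   # least-squares intercept
    return r, slope, intercept

# Made-up illustrative values: gestational age (weeks) and birthweight (lbs.)
gest = [35, 37, 38, 39, 40, 41]
weight = [5.1, 6.0, 6.8, 7.1, 7.6, 7.9]
r, slope, intercept = pearson_and_least_squares(gest, weight)
print(f"r = {r:.3f}, weight = {intercept:.2f} + {slope:.3f} * gest")
```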
Example: Low Birthweight Study (Birthweight vs. Mother’s Smoking Status)
Y = birthweight (lbs.), Continuous
X = mother’s smoking status (Smoker vs. Nonsmoker), Nominal
Independent Samples t-Test from Fit Y by X
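As a rough sketch of the test statistic behind an independent samples t-test, the code below computes the pooled (equal-variance) form. The slides do not say which variant is shown, so the pooled form is an assumption, as are the function name and the data values.

```python
import math

def pooled_t_statistic(group1, group2):
    """Return (t, df) for the equal-variance two-sample t-test."""
    n1, n2 = len(group1), len(group2)
    mean1 = sum(group1) / n1
    mean2 = sum(group2) / n2
    # Within-group sums of squared deviations
    ss1 = sum((v - mean1) ** 2 for v in group1)
    ss2 = sum((v - mean2) ** 2 for v in group2)
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df

# Made-up birthweights (lbs.) for illustration only
nonsmokers = [7.4, 8.1, 6.9, 7.8, 7.2]
smokers = [6.5, 7.0, 6.1, 6.8, 7.3]
t, df = pooled_t_statistic(nonsmokers, smokers)
print(f"t = {t:.3f} on {df} degrees of freedom")
```

The p-value would then come from a t distribution with `df` degrees of freedom, which is what the JMP output reports alongside the statistic.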
Example: Low Birthweight Study (Birthweight Status vs. Mother’s Cigs/Day)
Y = birthweight status (Low, Normal), Nominal
X = mother’s cigs./day, Continuous
Modeled quantity: P(Low | Cigs/Day)
Logistic Regression from Fit Y by X
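A logistic regression with one continuous predictor models P(Low | Cigs/Day) as an S-shaped curve in X. The sketch below evaluates that curve. The coefficients are hypothetical values chosen for illustration, not estimates from the study, and the function name is an assumption.

```python
import math

def logistic_probability(b0, b1, x):
    """P(Y = Low | X = x) under a logistic model with intercept b0 and slope b1."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Hypothetical coefficients (not fitted to the study data)
b0, b1 = -2.0, 0.05
for cigs in (0, 10, 20):
    print(cigs, round(logistic_probability(b0, b1, cigs), 3))

# exp(b1) is the odds ratio for a one-cigarette-per-day increase
print(math.exp(b1))
```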
Example: Low Birthweight Study (Birthweight Status vs. Mother’s Smoking Status)
Y = birthweight status (Low, Normal), Nominal
X = mother’s smoking status (Smoker, Nonsmoker), Nominal
Independent Samples p1 vs. p2: Fisher’s Exact, Chi-square, Risk Difference, RR, & OR
I skipped the arrows this time; everything should be self-explanatory. Notice that the reported OR is upside-down (it compares the groups in the reverse order) and needs to be reciprocated: OR = 1/0.342 = 2.92.
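The reciprocal step follows from how the odds ratio is defined: swapping which group is listed first inverts it. A minimal sketch is below. The function name and the cell counts are hypothetical; only the 0.342 figure comes from the slide.

```python
def two_by_two_measures(a, b, c, d):
    """Measures for a 2x2 table.

    a = group 1 with outcome,  b = group 1 without outcome
    c = group 2 with outcome,  d = group 2 without outcome
    """
    risk1 = a / (a + b)
    risk2 = c / (c + d)
    return {
        "risk_difference": risk1 - risk2,
        "relative_risk": risk1 / risk2,
        "odds_ratio": (a * d) / (b * c),
    }

# Hypothetical counts: smokers listed first, then nonsmokers
smokers_first = two_by_two_measures(20, 80, 10, 90)
# Same counts with the group order swapped
nonsmokers_first = two_by_two_measures(10, 90, 20, 80)

# Swapping the groups inverts the odds ratio
print(smokers_first["odds_ratio"], 1 / nonsmokers_first["odds_ratio"])

# The slide's correction: reciprocal of the reported OR
print(round(1 / 0.342, 2))
```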
Summary
In summary, we have seen how bivariate relationships are examined in JMP and in statistics in general. The type of analysis that is appropriate depends on the data type of the response (Y) and of the explanatory variable or predictor (X).