Chapter 19 Basic Quantitative Data Analysis
Data Cleaning • Check for odd symbols and for truncated or overlong entries • Recheck scoring • Recheck coding categories • Compare a value in one variable with the related value in a second variable to catch inconsistencies (e.g., a male participant coded as pregnant) • Look for outliers
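A minimal Python sketch of three of these checks follows. It assumes a hypothetical dataset stored as a list of dicts with made-up fields `gender` (assumed codes 1 and 2), `pregnant` (0/1), and `age`. The outlier check uses the common 1.5 × IQR rule, which is one convention among several rather than the chapter's prescribed method.

```python
import statistics

# Hypothetical records; field names and codes are illustrative assumptions.
records = [
    {"id": 1, "gender": 1, "pregnant": 0, "age": 34},
    {"id": 2, "gender": 2, "pregnant": 1, "age": 29},
    {"id": 3, "gender": 3, "pregnant": 0, "age": 41},   # invalid gender code
    {"id": 4, "gender": 1, "pregnant": 1, "age": 38},   # male coded as pregnant
    {"id": 5, "gender": 2, "pregnant": 0, "age": 240},  # likely entry error
]

VALID_GENDER = {1, 2}  # assumed coding scheme: 1 = male, 2 = female

def out_of_range(rows):
    """Return IDs whose gender code is not in the (assumed) codebook."""
    return [r["id"] for r in rows if r["gender"] not in VALID_GENDER]

def inconsistent(rows):
    """Return IDs where one variable contradicts another (male and pregnant)."""
    return [r["id"] for r in rows if r["gender"] == 1 and r["pregnant"] == 1]

def iqr_outliers(rows, field):
    """Return IDs whose value lies more than 1.5 * IQR beyond the quartiles."""
    values = [r[field] for r in rows]
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    fence = 1.5 * (q3 - q1)
    return [r["id"] for r in rows if r[field] < q1 - fence or r[field] > q3 + fence]

print(out_of_range(records), inconsistent(records), iqr_outliers(records, "age"))
```

Flagged cases still need to be checked against the original forms; a flag only says a value deserves a second look.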
Reasons for Missing Data • Participant skipped an item or questionnaire, purposely or inadvertently • Participant withdrew, became ill, or died • All or part of the data collection had to be omitted • Directions were unclear or a question was poorly worded • Data were missed during data entry
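Categorizing Missing Data • Missing completely at random (MCAR): whether a value is missing is unrelated to any variable, observed or unobserved • Missing at random (MAR): missingness depends only on other observed variables • Missing not at random (MNAR): missingness depends on the unobserved values themselves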
Replacing Missing Data • Complete case analysis (listwise deletion) drops any participant who has missing data from the analysis - If many participants have missing data, the smaller sample reduces statistical power and, unless the data are MCAR, can bias the results
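Complete case analysis can be sketched in a few lines of Python. The records and the `complete_cases` helper are hypothetical, and `None` is assumed to mark a missing value.

```python
# Hypothetical pretest/posttest scores; None marks a missing value.
data = [
    {"id": 1, "pre": 10, "post": 14},
    {"id": 2, "pre": 12, "post": None},
    {"id": 3, "pre": None, "post": 11},
    {"id": 4, "pre": 9, "post": 13},
]

def complete_cases(rows, fields):
    """Keep only participants with no missing value on any of the listed fields."""
    return [r for r in rows if all(r[f] is not None for f in fields)]

kept = complete_cases(data, ["pre", "post"])
print(len(kept), "of", len(data), "participants retained")  # 2 of 4 participants retained
```

Here half the sample is lost, which illustrates how quickly listwise deletion can shrink a study.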
Replacing Missing Data • Principles in handling missing data are: - Some missing data cannot be replaced - Imputation uses existing information to estimate the missing values - The easiest approach is to replace missing data with the group’s mean (average) on the item
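A minimal sketch of group-mean substitution, assuming one item's responses are held in a list with `None` for missing values (the `mean_impute` name is illustrative):

```python
from statistics import mean, pstdev

def mean_impute(values):
    """Replace each None with the mean of the observed values (the group mean)."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]

item = [4, None, 2, 5, None, 3]
filled = mean_impute(item)
print(filled)  # [4, 3.5, 2, 5, 3.5, 3]
```

The approach is easy but has a known cost: every imputed value sits exactly at the mean, so the spread of the filled item (`pstdev(filled)`) is smaller than that of the observed values.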
Replacing Missing Data • Principles in handling missing data are: - A more justifiable approach is to use the average of the individual participant’s scores or ratings on the remaining items of a multi-item scale - Missing values may be estimated from values at previous time points
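Both ideas on this slide can be sketched in Python. The person-mean version fills a participant's skipped items on a multi-item scale with that participant's own average. For previous time points, the sketch uses last observation carried forward (LOCF), which is one common way of doing this, not the only one. Function names are hypothetical.

```python
from statistics import mean

def person_mean_impute(responses):
    """Fill a participant's skipped items with the mean of that participant's answered items."""
    answered = [r for r in responses if r is not None]
    m = mean(answered)  # raises StatisticsError if every item was skipped
    return [m if r is None else r for r in responses]

def carry_forward(series):
    """Fill each missing time point with the most recent observed value (LOCF).
    A missing first time point stays None because nothing precedes it."""
    filled, last = [], None
    for v in series:
        if v is not None:
            last = v
        filled.append(last)
    return filled

print(person_mean_impute([3, 4, None, 5]))    # [3, 4, 4, 5]
print(carry_forward([120, None, 118, None]))  # [120, 120, 118, 118]
```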
Replacing Missing Data • Principles in handling missing data are: - Incomplete cases (participants) may be deleted and the analysis done only on those who completed the study - Regression imputation may be used to predict the missing values from participants' scores on other variables
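Replacing Missing Data • Principles in handling missing data are: - Expectation maximization (EM) alternates between estimating the missing data from the current parameter estimates and re-estimating the parameters, repeating these iterations until the estimates converge - Multiple imputation creates several plausible replacement values for each missing value, analyzes each completed dataset, and combines the results so that the final estimates reflect uncertainty about the missing data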
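EM and multiple imputation are normally run with statistical software. The sketch below covers only the final pooling step of multiple imputation, using Rubin's rules. It assumes each of m imputed datasets has already been analyzed and has produced an estimate and its squared standard error. The numbers are made up.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine m completed-data results with Rubin's rules; return (estimate, SE)."""
    m = len(estimates)
    q_bar = mean(estimates)       # pooled point estimate
    w = mean(variances)           # average within-imputation variance
    b = variance(estimates)       # between-imputation variance (n - 1 denominator)
    t = w + (1 + 1 / m) * b       # total variance
    return q_bar, math.sqrt(t)

est = [2.1, 2.4, 1.9, 2.2, 2.3]        # hypothetical estimates from 5 imputed datasets
var = [0.04, 0.05, 0.04, 0.05, 0.04]   # their squared standard errors
pooled, se = pool_rubin(est, var)
print(round(pooled, 2), round(se, 3))
```

The between-imputation term `b` is what makes the pooled standard error larger than any single analysis would suggest. It carries the extra uncertainty caused by not knowing the missing values.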
Visual Representations • Stem-and-leaf displays show the distribution of values while keeping the individual values visible • Box plots show the distribution of values through the median, quartiles, and outliers • Bar and pie charts show differences between groups and subgroups • Scatter plots show relationships between interval-level variables
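A stem-and-leaf display can be produced with plain Python. This sketch assumes non-negative whole-number scores, uses the tens digit as the stem and the units digit as the leaf, and works on made-up scores.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines for non-negative integers (stem = tens, leaf = units)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):  # include empty stems so gaps show
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

scores = [62, 67, 71, 74, 74, 78, 83, 85, 91]
for line in stem_and_leaf(scores):
    print(line)
```

Read sideways, the row lengths give the shape of the distribution, and every original score can still be recovered (for example, "7 | 1448" is 71, 74, 74, 78).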
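Basic Descriptive Statistics • A normal distribution is represented by a symmetrical, bell-shaped curve • In a positive (right) skew, most cases fall at the low end of the values, with a long tail toward high values • In a negative (left) skew, most cases fall at the high end of the values, with a long tail toward low values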
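The direction of skew can be checked numerically. This sketch uses the moment coefficient of skewness, g1 = m3 / m2^1.5 computed with population moments. Statistical packages often report an adjusted sample version instead, so their values will differ slightly. The data are illustrative.

```python
def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

right = [1, 2, 2, 3, 3, 3, 10]   # most cases low, one high value: positive skew
left = [1, 8, 8, 8, 9, 9, 10]    # mirror image: negative skew
print(skewness(right) > 0, skewness(left) < 0)  # True True
```

A symmetrical set of values such as [1, 2, 3] gives a skewness of 0.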
Basic Descriptive Statistics • Mode is the value that occurs most often • Median is the middle score when the scores are arranged in order • Mean is the arithmetic average: the sum of all scores divided by the number of scores
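Python's standard `statistics` module computes all three measures. `multimode` (Python 3.8+) is used because a distribution can have more than one mode. The scores are illustrative.

```python
from statistics import mean, median, multimode

scores = [2, 3, 3, 4, 5, 6, 12]
print(multimode(scores))  # [3]
print(median(scores))     # 4
print(mean(scores))       # 5
```

The single high score of 12 pulls the mean above the median. In a positively skewed distribution the mean typically lies above the median, as it does here.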
Basic Descriptive Statistics • Range is the distance between the highest and lowest scores – The distance between these endpoints can be divided into portions such as quartiles or percentiles
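A short sketch computes the range and divides the distribution into quartiles with `statistics.quantiles`. The scores are made up. The `"inclusive"` method is one of two interpolation conventions the function offers, and other software may use yet another, so quartile values can differ slightly between packages.

```python
from statistics import quantiles

scores = [55, 60, 62, 68, 70, 75, 80, 88, 95]
score_range = max(scores) - min(scores)
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")  # cut points at 25%, 50%, 75%
print(score_range)         # 40
print(q1, q2, q3, q3 - q1) # 62.0 70.0 80.0 18.0
```

The last value is the interquartile range, the spread of the middle half of the scores. Unlike the range, it is not affected by a single extreme score.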
Basic Descriptive Statistics • Variance is the average of the squared deviations from the mean (when estimating from a sample, the sum is usually divided by n - 1 rather than n) • Standard deviation is the square root of the variance
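The definitions on this slide can be followed step by step in Python on a small illustrative set of scores:

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]                 # illustrative values
n = len(scores)
mean = sum(scores) / n                            # 5.0
squared_devs = [(x - mean) ** 2 for x in scores]  # squared deviations from the mean
variance = sum(squared_devs) / n                  # their average: 4.0
sd = math.sqrt(variance)                          # square root of the variance: 2.0
sample_variance = sum(squared_devs) / (n - 1)     # n - 1 version used for samples
print(variance, sd, round(sample_variance, 3))    # 4.0 2.0 4.571
```

`statistics.pvariance` and `statistics.variance` give the same two variances directly.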
Bivariate Association • Bivariate refers to the relationship between two variables – The Pearson product-moment correlation coefficient, represented as r, is the most commonly used bivariate measure of association for interval- or ratio-level data
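Pearson's r is the covariation of two variables divided by the product of their spreads, as this from-scratch sketch shows. The study-hours and test-score data are hypothetical. Python 3.10+ also provides `statistics.correlation` for the same calculation.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
print(round(pearson_r(hours, score), 3))
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little linear relationship.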
Bivariate Association • A correlation matrix displays the correlations among multiple variables at one time, so the strengths of the relationships between various pairs of variables can be compared • You may also calculate the coefficient of determination (r², written R² in regression), the proportion of variance in one variable that is explained by the other
Additional Measures of Association • The Spearman rank-order correlation is used for ordinal data • The chi-square statistic tests the association between nominal variables
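Both statistics can be sketched in plain Python. Spearman's rho is computed as Pearson's r applied to ranks, with tied values given their average rank. The chi-square statistic sums (observed - expected)² / expected over the cells of a contingency table, with expected counts taken from the row and column totals. The sketch returns only the statistic; a p-value would come from the chi-square distribution, for example via statistical software. Function names are illustrative.

```python
import math

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman_rho(x, y):
    """Spearman rank-order correlation: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def chi_square(table):
    """Pearson chi-square statistic for a contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

print(spearman_rho([1, 2, 3, 4], [1, 4, 9, 16]))  # monotonic relationship: rho = 1
print(chi_square([[10, 20], [20, 10]]))
```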
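Additional Measures of Association • Fisher’s exact test is used instead of chi-square when a cell has an expected frequency of fewer than five cases, typically in a 2 × 2 table • The Mann-Whitney U test is used instead of chi-square when the data are ordinal and two independent groups are compared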
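Both tests can be sketched from their definitions. The Fisher sketch computes a two-sided p-value for a 2 × 2 table by summing the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one. The small relative tolerance guards against floating-point ties and is an implementation choice. The Mann-Whitney sketch computes only the U statistic: the number of pairs in which a group-x value beats a group-y value, with ties counted as one half. Its p-value is left to software such as `scipy.stats.mannwhitneyu`, and `scipy.stats.fisher_exact` offers a tested Fisher implementation. Function names are hypothetical.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):  # hypergeometric probability that the top-left cell equals x
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)  # possible top-left values given the margins
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))

def mann_whitney_u(x, y):
    """U statistic for group x: pairs where x exceeds y, ties counted as 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1
            elif xi == yj:
                u += 0.5
    return u

print(round(fisher_exact_2x2(3, 1, 1, 3), 4))       # 0.4857
print(mann_whitney_u([2, 3, 3, 4], [1, 1, 2, 3]))   # U for the first group
```

For any two groups, U computed for x plus U computed for y equals the product of the two group sizes.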