Chapter 19 Basic Quantitative Data Analysis

Data Cleaning • Check for odd symbols, truncated or overlong times • Recheck scoring • Recheck coding categories • Compare the value of one variable with the value of a second variable to confirm they are consistent • Look for outliers
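Two of the checks above (coding ranges and cross-variable comparison) can be sketched in a few lines of Python. The field names, valid ranges, and records below are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical records; field names and valid ranges are illustrative assumptions.
records = [
    {"id": 1, "age": 34, "years_nursing": 10, "pain_score": 3},
    {"id": 2, "age": 29, "years_nursing": 35, "pain_score": 7},   # inconsistent pair
    {"id": 3, "age": 41, "years_nursing": 15, "pain_score": 77},  # out-of-range code
]

VALID_RANGES = {"age": (18, 100), "years_nursing": (0, 60), "pain_score": (0, 10)}


def out_of_range(record):
    """Return the names of fields whose value falls outside its valid range."""
    return [
        field
        for field, (low, high) in VALID_RANGES.items()
        if not low <= record[field] <= high
    ]


def inconsistent(record):
    """Compare one variable with a second: experience cannot exceed age."""
    return record["years_nursing"] > record["age"]


# Keep only the records that fail at least one check.
flagged = {
    r["id"]: {"out_of_range": out_of_range(r), "inconsistent": inconsistent(r)}
    for r in records
    if out_of_range(r) or inconsistent(r)
}
print(flagged)
```

Flagged records still need a human decision: the value may be a data entry error, a scoring error, or a genuine outlier.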

Reasons for Missing Data • Participant skipped an item or questionnaire, purposely or inadvertently • Participant withdrew, became ill, or died • All or part of the data collection had to be omitted • Poor directions or a poorly worded question • Data missed during data entry

Categorizing Missing Data • Missing completely at random (MCAR) • Missing at random (MAR) • Missing not at random (MNAR)

Replacing Missing Data • Complete case analysis drops every participant who has any missing data from the analysis - If many participants have missing data, the smaller sample can reduce statistical power and may bias the results
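A minimal sketch of complete case analysis, assuming missing answers are stored as None. The data are made up; the point is to show how quickly the sample can shrink.

```python
# Hypothetical questionnaire data; None marks a missing answer.
rows = [
    {"id": 1, "q1": 4, "q2": 5},
    {"id": 2, "q1": None, "q2": 3},
    {"id": 3, "q1": 2, "q2": 4},
    {"id": 4, "q1": 5, "q2": None},
]


def complete_cases(rows):
    """Keep only participants with no missing values on any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]


kept = complete_cases(rows)
retained = len(kept) / len(rows)  # proportion of the sample that survives
print([r["id"] for r in kept], retained)
```

Here two single missing answers remove half of the participants, which is why the proportion retained is worth reporting.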

Replacing Missing Data • Principles in handling missing data are: - Some missing data cannot be replaced - Imputation uses existing information to estimate the missing values - The easiest approach is to replace missing data with the group’s mean (average) on the item
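Group mean replacement can be sketched as follows, assuming missing responses are stored as None. The function name and the scores are illustrative only.

```python
from statistics import mean


def impute_group_mean(scores):
    """Replace each missing score with the mean of the observed scores on the item."""
    observed = [s for s in scores if s is not None]
    fill = mean(observed)
    return [fill if s is None else s for s in scores]


item_scores = [3, None, 5, 4, None]  # one item, five participants
filled = impute_group_mean(item_scores)
print(filled)
```

Because every replaced value sits exactly at the mean, this approach tends to understate the variability of the item.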

Replacing Missing Data • Principles in handling missing data are: - A more justifiable approach is to use the average of the individual participant’s scores or ratings on the remaining items of a multi-item scale - Missing values may be estimated from values at previous time points
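Both ideas on this slide can be sketched briefly, assuming None marks a missing value. The function names are hypothetical; the second function is commonly called last observation carried forward.

```python
from statistics import mean


def impute_person_mean(item_scores):
    """Fill a participant's missing items with the mean of their own answered items."""
    observed = [s for s in item_scores if s is not None]
    if not observed:  # nothing to base an estimate on
        return list(item_scores)
    fill = mean(observed)
    return [fill if s is None else s for s in item_scores]


def carry_forward(time_points):
    """Fill a missing time point with the most recent earlier observation."""
    filled, last = [], None
    for value in time_points:
        if value is not None:
            last = value
        filled.append(last)
    return filled


scale_filled = impute_person_mean([4, None, 2, 3])      # one participant, four items
series_filled = carry_forward([None, 7, None, None, 9])  # one participant, five visits
print(scale_filled, series_filled)
```

A value missing at the first time point stays missing, since there is no earlier observation to carry forward.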

Replacing Missing Data • Principles in handling missing data are: - Incomplete cases (participants) may be deleted and the analysis may be done on those who completed the study - Regression imputation may be used to predict the missing values from other variables
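Regression imputation can be illustrated with a single predictor. This sketch fits an ordinary least squares line to the complete cases and uses it to predict the missing outcome values; the variable names and data are assumptions made for the example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares slope and intercept for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def regression_impute(xs, ys):
    """Predict missing y values (None) from x, using a line fit to complete pairs."""
    complete = [(x, y) for x, y in zip(xs, ys) if y is not None]
    slope, intercept = fit_line([x for x, _ in complete], [y for _, y in complete])
    return [intercept + slope * x if y is None else y for x, y in zip(xs, ys)]


predictor = [1, 2, 3, 4, 5]
outcome = [2, 4, None, 8, 10]  # third participant is missing the outcome
imputed = regression_impute(predictor, outcome)
print(imputed)
```

In practice several predictors are usually used, and the predicted values fall exactly on the regression line, so they carry less variability than real observations would.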

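Replacing Missing Data • Principles in handling missing data are: - Expectation maximization alternates between estimating the missing values and re-estimating the model parameters, iterating until the estimates converge - Multiple imputation generates several sets of replacement values, analyzes each completed data set, and combines the results to find the best estimates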

Visual Representations • Stem-and-leaf plots illustrate the distribution of values • Box plots illustrate the distribution of values • Bar and pie charts demonstrate differences between groups and subgroups • Scatter plots can show relationships between interval-level variables
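A stem-and-leaf display is simple enough to build as text. The sketch below assumes non-negative whole-number scores and uses the tens digit as the stem; the function name and scores are illustrative.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return one text line per stem (tens) listing its leaves (ones digits)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return [
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(stems.items())
    ]


scores = [23, 12, 34, 15, 21, 23]
for line in stem_and_leaf(scores):
    print(line)
```

Each row keeps the actual values visible while its length shows how many cases fall in that interval.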

Basic Descriptive Statistics • Normal distribution is represented by a symmetrical bell-shaped curve • Positive skew has more cases at the low end of values, with a tail toward the high end • Negative skew has more cases at the high end of values, with a tail toward the low end
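The direction of skew can be checked numerically. This sketch uses the moment coefficient of skewness (third central moment divided by the 1.5 power of the second central moment), which is one common definition among several; the data sets are invented.

```python
from statistics import mean


def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5 (population moments)."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


positive = [1, 1, 2, 2, 3, 10]   # most cases low, tail toward high values
negative = [10, 10, 9, 9, 8, 1]  # most cases high, tail toward low values
symmetric = [1, 2, 3, 4, 5]

print(skewness(positive), skewness(negative), skewness(symmetric))
```

A result above zero indicates positive skew, below zero negative skew, and a value near zero a roughly symmetrical distribution.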

Basic Descriptive Statistics • Mode is the value that occurs most often • Median is the middle score in the distribution • Mean is the average of all scores
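The three measures of central tendency are available in Python's standard statistics module. The scores below are made up for illustration.

```python
from statistics import mean, median, mode

scores = [2, 3, 3, 5, 7]

most_frequent = mode(scores)  # value that occurs most often
middle = median(scores)       # middle score of the sorted distribution
average = mean(scores)        # sum of scores divided by number of scores

print(most_frequent, middle, average)
```

The mean (4) is pulled above the median (3) by the highest score, which is typical of a positively skewed set of scores.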

Basic Descriptive Statistics • Range is the distance between the highest and lowest scores - The range or distance between these endpoints can be divided into portions such as quartiles or percentiles
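The range and one common way of dividing it, quartiles, can be sketched as follows. Note that statistical packages use slightly different rules for locating quartiles; this example relies on the default method of Python's statistics.quantiles, and the scores are invented.

```python
from statistics import quantiles

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]

score_range = max(scores) - min(scores)  # distance between highest and lowest
q1, q2, q3 = quantiles(scores, n=4)      # cut points dividing scores into quarters
iqr = q3 - q1                            # interquartile range: spread of middle half

print(score_range, q1, q2, q3, iqr)
```

The second quartile is the median, and the interquartile range is less sensitive to outliers than the full range.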

Basic Descriptive Statistics • Variance is the average of the squared deviations from the mean • Standard deviation is the square root of the variance
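The definitions above translate directly into code. This sketch computes the population variance by hand (dividing by n) and checks it against the standard library; when the scores are a sample, the sum of squared deviations is usually divided by n - 1 instead. The scores are illustrative.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance, variance

scores = [2, 4, 4, 4, 5, 5, 7, 9]

m = mean(scores)
squared_deviations = [(s - m) ** 2 for s in scores]

var_by_hand = sum(squared_deviations) / len(scores)  # average squared deviation
sd_by_hand = sqrt(var_by_hand)                       # square root of the variance

sample_var = variance(scores)  # divides by n - 1 rather than n

print(var_by_hand, sd_by_hand, sample_var)
```

The standard deviation is expressed in the same units as the original scores, which makes it easier to interpret than the variance.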

Bivariate Association • Bivariate refers to the relationship between two variables - The Pearson product-moment correlation coefficient, represented as r, is the most commonly used bivariate measure of association
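Pearson's r can be computed from the deviations of each variable from its mean. The function name and the paired scores below are illustrative.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson r: co-deviation of x and y scaled by the spread of each variable."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```

The coefficient ranges from -1 to +1; the sign gives the direction of the relationship and the absolute value its strength.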

Bivariate Association • A correlation matrix can be used to analyze multiple variables at one time and compare the strengths of the relationships between various pairs of variables • You may also calculate the coefficient of determination (r²), the proportion of variance in one variable that is explained by the other
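A correlation matrix is simply Pearson's r computed for every pair of variables. The sketch below builds one as a nested dictionary and squares one coefficient to obtain the coefficient of determination; the variable names and values are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical study variables measured on the same five participants.
variables = {
    "anxiety": [1, 2, 3, 4, 5],
    "pain": [2, 4, 5, 4, 5],
    "sleep_hours": [9, 7, 6, 6, 4],
}

# One r for every pair of variables; the diagonal is each variable with itself.
matrix = {
    a: {b: pearson_r(variables[a], variables[b]) for b in variables}
    for a in variables
}

r_squared = matrix["anxiety"]["pain"] ** 2  # coefficient of determination

for name, row in matrix.items():
    print(name, [round(v, 2) for v in row.values()])
print(round(r_squared, 2))
```

The matrix is symmetric, so only the values above or below the diagonal need to be reported.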

Additional Measures of Association • Spearman rank-order correlation is used for ordinal data • The chi-square statistic is used for nominal data
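Both measures can be sketched with the standard library alone. Spearman's coefficient is computed here as Pearson's r on ranks (tied values share their average rank), and the chi-square statistic compares observed cell counts with the counts expected if the two variables were unrelated. Function names and data are illustrative, and neither function returns a p value.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1 upward; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman rank-order correlation: Pearson r applied to the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def chi_square(table):
    """Chi-square statistic for a contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


rho = spearman_rho([1, 2, 3, 4, 5], [5, 6, 7, 8, 7])  # ordinal ratings
chi2 = chi_square([[10, 20], [20, 10]])               # 2 x 2 table of counts
print(round(rho, 3), round(chi2, 3))
```

Judging significance still requires comparing the statistic with the appropriate distribution, which a statistical package normally handles.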

Additional Measures of Association • Fisher’s Exact Test is used in place of chi-square if there are fewer than five expected cases in any cell • The Mann-Whitney U Test is used instead of chi-square to compare two groups if the data are ordinal
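A rough sketch of both tests using only the standard library. The Fisher function handles a 2 x 2 table and returns a two-sided p value by summing the probabilities of all tables no more likely than the observed one, which is one common convention. The Mann-Whitney function returns only the U statistic, not a p value. Names and data are illustrative.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p value for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x cases in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Small tolerance so tables exactly as likely as the observed one are included.
    return sum(
        prob(x) for x in range(low, high + 1) if prob(x) <= p_observed * (1 + 1e-9)
    )


def mann_whitney_u(group_a, group_b):
    """Smaller of the two U statistics; ties between groups count as one half."""
    u_a = sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in group_a
        for y in group_b
    )
    u_b = len(group_a) * len(group_b) - u_a
    return min(u_a, u_b)


p_value = fisher_exact_two_sided([[3, 1], [1, 3]])   # small cell counts
u_stat = mann_whitney_u([3, 4, 2, 6], [5, 7, 8, 9])  # two groups of ordinal scores
print(round(p_value, 4), u_stat)
```

A smaller U indicates less overlap between the two groups; its significance is judged against tables or a normal approximation, which statistical software provides.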