Soc 3306a Lecture 5: Data Analysis Overview of Useful Statistics for Your Analysis


The Difference between Parametric and Non-Parametric Statistics

Parametric statistics:
- appropriate for interval/ratio data
- generalizable to a population
- assume normal distributions
- Note: assume random sampling

Non-parametric statistics:
- used with nominal/ordinal data
- not generalizable to a population
- do not assume normal distributions

Contingency (Cross-Tabs) Analysis and Chi-Square or Gamma/Tau-b: non-parametric statistics (no normality assumption)

Assumptions:
- nominal or ordinal (categorical) data
- any type of distribution

The hypothesis test:
- The null hypothesis: there is no association between the variables.

Contingency (cont.)

Conducting the analysis:
a. Calculate percentages within the categories of the IV and compare across the categories of the DV. Are there differences in the outcomes?
b. For nominal data:
   - Chi-square statistic: is the relationship (the differences above) real?
   - Phi, Cramer's V, etc.: how strong is the relationship?
c. For ordinal data:
   - t-test for gamma, tau-b: is the relationship (the differences above) real?
   - Gamma, tau-b: how strong is it, and in what direction?
A sketch of these statistics follows below.
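The course analyses use SPSS, but for readers working outside SPSS, here is a minimal sketch of the same statistics in Python with scipy (assuming scipy and numpy are installed; the cross-tab counts and ordinal codes below are invented for illustration):

```python
import numpy as np
from scipy.stats import chi2_contingency, kendalltau

# Hypothetical 2x3 cross-tab: rows = IV categories, columns = DV categories
observed = np.array([[30, 25, 10],
                     [15, 20, 40]])

# a. Percentages within IV categories, compared across the DV
row_pcts = observed / observed.sum(axis=1, keepdims=True) * 100
print(row_pcts)

# b. Chi-square: is the association real?
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")

# Cramer's V: how strong is the relationship?
n = observed.sum()
v = np.sqrt(chi2 / (n * (min(observed.shape) - 1)))
print(f"Cramer's V = {v:.3f}")

# c. For ordinal data: Kendall's tau-b on paired ordinal scores
x = [1, 1, 2, 2, 3, 3, 3, 1, 2, 3]
y = [1, 2, 2, 3, 3, 2, 3, 1, 1, 3]
tau, p_tau = kendalltau(x, y)  # tau-b variant handles ties
print(f"tau-b = {tau:.3f}, p = {p_tau:.4f}")
```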

T-Tests (parametric) for Means and Proportions

The t-test is used to determine whether sample means differ. Essentially, the t statistic is the ratio of the sample mean difference to the standard error of that difference. The t-test makes some important assumptions:
- interval/ratio level data
- random sampling
- normal distributions
- (relatively) equal variances; use Levene's test to determine whether the variances are equal, as in the sketch below
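A minimal sketch of checking the equal-variance assumption with Levene's test before choosing the form of the independent-samples t-test, using scipy (the two groups of scores are hypothetical):

```python
from scipy.stats import levene, ttest_ind

group1 = [12.1, 11.8, 13.0, 12.5, 11.9, 12.7]
group2 = [10.2, 10.9, 11.1, 10.5, 11.3, 10.8]

# Levene's test: the null hypothesis is that the variances are equal
stat, p_levene = levene(group1, group2)
equal_var = p_levene > 0.05  # fail to reject -> assume equal variances

# Pass the result to the t-test (Welch's correction when variances differ)
t, p = ttest_ind(group1, group2, equal_var=equal_var)
print(f"t = {t:.3f}, p = {p:.4f} (equal_var={equal_var})")
```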

T-tests (cont.)

a. The one-sample t-test: tests a sample mean against a known population mean. The null hypothesis is that the sample mean equals the population mean.

b. The independent-samples t-test: tests whether the mean of one sample differs from the mean of another sample. The null hypothesis is that the mean of sample 1 equals the mean of sample 2.
   Note: with independent t-tests, you must pay attention to the standard error of the samples. There are two ways to estimate it, and SPSS uses the Levene test to choose between them:
   - Equal variances: the two samples have roughly equal variances
   - Unequal variances: the two samples differ substantially in variance (large discrepancy)

c. The paired-group t-test (dependent or related samples): tests whether two measurements within the overall sample differ on the same dependent variable. The null hypothesis is that the mean of (var 1 - var 2) equals 0. Usually var 1 is measured at Time 1 and var 2 at Time 2.

Overall, you will be looking for the t-value and its corresponding p-value. Depending on your alpha level, you will reject or fail to reject the null hypothesis based on these numbers. The one-sample and paired cases are sketched below.
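The one-sample and paired cases map directly onto scipy functions; the scores and the population mean of 100 below are invented for the example:

```python
from scipy.stats import ttest_1samp, ttest_rel

# a. One-sample: sample mean vs. a known population mean (here, 100)
sample = [103, 98, 107, 101, 95, 110, 99, 104]
t1, p1 = ttest_1samp(sample, popmean=100)

# c. Paired: the same cases measured at Time 1 and Time 2
time1 = [5.1, 4.8, 6.0, 5.5, 4.9]
time2 = [5.9, 5.2, 6.4, 5.8, 5.5]
t2, p2 = ttest_rel(time1, time2)  # tests mean(time1 - time2) = 0

alpha = 0.05
print(f"one-sample: t = {t1:.3f}, p = {p1:.4f}, reject H0: {p1 < alpha}")
print(f"paired:     t = {t2:.3f}, p = {p2:.4f}, reject H0: {p2 < alpha}")
```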

ANOVA (parametric)

Analysis of Variance, or ANOVA, tests for differences in means among three or more samples.

One-way ANOVA assumptions:
- one independent variable: categorical, with two or more levels
- dependent variable: interval or ratio level

Two-way ANOVA assumptions:
- two or more independent variables: categorical, with two or more levels
- dependent variable: interval or ratio level
- the analysis will include main effects and an interaction term

ANOVA tests the ratio (F) of the mean squares between groups to the mean squares within groups. Given the degrees of freedom, the F score shows whether there is a difference in means among the groups, as in the sketch below.
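To make the F ratio concrete, here is a one-way ANOVA on three hypothetical groups, computing the between- and within-groups mean squares by hand and checking the result against scipy's f_oneway:

```python
import numpy as np
from scipy.stats import f_oneway

groups = [np.array([4.0, 5.0, 6.0, 5.5]),
          np.array([6.5, 7.0, 8.0, 7.5]),
          np.array([9.0, 8.5, 10.0, 9.5])]

# Mean squares by hand
grand_mean = np.concatenate(groups).mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
df_between = len(groups) - 1
df_within = sum(len(g) for g in groups) - len(groups)
F = (ss_between / df_between) / (ss_within / df_within)

# Same F from scipy, with its p-value
F_scipy, p = f_oneway(*groups)
print(f"F = {F:.3f} (scipy: {F_scipy:.3f}), p = {p:.4f}")
```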

ANOVA (cont.)

One-way ANOVA provides an F-ratio and its corresponding p-value. If the difference between the between-groups mean squares and the within-groups mean squares is large enough, the null hypothesis is rejected, indicating a difference in mean scores among the groups. However, the F-ratio does not tell you where those differences lie. Post-hoc comparisons such as the Tukey-b, Bonferroni, or Scheffe tests do this (see the sketch below).

Two-way ANOVA provides an overall F-ratio and p-value, as well as F-ratios and p-values for each main effect and interaction term. When the interaction is significant, the interaction means (in SPSS, ask for these under Options in GLM) should also be interpreted.
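For a post-hoc comparison outside SPSS, recent SciPy versions (1.8 and later) include a Tukey HSD test; the three groups below are the same hypothetical data as in the earlier sketch:

```python
from scipy.stats import tukey_hsd

# Hypothetical groups (e.g., three experimental conditions)
g0 = [4.0, 5.0, 6.0, 5.5]
g1 = [6.5, 7.0, 8.0, 7.5]
g2 = [9.0, 8.5, 10.0, 9.5]

# Pairwise comparisons with family-wise error control
result = tukey_hsd(g0, g1, g2)
print(result)  # table of pairwise mean differences and p-values
```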

Correlation (parametric)

Correlation is used to test the presence, strength, and direction of a linear relationship between variables. It is a 'measure of association': the correlation coefficient is a numerical expression of the degree of relationship between two variables. Correlation does not imply causality!

Typically, you need at least interval/ratio level data. However, you can run a correlation with ordinal data that has five or more categories.

Correlation (cont.)

a. The relationships: essentially, there are four types of relationships: (1) positive, (2) negative, (3) curvilinear, and (4) no relationship.

b. The hypotheses and tests: the correlation statistic (Pearson's r) tests the null hypothesis that there is no relationship between the variables (r = 0).

c. The correlation coefficient: Pearson's r, the correlation coefficient, is the numeric value of the relationship between the variables. It can vary between -1 and +1; if no relationship exists, the coefficient equals 0. Pearson's r provides (1) an estimate of the strength of the relationship and (2) an estimate of its direction:
- between -1 and 0: a negative (inverse) relationship
- between 0 and +1: a positive relationship
- exactly 0: no relationship
The closer the coefficient lies to -1 or +1, the stronger the relationship. A quick example follows below.
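In Python, Pearson's r and its p-value come from a single scipy call (the x and y values here are hypothetical):

```python
from scipy.stats import pearsonr

x = [2, 4, 5, 7, 8, 10, 11, 13]
y = [1, 3, 6, 6, 9, 10, 12, 12]

r, p = pearsonr(x, y)
print(f"r = {r:.3f}, p = {p:.4f}")
# r near +1: strong positive; near -1: strong negative; near 0: no linear relationship
```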

Correlation (cont.)

d. Coefficient of determination: related to the correlation coefficient is the coefficient of determination, which gives the proportion of variance shared by the two variables (x and y). To calculate it, square the r value. For example, an r of .90 yields a coefficient of determination of .81, meaning 81 percent of the variance is shared between the variables.

e. Partial correlation: when you need to 'control' for the effect of other variables, you can use partial correlation. Both statistics are sketched below.
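A short sketch of both ideas: squaring r gives the coefficient of determination, and a partial correlation can be computed as the correlation of the residuals after regressing each variable on the control variable (a standard construction, shown here with numpy and scipy; the data are simulated for illustration):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
z = rng.normal(size=200)        # control variable
x = z + rng.normal(size=200)    # x and y both depend on z
y = z + rng.normal(size=200)

# Coefficient of determination
r, _ = pearsonr(x, y)
print(f"r = {r:.3f}, r^2 = {r**2:.3f}")  # e.g., r = .90 -> 81% shared variance

# Partial correlation of x and y controlling for z:
# correlate the residuals from regressing x on z and y on z
def residuals(v, z):
    slope, intercept = np.polyfit(z, v, 1)
    return v - (slope * z + intercept)

r_partial, p = pearsonr(residuals(x, z), residuals(y, z))
print(f"partial r = {r_partial:.3f}, p = {p:.4f}")
```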

Simple (Bivariate) and Multiple (Multivariate) Regression

Regression is used to model, calculate, and predict the pattern of a linear relationship between two or more variables. There are two main types of regression: simple and multiple.

a. Assumptions
- Note: variables should be approximately normally distributed; if not, recode and use non-parametric measures.
- Dependent variable: at least interval level (ordinal is acceptable if using a summated scale).
- Independent variables: should be interval level, and should be independent of each other (not related in any way). A nominal variable can be used if it is a binary 'dummy' variable (0, 1).

Regression (cont.)

b. Tests
- Overall: the null hypothesis is that the estimated regression line predicts the dependent variable no better than the mean of the DV.
- Coefficients (slope "b"): the null hypothesis is that the estimated coefficient b equals 0 (the line has no slope).

c. Statistics to determine significance
- Overall: R-squared, F-test
- Coefficients: t-tests

d. Limitations
- only addresses linear patterns
- multicollinearity may be a problem

A sketch of these statistics follows below.
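A minimal multiple-regression sketch with statsmodels (assuming the statsmodels package is installed; the variables and coefficients are simulated for illustration), showing the statistics named above: R-squared, the overall F-test, and t-tests for each coefficient, including a binary dummy IV:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
dummy = rng.integers(0, 2, size=100)  # binary 'dummy' IV (0, 1)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + 0.5 * dummy + rng.normal(size=100)

# Fit OLS with an intercept term
X = sm.add_constant(np.column_stack([x1, x2, dummy]))
model = sm.OLS(y, X).fit()

print(model.rsquared)                # R-squared: variance explained
print(model.fvalue, model.f_pvalue)  # overall F-test vs. the mean-only model
print(model.tvalues, model.pvalues)  # t-test per coefficient (slope = 0?)
```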

Useful Sources

- Agresti and Finlay, Statistical Methods for the Social Sciences
- Tabachnick and Fidell (2001), Using Multivariate Statistics
- For writing up the above statistics: "From Numbers to Words"