Chapter 15: Differences Between Groups and Relationships Among Variables
© 2007 Thomson/South-Western. All rights reserved.
LEARNING OUTCOMES
After studying this chapter, you should be able to:
1. Understand what multivariate statistical analysis involves and know the two types of multivariate analysis.
2. Interpret results from multiple regression analysis.
3. Interpret results from multivariate analysis of variance (MANOVA).
4. Interpret basic exploratory factor analysis results.
What Is the Appropriate Test of Difference?
• Test of Differences
Ø An investigation of a hypothesis that two (or more) groups differ with respect to measures on a variable.
v Behavior, characteristics, beliefs, opinions, emotions, or attitudes
• Bivariate Tests of Differences
Ø Involve only two variables: one that acts as a dependent variable and one that acts as a classification variable.
v Compare mean scores between groups, or compare how two groups’ scores are distributed across possible response categories.
EXHIBIT 15.1 Choosing the Right Statistic
EXHIBIT 15.1 Choosing the Right Statistic (cont’d)
Common Bivariate Tests

| Type of Measurement | Differences between two independent groups | Differences among three or more independent groups |
|---|---|---|
| Interval and ratio | Independent groups: t-test or Z-test | One-way ANOVA |
| Ordinal | Mann-Whitney U-test; Wilcoxon test | Kruskal-Wallis test |
| Nominal | Z-test (two proportions) | Chi-square test |
Cross-Tabulation Tables: The χ² Test for Goodness-of-Fit
• Cross-Tabulation (Contingency) Table
Ø A joint frequency distribution of observations on two or more variables.
• χ² Distribution
Ø Provides a means for testing the statistical significance of a contingency table.
Ø Involves comparing observed frequencies (Oi) with expected frequencies (Ei) in each cell of the table.
Ø Captures the goodness- (or closeness-) of-fit of the observed distribution with the expected distribution.
Example: Papa John’s Restaurants
Univariate Hypothesis: Papa John’s restaurants are more likely to be located in a stand-alone location than in a shopping center.
Bivariate Hypothesis: Stand-alone locations are more likely to be profitable than are shopping center locations.
Chi-Square Test

χ² = Σ (Oi − Ei)² / Ei   with   Ei = (Ri × Cj) / n

χ² = chi-square statistic
Oi = observed frequency in the ith cell
Ei = expected frequency in the ith cell
Ri = total observed frequency in the ith row
Cj = total observed frequency in the jth column
n = sample size
Degrees of Freedom (d.f.)
d.f. = (R − 1)(C − 1) = (2 − 1)(2 − 1) = 1
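As a quick illustration, the sketch below runs the χ² test of independence on a hypothetical 2×2 table for the Papa John’s example; the counts are invented for the sketch, not taken from the text. scipy computes the Ei = Ri × Cj / n expected frequencies automatically.

```python
# Chi-square test on a hypothetical 2x2 contingency table:
# rows = location type, columns = profitable / not profitable.
import numpy as np
from scipy import stats

observed = np.array([
    [50, 10],   # stand-alone locations (hypothetical counts)
    [15, 25],   # shopping-center locations (hypothetical counts)
])

chi2, p, dof, expected = stats.chi2_contingency(observed, correction=False)
print(f"chi2 = {chi2:.2f}, d.f. = {dof}, p = {p:.4f}")
print("expected frequencies (Ri x Cj / n):")
print(expected)
```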
The t-Test for Comparing Two Means
• Independent Samples t-Test
Ø A test of the hypothesis that the mean scores on some interval- or ratio-scaled variable differ between two groups formed by a less-than-interval classificatory variable.
The t-Test for Comparing Two Means (cont’d)
• Determining when an independent samples t-test is appropriate:
Ø Is the dependent variable interval or ratio?
Ø Can the dependent variable scores be grouped based upon some categorical variable?
Ø Does the grouping result in scores drawn from independent samples?
Ø Are two groups involved in the research question?
The t-Test for Comparing Two Means (cont’d)
• Pooled Estimate of the Standard Error
Ø An estimate of the standard error for a t-test of independent means that assumes the variances of both groups are equal:

S(X̄1−X̄2) = sqrt( [((n1 − 1)S1² + (n2 − 1)S2²) / (n1 + n2 − 2)] × [1/n1 + 1/n2] )
The t-Test for Comparing Two Means (cont’d)
• The t statistic is the difference between the two group means divided by the pooled standard error:

t = (X̄1 − X̄2) / S(X̄1−X̄2)
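A minimal sketch of the test in Python, using two small made-up samples; equal_var=True asks scipy for the pooled-variance version described above.

```python
# Independent-samples t-test with a pooled variance estimate.
from scipy import stats

group_a = [24, 27, 30, 22, 26, 29]   # made-up scores for group 1
group_b = [18, 21, 20, 23, 19, 22]   # made-up scores for group 2

t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=True)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```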
EXHIBIT 15.2 Independent Samples t-Test Results
What Is ANOVA?
• Analysis of Variance (ANOVA)
Ø An analysis involving the investigation of the effects of one treatment variable on an interval-scaled dependent variable.
Ø A hypothesis-testing technique to determine whether statistically significant differences in means occur between two or more groups.
Ø A method of comparing variances to make inferences about the means.
Ø ANOVA tests whether “grouping” observations explains variance in the dependent variable.
Simple Illustration of ANOVA
• How much coffee respondents report drinking each day, based on which shift they work (GY stands for graveyard shift):

| Day | Day | Day | GY | GY | Night | Night |
|---|---|---|---|---|---|---|
| 1 | 3 | 4 | 0 | 2 | 7 | 2 |
| 1 | 6 | 6 | 8 | 3 | 7 | 6 |
EXHIBIT 15.3 Illustration of ANOVA Logic
Partitioning Variance in ANOVA
• Total Variability
Ø Grand mean
v The mean of a variable over all observations.
Ø SST
v The total observed variation across all groups and individual observations.
v SST = Total of (observed value − grand mean)²
Partitioning Variance in ANOVA
• Between-groups Variance
Ø The sum of differences between the group mean and the grand mean, summed over all groups for a given set of observations.
Ø SSB
v Systematic variation of scores between groups due to manipulation of an experimental variable or to group classifications of a measured independent variable; also known as between-group variance.
v SSB = Total of n_group (group mean − grand mean)²
Partitioning Variance in ANOVA
• Within-group Error or Variance
Ø The sum of the differences between observed values and the group mean for a given set of observations; also known as total error variance.
Ø SSE
v Variation of scores due to random error, or within-group variance due to individual differences from the group mean.
v This is the error of prediction.
v SSE = Total of (observed value − group mean)²
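To make the partition concrete, here is a short Python sketch that computes SST, SSB, and SSE for the coffee illustration. The group readings below follow one plausible reading of the table above, so treat the numbers as illustrative.

```python
# Partition total variance (SST) into between-group (SSB) and
# within-group error (SSE) components for the coffee illustration.
import numpy as np

groups = {
    "Day":   [1, 3, 4, 1, 6, 6],   # illustrative reading of the table
    "GY":    [0, 2, 8, 3],
    "Night": [7, 2, 7, 6],
}

all_values = np.concatenate([np.array(v, dtype=float) for v in groups.values()])
grand_mean = all_values.mean()

sst = ((all_values - grand_mean) ** 2).sum()
ssb = sum(len(v) * (np.mean(v) - grand_mean) ** 2 for v in groups.values())
sse = sum(((np.array(v) - np.mean(v)) ** 2).sum() for v in groups.values())

print(f"SST = {sst:.2f}, SSB = {ssb:.2f}, SSE = {sse:.2f}")
print(f"check: SSB + SSE = {ssb + sse:.2f}")   # equals SST
```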
The F-Test
• F-Test
Ø Used to determine whether there is more variability in the scores of one sample than in the scores of another sample.
Ø Variance components are used to compute F-ratios:
v SSE, SSB, SST
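In practice the F-ratio and its p-value come straight from a one-way ANOVA routine; a minimal sketch using scipy on the same illustrative shift groups:

```python
# One-way ANOVA: F compares between-group to within-group variance.
from scipy import stats

day   = [1, 3, 4, 1, 6, 6]
gy    = [0, 2, 8, 3]
night = [7, 2, 7, 6]

f_stat, p_value = stats.f_oneway(day, gy, night)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```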
EXHIBIT 15.4 Interpreting ANOVA
Correlation Coefficient Analysis
• Correlation coefficient
Ø A statistical measure of the covariation, or association, between two at-least interval variables.
• Covariance
Ø Extent to which two variables are associated systematically with each other.
Simple Correlation Coefficient
• Correlation coefficient (r)
Ø Ranges from +1 to −1
v Perfect positive linear relationship = +1
v Perfect negative (inverse) linear relationship = −1
v No correlation = 0
• Correlation coefficient for two variables (X, Y):

r = Σ(Xi − X̄)(Yi − Ȳ) / sqrt( Σ(Xi − X̄)² × Σ(Yi − Ȳ)² )
Correlation, Covariance, and Causation
• When two variables covary, they display concomitant variation.
• This systematic covariation does not in and of itself establish causality.
• The rooster’s crow and the rising of the sun:
Ø The rooster does not cause the sun to rise.
Coefficient of Determination
• Coefficient of Determination (R²)
Ø A measure obtained by squaring the correlation coefficient; the proportion of the total variance of one variable accounted for by the value of another variable.
Ø Measures the part of the total variance of Y that is accounted for by knowing the value of X.
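A short sketch on made-up data showing both statistics: r comes from scipy’s pearsonr, and R² is simply its square.

```python
# Pearson correlation r and the coefficient of determination R^2.
from scipy import stats

x = [1, 2, 3, 4, 5, 6]    # made-up values for X
y = [2, 1, 4, 3, 7, 8]    # made-up values for Y

r, p_value = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p = {p_value:.4f}")
print(f"R^2 = {r ** 2:.3f}")   # share of Y's variance explained by X
```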
Regression Analysis
• Simple (Bivariate) Linear Regression
Ø A measure of linear association that investigates straight-line relationships between a continuous dependent variable and an independent variable that is usually continuous, but can be a categorical dummy variable.
• The Regression Equation: Y = α + βX
Ø Y = the continuous dependent variable
Ø X = the independent variable
Ø α = the Y intercept (where the regression line intercepts the Y axis)
Ø β = the slope coefficient (rise over run)
The Regression Equation
• Parameter Estimate Choices
Ø β is indicative of the strength and direction of the relationship between the independent and dependent variable.
Ø α (Y intercept) is a fixed point that is considered a constant (how much Y can exist without X).
• Standardized Regression Coefficient (β)
Ø Estimated coefficient of the strength of the relationship between the independent and dependent variables.
Ø Expressed on a standardized scale where higher absolute values indicate stronger relationships (range is from −1 to +1).
EXHIBIT 15.5 The Advantage of Standardized Regression Weights
The Regression Equation (cont’d)
• Parameter Estimate Choices (cont’d)
Ø Raw regression estimates (b1)
v Raw regression weights have the advantage of retaining the scale metric, which is also their key disadvantage.
v If the purpose of the regression analysis is forecasting, then raw parameter estimates must be used.
v This is another way of saying the researcher is interested only in prediction.
Ø Standardized regression estimates (β1)
v Standardized regression estimates have the advantage of a constant scale.
v Standardized regression estimates should be used when the researcher is testing explanatory hypotheses.
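The difference between the two choices is easy to see in code. A minimal sketch on invented data: the raw slope keeps the original units, while regressing z-scores on z-scores yields the standardized coefficient (which, in the bivariate case, equals r).

```python
# Raw vs. standardized simple-regression estimates.
import numpy as np
from scipy import stats

x = np.array([10, 20, 30, 40, 50, 60], dtype=float)   # invented data
y = np.array([15, 23, 29, 38, 45, 49], dtype=float)

raw = stats.linregress(x, y)
print(f"raw estimate:  Y = {raw.intercept:.2f} + {raw.slope:.3f} X")

# Standardize both variables, then re-estimate the slope.
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
std_beta = stats.linregress(zx, zy).slope
print(f"standardized beta = {std_beta:.3f} (equals r = {raw.rvalue:.3f})")
```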
Multiple Regression Analysis
• Multiple Regression Analysis
Ø An analysis of association in which the effects of two or more independent variables on a single, interval-scaled dependent variable are investigated simultaneously.
• Dummy variable
Ø The way a dichotomous (two-group) independent variable is represented in regression analysis, by assigning a 0 to one group and a 1 to the other.
Multiple Regression Analysis (cont’d)
• A Simple Example
Ø Assume that a toy manufacturer wishes to explain store sales (dependent variable) using a sample of stores from Canada and Europe.
Ø Several hypotheses are offered:
v H1: Competitors’ sales are related negatively to sales.
v H2: Sales are higher in communities with a sales office than where no sales office is present.
v H3: Grammar school enrollment in a community is related positively to sales.
Multiple Regression Analysis (cont’d)
• Statistical Results of the Multiple Regression
Ø Regression Equation:
Ø Coefficient of multiple determination (R²) = .845
Ø F-value = 14.6, p < .05
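The slide’s actual coefficients are not reproduced here, but a regression in the spirit of the toy-manufacturer example can be sketched with statsmodels; all numbers below are simulated for illustration, including the 0/1 sales-office dummy.

```python
# Multiple regression: sales on competitor sales, a sales-office dummy,
# and grammar school enrollment. All values are simulated for the sketch.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 24
competitor = rng.uniform(100, 500, n)     # competitors' sales
office = rng.integers(0, 2, n)            # dummy: 1 = sales office present
enrollment = rng.uniform(1000, 9000, n)   # grammar school enrollment
sales = (200 - 0.3 * competitor + 80 * office
         + 0.02 * enrollment + rng.normal(0, 20, n))

X = sm.add_constant(np.column_stack([competitor, office, enrollment]))
model = sm.OLS(sales, X).fit()
print(model.params)   # intercept, then partial slopes for the predictors
print(f"R^2 = {model.rsquared:.3f}, F = {model.fvalue:.2f}, "
      f"p = {model.f_pvalue:.4f}")
```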
Multiple Regression Analysis (cont’d)
• Regression Coefficients in Multiple Regression
Ø Partial correlation
v The correlation between two variables after taking into account the fact that they are correlated with other variables too.
• R² in Multiple Regression
Ø The coefficient of multiple determination in multiple regression indicates the percentage of variation in Y explained by all independent variables.
Multiple Regression Analysis (cont’d)
• Coefficients of Partial Regression
Ø bn
v Used when the independent variables are correlated with one another.
v The percentage of variance in the dependent variable that is explained by a single independent variable, holding other independent variables constant.
Ø R²
v The percentage of variance in the dependent variable that is explained by the variation in the independent variables.
Multiple Regression Analysis (cont’d)
• Statistical Significance in Multiple Regression
Ø F-test
v Tests statistical significance by comparing the variation explained by the regression equation to the residual error variation.
v Allows for testing of the relative magnitudes of the sum of squares due to the regression (SSR) and the error sum of squares (SSE).
Multiple Regression Analysis (cont’d)
• Degrees of Freedom (d.f.)
Ø k = number of independent variables
Ø n = number of observations or respondents
• Calculating Degrees of Freedom (d.f.)
Ø d.f. for the numerator = k
Ø d.f. for the denominator = n − k − 1
F-test

F = (SSR / k) / (SSE / (n − k − 1))
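The same F can be computed by hand from the sums of squares; a sketch on simulated data, mirroring the formula above:

```python
# Regression F-statistic from first principles: F = (SSR/k) / (SSE/(n-k-1)).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, k = 30, 2
X = rng.normal(size=(n, k))                                  # simulated predictors
y = 1.0 + X @ np.array([0.8, -0.5]) + rng.normal(0, 0.5, n)

Xc = np.column_stack([np.ones(n), X])                        # add the intercept
beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
y_hat = Xc @ beta

ssr = ((y_hat - y.mean()) ** 2).sum()   # sum of squares due to regression
sse = ((y - y_hat) ** 2).sum()          # error (residual) sum of squares

f_stat = (ssr / k) / (sse / (n - k - 1))
p_value = stats.f.sf(f_stat, k, n - k - 1)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```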
EXHIBIT 15.4 Interpreting Multiple Regression Results
Steps in Interpreting a Multiple Regression Model
1. Examine the model F-test.
2. Examine the individual statistical tests for each parameter estimate.
3. Examine the model R².
4. Examine collinearity diagnostics (a quick check is sketched below).
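For step 4, one common diagnostic is the variance inflation factor (VIF). A sketch using statsmodels on deliberately collinear simulated predictors; VIFs well above roughly 10 are usually read as a collinearity problem.

```python
# Collinearity diagnostics via variance inflation factors (VIFs).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(size=50)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=50)   # nearly collinear with x1
X = sm.add_constant(np.column_stack([x1, x2]))

for i in (1, 2):   # index 0 is the constant term
    print(f"VIF(x{i}) = {variance_inflation_factor(X, i):.2f}")
```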
Other Multivariate Techniques
• Multivariate Data Analysis
Ø A group of statistical techniques allowing for the simultaneous analysis of three or more variables.
• Multivariate Techniques
Ø Exploratory factor analysis
Ø Confirmatory factor analysis
Ø Multivariate analysis of variance (MANOVA)
Ø Multiple discriminant analysis
Ø Cluster analysis
Key Terms and Concepts
• Cross-tabulation (contingency table)
• Univariate analysis
• Bivariate analysis
• Analysis of variance (ANOVA)
• Grand mean
• Coefficient of determination (r²)
• Simple linear regression
• Standardized regression coefficient (β)
• Multiple regression analysis
• Multivariate data analysis
• Between-groups variance
• Within-group error or variance
• F-test
• Within-group variation
• Between-group variance
• Total variability (SST)
• Correlation coefficient