Introduction to SAS Essentials: Mastering SAS for Data Analytics
Alan Elliott and Wayne Woodward
SAS ESSENTIALS -- Elliott & Woodward

Chapter 13: ANALYSIS OF VARIANCE

LEARNING OBJECTIVES
• To be able to compare three or more means using one-way ANOVA with multiple comparisons
• To be able to perform a repeated measures (dependent samples) analysis of variance with multiple comparisons
• To be able to graph mean comparisons

PROC ANOVA and PROC GLM
• This chapter illustrates how to perform an analysis of variance (ANOVA) for several common designs. The book covers three SAS procedures: PROC ANOVA, PROC GLM, and PROC MIXED. (PROC MIXED is covered in Chapter 14.)
• In this chapter, we describe:
  • PROC ANOVA: a basic procedure useful for one-way ANOVA or for multiway factorial designs with fixed factors and an equal number of observations per cell.
  • PROC GLM: for one-way repeated measures analysis and for techniques not supported by PROC ANOVA.

13.1 COMPARING THREE OR MORE MEANS USING ONE-WAY ANALYSIS OF VARIANCE
• A one-way ANOVA is an extension of the independent-group t-test to the case of more than two groups. Assumptions for this test are similar to those for the t-test:
  • Data within groups are normally distributed with equal variances across groups.
  • Groups are from independent samples.
• The hypotheses for the comparison of independent groups are as follows ($k$ is the number of groups):
  $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ (the means of all the groups are equal)
  $H_a: \mu_i \neq \mu_j$ for some $i \neq j$ (at least two means are not equal)
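To make these hypotheses concrete: the F statistic that PROC ANOVA reports compares the spread of the group means around the grand mean with the spread of observations within their own groups. The Python sketch below is only an illustration of that arithmetic (the function `one_way_anova` and the three small groups are hypothetical, not from the book or from SAS); the p-value SAS prints would come from the F distribution with the returned degrees of freedom, which this sketch does not compute.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of independent groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: each observation around its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical data: three independent groups of three observations each
f_stat, df1, df2 = one_way_anova([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

A large F relative to the F(df_between, df_within) reference distribution is evidence against the null hypothesis of equal means.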

Simplified Syntax for PROC ANOVA
• The syntax for the statement is as follows:
    PROC ANOVA <Options>;
       CLASS variable;
       MODEL dependentvar = independentvars;
       MEANS independentvars / typecomparison <meansoptions>;
• CLASS defines the grouping variable.
• The MODEL statement defines the model tested.
• The MEANS statement defines post hoc multiple comparisons.

Table 13.1 Common Options for PROC ANOVA and PROC GLM for Performing a One-Way ANOVA or Simple Repeated Measures
• DATA=dataname: Specifies which data set to use.
• NOPRINT: Suppresses output. This is used when you want to extract information from ANOVA results but don't want SAS to produce output in the Results Viewer.
• OUTSTAT=dataname: Names an output data set that saves a number of the results from the ANOVA calculation.
• PLOTS=options: Specify PLOTS=NONE to suppress plots that are generated by default.
• ORDER=option: Specifies the order in which to display the CLASS variable levels (similar to what was covered in Chapter 10: Analyzing Counts and Tables). Options are DATA, FORMATTED, FREQ, or INTERNAL.
• ALPHA=p: Specifies the alpha level for a confidence interval (GLM only).

Common Statements for PROC ANOVA and PROC GLM (for One-Way Analyses) (Table 13.1 Continued)
• CLASS variable list; This statement is required and specifies the grouping variable(s) for the analysis.
• MODEL specification; Specifies the dependent and independent variables for the analysis. More specifically, it takes the form MODEL dependentvariable = independentvariable(s);
• FREQ var; Specifies that a variable represents the count of values for an observation. Similar to the WEIGHT statement for PROC FREQ.
• MEANS vars; Calculates means for dependent variables and may include comparisons.
• LSMEANS vars; Calculates least squares means for a dependent variable and can request comparisons. (GLM only)
• REPEATED vars; Used to specify repeated measures variables.
• TEST specification; Used to specify a hypothesis test (an F test that uses a specified error term).
• CONTRAST specification; Allows you to create customized post hoc comparisons. (GLM only)
• BY, FORMAT, LABEL, WHERE: These statements are common to most procedures and may be used here.

Using the MEANS or LSMEANS Statement
• When you perform a one-way ANOVA, there is typically a two-step procedure: (1) test the null hypothesis to determine whether any significant differences exist, and (2) if $H_0$ is rejected, run subsequent multiple comparison tests to determine which means are significantly different.
• Pairwise comparison of means can be performed using one of several multiple comparison tests specified with the MEANS statement, which has the following format (where independentvar is a CLASS variable):
    MEANS independentvar / typecomparison <meansoptions>;
• For PROC GLM, you can also use the LSMEANS statement:
    LSMEANS independentvar / typecomparison <meansoptions>;

Table 13.2 Common typecomparison Options for the PROC ANOVA or GLM MEANS Statement (options following the slash /)
• BON: Bonferroni t-tests of differences
• DUNCAN: Duncan's multiple range test
• SCHEFFE: Scheffe multiple comparison procedure
• SNK: Student-Newman-Keuls multiple range test
• LSD: Fisher's least significant difference
• TUKEY: Tukey's studentized range test
• DUNNETT('x'): Dunnett's test, comparing each group to a single control, where 'x' is the category value of the control group
• ALPHA=pvalue: Specifies the significance level for comparisons (default: 0.05)
• CLDIFF: Requests that confidence limits for the differences be included in the output
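The idea behind the BON option can be shown with a little arithmetic: with k groups there are k(k - 1)/2 pairwise comparisons, and the Bonferroni approach tests each one at alpha divided by that count so that the familywise error rate stays at or below alpha. A minimal Python sketch (the function name `bonferroni_alpha` is hypothetical and is not part of SAS):

```python
def bonferroni_alpha(k, alpha=0.05):
    """Return (number of pairwise comparisons, per-comparison alpha) for k groups."""
    m = k * (k - 1) // 2          # number of distinct pairs of groups
    return m, alpha / m           # Bonferroni: split alpha evenly across the pairs


for k in (3, 4):
    m, per_test = bonferroni_alpha(k)
    print(f"k={k}: {m} comparisons, each tested at alpha={per_test:.4f}")
```

With the four drugs in Section 13.2, for example, each of the 6 pairwise comparisons would be tested at about 0.0083, which is why Bonferroni becomes conservative as the number of groups grows.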

Common typecomparison Options for the PROC GLM LSMEANS Statement (options following the slash /) (Table 13.2 continued)
• ADJUST=option: Specifies the type of multiple comparison adjustment (default is T, which applies no adjustment and is equivalent to LSD). Examples are BON, DUNNETT, SCHEFFE, SIDAK, and TUKEY.
• PDIFF: Requests p-values for the differences between the least squares means.
• Do the Hands-on Example on p. 315 (AANOVA1.SAS).

SAS Code for a One-Way ANOVA (from AANOVA1.SAS)
    PROC ANOVA DATA=ACHE;
       CLASS BRAND;
       MODEL RELIEF=BRAND;
       MEANS BRAND/TUKEY;
       TITLE 'ANOVA EXAMPLE';
    RUN;
    QUIT;
• CLASS defines the grouping variable, BRAND.
• The MODEL statement indicates that you want to test whether BRAND can predict mean RELIEF (that is, whether mean RELIEF differs across brands).
• The MEANS statement requests a post hoc test (used if $H_0$ is rejected) to determine which means are different.

Results of a One-Way ANOVA
• The primary results for a one-way ANOVA test are in the ANOVA table in the output. The p-value is used to decide whether or not to reject the null hypothesis. Typically, if p < 0.05, you reject $H_0$. Rejecting $H_0$ indicates that at least some group means differ, so you proceed to look at the post hoc results.

Post Hoc Multiple Comparisons Test: Tukey Test
This test summarizes which means are found different at the alpha = 0.05 significance level. In the output table, means that are considered NOT DIFFERENT (at alpha = 0.05) are grouped together (see the Tukey Grouping column). Thus, the means for BRANDs 3 and 1 are grouped into group B, and their means (26.54 and 26.28) are considered NOT DIFFERENT. BRAND 2 is grouped alone (group A), so the mean for BRAND 2 (30.880) is considered LARGER than either 26.54 or 26.28 (at the alpha = 0.05 level).

Graphical Comparison of Groups
This graph reinforces the statistical results: groups 1 and 3 are very similar, but the mean for group 2 is larger than for either group 1 or group 3.

Multiple Comparison Test Using Confidence Limits
• Using this code for the comparison test:
    MEANS BRAND/TUKEY CLDIFF;
• results in a table of mean differences with confidence limits. For example, the first line tests the difference between the means for groups 2 and 3 (2 minus 3 = 4.340) and reports a 95% confidence interval of 0.691 to 7.989. Since this range does not include 0.0, the difference is considered statistically significant at the 0.05 level. The *** marks comparisons that are significant at the 0.05 level.
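The reasoning on this slide can be checked with simple arithmetic: a Tukey CLDIFF interval is centered on the observed difference, and the difference is declared significant exactly when the interval excludes zero. This Python sketch (the helper `interval_summary` is hypothetical) recovers the 4.340 difference as the midpoint of the reported limits 0.691 and 7.989.

```python
def interval_summary(lower, upper):
    """Return (center, half_width, excludes_zero) for a two-sided confidence interval."""
    center = (lower + upper) / 2          # point estimate of the mean difference
    half_width = (upper - lower) / 2      # margin of error
    excludes_zero = lower > 0 or upper < 0
    return center, half_width, excludes_zero


center, half_width, significant = interval_summary(0.691, 7.989)
print(f"difference = {center:.3f} +/- {half_width:.3f}; significant: {significant}")
```

An interval such as -1.0 to 2.0, by contrast, straddles zero, so that comparison would not be flagged with ***.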

Multiple Comparisons Using p-values
• Using PROC GLM instead of PROC ANOVA, with this code for the comparison test:
    LSMEANS BRAND/PDIFF;
• produces a table of p-values for the pairwise mean comparisons (with no ADJUST= option, these p-values are not adjusted for multiple comparisons). For example, the comparison of means 1 vs. 3 reports a p-value of 0.8524, indicating that the difference in means is NOT statistically significant. The difference between means 2 and 3 is statistically significant (p = 0.0080).

13.2 COMPARING THREE OR MORE REPEATED MEASURES
• Repeated measures are observations taken from the same or related subjects over time or in differing circumstances.
• When there are three or more repeated measures, the corresponding analysis is a repeated measures ANOVA.
• The hypotheses being tested with repeated measures ANOVA are as follows:
  $H_0$: There is no difference among the group means (repeated measures).
  $H_a$: There is a difference among the group means.

Example Syntax for a Repeated Measures ANOVA
    PROC GLM DATA=STUDY;
       CLASS SUBJ DRUG;
       MODEL RESULT = SUBJ DRUG;
       MEANS DRUG/DUNCAN;
       TITLE 'Repeated Measures ANOVA';
    RUN;
    QUIT;
• The CLASS statement indicates grouping variables. In repeated measures, a subject variable is included.
• The MODEL statement indicates that you want to predict RESULT from the type of DRUG. SUBJ is included to account for differences between subjects.

Example Repeated Measures Data
Each subject received each of the 4 drugs (in random order, with a washout period between administrations).
    Subj   Drug 1   Drug 2   Drug 3   Drug 4
    1      31       29       17       35
    2      15       17       11       23
    3      25       21       19       31
    4      35       35       21       45
    5      27       27       15       31

Repeated Measures Data in SAS
• The data for the repeated measures are not entered as laid out in the table. Each line represents one observation, so each subject has 4 lines, one for each drug:
    DATA STUDY;
    INPUT SUBJ DRUG RESULT;
    DATALINES;
    1 1 31
    1 2 29
    1 3 17
    1 4 35
    2 1 15
    etc.
• Notice how the data are set up for repeated measures: each subject has 4 records, one for each drug observation.
• Do the Hands-on Example on p. 320 (AGLM1.SAS).
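Converting the wide subject-by-drug table into the one-record-per-measurement layout used in the DATALINES block can be sketched in Python. The names `WIDE` and `to_long` are hypothetical; the values are the ones in the example data table, and the printed lines follow the same SUBJ DRUG RESULT order.

```python
# Wide layout from the data slide: subject -> results for drugs 1-4
WIDE = {
    1: [31, 29, 17, 35],
    2: [15, 17, 11, 23],
    3: [25, 21, 19, 31],
    4: [35, 35, 21, 45],
    5: [27, 27, 15, 31],
}


def to_long(wide):
    """Return (subj, drug, result) records: one record per subject-drug measurement."""
    records = []
    for subj, results in wide.items():
        for drug, result in enumerate(results, start=1):
            records.append((subj, drug, result))
    return records


# Print in the same SUBJ DRUG RESULT order used by the DATALINES block
for subj, drug, result in to_long(WIDE):
    print(subj, drug, result)
```

Five subjects times four drugs gives the 20 records the SAS data step needs.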

Results from Repeated Measures ANOVA
• The results of interest are in the Type III table. Typically, you are not interested in the SUBJ line in this table (or its p-value). The line of interest is the DRUG line, which tests the hypothesis of interest. In this case p < 0.0001, which indicates a significant difference among the means for the 4 drugs. Next, do a post hoc test to determine which drugs are different.
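For a balanced, complete table like the drug study, the SUBJ and DRUG lines of that table follow from the additive model RESULT = SUBJ + DRUG. The Python sketch below (function and variable names are hypothetical; it reproduces the arithmetic, not SAS's output format) computes the sums of squares and the DRUG F statistic from the five-subject data table. On these data it gives SS(DRUG) = 683.8, SS(SUBJ) = 648.0, error SS = 91.2, and F of about 30.0 on 3 and 12 degrees of freedom, consistent with the p < 0.0001 reported; the drug means it returns (26.6, 25.8, 16.6, 33.0) are the ones compared by the post hoc tests.

```python
# Rows = subjects 1-5, columns = drugs 1-4 (values from the data slide)
DATA = [
    [31, 29, 17, 35],
    [15, 17, 11, 23],
    [25, 21, 19, 31],
    [35, 35, 21, 45],
    [27, 27, 15, 31],
]


def repeated_measures_anova(rows):
    """Additive subject + treatment ANOVA for a complete subjects-by-treatments table."""
    s, t = len(rows), len(rows[0])                 # number of subjects, treatments
    grand = sum(sum(r) for r in rows) / (s * t)
    subj_means = [sum(r) / t for r in rows]
    trt_means = [sum(r[j] for r in rows) / s for j in range(t)]
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_trt = s * sum((m - grand) ** 2 for m in trt_means)    # DRUG line
    ss_subj = t * sum((m - grand) ** 2 for m in subj_means)  # SUBJ line
    ss_err = ss_total - ss_trt - ss_subj                     # residual (error)
    df_trt, df_err = t - 1, (s - 1) * (t - 1)
    f_trt = (ss_trt / df_trt) / (ss_err / df_err)
    return {
        "trt_means": trt_means,
        "ss_trt": ss_trt,
        "ss_subj": ss_subj,
        "ss_err": ss_err,
        "df_trt": df_trt,
        "df_err": df_err,
        "f_trt": f_trt,
    }


res = repeated_measures_anova(DATA)
print("drug means:", [round(m, 1) for m in res["trt_means"]])
print(f"F({res['df_trt']}, {res['df_err']}) = {res['f_trt']:.2f}")
```

Removing the subject-to-subject variation (SS(SUBJ)) from the error term is what makes the repeated measures design sensitive: the remaining error SS is small, so the DRUG F is large.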

Multiple Comparisons for Repeated Measures ANOVA
• This statement provides a multiple comparison test, which is appropriate if the main hypothesis is significant:
    MEANS DRUG/DUNCAN;
• Results indicate that there is NO DIFFERENCE between DRUGS 1 and 2 (means of 26.6 vs. 25.8). However, DRUG 4 has the largest (statistically significant) mean at 33.0, and DRUG 3 has the smallest at 16.6.

Graphical Results of a Repeated Measures ANOVA
This is visual confirmation of the multiple comparisons: the line for DRUG 4 is consistently higher than all the others, DRUGS 1 and 2 are too close to call different, and DRUG 3 has the smallest means.

Using LSMEANS for Comparisons (Tukey)
• Using this code:
    LSMEANS DRUG/PDIFF ADJUST=TUKEY;
• you get Tukey-adjusted p-values for each pair of drugs. Results indicate that there is NO DIFFERENCE between DRUGS 1 and 2 (p = 0.97). However, the mean for DRUG 4 is different from the mean for DRUG 1 (p = 0.0147), and so on.
• Other common ADJUST= options are BON, DUNNETT, and SCHEFFE.

13.3 GOING DEEPER: CONTRASTS
• When you are comparing means across groups in a one-way ANOVA, you may at times be interested in specific post hoc comparisons.
• For example, suppose you have a data set consisting of four groups. For some hypothesized reason, you wonder whether the average of means 1 and 2 is different from mean 4. Using a CONTRAST statement, you can specify such a comparison.
• A CONTRAST statement uses the following syntax:
    CONTRAST 'label' indvar effectvalues;

Setting Up a Contrast Statement
• For example, to compare GROUP 1 with the combined mean of GROUPs 3 and 4, use:
    CONTRAST '1 vs 3+4' GROUP -1 0 .5 .5;
  Here '1 vs 3+4' is a label of your choosing, and GROUP -1 0 .5 .5 is the definition of the contrast.
• Note that the effectvalues (-1, 0, 0.5, 0.5) sum to zero. The -1 represents GROUP 1, the 0 represents GROUP 2, and so on.
• The signs indicate the comparison. GROUPs 3 and 4 both have coefficients of 0.5, which indicates that their means are combined equally, each contributing half (0.5) to the value.
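What the coefficients do can be sketched as a weighted sum of group means, L = sum of c_i times mean_i, which for (-1, 0, 0.5, 0.5) equals the average of GROUPs 3 and 4 minus the mean of GROUP 1; SAS then tests whether L differs from zero. In the Python sketch below, `contrast_estimate` and the four group means are hypothetical.

```python
def contrast_estimate(coeffs, means):
    """Return L = sum(c_i * mean_i); the coefficients must sum to zero."""
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    return sum(c * m for c, m in zip(coeffs, means))


# Hypothetical group means for GROUPs 1-4
means = [10.0, 12.0, 14.0, 16.0]
# Same coefficients as CONTRAST '1 vs 3+4' GROUP -1 0 .5 .5;
print(contrast_estimate([-1, 0, 0.5, 0.5], means))
```

Here the average of GROUPs 3 and 4 is 15.0 and GROUP 1 is 10.0, so the estimate is 5.0; coefficients such as (1, 0, 0.5, 0.5) are rejected because they do not sum to zero.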

Do the Hands-on Example on p. 324 (AGLM CONTRAST.SAS)
    PROC GLM DATA=CONTRAST;
       CLASS GROUP;
       MODEL OBSERVATION=GROUP;
       CONTRAST 'Groups 1 vs 3&4' GROUP -1 0 .5 .5;
    RUN;
    QUIT;
• The PROC GLM, CLASS, and MODEL statements are standard ANOVA code.
• Add one or more CONTRAST statements within PROC GLM.

Contrast Statement Results
The output shows the standard ANOVA results followed by the CONTRAST statement results.

Continue CONTRAST Example
• Add these statements:
    CONTRAST 'Drugs 1 vs 3&4 Again' GROUP -2 0 1 1;
    CONTRAST 'Drugs 1&2 vs 3&4' GROUP -.5 -.5 .5 .5;
• These add two more CONTRAST comparisons to the output.
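Why 'Drugs 1 vs 3&4 Again' (-2 0 1 1) gives the same test as the earlier 'Groups 1 vs 3&4' contrast (-1 0 .5 .5) can be seen from the contrast sum of squares for a balanced design with n observations per group, SS = L^2 / sum(c_i^2 / n): doubling every coefficient doubles L but quadruples the denominator, so SS, and hence the 1-df F test (SS divided by the error mean square), is unchanged. A minimal Python check, assuming hypothetical group means and n = 5 per group (`contrast_ss` is a hypothetical helper):

```python
def contrast_ss(coeffs, means, n):
    """Contrast sum of squares for a balanced one-way design with n observations per group."""
    estimate = sum(c * m for c, m in zip(coeffs, means))
    return estimate ** 2 / sum(c * c / n for c in coeffs)


means = [10.0, 12.0, 14.0, 16.0]  # hypothetical group means
n = 5                              # hypothetical observations per group

ss_half = contrast_ss([-1, 0, 0.5, 0.5], means, n)        # 'Groups 1 vs 3&4'
ss_whole = contrast_ss([-2, 0, 1, 1], means, n)           # 'Drugs 1 vs 3&4 Again'
ss_pairs = contrast_ss([-0.5, -0.5, 0.5, 0.5], means, n)  # 'Drugs 1&2 vs 3&4'
print(ss_half, ss_whole, ss_pairs)
```

The first two values agree (up to floating-point rounding), confirming that only the pattern of the coefficients, not their scale, determines the test.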

13.4 SUMMARY
• This chapter illustrates SAS procedures for comparing three or more means in both an independent-group setting and for repeated measures. In both cases, the chapter includes examples illustrating how to perform post hoc multiple comparisons.
• Continue to Chapter 14: ANALYSIS OF VARIANCE, PART II

These slides are based on the book: Introduction to SAS Essentials: Mastering SAS for Data Analytics, 2nd Edition, by Alan C. Elliott and Wayne A. Woodward. Paperback: 512 pages. Publisher: Wiley; 2nd edition (August 3, 2015). Language: English. ISBN-10: 111904216X. ISBN-13: 978-1119042167.
These slides are provided for you to use to teach SAS using this book. Feel free to modify them for your own needs. Please send comments about errors in the slides (or suggestions for improvements) to acelliott@smu.edu. Thanks.