Ann Arbor ASA Up and Running Series: SAS
Sponsored by the Ann Arbor Chapter of the American Statistical Association and the Department of Statistics of the University of Michigan

Contents
• Starting SAS
• User Interface
• Libraries
• Syntax
• Getting Data into SAS
• Examining Data
• Manipulating Data
• Descriptive Statistics
• Graphing Data
• Statistics in SAS

Starting SAS
• Start SAS 9.3 (English)

User Interface
• Log: comments, warnings, etc.
• Explorer/Results
• Program Editor: write and submit commands
• Output (not seen)

Libraries
• SAS saves data in libraries, which are named references to folders
  – Libraries are assigned with the LIBNAME statement
• Four libraries are defined by default at the start of SAS:
  – MAPS
  – SASHELP: holds help info and sample datasets
  – SASUSER: holds settings, etc.
  – WORK: default temporary library for each session
    • All data stored in WORK is deleted at the end of each SAS session
• Creating permanent files/libraries is recommended

Libraries
• Create a folder called ‘my_files’ on your desktop.
• Run this command in SAS:
    LIBNAME a "C:\Users\uniquename\Desktop\my_files";
• Refer to datasets in that folder with the prefix ‘a.’, as in ‘a.datasetname’.
• TIP: Use memorable names for libraries rather than ‘a’ (e.g., ‘raw’, ‘final’, ‘time1’).

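A minimal sketch of the tip above, assuming the ‘my_files’ folder already exists. The libref ‘raw’ is illustrative, and copying SASHELP.CLASS is only a stand-in for whatever data you actually want to keep.

```sas
/* Assumption: this folder exists; 'raw' is an illustrative libref (8 characters max). */
libname raw "C:\Users\uniquename\Desktop\my_files";

/* Copy a sample dataset into the permanent library so it survives the session. */
data raw.class2;
    set sashelp.class;
run;

/* List the members of the library in the log to confirm the dataset was saved. */
proc datasets library=raw;
run;
quit;
```

Because ‘raw’ points to a folder on disk rather than to WORK, raw.class2 should still be there the next time SAS is started and the same LIBNAME statement is run.
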
Syntax
• SAS divides commands into two groups:
  – DATA step: create/alter datasets
  – PROC (procedures): perform statistical analyses or generate reports
• Some exceptions to the rule:
  – A DATA step can be used to generate reports
  – PROC IMPORT creates a dataset
  – PROC SORT overwrites the input dataset in sorted order unless OUT= is specified (without telling you!)

Getting data into SAS
• PROC IMPORT
  – Reads standard file types
  – Reads plain text with user-specified delimiters (i.e., the characters that separate the data values)
  – WARNING: SAS changed PROC IMPORT for Excel and Access files in 64-bit SAS
• DATA step
  – Reads non-standard file types, complex file structures, and unusual delimiters

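As an illustration of the PROC IMPORT route, here is a sketch for a comma-separated file. The file name and path are hypothetical, and the output dataset name is only a suggestion.

```sas
/* Hypothetical file path and dataset name, for illustration only. */
proc import datafile="C:\Users\uniquename\Desktop\my_files\class2.csv"
            out=work.class2
            dbms=csv
            replace;          /* overwrite work.class2 if it already exists */
    getnames=yes;             /* first row holds the variable names */
run;
```

For other delimiters, dbms=dlm together with a delimiter= statement (for example delimiter='|';) plays the same role.
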
DATA step
• SAS syntax can be used to read in raw data files (.txt, .csv files), specifying which variables to read in, which ones are text/numeric, combining multiple rows into one case, etc.
• However, this is a more advanced topic.
  – Follow up with an Intro class from CSCAR, or by going through examples from the literature (e.g., ‘The Little SAS Book’).

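For a first taste of the DATA step approach, the sketch below reads two comma-separated records typed directly into the program. The dataset name, variable names, and values are made up for illustration.

```sas
/* Illustrative data: two records typed inline with DATALINES. */
data work.scores;
    infile datalines dlm=',' dsd;              /* comma-delimited input */
    input name :$12. sex $ age height weight;  /* $ marks text variables */
    datalines;
Alfred,M,14,69.0,112.5
Alice,F,13,56.5,84.0
;
run;
```

To read an external file instead, the INFILE statement would name the file in quotes and the DATALINES block would be dropped.
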
Examining Data
• VIEWTABLE Window
  – Select the dataset icon in Explorer
• PROC CONTENTS
  – Produces a listing of dataset information, including the variables and their properties
• PROC PRINT
  – Prints a subset of variables or cases to the output window

[Screenshot: VIEWTABLE window]

PROC CONTENTS
• In the Editor window, type:
    PROC CONTENTS data=a.class2;
    run;
• Highlight the syntax
• Submit for processing
  – Click on the ‘running man’ icon, or
  – Right-click on the selected syntax and choose Submit Selection

PROC PRINT
• In the Editor window, type:
    PROC PRINT data=a.class2;
    run;
• Submit for processing

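PROC PRINT can also be limited to a subset of variables or cases. The sketch below uses SASHELP.CLASS, on the assumption that the workshop's a.class2 has similar variables (name, sex, age, height, weight).

```sas
/* Print only the first 5 cases and three of the variables. */
proc print data=sashelp.class (obs=5);
    var name age height;
run;

/* Print only the cases that satisfy a condition. */
proc print data=sashelp.class;
    where age > 13;
    var name age;
run;
```
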
Manipulating Data
• Usually done within a DATA step
  – Match datasets using a shared key variable
  – Create new variables, or drop/rename existing variables
  – Take one or more subsets of the data
  – Sort the data by specific variable(s)
• Overwrite existing or create new datasets
  – PROC SORT
  – Adding/Removing variables
  – Merging Datasets

PROC SORT
• In the Editor window, type:
    PROC SORT data=a.class2 out=a.class2sorted;
        by age descending weight height;
    run;
• Submit for processing
• WARNING: PROC SORT alters data
  – Without OUT=, the input dataset is overwritten; store the sorted data in a new dataset with out=newdatasetname

Adding/Removing variables
• Create a new dataset, compute new variables, remove unwanted variables:
    DATA a.class2metric (drop=weight height sex age);
        set a.class2;
        height_cm = height*2.54;
        weight_kg = weight/2.2;
        label height_cm='Height in CM'
              weight_kg='Weight in Kilograms';
    run;
    PROC PRINT data=a.class2metric;
    run;
• Submit for processing

Merging Datasets
• Datasets must be sorted by the same key variable(s):
    proc sort data=a.class2; by name; run;
    proc sort data=a.class2metric; by name; run;
    data a.classmerged;
        merge a.class2 a.class2metric;
        by name;
    run;
• Submit for processing

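When the two datasets do not contain exactly the same cases, the IN= dataset option can flag where each row came from. The sketch below is self-contained: the datasets work.left and work.right and the flag names are illustrative, built from SASHELP.CLASS.

```sas
/* Build two illustrative datasets that share the key variable 'name'. */
data work.left (keep=name age);
    set sashelp.class;
run;

data work.right (keep=name height);
    set sashelp.class;
    where sex = 'F';          /* only some cases appear in this dataset */
run;

proc sort data=work.left;  by name; run;
proc sort data=work.right; by name; run;

/* Keep only the cases that are present in both datasets. */
data work.matched;
    merge work.left (in=in_left) work.right (in=in_right);
    by name;
    if in_left and in_right;
run;
```

Dropping the final IF statement keeps every case from either dataset, with missing values where a case has no match.
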
Descriptive Statistics
• PROC FREQ
  – Produces a table of counts and percentages
  – For cross-tabulations, statistical tests can also be performed (e.g., independence testing)
• PROC MEANS
  – Produces descriptive statistics such as mean, standard deviation, minimum, maximum

PROC FREQ
• In the Editor window, type:
    proc freq data=a.class2;
        tables age*sex;
    run;
• Submit for processing

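A test of independence can be requested as an option on the TABLES statement. This sketch uses SASHELP.CLASS as a stand-in for a.class2; with so few cases per cell, SAS is likely to warn that the chi-square test may not be valid.

```sas
/* Cross-tabulation with a chi-square test of independence. */
proc freq data=sashelp.class;
    tables sex*age / chisq;
run;
```
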
PROC MEANS
• In the Editor window, type:
    proc means data=a.class2;
        var age weight height;
    run;
• Submit for processing

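The statistics to report can be named on the PROC MEANS statement, and a CLASS statement gives them separately for each group. The sketch below again assumes SASHELP.CLASS as a stand-in for a.class2.

```sas
/* Selected statistics, rounded to 2 decimals, separately for each sex. */
proc means data=sashelp.class n mean std min max maxdec=2;
    class sex;
    var age weight height;
run;
```
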
Graphing Data
PROC GPLOT
• Simple bivariate scatterplot
• Separate lines
• Multiple variables scatterplot
• Options

PROC GPLOT
• Simple bivariate scatterplot:
    proc gplot data=a.class2;
        symbol1 value=dot interpol=rl;
        plot weight*height;
    run;
    quit;
• Submit for processing

[Screenshot: PROC GPLOT - Log]

PROC GPLOT
• To graph separate lines for each level of a categorical variable, type:
    proc gplot data=a.class2;
        symbol1 value=dot interpol=rl;
        plot weight*height = sex;
    run;
    quit;
• Submit for processing

PROC GPLOT
• Multiple variables on the same graph:
    proc gplot data=a.class2;
        symbol1 value=dot interpol=rl color=blue;
        symbol2 value=dot interpol=rl color=red;
        plot weight * age;
        plot2 height * age;
    run;
    quit;
• Submit for processing

PROC GPLOT
value=___
• Any character enclosed in single quotes
• Special characters
  – dot
  – plus sign
  – star
  – square
  – ... and many others
interpol=___
• RL / RQ / RC: linear / quadratic / cubic regression curves
• JOIN: connects consecutive points (line graph)
• BOX

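A sketch combining two of the options above, value=star and interpol=join, on SASHELP.CLASS (a stand-in for a.class2). Since JOIN connects points in the order they appear in the dataset, the data are sorted by the x-axis variable first; the name work.class_sorted is illustrative.

```sas
/* Sort by the x-axis variable so that JOIN draws a left-to-right line. */
proc sort data=sashelp.class out=work.class_sorted;
    by height;
run;

symbol1 value=star interpol=join color=black;

proc gplot data=work.class_sorted;
    plot weight*height;
run;
quit;
```
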
Statistics in SAS
• PROC CORR
  – Correlational analyses
• PROC REG
  – Statistical regression
• PROC UNIVARIATE
  – To assess normality of regression residuals

PROC CORR
• Compute bivariate correlation coefficients:
    proc corr data=a.class2;
        var age;
        with height weight;
    run;

PROC REG
• Run a regression on the merged ‘class’ dataset
  – Save residuals and predicted values in an output dataset
  – Request a residual plot
    proc reg data=a.classmerged;
        model height_cm = age weight / partial;
        output out=reg_data p=predict r=resid rstudent=rstudent;
        plot rstudent.*height_cm;
    run;
    quit;
• Notes
  – The quit command terminates the regression procedure; otherwise it keeps running.
  – The output dataset will be in the WORK library, since no library was specified.

PROC UNIVARIATE
• Assess normality of regression residuals stored in the output dataset from PROC REG:
    proc univariate data=reg_data;
        var rstudent;
        histogram;
        qqplot / normal (mu=est sigma=est);
    run;
    quit;

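Formal tests of normality can be added with the NORMAL option on the PROC UNIVARIATE statement. To keep this sketch self-contained, it fits a regression on SASHELP.CLASS rather than on the merged workshop dataset; the name work.reg_check is illustrative.

```sas
/* Fit a regression and save the studentized residuals. */
proc reg data=sashelp.class;
    model height = age weight;
    output out=work.reg_check rstudent=rstudent;
run;
quit;

/* NORMAL adds a table of tests for normality, including Shapiro-Wilk. */
proc univariate data=work.reg_check normal;
    var rstudent;
run;
```
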
QUESTIONS

Winter 2013 Training from CSCAR
http://cscar.research.umich.edu/workshops/
• Introduction to SAS® - January 28, 30, February 1, 4, 6, 8, 2013
• Intermediate Topics in SPSS: Data Management and Macros - February 5, 7, 2013
• Intermediate Topics in SPSS: Advanced Statistical Models - February 12, 14, 2013
• Intermediate SAS® - February 25, 27, March 1, 2013
• Regression Analysis - March 11, 13, 15, 2013
• Applications of Hierarchical Linear Models - March 18, 20, 22, 2013
• Statistical Analysis with R - March 19, 21, 2013
• Introduction to NVivo - April 3, 2013
• Applied Structural Equation Modeling - April 10, 11, 12, 2013

Further Resources
• The Little SAS Book: A Primer
• UCLA site: software tutorials, classes and lectures on statistical methods (an incredible site!)
  http://www.ats.ucla.edu/stat/
• SAS Documentation: http://support.sas.com/documentation/
  Documentation is also found in the ‘SAS help’ files.

Other Winter 2013 Workshops from Ann Arbor ASA
• R - January 31, 1-3 PM, Angell Hall Computing Classroom B (also known as MH 444-B)
For more information go to: http://community.amstat.org/annarbor/home

Chapter Meetings (open to all)
PLACE: Starbucks, State & Liberty, lower level
TIME: 6:00 pm – 6:45 pm
DATE     TOPIC
24-JAN   Business Meeting
1-APR    Business Meeting and Election of Officers
For more information go to: http://community.amstat.org/annarbor/home