Computing with SAS Software
A SAS program consists of SAS statements.
1. The DATA step consists of SAS statements that define your data and create a SAS data set.
2. The PROC steps are groups of SAS statements that indicate what kind of statistical analysis to perform.
The following slides present the DATA step commands to read the data into SAS.

The SAS Data Step
A SAS data set is created by entering data, reading raw data, or accessing files created by other software, according to a specific syntax.
RAW DATA -> DATA STEP -> SAS DATA SET
Raw data: rows are observations and columns are variables.
Age  Gender  Exam grade  Homework grade
19   F       90          94
20   M       89          90
20   F       78          86
19   M       95          90
21   M       83          85

In SAS nomenclature:
– Columns specify the variables
– Rows contain the observations
The general form of the DATA step is:
DATA name;
statements;
DATALINES;
Here name is the SAS data set name.
NOTICE: All SAS statements end with a semicolon (;).
Remember: the most common cause of error in SAS is the omission of the semicolon. ALWAYS CHECK FOR IT!!

/* This is an example of how to input data into SAS */
TITLE 'Example of data input in SAS';
DATA grades;                      /* grades is the SAS data set name */
INPUT age gender $ exam hwork;
DATALINES;                        /* DATALINES statement: the data lines follow */
19 F 90 94
20 M 89 90
20 F 78 86
19 M 95 90
21 M 83 85
;
Required: a semicolon on a line by itself after the data (the null statement).

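Rules for SAS names
• Total number of characters: 8 or fewer (prior to SAS version 8); later versions allow up to 32.
• Must start with a letter or an underscore ( _ ).
• Must not contain blanks.
Other rules
• For a missing value of an observation:
– enter a period (.) for a missing numeric value
– enter a blank ( ) for a missing character value when the data are read by column position; when values are separated by blanks (list input), enter a period for a missing character value as well, because a blank is treated as a separator.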
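
The missing-value rule can be tried with a small sketch. The data set name grades_miss and the altered data values are made up for illustration and are not from the slides. Because the values are separated by blanks (list input), a period marks the missing character value as well as the missing numeric one.

```sas
/* Sketch: missing values with list input (illustrative data) */
DATA grades_miss;                 /* hypothetical data set name */
INPUT age gender $ exam hwork;
DATALINES;
19 F 90 94
20 . 89 90
20 F 78 .
;
PROC PRINT DATA=grades_miss;
RUN;
```

With default settings, the listing should show the missing homework grade as a period and the missing gender as a blank.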

The INPUT statement
It defines the name, type and location of each variable. A $ after a variable name marks it as a character variable.
1. If the data are in fixed columns, give the column location after each variable:
INPUT age 1-2 gender $ 4 exam 6-7 hwork 9-10;
2. To read several observations from the same data line, with values separated by at least one blank space, put the symbol @@ at the end of the INPUT statement.
Example:
DATA grades;
INPUT age gender $ exam hwork @@;
DATALINES;
19 F 90 94 20 M 89 90 20 F 78 86
19 M 95 90 21 M 83 85
;
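
For comparison, here is a minimal sketch of a complete DATA step that uses the column form of INPUT with the same five observations. The data set name grades_col is an assumption chosen for illustration. Each data line has to start in column 1 so that age falls in columns 1-2, gender in column 4, exam in columns 6-7 and hwork in columns 9-10.

```sas
/* Sketch: column input with in-stream data */
DATA grades_col;                  /* hypothetical data set name */
INPUT age 1-2 gender $ 4 exam 6-7 hwork 9-10;
DATALINES;
19 F 90 94
20 M 89 90
20 F 78 86
19 M 95 90
21 M 83 85
;
PROC PRINT DATA=grades_col;
RUN;
```

Column input reads by position, so indenting the data lines would shift every value into the wrong columns.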

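The DATALINES statement
It is the last statement of the DATA step: it comes after the INPUT statement (and any other statements) and immediately before the data lines.
DATA grades;
INPUT age gender $ exam hwork @@;
DATALINES;
19 F 90 94 20 M 89 90 20 F 78 86
19 M 95 90 21 M 83 85
;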

However, if the data are to be read from an external file (e.g., mydata.txt), use the INFILE statement in place of DATALINES, as follows.
If the data are in columns:
DATA grades;
INFILE 'c:\tmp\grades.dat';
INPUT age 1-2 gender $ 4 exam 6-7 hwork 9-10;
If the data are NOT in columns:
DATA grades;
INFILE 'c:\tmp\grades.dat';
INPUT age gender $ exam hwork @@;
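
External files are often comma-separated and begin with a header row. The sketch below shows one way this could be handled with the INFILE options DLM= (delimiter) and FIRSTOBS= (first line to read). The file path, the file layout and the data set name grades_csv are assumptions for illustration only.

```sas
/* Sketch: reading a comma-separated file that has a header row.
   The path and file layout are assumed, not taken from the slides. */
DATA grades_csv;                                  /* hypothetical data set name */
INFILE 'c:\tmp\grades.csv' DLM=',' FIRSTOBS=2;    /* hypothetical path */
INPUT age gender $ exam hwork;
RUN;
```

FIRSTOBS=2 skips the header line, and DLM=',' tells list input to split values on commas instead of blanks.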

SAS PROC steps
PROC <proc_name> DATA=dataset1 [options1];
[statements / options2];
proc_name: name of the PROC being used.
DATA=dataset1: name of the SAS data set to be analyzed. (If omitted, the most recently created SAS data set is used.)
Options and statements vary from PROC to PROC: see the descriptions in your SAS manual and the Help window!
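
As one possible instance of this general form, the sketch below uses PROC MEANS to summarize the two grade variables. It assumes the grades data set from the earlier slides has already been created; the choice of PROC MEANS and of the statistics requested is illustrative.

```sas
/* Sketch of the general PROC form, using PROC MEANS.
   Assumes the grades data set already exists. */
PROC MEANS DATA=grades MEAN STD MIN MAX;   /* options1: statistics to report */
VAR exam hwork;                            /* statement: variables to analyze */
RUN;
```

Here MEAN, STD, MIN and MAX play the role of options1, and the VAR statement is one of the PROC-specific statements.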

PROC PRINT
A procedure to display your data on screen (or write it to a file) as part of the output.
PROC PRINT DATA=grades;
TITLE2 'Listing of data';
TITLE3 'Exam and Homework grades';
RUN;
Output:
Example of data input in SAS
Listing of data
Exam and Homework grades
Obs  AGE  GENDER  EXAM  HWORK
1    19   F       90    94
2    20   M       89    90
3    20   F       78    86
4    19   M       95    90
5    21   M       83    85

If we had more variables and wanted to print only the variables age and gender, we would list them in PROC PRINT with the keyword VAR:
proc print data=grades;
var age gender;
run;

Labeling variables: to describe the variables, you can attach labels to variable names. A label that contains an apostrophe must be enclosed in double quotes.
DATA Grades;
INPUT AGE GENDER $ EXAM HWORK @@;
LABEL AGE="Student's age"
      GENDER="Student's sex"
      EXAM='Final exam grade'
      HWORK='Homework grade';
DATALINES;
. . .
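
Labels attached in the DATA step are stored with the data set, but PROC PRINT shows variable names as column headings unless asked otherwise. The sketch below adds the LABEL option to PROC PRINT so that the labels are used as headings. It reuses the five observations from the earlier slides.

```sas
/* Sketch: attach labels, then ask PROC PRINT to display them */
DATA grades;
INPUT age gender $ exam hwork;
LABEL age    = "Student's age"
      gender = "Student's sex"
      exam   = 'Final exam grade'
      hwork  = 'Homework grade';
DATALINES;
19 F 90 94
20 M 89 90
20 F 78 86
19 M 95 90
21 M 83 85
;
PROC PRINT DATA=grades LABEL;     /* LABEL option: use labels as column headings */
RUN;
```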

PROC FORMAT
To format the values of variables:
PROC FORMAT;
VALUE $SX 'M'='Male' 'F'='Female';
RUN;
DATA GRADES;
INPUT AGE GENDER $ EXAM HWORK;
FORMAT GENDER $SX.;               /* Notice the dot after $SX */
DATALINES;
. . .
REMEMBER:
• Labels: for names of variables
• Formats: for values of variables
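
A format can also be attached inside a PROC step, where it applies only to that step. As a sketch, assuming the grades data set already exists, the following counts students by gender and displays the formatted values. The use of PROC FREQ here is an illustrative choice.

```sas
/* Sketch: apply the user-defined format $SX. within a single PROC step.
   Assumes the grades data set already exists. */
PROC FORMAT;
VALUE $SX 'M'='Male' 'F'='Female';
RUN;

PROC FREQ DATA=grades;
TABLES gender;                    /* one-way frequency table */
FORMAT gender $SX.;               /* shows Male/Female instead of M/F */
RUN;
```

The stored values remain M and F; only their display changes.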

Built-in formats in SAS
PROC SORT DATA=GRADES;
BY AGE;
RUN;
PROC PRINT DATA=GRADES;
TITLE2 "Students' grades in Age order";
ID AGE;
VAR GENDER EXAM HWORK;
RUN;
Output:
Example of SAS Program for CSC 323
Students' grades in Age order
AGE  GENDER  EXAM  HWORK
19   Female  90    94
19   Male    95    90
20   Male    89    90
20   Female  78    86
21   Male    83    85
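
SAS also supplies formats of its own, such as the numeric w.d format and PERCENTw.d. The sketch below is illustrative only: the data set name scores and the derived variable pct are assumptions, and only two observations are used.

```sas
/* Sketch: built-in formats w.d and PERCENTw.d (illustrative variables) */
DATA scores;                      /* hypothetical data set name */
INPUT exam hwork;
final = (exam + hwork) / 2;
pct = final / 100;                /* illustrative derived variable */
FORMAT final 5.1 pct PERCENT8.1;  /* one decimal place; shown as a percentage */
DATALINES;
90 94
95 90
;
PROC PRINT DATA=scores;
RUN;
```

Built-in formats are referenced by name with a trailing dot or width, and need no PROC FORMAT step.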

Data manipulation
Data transformations: assignment statements create new variables. The usual arithmetic operations and transformations are available.
DATA GRADES;
INPUT AGE GENDER $ EXAM HWORK;
MEANGRADE=(EXAM+HWORK)/2;
AGE_LOG=LOG(AGE);
DATALINES;
. . .
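
A few more transformations, sketched with standard SAS functions. The data set name grades_tr and the new variable names are assumptions for illustration. Note that LOG is the natural logarithm; LOG10 gives the base-10 logarithm.

```sas
/* Sketch: more assignment statements using built-in functions */
DATA grades_tr;                       /* hypothetical data set name */
INPUT age gender $ exam hwork;
meangrade = MEAN(exam, hwork);        /* equals (exam+hwork)/2 when neither is missing */
best      = MAX(exam, hwork);         /* larger of the two grades */
exam_sqrt = SQRT(exam);               /* square root */
age_log10 = LOG10(age);               /* base-10 logarithm */
final_rnd = ROUND(meangrade, 1);      /* round to the nearest integer */
DATALINES;
19 F 90 94
20 M 89 90
20 F 78 86
19 M 95 90
21 M 83 85
;
PROC PRINT DATA=grades_tr;
RUN;
```

Unlike the arithmetic expression (exam+hwork)/2, the MEAN function ignores missing arguments, which may or may not be what is wanted.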

/* Example of Program: Data on students' grades */
OPTIONS NODATE;   /* Suppress the date that is normally printed in the output */
TITLE 'Example of SAS Program for CSC 323';
PROC FORMAT;
VALUE $SX 'M'='Male' 'F'='Female';

/* It inputs the data, computes the final score as the average of the
   exam score and the homework grade, and assigns a letter grade */
DATA GRADES;
INPUT AGE GENDER $ EXAM HWORK;
FORMAT GENDER $SX.;
FINAL=(EXAM+HWORK)/2;
IF FINAL < 75 THEN GRADE='C';
ELSE IF FINAL >= 75 AND FINAL <= 85 THEN GRADE='B';
ELSE IF FINAL > 85 THEN GRADE='A';
/* Labels containing an apostrophe are enclosed in double quotes */
LABEL AGE="Student's age"
      GENDER="Student's sex"
      EXAM='Final exam grade'
      HWORK='Homework grade';
DATALINES;
19 F 90 94
20 M 89 90
20 F 78 86
19 M 95 90
21 M 83 85
;
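
An alternative to the IF/ELSE chain, offered here only as a sketch, is to describe the cutoffs once in a numeric format and apply it with the PUT function. The format name gradefmt and the data set name grades2 are assumptions. The ranges mirror the rule above: below 75 is C, 75 through 85 is B, above 85 is A.

```sas
/* Sketch: letter grades from a range format instead of IF/ELSE */
PROC FORMAT;
VALUE gradefmt LOW -< 75   = 'C'      /* below 75 */
               75  -  85   = 'B'      /* 75 through 85, inclusive */
               85  <- HIGH = 'A';     /* above 85 */
RUN;

DATA grades2;                         /* hypothetical data set name */
INPUT age gender $ exam hwork;
final = (exam + hwork) / 2;
grade = PUT(final, gradefmt.);        /* character result: A, B or C */
DATALINES;
19 F 90 94
20 M 89 90
20 F 78 86
19 M 95 90
21 M 83 85
;
PROC PRINT DATA=grades2;
RUN;
```

One difference to keep in mind: a missing FINAL is assigned C by the IF/ELSE chain (a missing value compares as less than any number), whereas the format leaves it unmatched.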

/* It lists the students' grades in order of age */
PROC SORT DATA=GRADES;
BY AGE;
RUN;
PROC PRINT DATA=GRADES;
TITLE2 "Students' grades in Age order";
ID AGE;
VAR GENDER EXAM HWORK FINAL GRADE;
RUN;

Program Output
Example of SAS Program for CSC 323
Students' grades in Age order
AGE  GENDER  EXAM  HWORK  FINAL  GRADE
19   Female  90    94     92.0   A
19   Male    95    90     92.5   A
20   Male    89    90     89.5   A
20   Female  78    86     82.0   B
21   Male    83    85     84.0   B