Introduction to SAS Essentials Mastering SAS for Data

  • Slides: 94
Download presentation
Introduction to SAS Essentials Mastering SAS for Data Analytics Alan Elliott and Wayne Woodward

Introduction to SAS Essentials Mastering SAS for Data Analytics Alan Elliott and Wayne Woodward 1

Chapter 4: Preparing Data for Analysis 2

Chapter 4: Preparing Data for Analysis 2

LEARNING OBJECTIVES � To be able to label variables with explanatory names � To

LEARNING OBJECTIVES � To be able to label variables with explanatory names � To be able to create new variables � To be able to use SAS® IF-THEN-ELSE statements � To be able to use DROP and KEEP to select variables � To be able to use the SET statement � To be able to use PROC SORT � To be able to append and merge data sets � To be able to use PROC FORMAT � Going Deeper: To be able to find first and last values in a group 3

The information in this chaper is about the DATA Step… � RECALL… typical SAS

The information in this chaper is about the DATA Step… � RECALL… typical SAS program flow: DATA Step – defines the data set. PROCS are procedures that do statistical procedures 4 All of the information in this chapter involve the DATA Step

4. 1 LABELING VARIABLES WITH EXPLANATORY NAMES � SAS labels are used to provide

4. 1 LABELING VARIABLES WITH EXPLANATORY NAMES � SAS labels are used to provide descriptive names for variables. � The LABEL statement uses the format: LABEL VAR 1= ' Label for VAR 1 ' VAR 2= 'Label for VAR 2'; � Use either single or double quotation marks in the LABEL statement, but you must match the type within each definition statement. � When SAS prints out information for VAR 1, it also includes the label, making the output more readable. 5

Labeling Variables � The following program (DLABEL. SAS): illustrates the use of labels. DATA

Labeling Variables � The following program (DLABEL. SAS): illustrates the use of labels. DATA MYDATA; INFILE 'C: SASDATABPDATA. DAT'; * READ DATA FROM FILE; INPUT ID $ 1 SBP 2 -4 DBP 5 -7 GENDER $ 8 AGE 9 -10 WT 11 -13; LABEL ID = 'Identification Number' SBP= 'Systolic Blood Pressure' DBP = 'Diastolic Blood Pressure' AGE = 'Age on Jan 1, 2000' WT = 'Weight' ; PROC MEANS; VAR SBP DBP AGE WT; RUN; � 6 Notice that the LABEL statement is placed within the DATA step.

Hands On Exercise p 77 � Output without labels: � And with labels: 7

Hands On Exercise p 77 � Output without labels: � And with labels: 7

CREATING VARIABLES IN THE SAS DATA STEP � Arithmetic Operators (Table 4. 4) page

CREATING VARIABLES IN THE SAS DATA STEP � Arithmetic Operators (Table 4. 4) page 79. 8

Order of Operations… � You may remember the following mnemonic from a math class

Order of Operations… � You may remember the following mnemonic from a math class that can help you remember the order of operations: “Please excuse my dear Aunt Sally. ” � Parentheses Exponents Multiplication Division Addition Subtraction 9

EXERCISE - CREATING A NEW VARIABLE � Open the file named DCALC. SAS, (p

EXERCISE - CREATING A NEW VARIABLE � Open the file named DCALC. SAS, (p 79) create a new variable by calculation: Note the creation of a new variable DATA ROOMSIZE; named AREA by calculation INPUT ROOM $ W L; AREA=W*L; Label L="Length" W="Width" AREA="Sq. Feet"; DATALINES; LIVING 14 22 DINING 14 12 BREAKFAST 10 12 etcetera… ; RUN; PROC PRINT Label; SUM AREA; RUN; 10

RUN THE CODE AND OBSERVE RESULTS � Note that the resulting SAS data set

RUN THE CODE AND OBSERVE RESULTS � Note that the resulting SAS data set (named ROOMSIZE) contains a column labeled “Sq. Feet” � This column is the calculated AREA variable, and it is labeled as “Sq. Feet” because of the Label statement: � 11 Label AREA="Sq. Feet";

NOTE THE PLACEMENT OF CALCULATION… DATA ROOMSIZE; INPUT ROOM $ W L; AREA=W*L; Note

NOTE THE PLACEMENT OF CALCULATION… DATA ROOMSIZE; INPUT ROOM $ W L; AREA=W*L; Note how the AREA calculation is within the DATA step, after W and L have been defined, and before the DATALINES statement. Label L="Length" W="Width" AREA="Sq. Feet"; DATALINES; LIVING 14 22 DINING 14 12 BREAKFAST 10 12 etcetera… 12

ADD TO THE EXERCISE � Suppose ceilings are 8 feet tall, and you want

ADD TO THE EXERCISE � Suppose ceilings are 8 feet tall, and you want to calculate the total living area volume (cubic feet for air conditioning purposes). � Add the calculation VOL= W*L*8; � To the code. Also add a label to display VOL as Volume and add VOL to the SUM Statement. � Rerun and observe the results. DO THE EXERCISE 13

Your new code should look something like this: Results… 14 Note new Volume column

Your new code should look something like this: Results… 14 Note new Volume column

Creating New Variables as Constant Values Type in this program and run it: DATA

Creating New Variables as Constant Values Type in this program and run it: DATA PI; INPUT RADIUS; PI=3. 1415927; AREA=PI*RADIUS**2; CIRCUM=2*PI*RADIUS; DATALINES; 10 1000 ; PROC PRINT; RUN; NOTE that the value of PI is a constant used in subsequent calculations. When you look at the output, notice that PI is a variable in the data set. 15

4. 3 USING IF-THEN-ELSE CONDITIONAL STATEMENT ASSIGNMENTS � Another way to create a new

4. 3 USING IF-THEN-ELSE CONDITIONAL STATEMENT ASSIGNMENTS � Another way to create a new variable in the DATA step is to use the IF-THEN- ELSE conditional statement construct Format is: IF expression THEN statement; ELSE statement; Thus… IF SBP GE 140 THEN HIGHBP=1; ELSE HIGHBP=0; Creates a variable named HIGHBP with the values 1 or 0. 16

Comparison Operators IF SBP GE 140 THEN HIGHBP=1; ELSE HIGHBP=0; A comparison operator tells

Comparison Operators IF SBP GE 140 THEN HIGHBP=1; ELSE HIGHBP=0; A comparison operator tells SAS how to evaluate a condition – in this care Greater Than or Equal To. 17

Logical Operators IF AGE GT 19 AND GENDER=”M” then Use either the Mnemonic (OR)

Logical Operators IF AGE GT 19 AND GENDER=”M” then Use either the Mnemonic (OR) GROUP=1; or the symbol for OR, | or IF TREATMENT EQ “A” | GROUP=2 THEN CATEGORY=”GREEN”; 18

A more complex use of operators IF TRT=”A” THEN GROUP=1; ELSE IF TRT=”B” OR

A more complex use of operators IF TRT=”A” THEN GROUP=1; ELSE IF TRT=”B” OR TRT=”C” THEN GROUP=2; ELSE GROUP=3; Uses an ELSEIF Clause 19

Do Hands-On Exercise p 83 � Open the file DCONDITION. SAS, and run the

Do Hands-On Exercise p 83 � Open the file DCONDITION. SAS, and run the program. Note the IF statement 20

RESULTS Notice the STATUS column 21

RESULTS Notice the STATUS column 21

ADD A NEW IF STATEMENT… IF AGE GT 65 THEN GROUP="SENIOR"; ELSE IF AGE

ADD A NEW IF STATEMENT… IF AGE GT 65 THEN GROUP="SENIOR"; ELSE IF AGE GE 18 and AGE LE 65 THEN GROUP="ADULT"; ELSE GROUP="YOUTH"; This code creates a new variable named GROUP according to the IF – ELSE IF – and ELSE statement. Note the use of an ELSEIF Clause Exercise: Add this code to the DCONDITION. SAS code. DO THE EXERCISE 22

RESULTS Notice the new column 23

RESULTS Notice the new column 23

Using IF to Assign Missing Values � Be Careful: Data sets often contain missing

Using IF to Assign Missing Values � Be Careful: Data sets often contain missing data codes to record when data are missing. For example for the variable age you might assign an impossible value, say -9, as a missing value code. Then… IF AGE EQ -9 then AGE =. ; IF GENDER ="X" then GENDER=""; In your DATA Step assigns the SAS missing value code. (dot) to AGE when the value is -9. You MUST do this for SAS to know how to handle missing values in statistical procedures. For character variables a missing value is a blank. 24

In this example…What could go wrong? IF AGE GT 12 AND AGE LT 20

In this example…What could go wrong? IF AGE GT 12 AND AGE LT 20 THEN TEEN=1; ELSE TEEN = 0; � Suppose this is your data 25 ID AGE 001 12 002 20 003 19 004 . WHAT VALUE FOR TEEN?

A better way… IF AGE GT 12 AND AGE LT 20 THEN TEEN=1; ELSE

A better way… IF AGE GT 12 AND AGE LT 20 THEN TEEN=1; ELSE TEEN = 0; IF AGE =. THEN TEEN =. ; Guarantees that if AGE is already missing, TEEN will also be coded as missing. 26

Do Hands-On p 84 � Uses another method to create TEEN IF AGE=. Then

Do Hands-On p 84 � Uses another method to create TEEN IF AGE=. Then TEEN=. ; ELSE IF AGE GT 12 and AGE LT 20 then TEEN=1; ELSE TEEN=0; 27

Using IF and IF-THEN To Subset Data Sets � Data sets can be quite

Using IF and IF-THEN To Subset Data Sets � Data sets can be quite large. You may have a data set that contains some group of subjects (records) that you want to eliminate from your analysis. In that case, you can subset the data so it will contain only those records you need. � One method of eliminating certain records from a data set is to use a subsetting IF statement in the DATA step. The syntax for this statement is as follows: IF expression; 28

Subsetting IF � For example, to select records containing the (character) value F (only

Subsetting IF � For example, to select records containing the (character) value F (only females) from a data set, you could use this statement within a DATA step: IF GENDER EQ 'F'; � Note that you can use single or double quotation marks("F" or 'F' ) in this statement. 29

Subsetting with IF… DELETE � The opposite effect can be created by including the

Subsetting with IF… DELETE � The opposite effect can be created by including the statement THEN DELETE at the end of the statement: IF expression THEN DELETE; � For example, to get rid of certain records (all males) in a data set, you could use the code IF GENDER EQ 'M' THEN DELETE; � Do Hands-on Example p 86 30

Hands on Example p 86 (DSUBSET 1. SAS) Note the subsetting IF statement EXERCISE

Hands on Example p 86 (DSUBSET 1. SAS) Note the subsetting IF statement EXERCISE – using the structure IF expression THEN DELETE; Change the IF statement to delete data where GP=“A” --- PAUSE --- do this example, then come back to the tutorial. 31

Answer � Using the statement IF GP = "A" then DELETE; You get these

Answer � Using the statement IF GP = "A" then DELETE; You get these results. Notice no group A subjects in the listing… 32

Using IF-THEN and DO for Program Control � Another use of the IF statement

Using IF-THEN and DO for Program Control � Another use of the IF statement is to control the flow of your SAS program in conjunction with a DO statement. In this case, you can cause a group of SAS commands to be conditionally executed by using the following type of code: IF expression THEN DO; SAS Code to conditionally execute; END; 33

Example of Using IF –THEN - DO � Suppose you want to calculate BMI

Example of Using IF –THEN - DO � Suppose you want to calculate BMI (Body Mass Index) for subjects in a data set, but the formula is only relevant for subjects older than 19 years. Plus, at the same time you want to assign other values for this same group of subjects. You could use the code: IF AGE GT 19 THEN DO; BMI=(WTLBS/HTINCH**2)*703; ISADULT=1; INCLUDEINSTUDY=”Yes”; END; You MUST end the DO phrase with an END; 34 The code within DO to END is performed only if AGE GT 10.

Using @ and IF to Conditionally Read Lines in a Data Set � For

Using @ and IF to Conditionally Read Lines in a Data Set � For big data sets, it is often the case that you don't want to read in all of the data. One method you could use to conditionally read in certain records is to set up a test condition and read in the record only if it meets that condition. � To do this, you can use the @ (at) sign in your input statement: Notice the use of the @ here – stops input so you can use an IF statement… INPUT GP $ 5 AGE 6 -9 @ ; IF GP EQ "A" and AGE GE 10 THEN INPUT ID $ 1 -3 GP $ 5 AGE 6 -9 TIME 1 10 -14 TIME 2 15 -19; 35

4. 4 USING DROP AND KEEP TO SELECT VARIABLES � The DROP and KEEP

4. 4 USING DROP AND KEEP TO SELECT VARIABLES � The DROP and KEEP statements in the DATA step allow you to specify which variables to retain in a data set: DROP variables; KEEP variables; � For example, DATA MYDATA; INPUT A B C D E F G; DROP E F; DATALINES; . . . etc. . . � Do Hands on Example p 88 36

Do Hands on Example p 88 � Open the program file DKEEP. SAS. DATA

Do Hands on Example p 88 � Open the program file DKEEP. SAS. DATA MYDATA; INFILE 'C: SASDATAEXAMPLE. CSV' DLM=', ' FIRSTOBS=26; INPUT GROUP $ AGE TIME 1 TIME 2 TIME 3 TIME 4 SOCIO; KEEP AGE TIME 1 SOCIO; ; PROC PRINT; Notice the KEEP Statement RUN; 37

Run the code, and observe the results � Notice, only the variables in the

Run the code, and observe the results � Notice, only the variables in the KEEP statement are in the resulting SAS data file. Exercise: Change the KEEP statement to DROP GROUP AGE SOCIO; Rerun the program and observe the results… Pause here… 38

Results of the DROP statement: 39

Results of the DROP statement: 39

Extra: Using DROP, KEEP, and RENAME in the DATA statement � Make this change

Extra: Using DROP, KEEP, and RENAME in the DATA statement � Make this change in the program. Take out the DROP statement and modify the DATA statement: Notice how you can RENAME, DROP, or KEEP within the DATA statement. Note that this version of the DROP statement uses DROP= DATA MYDATA (RENAME=(SOCIO=STATUS) DROP=TIME 2 TIME 4); INFILE 'C: SASDATAEXAMPLE. CSV' DLM=', ' FIRSTOBS=26; INPUT GROUP $ AGE TIME 1 TIME 2 TIME 3 TIME 4 SOCIO; PROC PRINT; Exercise: RUN; Pause, enter this code, and return when you’ve run the new code. 40

Results of the RENAME and DROP in the DATA Statement Notice how SOCIO was

Results of the RENAME and DROP in the DATA Statement Notice how SOCIO was renamed STATUS. 41

4. 5 USING THE SET STATEMENT TO READ AN EXISTING DATA SET � Suppose

4. 5 USING THE SET STATEMENT TO READ AN EXISTING DATA SET � Suppose you have a big data set you want to use – modified. Don’t modify your ORIGINAL data set – modify a copy. Original Data set 42 Modified Copy

Another way to “ENTER” data � Suppose you already have a data set named

Another way to “ENTER” data � Suppose you already have a data set named OLD. You can make a “copy” using DATA NEW; SET OLD; Or DATA NEW; SET “C: SASDATAOLD”; � Now the NEW data set is identical to OLD. You can now modify NEW without changing the original data set. 43

Creating a Data Set from an Existing Data Set 44

Creating a Data Set from an Existing Data Set 44

Using SET - Example 1 � Suppose you have a data set named ALL.

Using SET - Example 1 � Suppose you have a data set named ALL. You want to create two subsets, FEMALE and MALE. DATA MALES; SET ALL; IF GENDER ='M'; RUN; DATA FEMALES; SET ALL; IF GENDER =‘F'; RUN; Creates a data set with only Males Creates a data set with only Females � Now you have three data sets to work with…ALL, MALES and FEMALES. 45

Using SET - Example 2 � You receive a data set from the government,

Using SET - Example 2 � You receive a data set from the government, and you need to modify it before using it: This is the original data set This is the new (copied) data set DATA MYSASLIB. STUDY; SET MYSASLIB. GOV; IF AGE =-9 then AGE=. ; IF SBP=-99 then SBP=. ; * A NUMBER OF RECODES; BMI = WT /( HT **2) * 703; * A CALCULATION; *Etc; �Thus… you’re manipulating a copy of the original data… and not changing the original file. 46

Do Hands-On Exercise p 90 � Using a subsetting IF statement… � DSUBSET 3.

Do Hands-On Exercise p 90 � Using a subsetting IF statement… � DSUBSET 3. SAS 47

4. 6 USING PROC SORT � The SORT procedure can be used in the

4. 6 USING PROC SORT � The SORT procedure can be used in the DATA step to rearrange the observations in a SAS data set or create a new SAS data set containing the rearranged observations. � The Sorting Sequence is shown in the table: Character variables Numeric variables: Default 48 Sorting sequence information for SAS data sets blank!"#$%&'()*+, ‑. /0123456789: ; <=>? @ ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_abcdefghijklmnop qrstuvwxyz(|)~ Missing values first, then numeric values Ascending (or indicate Descending)

Syntax for PROC SORT The syntax for PROC SORT is: PROC SORT <options>; BY

Syntax for PROC SORT The syntax for PROC SORT is: PROC SORT <options>; BY variable(s); Common options for PROC SORT include: DATA=datasetname; OUT= outputdatasetname; 49

Example of PROC SORT DATA=MYDATA OUT=MYSORT; BY RECTIME; Requires a BY variable. OPTIONS –

Example of PROC SORT DATA=MYDATA OUT=MYSORT; BY RECTIME; Requires a BY variable. OPTIONS – Specifies that a new resulting data set be created named MYSORT � This example sorts the MYDATA data set by RECTIME and puts the resulting data set into a new dataset named MYSORT. The original data set is NOT CHANGED. � Do Hands-On Examples p 92 & 93. (DSORT 1. SAS, DSORT 2. SAS) 50

Do Hands-On Example p 92 � RESULTS: Open the file DSORT 1. SAS DATA

Do Hands-On Example p 92 � RESULTS: Open the file DSORT 1. SAS DATA MYDATA; INPUT GROUP RECTIME; DATALINES; 1 4. 2 2 3. 6 2 3. 1 1 2. 1 Note create a new Note SORT BY 1 2. 8 data set named S 1 RECTIME 2 1. 5 1 1. 8 ; PROC SORT DATA=MYDATA OUT=S 1; BY RECTIME; Title 'Sorting Example - Ascending'; PROC PRINT DATA=S 1; RUN; 51 Note sorted column.

Exercise for PROC SORT � Change the PROC SORT statement to sort DESCENDING, and

Exercise for PROC SORT � Change the PROC SORT statement to sort DESCENDING, and save the results to a SAS data file named WORK. S 2. PROC SORT DATA=MYDATA OUT=S 2; BY DESCENDING RECTIME; (Change the PROC PRINT so it will print the new data set. ) PAUSE: Run this new code, then come back to the tutorial. 52

Results from PROC SORT Note that now RECTIME is descending. 53

Results from PROC SORT Note that now RECTIME is descending. 53

Example p 93 � Open DSORT 2. SAS. Note the KEEP= and RENAME= statements.

Example p 93 � Open DSORT 2. SAS. Note the KEEP= and RENAME= statements. This is similar to what we recently learned for DROP, KEEP, and RENAME in the DATA Statement. Run this code. PROC SORT DATA="C: SASDATASOMEDATA" OUT=ANALYSIS (KEEP=ID GP AGE GENDER RENAME=( ID=SUBJECT AGE=DXAGE)); BY ID; RUN; TITLE "Example of KEEP and RENAME in SORT"; PROC PRINT DATA=ANALYSIS; RUN; 54

Compare Before and After SORT and KEEP � Data Set Before SORT and KEEP

Compare Before and After SORT and KEEP � Data Set Before SORT and KEEP � Data Set after SORT and KEEP Note how ID is now SUBJECT and AGE is now DXAGE. 55

4. 7 APPENDING AND MERGING DATA SETS � Appending adds new records to an

4. 7 APPENDING AND MERGING DATA SETS � Appending adds new records to an existing data set. (It is sometimes called a vertical merge. ) APPEND adds records to the data set OLD 1 OLD 2 � Merging adds variables to a data set through the use of a key identifier that is present in both data sets (usually an identification code. ) (It is sometimes called a horizontal merge. ) MERGE adds variables to the data set OLD 1 56 OLD 2

APPENDING DATA SETS � Appending is accomplished by including multiple data set names in

APPENDING DATA SETS � Appending is accomplished by including multiple data set names in the SET statement. For example, DATA NEW; SET OLD 1 OLD 2; NOTE: You can append many files: DATA NEW; SET OLD 1 OLD 2 OLD 3 OLD 4 etc; 57

HANDS-ON EXAMPLE P 95 � Open the file named DAPPEND 1. SAS The goal

HANDS-ON EXAMPLE P 95 � Open the file named DAPPEND 1. SAS The goal is to append data set OLD 1 with OLD 2 using the DATA step. After the RUN statement, add these statements DATA NEW; SET OLD 1 OLD 2; PROC PRINT DATA=NEW; RUN; PAUSE the tutorial – Enter this new code and return… 58

RESULTS OF APPENDING The OLD 2 data set is added to OLD 1 resulting

RESULTS OF APPENDING The OLD 2 data set is added to OLD 1 resulting in more records for the NEW data set Note – if variables in the data sets don’t match, the resulting data set will contain missing values where variables don’t match. 59

Merging Data Sets by a Key Identifier � What is a Key Identifier? �

Merging Data Sets by a Key Identifier � What is a Key Identifier? � Usually a variables at is an ID number, Subject Number, Patient Number, etc � Must be unique for each person � There does not have to be the same number in each data set to merge � Unmatching IDs will result in missing values � First, we’ll look at a one-to-one match 60

Two Steps to a Merge: Sort, then Merge � The technique for merging the

Two Steps to a Merge: Sort, then Merge � The technique for merging the data sets using some key identifier (such as patient ID) is as follows: 1. Sort each data set by the key identifier. 2. Within a DATA step, use the MERGE statement along with a BY statement to merge �Example: SORT each data set BY the key identifier PROC SORT DATA=PRE; BY CASE; PROC SORT DATA=POST; BY CASE; Then perform the DATA PREPOST; merge – by the SAME MERGE PRE POST; BY CASE; key identifier 61

HANDS-ON EXAMPLE P 97 � Open the file DMERGE 1. SAS In this example,

HANDS-ON EXAMPLE P 97 � Open the file DMERGE 1. SAS In this example, complete the 2 nd PROC SORT by adding BY CASE. Then do the MERGE. DATA PREPOST; MERGE PRE POST; BY CASE; DIFF=POSTREAT-PRETREAT; PROC PRINT DATA=PREPOST; RUN; PAUSE the tutorial – Enter this new code and return… 62

RESULTS Note how the new variable DIFF was calculated during the MERGE 63

RESULTS Note how the new variable DIFF was calculated during the MERGE 63

Rename While Merging � As with the SORT statement, you can RENAME, DROP, and

Rename While Merging � As with the SORT statement, you can RENAME, DROP, and KEEP variables during the MERGE. You can also merge many files at a time. The following shows the syntax for merging four data sets and performing a RENAME, DROP, and KEEP on the third data set Note that the RENAME and DROP are occurring in the DATA 3 data DATA newdataset; set in this example MERGE datal data 2 data 3 (RENAME=(oldname=newname) DROP=variables or KEEP=variables)) data 4; BY keyvar; RUN; 64

Hands On Example page 97 (DMERGE 1. SAS) DATA PREPOST; MERGE PRE POST; BY

Hands On Example page 97 (DMERGE 1. SAS) DATA PREPOST; MERGE PRE POST; BY CASE; DIFF=POSTREAT - POSTTREAT; TITLE 'Merge Example'; PROC PRINT DATA=PREPOST; RUN; MERGE the data sets PRE and POST, and calculate DIFF. RENAME the variable PRETREAT to BASELINE during the merge (part 4. ) 65

Hands On Example continued � To RENAME a variable during the MERGE, use this

Hands On Example continued � To RENAME a variable during the MERGE, use this code: DATA PREPOST; MERGE PRE (RENAME=(PRETREAT=BASELINE)) POST; BY CASE; DIFF=POSTREAT - BASELINE; TITLE 'Merge Example'; Note the RENAME of the PROC PRINT DATA=PREPOST; variable PRETREAT. Also RUN; make sure you change the variable name in the calculation of DIFF. 66

Few-To-Many-Merge � A Few-To-Many merge is used when you have records in one data

Few-To-Many-Merge � A Few-To-Many merge is used when you have records in one data set that you want to merge into some table that contains (typically) a smaller number of categories. � Suppose you own an auto parts store. You sell products to several kinds of buyers – and each get a particular discount. � You want to produce a report that shows the amount of actual sales price for a number of purchases. 67

Data for Few-To-Many Merge (Hands-On p 99) This is the “MANY” data set. This

Data for Few-To-Many Merge (Hands-On p 99) This is the “MANY” data set. This is the “FEW” data set 68

How to set up the few-to many (match) merge… � Define the Discounts (FEW)

How to set up the few-to many (match) merge… � Define the Discounts (FEW) data set: Repair Shops: 33% Discount � CONSUMERS 0% Discount � Other Auto Stores 40% Discount � Define the TYPE data set (The “FEW”) � DATA TYPE; FORMAT BUYERTYPE $8. ; INPUT BUYERTYPE DISCOUNT; DATALINES; REPAIR. 33 CONSUMER 0 STORE. 40 ; 69 Note here that because you use a FORMAT statement to specify the format of BUYERTYPE, you don’t have to indicate type in the INPUT statement. Otherwise, that statement would have to be INPUT BUYERTYPE $ DISCOUNT;

Define the “MANY” data set Note FORMAT Statement DATA SALES; FORMAT ITEM $20. BUYERTYPE

Define the “MANY” data set Note FORMAT Statement DATA SALES; FORMAT ITEM $20. BUYERTYPE $8. ; INPUT ITEM BUYERTYPE PRICE; DATALINES; This is the “MANY” data set – in real life this may CARBCLEANER REPAIR 2. 30 be thousands of BELT CONSUMER 6. 99 transactions… MOTOROIL CONSUMER 14. 34 CHAIN STORE 18. 99 SPARKPLUGS REPAIR 28. 99 CLEANER CONSUMER 1. 99 WRENCH STORE 18. 88 ; 70

Prepare the two data sets for the merge PROC SORT DATA=SALES; BY BUYERTYPE; PROC

Prepare the two data sets for the merge PROC SORT DATA=SALES; BY BUYERTYPE; PROC SORT DATA=TYPE; BY BUYERTYPE; � And do the merge… DATA REPORT; MERGE SALES TYPE; BY BUYERTYPE; FINAL =ROUND(PRICE*(1 -DISCOUNT), . 01); RUN; PROC PRINT DATA=REPORT; RUN; * GET REPORT; 71

Few-to-Many Merge Results Note: Final price reflects the proper discount 72

Few-to-Many Merge Results Note: Final price reflects the proper discount 72

4. 8 USING PROC FORMAT � The PROC FORMAT procedure allows you to create

4. 8 USING PROC FORMAT � The PROC FORMAT procedure allows you to create your own custom formats. � These custom formats allow you to specify the information that will be displayed for selected values of a variable. � For example, suppose you’ve coded DISEASED and NOT DISEASED as 0 and 1. You can create a format where 0 means DISEASED and 1 means NOT DISEASED so when output is displayed the words instead of the number codes appear. 73

Using PROC FORMAT � The steps for using formatted values are 1. 2. 3.

Using PROC FORMAT � The steps for using formatted values are 1. 2. 3. Create a FORMAT definition using PROC FORMAT. Apply the FORMAT to one or more variables. You can apply a format (once it is defined in PROC FORMAT in a DATA step or in a data analysis PROC statement. Choose any name for the format (similar For example: restrictions as for SAS variables. ) We name them as FMTsomething to make the name obvious. PROC FORMAT; VALUE FMTMARRIED 0 = "No" l = "Yes"; RUN; 74

Numeric and Character Formats Define a format for a numeric PROC FORMAT; variable. VALUE

Numeric and Character Formats Define a format for a numeric PROC FORMAT; variable. VALUE fmtname 1 number 1=“name 1” For a character variable, the format name must number 2=“name 2” start with a $, and the textnames must be in etc; quotes. VALUE $fmtname 2 “textname 1”=“name 1” “textname 2”=“name 2” etc; RUN; 75

Example – Numeric and Character Definitions PROC FORMAT; Numeric format defined. VALUE FMTMARRIED 0="No"

Example – Numeric and Character Definitions PROC FORMAT; Numeric format defined. VALUE FMTMARRIED 0="No" 1="Yes"; VALUE $FMTGENDER “M”=“Male” “F”=“Female”; RUN; Character format defined – take note of format name $FMTGENDER and the values “M” and “F” are in quotes 76

Ways to specify formats � Formats may also use ranges. For example, suppose that

Ways to specify formats � Formats may also use ranges. For example, suppose that you want to classify your AGE data using the designations Child, Teen, Adult, and Senior. You could do this with the following format: Note different ways to indicate ranges. PROC FORMAT; Value FMTAGE LOW- 12 = 'Child' 13, 14, 15, 16, 17, 18, 19 = 'TEEN' 20 - 59 = 'Adult' 60 - HIGH = 'Senior'; RUN; � Do Hands On Exercise p 102 (DFORMAT 1. SAS) 77

HANDS ON EXERCISE P 102 � Open the file DFORMAT 1. SAS Create the

HANDS ON EXERCISE P 102 � Open the file DFORMAT 1. SAS Create the FORMAT Use the FORMAT Exercise – Add a format for GENDER ($FMTGENDER) where “M”=“Male” and “F”=FEMALE and add the format to the FORMAT statement in PROC PRINT. PAUSE – Return when you’ve completed this exercise. 78

RESULTS Results in this output 79

RESULTS Results in this output 79

Assigning Formats to Many Variables � You can also assign the same format to

Assigning Formats to Many Variables � You can also assign the same format to several variables. If you have questionnaire data with variables names Q 1, QS, Q 7 where each question is coded as 0 and 1 for answers Yes and No, respectively, and you have a format called FMTYN, you could use that FORMAT in a procedure as in the following example: PROC PRINT; FORMAT Q 1 Q 5 Q 7 FMTYN. ; RUN; Assigns the same format (FMTYN) to three variables. Note the dot at the end of the assigned format (REQUIRED) 80

Format Assignments (Data Set vs PROC) � Assign formats to variables within PROC STATEMENTS

Format Assignments (Data Set vs PROC) � Assign formats to variables within PROC STATEMENTS – Example: Assigning a FORMAT in a PROC makes the format assignment only within that PROC (temporary) PROC PRINT; FORMAT GENDER $FMTGENDER. ; RUN; Assigning a FORMAT in a DATA statement makes the format � Or in DATA statements permanent in that data set. DATA MYDATA; SET OLDDATA; FORMAT GENDER $FMTGENDER. ; RUN; 81

Creating Permanent Formats � In all the previous examples, formats were applied in a

Creating Permanent Formats � In all the previous examples, formats were applied in a PROC step and are considered temporary formats. � When you assign a format in a DATA step, you can also store those formats in a (permanent) format catalog. � For example, to store an SAS format in a specified permanent library location, you could use code such as Creates a FORMAT LIBRARY PROC FORMAT LIBRARY= MYSASLIB<. name>; � In this case, the MYSASLIB refers to an SAS library location you have previously created. 82

FORMAT Libraries � For example, the code PROC FORMAT LIBRARY = MYSASLIB; VALUE FMTMARRIED

FORMAT Libraries � For example, the code PROC FORMAT LIBRARY = MYSASLIB; VALUE FMTMARRIED 0="No" 1="Yes"; VALUE $FMTGENDER "F"="Female" "M"="Male"; RUN; � creates two subfolders in the MYSASLIB. FORMATS Formats Library folder named FMTMARRIED and $FMTGENDER. 83

View Formats Folder � That is, when you create an SAS format catalog, a

View Formats Folder � That is, when you create an SAS format catalog, a folder icon appears in the designated SAS Library. � In this case, it is named FORMATS and appears in the MYSASLIB library. You can verify its existence by examining the MYSASLIB library using SAS Explorer. � If you double click on the FORMATS folder, you will see sub folders named with the names of the formats you have created. 84

Contents of a Format Folder � Click on the FMTMARRIED Formats folder to see

Contents of a Format Folder � Click on the FMTMARRIED Formats folder to see its contents – the definition of the format: 85

Tell SAS About Your Formats � Once you have created permanent formats, you can

Tell SAS About Your Formats � Once you have created permanent formats, you can use them in both PROC and DATA step statements. To tell SAS the location of a particular format, use the statement OPTIONS FMTSEARCH=(proclib); � where PROCLIB is the name of the SAS Library where your formats folder is located. 86

Using Stored SAS Formats � For example, if you have previously created and stored

Using Stored SAS Formats � For example, if you have previously created and stored the FMTMARRIED and $FMTGENDER formats in your MYSASLIB. FORMATS folder, you could use the following code to access those formats with PROC PRINT (or any PROC. ) Tells SAS where the formats are located… OPTIONS FMTSEARCH=(MYSASLIB. FORMATS); PROC PRINT DATA="C: SASDATASURVEY"; VAR SUBJECT MARRIED GENDER; FORMAT MARRIED FMTMARRIED. GENDER $FMTGENDER. ; RUN; 87

Discovering SAS Formats � To discover what formats are in a particular format library,

Discovering SAS Formats � To discover what formats are in a particular format library, you can use the PROC CATALOG procedure as shown here. This code displays all of the formats stored in the MYSASLIB. FORMATS library. PROC CATALOG = MYSASLIB. FORMATS; CONTENTS; RUN; QUIT; 88

Hands On Example p 106 (DFORMAT 3. SAS) LIBNAME MYSASLIB "C: SASDATA"; PROC FORMAT

Hands On Example p 106 (DFORMAT 3. SAS) LIBNAME MYSASLIB "C: SASDATA"; PROC FORMAT LIBRARY = MYSASLIB. CAT 1; VALUE FMTMARRIED 0="No" 1="Yes"; VALUE $FMTGENDER "F" = "Female" "M" = "Male"; RUN; 89

When Your SAS Format Library is Missing � Suppose you have a. SASB 7

When Your SAS Format Library is Missing � Suppose you have a. SASB 7 DAT file that uses created formats, but you do not have the format library? If you attempt to use that data set, you will get the following error message in the log. ERROR: The format FMTMARRIED was not found or could not be loaded. � If this occurs, you must use the following OPTIONS statement (above the code where you refer to the data set) to tell SAS to access the data set, or run the procedure without using the defined formats: OPTIONS NOFMTERR; � In this case, the output displays the raw values of the variables instead of the assigned format labels. 90

Know the Difference: Format vs Label �A common mistake is to try to use

Know the Difference: Format vs Label �A common mistake is to try to use Labels as Formats or Formats as Labels. Make sure you know the difference: �LABELS are descriptions for variables Label AGE=“AGE in 2013”; �FORMATS are description for values VALUE FMTMARRIED 0="No“ 1="Yes"; 91

4. 9 GOING DEEPER: FINDING FIRST AND LAST VALUES � Suppose that you want

4. 9 GOING DEEPER: FINDING FIRST AND LAST VALUES � Suppose that you want to identify the first and last person (ID) in each of those groups. � In an SAS DATA step, you identify the first and last values by FIRST. GP and LAST. GP, where GP is the name of the sorted grouping (or key) variable. � Do Hands on Example p 107. (DFINDFIRST. SAS) 92

4. 10 SUMMARY � This chapter discussed several techniques for preparing your data for

4. 10 SUMMARY � This chapter discussed several techniques for preparing your data for analysis. In the next chapter, we begin the discussion of SAS procedures that perform analyses on the data. � Continue to Chapter 5: Preparing to Use SAS Procedures 93

These slides are based on the book: Introduction to SAS Essentials Mastering SAS for

These slides are based on the book: Introduction to SAS Essentials Mastering SAS for Data Analytics, 2 nd Edition By Alan C, Elliott and Wayne A. Woodward Paperback: 512 pages Publisher: Wiley; 2 edition (August 3, 2015) Language: English ISBN-10: 111904216 X ISBN-13: 978 -1119042167 These slides are provided for you to use to teach SAS using this book. Feel free to modify them for your own needs. Please send comments about errors in the slides (or suggestions for improvements) to acelliott@smu. edu. Thanks. 94