SAS Data Step Combining Data sets Statistical Analysis
SAS Data Step Combining Data sets Statistical Analysis System- Class-05
Combine Data sets in SAS Many applications require input data to be in a specific format before the data can be processed to produce meaningful results. The data typically comes from multiple sources and might be in different formats. Therefore, we have to take intermediate steps to logically relate and process data before you can analyze it or create reports from it. Once we have determined what we want the output to look like, we must Ø Determine how the input data is related Ø Ensure that the data is properly sorted or indexed, if necessary Ø Select the appropriate access method to process the input data
Methods For Combining SAS Data Sets Concatenating : combines two or more SAS data sets, one after the other, into a single SAS data set. Interleaving : combines individual, sorted SAS data sets into one sorted SAS data set. Merging : combines observations from two or more SAS data sets into a single observation in a new data set. Updating : Updating a SAS data set replaces the values of variables in one data set (the master data set) with values from another data set (the transaction data set). Modifying : Modifying a SAS data set replaces, deletes, or appends observations in an existing data set.
Concatenating SAS Data Sets Concatenating combines two or more SAS data sets, one after the other, into a single data set. The number of observations in the new data set is the sum of the number of observations in the original data sets. We can concatenate SAS data sets by using one of the following methods: • the SET statement in a DATA step The SET statement reads observations from one or more SAS data sets and uses them to build a new data set. • the APPEND procedure The APPEND procedure adds the observations from one SAS data set to the end of another SAS data set
SET statement & APPEND Procedure DATA 1 Year data combined; set data 1 data 2; run; 2008 2009 2010 2011 2012 DATA 2 Year COMBINED YEAR 2008 2009 2010 2011 2012
Interleaving SAS Data Sets § We can interleave data sets by using a SET statement and a BY statement in a DATA step. The number of observations in the new data set is the sum of the number of observations in the original data sets § The BY statement specifies the variable or variables by which you want to interleave the data sets. Ø BY variable specifies a variable that is named in a BY statement and by which the data is sorted or needs to be sorted. Ø BY value specifies the value of a BY variable.
Interleaving SAS Data Sets (contd. ) Ø BY group : specifies the set of all observations with the same value DATA 1 Year 2007 2008 2009 2010 2011 for a BY variable DATA 2 Year DATA COMBINED; SET DATA 1 DATA 2; BY YEAR; RUN; COMBINED YEAR 2008 2007 2009 2008 2010 2008 2011 2009 2012 2009 2010 2011 2012
Merging SAS Data Sets § Merging combines observations from two or more SAS data sets into a single observation in a new SAS data set. We can merge data sets using the MERGE statement in a DATA step. § We can merge the data sets in two ways: Ø one-to-one merging In this process of merge We do not use a BY statement. Observations are combined based on their positions in the input data sets. Ømatch merging In this process of merge we use a BY statement to combine observations from the input data sets based on common values of the variable by which you merge the data sets.
One-to-One Merging We can use the MERGE statement without a BY statement. In this case SAS combines the first observation in all data sets you name in the MERGE statement into the first observation in the new data set, the second observation in all data sets into the second observation in the new data set, and so on. Example:
Match-Merging § Merging with a BY statement enables you to match observations according to the values of the BY variables that you specify. Before you can perform a match-merge, all data sets must be sorted by the variables that you want to use for the merge Example:
Updating SAS Data Sets § When we update, we have to work with two SAS data sets. The data set that contains the original information is the master data set. The data set that contains the new information is the transaction data set. § In a DATA step, the UPDATE statement reads observations from the transaction data set and updates corresponding observations (observations with the same value of all BY variables) from the master data set. Example: data mail_newlist; update mail_list mail_trans; by Subscriber. Id; run;
Updating SAS Data Sets Example:
Modifying SAS Data Sets § We can use the MODIFY statement to replace all values for a specific variable or variables in a data set. § The MODIFY statement replaces data in a master data set with data from a transaction data set, and makes the changes in the original master data set. § You can use a BY statement to match observations from the transaction data set with observations in the master data set Example: 13 data inventory_tool; modify inventory_tool; price=price+(price*. 15); run;
Modifying SAS Data Sets 14 - Example
Thank You http: //mainframes-online-training. weebly. com/ Polsani Anil Kumar
- Slides: 15