PROC SQL Tips and Translations for Data Step

  • Slides: 24
Download presentation
PROC SQL: Tips and Translations for Data Step Users By: Gail Jorgensen Susan Marcella

PROC SQL: Tips and Translations for Data Step Users By: Gail Jorgensen Susan Marcella

AGENDA • SQL Syntax Review • Joins Translated • SQL Strengths & Uses

AGENDA • SQL Syntax Review • Joins Translated • SQL Strengths & Uses

PROC SQL: Tips and Translations for Data Step Users Syntax Proc SQL; create table/view

PROC SQL: Tips and Translations for Data Step Users Syntax Proc SQL; create table/view newdsname as select var 1, var 2, … var. N from dsname where condition ; Quit;

PROC SQL: Tips and Translations for Data Step Users JOIN vs MERGE Types of

PROC SQL: Tips and Translations for Data Step Users JOIN vs MERGE Types of JOINs • Inner Join – selects only matching records (same as: if ina and inb) • Outer Join – selects some non-matching records – Left Join – selects all records from first table, only matching records from second (same as: if ina) – Right join – selects all records from second table, only matching records from first (same as: if inb) -- Full join – selects all records from both tables (same as having no if statement)

PROC SQL: Tips and Translations for Data Step Users Inner Join idfld col 2

PROC SQL: Tips and Translations for Data Step Users Inner Join idfld col 2 Dataset DS_A 1 1 1 2 3 4 5 idno col 3 col 4 M N O P Q R Dataset DS_BB proc sql; create table ds_c as select ds_a 1. *, ds_b. * from ds_a 1, ds_b where ds_a 1. idfld = ds_b. idno; quit; idfld col 2 1 1 2 2 4 5 M N O O Q R idno col 3 col 4 1 1 2 2 4 5 X X X Y Z Z C C D F 1 2 2 4 5 7 X X Y Z Z Z C D F data c; merge ds_a 1(in=ina) ds_b(in=inb rename=(idno=idfld)); by idfld; if ina and inb; run; idfld col 2 1 1 2 2 4 5 M N O O Q R col 3 col 4 X X X Y Z Z C C D F

PROC SQL: Tips and Translations for Data Step Users Left Join idfld col 3

PROC SQL: Tips and Translations for Data Step Users Left Join idfld col 3 idfld col 2 col 5 Dataset inf_a 1 1 2 3 4 5 M N O P Q R Dataset inf_b A D J K N proc sql; create table sql_left as select a. *, b. * from inf_a as a left join inf_b as b on a. idfld = b. idfld; quit; Dataset sql_left idfld col 2 col 5 col 3 col 4 1 1 2 2 3 4 5 N M O O P Q R D A J K N X X Y X Z Z C C F D 1 2 2 4 5 7 X X Y Z Z Z col 4 C D F data ds_left; merge inf_a(in=ina) inf_b(in=inb); by idfld; if ina ; run; Dataset ds_left idfld col 2 col 5 col 3 col 4 1 1 2 2 3 4 5 N M O O P Q R D A J K N X X Y X Z Z C C F D

PROC SQL: Tips and Translations for Data Step Users Right Join proc sql; create

PROC SQL: Tips and Translations for Data Step Users Right Join proc sql; create table sql_right as select a. *, b. * from inf_a as a right join inf_b as b on a. idfld = b. idfld; quit; idfld col 2 col 5 col 3 col 4 1 1 2 2 4 5 N M O O Q R. D A K N X X Y X Z Z Z C C F D data ds_right; merge inf_a(in=ina) inf_b(in=inb); by idfld; if inb; run; idfld col 2 col 5 col 3 col 4 1 M A X C 1 N D X C 2 O X D 2 O Y F 4 Q K Z 5 R N Z 7 Z

PROC SQL: Tips and Translations for Data Step Users Full Join CList 07 Obs

PROC SQL: Tips and Translations for Data Step Users Full Join CList 07 Obs name CList 08 recd sent 1 Amanda yes no 2 Gabi yes 3 Jan yes 4 Jim no yes 5 Pam no no proc sql; create table sql_clist as select c 7. name, c 7. recd as recd 07, c 8. recd as recd 08, c 7. sent as sent 07, c 8. sent as sent 08 from clist 07 as c 7 full join clist 08 as c 8 on c 7. name=c 8. name; quit; Obs name recd sent 1 Alison yes 2 Jan yes 3 Pam no no 4 Tom yes proc sort data=clist 07; by name; run; proc sort data=clist 08; by name; run; data_clist; merge clist 07 clist 08 (rename=(recd=recd 08 sent=sent 08)); by name; run;

PROC SQL: Tips and Translations for Data Step Users Full Join (Con’t) Sql_CList Obs

PROC SQL: Tips and Translations for Data Step Users Full Join (Con’t) Sql_CList Obs Name recd 07 1 Data_CList recd 08 sent 07 yes sent 08 yes Obs name 1 Alison recd sent 2 Amanda yes no 3 Gabi yes yes 4 Jan yes yes 5 Jim no yes 6 Pam no no yes 7 Tom 7 yes yes no recd 08 sent 08 yes yes no no yes

PROC SQL: Tips and Translations for Data Step Users Handling Duplicate Variable Names To

PROC SQL: Tips and Translations for Data Step Users Handling Duplicate Variable Names To always select the variable from one dataset: – Drop unwanted version of variable (PROC SQL permits all SAS dataset options) – Select variable from specific table To keep variable from both tables: – Rename the variable in one dataset To select variable based on value: – Use CASE statement

PROC SQL: Tips and Translations for Data Step Users CASE Statement proc sql; create

PROC SQL: Tips and Translations for Data Step Users CASE Statement proc sql; create table New. CList as select case when missing(c 7. name) then c 8. name else c 7. name end as name, c 7. recd as recd 07, c 8. recd as recd 08, c 7. sent as sent 07, c 8. sent as sent 08 from clist 07 as c 7 full join clist 08 as c 8 on c 7. name=c 8. name; quit; proc sort data=clist 07; by name; run; proc sort data=clist 08; by name; run; data_clist; merge clist 07 clist 08 (rename=(recd=recd 08 sent=sent 08)); by name; run;

PROC SQL: Tips and Translations for Data Step Users CASE Statement - Results Obs

PROC SQL: Tips and Translations for Data Step Users CASE Statement - Results Obs name recd 07 recd 08 sent 07 1 Alison 2 Amanda yes no 3 Gabi yes 4 Jan yes 5 Jim no 6 Pam no 7 Tom yes sent 08 yes yes no no yes

PROC SQL: Tips and Translations for Data Step Users Down Calculations PROC SORT data=shs.

PROC SQL: Tips and Translations for Data Step Users Down Calculations PROC SORT data=shs. exposure; by subject_id; run; DATA counters(KEEP=Table. Name MAXOBS TOTOBS); SET shs. exposure END=LAST; BY subject_id; length Table. Name $ 50; RETAIN MAXOBS OBSCNTR TOTOBS 0; Table. Name=“exposure"; TOTOBS+1; OBSCNTR+1; IF LAST. subject_id THEN DO; IF MAXOBS < OBSCNTR THEN MAXOBS=OBSCNTR; OBSCNTR=0; END; IF LAST THEN OUTPUT chemcnts; label maxobs='Maximum number of obs person' totobs='Total Number obs in table'; run; proc print data=counters; run; Obs 1 Table. Name exposure MAXOBS TOTOBS 14 2124

PROC SQL: Tips and Translations for Data Step Users Down Calculations sqlcounter proc sql;

PROC SQL: Tips and Translations for Data Step Users Down Calculations sqlcounter proc sql; create table sqlcounter as select distinct subject_id, count(*) as subjcnt from fshs. exposure group by subject_id; select “exposure" as Table. Name, max(subjcnt) as Max. Obs, sum(subjcnt) as Tot. Obs from sqlcounter; quit; Table. Name exposure Obs subject_id subjcnt 1 178 7 2 306 1 3 307 1 4 308 1 5 326 3 6 330 1 7 331 1 8 332 1 Max. Obs Tot. Obs 14 2124

PROC SQL: Tips and Translations for Data Step Users Counts and Nesting Queries Genre

PROC SQL: Tips and Translations for Data Step Users Counts and Nesting Queries Genre proc sql; select distinct genre, count(*) from itunes group by genre; quit; proc sql outobs=1; select (select count(*) from itunes) as Total. Songs, (select count(distinct genre) from itunes) as Genre. Cnt, (select count(distinct artist) from itunes) as Artist. Cnt, (select count(distinct album) from itunes) as Album. Cnt from itunes; quit; Total. Songs Genre. Cnt Artist. Cnt Album. Cnt 802 23 160 100 Alternative 9 Bluegrass 43 Blues 14 Children's Music 62 Christian & Gospel 88 Classical 74 Country 77 Easy Listening 31 Electronic 1 Folk 16 General Folk 18 Gospel & Religious 40 Hip Hop/Rap 2 Holiday 13 Inspirational 70

PROC SQL: Tips and Translations for Data Step User Dictionaries proc sql; create view

PROC SQL: Tips and Translations for Data Step User Dictionaries proc sql; create view detail as select * from dictionary. columns ; create view extern as select * from dictionary. members ; create view tbl as select * from dictionary. tables ; create view gotem as select trim(libname) as Lib. Name, trim(memname) as Table. Name, trim(name) as Col. Name, label as Col. Label from sashelp. vcolumn ; quit; SAS

PROC SQL: Tips and Translations for Data Step User Dictionaries – Getting variable names

PROC SQL: Tips and Translations for Data Step User Dictionaries – Getting variable names proc sql; /* get names of all variables you want */ select name into : drinkvars separated by ', ' from dictionary. columns where libname=‘AUG' and memname='DEMOG' and lowcase(name) contains ‘ndrk'; /* use your newly created macro variable in your select statement */ create table drinks as select &drinkvars from aug. demog; quit;

PROC SQL: Tips and Translations for Data Step User Dictionaries – Getting variable names

PROC SQL: Tips and Translations for Data Step User Dictionaries – Getting variable names proc sql; /* add the table alias to the front of each variable name as you create your macro variable */ select 'd. '||name into : aliasvars separated by ', ' from dictionary. columns where libname='AUG' and memname='DEMOG' and lowcase(name) contains ('ndrk'); /* do your merge or whatever using the macro variable you just created */ create table newtable as select &aliasvars, c. expcategory from aug. demog as d left join aug. exposure as c on d. jcml_id=c. jcml_id; quit;

PROC SQL: Tips and Translations for Data Step User Views • • • Views

PROC SQL: Tips and Translations for Data Step User Views • • • Views are ‘virtual tables’ Created with CREATE VIEW statement Can be used as if they are normal physical tables Enhance security – can construct a view of only fields and rows that user is allowed to view Enhance ease-of-use – Can combine rows and columns from multiple tables into a single view Facilitate data integrity – Can have several views on the same table, but only have to update the base table – Users always see up-to-date data proc sql; create view aug. testview as select d. subject_id, d. case_id, d. age, e. job_num, e. exposure_element from aug. demog as d, aug. exposure as e where d. subject_id=e. subject_id; quit;

PROC SQL: Tips and Translations for Data Step Users Creating Data Source Indicators dads

PROC SQL: Tips and Translations for Data Step Users Creating Data Source Indicators dads Obs 1 2 3 4 famid name inc 2 Art 22000 1 Bill 30000 3 Paul 25000 4 Karl 95000 faminc proc sql; create table sql_fj as select *, (dads. famid=faminc. famid) as indic, (dads. famid ~=. ) as dadind, (faminc. famid ~=. ) as famind, coalesce(dads. famid, faminc. famid) as fid from dads full join faminc on dads. famid=faminc. famid; quit; Obs 1 2 3 4 5 famid faminc 96 faminc 97 faminc 98 3 75000 76000 77000 1 40000 40500 41000 2 45000 45400 45800 5 55000 65000 70000 6 22000 24000 28000 proc sort data=dads out=sorted_dads; by famid; run; proc sort data=faminc out=sorted_faminc; by famid; run; data ds_fj; merge sorted_dads(in=in 1) sorted_faminc(in=in 2); by famid; if in 1 and in 2 then indic=1; else indic=0; dadind=in 1; famind=in 2; fid=famid; run;

PROC SQL: Tips and Translations for Data Step Users Full Join - cont Sql_fj

PROC SQL: Tips and Translations for Data Step Users Full Join - cont Sql_fj Ds_fj Obs 1 famid name inc faminc 96 faminc 97 faminc 98 indic dadind famind fid 1 Bill 30000 40500 41000 1 1 2 2 Art 22000 45400 45800 1 1 1 2 3 3 Paul 25000 76000 77000 1 1 1 3 4 4 Karl 95000 . . . 0 1 0 4 5 . . 55000 65000 70000 0 0 1 5 6 . . 22000 24000 28000 0 0 1 6 Obs 1 famid name inc faminc 96 indic dadind famind fid 1 Bill 30000 40000 faminc 97 faminc 98 40500 41000 1 1 2 2 Art 22000 45400 45800 1 1 1 2 3 3 Paul 25000 76000 77000 1 1 1 3 4 4 Karl 95000 . . . 0 1 0 4 5 5 . 55000 65000 70000 0 0 1 5 6 6 . 22000 24000 28000 0 0 1 6

PROC SQL: Tips and Translations for Data Step Users Additional Uses Scenario: For a

PROC SQL: Tips and Translations for Data Step Users Additional Uses Scenario: For a case/control study, verify that all controls have age within 5 to 10 years of the related case age. proc sql; title 'Bad Control Matches'; select c. subject_id, c. casenum, c. gender, age as Cntl. Age label='Cntl. Age', (select age from cases where subject_id=c. casenum) as Case. Age, abs(cntlage-calculated caseage) as Age. Diff from controls as c left join demog as d on c. subject_id=d. subject_id where (not within 5 and not within 10); quit;

PROC SQL: Tips and Translations for Data Step User Merging Multiple Tables Obs name

PROC SQL: Tips and Translations for Data Step User Merging Multiple Tables Obs name 1 Mary proc sql; create table sql_c 3 as select a. name, b. class, case when missing(c. grade 1) then "missing 1“ when missing(c. grade 2) then "missing 2“ when missing(c. grade 3) then "missing 3“ when missing(c. grade 4) then "missing 4" else "none missing“ end as miss_grade from indat_a as a, indat_b as b, indat_c as c where c. classid=b. classid and c. perid=a. perid; quit; class miss_grade Art none missing 2 Olive Art none missing 3 Quincy Art none missing 4 Nat Art none missing 5 Pat Art none missing 6 Quincy 7 Richard Music missing 4 Music none missing 8 Mary Math none missing 9 Nat Math none missing 10 Olive Math none missing 11 Pat Math none missing 12 Quincy 13 Richard Math missing 2 Math none missing 14 Mary English none missing 15 Nat English none missing 16 Olive English none missing 17 Pat English none missing 18 Quincy English none missing 19 Richard English none missing

SAS and all other SAS Institute Inc. product or service names are registered trademarks

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.