Ready To Become Really Productive Using PROC SQL

Ready To Become Really Productive Using PROC SQL? Sunil Gupta Sunil@Gupta. Programming. com Gupta Programming www. SASSavvy. com 1

DATA Step vs. PROC SQL Camps DATA Step Die-Hards Factors: First learned in SAS, First encounter of flexibility, First frustration of complexity PROC SQL Die-Hards Happy Camper – Use the best of both worlds 2

Using PROC SQL, can you identify? - at least four ways to select and create columns? - at least five ways to create macro variables? - at least four ways to subset tables? - at least three ways to subquery tables? Everyone should have a copy of the one page summary sheet. Can use as reference when coding in PROC SQL. 3

PROC SQL Summary Sheet 4

Using PROC SQL to be Productive Topics Covered Table Access, Structure and Retrieval Creating Columns and Macro Variables Useful References, Webcasts and Podcasts Topics Not Covered Table Content for Data Entry Joining Tables to combine Data 5

Four Main PROC SQL components: Columns, Joins, Conditions, Sorts proc sql; select name, sex from sashelp. class where sex = 'F' order by name; quit; Joins Columns Conditions Sorts 80% of PROC SQL’s query can be broken down to 20% of the syntax 6

13 PROC SQL Examples Example 1 Essential building block components Example 2 a, 2 b, 2 c, 2 d, 2 e Four options for selecting columns Example 3 c Creating column using summary functions Example 3 f Creating column using select-case Example 4 a, 4 b Subsetting table using calculated column or function Example 5 a, 5 b Using subquery conditions to create rows Example 6 b Creating macro variables 7

SASHELP. CLASS Data Set Obs Name Sex Age Height Weight 1 Alice F 13 56. 5 84. 0 2 Barbara F 13 65. 3 98. 0 3 Carol F 14 62. 8 102. 5 4 Jane F 12 59. 8 84. 5 5 Janet F 15 62. 5 112. 5 6 Joyce F 11 51. 3 50. 5 7 Judy F 14 64. 3 90. 0 8 Louise F 12 56. 3 77. 0 9 Mary F 15 66. 5 112. 0 10 Alfred M 14 69. 0 112. 5 11 Henry M 14 63. 5 102. 5 12 James M 12 57. 3 83. 0 13 Jeffrey M 13 62. 5 84. 0 14 John M 12 59. 0 99. 5 15 Philip M 16 72. 0 150. 0 16 Robert M 12 64. 8 128. 0 17 Ronald M 15 67. 0 133. 0 18 Thomas M 11 57. 5 85. 0 19 William M 15 66. 5 112. 0 8

Example 1: four main components: Columns, Joins, Conditions, Sort proc sql; select name from sashelp. class where sex = 'F' order by name; quit; Only One SAS statement Name Alice Barbara Carol Janet Joyce Judy Louise Mary 9

Example 2. Four options for Selecting Column Definitions a. Basic Structure b. Column Attributes such as length= c. All Columns with ‘*’ d. Distinct Columns such as distinct e. Distinct Columns without Order By 10

Example 2 a. Select name and sex for all females proc sql; select name, sex from sashelp. class where sex = 'F' order by name; quit; Multiple columns, control column and record order Name Alice Barbara Sex F F Carol Janet Joyce Judy Louise F F F Mary F 11

Example 2 b. Define attributes for name: label, format and length proc sql; select name label = 'My label' format = $10. length = 10 from sashelp. class where sex = 'F' order by name; quit; Column attributes after column name My label Alice Barbara Carol Janet Joyce Judy Louise Mary 12

Example 2 c. Select all columns in table for all females proc sql; select * from sashelp. class where sex = 'F' order by name; quit; Wildcard selects all columns Name Sex Alice F 13 56. 5 84 Barbara F 13 65. 3 98 Carol F 14 62. 8 102. 5 Jane F 12 59. 8 84. 5 Janet F 15 62. 5 112. 5 Joyce F 11 51. 3 50. 5 Judy F 14 64. 3 90 Louise F 12 56. 3 Mary F 15 66. 5 Age Height Weight 13 77 112

Example 2 c. Select all columns in table for all females proc sql; select * from sashelp. class (drop = weight) where sex = 'F' order by name; quit; Data set options are still valid Name Sex Age Height Alice F 13 56. 5 Barbara F 13 65. 3 Carol F 14 62. 8 Jane F 12 59. 8 Janet F 15 62. 5 Joyce F 11 51. 3 Judy F 14 64. 3 Louise F 12 56. 3 Mary F 15 66. 5 14

Example 2 d. Select distinct sex for all females proc sql; select distinct sex from sashelp. class where sex = 'F' order by name; quit; Selects unique combination of columns, generally order by sex also, same as distinct column name Sex F F F F F 15

Example 2 e. Select distinct sex for all females without repeats proc sql; select distinct sex from sashelp. class where sex = 'F'; quit; Sex F Excluding the order by name clause results in one record by preventing sex from being repeated for each record 16

Example 3. Eight options for Creating Column Definitions a. Functions such as int((age + 150)/10) b. Functions such as max(height, weight) c. Summary Functions such as sum(weight) d. Constant such as ‘my constant’ e. Character String Expression name || ‘, ’ || sex f. Select-Case When Condition g. Select-Case <Var_Name> When Condition h. Summary function with subset condition 17

Example 3 c 1. Select weight and percent of total weight using summary functions proc sql; Re-merging technique select weight, ((weight/sum(weight))*100) as wpercnt length = 8 format = 4. 1 Sample Output from sashelp. class; Weight wpercnt quit; 112. 5 5. 9 Once specified, new columns can be referenced with CALCULATED 84 98 4. 4 5. 1 102. 5 5. 3 18

Example 3 c 2. Select sex, weight and percent of total weight by sex proc sql; select sex, weight, sum(weight) as sum_weight length=8 format=4. , ((weight/sum(weight))*100) as wpercnt length = 8 format = 4. 1 Sample Output from sashelp. class group by sex; Sex Weight sum_weight wpercnt quit; F 90 811 11. 1 Group by prevents overall summaries F 84. 5 811 10. 4 19

Example 3 c 3. Select sex, weight and percent of total weight by sex proc sql; select sex, sum(weight) as sum_weight length=8 format=4. from sashelp. class group by sex; Output quit; Sex Only group by variable sum_weight F 811 M 1090 20

Example 3 f. Select age and new column depthead based on age values proc sql; Any valid expression select age, Required case when age > 0 and age < 13 then 1 Mutually when age between 13 and 15 then 2 exclusive condition when age > 15 then 3 Tip else. Required end as agegrp length = 8 from sashelp. class; Age agegrp Sample Output 21 14 2 quit;

Example 4. Four options for Subsetting Tables a. Calculated variable b. Function such as index() c. Direct variable d. Summary Function using having clause Option d. allows for subsetting by summary function in one step instead of multiple steps using DATA Step. 22

Example 4 a. Select age and new column depthead based on age values for depthead = 3 proc sql; Age agegrp select age, 16 3 case when age > 0 and age < 13 then 1 when age between 13 and 15 then 2 when age > 15 then 3 else. end as agegrp length = 4 from sashelp. class where calculated agegrp= 3; 23 quit;

Example 4 b. Select name, sex where name contains ‘J’ proc sql; select name, sex from sashelp. class where index(name, 'J') > 0; quit; Name James Janet Sex M F F Jeffrey M John Joyce M F Judy F 24

Example 5. Two options for Subqueries (One Column) a. Returns one value b. Returns multiple values Best Practice to apply a three part approach: 1. Subquery Results 2. Population without condition 3. Confirm Subset What do subqueries resemble? What is the difference between WHERE and HAVING clause? For multiple columns, can create a subquery table or an intermediate table and apply multiple column conditions. 25

Example 5 a. Select sex, weight where weight is greater than the average weight proc sql; Select sex, weight from sashelp. class having weight > (select m_wgt from mean_wgt); quit; 2 3 1 Result from mean_wgt table is used in outer query. Expect one value from subquery. Which column condition applied? What happens if condition is on sex? 26

Example 5 a. Select sex, weight where weight is greater than the average weight proc sql; create table mean_wgt as select avg(weight) as m_wgt from sashelp. class; select m_wgt from mean_wgt; quit; First create m_wgt column in mean_wgt table. 27

Example 5 a. Select sex, weight where weight is greater than the average weight 1. Subquery Result 100. 0263 2. Sample Population Sex F F F Weight 50. 5 77 84 84. 5 90 3. Final Subset Sex M F F M M Weight 112. 5 102. 5 112 150 128 133 112 All weights > 100 28

Example 5 b. Select age where age does not equal any female ages proc sql; Select sex, age from sashelp. class having age ~in (select distinct age from sashelp. class where sex = "F") ; quit; 2 3 1 Expect multiple values from subquery. Without a group by clause, having and where clauses provide similar results. 29

Example 5 b. Select age where age does not equal any female ages 2. Population 1. Subquery Result Sex F F F Age 11 12 13 14 15 3. Final Subset Unique male age Sex M Age 16 F F F Age 11 12 12 13 13 14 14 15 15 Sex M M M M M Age 11 12 12 13 14 14 15 15 16 30

Example 6. Five options for Creating Macro Variables 1 macro variable, 1 value a. Into : b. Into : separated by 1 macro variable, multiple values multiple macro variables, 1 value each c. Into : - : d. Summary Function into : 1 macro variable, 1 value e. Select-case into : 1 macro variable, 1 or more values Key questions: 1. Number of macro variable 2. Number of values 3. Delimiter value (, ‘, ’, ‘/’, etc. ) 31

Example 6. Create macro variable storing male names proc sql; select name as male into : male_name separated by ', ' from sashelp. class 1 macro variable, multiple where sex = 'M'; quit; %put 'Names of Males = ' &male_name; values SAS Log: (What is this order? ) Names of Males = Alfred, Henry, James, Jeffrey, John, Philip, Robert, Ronald, Thomas, William 32

Macro Variables from PROC SQL # of Values # of Macro Variables One Value Many Values* One Macro Variable Case 1 Case 2 a, 2 b Many Macros Variables Case 3 Case # Example 1 %let mcnt = 4; 2 a %let gender = F, M; 2 b %let mlst = sex, age, height, weight; 3 %let gender 1 = F; %let gender 2 = M; Application * Can process many values as one list or as each item. Proc SQL syntax example Variable Value select sex from sashelp. class where sex = “&gender 1”; Variable Name select &mlst, name from sashelp. class;

DATA Step Merge/PROC SQL Joins 34

DATA Step Merge/PROC SQL Joins 35

Summary PROC SQL components: Columns, Joins, Conditions, Sorts 1. Generally all or few columns specified. 2. Columns created towards the end instead of beginning in DATA Step. 3. Select-case syntax is Joins the only method for Columns conditional processing. 4. Syntax is easier to Conditions Sorts code and maintain since more standard structure. 36

Four Dimensions of PROC SQL over DATA Step proc sql; Group By select name, weight Subsets from sashelp. class where sex=‘F’ Summary Conditions group by sex Functions having weight > avg(weight); quit; Which females weigh more than the average weight by sex? How many steps are performed and in what order? 37

Four Dimensions of PROC SQL over DATA Step proc sql; Name Weight select name, weight Janet 112. 5 from sashelp. class Carol 102. 5 Mary 112 where sex=‘F’ Barbara 98. 0 group by sex Four Steps in One PROC SQL having weight > 1. Select female records only avg(weight); 2. Group by sex (F, M) quit; 3. Select weights > 89 (average female weight) 4. Display name and weight 38

Comparing Data Step with PROC SQL Data Step (SAS Programming) PROC SQL (Database Programming) SAS Functions, Data set options SAS Functions, COALESCE(), Data set options If-Then Statements, Case-Select Clause Do Loop, Output Joins can simulate Do Loop Space to separate variables Comma to separate variables New variable = valid expression; Valid expression AS new variable IF/Where Statements WHERE for details/HAVING for summaries Multiple SAS Statements One SAS Statement By default, includes all variables By default, excludes all variables Many-to-many merge Cartesian Product is better By default, If A or B; Full Outer By default, If A and B; Inner Join Can recycle data set names Requires new data set names, else get warning N/A Unique PROC SQL keywords

No PROC SQL Equivalent DATA Step Only Description By Group Processing FIRST (DOT) OUTPUT LAG() Ex. if FIRST. SEX then sexcount + 1; Write record to dataset Ex. OUTPUT MYDATA; Access values from previous record Ex. PREV_WEIGHT = LAG(WEIGHT); ARRAYS DO LOOPS Create and process arrays Ex. ARRAY VAR(3) $ VAR 1 – VAR 3; Create and process do-loops Ex. DO I = 1 TO 10;

Unique PROC SQL Keywords Keyword Description Creating new columns AS CALCULATED Ex. ((weight/sum(weight))*100) as wpercent Referencing new columns after being specified Ex. where calculated wgroup = 'high' DISTINCT Displaying unique combination of variables Ex. distinct patno INTO : Creating macro variables Ex. sum(weight) into : wsum CHAR / DATE / NUM Variable type when creating tables and variables Ex. create table mydata (client char(25) format=$25. label ='Client‘);

PROC SQL Mind Maps 42

PROC SQL Mind Maps 43

Summary to be PROC SQL Productive ü Four selecting column options ü Eight creating column options ü Four subsetting table options ü Two sorting options ü Three subquery options ü Five macro variable creation options Helpful to prepare for SAS Advanced Certification exam. 44

Top PROC SQL References Ø Lafler, Kirk Paul, PROC SQL Book, Papers, Tips and Techniques Webcast: http: //support. sas. com/publishing/bbu/webina r/Lafler_junewebinar. wmv Ø http: //www. sascommunity. org/wiki/Quick_Re sults_with_Proc_SQL 45

Acknowledgement SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand product names are registered trademarks or trademarks of their respective companies. 1/20/2022 46

Ready To Become Really Productive Using PROC SQL? Sunil Gupta Sunil@Gupta. Programming. com Gupta Programming www. SASSavvy. com 47
- Slides: 47