SAS SQL programming tricks for handling longitudinal datasets











- Slides: 11

SAS® SQL programming tricks for handling longitudinal datasets Hua (Josh) Weng Washington University School of Medicine

Advantages of SAS SQL • Join tables without sorting, on different matching names • Match multiple tables at different levels • Summarize data while doing queries • …… SAS user SAS Other coder • …… user • …… • Has similar syntax to other SQL.

SAS® SQL aggregate/summary functions • • • • AVG|MEAN arithmetic mean or average of values COUNT|FREQ|N number of nonmissing values CSS corrected sum of squares CV coefficient of variation (percent) MAX largest value MIN smallest value NMISS number of missing values PRT is the two-tailed p-value for Student's t statistic, T with degrees of freedom. RANGE range of values STD standard deviation STDERR standard error of the mean SUM sum of values SUMWGT sum of the WEIGHT variable values T Student's t value for testing the hypothesis that the population mean is zero USS uncorrected sum of squares VAR variance

Column-wise vs. Row-wise calculation ID Time (yr. ) 101 2012 1 2 3 101 2013 2 3 4 101 2014 3 4 5 102 2012 4 5 6 102 2013 5 6 7 102 2014 6 7 8

Example 1. To extract data for a particular visit ID Time (yr. ) 101 2012 1 1 101 2013 2 2 101 2014 3 3 102 2013 4 1 102 2013 2014 5 2 102 2014 2015 6 3 … … TASK: To isolate the data from the 2 nd visit.

Example 1 (cont. ) • Through DATA step: combing built-in keyword FIRST. and a counter statement: data example 1; set example; Vnum + 1; by ID; if first. ID then Vnum = 1; run; The third line, Vnum + 1, creates the variable Vnum and adds one to each observation as SAS processes the data step. There is an implicit retain statement in this statement. • Through SQL: combing MONOTONIC() and an aggregate function MIN(): proc sql noprint; create table example 1 as select *, MONOTONIC() MIN(MONOTONIC()) + 1 as Vnum from example group by id; ID MONOTO NIC() MIN() Vnum 101 1 101 2 101 3 102 4 4 1 102 5 4 2 102 6 4 3

Example 2. To join tables with unmatched TIME ID date_x 101 11 MAR 05 101 ID date_y 11 101 14 FEB 05 1 12 MAY 08 22 101 27 MAR 07 2 102 15 NOV 07 44 101 09 MAR 09 3 102 27 JAN 11 55 102 14 NOV 06 4 … … … 102 27 AUG 08 5 102 05 DEC 10 6 … date_y … … TASK: To join the two tables, chose y value with the nearest date. ID date_x 101 11 MAR 05 11 14 FEB 05 1 101 12 MAY 08 22 09 MAR 09 3 102 15 NOV 07 44 27 AUG 08 5 102 27 JAN 11 55 05 DEC 10 6 … … …

Example 2. (cont. ) proc sql noprint; create table x_y as select x. *, y. * from x left join y on x. id = y. id GROUP BY X. ID, DATE_X HAVING ABS(DATE_X - DATE_Y) = MIN(ABS(DATE_X - DATE_Y)); ID date_x 101 11 MAR 05 101 abs(date _x date_y) MIN(abs( date_x date_y)) ID date_y 11 101 14 FEB 05 1 25 25 11 MAR 05 11 101 27 MAR 07 2 746 25 101 11 MAR 05 11 101 09 MAR 09 3 1459 25 101 12 MAY 08 22 101 14 FEB 05 1 1183 301 12 MAY 08 22 101 27 MAR 07 2 412 301 12 MAY 08 22 101 09 MAR 09 3 301 102 … … … …

Example 3. To impute missing values with values having closest TIME ID Time 101 01 Nov 2001 555 555 101 02 Nov 2002 . 555 101 30 Oct 2003 . 555 666 101 03 Nov 2004 666 666 101 04 Nov 2005 . 666 102 01 Dec 2010 888 888 102 11 Nov 2011 . 888 102 03 Dec 2012 . 888 999 102 15 Nov 2013 . 888 999 102 08 Nov 2014 999 999 *LOCF: Last Observation Carried Forward; **NNI: Nearest Neighbor Imputation

Example 3. (cont. ) proc sql noprint; create table X_imputed as select a. *, b. x as x_new, b. visit_date as imputation_date from X_original as a, X_original (where = (x ne. )) as b where a. id = b. id group by id, visit_date having abs(a. visit_date b. visit_date) = MIN(abs(a. visit_date - b. visit_date)) abs(a. visit_da MIN(abs(a. vis ; Imputation_d ID visit_date 101 01 Nov 2001 101 ate te b. visit_date) it_date b. visit_date)) 555 01 Nov 2001 0 0 . 666 03 Nov 2004 1098 0 02 Nov 2002 . 555 01 Nov 2001 366 101 02 Nov 2002 . 666 03 Nov 2004 732 366 101 30 Oct 2003 . 555 01 Nov 2001 728 370 101 30 Oct 2003 . 666 03 Nov 2004 370 101 03 Nov 2004 . 555 01 Nov 2001 1098 0 101 03 Nov 2004 . 666 03 Nov 2004 0 0 101 04 Nov 2005 . 555 01 Nov 2001 1464 366 101 04 Nov 2005 . 666 03 Nov 2004 366

Summary