SAS SQL programming tricks for handling longitudinal datasets

  • Slides: 11
Download presentation
SAS® SQL programming tricks for handling longitudinal datasets Hua (Josh) Weng Washington University School

SAS® SQL programming tricks for handling longitudinal datasets Hua (Josh) Weng Washington University School of Medicine

Advantages of SAS SQL • Join tables without sorting, on different matching names •

Advantages of SAS SQL • Join tables without sorting, on different matching names • Match multiple tables at different levels • Summarize data while doing queries • …… SAS user SAS Other coder • …… user • …… • Has similar syntax to other SQL.

SAS® SQL aggregate/summary functions • • • • AVG|MEAN arithmetic mean or average of

SAS® SQL aggregate/summary functions • • • • AVG|MEAN arithmetic mean or average of values COUNT|FREQ|N number of nonmissing values CSS corrected sum of squares CV coefficient of variation (percent) MAX largest value MIN smallest value NMISS number of missing values PRT is the two-tailed p-value for Student's t statistic, T with degrees of freedom. RANGE range of values STD standard deviation STDERR standard error of the mean SUM sum of values SUMWGT sum of the WEIGHT variable values T Student's t value for testing the hypothesis that the population mean is zero USS uncorrected sum of squares VAR variance

Column-wise vs. Row-wise calculation ID Time (yr. ) 101 2012 1 2 3 101

Column-wise vs. Row-wise calculation ID Time (yr. ) 101 2012 1 2 3 101 2013 2 3 4 101 2014 3 4 5 102 2012 4 5 6 102 2013 5 6 7 102 2014 6 7 8

Example 1. To extract data for a particular visit ID Time (yr. ) 101

Example 1. To extract data for a particular visit ID Time (yr. ) 101 2012 1 1 101 2013 2 2 101 2014 3 3 102 2013 4 1 102 2013 2014 5 2 102 2014 2015 6 3 … … TASK: To isolate the data from the 2 nd visit.

Example 1 (cont. ) • Through DATA step: combing built-in keyword FIRST. and a

Example 1 (cont. ) • Through DATA step: combing built-in keyword FIRST. and a counter statement: data example 1; set example; Vnum + 1; by ID; if first. ID then Vnum = 1; run; The third line, Vnum + 1, creates the variable Vnum and adds one to each observation as SAS processes the data step. There is an implicit retain statement in this statement. • Through SQL: combing MONOTONIC() and an aggregate function MIN(): proc sql noprint; create table example 1 as select *, MONOTONIC() MIN(MONOTONIC()) + 1 as Vnum from example group by id; ID MONOTO NIC() MIN() Vnum 101 1 101 2 101 3 102 4 4 1 102 5 4 2 102 6 4 3

Example 2. To join tables with unmatched TIME ID date_x 101 11 MAR 05

Example 2. To join tables with unmatched TIME ID date_x 101 11 MAR 05 101 ID date_y 11 101 14 FEB 05 1 12 MAY 08 22 101 27 MAR 07 2 102 15 NOV 07 44 101 09 MAR 09 3 102 27 JAN 11 55 102 14 NOV 06 4 … … … 102 27 AUG 08 5 102 05 DEC 10 6 … date_y … … TASK: To join the two tables, chose y value with the nearest date. ID date_x 101 11 MAR 05 11 14 FEB 05 1 101 12 MAY 08 22 09 MAR 09 3 102 15 NOV 07 44 27 AUG 08 5 102 27 JAN 11 55 05 DEC 10 6 … … …

Example 2. (cont. ) proc sql noprint; create table x_y as select x. *,

Example 2. (cont. ) proc sql noprint; create table x_y as select x. *, y. * from x left join y on x. id = y. id GROUP BY X. ID, DATE_X HAVING ABS(DATE_X - DATE_Y) = MIN(ABS(DATE_X - DATE_Y)); ID date_x 101 11 MAR 05 101 abs(date _x date_y) MIN(abs( date_x date_y)) ID date_y 11 101 14 FEB 05 1 25 25 11 MAR 05 11 101 27 MAR 07 2 746 25 101 11 MAR 05 11 101 09 MAR 09 3 1459 25 101 12 MAY 08 22 101 14 FEB 05 1 1183 301 12 MAY 08 22 101 27 MAR 07 2 412 301 12 MAY 08 22 101 09 MAR 09 3 301 102 … … … …

Example 3. To impute missing values with values having closest TIME ID Time 101

Example 3. To impute missing values with values having closest TIME ID Time 101 01 Nov 2001 555 555 101 02 Nov 2002 . 555 101 30 Oct 2003 . 555 666 101 03 Nov 2004 666 666 101 04 Nov 2005 . 666 102 01 Dec 2010 888 888 102 11 Nov 2011 . 888 102 03 Dec 2012 . 888 999 102 15 Nov 2013 . 888 999 102 08 Nov 2014 999 999 *LOCF: Last Observation Carried Forward; **NNI: Nearest Neighbor Imputation

Example 3. (cont. ) proc sql noprint; create table X_imputed as select a. *,

Example 3. (cont. ) proc sql noprint; create table X_imputed as select a. *, b. x as x_new, b. visit_date as imputation_date from X_original as a, X_original (where = (x ne. )) as b where a. id = b. id group by id, visit_date having abs(a. visit_date b. visit_date) = MIN(abs(a. visit_date - b. visit_date)) abs(a. visit_da MIN(abs(a. vis ; Imputation_d ID visit_date 101 01 Nov 2001 101 ate te b. visit_date) it_date b. visit_date)) 555 01 Nov 2001 0 0 . 666 03 Nov 2004 1098 0 02 Nov 2002 . 555 01 Nov 2001 366 101 02 Nov 2002 . 666 03 Nov 2004 732 366 101 30 Oct 2003 . 555 01 Nov 2001 728 370 101 30 Oct 2003 . 666 03 Nov 2004 370 101 03 Nov 2004 . 555 01 Nov 2001 1098 0 101 03 Nov 2004 . 666 03 Nov 2004 0 0 101 04 Nov 2005 . 555 01 Nov 2001 1464 366 101 04 Nov 2005 . 666 03 Nov 2004 366

Summary

Summary