Working sideways in Stata
Jakob Hjort, Data Manager, MPH
Department of Cardiology, Aarhus University Hospital, DK-8200 Aarhus, Denmark
2014 Nordic and Baltic Stata Users Group Meeting
The rectangular dataset
Statistics: results. "It is not the data we want, it's the essence of data."
Data management, statistics: transpose?
The rectangular dataset: subset in a matrix using Mata?
    use "family.dta", clear
    * Dataset with: fam_name, inc_mother & inc_father
    mata
    st_view(x=0, ., ("inc_mother", "inc_father"))
    income = colsum(x')'
    st_addvar("long", "inc_household")
    st_store(., "inc_household", income)
    end
    list fam_name inc_mother inc_father inc_household
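A minimal sketch of the same Mata idea on toy data, so it can be run without family.dta. The three families and their incomes are invented for illustration, and rowsum(x) is used here as a shorter equivalent of colsum(x')'. The result is compared with egen's rowtotal(), which also treats missing values as 0.

```stata
clear
input str10 fam_name inc_mother inc_father
"Hansen"   300 250
"Jensen"   410 .
"Nielsen"  280 320
end

mata:
// View onto the two income columns (no copy of the data is made)
st_view(x = ., ., ("inc_mother", "inc_father"))
// rowsum() adds across columns and treats missing values as 0
income = rowsum(x)
// Create the target variable and write the result back to Stata
idx = st_addvar("long", "inc_household")
st_store(., idx, income)
end

* Cross-check against the direct approach
egen long check = rowtotal(inc_mother inc_father)
assert inc_household == check
list fam_name inc_mother inc_father inc_household
```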
The direct approach: generate
    generate [type] newvar=exp [if] [in]
Ex.:
    generate BMI=Weight/Height^2
(Weight, Height → BMI)
The direct approach: egen
    egen [type] newvar=fcn(arguments) [if] [in] [, options]
Ex.:
    egen income=rowtotal(inc*)
(incJan, incFeb, incMar, incApr, incMay, incJun, incJul, … → income)
Functions: rowtotal, rowmin, rowmax, rowfirst, rowlast, rowmean, rowmedian, rowmiss, rownonmiss, rowpctile, rowsd, concat, anycount, anymatch, anyvalue, count, diff, fill, group, iqr, kurt, max, mdev, mean, median, min, mode, mtr, pctile, rank, sd, seq, skew, std, tag, total
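A small sketch of how the two direct approaches differ when values are missing. The data and the names tot_gen, tot and tot_m are made up for illustration. The targets deliberately do not start with "inc", because a later inc* wildcard would otherwise pick them up as well.

```stata
clear
input incJan incFeb incMar
100 200 300
150   . 250
  .   .   .
end

* generate with "+": one missing value makes the whole sum missing
generate tot_gen = incJan + incFeb + incMar

* egen rowtotal(): missing values count as 0
egen tot = rowtotal(inc*)

* With the missing option, a row with only missing values stays missing
egen tot_m = rowtotal(inc*), missing

list
```

Which behaviour is right depends on whether a missing month means "no income" or "unknown income".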
Looking under the skirts: just for inspiration
    viewsource _growmin.ado
The rowmin() function of egen:
    program define _growmin
        version 6, missing
        gettoken type 0 : 0
        gettoken g 0 : 0
        gettoken eqs 0 : 0
        syntax varlist [if] [in] [, BY(string)]
        if `"`by'"' != "" {
            _egennoby rowmin() `"`by'"'
        }
        tempvar touse
        mark `touse' `if' `in'
        quietly {
            gen `type' `g' = .
            tokenize `varlist'
            while "`1'"!="" {
                replace `g' = cond(`1' < `g', `1', `g')
                mac shift
            }
        }
    end
1. Initialize target variable
2. Prepare the variable list
3. Looping
4. In-the-loop commands
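The loop in _growmin uses the older tokenize / mac shift idiom. A minimal sketch of that mechanism on its own, assuming a hypothetical three-name list: tokenize puts the names in `1', `2', `3', and each macro shift moves them one position down until `1' is empty.

```stata
* Hypothetical list of names; no dataset is needed for this demonstration
local list incJan incFeb incMar
tokenize `list'
local n = 0
while "`1'" != "" {
    display "token: `1'"
    local n = `n' + 1
    macro shift
}
display "`n' names processed"
```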
Prepare the variable list
    local vars incJan incFeb incMar incApr incMay incJun ///
        incJul incAug incSep incOct incNov incDec
Full specification of each and every variable: OK with 12, but what in case of hundreds? The list is stored in `vars'.
    unab vars: inc*
    unab vars: incJan-incDec
Variables can be specified with wildcards. The expanded list is stored in `vars'. (unab means unabbreviate; however, the command itself can't be un-abbreviated.)
    ds inc*
    ds incJan-incDec
Displays: incJan incFeb incMar incApr incMay incJun incJul incAug incSep incOct incNov incDec
Variables can be specified with wildcards. The list is stored in `r(varlist)'. Nice feature: the expanded list is shown for inspection.
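A short sketch of the two list-building commands on a toy dataset. The variable income is a hypothetical extra variable added only to show that a wildcard such as inc* also matches anything else beginning with "inc", whereas a range such as incJan-incMar does not.

```stata
clear
set obs 1
foreach m in Jan Feb Mar {
    generate inc`m' = 1
}
generate income = 3    // hypothetical variable that also starts with "inc"

* Wildcard: picks up income as well
unab all : inc*
display "`all'"

* Range: only the variables stored between incJan and incMar
unab vars : incJan-incMar
display "`vars'"

* ds leaves its list in r(varlist); copy it before another command overwrites r()
ds incJan-incMar
local fromds `r(varlist)'
local n : word count `fromds'
display "`n' variables in the list"
```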
Looping
"foreach" is the quickest and the most transparent loop command.
    foreach lvar in incJan incFeb {
        // do stuff with "`lvar'"
    }

    unab vars: inc*
    foreach lvar in `vars' {
        // do stuff with "`lvar'"
    }

    ds inc*
    foreach lvar in `r(varlist)' {
        // do stuff with "`lvar'"
    }
Typing the macro quotes: hold Alt and press 0 9 6 on the numeric keypad for the left single quote (`); hold Alt and press 0 3 9 on the numeric keypad for the right single quote (').
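Besides the "in" form, foreach has "of varlist" and "of local" forms that take the wildcard or the macro name directly. A minimal sketch on invented data; the counter k is only there to show how many passes the loop made.

```stata
clear
set obs 3
generate incJan = 100
generate incFeb = 200

* of varlist: foreach expands the wildcard itself
local k = 0
foreach lvar of varlist inc* {
    summarize `lvar', meanonly
    display "`lvar': mean = " r(mean)
    local k = `k' + 1
}

* of local: give the macro name without the macro quotes
unab vars : inc*
foreach lvar of local vars {
    display "`lvar'"
}
```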
In the loop
Using cond():
    generate minimum=.
    unab vars: inc*
    foreach lvar in `vars' {
        replace minimum = cond(`lvar' < minimum, `lvar', minimum)
    }
Using the if qualifier:
    generate minimum=.
    unab vars: inc*
    foreach lvar in `vars' {
        replace minimum = `lvar' if `lvar'<minimum
    }
! Using the if command:
    generate minimum=.
    unab vars: inc*
    foreach lvar in `vars' {
        if `lvar'<minimum {
            replace minimum = `lvar'
        }
    }
Caution with the last form: the if command, unlike the if qualifier, evaluates its expression once, using the first observation only, so it does not work row by row.
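Putting the four steps together, a runnable sketch of a hand-made row minimum on invented data, checked against egen's rowmin(). It uses the if-qualifier form and a variable range rather than inc*, so that no other variable starting with "inc" can slip into the loop.

```stata
clear
input incJan incFeb incMar
300 200 250
100   . 400
  .   .   .
end

* 1. Initialize target variable (missing sorts above every number)
generate minimum = .

* 2.-4. Prepare the list, loop, and update row by row
foreach lvar of varlist incJan-incMar {
    replace minimum = `lvar' if `lvar' < minimum
}

* Cross-check against the built-in function
egen check = rowmin(incJan-incMar)
assert minimum == check
list
```

Because a missing value compares as larger than any number, a missing month never replaces an existing minimum, and a row with only missing values keeps a missing result.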
Some of the Danish participants who know "the DREAM database" will probably be able to see how these approaches can be useful when working with this fantastic but difficult construction.
Thank you very much