Computing for Research I Spring 2013 Stata Graphics

  • Slides: 27
Computing for Research I Spring 2013 Stata Graphics February 12 Primary Instructor: Elizabeth Garrett-Mayer

Basic syntax for commands
• prefix: command varlist, options
• Examples:
  – regress y x, level(90)
  – by race: sum y x, detail
  – ttest y, by(x) unequal
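The three example forms can be tried on any dataset. A minimal sketch, assuming the auto dataset that ships with Stata (not the course data); mpg, weight and foreign are variables in that dataset, and bysort is used in place of by so the data do not need to be sorted first:

```stata
* Load Stata's bundled example data (an assumption; not the course data)
sysuse auto, clear

* command varlist, options
regress mpg weight, level(90)

* prefix: command varlist, options
* (bysort sorts on the grouping variable before running the command)
bysort foreign: summarize mpg weight, detail

* an option can change the analysis itself: unequal requests
* the unequal-variance t test
ttest mpg, by(foreign) unequal
```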

Stata Graphics
• Maybe we can just end class now!
• Check out these links:
  – http://www.ats.ucla.edu/stata/library/GraphExamples/default.htm
  – http://www.ats.ucla.edu/stata/topics/graphics.htm
  – http://data.princeton.edu/stata/graphics.html
  – http://www.stata.com/capabilities/graphics.html

Basic univariate displays
• Boxplots
• Stem and leaf
• Histograms
• Density plots

Ceramide Data
• Let's look at the ceramide markers
• What are their distributions?
• Are there outliers?
• Should we consider taking logs, or using % change?

Results of a phase II trial of gemcitabine plus doxorubicin in patients with recurrent head and neck cancers: serum C₁₈-ceramide as a novel biomarker for monitoring response. Saddoughi SA, Garrett-Mayer E, Chaudhary U, O'Brien PE, Afrin LB, Day TA, Gillespie MB, Sharma AK, Wilhoit CS, Bostick R, Senkal CE, Hannun YA, Bielawski J, Simon GR, Shirai K, Ogretmen B. Clin Cancer Res. 2011 Sep 15;17(18):6097-105. Epub 2011 Jul 26.

Histogram
• hist c18

Let's make it prettier

* prettier histograms
hist c18, freq xaxis(1 2) ylabel(0(2)24) xlabel(20 "Twenty" 40 "Forty")
hist c18, title("Histogram of C18 Ceramide") subtitle("PI: K. Shirai")
hist c18, ytitle("number of patients") freq yline(0(10)20)
hist c18, xaxis(1 2) xlabel(19.6 "mean" 11.9 "median", axis(2) grid)

Finding help on these can sometimes be tricky! e.g. help axis_choice_options
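The mean and median labels on the second x axis do not have to be typed by hand. A minimal sketch, assuming the bundled auto dataset rather than the ceramide data; the local macro names m and med are made up for illustration:

```stata
* Bundled example data (an assumption; not the course data)
sysuse auto, clear

* summarize, detail leaves the mean in r(mean) and the median in r(p50)
quietly summarize mpg, detail
local m   = r(mean)
local med = r(p50)

* Second x axis carries only the two labelled reference values
histogram mpg, freq xaxis(1 2) ///
    xlabel(`m' "mean" `med' "median", axis(2) grid) ///
    xtitle("", axis(2))
```

Run the block in one go (a do-file or a single selection), since local macros disappear when the run ends.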

Boxplots
• graph box c18

Boxplots

graph box c18, by(cycle)
graph box c18, over(cycle)
tab cycle
graph box c18 if cycle<7, over(cycle)
sort patient cycle
merge m:1 patient using "PtdataGemDox.dta"
graph box c18 if cycle<7, over(cycle) over(gender)
graph hbox c18, over(initial) capsize(5)

graph hbox c18, over(initial) capsize(5)
graph hbox c18, over(initial) medtype(marker) medmarker(msymbol(+) msize(large))
graph hbox c18, over(initial) ytitle("C18")

Labels
• Sometimes xlabels cannot be applied (e.g. boxplots)
• You need to label your values instead
• Example: cycle for boxplots
  – label define cycle 1 "cycle 1" 3 "cycle 3" 5 "cycle 5" 7 "cycle 7"
  – label values cycle cycle
  – graph box c18 if cycle<7, over(cycle)
• (Hint: use this on the homework!)
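The same define-then-attach pattern can be tried on other data. A minimal sketch, assuming the bundled auto dataset; the label name replbl and the label text are made up for illustration:

```stata
* Bundled example data (an assumption; not the course data)
sysuse auto, clear

* Step 1: define a value label (hypothetical name and wording)
label define replbl 1 "poor" 2 "fair" 3 "average" 4 "good" 5 "excellent"

* Step 2: attach it to the variable (variable name first, then label name)
label values rep78 replbl

* The boxplot categories now show the label text instead of 1-5
graph box mpg, over(rep78)
```

Note the two arguments to label values: the variable to be labelled, then the label to attach.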

Stem and Leaf

. stem c18

Stem-and-leaf plot for c18ceramide (C18 ceramide)
c18ceramide rounded to nearest multiple of .1
plot in units of .1

Stems: 0** 1** 2** 3** 4** 5** 6**
Leaves (rows in output order): 42, 43, 44, 46; 57, 67, 81, 89, 90, 96, 98, 99; 01, 06, 08, 14, 15, 19, 20, 35, 44; 62; 03, 15, 16, 18, 19, 22; 82; 17; 23, 49; 58, 68; 37; 86

Dotplot
• Excellent way to show data across groups when you have a relatively small dataset
• dotplot y, over(group)

dotplot c18, over(cycle)
dotplot c18, over(gender) nogroup jitter(3)
dotplot c18, over(gender) nogroup median center

Dotplot, by gender

Scatterplots
• Two way graph
• Syntax:
  – graph twoway scatter y x1 x2
  – graph twoway scatter y x1
  (the last variable listed goes on the x axis, so the first form plots y and x1 against x2)
• Example:
  – graph twoway scatter c18 totalceramide

Regression example
• Scatterplot
• Residual plots
• Leverage
• Fitted line with raw data

Code

graph twoway scatter c18 totalcer
regress c18 totalcer

* residual plot (residual vs. fitted)
rvfplot

* the long way
* 1. generate a new variable from the regression, residuals
predict resid, res
* 2. generate a new variable from the regression, fitted values
predict fit
scatter resid fit, yline(0)

* leverage vs. residual plot
lvr2plot

* take transform of C18?
gladder c18
boxcox c18

* generate new variable
gen logc18=log(c18)
scatter logc18 totalcer, mlabel(gender) s(i)
scatter logc18 totalcer, s(Oh)

* redo regression
regress logc18 totalcer
rvfplot, yline(0)
lvr2plot
predict logfit

* make plot of fitted model and raw data
scatter logfit logc18 totalcer, s(i o) c(l .)
graph twoway scatter logfit totalcer, s(i) c(l) || scatter logc18 totalcer, s(o) c(.)
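The last step, fitted model over raw data, can also be written with a line plot instead of the s(i) c(l) trick. A minimal sketch of that alternative, assuming the bundled auto dataset; logprice and logfit are variable names created here for illustration:

```stata
* Bundled example data (an assumption; not the course data)
sysuse auto, clear

* Log-transform the outcome, then refit
gen double logprice = log(price)
regress logprice weight

* Fitted values (xb is what predict returns by default)
predict double logfit

* Raw data as markers, fitted values as a line;
* sort orders the points by x before they are connected
twoway (scatter logprice weight) (line logfit weight, sort)
```

This is a swapped-in style, not the slide's own command: it draws the same picture but keeps markers and line in separate plot types.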

The next graph to create

Fancier way to put regression lines

* data is described at http://data.princeton.edu/wws509/datasets/
infile str14 country setting effort change ///
    using http://data.princeton.edu/wws509/datasets/effort.raw

graph twoway scatter change setting
graph twoway (scatter change setting) (lfit change setting)
graph twoway (scatter change setting) (qfit change setting)
graph twoway (scatter change setting) (lfitci change setting)

• /// is the "continuation" comment: the command carries on to the next line
• scatter makes a scatterplot of the two variables
• lfit plots the regression line of y on x
• qfit plots a fitted quadratic model of y on x
• lfitci plots the line AND a confidence interval!

Fancier way to put regression lines
• Plot using qfit
• Plot using lfitci

graph twoway (lfitci change setting) ///
    (scatter change setting, mlabel(country))

One slight problem with the labels is the overlap of Costa Rica and Trinidad Tobago (and to a lesser extent Panama and Nicaragua). We can solve this problem by specifying the position of the label relative to the marker using a 12-hour clock (so 12 is above, 3 is to the right, 6 is below and 9 is to the left) and the mlabv() option. We create a variable to hold the position, set by default to 3 o'clock, and then move Costa Rica to 9 o'clock and Trinidad Tobago to just a bit above that at 11 o'clock (we can also move Nicaragua and Panama up a bit, say to 2 o'clock).

gen pos=3
replace pos = 11 if country == "TrinidadTobago"
replace pos = 9 if country == "CostaRica"
replace pos = 2 if country == "Panama" | country == "Nicaragua"

graph twoway (lfitci change setting) ///
    (scatter change setting, mlabel(country) mlabv(pos))

* see 'marker_label_options' in help
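The clock-position variable works the same way on any labelled scatterplot. A minimal sketch, assuming the bundled auto dataset; the choice to move labels for foreign cars to 9 o'clock and to plot only cars with mpg of 30 or more is arbitrary and made up for illustration:

```stata
* Bundled example data (an assumption; not the fertility data)
sysuse auto, clear

* Default label position: 3 o'clock (to the right of the marker)
gen pos = 3

* Move labels for foreign cars to 9 o'clock (arbitrary illustrative rule)
replace pos = 9 if foreign

* mlabvposition() is the full name of the mlabv() option
twoway scatter mpg weight if mpg >= 30, mlabel(make) mlabvposition(pos)
```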

Legends

graph twoway (lfitci change setting) ///
    (scatter change setting, mlabel(country) mlabv(pos)) ///
    , title("Fertility Decline by Social Setting") ///
    ytitle("Fertility Decline") ///
    legend(pos(5) order(2 "linear fit" 1 "95% CI"))

graph twoway (lfitci change setting) ///
    (scatter change setting, mlabel(country) mlabv(pos)) ///
    , title("Fertility Decline by Social Setting") ///
    ytitle("Fertility Decline") ///
    legend(off)

* see help 'title_options' for pos and ring in legend

Spaghetti plots
Command available from UCLA: spagplot

* spaghetti plots
clear
insheet using "I: MUSC OncologyShirai, KeisukeOctober 2010ceramide.csv"
findit spagplot
spagplot c18 cycle, id(patient) nofit

* remove patients who only have cycle=1
sort patient cycle
by patient: gen visit=_n
egen maxvis=max(visit), by(patient)
spagplot c18 cycle if maxvis>1, id(patient) nofit

* or, use c(L)
graph twoway scatter c18 cycle if maxvis>1, c(L)
help connectstyle
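The visit-counting step can be checked on a tiny example. A minimal sketch with invented numbers (the values typed below are made up and are not the ceramide data); it uses only built-in commands, so spagplot does not need to be installed:

```stata
* Toy long-format data: one row per patient per cycle (invented values)
clear
input patient cycle c18
1 1 10
1 3 12
2 1  8
3 1 15
3 3 14
3 5 20
end

* Number the visits within patient, in cycle order
bysort patient (cycle): gen visit = _n

* Largest visit number per patient = how many visits that patient has
egen maxvis = max(visit), by(patient)

* One line per patient: connect(L) joins points only while x is
* ascending, so the line breaks when the next patient starts
twoway connected c18 cycle if maxvis > 1, connect(L)
```

Patient 2 has a single visit, so the maxvis > 1 condition drops that row and leaves five observations in the plot.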

Other neat stuff
• graph matrix
• saving graphs: click and save as desired format
• saving and combining (see princeton site, section 3.3)
  – http://data.princeton.edu/stata/graphics.html
• See GraphExamples on ucla site:
  – http://www.ats.ucla.edu/stata/library/GraphExamples/
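Saving and combining can also be done from the command line instead of by clicking. A minimal sketch, assuming the bundled auto dataset; the graph names h, b and both and the file name combined.gph are made up for illustration:

```stata
* Bundled example data (an assumption; not the course data)
sysuse auto, clear

* Keep each graph in memory under its own name
histogram mpg, freq name(h, replace)
graph box mpg, over(foreign) name(b, replace)

* Put the two named graphs side by side
graph combine h b, cols(2) name(both, replace)

* Save the combined graph in Stata's own .gph format
* (written to the current working directory)
graph save both "combined.gph", replace
```

A saved .gph file can be reopened later with graph use; graph export writes other formats, and which ones are available depends on the Stata version and platform.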