Lecture 6: Regression Tables & Other Programming Topics
By: Kevin Baier
Lecture Summary
Topics Covered:
    Estout Tables
    Outreg2
    Deduping Observations
    Macros and Looping
    One-Way Tab Advanced Options

Estout Tables
What is estout?
Estout is a command to export and display regression results in editable tables.
The estout command can be used immediately after a regression command or with the stored results of a regression.
After a regression, use:
    estimates store name
This stores the regression results under whatever you call "name".
estout Command
Generically:
    estout [what] [using filename] [, options]
"what" can be:
    Stored results with name
    A matrix
    Results stored in e() or r()
"filename" is a file path and name, typically an Excel file.
"options" are many and deeply layered.
We'll cover only the basics, but looking over the "help estout" file shows just how customizable estout is.
estout Command Options
Parameter statistics:
    cells(element [(subopts)] [element [(subopts)]] ...)
Common elements:
    "b": coefficient
    "se": standard error
    "t": t or z stat
    "p": p-value
    "ci": confidence interval
Common suboptions:
    "fmt(format)": specifies the display format of each element
    "label(string)": defines a label for the element
    "star": attaches significance stars
estout Command Example
    reg incp_all i.wbhaom i.educ i.female
    estimates store slide8
    estout slide8 using "D:\Users\Kevin\Dropbox\CAP\STATA Course\Lectures\Lecture 6\lecture 6 table.xls", ///
        cells("b(star fmt(%7.0g) label(Coefficient)) se(label(Standard Error)) t p(fmt(%5.0g) label(P-value)) ci(label(Confidence Interval))") ///
        drop(1.wbhaom 1.educ 0.female) replace
This tells STATA to export to Excel the coefficient, standard error, t-stat, p-value, and confidence interval of the regression in line 1.
The double quotes tell STATA to put these elements in adjacent columns; without the double quotes, STATA would stack these elements in adjacent rows.
A few other options are used here: significance stars, display formats, and labels. The drop() option leaves the listed coefficients out of the table, and replace overwrites any existing file.
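The difference between quoted and unquoted cells() can be checked directly in the Results window. The following is a minimal sketch rather than the lecture's own example: it assumes estout is installed (for instance via "ssc install estout"), and it substitutes Stata's bundled auto dataset and a hypothetical estimates name, m1, for the lecture's income data.

```stata
* Sketch only: auto.dta and the name m1 are stand-ins, not the lecture's data
sysuse auto, clear
regress price mpg weight
estimates store m1

* Quoted elements: b and se appear side by side in adjacent columns
estout m1, cells("b se")

* Unquoted elements: se is stacked in the row beneath b
estout m1, cells(b se)
```

Omitting "using filename" prints the table on screen, which makes it easy to settle on a layout before exporting it.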
estout Command Example, cont.
Take some time to go over the estout help file.
There are TONS of options and sub-options that allow for a great deal of customization; I gave you only a taste so as not to overwhelm you.
The "Labeling" options provide many ways to visually alter your tables.
These tables can go in papers and more.
Outreg2 Command
What is outreg2?
Outreg2 is very similar to estout in that it exports regression tables in paper-ready formats.
The main difference is that outreg2 is a bit more "plug-and-play": the base specification already does a lot without much customization.
Furthermore, while estout can export summary statistics, outreg2, I think, is a bit easier to use for that purpose.
outreg2 Command
Generically:
    outreg2 [varlist] [estlist] using filename [, options] [: command]
As with estout, outreg2, while a bit more ready-to-go out of the box, has a lot of options and customization features.
"varlist" specifies the variables to export.
"estlist" specifies stored estimates (see the estimates store command).
We'll ignore the ": command" for now.
Important "options":
    "replace": overwrites an existing file, so the do-file can be rerun without error
    "excel": exports the table to Excel
    "title(string)": gives the table a title
    "ctitle(string)": gives the estimates column a title
outreg2 Command, cont.
    outreg2 slide8 using "D:\Users\Kevin\Dropbox\CAP\STATA Course\Lectures\Lecture 6\lecture 6 table 2.xls", ///
        replace excel title(Outreg2 Table Output) ctitle(Effect on Income)
This tells STATA to output our "slide8" estimates.
Notice again that there are very few options or customizations other than the titles.
outreg2 Command, cont.
Notice that this output has just our coefficients, standard errors, and significance stars.
We can create exactly the same thing using estout, but outreg2 does it more quickly and efficiently.
For basic regression output tables, outreg2 is my recommended option.
outreg2 Command, cont.
Outreg2 is also incredibly useful and easy for exporting summary statistics, for example those generated by the summarize command.
The command does not change much:
    outreg2 using "D:\Users\Kevin\Dropbox\CAP\STATA Course\Lectures\Lecture 6\lecture 6 table 3.xls", ///
        replace excel title(Summary Statistics) keep(incp_all wbhaom educ female) ///
        sum(detail) eqkeep(N mean p50 sd min max)
Notice that we add the "keep(varlist)", "sum(type)", and "eqkeep(elements)" options and drop the estimates name and the column title option.
    "keep" tells STATA which variables to include in the output
    "sum" tells STATA whether to do a basic or a detailed summarize
    "eqkeep" tells STATA which elements of summarize to keep (e.g. mean)
Deduping Observations
When to dedupe?
After several rounds of data cleaning, a dataset may contain duplicate observations.
Sometimes a dataset has several units of analysis, and some units are elements of a larger unit, which creates duplicates of the larger unit.
In the Consumer Expenditure Survey (CES), there are both consumer-unit (CU) and household (HH) units of analysis, and it's possible to have multiple CUs in the same HH.
When to dedupe?
In cases where you have duplicate observations, you'll likely want to get rid of (drop) these duplicates so that they do not adversely affect your analysis.
Before deduping, you should have a clear justification for why you are deduping and be sure that the observation you're dropping contains no unique information that should be kept.
Make sure to use your cleaning commands to extract any such important information first.
Deduping Techniques
Using the duplicates command
    Generically:
        duplicates drop varlist [if] [in], force
    Example:
        duplicates drop hhid, force
Using the seq() function of egen
    Generically:
        bysort varlist: egen [type] newvar=seq() [if] [in] [, options]
        drop if newvar>1
    Example:
        bysort hhid: egen seq=seq()
        drop if seq>1
Deduping Techniques, cont.
Both techniques accomplish the same thing, with the seq function technically taking two steps.
I prefer the seq function approach because the duplicates command does not give each duplicate observation a unique count number, whereas the seq function does (this could be important in some data work).
You can use the following command to find out how many duplicates exist per id variable(s):
    duplicates tag [varlist] [if] [in], generate(newvar)
The new variable records, for each observation, how many other observations share its values of varlist (0 if it is unique); it does not single out the first occurrence from the later copies.
Remember that you can use more than one variable to identify duplicates.
Think of a business with multiple locations: each unique location can be defined by its name and address.
Deduping Techniques, cont.
Example:
    duplicates tag hhid, generate(dupflag)
Example:
    bysort hhid: egen seq=seq()
Deduping Techniques, cont.
From the preceding tables we can see that the seq function gives us an accurate count of how many unique values there are.
With the duplicates tag command, those 66k observations are essentially buried across the many duplicate counts.
The duplicates drop command is most useful if there are no subtleties in your duplicate observations, that is, when the duplicates are pretty much straight copy-and-paste jobs.
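To see how the two approaches number the same records, the commands can be run on a tiny made-up dataset. This is an illustrative sketch only: the six rows and the cuid variable are invented for the example and are not the lecture's data.

```stata
* Hypothetical toy data: three households (hhid) holding 2, 1, and 3 consumer units
clear
input hhid cuid
1 1
1 2
2 1
3 1
3 2
3 3
end

* dupflag = number of OTHER rows sharing the same hhid (1, 0, and 2 here)
duplicates tag hhid, generate(dupflag)

* seq numbers the rows within each hhid: 1, 2, ... up to the household size
bysort hhid: egen seq = seq()
list, sepby(hhid)

* Keep one row per household
drop if seq > 1
```

Note that bysort hhid alone does not fix the order of rows within a household, so which record receives seq equal to 1 is arbitrary unless the data are sorted on a further variable first.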
Macros and Looping
What are these?
Macros are basically a named, pre-defined series of values.
    Example: "a b c d e f g h i j k" could be a macro
Looping is the repeated execution of some STATA command(s) over a series of values (these could be macro values, a variable list, or a sequence of numbers).
Macros and looping help make your programming efficient and less taxing on your do-file "real estate".
This is a very deep subject, so we'll cover only the basics.
Macros
Although there are many types of macros, we are going to stick with global macros for now.
Generically:
    global macroname [=exp | :extended_fcn | [`]"[string]"[']]
"macroname" is some name you give to your macro and can be just about anything.
"exp" can be any of the expressions we've discussed so far in this course and can also be just a listing of values.
A listing of values requires neither the equals sign (=) nor double quotes (").
Example:
    global macro1 incp_ern incp_wag incp_uer
This macro just happens to be a list of variables.
Macros, cont.
Generically:
    global macroname [=exp | :extended_fcn | [`]"[string]"[']]
":extended_fcn": we're going to ignore this for now.
[`]"[string]"[']: use the `"string"' notation for macro values that are discontinuous strings, that is, values that themselves contain spaces.
Example:
    global macro2 `"Investment banker"' `"Regular banker"' `"Currency banker"'
You can always enclose a macro's contents in double quotes and it should work just fine.
Example:
    global macro3 "center left right"
levelsof Command
The levelsof command creates a macro from all the values (whether string or not) of a variable.
For example, suppose a variable called job_title contained two values, "King's Guard" and "Night's Watchman".
The levelsof command would create a macro with the values `"King's Guard"' and `"Night's Watchman"'.
Generically:
    levelsof varname [if] [in], local(localmacroname)
There are more options, but for now we'll disregard them.
"local(localmacroname)" designates that the macro be a local macro with the name localmacroname.
levelsof Command, cont.
Example:
    levelsof educ, local(educmacro)
We see that the macro is just a sequence of numbers, given that educ is a numeric variable.
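Once a macro is defined, its contents are substituted wherever the macro is referenced: a global is written as $name and a local as `name'. The sketch below is a hedged illustration using Stata's bundled auto dataset; the macro names carvars and replevels are hypothetical and do not come from the lecture.

```stata
* Sketch only: auto.dta stands in for the lecture's data
sysuse auto, clear

* A global macro holding a list of variables; $carvars expands to that list
global carvars price mpg weight
summarize $carvars

* levelsof stores the distinct values of rep78 in the local macro replevels
levelsof rep78, local(replevels)
display "`replevels'"
```

A local macro lives only for the duration of the do-file or program that defines it, so both levelsof and the line that uses `replevels' should be run together.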
Looping
Looping is another one of those rabbit-hole topics, so we will cover only the basics.
We are going to stick with three types of loops:
    foreach with a variable list
    foreach with macro values
    forvalues with a range of numbers
foreach/forvalues execute the enclosed commands once for each element of the variable list, macro, or number range.
foreach Command w/Macros
Generically:
    foreach loopname of global macroname {
        commands
    }
Braces, "{" and "}", must be specified.
The open brace, "{", must appear on the same line as the foreach.
Nothing may follow the open brace except comments.
The first command must appear on a new line.
The close brace, "}", must appear on a line by itself.
foreach Command w/Macros, cont.
    global lst incp_ern incp_wag incp_se
    foreach x of global lst {
        gen `x'_flag=1 if `x'>0 & `x'!=.
    }
This tells STATA to generate, for each variable named in the macro "lst", a "flag" variable equal to 1 if that variable is greater than zero and not missing.
Notice the `x': this tells STATA to substitute the current macro element here.
x could be anything you want, for example `narsil' or `heartsbane'.
foreach Command w/Macros, cont.
    global lst incp_ern incp_wag incp_se
    foreach x of global lst {
        gen `x'_flag=1 if `x'>0 & `x'!=.
    }
"lst" is the name of our global macro.
"x" is our loopname and must always be enclosed as `loopname' in the commands inside the braces.
foreach Command w/Variable List
Generically:
    foreach loopname of varlist varlist {
        commands
    }
This foreach command allows you to tell STATA to run some command(s) over a list of variables.
Our "dash" rule still applies here: var1-var2 selects every variable from var1 through var2 in dataset order.
All the rules of foreach already discussed apply here.
For this command, you do not need to make or specify a macro.
foreach Command w/Variable List, cont.
Example:
    foreach var of varlist hrearn-incp_se {
        replace `var'=0 if `var'==.
    }
This tells STATA to replace missings with 0 for all the variables between and including hrearn and incp_se.
Notice that "var" is our "loopname" and is enclosed as `var' in the commands wherever it is needed.
forvalues Command w/Consecutive Values
Generically:
    forvalues loopname=range {
        commands
    }
Forvalues is best for looping over consecutive numbers.
range is one of:
    "i(d)j": meaning i to j in steps of d
    "i/j": meaning i to j in steps of 1
    "i t to j": meaning i to j in steps of t - i
    "i t : j": meaning i to j in steps of t - i
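The range forms can be tried out by simply displaying the loop value. This is a small sketch for illustration; the particular numbers are arbitrary and are not taken from the lecture.

```stata
* i/j: 1 to 3 in steps of 1 (displays 1, 2, 3)
forvalues i = 1/3 {
    display `i'
}

* i(d)j: 10 to 25 in steps of 5 (displays 10, 15, 20, 25)
forvalues i = 10(5)25 {
    display `i'
}

* i t to j: 1 to 9 in steps of 3 - 1 = 2 (displays 1, 3, 5, 7, 9)
forvalues i = 1 3 to 9 {
    display `i'
}
```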
forvalues Command w/Consecutive Values
Example:
    forvalues i=1/100 {
        replace incp_all=. if incp_all==`i'
    }
This tells STATA to replace every income value equal to an integer from 1 to 100 with missing.
Notice again that the loopname is "i" and that it is enclosed as `i' when used in the commands.
Remember that this loopname can be anything.
All the looping statements have endless applicability and are something to be familiar with moving forward.
One-Way Tab Advanced Options
Returning to Tab (again)
Remember our generic one-way tabulation:
    tabulate varname [if] [in] [weight] [, tabulate1_options]
The ", generate(stubname)" option is very useful for creating indicator variables from categorical variables.
The stubname refers to the prefix, or stub, given to each new indicator variable.
Each new indicator variable is named stubnameX, where "X" is the category number (the categories themselves do not have to be numeric values).
tab Command w/Looping & Advanced Options
Over this course, we've talked about lots of categorical variables, so we'll focus on three: race, education, and citizenship status.
Let's say we want to create dummy variables for each category of each variable:
    foreach var of varlist wbhaom educ citstat {
        tab `var', generate(`var'_)
    }
Our stubname here is the variable name from the varlist followed by an underscore, "_".
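The naming of the generated indicators is easiest to see by running the same loop on a small dataset. The sketch below is an assumption-laden illustration: it swaps in Stata's bundled auto dataset and its foreign and rep78 variables for the lecture's wbhaom, educ, and citstat.

```stata
* Sketch only: auto.dta and its variables stand in for the lecture's data
sysuse auto, clear

foreach var of varlist foreign rep78 {
    * Creates foreign_1 foreign_2, then rep78_1 through rep78_5
    tabulate `var', generate(`var'_)
}

* Inspect the new indicator variables
describe foreign_* rep78_*
```

The number after the stub is the category's position in the tabulation, not its underlying value, so a variable coded 0/1 such as foreign still yields indicators ending in _1 and _2.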