Stata and logit recap


Topics
• Introduction to Stata
  – Files / directories
  – Stata syntax
  – Useful commands / functions
• Logistic regression analysis with Stata
  – Estimation
  – Goodness of fit
  – Coefficients
  – Checking assumptions

Overview of Stata commands
• Note: we did this interactively for the larger part …

Stata file types
• .ado – programs that add commands to Stata
• .do – batch files that execute a set of Stata commands
• .dta – data file in Stata’s format
• .log – output saved as plain text by the log using command (you could add .txt as well)

The working directory
• The working directory is the default directory for any file operations, such as using and saving data or logging output
    cd "d:\my work"

Saving output to log files
• Syntax for the log command
    log using [filename], replace text
• To close a log file
    log close

Using and saving datasets
• Load a Stata dataset
    use d:\myproject\data.dta, clear
• Save
    save d:\myproject\data, replace
• Using change directory
    cd d:\myproject
    use data, clear
    save data, replace

Entering data
• Data in other formats
  – You can use SPSS to convert data into a format that Stata can read. Unfortunately, this no longer works the other way around
  – You can use the infile and insheet commands to import data in ASCII format
  – Direct import and export of Excel files in Stata is possible too
• Entering data by hand (don’t do this …)
  – Type edit or just click on the data-editor button
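A minimal sketch of the text and Excel routes, assuming Stata 13 or later, where import delimited is the newer counterpart of insheet. The file names auto_copy.csv and auto_copy.xlsx are hypothetical and are written to the working directory; the auto dataset ships with Stata.

```stata
* Sketch only: auto_copy.csv and auto_copy.xlsx are hypothetical file names
sysuse auto, clear                      // example dataset shipped with Stata

* Write the data to a comma-separated text file, then read it back
export delimited using "auto_copy.csv", replace
import delimited using "auto_copy.csv", clear

* Same round trip through Excel; row 1 holds the variable names
export excel using "auto_copy.xlsx", firstrow(variables) replace
import excel using "auto_copy.xlsx", firstrow clear
describe
```

With your own data you would only need the import line, pointed at your own file.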

Do-files
• You can create a text file that contains a series of commands. It is the equivalent of SPSS syntax (but way easier to memorize)
• Use the do-file editor to work with do-files

Adding comments in do-files
• // or * denote comments Stata should ignore
• Stata ignores whatever follows after /// and treats the next line as a continuation
• Example II
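The three comment styles could look like this in a do-file. This is a small sketch using the auto dataset that ships with Stata; the chosen variables are only for illustration. Note that // and /// work in do-files, not when typed interactively.

```stata
* A whole-line comment starts with an asterisk
sysuse auto, clear          // example dataset shipped with Stata

// A whole-line comment can also start with two slashes
summarize price mpg         // text after // on a command line is ignored

* Three slashes continue a long command on the next line
regress price mpg weight ///
    length foreign
```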

A recommended template for do-files
    capture log close               // if a log file is open, close it, otherwise disregard
    set more off                    // don't pause when output scrolls off the page
    cd d:\myproject                 // change directory to your working directory
    log using myfile, replace text  // log results to file myfile.log
    … here you put the rest of your Stata commands …
    log close                       // close the log file

Serious data analysis
• Ensure replicability: use do + log files
• Document your do-files
  – What is obvious today is baffling in six months
• Keep a research log
  – Diary that includes a description of every program you run
• Develop a system for naming files

Serious data analysis
• New variables should be given new names
• Use variable labels and notes (I don’t like value labels though)
• Double check every new variable
• ARCHIVE

Stata syntax examples

Stata syntax example
    regress y x1 x2 if x3<20, cluster(x4)
1. regress = command
   – What action do you want to perform?
2. y x1 x2 = names of variables, files or other objects
   – On what things is the command performed?
3. if x3<20 = qualifier on observations
   – On which observations should the command be performed?
4. , cluster(x4) = options appear behind the comma
   – What special things should be done in executing the command?
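To try the four parts on real data, here is a sketch with the auto dataset that ships with Stata. The variable choice is arbitrary and only meant to mirror the pattern above, and the cluster option is written in its vce(cluster ...) form.

```stata
sysuse auto, clear                     // example dataset shipped with Stata

* command   variables          qualifier         options
regress     price mpg weight   if foreign == 0,  vce(cluster rep78)
```

Only domestic cars (foreign == 0) enter the regression, and the standard errors are clustered on rep78.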

More examples
    tabulate smoking race if agemother>30, row
More elaborate if-statements:
    sum agemother if smoking==1 & weightmother<100

Elements used for logical statements

Operator  Definition                 Example
==        is equal in value to       if male == 1
!=        not equal in value to      if male != 1
>         greater than               if age > 20
>=        greater than or equal to   if age >= 21
<         less than                  if age < 66
<=        less than or equal to      if age <= 65
&         and                        if age==21 & male==1
|         or                         if age<=21 | age>=65

Missing values
• Automatically excluded when Stata fits models (same as in SPSS); they are stored as the largest positive values
• Beware!!
  – The expression “age>65” can thus also include missing values (these are also larger than 65)
  – To be sure, type: “age>65 & age!=.”
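The pitfall is easy to reproduce with a few made-up ages. This is a sketch; the four values are invented for illustration. It uses the missing() function instead of age!=. because missing() also catches Stata's extended missing values (.a to .z).

```stata
clear
input age
70
20
.
66
end

count if age > 65                    // counts 3: the missing value is included
count if age > 65 & !missing(age)    // counts 2: missing values are excluded
```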

Selecting observations
    drop [variable list]
    keep [variable list]
    drop if age<65
Note: the dropped variables or observations are then gone from the data in memory (and from the file too, once you save over it). This is not SPSS’s [filter] command.
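If you only need a subset temporarily, one option is to wrap the drop in preserve and restore. A minimal sketch with the auto dataset that ships with Stata:

```stata
sysuse auto, clear       // example dataset shipped with Stata
preserve                 // keep a temporary copy of the data in memory
drop if foreign == 1     // drop the foreign cars
count                    // only domestic cars remain
restore                  // bring back the full dataset
count                    // all observations are back
```

For a single command, an if qualifier is usually simpler than dropping anything.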

Creating new variables
Generating new variables
    generate age2 = age*age
(for more complicated functions, there also exists a command “egen”, as we will see later)

Useful functions

Function  Definition       Example
+         addition         gen y = a+b
-         subtraction      gen y = a-b
/         division         gen density = population/area
*         multiplication   gen y = a*b
^         take to a power  gen y = a^3
ln        natural log      gen lnwage = ln(wage)
exp       exponential      gen y = exp(b)
sqrt      square root      gen agesqrt = sqrt(age)
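A few of these functions applied to the auto dataset that ships with Stata. This is a sketch; the new variable names are made up for illustration.

```stata
sysuse auto, clear                      // example dataset shipped with Stata
generate price_per_lb = price / weight  // division
generate mpg2         = mpg^2           // take to a power
generate lnprice      = ln(price)       // natural log
generate check        = exp(lnprice)    // exponential undoes the log
generate sqrt_weight  = sqrt(weight)    // square root
summarize price check lnprice sqrt_weight
```

The variables price and check should agree up to rounding, since exp() reverses ln().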

Replace command
• replace has the same syntax as generate but is used to change values of a variable that already exists
    gen age_dum5 = .
    replace age_dum5 = 0 if age < 5
    replace age_dum5 = 1 if age >= 5 & age != .
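Because a missing age also satisfies age >= 5, it is worth checking how such a dummy treats missing values. A small sketch with made-up ages, using missing() as the guard:

```stata
clear
input age
3
5
40
.
end

generate age_dum5 = .
replace age_dum5 = 0 if age < 5
replace age_dum5 = 1 if age >= 5 & !missing(age)   // keep missing ages missing
tabulate age_dum5, missing
```

The table should show one 0, two 1s and one missing value.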

Recode
• Change values of existing variables
  – Change 1 to 2 and 3 to 4 in origvar, and call the new variable myvar1:
      recode origvar (1=2)(3=4), gen(myvar1)
  – Change 1’s to missings in origvar, and call the new variable myvar2:
      recode origvar (1=.), gen(myvar2)
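Both recodes can be checked on a tiny made-up variable. This is a sketch; the four values of origvar are invented for illustration.

```stata
clear
input origvar
1
2
3
4
end

recode origvar (1=2) (3=4), gen(myvar1)   // 1 becomes 2, 3 becomes 4
recode origvar (1=.), gen(myvar2)         // 1 becomes missing
list origvar myvar1 myvar2
```

Values not mentioned in a rule are copied unchanged, and origvar itself is left as it was because gen() writes the result to a new variable.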