Introduction to Multiple Imputation
CFDR Workshop Series, Spring 2008

Outline
• Missing data mechanisms
• What is Multiple Imputation?
• SAS Proc MI, Proc MIANALYZE
• Stata ICE, MICOMBINE
• SAS IVEware
• What’s the diff?
• Problems with categorical imputation

Missing data mechanisms
• Missing Completely At Random (MCAR)
  – The probability of missingness does not depend on any variable, observed or unobserved.
• Missing At Random (MAR)
  – The probability of missingness does not depend on the unobserved value of the missing variable, but it can depend on any of the other variables in your dataset.
• Not Missing At Random (NMAR)
  – The probability of missingness depends on the unobserved value of the missing variable itself.
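The three mechanisms can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: the function name `simulate_missingness`, the missingness probabilities, and the data-generating model (y = x + noise) are all assumptions chosen for illustration. It deletes values of y under each mechanism and compares the mean of what is left with the full-data mean.

```python
import random

def simulate_missingness(n=10000, seed=1234):
    """Mean of the observed y values under MCAR, MAR and NMAR deletion.

    Hypothetical setup: x is always observed, y = x + noise may be missing.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)          # fully observed covariate
        y = x + rng.gauss(0, 1)      # variable subject to missingness
        rows.append((x, y))

    def observed_mean(p_missing):
        # Keep y when a uniform draw is at least the missingness probability.
        kept = [y for x, y in rows if rng.random() >= p_missing(x, y)]
        return sum(kept) / len(kept)

    return {
        "full": sum(y for _, y in rows) / n,
        # MCAR: the same probability for every case.
        "MCAR": observed_mean(lambda x, y: 0.3),
        # MAR: probability depends only on the observed covariate x.
        "MAR": observed_mean(lambda x, y: 0.6 if x > 0 else 0.1),
        # NMAR: probability depends on the value of y itself.
        "NMAR": observed_mean(lambda x, y: 0.6 if y > 0 else 0.1),
    }

if __name__ == "__main__":
    for name, value in simulate_missingness().items():
        print(f"{name:5s} mean of observed y: {value: .3f}")
```

In this setup the complete-case mean stays close to the full-data mean only under MCAR. Under MAR the bias can in principle be removed by conditioning on x; under NMAR it cannot be removed using the observed data alone.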

What is Multiple Imputation?
1. Imputation
  • Make M=3 to 10 copies of the incomplete data set, filling in the missing values with conditionally random values
2. Analyses
  • Analyze each data set separately
3. Pooling
  • Point estimates: average across the M analyses
  • Standard errors: combine variances
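The pooling step is usually done with Rubin's rules: the pooled estimate is the average of the M estimates, and the pooled variance is the average within-imputation variance plus (1 + 1/M) times the between-imputation variance. The sketch below assumes one coefficient with one estimate and one standard error per imputed data set; the function name `pool_rubin` and the example numbers are hypothetical.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Pool M point estimates and standard errors with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)                          # pooled point estimate
    within = mean([se ** 2 for se in std_errors])    # average sampling variance
    between = variance(estimates)                    # variance across imputations (divides by M-1)
    total = within + (1 + 1 / m) * between           # total variance
    return q_bar, math.sqrt(total)

if __name__ == "__main__":
    # Made-up results from M=3 analyses of the same coefficient.
    est, se = pool_rubin([1.0, 1.2, 0.8], [0.5, 0.5, 0.5])
    print(f"pooled estimate = {est:.3f}, pooled SE = {se:.3f}")
```

The pooled standard error is larger than any single-imputation standard error whenever the estimates differ across imputations, which is how the extra uncertainty due to missing data enters the result.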

1. Imputation: Multiple Copies of Dataset

Three steps
1. Imputation
  • Make M=2 to 10 copies of the incomplete data set, filling in the missing values with conditionally random values
2. Analyses
  • Analyze each data set separately
3. Pooling
  • Point estimates: average across the M analyses
  • Standard errors: combine variances

What is MI?
• Stata
  – based on each conditional density
  – chained equations
• SAS
  – joint distribution of all the variables
  – assumed multivariate normal distribution
• SAS IVEware
  – same as Stata, more options
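To make "conditional density" concrete, here is a minimal sketch of one conditional imputation step for a single continuous variable: regress y on x among the complete cases, then fill each missing y with the prediction plus a random residual. This is a deliberate simplification and an assumption on my part, not the algorithm used by ICE, Proc MI or IVEware. Real software typically also accounts for uncertainty in the estimated coefficients and, with chained equations, cycles through every incomplete variable in turn. The names `impute_once` and `make_imputations` are hypothetical.

```python
import random
from statistics import mean

def impute_once(x, y, rng):
    """Fill missing y (None) with a regression prediction plus random noise.

    Needs at least three complete cases so the residual variance is defined.
    """
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mx = mean([xi for xi, _ in obs])
    my = mean([yi for _, yi in obs])
    sxx = sum((xi - mx) ** 2 for xi, _ in obs)
    slope = sum((xi - mx) * (yi - my) for xi, yi in obs) / sxx
    intercept = my - slope * mx
    # Residual standard deviation of the fitted line (n - 2 degrees of freedom).
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in obs)
    sd = (rss / (len(obs) - 2)) ** 0.5
    return [
        yi if yi is not None else intercept + slope * xi + rng.gauss(0, sd)
        for xi, yi in zip(x, y)
    ]

def make_imputations(x, y, m=5, seed=1234):
    """Return m completed copies of y, each with different random fill-ins."""
    rng = random.Random(seed)
    return [impute_once(x, y, rng) for _ in range(m)]

if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    y = [2.1, 3.9, None, 8.2, None, 11.8]
    for copy in make_imputations(x, y, m=3):
        print([round(v, 2) for v in copy])
```

Observed values are identical in every copy; only the filled-in values vary, and that variation is what the pooling step later turns into extra standard error.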

Stata Example
• ICE to impute
  – Regression commands may be logistic, mlogit, or regress.
• MICOMBINE to analyze and combine the results
  – Supported regression commands are clogit, cnreg, glm, logistic, logit, mlogit, oprobit, poisson, probit, qreg, regress, rreg, stcox, streg, or xtgee.
• Easy to use, nice documentation

SAS example

Step 1: Proc MI
• Typical syntax:
    proc mi data=mi_example out=outmi seed=1234;
      var Oxygen RunTime RunPulse;
    run;

Step 2: Run Models
    proc reg data=outmi outest=outreg covout noprint;
      model Oxygen = RunTime RunPulse;
      by _Imputation_;
    run;
• Note that the regression output is stored as dataset “outreg”
• Procs: Reg, Logistic, Genmod, Mixed, GLM

Parameter Estimates & Covariance Matrices
    proc print data=outreg(obs=8);
      var _Imputation_ _Type_ _Name_ Intercept RunTime RunPulse;
    run;

Step 3: Proc MIANALYZE
    proc mianalyze data=outreg;
      modeleffects Intercept RunTime RunPulse;
    run;
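Besides pooled estimates and standard errors, the combining step commonly reports diagnostics derived from the within- and between-imputation variances. The sketch below computes three of Rubin's standard quantities for one coefficient: the relative increase in variance r, the degrees of freedom (M-1)(1 + 1/r)^2, and an approximate fraction of missing information. It is an illustration under stated assumptions, not a reproduction of Proc MIANALYZE output; the function name `mi_diagnostics` and the input numbers are hypothetical, and the software may apply adjusted degrees of freedom depending on its options.

```python
from statistics import mean, variance

def mi_diagnostics(estimates, std_errors):
    """Rubin-style diagnostics for one coefficient across M imputations."""
    m = len(estimates)
    within = mean([se ** 2 for se in std_errors])   # average sampling variance
    between = variance(estimates)                   # variance across imputations
    if between == 0:
        # Identical estimates in every imputation: no missing-data penalty.
        return {"r": 0.0, "df": float("inf"), "fmi": 0.0}
    r = (1 + 1 / m) * between / within              # relative increase in variance
    df = (m - 1) * (1 + 1 / r) ** 2                 # degrees of freedom for the t reference
    fmi = (r + 2 / (df + 3)) / (r + 1)              # approx. fraction of missing information
    return {"r": r, "df": df, "fmi": fmi}

if __name__ == "__main__":
    # Made-up per-imputation estimates and standard errors.
    diag = mi_diagnostics([1.0, 1.2, 0.8], [0.5, 0.5, 0.5])
    for name, value in diag.items():
        print(f"{name}: {value:.4f}")
```

A large r or a small df signals that the estimates disagree substantially across imputations, which is one reason to consider increasing M.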

Irritating Parameter Est. & Covariance Matrices
• Syntax depends on which procedure you used in the previous step:
  – proc mianalyze data=parmcov; (or)
  – proc mianalyze parms=parmsdat covb=covbdat; (or)
  – proc mianalyze parms=parmsdat xpxi=xpxidat;
• Procs: Reg, Genmod, Logistic, Mixed, GLM

SAS IVEware: 4 Components
1. IMPUTE: nice options.
2. DESCRIBE estimates the population means, proportions, subgroup differences, contrasts and linear combinations of means and proportions. A Taylor Series approach is used to obtain variance estimates appropriate for a user-specified complex sample design.
3. REGRESS fits linear, logistic, polytomous, Poisson, Tobit and proportional hazard regression models for data resulting from a complex sample design.
4. SASMOD allows users to take into account complex sample design features when analyzing data with several SAS procedures. SAS procs that can be called: CALIS, CATMOD, GENMOD, LIFEREG, MIXED, NLIN, PHREG, and PROBIT.

IVEware Impute
IMPUTE assumes the variables in the data set are one of the following five types:
(1) continuous
(2) binary
(3) categorical (polytomous with more than two categories)
(4) counts
(5) mixed
The types of regression models used are linear, logistic, Poisson, generalized logit or mixed logistic/linear, depending on the type of variable being imputed.

A Few Issues
• Do I impute the dependent variable?
• Which model has more information? The imputation model or the analyst model?
• How many imputations do I need to do?
• Can I impute in one language and analyze in another?
• How do I get summary statistics such as R squared?
• Can I do this in SPSS?
• Where do I go with questions?

Thanks
Next up:
• “COLLATERAL CONSEQUENCES OF VIOLENCE IN DISADVANTAGED NEIGHBORHOODS”, Dr. David Harding, Wednesday, February 13, Noon - 1:00 pm
• Accessing and Analyzing Add Health Data, Instructor: Dr. Meredith Porter, Monday, February 25, 12:00 - 1:00 pm