PROC SURVEYCORR Jessica Hampton CCSU, New Britain, CT September 2013
Introduction
Medical Expenditure Panel Survey (MEPS)
• Administered annually since 1996 by the U.S. Department of Health and Human Services, Agency for Healthcare Research and Quality (AHRQ)
• Anonymity protected by removing individual identifiers from the public data files
• MEPS 2010 consolidated data file released September 2012
• Multiple components (household, insurance/employer, and medical provider)
• Household component (1,911 variables) covers the following topics:
  • Demographics
  • Household income
  • Employment
  • Diagnosed health conditions
  • Additional health status issues
  • Medical expenditures and utilization
  • Satisfaction with and access to care
  • Insurance coverage
• Represents the U.S. civilian, noninstitutionalized population
• ~3% out of scope (birth/adoption, death, incarceration, living abroad)
• 18,692 persons remain after excluding those who are out of scope, have negative person weights, or are under 18 or 65+
MEPS Survey Design Methods
• MEPS is a representative but NOT a random sample of the population
• Person weights must be used to produce reliable population estimates
Stratification:
• By demographic variables such as age, race, sex, income, etc.
• Goal is to maximize homogeneity within strata and heterogeneity between strata
• Sometimes used to oversample groups that are under-represented in the general population or that have characteristics relevant to the study
• For example: blacks, Hispanics, and low-income households
Clustering:
• By geography in order to reduce survey costs: a simple random sample of the entire U.S. population is not feasible or cost-effective
• Observations within a cluster are correlated, so ignoring clustering underestimates variance/error; two families in the same neighborhood are more likely to be similar demographically (for example, similar income)
• Ideal clusters are spatially close for cost effectiveness but as heterogeneous within as possible for reasonable variance
• Multi-stage clustering used in MEPS:
  • sample of counties >> sample of blocks >> individuals/households surveyed from block sample
Survey Design Considerations
• If person weights are ignored when generalizing sample findings to the entire population, totals, percentages, and means are inflated for the oversampled groups and underestimated for the others
• In regression analysis, ignoring person weights leads to biased coefficient estimates
• If the sampling strata and cluster variables are ignored, means and coefficient estimates are unaffected, but the standard error (or population variance) may be underestimated; that is, the reliability of an estimate may be overestimated
• As a result, when one estimated population mean is compared to another, the difference may appear to be statistically significant when it is not (Machlin, Yu, & Zodet, 2005)
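The points above can be checked with a side-by-side run. This is a minimal sketch, assuming the same data set and design variables used in the later slides (PQI.MEPS_2010, VARSTR, VARPSU, PERWT10F) and TOTEXP10 as the analysis variable; it is an illustration, not part of the original analysis.

```sas
/* 1. Naive estimate: treats the file as a simple random sample. */
/*    Both the mean and its standard error ignore the design.    */
PROC MEANS DATA=PQI.MEPS_2010 MEAN STDERR;
    VAR TOTEXP10;
RUN;

/* 2. Weights only: the mean is now a population estimate, but   */
/*    the standard error still ignores strata and clusters.      */
PROC SURVEYMEANS DATA=PQI.MEPS_2010 MEAN STDERR;
    WEIGHT PERWT10F;
    VAR TOTEXP10;
RUN;

/* 3. Full design: same mean as run 2, with a standard error     */
/*    that accounts for stratification and clustering.           */
PROC SURVEYMEANS DATA=PQI.MEPS_2010 MEAN STDERR;
    STRATA VARSTR;
    CLUSTER VARPSU;
    WEIGHT PERWT10F;
    VAR TOTEXP10;
RUN;
```

Runs 2 and 3 should report the same mean; only the standard error is expected to differ, which is the effect described in the third bullet.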
SAS Survey Procedures
• Intended for use with sample designs that may include unequal person weights, clustering, and stratification
• PROC SURVEYMEANS estimates population totals, percentages, and means. Includes estimated variance, confidence intervals, and descriptive statistics
• PROC SURVEYFREQ produces frequency tables, population estimates, percentages, and standard errors
• PROC SURVEYREG estimates regression coefficients by generalized least squares
• PROC SURVEYLOGISTIC fits logistic regression models for discrete response (categorical) survey data by maximum likelihood
• PROC SURVEYMEANS and PROC SURVEYREG available starting with SAS version 8
• PROC SURVEYFREQ and PROC SURVEYLOGISTIC available starting with version 9
• PROC SURVEYSELECT is for sampling and will not be used in this project
PROC SURVEYMEANS Syntax

    PROC SURVEYMEANS DATA=PQI.MEPS_2010;
        STRATA VARSTR;
        CLUSTER VARPSU;
        WEIGHT PERWT10F;
        DOMAIN INSCOV10;
        VAR TOTEXP10 TOTSLF10;
    RUN;
PROC SURVEYMEANS Output
PROC SURVEYFREQ Syntax

    PROC SURVEYFREQ DATA=PQI.MEPS_2010;
        STRATA VARSTR;
        CLUSTER VARPSU;
        WEIGHT PERWT10F;
        TABLES PRIEU10 PRING10 INSCOV10;
    RUN;
PROC SURVEYFREQ Output
PROC SURVEYREG Syntax

    PROC SURVEYREG DATA=PQI.MEPS_2010;
        STRATA VARSTR;
        CLUSTER VARPSU;
        WEIGHT PERWT10F;
        MODEL &TARGET=&&VAR&I /SOLUTION;
        ODS OUTPUT PARAMETERESTIMATES=PARAMETER_EST FITSTATISTICS=FIT;
    RUN;
PROC SURVEYLOGISTIC Syntax

    PROC SURVEYLOGISTIC DATA=SASUSER.MEPS_2010;
        STRATA VARSTR;
        CLUSTER VARPSU;
        WEIGHT PERWT10F;
        MODEL TOTEXP_HIGH(EVENT='1')=AGE10X MARRIED--HISPANX POVLEV10--PHYACT53
              OBESE--ADSMOK42 ADINSA42--LOCATN_ER;
        ODS OUTPUT PARAMETERESTIMATES=WORK.PARAM;
    RUN;
PROC SURVEYLOGISTIC/REG Output
Default output (similar to PROC LOGISTIC and PROC REG):
• fit statistics (AIC, Schwarz criterion, R-square)
• chi-squared tests of the global null hypothesis
• degrees of freedom
• coefficient estimates
• standard errors of coefficient estimates and p-values
• odds ratio point estimates
• 95% Wald confidence intervals
Does not include:
• Option for stepwise selection
• chi-squared test of residuals/tabled residuals (assumptions of normality and equal variance do not apply)
• influential obs/outliers (person weights)
PROC SURVEYCORR
Correlations
Three approaches:
• Unweighted
• PROC CORR with person weights
• "PROC SURVEYCORR" macro with PROC SURVEYREG:
  • Uses all survey design variables (strata/cluster/weight)
  • Iteratively runs simple regression models for each predictor variable
  • Builds table with r-squared, r, and p-values
  • Sorted by r
Observations:
• Similar results for all three approaches
• PROC CORR output unwieldy with large # of predictor variables
• PROC CORR cannot use strata and cluster variables
PROC CORR with person weights

    PROC CORR DATA=PQI.MEPS_2010 PLOTS=MATRIX RANK;
        VAR AGE10X WAGEP10X TTLP10X FAMINC10 POVLEV10 TOTSLF10 ERTEXP10 ERTOT10
            RXEXP10 OPTOTV10 OBVEXP10 OBTOTV10 IPTEXP10 IPNGTD10;
        WITH TOTEXP10;
        WEIGHT PERWT10F;
    RUN;
Step 1: PROC SURVEYCORR

    PROC SQL;
        SELECT NVAR INTO :NVAR
        FROM DICTIONARY.TABLES
        WHERE LIBNAME='PQI' AND MEMNAME='MEPS_2010';
    QUIT;

• SQL dictionary tables used to select the # of predictor variables in the data set and store it in a macro variable
• Note: Library and data set names are stored in the dictionary tables in all caps
• # of predictor variables (NVAR) = # of iterations SAS will use in the DO loop later in the program
Step 2: PROC SURVEYCORR

    PROC CONTENTS DATA=PQI.MEPS_2010 OUT=CONTENTS NOPRINT;
    RUN;

    PROC SQL NOPRINT;
        SELECT NAME INTO :VAR1-:VAR76
        FROM WORK.CONTENTS;
    QUIT;

• PROC CONTENTS used to obtain a list of predictor variable names
• List of variable names stored as macro variables (VAR1 through VAR76) using the PROC SQL SELECT INTO statement
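The hard-coded upper bound of 76 has to match the number of variables in the data set. A possible variation, assuming SAS 9.3 or later (where an INTO range may be left open-ended), lets SAS create as many macro variables as there are rows and reads the count from the automatic macro variable SQLOBS. This is a sketch of an alternative, not the method used in the slides.

```sas
PROC CONTENTS DATA=PQI.MEPS_2010 OUT=CONTENTS NOPRINT;
RUN;

PROC SQL NOPRINT;
    /* Open-ended range: VAR1, VAR2, ... are created for every row returned */
    SELECT NAME INTO :VAR1-
    FROM WORK.CONTENTS;
QUIT;

/* SQLOBS holds the number of rows processed by the last SELECT */
%LET NVAR = &SQLOBS;

/* Quick check in the log before running the loop */
%PUT NOTE: &NVAR variable names stored; the first one is &VAR1;
```

With this variation the Step 1 query against DICTIONARY.TABLES becomes optional, since NVAR is set from the same query that builds the name list.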
Step 3: PROC SURVEYCORR

    PROC SQL;
        CREATE TABLE SURVEYCORR
            (PARAMETER CHAR(15),
             R_SQUARE  CHAR(8),
             R         NUM(8),
             PROBT     NUM(8));
    QUIT;

• Create empty table to store data
• Output from PROC SURVEYREG will be inserted one row at a time
Step 4: PROC SURVEYCORR

    %MACRO CORR(TARGET=);
    PROC SURVEYREG DATA=PQI.MEPS_2010;
        STRATA VARSTR;
        CLUSTER VARPSU;
        WEIGHT PERWT10F;
        MODEL &TARGET=&&VAR&I /SOLUTION;
        ODS OUTPUT PARAMETERESTIMATES=PARAMETER_EST FITSTATISTICS=FIT;
    RUN;

• First part of macro (the macro definition is closed in Step 5)
• PROC SURVEYREG uses survey design variables in STRATA, CLUSTER, and WEIGHT statements
• Optional ODS OUTPUT statement stores parameter estimates, fit statistics, and other information created when the model runs
Step 5: PROC SURVEYCORR

    PROC SQL;
        INSERT INTO SURVEYCORR
        SELECT PARAMETER
             , CVALUE1 AS R_SQUARE
             , SIGN(ESTIMATE)*SQRT(INPUT(CVALUE1, 8.)) AS R
             , PROBT AS PVALUE
        FROM FIT
           , PARAMETER_EST
        /* UPCASE makes the label match regardless of how SAS capitalizes it */
        WHERE UPCASE(LABEL1) = "R-SQUARE"
          AND PARAMETER = "&&VAR&I";
    QUIT;
    %MEND CORR;

• R-square value extracted from FitStatistics output with PROC SQL
• P-value and sign of estimated regression coefficient taken from ParameterEstimates
• Square root function used to get the correlation coefficient
• Sign of regression coefficient = direction of correlation (-/+) with target
• Target variable input as a parameter when the macro is called
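The square-root step relies on a standard property of simple (one-predictor) linear regression with an intercept: the model R-square equals the squared correlation between predictor and target. In the notation below, which is introduced here for illustration, $\hat{\beta}_1$ is the estimated slope for predictor $x$ and $R^2$ is the fit statistic reported by PROC SURVEYREG.

```latex
R^2 = r_{xy}^2
\quad\Longrightarrow\quad
r_{xy} = \operatorname{sign}(\hat{\beta}_1)\,\sqrt{R^2}
```

For example, an R-square of 0.687 with a positive slope gives $r = +\sqrt{0.687} \approx 0.829$. The identity holds only when the model has a single numeric predictor, which is why the macro fits one predictor per run.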
Step 6: PROC SURVEYCORR

    %MACRO LOOP;
        %DO I=1 %TO &NVAR;
            %CORR(TARGET=PUBAT10X);
        %END;
    %MEND LOOP;

    /* Run the loop */
    %LOOP;

• Call the macro
• Input desired target variable as parameter
• Iterate for each predictor variable (NVAR times)
• Each time the macro is run, a new row is inserted in table SURVEYCORR
Step 7: PROC SURVEYCORR

    PROC SQL;
        CREATE TABLE PQI.SURVEYCORR AS
        SELECT PARAMETER
             , R_SQUARE
             , R FORMAT=BEST6.4
             , PROBT AS PVALUE FORMAT=PVALUE6.4
             , CASE WHEN PROBT <= 0.05 THEN "YES" ELSE "NO" END AS SIGNIFICANT_95
        FROM SURVEYCORR
        WHERE PARAMETER NOT IN ('DUPERSID', 'VARSTR', 'VARPSU', 'PERWT10F')
        ORDER BY ABS(R) DESC;
    QUIT;

• Use PROC SQL to:
  • Format results
  • Sort by correlation size
  • Exclude survey design variables from tabulated output
PROC SURVEYCORR Output

    parameter     r-square   r       p-value   significance (95% C.L.)
    TOTEXP10      1.000              <0.0001   yes
    IPTEXP10      0.687      0.829   <0.0001   yes
    TOTEXP_HIGH   0.287      0.536   <0.0001   yes
    IPNGTD10      0.270      0.520   <0.0001   yes
    OBVEXP10      0.228      0.477   <0.0001   yes
    RXEXP10       0.206      0.454   <0.0001   yes
    OBTOTV10      0.158      0.398   <0.0001   yes
    OPTEXP10      0.121      0.348   <0.0001   yes
    TOTSLF10      0.116      0.340   <0.0001   yes
    ADAPPT42      0.089      0.298   <0.0001   yes
Conclusions
Recommendations/Conclusions
• Only 4 SAS survey analysis procedures
  • No PROC SURVEYCORR
• PROC CORR:
  • Accepts person weights, but
  • No strata/cluster variables
  • Significance levels (p-values) may be less accurate with complex survey designs
• Iterative approach with PROC SURVEYREG
  • Can get r and p for a large # of predictor variables
  • Output tabled and ranked
• For categorical variables:
  • Either reformat to numeric first
  • Or use CLASS statement in PROC SURVEYREG
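The CLASS option in the last bullet might look like the sketch below. It assumes INSCOV10 is a categorical insurance-coverage code and TOTEXP10 is the target; both names come from earlier slides, but pairing them here is only an illustration.

```sas
PROC SURVEYREG DATA=PQI.MEPS_2010;
    STRATA VARSTR;
    CLUSTER VARPSU;
    WEIGHT PERWT10F;
    CLASS INSCOV10;                        /* treat the predictor as categorical */
    MODEL TOTEXP10 = INSCOV10 / SOLUTION;  /* SOLUTION prints one estimate per level */
    ODS OUTPUT PARAMETERESTIMATES=PARAMETER_EST FITSTATISTICS=FIT;
RUN;
```

With a CLASS variable the parameter estimates table holds one row per level rather than a single slope, so the Step 5 filter on the variable name and the sign-based r would need adjusting; R-square still summarizes the strength of association, without a direction.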
References
• Carrington, W. J., Eltinge, J. L., & McCue, K. (2000). An Economist's Primer on Survey Samples. Working Paper no. 00-15. Suitland, MD: Center for Economic Studies, U.S. Bureau of the Census, October 2000. Retrieved from ftp://tigerline.census.gov/ces/wp/2000/CES-WP-00-15.pdf January 15, 2013.
• Cohen, J. W., & Rhoades, J. A. (2009). Group and Non-Group Private Health Insurance Coverage, 1996 to 2007: Estimates for the U.S. Civilian Noninstitutionalized Population under Age 65. Medical Expenditure Panel Survey (MEPS) Statistical Brief #267. Agency for Healthcare Research and Quality, Rockville, MD. Retrieved from http://meps.ahrq.gov/data_files/publications/st267/stat267.pdf
• DiJulio, B., & Claxton, G. (2010). Comparison of Expenditures in Nongroup and Employer-Sponsored Insurance: 2004-2007. Kaiser Family Foundation, Menlo Park, CA. Retrieved from http://www.kff.org/insurance/snapshot/chcm111006oth.cfm
• Kaiser Family Foundation (2008). How Non-Group Health Coverage Varies with Income. Menlo Park, CA. Retrieved from http://www.kff.org/insurance/upload/7737.pdf
• Machlin, S., & Yu, W. (2005). MEPS Sample Persons In-Scope for Part of the Year: Identification and Analytic Considerations. April 2005. Agency for Healthcare Research and Quality, Rockville, MD. Retrieved from http://www.meps.ahrq.gov/survey_comp/hc_survey/hc_sample.shtml
• Machlin, S., Yu, W., & Zodet, M. (2005). Computing Standard Errors for MEPS Estimates. January 2005. Agency for Healthcare Research and Quality, Rockville, MD. Retrieved from http://www.meps.ahrq.gov/survey_comp/standard_errors.jsp
• Medical Expenditure Panel Survey (MEPS). (2012). MEPS HC-138: 2010 Full Year Consolidated Data File. Rockville, MD: Agency for Healthcare Research and Quality (AHRQ), September 2012. Retrieved from http://meps.ahrq.gov/data_stats/download_data/pufs/h138doc.pdf September 27, 2012.
• Medical Expenditure Panel Survey (MEPS). (2012). MEPS HC-138: 2010 Full Year Consolidated Data Codebook. Rockville, MD: Agency for Healthcare Research and Quality (AHRQ), August 30, 2012. Retrieved from http://meps.ahrq.gov/mepsweb/data_stats/download_data_files_codebook.jsp?PUFId=H138 September 27, 2012.
• Medical Expenditure Panel Survey (MEPS). MEPS-HC Panel Design and Collection Process. Agency for Healthcare Research and Quality, Rockville, MD. Retrieved from http://www.meps.ahrq.gov/survey_comp/hc_data_collection.jsp
• Medical Expenditure Panel Survey (MEPS). Data Use Agreement. Agency for Healthcare Research and Quality, Rockville, MD. Retrieved from http://meps.ahrq.gov/mepsweb/data_stats/data_use.jsp
• O'Neill, J., & O'Neill, D. (2009). Who are the uninsured? An Analysis of America's Uninsured Population, Their Characteristics, and Their Health. Employment Policies Institute, Washington, D.C.
• SAS Institute Inc. (2008). SAS/STAT 9.2 User's Guide. Chapter 14: Introduction to Survey Sampling and Analysis Procedures. Pp. 259-270. Cary, NC: SAS Institute Inc. Retrieved from http://support.sas.com/documentation/cdl/en/statugsurveysamp/61762/PDF/default/statugsurveysamp.pdf on January 15, 2013.
• Trish, E., Damico, A., Claxton, G., Levitt, L., & Garfield, R. (2011). A Profile of Health Insurance Exchange Enrollees. Kaiser Family Foundation, Menlo Park, CA. Retrieved from http://www.kff.org/healthreform/upload/8147.pdf