Multilevel Modeling. Raul Cruz-Cano, HLTH 653, Spring 2013


Multilevel Question
• It turns out that simple random sampling is very expensive: imagine traveling to Moscow, Idaho to give the survey to a single student.
• The subsets of the population that are sampled first are conventionally called primary sampling units, or PSUs.
• In a two-stage sample, first a sample is drawn from the primary sampling units (the first-stage sample), and within each PSU included in the first-stage sample, a sample of population elements is drawn (the second-stage sample).
• This can be extended to situations with more than two levels, e.g., individuals within households within municipalities, and is then called a multistage sample.
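The two-stage procedure described above can be sketched in a few lines. This is a minimal illustration in Python (not SAS), assuming a hypothetical population of 20 schools with 30 students each; the function name `two_stage_sample` and all sizes are made up for the example.

```python
import random

def two_stage_sample(population, n_psu, n_per_psu, seed=0):
    """Draw a two-stage sample from a dict {psu_id: [elements]}."""
    rng = random.Random(seed)
    # Stage 1: sample primary sampling units (PSUs) without replacement.
    chosen_psus = rng.sample(sorted(population), n_psu)
    # Stage 2: within each chosen PSU, sample population elements.
    return {psu: rng.sample(population[psu], n_per_psu) for psu in chosen_psus}

# Hypothetical population: 20 schools (PSUs) with 30 students each.
population = {
    f"school_{s}": [f"school_{s}_student_{i}" for i in range(30)]
    for s in range(20)
}
sample = two_stage_sample(population, n_psu=4, n_per_psu=5)
total = sum(len(students) for students in sample.values())
print(len(sample), "PSUs,", total, "students")
```

Only 4 schools need to be visited to reach 20 students, which is the cost argument for multistage designs.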

These are examples of two-level data structures, but extensions to multiple levels are possible:
• 10 cities
• In each city: 5 schools
• In each school: 2 classes
• In each class: 5 students
• Each student given the test twice

What is Multilevel or Hierarchical Linear Modeling?
Nested Data Structures

Individuals Undivided
Unit of Analysis = Individuals

Individuals Nested Within Groups
Unit of Analysis = Individuals + Classes

… and Further Nested
Unit of Analysis = Individuals + Classes + Schools

Examples of Multilevel Data Structures
• Neighborhoods are nested within communities
• Families are nested within neighborhoods
• Children are nested within families

Examples of Multilevel Data Structures
• Schools are nested within districts
• Classes are nested within schools
• Students are nested within classes

Multilevel Data Structures
• Level 4: District (l)
• Level 3: School (k)
• Level 2: Class (j)
• Level 1: Student (i)

2nd Type of Nesting
• Repeated measures nested within individuals
• Focus = change or growth

Time Points Nested Within Individuals

Nested Data
• Data nested within a group tend to be more alike than data from individuals selected at random.
• The nature of group dynamics will tend to exert an effect on individuals.

Multilevel Modeling Seems New But…
It is an extension of general linear modeling:
• Simple Linear Regression
• Multiple Linear Regression
• ANOVA
• ANCOVA
• Repeated Measures ANOVA

Why Multilevel Modeling vs. Traditional Approaches?
Traditional approaches (1-level):
1. Individual level analysis (ignore group)
2. Group level analysis (aggregate data and ignore individuals)

Problems with Traditional Approaches
1. Individual level analysis (ignore group): violates the assumption that observations are independent, which leads to misestimated standard errors (the standard errors are smaller than they should be).

Problems with Traditional Approaches
2. Group level analysis (aggregate data and ignore individuals): aggregation bias, i.e., the meaning of a variable at Level-1 (e.g., individual level SES) may not be the same as its meaning at Level-2 (e.g., school level SES).

Example: SBP and DBP for patients 1 to 15, before and after
• Before, SBP: 210 169 187 160 167 176 185 206 173 146 174 201 198 148 154
• Before, DBP: 130 122 124 104 112 101 124 115 102 98 119 106 107 100
• After, SBP: 201 165 166 157 145 168 180 147 136 151 168 179 129 131
• After, DBP: 125 121 106 101 85 98 105 103 98 90 98 110 103 82
Paired t-test: the average change in DBP is significantly different from zero (p = 0.000951).
Unpaired t-test: the average change in DBP is significantly different from zero (p = 0.036).
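Why the paired test gives a much smaller p-value can be seen from the test statistics themselves. The sketch below is in Python (not SAS) and uses hypothetical readings for 5 patients, not the slide's data; it computes only the t statistics, since the standard library has no p-value routine.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical before/after readings for 5 patients (NOT the slide's data).
before = [100, 110, 120, 130, 140]
after = [96, 105, 117, 124, 137]
n = len(before)

# Paired t statistic: uses within-patient differences, so the large
# between-patient variability cancels out.
diffs = [b - a for b, a in zip(before, after)]
t_paired = mean(diffs) / sqrt(variance(diffs) / n)

# Unpaired (Welch) t statistic: treats the two columns as independent
# samples, so between-patient variability inflates the standard error.
t_unpaired = (mean(before) - mean(after)) / sqrt(
    variance(before) / n + variance(after) / n
)

print(f"paired t = {t_paired:.2f}, unpaired t = {t_unpaired:.2f}")
```

The mean change is the same in both tests; only the standard error differs, because the unpaired test ignores that the two measurements are nested within the same patient.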

“Multilevel” Approach
• 2 or more levels can be considered simultaneously
• Can analyze within- and between-group variability

How Many Levels Are Usually Examined?
• 2 or 3 levels are very common
• 15 students x 10 classes x 10 schools = 1,500

Types of Outcomes
• Continuous scale (achievement, attitudes)
• Binary (pass/fail)
• Categorical with 3+ categories

Effect for estimation of a mean
• The results on the following slides assume that the sample is a two-stage sample using random sampling with replacement at either stage, or that the sampling fractions are so low that the difference between sampling with and without replacement is negligible.

Effect for estimation of a mean
• Since considerations for the choice of a design are always approximate, only designs in which each level-two unit contains the same number of level-one units are considered here.
• Level-two units will sometimes be referred to as clusters.
• The number of level-two units is denoted N.
• The number of level-one units within each level-two unit is denoted n.
• These numbers are called the level-two sample size and the cluster size, respectively.
• The total sample size is Nn.
• If in reality the number of level-one units fluctuates between level-two units, it will almost always be a reasonable approximation to use for n the average number of sampled level-one units per level-two unit.

Effect for estimation of a mean
• Suppose we want to estimate the mean of some variable Y in a population that has a two-level structure.
• As an example, Y could be the duration of hospital stay after a certain operation, under the condition that there are no complications or additional health problems.
• Model: random intercept.

Effect for estimation of a mean
1. This increase in complexity carries over to regression, etc.
2. This is a relatively simple model; more complex models lead to more complex calculations that require computing large covariance matrices.
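The equations on these slides did not survive, so the following Python sketch falls back on the standard textbook result for the random intercept model Y_ij = mu + U_j + R_ij with N clusters of size n. That this is the result the slides showed is an assumption; the symbols tau2 (variance of U_j), sigma2 (variance of R_ij) and all numeric values are hypothetical.

```python
from math import sqrt

def design_effect(n, rho):
    """Design effect 1 + (n - 1) * rho for cluster size n and
    intraclass correlation rho."""
    return 1 + (n - 1) * rho

def se_of_mean(N, n, tau2, sigma2):
    """Standard error of the overall mean under Y_ij = mu + U_j + R_ij,
    with var(U_j) = tau2 and var(R_ij) = sigma2."""
    return sqrt((n * tau2 + sigma2) / (N * n))

# Hypothetical values: 20 clusters of 10, between-cluster variance 1,
# within-cluster variance 4.
N, n, tau2, sigma2 = 20, 10, 1.0, 4.0
rho = tau2 / (tau2 + sigma2)               # intraclass correlation
deff = design_effect(n, rho)
se_srs = sqrt((tau2 + sigma2) / (N * n))   # SE if all Nn observations were independent
se_two_stage = se_of_mean(N, n, tau2, sigma2)

print(f"rho = {rho:.2f}, design effect = {deff:.2f}")
print(f"SE (independent) = {se_srs:.4f}, SE (two-stage) = {se_two_stage:.4f}")
```

The ratio of the two variances equals the design effect, which is why ignoring the clustering makes standard errors look smaller than they should be.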

Easier Case
The effect of each level-2 unit is a constant (fixed), not a random variable.

Fixed Effects
• An equivalent operation is to add a dummy variable for each u_j.
• Strictly speaking, a constant is a random variable with no variation, hence fixed effects are a special case of random effects.

Software to do Multilevel Modeling
SAS users: PROC MIXED
It is an extension of general linear modeling (PROC REG, PROC GLM, PROC ANOVA):
• Simple Linear Regression
• Multiple Linear Regression
• ANOVA
• ANCOVA
• Repeated Measures ANOVA

Example: Family and Gender
• The response variable Height measures the heights (in inches) of 18 individuals.
• The individuals are classified according to Family and Gender.

    data heights;
      input Family Gender$ Height @@;
      datalines;
    1 F 67  1 F 66  1 F 64  1 M 71  1 M 72  2 F 63
    2 F 67  2 M 69  2 M 68  2 M 70  3 F 63  3 M 64
    4 F 67  4 F 66  4 M 67  4 M 69
    ;
    run;

This differs from the “Effect…” slides because now we have more cluster levels, but no random intercepts.

Example: Family and Gender
• The PROC MIXED statement invokes the procedure.
• The CLASS statement instructs PROC MIXED to consider both Family and Gender as classification variables. As a result, dummy (indicator) variables are created for all of the distinct levels of Family and Gender.
• For these data, Family has four levels and Gender has two levels.

    proc mixed data=heights;
      class Family Gender;
      model Height = Gender Family*Gender / s;
    run;

The s option requests that a solution for the fixed-effects parameters be produced along with their approximate standard errors.

Family and Gender
• Run the program simple-proc_mixed2.sas.
• What happens when you try to use the CLASS statement in PROC REG?

Dorsal shells in lizards
Two-sample t-test: the small observed difference is not significant (p = 0.1024).

Mother effect
• We have 102 lizards from 29 mothers.
• Mother effects might be present.
• Hence a comparison between male and female animals should be based on within-mother comparisons.

Mother effect
[Figure: number of dorsal shells plotted against mother]

First Choice
• β can be interpreted as the average difference between males and females for each mother.
• Test for a ‘sex’ effect, correcting for ‘mother’ effects.
• This is a more complex example than “Effect…” because now we have a variable x_ij for each observation.

SAS program

    proc mixed data = lizard;
      class mothc;
      model dors = sex mothc;
    run;

    Source   F Value   Pr > F
    SEX      7.19      0.0091
    MOTHC    3.95      <.0001

1. Highly significant mother effect.
2. Significant gender effect.
3. Many degrees of freedom are spent on estimating the mother effect, which is not even of interest.

Later in this semester…
• Note the different nature of the two factors:
  • SEX: defines 2 groups of interest.
  • MOTHER: defines 29 groups that are not of real interest. A new sample would imply other mothers.
• In practice, one therefore considers the factor ‘mother’ as a random factor.
• The factor ‘sex’ is a fixed effect.
• Thus the model is a mixed model.
• In general, models can contain multiple fixed and/or random factors.
• Fixed effects model vs. random effects model: as in the “Effect…” slides.

    proc mixed data = lizard;
      class mothc;
      model dors = sex / solution;
      random mothc;
    run;
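The two model equations on this slide were lost, so the following is a hedged sketch of what the fixed-effects and mixed formulations usually look like, written in the notation that appears elsewhere in the slides ($x_{ij}$ for sex, $u_j$ for the mother effect, $\beta$ for the sex effect). The intercept $\beta_0$, the error term $\varepsilon_{ij}$ and the variance symbols $\sigma_u^2$, $\sigma^2$ are assumptions introduced here.

```latex
% Lizard i of mother j; y_{ij} = number of dorsal shells, x_{ij} = sex indicator.

% Fixed effects model: each u_j is an unknown constant (one parameter per mother).
y_{ij} = \beta_0 + \beta x_{ij} + u_j + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim N(0, \sigma^2)

% Mixed model: the u_j are random draws from a population of mothers.
y_{ij} = \beta_0 + \beta x_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim N(0, \sigma_u^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)
```

The equations are identical; what changes is the status of $u_j$. In the mixed model only one variance, $\sigma_u^2$, is estimated for the mothers instead of 28 separate mother parameters, which corresponds to moving `mothc` from the `model` statement to the `random` statement.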

Is a variable a random or a fixed effect? (LaMotte 1983, pp. 138–139)
• The treatment levels used are the only ones about which inferences are sought => fixed effect.
• Inferences are sought about a broader collection of treatment effects than those used in the experiment, or the treatment levels are not selected purposefully => random effect.

More terminology
• Balanced design: equal number of observations per unit.
• Unbalanced design: unequal number of observations per unit.
• Unconditional model: the simplest level 2 model; no predictors of the level 1 parameters (e.g., intercept and slope).
• Conditional model: the level 2 model contains predictors of the level 1 parameters.

Weighted Data
Problem:
[Figure: minority voters vs. white voters, shown as pct. of voting population and as pct. of people who have a phone]
Solution: give more “weight” to the minority people with a telephone.

Weighted Data
Not limited to 2 categories.
[Figure: Minority/Dem., Minority/Rep., White/Dem., White/Rep., shown as pct. of voting population and as pct. of people who have a phone]
How many categories? As many as are significant.

Proportion
• Suppose minority voters are 1/3 of the voting population but only 1/6 of the people with a phone.
• A sampling weight for a given data point is the number of units in the target population (here, voters) that the sample point represents.
• Needless to say, in reality this is a much more complex issue.

Which weight do we need to use?
• Oversimplified example (don’t take it seriously)
[Figure: minority voters vs. white voters, shown as pct. of people who have a phone, pct. of voting population in 2008, and pct. of voting population in 2010]

Proportion
Suppose minority voters are 1/3 of the voting population but only 1/6 of the people with a phone.
1. 100 minority + 500 white voters answer the phone survey.
2. 75 minority voters will vote for candidate X.
3. 250 white voters will vote for candidate X.
4. Non-weighted conclusion: 325/600 = 54.17% of the voters will vote for candidate X.
5. Weighted conclusion:
   1. 75 minority = 75% of minority voters with a phone => (0.75)*(1/6) = 12.5% of people with a phone; times a weight of 2 => 25% of the voting population.
   2. 250 white = 50% of white voters with a phone => (0.5)*(5/6) = 41.67% of people with a phone; times a weight of 0.8 => 33.33% of the voting population.
   3. 25% + 33.33% = 58.33%
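The arithmetic above can be checked with exact fractions. This is a small Python sketch (not SAS) using only the numbers on the slide; the variable names are made up for the example, and the weight is taken to be population share divided by sample share.

```python
from fractions import Fraction as F

# Survey counts from the slide: respondents and supporters of candidate X.
respondents = {"minority": 100, "white": 500}
supporters = {"minority": 75, "white": 250}
# Population shares from the slide: minority voters are 1/3 of the voters.
pop_share = {"minority": F(1, 3), "white": F(2, 3)}

total = sum(respondents.values())
unweighted = F(sum(supporters.values()), total)

# Weight = population share / sample share (2 for minority, 4/5 for white).
weights = {g: pop_share[g] / F(respondents[g], total) for g in respondents}

# Weighted proportion = weighted supporters / weighted respondents.
weighted = sum(weights[g] * supporters[g] for g in respondents) / sum(
    weights[g] * respondents[g] for g in respondents
)

print(f"unweighted = {float(unweighted):.4f}, weighted = {float(weighted):.4f}")
```

The weights 2 and 0.8 are exactly the ones used on the slide, and the weighted proportion comes out as 7/12.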

SAS Weighted Mean

    proc means data=sashelp.class;
      var height;
    run;

    proc means data=sashelp.class;
      /* The slide's WEIGHT statement shows no variable name; the Weight
         variable of sashelp.class is assumed here for illustration. */
      weight Weight;
      var height;
    run;

Fish Measurement Data
The data set contains 35 fish of the species Bream caught in Finland's lake Laengelmavesi, with the following measurements:
• Weight (in grams)
• Length3 (length from the nose to the end of the tail, in cm)
• HtPct (max height, as a percentage of Length3)
• WidthPct (max width, as a percentage of Length3)

The WEIGHT statement can be used by many different PROCs.

    data Fish1 (drop=HtPct WidthPct);
      title 'Fish Measurement Data';
      input Weight Length3 HtPct WidthPct @@;
      Weight3 = Weight**(1/3);
      Height  = HtPct*Length3/100;
      Width   = WidthPct*Length3/100;
      datalines;
    242.0 30.0 38.4 13.4   290.0 31.2 40.0 13.8
    340.0 31.1 39.8 15.1   363.0 33.5 38.0 13.3
    430.0 34.0 36.6 15.1   450.0 34.7 39.2 14.2
    500.0 34.5 41.1 15.3   390.0 35.0 36.2 13.4
    450.0 35.1 39.9 13.8   500.0 36.2 39.3 13.7
    475.0 36.2 39.4 14.1   500.0 36.2 39.7 13.3
    500.0 36.4 37.8 12.0       . 37.3 37.3 13.6
    600.0 37.2 40.2 13.9   600.0 37.2 41.5 15.0
    700.0 38.3 38.8 13.8   700.0 38.5 38.8 13.5
    610.0 38.6 40.5 13.3   650.0 38.7 37.4 14.8
    575.0 39.5 38.3 14.1   685.0 39.2 40.8 13.7
    620.0 39.7 39.1 13.3   680.0 40.6 38.1 15.1
    700.0 40.5 40.1 13.8   725.0 40.9 40.0 14.8
    720.0 40.6 40.3 15.0   714.0 41.5 39.8 14.1
    850.0 41.6 40.6 14.9  1000.0 42.6 44.5 15.5
    920.0 44.1 40.9 14.3   955.0 44.0 41.1 14.3
    925.0 45.3 41.4 14.9   975.0 45.9 40.6 14.7
    950.0 46.5 37.9 13.7
    ;
    run;

    title 'Fish Measurement Data';
    proc corr data=Fish1 nomiss plots=matrix(histogram);
      var Height Width Length3 Weight3;
    run;

Weighted PROC MIXED

    proc mixed data=sashelp.class covtest;
      class Sex;
      model height = Sex Age / solution;
      /* The slide's WEIGHT statement shows no variable name; the Weight
         variable of sashelp.class is assumed here for illustration. */
      weight Weight;
    run;

Notice the (fairly small) difference in, say, the coefficients of the model (Solution for Fixed Effects, Estimates).

Farms Example
• The sample is stratified by regions within Iowa and Nebraska.
• Regress on farm area, with a separate intercept and slope for each state.
• Farms.sas

The population is first partitioned into disjoint classes (the strata) which together are exhaustive. Thus each population element should be within one and only one stratum.

The main difference between stratified and cluster sampling is that in stratified sampling all the strata need to be sampled. In cluster sampling one proceeds by first selecting a number of clusters at random and then either sampling within each selected cluster or conducting a census of it. Usually not all clusters are included.
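The contrast between the two designs can be made concrete with a short Python sketch (not SAS). The frame of 5 regions with 8 farms each is hypothetical, as are the function names `stratified_sample` and `cluster_sample`.

```python
import random

def stratified_sample(groups, n_per_group, rng):
    """Sample within every group (stratum), so all strata are represented."""
    return {g: rng.sample(units, n_per_group) for g, units in groups.items()}

def cluster_sample(groups, n_groups, rng):
    """Select some groups (clusters) at random, then take a census of each."""
    chosen = rng.sample(sorted(groups), n_groups)
    return {g: list(groups[g]) for g in chosen}

# Hypothetical frame: 5 regions with 8 farms each.
groups = {f"region_{r}": [f"farm_{r}_{i}" for i in range(8)] for r in range(5)}
rng = random.Random(1)

strat = stratified_sample(groups, n_per_group=2, rng=rng)
clust = cluster_sample(groups, n_groups=2, rng=rng)

print("stratified:", len(strat), "regions,", sum(map(len, strat.values())), "farms")
print("cluster:   ", len(clust), "regions,", sum(map(len, clust.values())), "farms")
```

The same grouping of the population is used in both cases; the design decides whether the groups are all sampled (strata) or are themselves the units being sampled (clusters).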

Another (better?) approach for weighted data
• Experimental design data have all the properties that we learned about in statistics classes:
  • The data are independent.
  • The observations are identically distributed, with some known error distribution.
  • There is an underlying assumption that the data come to us as a finite number of observations from a conceptually infinite population.
  • Simple random sampling without replacement for the sample data.
• Sample survey data:
  • Come from a finite target population; they do not come from a conceptually infinite population.
  • Do not have independent errors.
  • May cover many small sub-populations, so we do not expect the errors to be identically distributed.

PROC MEANS vs PROC SURVEYMEANS
1. We have a target population of 647 receipt amounts, classified by company region. If we need to perform a full audit, and it is too expensive to audit every one of the receipts in the company database, then we need to take a sample.
2. We want to sample the larger receipt amounts more frequently.
3. Sample 'proportional to size'. That means that we choose a multiplier variable, and we make our choices based on the size of that multiplier. If the multiplier for receipt A is five times the multiplier for receipt B, then receipt A will be five times more likely to be selected than receipt B. We will use the receipt amount as the multiplier.

    data AuditFrame (drop=seed);
      seed=18354982;
      do i=1 to 600;
        if i<101 then region='H';
        else if i<201 then region='S';
        else if i<401 then region='R';
        else region='G';
        Amount = round(9990*ranuni(seed)+10, 0.01);
        output;
      end;
      do i=601 to 617;
        if i<603 then region='H';
        else if i<606 then region='S';
        else if i<612 then region='R';
        else region='G';
        Amount = round(10000*ranuni(seed)+10000, 0.01);
        output;
      end;
      do i=618 to 647;
        if i<628 then region='H';
        else if i<638 then region='S';
        else if i<642 then region='R';
        else region='G';
        Amount = round(9*ranuni(seed)+1, 0.01);
        output;
      end;
    run;

PROC MEANS vs PROC SURVEYMEANS
Probability Proportional to Size
We can perform this sample selection using PROC SURVEYSELECT:

    proc surveyselect data=AuditFrame out=AuditSample3
         method=PPS seed=39563462 sampsize=100;
      size Amount;
    run;

This gives us a weighted random sample of size 100. Now we'll just (artificially) create the data set of the audit results. We will have a validated receipt amount for each receipt. Ideally, the validated amount would be exactly equal to the listed amount in every case.

    data AuditCheck3;
      set AuditSample3;
      ValidatedAmt = Amount;
      if region='S' and mod(i,3)=0 then
        ValidatedAmt = round(Amount*(.8+.2*ranuni(1234)), 0.01);
      if region='H' then do;
        if floor(Amount/100)=13 then ValidatedAmt=1037.50;
        if floor(Amount/100)=60 then ValidatedAmt=6035.30;
        if floor(Amount/100)=85 then ValidatedAmt=8565.97;
        if floor(Amount/100)=87 then ValidatedAmt=8872.92;
        if floor(Amount/100)=95 then ValidatedAmt=9750.05;
      end;
      diff = ValidatedAmt - Amount;
    run;

PROC MEANS vs PROC SURVEYMEANS
• The WEIGHT statement in PROC MEANS and PROC SUMMARY allows a user to give some data points more emphasis. But that isn't the right way to address the weights we have here. We built a sample using a specific sample design, and we have sampling weights which have a real, physical meaning. A sampling weight for a given data point is the number of receipts in the target population which that sample point represents.
• The primary difference in the PROC SURVEYMEANS call is the inclusion of the TOTAL= option.
• PROC SURVEYMEANS allows us to compute a Finite Population Correction Factor and adjust the error estimates accordingly. This factor adjusts for the fact that we already know the answers for some percentage of the finite population, and so we really only need to make error estimates for the remainder of that finite population.

    proc means data=AuditCheck3 mean stderr clm;
      var ValidatedAmt diff;
    run;

    proc means data=AuditCheck3 mean stderr clm;
      var ValidatedAmt diff;
      weight SamplingWeight;
    run;

    proc surveymeans data=AuditCheck3 mean stderr clm total=647;
      var ValidatedAmt diff;
      weight SamplingWeight;
    run;
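To give a feel for the size of the correction, here is a Python sketch (not SAS) of the textbook finite population correction under simple random sampling without replacement, fpc = 1 - n/N. It is an assumption that this simple form conveys the idea; the variance computation PROC SURVEYMEANS actually performs for a PPS design is more involved. The standard error of 50.0 is a hypothetical number.

```python
from math import sqrt

def fpc(n, N):
    """Finite population correction factor for a sample of n out of N units."""
    return 1 - n / N

def corrected_se(se, n, N):
    """Scale an infinite-population standard error by sqrt(fpc)."""
    return se * sqrt(fpc(n, N))

n, N = 100, 647            # sample size and population size from the slides
factor = fpc(n, N)
se_uncorrected = 50.0      # hypothetical standard error, for illustration only
se_corrected = corrected_se(se_uncorrected, n, N)

print(f"fpc = {factor:.4f}")
print(f"SE: {se_uncorrected:.2f} -> {se_corrected:.2f}")
```

With 100 of 647 receipts audited, about 15% of the population is already known, so the corrected standard error is noticeably smaller than the uncorrected one.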

Farms Example Revisited (proc surveyreg)
First, we need to specify any stratum and cluster information, and provide a weight variable. We have that information ready. The stratum variables are STATE and REGION, while the sub-population totals for the five strata are in a separate data set named StratumTotals.

    data StratumTotals;
      input State $ Region _TOTAL_;
      datalines;
    Iowa 1 100
    Iowa 2 50
    Iowa 3 15
    Nebraska 1 30
    Nebraska 2 40
    ;
    run;

    proc surveyreg data=FarmsByState total=StratumTotals;
      class State;
      model CornYield = State FarmAreaIA FarmAreaNE / noint covb solution;
      strata State Region;
      weight Weight;
    run;

• The total= option supplies the Finite Population Correction Factor.
• CLASS now indicates a categorical variable, not that the variable is a cluster or a stratum.
• PROC MIXED does not allow CLUSTER or STRATA statements; CLASS covers them.

Disadvantages of proc surveyreg
• If the survey procs are more exact… why not use proc surveyreg for the rest of the semester?
• It does not allow random effects; PROC MIXED does.

Household Component of the Medical Expenditure Panel Survey (MEPS HC)
• The MEPS HC is a nationally representative survey of the U.S. civilian noninstitutionalized population.
• It collects medical expenditure data as well as information on demographic characteristics, access to health care, health insurance coverage, income, and employment.
• MEPS is cosponsored by the Agency for Healthcare Research and Quality (AHRQ) and the National Center for Health Statistics (NCHS).
• For the comparisons reported here we used the MEPS 2005 Full Year Consolidated Data File (HC-097). This is a public use file available for download from the MEPS web site (http://www.meps.ahrq.gov).

Transforming from SAS transport (SSP) format to SAS Dataset (SAS7BDAT)
• The MEPS is not a simple random sample; its design includes:
  • Stratification
  • Clustering
  • Multiple stages of selection
  • Disproportionate sampling
• The MEPS public use files (such as HC-097) include variables for generating weighted national estimates and for use of the Taylor method for variance estimation. These variables are:
  • person-level weight (PERWT05F on HC-097)
  • stratum (VARSTR on HC-097)
  • cluster/PSU (VARPSU on HC-097)
  The stratum and cluster variables are needed for even better estimates of the CI.

    LIBNAME PUFLIB 'C:\';
    FILENAME IN1 'C:\H97.SSP';
    PROC XCOPY IN=IN1 OUT=PUFLIB IMPORT;
    RUN;

H97.SAS7BDAT occupies 408 MB vs. 257 MB for H97.SSP vs. 14 MB for H97.ZIP.

PROC SURVEYFREQ: Simple Example with the SAS7BDAT file

    PROC SURVEYFREQ DATA=PUFLIB.H97;
      TABLES HISPANX*INSCOV05 / ROW;
      WEIGHT PERWT05F;
    RUN;

References
• LaMotte, L. R. (1983). Fixed-, random-, and mixed-effects models. In Encyclopedia of Statistical Sciences, S. Kotz, N. L. Johnson, and C. B. Read.
• Xiuhua Chen and Paul Gorrell. An Introduction to the SAS Survey Analysis PROCs. NESUG 2008.
• David L. Cassell (2006). “Wait, Don't Tell Me… You're Using the Wrong Proc!” SUGI 31, Paper 193-31.
• Many others…