Ordered probit models

Ordered Probit
• Many discrete outcomes are responses to questions that have a natural ordering but no quantitative interpretation
• Examples:
  – Self reported health status
    • (excellent, very good, good, fair, poor)
  – Do you agree with the following statement?
    • Strongly agree, disagree, strongly disagree

• We can use the same type of model as in the previous section to analyze these outcomes
• Another 'latent variable' model
• Key to the model: there is a monotonic ordering of the qualitative responses

Self reported health status
• Excellent, very good, good, fair, poor
• Coded as 1, 2, 3, 4, 5 on the National Health Interview Survey
• We will code as 5, 4, 3, 2, 1 (easier to think of this way)
• Asked on every major health survey
• Important predictor of health outcomes, e.g. mortality
• Key question: what predicts health status?

• Important to note: the numbers 1-5 mean nothing in terms of their value; they are just an ordering from lowest to highest
• The example below is easily adapted to ordered categorical variables with any number of outcomes

Model
• $y_i^*$ = latent index of reported health
• The latent index measures your own scale of health. As $y_i^*$ crosses successive threshold values, you report poor, then fair, then good, then very good, then excellent health

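• $y_i$ = (1, 2, 3, 4, 5) for (poor, fair, good, very good, excellent)
• Interval decision rule

$$
y_i =
\begin{cases}
1 & \text{if } y_i^* \le u_1 \\
2 & \text{if } u_1 < y_i^* \le u_2 \\
3 & \text{if } u_2 < y_i^* \le u_3 \\
4 & \text{if } u_3 < y_i^* \le u_4 \\
5 & \text{if } y_i^* > u_4
\end{cases}
$$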
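The interval rule can be sketched in a few lines of Python. The threshold values below are hypothetical placeholders rather than estimates, and `report_category` is an illustrative name, not part of any package.

```python
from bisect import bisect_left

# Hypothetical thresholds u1 < u2 < u3 < u4 (illustrative values only).
THRESHOLDS = [-1.0, 0.0, 1.0, 2.0]

def report_category(y_star, thresholds=THRESHOLDS):
    """Map a latent index y* to the reported category 1..5.

    Category k is reported when u_{k-1} < y* <= u_k, so a value exactly
    equal to a threshold falls in the lower category.
    """
    return bisect_left(thresholds, y_star) + 1

for y_star in (-1.5, -1.0, 0.3, 1.7, 2.5):
    print(y_star, report_category(y_star))
```

`bisect_left` counts how many thresholds lie strictly below the latent index, which matches the "less than or equal to" side of each interval.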

• As with logit and probit models, we will assume $y_i^*$ is a function of observed and unobserved variables
• $y_i^* = \beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k + \varepsilon_i$
• $y_i^* = x_i\beta + \varepsilon_i$

• The threshold values ($u_1, u_2, u_3, u_4$) are unknown. We do not know the value of the index necessary to push you from very good to excellent.
• In theory, the threshold values are different for everyone
• The computer will estimate not only the $\beta$'s but also the thresholds, which are averages across people

• As with probit and logit, the model is determined by the assumed distribution of $\varepsilon$
• In practice, most people pick the normal, generating an 'ordered probit' (I have no idea why)
• We will work through the math for the probit version

Probabilities
• Let's do the two extreme categories, $\Pr(y_i=1)$ and $\Pr(y_i=5)$, first
• $\Pr(y_i=1)$
  $= \Pr(y_i^* \le u_1)$
  $= \Pr(x_i\beta + \varepsilon_i \le u_1)$
  $= \Pr(\varepsilon_i \le u_1 - x_i\beta)$
  $= \Phi[u_1 - x_i\beta] = 1 - \Phi[x_i\beta - u_1]$
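The same steps give the remaining probabilities. The lines below are a sketch that follows directly from the interval decision rule and the normality assumption; the index $k$ for the interior categories is notation introduced here for compactness.

```latex
\begin{align*}
\Pr(y_i = 5) &= \Pr(y_i^* > u_4) = \Pr(\varepsilon_i > u_4 - x_i\beta)
              = 1 - \Phi[u_4 - x_i\beta] = \Phi[x_i\beta - u_4] \\
\Pr(y_i = k) &= \Pr(u_{k-1} < y_i^* \le u_k)
              = \Phi[u_k - x_i\beta] - \Phi[u_{k-1} - x_i\beta],
              \qquad k = 2, 3, 4
\end{align*}
```

By construction the five probabilities sum to one.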

Likelihood function
• There are 5 possible choices for each person
• Only 1 is observed
• $L = \sum_i \ln[\Pr(y_i = k)]$, where $k$ is the category observed for person $i$
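A minimal sketch of this log likelihood in Python, using only the standard library. The cut points, index values and outcomes below are toy numbers made up for illustration, and the function names are hypothetical; a real estimator would search over the β's and cut points to maximize this sum.

```python
from math import log
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def category_prob(k, xb, cuts):
    """Pr(y = k) for k = 1..len(cuts)+1, given index xb and sorted cut points."""
    upper = PHI(cuts[k - 1] - xb) if k <= len(cuts) else 1.0
    lower = PHI(cuts[k - 2] - xb) if k >= 2 else 0.0
    return upper - lower

def log_likelihood(ys, xbs, cuts):
    """Sum over people of ln Pr(y_i = observed category)."""
    return sum(log(category_prob(y, xb, cuts)) for y, xb in zip(ys, xbs))

# Toy data (hypothetical): three people, their index values, and cut points.
cuts = [-1.0, 0.0, 1.0, 2.0]
ys = [1, 3, 5]
xbs = [-0.5, 0.4, 2.2]
print(log_likelihood(ys, xbs, cuts))
```

Only the probability of the category each person actually reported enters the sum, which is why one term per person appears.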

Programming example
• Cancer control supplement to the 1994 National Health Interview Survey
• Question: what observed characteristics predict self reported health (1-5 scale)?
• 1=poor, 5=excellent
• Key covariates: income, education, age, current and former smoking status
• Programs
  – sr_health_status.do, .dta, .log

• desc;

variable     variable label
---------------------------------------------------------
male         =1 if male
age          age in years
educ         years of education
smoke        current smoker
smoke5       smoked in past 5 years
black        =1 if respondent is black
othrace      =1 if other race (white is ref)
sr_health    1-5 self reported health, 5=excel, 1=poor
famincl      log family income

• tab sr_health;

   1-5 self |
   reported |
    health, |
   5=excel, |
     1=poor |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        342        2.65        2.65
          2 |        991        7.68       10.33
          3 |      3,068       23.78       34.12
          4 |      3,855       29.88       64.00
          5 |      4,644       36.00      100.00
------------+-----------------------------------
      Total |     12,900      100.00

In STATA
• oprobit sr_health male age educ famincl black othrace smoke smoke5;

Ordered probit estimates                          Number of obs   =      12900
                                                  LR chi2(8)      =    2379.61
                                                  Prob > chi2     =     0.0000
Log likelihood = -16401.987                       Pseudo R2       =     0.0676

------------------------------------------------------------------------------
   sr_health |      Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .1281241   .0195747     6.55   0.000     .0897583    .1664899
         age |  -.0202308   .0008499   -23.80   0.000    -.0218966    -.018565
        educ |   .0827086   .0038547    21.46   0.000     .0751535    .0902637
     famincl |   .2398957   .0112206    21.38   0.000     .2179037    .2618878
       black |   -.221508    .029528    -7.50   0.000    -.2793818   -.1636341
     othrace |  -.2425083   .0480047    -5.05   0.000    -.3365958   -.1484208
       smoke |  -.2086096   .0219779    -9.49   0.000    -.2516855   -.1655337
      smoke5 |  -.1529619   .0357995    -4.27   0.000    -.2231277   -.0827961
-------------+----------------------------------------------------------------
       _cut1 |   .4858634    .113179          (Ancillary parameters)
       _cut2 |   1.269036     .11282
       _cut3 |   2.247251   .1138171
       _cut4 |   3.094606   .1145781
------------------------------------------------------------------------------

Interpret coefficients
• Marginal effects/changes in probabilities are now a function of 2 things
  – Point of expansion (x's)
  – Frame of reference for outcome (y)
• STATA
  – Picks mean values for x's
  – You pick the value of y

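Continuous x's
• Consider y=5
• $d\Pr(y_i=5)/dx_i = d\Phi[x_i\beta - u_4]/dx_i = \beta\phi[x_i\beta - u_4]$
• Consider y=3, where $\Pr(y_i=3) = \Phi[u_3 - x_i\beta] - \Phi[u_2 - x_i\beta]$
• $d\Pr(y_i=3)/dx_i = \beta\phi[x_i\beta - u_2] - \beta\phi[x_i\beta - u_3]$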
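One way to check marginal-effect formulas of this kind is to compare them with a numerical derivative. The sketch below does that for all five categories under the interval rule (category k bounded by $u_{k-1}$ and $u_k$); the cut points, coefficient and index value are hypothetical, and the function names are illustrative.

```python
from statistics import NormalDist

N = NormalDist()  # standard normal: N.cdf is Phi, N.pdf is phi

def prob(k, xb, cuts):
    """Pr(y = k) given the index xb and sorted cut points."""
    upper = N.cdf(cuts[k - 1] - xb) if k <= len(cuts) else 1.0
    lower = N.cdf(cuts[k - 2] - xb) if k >= 2 else 0.0
    return upper - lower

def marginal_effect(k, xb, beta, cuts):
    """Analytic d Pr(y=k)/dx for a continuous x with coefficient beta."""
    below = N.pdf(xb - cuts[k - 2]) if k >= 2 else 0.0
    above = N.pdf(xb - cuts[k - 1]) if k <= len(cuts) else 0.0
    return beta * (below - above)

cuts = [-1.0, 0.0, 1.0, 2.0]      # hypothetical cut points
beta, xb, h = 0.5, 0.7, 1e-6      # hypothetical coefficient, index, step in x

effects = [marginal_effect(k, xb, beta, cuts) for k in range(1, 6)]
for k, analytic in enumerate(effects, start=1):
    # Moving x by h moves the index by beta * h.
    numeric = (prob(k, xb + beta * h, cuts) - prob(k, xb - beta * h, cuts)) / (2 * h)
    print(k, round(analytic, 6), round(numeric, 6))

print(sum(effects))  # effects across categories cancel out
```

Because the probabilities always sum to one, the marginal effects across the five categories sum to zero: a covariate that raises the chance of one answer must lower the chance of others.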

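Discrete X's
• $x_i\beta = \beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k$
  – $x_{2i}$ is yes or no (1 or 0)
• $\Delta\Pr(y_i=5) = \Phi[\beta_0 + x_{1i}\beta_1 + \beta_2 + x_{3i}\beta_3 + \dots + x_{ki}\beta_k - u_4] - \Phi[\beta_0 + x_{1i}\beta_1 + x_{3i}\beta_3 + \dots + x_{ki}\beta_k - u_4]$
• This is the difference between the probabilities when $x_{2i}=1$ and when $x_{2i}=0$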

Ask for marginal effects
• mfx compute, predict(outcome(5));

• mfx compute, predict(outcome(5));

Marginal effects after oprobit
      y  = Pr(sr_health==5) (predict, outcome(5))
         =  .34103717
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
   male* |   .0471251      .00722    6.53   0.000    .03298   .06127   .438062
     age |  -.0074214      .00031  -23.77   0.000  -.008033  -.00681   39.8412
    educ |   .0303405      .00142   21.42   0.000   .027565  .033116   13.2402
 famincl |   .0880025      .00412   21.37   0.000    .07993  .096075   10.2131
  black* |  -.0781411      .00996   -7.84   0.000  -.097665 -.058617   .124264
othrace* |  -.0843227      .01567   -5.38   0.000  -.115043 -.053602    .04124
  smoke* |  -.0749785      .00773   -9.71   0.000   -.09012 -.059837   .289147
 smoke5* |  -.0545062      .01235   -4.41   0.000  -.078719 -.030294   .081395
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
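The numbers in this table can be approximately reproduced by hand from the oprobit coefficients, the fourth cut point, and the sample means in the X column. The Python sketch below assumes, as the output suggests, that the index is the sum of coefficient times mean with no separate constant (the cut points absorb it); the dictionary and function names are illustrative.

```python
from statistics import NormalDist

N = NormalDist()  # standard normal: N.cdf is Phi, N.pdf is phi

# Coefficients and fourth cut point copied from the oprobit output;
# sample means copied from the X column of the mfx output.
coef = {"male": .1281241, "age": -.0202308, "educ": .0827086, "famincl": .2398957,
        "black": -.221508, "othrace": -.2425083, "smoke": -.2086096, "smoke5": -.1529619}
mean = {"male": .438062, "age": 39.8412, "educ": 13.2402, "famincl": 10.2131,
        "black": .124264, "othrace": .04124, "smoke": .289147, "smoke5": .081395}
cut4 = 3.094606

def index(x):
    """x*beta for a dict of covariate values."""
    return sum(coef[v] * x[v] for v in coef)

def pr_excellent(x):
    """Pr(sr_health == 5) = Phi(x*beta - u4)."""
    return N.cdf(index(x) - cut4)

p5 = pr_excellent(mean)                                   # probability at the means
me_age = coef["age"] * N.pdf(index(mean) - cut4)          # continuous x: beta * phi
d_male = pr_excellent({**mean, "male": 1}) - pr_excellent({**mean, "male": 0})  # dummy: 0 -> 1

print(round(p5, 4), round(me_age, 5), round(d_male, 4))
```

The three results should land close to the reported .341, -.0074 and .0471, which shows that the continuous effect is the derivative formula and the dummy effect is a difference of two probabilities, both evaluated at the sample means.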

Interpret the results
• Males are 4.7 percentage points more likely to report excellent health
• Each additional year of age decreases the chance of reporting excellent health by 0.7 percentage points
• Current smokers are 7.5 percentage points less likely to report excellent health

Minor notes about estimation
• Wald tests/-2 log likelihood tests are done the exact same way as in PROBIT and LOGIT
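As an illustration, the LR chi2(8) statistic in the oprobit output is itself a -2 log likelihood test of the hypothesis that all eight slope coefficients are zero. The sketch below rebuilds it from the tabulated frequencies of sr_health and the reported log likelihood. It relies on the assumption that the restricted model contains only the thresholds, in which case its fitted probabilities equal the sample shares; variable names are illustrative.

```python
from math import log

# Category frequencies from the tabulation of sr_health,
# and the log likelihood reported for the full model.
freq = {1: 342, 2: 991, 3: 3068, 4: 3855, 5: 4644}
ll_full = -16401.987
n = sum(freq.values())

# Thresholds-only model: each person's probability of the observed
# category is that category's sample share.
ll_null = sum(count * log(count / n) for count in freq.values())

lr_stat = 2 * (ll_full - ll_null)      # -2 * (restricted - unrestricted)
pseudo_r2 = 1 - ll_full / ll_null

print(round(ll_null, 2), round(lr_stat, 2), round(pseudo_r2, 4))
```

The statistic is compared with a chi-squared distribution whose degrees of freedom equal the number of restrictions, here 8. The same recipe applies to any nested pair of ordered probit models.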

• Use PRCHANGE to calculate marginal effects for a specific person
  prchange, x(age=40 black=0 othrace=0 smoke5=0 educ=16);
  – When a variable is NOT specified (e.g. famincl), STATA uses the sample mean.

• PRCHANGE will produce results for all outcomes

male
          Avg|Chg|           1           2           3           4           5
 0->1     .0203868   -.0020257  -.00886671  -.02677558  -.01329902   .05096698

age
             Avg|Chg|           1           2           3           4
Min->Max    .13358317    .0184785   .06797072   .17686112   .07064757
   -+1/2    .00321942   .00032518   .00141642   .00424452   .00206241
  -+sd/2    .03728014   .00382077   .01648743   .04910323    .0237889
MargEfct    .00321947   .00032515   .00141639   .00424462   .00206252