Count Models 2
Sociology 8811 Lecture 13
Copyright © 2007 by Evan Schofer
Do not copy or distribute without permission

Announcements
• Paper #1 deadline coming up: March 8
• You should have a dataset by now
• You should have some simple models by now
• If not, you need to do something right away!
• Class schedule
• Today:
  – Talk a bit about papers
  – Wrap up count models
• Thursday: New topic
  – Event History Analysis

Review: Count Models
• Many dependent variables are counts: nonnegative integers
• OLS is inappropriate: linearity and normality assumptions are violated
  – Solution: Poisson & negative binomial models
• Coefficient interpretation is similar to logit
  – Exponentiated coefficients show the multiplicative effect on the rate
  – Poisson assumes there is no overdispersion
    • Skewed variables may lead to overdispersion
    • If overdispersion is identified, use the negative binomial model
  – The negative binomial model offers a chi-square test to identify overdispersion
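A quick way to see what "overdispersion" means is to compare a count variable's variance to its mean, since a Poisson model implies the two are equal. The sketch below uses made-up counts (not data from the lecture) and assumes the mean-dispersion (NB2) variance function, mu + alpha * mu^2, for the negative binomial; the function name `nb2_variance` is purely illustrative.

```python
from statistics import mean, variance

# Hypothetical count data (NOT from the lecture): many zeros, long right tail.
counts = [0, 0, 0, 0, 1, 1, 2, 3, 5, 8, 12, 30]

m = mean(counts)
v = variance(counts)  # sample variance (n - 1 denominator)

# Poisson implies variance == mean, so a ratio well above 1 hints at overdispersion.
dispersion_ratio = v / m


def nb2_variance(mu, alpha):
    """Variance implied by the NB2 (mean-dispersion) form: mu + alpha * mu^2.

    With alpha = 0 this collapses to the Poisson variance, mu.
    """
    return mu + alpha * mu ** 2


print(f"mean = {m:.2f}, variance = {v:.2f}, variance/mean = {dispersion_ratio:.2f}")
print("NB2 variance at alpha = 0 equals the mean:", nb2_variance(m, 0.0) == m)
```

This is only a descriptive check on the raw outcome; the formal test discussed in the lecture conditions on the covariates in the model.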

Negative Binomial Example: Web Use
• Note: Info on overdispersion is provided

Negative binomial regression                      Number of obs   =       1552
                                                  LR chi2(5)      =      57.80
                                                  Prob > chi2     =     0.0000
Log likelihood = -4368.6846                       Pseudo R2       =     0.0066

------------------------------------------------------------------------------
       wwwhr |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .3617049   .0634391     5.70   0.000     .2373666    .4860433
         age |  -.0109788   .0024167    -4.54   0.000    -.0157155    -.006242
        educ |   .0171875   .0120853     1.42   0.155    -.0064992    .0408742
   lowincome |  -.0916297   .0724074    -1.27   0.206    -.2335457    .0502862
      babies |  -.1238295   .0624742    -1.98   0.047    -.2462767   -.0013824
       _cons |   1.881168   .1966654     9.57   0.000     1.495711    2.266625
-------------+----------------------------------------------------------------
    /lnalpha |   .2979718   .0408267                       .217953    .3779907
-------------+----------------------------------------------------------------
       alpha |   1.347124   .0549986                      1.243529    1.459349
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) = 8459.61  Prob>=chibar2 = 0.000

Alpha is clearly > 0! Overdispersion is evident; LR test p < .05
You should not use Poisson regression in this case
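To make the "multiplicative effect on the rate" interpretation concrete, the sketch below exponentiates the coefficients reported in the negative binomial output above. The helper names `rate_ratio` and `percent_change` are illustrative, not Stata functions. It also checks that the reported alpha is simply exp(/lnalpha).

```python
import math

# Coefficients copied from the negative binomial output above.
coefs = {
    "male": 0.3617049,
    "age": -0.0109788,
    "educ": 0.0171875,
    "lowincome": -0.0916297,
    "babies": -0.1238295,
}


def rate_ratio(b):
    """Multiplicative change in the expected count per one-unit increase in x."""
    return math.exp(b)


def percent_change(b):
    """The same effect expressed as a percent change in the expected count."""
    return 100.0 * (math.exp(b) - 1.0)


for name, b in coefs.items():
    print(f"{name:10s} rate ratio = {rate_ratio(b):.3f}   % change = {percent_change(b):+.1f}")

# Stata estimates ln(alpha) and reports alpha = exp(lnalpha).
alpha = math.exp(0.2979718)
print(f"alpha = {alpha:.6f}")
```

For instance, the male coefficient corresponds to a rate ratio of about 1.44, i.e. roughly 44% more hours of web use for men, holding the other variables constant.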

General Remarks
• It is often useful to try both Poisson and negative binomial models
• The latter allows you to test for overdispersion
• Use the LR test on alpha (α) to guide model choice
  – If you don’t suspect overdispersion and alpha appears to be zero, use Poisson regression
    • It makes fewer assumptions
      – Such as gamma-distributed error
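The LR test on alpha compares the log likelihoods of the Poisson and negative binomial fits. A minimal sketch of the arithmetic follows, using only the standard library. The function name `lr_test_alpha` is illustrative. The Poisson log likelihood used in the example is not reported on the slides; it is back-calculated here from the reported negative binomial log likelihood and the reported chibar2 statistic, so treat it as an assumption.

```python
import math


def lr_test_alpha(ll_poisson, ll_negbin):
    """LR test of alpha = 0. Returns (statistic, p_value).

    alpha = 0 lies on the boundary of the parameter space, so the reference
    distribution is a 50:50 mixture of a point mass at zero and a chi-square
    with 1 df (what Stata labels "chibar2(01)"). The chi-square(1) tail
    probability is therefore halved.
    """
    stat = 2.0 * (ll_negbin - ll_poisson)
    if stat <= 0:
        return 0.0, 1.0
    # Upper tail of a chi-square with 1 df: P(Z^2 > stat) = erfc(sqrt(stat / 2)).
    p_chi2_1 = math.erfc(math.sqrt(stat / 2.0))
    return stat, 0.5 * p_chi2_1


# Reported on the slide: negative binomial log likelihood.
ll_nb = -4368.6846
# ASSUMPTION: implied Poisson log likelihood, back-calculated as
# ll_nb - 8459.61 / 2 from the reported chibar2 statistic.
ll_pois = -8598.4896

stat, p_value = lr_test_alpha(ll_pois, ll_nb)
print(f"chibar2(01) = {stat:.2f}, p = {p_value:.4f}")
```

A p-value this small is what leads the lecture to reject Poisson in favor of the negative binomial model for the web use data.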

Example: Labor Militancy
Isaac & Christiansen 2002
Note: Results are presented as % change

Zero-Inflated Poisson & NB Reg
• If the outcome variable has many zero values, it tends to be highly skewed
• Under those circumstances, nbreg works better than ordinary Poisson due to overdispersion
  – But sometimes you have LOTS of zeros, and even nbreg isn’t sufficient
    • Model under-predicts zeros, doesn’t fit well
  – Examples:
    • # violent crimes committed by a person in a year
    • # of wars a country fights per year
    • # of foreign subsidiaries of firms

Zero-Inflated Poisson & NB Reg
• Logic of zero-inflated models: Assume two groups in your sample
  • Type A: Always zero, with no probability of a non-zero value
  • Type ~A: Non-zero chance of a positive count value
    – Probability is variable, but not zero
  – 1. Use logit to model group membership
  – 2. Use Poisson or nbreg to model counts for those in group ~A
  – 3. Compute probabilities based on those results
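The three steps above can be sketched for the Poisson case as follows. Zeros come from two sources: the "always zero" group, and members of group ~A who happen to have a count of zero. The values of `mu` and `p_always_zero` are hypothetical, chosen only for illustration. In a fitted model they would come from the count equation and the logit equation respectively.

```python
import math


def poisson_pmf(k, mu):
    """Ordinary Poisson probability of observing count k with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def zip_pmf(k, mu, p_always_zero):
    """P(Y = k) under a zero-inflated Poisson.

    p_always_zero: probability of belonging to the "always zero" group (Type A).
    mu: Poisson mean for everyone else (Type ~A).
    """
    p_count = (1.0 - p_always_zero) * poisson_pmf(k, mu)
    if k == 0:
        # Zeros arise from Type A cases plus Type ~A cases that draw a zero.
        return p_always_zero + p_count
    return p_count


# Hypothetical values, for illustration only.
mu = 3.0
p_always_zero = 0.4

print(f"P(0) plain Poisson      = {poisson_pmf(0, mu):.3f}")
print(f"P(0) zero-inflated      = {zip_pmf(0, mu, p_always_zero):.3f}")
print(f"sum of ZIP probs, 0..49 = {sum(zip_pmf(k, mu, p_always_zero) for k in range(50)):.6f}")
```

The zero-inflated negative binomial follows the same logic, with the negative binomial probability in place of `poisson_pmf`.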

Zero-Inflated Poisson & NB Reg
• Example: Web usage at work
• More skewed than overall web usage. Why? Many people don’t have computers at work! So, web usage is zero for many

Zero-Inflated Poisson & NB Reg
• Zero-inflated models in Stata
  • “zip” = Poisson, “zinb” = negative binomial
• Commands accept two separate variable lists
  – Variables that affect counts
    • For those outside the “zero” group
    • Modeled with Poisson or NB regression
  – Variables that predict membership in “zero” group
    • Modeled with logit
  – Ex: zinb webatwork male age educ lowincome babies, inflate(male age educ lowincome babies)

ZINB Example: Web Hrs at Work
• “Inflate” output = logit for group membership

Zero-inflated negative binomial regression        Number of obs   =       1135
                                                  Nonzero obs     =        562
                                                  Zero obs        =        573

Inflation model = logit                           LR chi2(5)      =      13.25
Log likelihood  = -2239.23                        Prob > chi2     =     0.0212

------------------------------------------------------------------------------
             |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
webatwork    |
        male |   .2348353   .1298324     1.81   0.070    -.0196315    .4893021
         age |  -.0152071   .0053766    -2.83   0.005    -.0257451   -.0046692
        educ |   .0126503   .0265321     0.48   0.634    -.0393517    .0646523
   lowincome |  -.4183108   .2164324    -1.93   0.053    -.8425105    .0058889
      babies |   .0588977   .1385245     0.43   0.671    -.2126053    .3304008
       _cons |   1.703158   .4538886     3.75   0.000     .8135524    2.592763
-------------+----------------------------------------------------------------
inflate      |
        male |   .2630493    .340892     0.77   0.440    -.4050866    .9311853
         age |  -.0197401   .0195075    -1.01   0.312     -.057974    .0184939
        educ |  -.3601863    .071167    -5.06   0.000    -.4996711   -.2207015
   lowincome |    .844378   .4013074     2.10   0.035     .0578299    1.630926
      babies |   .4504404   .2502363     1.80   0.072    -.0400138    .9408947
       _cons |   4.137417   1.172503     3.53   0.000     1.839354     6.43548
------------------------------------------------------------------------------

“inflate” equation = model predicting zero group
Education reduces odds of zero value
But doesn’t have an effect on count for those that are non-zero
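The inflate coefficients are ordinary logit coefficients, so they can be turned into a predicted probability of belonging to the "always zero" group. The sketch below does this for two hypothetical respondents who differ only in education. The variable coding (age and educ in years; male, lowincome and babies as 0/1 indicators) is an assumption, since the slides do not document it, and `prob_always_zero` is an illustrative helper name.

```python
import math

# Coefficients copied from the "inflate" (logit) block of the ZINB output above.
inflate = {
    "_cons": 4.137417,
    "male": 0.2630493,
    "age": -0.0197401,
    "educ": -0.3601863,
    "lowincome": 0.844378,
    "babies": 0.4504404,
}


def prob_always_zero(person):
    """Predicted probability of membership in the "always zero" group."""
    xb = inflate["_cons"] + sum(inflate[name] * value for name, value in person.items())
    return 1.0 / (1.0 + math.exp(-xb))


# Hypothetical respondents; the coding of each variable is ASSUMED.
person_12 = {"male": 1, "age": 40, "educ": 12, "lowincome": 0, "babies": 0}
person_16 = dict(person_12, educ=16)

p12 = prob_always_zero(person_12)
p16 = prob_always_zero(person_16)
print(f"P(always zero | educ = 12) = {p12:.3f}")
print(f"P(always zero | educ = 16) = {p16:.3f}")

# Odds ratio per additional unit of educ.
print(f"odds ratio for educ = {math.exp(inflate['educ']):.3f}")
```

The odds ratio for educ is about 0.70, so each additional unit of education cuts the odds of being in the zero group by roughly 30%, which is the "Education reduces odds of zero value" annotation on the slide.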

Zero-Inflated Poisson & NB Reg
• Remarks
  – ZINB produces an estimate of alpha
    • Helps choose between zip & zinb
  – Long and Freese (2006) have a helpful tool to compare fit of count models: countfit
    • See textbook
  – Zero-inflated models seem very useful
    • Count variables often have many zeros
    • It is often reasonable to assume an “always zero” group
  – But, they are fairly new
    • Not many examples in the literature
    • Haven’t been widely scrutinized

Zero-truncated Poisson & NB reg
• Truncation: the absence of information about cases in some range of a variable
• Example: Suppose we study income based on data from tax returns…
  – Cases with income below a certain value are not required to submit a tax return… so data are missing
• Example: Data on # crimes committed, taken from legal records
  – Individuals with zero crimes are not evident in data
• Example: An on-line survey of web use
  – Individuals with zero web use are not in data
• Poisson & NB have been adapted to address truncated data:
  – Zero-truncated Poisson & zero-truncated NB reg
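The adaptation for zero truncation amounts to rescaling the usual probabilities so that they sum to one over the positive counts only. A minimal sketch for the Poisson case follows; the function names and the value of `mu` are illustrative assumptions.

```python
import math


def poisson_pmf(k, mu):
    """Ordinary Poisson probability of observing count k with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def zt_poisson_pmf(k, mu):
    """P(Y = k | Y > 0): Poisson probability rescaled by 1 - P(Y = 0)."""
    if k < 1:
        return 0.0  # zeros cannot appear in zero-truncated data
    return poisson_pmf(k, mu) / (1.0 - math.exp(-mu))


def zt_poisson_mean(mu):
    """Mean of the observed (truncated) counts; always larger than mu."""
    return mu / (1.0 - math.exp(-mu))


mu = 1.5  # hypothetical Poisson mean, for illustration only
print(f"P(Y = 1) untruncated = {poisson_pmf(1, mu):.3f}")
print(f"P(Y = 1) truncated   = {zt_poisson_pmf(1, mu):.3f}")
print(f"observed mean        = {zt_poisson_mean(mu):.3f} (vs. mu = {mu})")
```

Fitting an ordinary Poisson to truncated data would treat the inflated observed mean as if it were mu, which is why the truncated versions of the models are needed.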

Example: Zero-truncated NB Reg
• Web use (zeros removed)

Zero-truncated negative binomial regression       Number of obs   =       1304
                                                  LR chi2(5)      =      34.87
Dispersion     = mean                             Prob > chi2     =     0.0000
Log likelihood = -3653.162                        Pseudo R2       =     0.0047

------------------------------------------------------------------------------
       wwwhr |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .3744582   .0874595     4.28   0.000     .2030407    .5458758
         age |  -.0114399   .0033817    -3.38   0.001    -.0180679   -.0048119
        educ |   .0081191    .016731     0.49   0.627     -.024673    .0409112
   lowincome |   .1899431   .1111248     1.71   0.087    -.0278574    .4077437
      babies |  -.1375942   .0860954    -1.60   0.110     -.306338    .0311496
       _cons |   1.533013   .2907837     5.27   0.000     .9630872    2.102938
-------------+----------------------------------------------------------------
    /lnalpha |   1.099164   .1385789                      .8275543    1.370774
-------------+----------------------------------------------------------------
       alpha |   3.001656   .4159661                      2.287717    3.938396
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) = 6857.67  Prob>=chibar2 = 0.000

Coefficient interpretation works just like ordinary Poisson or NB regression.

Empirical Example 2
• Example: Haynie, Dana L. 2001. “Delinquent Peers Revisited: Does Network Structure Matter?” American Journal of Sociology, 106, 4: 1013-1057.