
NOMINAL RESPONSES: BASELINE-CATEGORY LOGIT MODELS (Agresti 7.1)
Kathy Fung and Lin Zhang
Statistics 6841 Project, Winter 2005

Objective Introduction of NOMINAL RESPONSES (BASELINE CATEGORY LOGIT MODELS) The Concept and Example 2020/9/16 2

Model Definition
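A sketch of the model in the usual notation of Agresti 7.1, assuming J response categories with the last category J as the baseline and x a vector of explanatory variables. The symbols below are assumptions, since the slide text does not define them:

```latex
\log \frac{\pi_j(\mathbf{x})}{\pi_J(\mathbf{x})}
  = \alpha_j + \boldsymbol{\beta}_j' \mathbf{x},
  \qquad j = 1, \ldots, J-1,
\qquad \text{where } \pi_j(\mathbf{x}) = P(Y = j \mid \mathbf{x})
\text{ and } \sum_{j=1}^{J} \pi_j(\mathbf{x}) = 1 .
```

These J-1 equations also determine the logit for any other pair of categories a and b, because log(π_a/π_b) = log(π_a/π_J) - log(π_b/π_J).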

Some Notes:
• With categorical predictors, the X² and G² goodness-of-fit statistics provide a model check when the data are not sparse.
• When an explanatory variable is continuous or the data are sparse, these statistics are still valid for comparing nested models that differ by relatively few terms.

Alligator Food Choice Example

SAS code for Table 7.1

* SAS for Baseline Category Logit Models with Alligator Data in Table 7.1;
data gator;
  infile 'K:\CSU Hayward\Stat 6841\project\gator.txt';
  input lake gender size food count;
run;

* Baseline-category logit model with food=1 as the reference category;
proc logistic;
  freq count;
  class lake size / param=ref;
  model food(ref='1') = lake size / link=glogit aggregate scale=none;
run;

* Same main-effects model in CATMOD, with predicted frequencies and probabilities;
proc catmod;
  weight count;
  population lake size gender;
  model food = lake size / pred=freq pred=prob;
run;

Output

The LOGISTIC Procedure

Model Information
Data Set                       WORK.GATOR
Response Variable              food
Number of Response Levels      5
Frequency Variable             count
Model                          generalized logit
Optimization Technique         Fisher's scoring

Number of Observations Read    80
Number of Observations Used    56
Sum of Frequencies Read        219
Sum of Frequencies Used        219

Response Profile
Ordered Value    food    Total Frequency
1                1       94
2                2       61
3                3       19
4                4       13
5                5       32

Logits modeled use food=1 as the reference category.

NOTE: 24 observations having nonpositive frequencies or weights were excluded since they do not contribute to the analysis.

Output

Class Level Information
Class    Value    Design Variables
lake     1        1  0  0
         2        0  1  0
         3        0  0  1
         4        0  0  0
size     1        1
         2        0

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Output

Deviance and Pearson Goodness-of-Fit Statistics
Criterion    Value      DF    Value/DF    Pr > ChiSq
Deviance     17.0798    12    1.4233      0.1466
Pearson      15.0429    12    1.2536      0.2391

Number of unique profiles: 8

Model Fit Statistics
Criterion    Intercept Only    Intercept and Covariates
AIC          612.363           580.080
SC           625.919           647.862
-2 Log L     604.363           540.080

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio    64.2826       16    <.0001
Score               57.2475       16    <.0001
Wald                49.7584       16    <.0001
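The fit statistics in this output can be cross-checked by hand. The sketch below assumes the usual definitions AIC = -2 log L + 2p and SC = -2 log L + p ln(n), with n taken as the sum of frequencies (219) and p as 4 intercepts for the null model and 4 + 16 = 20 parameters for the fitted model. The function names are illustrative only and are not part of SAS.

```python
import math

def aic(neg2_log_l: float, n_params: int) -> float:
    """Akaike criterion: -2 log L + 2p."""
    return neg2_log_l + 2 * n_params

def sc(neg2_log_l: float, n_params: int, n_obs: int) -> float:
    """Schwarz criterion: -2 log L + p * ln(n)."""
    return neg2_log_l + n_params * math.log(n_obs)

N = 219                              # sum of frequencies used (from the output)
NEG2LL_NULL, P_NULL = 604.363, 4     # intercept-only model: 4 intercepts
NEG2LL_FULL, P_FULL = 540.080, 20    # assumed: 4 intercepts + 16 slope parameters

aic_null = aic(NEG2LL_NULL, P_NULL)
aic_full = aic(NEG2LL_FULL, P_FULL)
sc_null = sc(NEG2LL_NULL, P_NULL, N)
sc_full = sc(NEG2LL_FULL, P_FULL, N)

# The likelihood-ratio statistic is the drop in -2 log L between the two models.
lr_stat = NEG2LL_NULL - NEG2LL_FULL

print(f"AIC: {aic_null:.3f} (null), {aic_full:.3f} (fitted)")
print(f"SC:  {sc_null:.3f} (null), {sc_full:.3f} (fitted)")
print(f"LR chi-square: {lr_stat:.3f} on {P_FULL - P_NULL} df")
```

Under these assumptions the values agree with the printed AIC, SC, and likelihood-ratio chi-square up to rounding of -2 Log L.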

Output

Type 3 Analysis of Effects
Effect    DF    Wald Chi-Square    Pr > ChiSq
lake      12    35.4890            0.0004
size       4    18.7593            0.0009

Analysis of Maximum Likelihood Estimates
Parameter      food    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept      2       1     -1.5490     0.4249            13.2890            0.0003
Intercept      3       1     -3.3139     1.0528             9.9081            0.0016
Intercept      4       1     -2.0931     0.6622             9.9894            0.0016
Intercept      5       1     -1.9043     0.5258            13.1150            0.0003
lake 1         2       1     -1.6583     0.6129             7.3216            0.0068
lake 1         3       1      1.2422     1.1852             1.0985            0.2946
lake 1         4       1      0.6951     0.7813             0.7916            0.3736

Output

Analysis of Maximum Likelihood Estimates (continued)
Parameter      food    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
lake 1         5       1      0.8262     0.5575             2.1959            0.1384
lake 2         2       1      0.9372     0.4719             3.9443            0.0470
lake 2         3       1      2.4583     1.1179             4.8360            0.0279
lake 2         4       1     -0.6532     1.2021             0.2953            0.5869
lake 2         5       1      0.00565    0.7766             0.0001            0.9942
lake 3         2       1      1.1220     0.4905             5.2321            0.0222
lake 3         3       1      2.9347     1.1161             6.9131            0.0086
lake 3         4       1      1.0878     0.8417             1.6703            0.1962
lake 3         5       1      1.5164     0.6214             5.9541            0.0147
size 1         2       1      1.4582     0.3959            13.5634            0.0002
size 1         3       1     -0.3513     0.5800             0.3668            0.5448
size 1         4       1     -0.6307     0.6425             0.9635            0.3263
size 1         5       1      0.3316     0.4483             0.5471            0.4595

Output

Odds Ratio Estimates
Effect           food    Point Estimate    95% Wald Confidence Limits
lake 1 vs 4      2        0.190            0.057      0.633
lake 1 vs 4      3        3.463            0.339     35.343
lake 1 vs 4      4        2.004            0.433      9.266
lake 1 vs 4      5        2.285            0.766      6.814
lake 2 vs 4      2        2.553            1.012      6.437
lake 2 vs 4      3       11.685            1.306    104.508
lake 2 vs 4      4        0.520            0.049      5.490
lake 2 vs 4      5        1.006            0.219      4.608
lake 3 vs 4      2        3.071            1.174      8.032
lake 3 vs 4      3       18.815            2.111    167.717
lake 3 vs 4      4        2.968            0.570     15.447
lake 3 vs 4      5        4.556            1.348     15.400
size 1 vs 2      2        4.298            1.978      9.339
size 1 vs 2      3        0.704            0.226      2.194
size 1 vs 2      4        0.532            0.151      1.875
size 1 vs 2      5        1.393            0.579      3.354

Output

The CATMOD Procedure

Data Summary
Response           food     Response Levels     5
Weight Variable    count    Populations         16
Data Set           GATOR    Total Frequency     219
Frequency Missing  0        Observations        56

Population Profiles
Sample    lake    size    gender    Sample Size
1         1       1       1         13
2         1       1       2         26
3         1       2       1          7
4         1       2       2          9
5         2       1       1          5
6         2       1       2         15
7         2       2       1         26
8         2       2       2          2
9         3       1       1         12
10        3       1       2         12
11        3       2       1         28
12        3       2       2          1
13        4       1       1         27
14        4       1       2         14
15        4       2       1         12
16        4       2       2         10

Output

Response Profiles
Response    food
1           1
2           2
3           3
4           4
5           5

Maximum Likelihood Analysis
Maximum likelihood computations converged.

Maximum Likelihood Analysis of Variance
Source              DF    Chi-Square    Pr > ChiSq
Intercept            4    70.39         <.0001
lake                12    35.49         0.0004
size                 4    18.76         0.0009
Likelihood Ratio    44    52.48         0.1784

Output

Analysis of Maximum Likelihood Estimates
Parameter    Level    Function Number    Estimate    Standard Error    Chi-Square    Pr > ChiSq
Intercept             1                   1.1514     0.2343            24.14         <.0001
Intercept             2                   0.4317     0.2737             2.49         0.1147
Intercept             3                  -0.6795     0.3818             3.17         0.0751
Intercept             4                  -0.9745     0.4049             5.79         0.0161
lake         1        1                  -0.2391     0.3458             0.48         0.4892
lake         1        2                  -1.9977     0.4946            16.31         <.0001
lake         1        3                  -0.6556     0.6071             1.17         0.2802
lake         1        4                   0.1736     0.5654             0.09         0.7589
lake         2        1                   0.5814     0.5061             1.32         0.2506
lake         2        2                   1.4184     0.5250             7.30         0.0069
lake         2        3                   1.3810     0.6279             4.84         0.0278
lake         2        4                  -0.3542     0.9153             0.15         0.6988
lake         3        1                  -0.9293     0.3836             5.87         0.0154
lake         3        2                   0.0925     0.3910             0.06         0.8131
lake         3        3                   0.3467     0.5130             0.46         0.4991
lake         3        4                  -0.1240     0.5830             0.05         0.8316
size         1        1                  -0.1658     0.2241             0.55         0.4595
size         1        2                   0.5633     0.2525             4.98         0.0257
size         1        3                  -0.3414     0.3257             1.10         0.2945
size         1        4                  -0.4811     0.3564             1.82         0.1770
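The CATMOD estimates look different from the LOGISTIC estimates even though both procedures fit the same main-effects model. The sketch below assumes CATMOD's default behaviour: each function j is the logit of food=j against the last response level (food=5), and predictors use deviation-from-the-mean (effect) coding, so the effects of a factor sum to zero. Under those assumptions, the LOGISTIC contrasts for food=2 against food=1 can be recovered from the CATMOD numbers. The variable and function names are illustrative.

```python
# CATMOD estimates copied from the output, keyed by function number
# (function j is assumed to be log(pi_j / pi_5)).
catmod_size_level1 = {1: -0.1658, 2: 0.5633}
catmod_lake = {
    1: {1: -0.2391, 2: -1.9977},
    2: {1: 0.5814, 2: 1.4184},
    3: {1: -0.9293, 2: 0.0925},
}

def food2_vs_food1(by_function):
    """Effect on log(pi_2 / pi_1) = function 2 minus function 1."""
    return by_function[2] - by_function[1]

# With effect coding the size-2 effect is minus the size-1 effect,
# so the size 1 vs size 2 contrast is twice the size-1 effect.
size_1_vs_2 = 2 * food2_vs_food1(catmod_size_level1)

# Lake effects sum to zero, which gives the omitted lake-4 effect.
lake_effects = {lake: food2_vs_food1(est) for lake, est in catmod_lake.items()}
lake_effects[4] = -sum(lake_effects.values())
lake_1_vs_4 = lake_effects[1] - lake_effects[4]

print(f"size 1 vs 2, food 2 vs 1: {size_1_vs_2:.4f}")
print(f"lake 1 vs 4, food 2 vs 1: {lake_1_vs_4:.4f}")
```

These two contrasts agree, up to rounding, with the LOGISTIC estimates for size 1 (1.4582) and lake 1 (-1.6583) in the food=2 equation.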

Table 7.2

Some Test Results for Table 7.2
• The data are sparse: 219 observations are scattered among 80 cells. Thus, G² is more reliable for comparing models than for testing fit.
• The statistics G²[( ) | (G)] = 2.1 and G²[(L + S) | (G + L + S)] = 2.2, each based on df = 4, suggest simplifying by collapsing the table over gender. (Other analyses, not presented here, show that adding interaction terms that include G does not improve the fit significantly.)
• The G² and X² values for the collapsed table indicate that both L and S have effects.
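To see why G² values of 2.1 and 2.2 on df = 4 support dropping gender, one can compute their chi-square p-values. The sketch below uses only the standard library and relies on the closed form of the chi-square upper tail for even degrees of freedom, P(X > x) = exp(-x/2) Σ_{k=0}^{df/2-1} (x/2)^k / k!. The helper name is hypothetical.

```python
import math

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable; even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

p_gender_alone = chi2_sf_even_df(2.1, 4)       # G^2[( ) | (G)]
p_gender_given_ls = chi2_sf_even_df(2.2, 4)    # G^2[(L + S) | (G + L + S)]

print(f"p-value for G^2 = 2.1, df = 4: {p_gender_alone:.3f}")
print(f"p-value for G^2 = 2.2, df = 4: {p_gender_given_ls:.3f}")
```

Both p-values are roughly 0.7, so there is no evidence that gender improves the fit.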

Table 7.3

Table 7.4

Prediction Equation for Log Odds of Selecting Invertebrates Instead of Fish

\[ \log(\hat{\pi}_I / \hat{\pi}_F) = -1.55 + 1.46\,s - 1.66\,z_H + 0.94\,z_O + 1.12\,z_T \]

• where s = 1 for size ≤ 2.3 meters and 0 otherwise,
• z_H is a dummy variable for Lake Hancock (z_H = 1 for alligators in that lake and 0 otherwise), and
• z_O and z_T are dummy variables for Lakes Oklawaha and Trafford.
• Size of alligators has a noticeable effect. For a given lake, the estimated odds that the primary food choice was invertebrates instead of fish are exp(1.46) = 4.3 times higher for small alligators than for large alligators;
• the Wald 95% confidence interval is exp[1.46 ± 1.96(0.396)] = (2.0, 9.3).
• The lake effects indicate that the estimated odds that the primary food choice was invertebrates instead of fish are relatively higher at Lakes Trafford and Oklawaha and relatively lower at Lake Hancock than they are at Lake George.
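The odds ratio and Wald interval for the size effect can be reproduced in a few lines. This sketch uses the unrounded estimate and standard error from the LOGISTIC output (1.4582 and 0.3959) instead of the rounded 1.46 and 0.396, and takes 1.96 as the normal quantile. The function name is illustrative.

```python
import math

def odds_ratio_with_wald_ci(estimate: float, std_error: float, z: float = 1.96):
    """Return (odds ratio, lower limit, upper limit) on the odds scale."""
    lower = math.exp(estimate - z * std_error)
    upper = math.exp(estimate + z * std_error)
    return math.exp(estimate), lower, upper

# Size effect in the invertebrate-versus-fish logit (size 1, food 2).
odds_ratio, ci_lower, ci_upper = odds_ratio_with_wald_ci(1.4582, 0.3959)

print(f"odds ratio: {odds_ratio:.1f}")
print(f"95% Wald CI: ({ci_lower:.1f}, {ci_upper:.1f})")
```

Exponentiating the endpoints of the interval for the log odds ratio keeps the interval inside the positive range, which is why it is not symmetric around 4.3.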

Further Estimate Calculation

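Estimating Response Probabilities (Model)
The equation that expresses multinomial logit models directly in terms of response probabilities is

\[ \pi_j(\mathbf{x}) = \frac{\exp(\alpha_j + \boldsymbol{\beta}_j' \mathbf{x})}{1 + \sum_{h=1}^{J-1} \exp(\alpha_h + \boldsymbol{\beta}_h' \mathbf{x})}, \qquad j = 1, \ldots, J-1, \]

where J is the baseline category, so that \( \pi_J(\mathbf{x}) = 1 / [1 + \sum_{h=1}^{J-1} \exp(\alpha_h + \boldsymbol{\beta}_h' \mathbf{x})] \).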

Estimating Response Probabilities (Results)
• From Table 7.4, the estimated probability that a large alligator in Lake Hancock has invertebrates as the primary food choice is

\[ \hat{\pi}_I = \frac{e^{-1.55 - 1.66}}{1 + e^{-1.55 - 1.66} + e^{-3.31 + 1.24} + e^{-2.09 + 0.70} + e^{-1.90 + 0.83}} = 0.023. \]

• The estimated probabilities for reptile, bird, other, and fish are 0.072, 0.141, 0.194, and 0.570.
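The same calculation can be sketched in code. It assumes that food levels 2 to 5 in the LOGISTIC output correspond to invertebrate, reptile, bird, and other, with fish (food=1) as the baseline, and that lake 1 is Lake Hancock. The estimates are the unrounded values from the output, and the function name is illustrative.

```python
import math

# Intercepts and Lake Hancock effects for each logit against fish.
intercepts = {"invertebrate": -1.5490, "reptile": -3.3139,
              "bird": -2.0931, "other": -1.9043}
hancock = {"invertebrate": -1.6583, "reptile": 1.2422,
           "bird": 0.6951, "other": 0.8262}

def response_probabilities(linear_predictors, baseline="fish"):
    """Turn baseline-category logits into probabilities for all categories."""
    exps = {food: math.exp(eta) for food, eta in linear_predictors.items()}
    denominator = 1.0 + sum(exps.values())
    probs = {food: value / denominator for food, value in exps.items()}
    probs[baseline] = 1.0 / denominator
    return probs

# A large alligator has size indicator s = 0, so only the intercept
# and the lake term enter each linear predictor.
eta_large_hancock = {food: intercepts[food] + hancock[food] for food in intercepts}
probs = response_probabilities(eta_large_hancock)

for food, p in probs.items():
    print(f"{food:>12}: {p:.3f}")
```

Because every category shares the same denominator, the five estimated probabilities sum to one.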

Quality vs. Quantity

Summary and Conclusion