GENERALIZED LINEAR MODELS Copyright 2013 SAS Institute Inc

GENERALIZED LINEAR MODELS Copyright © 2013, SAS Institute Inc. All rights reserved.

GENERALIZED LINEAR MODELS: OVERVIEW

OVERVIEW: GENERAL LINEAR MODELS
Actually, PROC GLM.

OVERVIEW: GENERALIZED LINEAR MODELS
…
• The distribution of the observations can come from the exponential family of distributions.
• The variance of the response variable is a specified function of its mean.
• $X\beta$ is fit to a function of $E(y)$ (called a link function) suggested by the distribution of the observations:
  $g(E(y)) = g(\mu) = X\beta$, where $g$ is the link function.

OVERVIEW: LOGIT LINK FUNCTION FOR BINARY RESPONSE
[Figure: the logit transform takes p_i, plotted against the predictor, to logit(p_i), plotted against the predictor.]

OVERVIEW: LOG LINK FUNCTION FOR COUNT DATA
[Figure: the log transform takes the count, plotted against the predictor, to log(count), plotted against the predictor.]
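
The two link functions named on the slides above can be sketched numerically. This is a minimal plain-Python illustration, not SAS code; the function names are made up for this example. It shows only that each link maps the mean onto the whole real line and that its inverse maps a linear predictor back to a valid mean.

```python
import math

def logit(p):
    """Logit link: maps a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse logit: maps a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def log_link(mu):
    """Log link: maps a positive mean count to the real line."""
    return math.log(mu)

def inv_log_link(eta):
    """Inverse log link: any linear predictor gives a positive mean."""
    return math.exp(eta)

if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        print("p =", p, "logit(p) =", logit(p))
    for mu in (0.5, 1.0, 10.0):
        print("mu =", mu, "log(mu) =", log_link(mu))
```

Because the inverse links always return a probability in (0, 1) or a positive count, the fitted means stay in range however extreme the predictor values are.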

OVERVIEW: EXAMPLES OF GENERALIZED LINEAR MODELS
*Models often use the LOG link in practice.

POISSON REGRESSION

POISSON REGRESSION: PROPERTIES AND EXAMPLES
Poisson regression
• is one type of generalized linear model
• assumes that the response variable follows a Poisson distribution conditional on the values of the predictor variables
• can be used to model the number of occurrences of an event of interest, or the rate of occurrence of an event of interest, as a function of some predictor variables
• is most appropriate for rare events
  • The response distribution should have a small mean (<10, or even <5, and ideally ~1).
  • If not, a gamma or lognormal distribution could be a better choice.
Examples include
• number of ear infections in infants
• number of equipment failures
• colony counts for bacteria or viruses
• counts of a rare disease in a population
• number of fatal crashes at an intersection
• homicide rates in a given state
• rate of insurance claims
• number of infected areas per unit volume of a tree
• response rates to a marketing campaign

POISSON REGRESSION: POISSON VERSUS NORMAL DISTRIBUTION
Poisson distribution
• is skewed to the right for rare events
• is for nonnegative integer values
• has only one parameter (the mean)
• has a variance that is equal to the mean
Normal distribution
• is symmetric
• can be for negative as well as positive real values
• has two unrelated parameters (mean and variance)
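
The "variance equal to the mean" property can be checked directly from the Poisson probability mass function. The sketch below is plain Python with illustrative names of its own (nothing here comes from SAS). It truncates the infinite sum at `k_max`, which is an approximation that is harmless for the small means discussed here.

```python
import math

def poisson_pmf(k, mu):
    """P(Y = k) for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

def poisson_moments(mu, k_max=100):
    """Mean and variance computed from the pmf, truncated at k_max."""
    probs = [poisson_pmf(k, mu) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var

if __name__ == "__main__":
    for mu in (1.0, 4.0):
        print("mu =", mu, "(mean, variance) =", poisson_moments(mu))
```

For a small mean, most of the probability sits at 0 and 1 with a long tail to the right, which is the right skew the slide mentions.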

POISSON REGRESSION: MODEL

POISSON REGRESSION: PARAMETER ESTIMATES
$e^{\beta}$ is the multiplicative effect on $\mu$ for a one-unit change in X.
Example 1: if $e^{\beta_1} = 1.20$, then a one-unit increase in $X_1$ yields a 20% increase in the estimated mean.
Example 2: if $e^{\beta_2} = 0.80$, then a one-unit increase in $X_2$ yields a 20% decrease in the estimated mean.
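
The arithmetic behind the two examples can be sketched as follows, assuming a log link so that the mean is multiplied by exp(beta) for each one-unit increase in the predictor. The function name `percent_change` is hypothetical, chosen for this illustration only.

```python
import math

def percent_change(beta, delta_x=1.0):
    """Percent change in the estimated mean for a delta_x change in X,
    under a log link: mean is multiplied by exp(beta * delta_x)."""
    return (math.exp(beta * delta_x) - 1.0) * 100.0

# Coefficients chosen so that exp(beta) matches the two slide examples.
beta_1 = math.log(1.20)
beta_2 = math.log(0.80)

if __name__ == "__main__":
    print("Example 1:", percent_change(beta_1), "percent")
    print("Example 2:", percent_change(beta_2), "percent")
```

Effects compound multiplicatively: a two-unit increase with exp(beta) = 1.20 multiplies the mean by 1.44, not 1.40.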

POISSON REGRESSION: EXAMPLE: DATA
• Number of Self-Diagnosed Ear Infections
• Age in Years
• Frequent or Occasional Ocean Swimmer
• Typical Swimming Location
• Gender

POISSON REGRESSION: CATEGORICAL
[Figure: the categorical predictors Typical Swimming Location, Gender (Male, Female), and Frequent or Occasional Ocean Swimmer.]

POISSON REGRESSION: INTERVAL
[Figure: Age in Years.]

POISSON REGRESSION: EXAMPLE
   proc genmod data=sasuser.earinfection;
      /* reference-cell coding with an explicit reference level for each factor */
      class Swimmer (param=ref ref='Freq')
            Location (param=ref ref='Beach')
            Gender (param=ref ref='Male');
      /* Poisson response with log link; TYPE3 requests Type 3 tests */
      model infections = swimmer location gender age*age
            / dist=poisson link=log type3;
   run;

POISSON REGRESSION: EXAMPLE: PROC GENMOD OUTPUT
Scale = 1*

POISSON REGRESSION: OVERDISPERSION
• Poisson regression models assume the variance is equal to the mean.
• Count data often exhibit variability exceeding the mean.
• Overdispersion leads to underestimates of the standard errors of parameter estimates.
• Overdispersion results in overestimates of the test statistic and liberal p-values.
Common causes of overdispersion
• Subject heterogeneity due to an under-specified model
• Outliers in the data
• Positive correlation between the responses in clustered data
WHAT TO DO
• Use the negative binomial distribution [NOW]
• Apply a multiplicative adjustment factor (PSCALE or DSCALE option in the MODEL statement) [HW]
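
One common way to quantify overdispersion is the Pearson chi-square statistic divided by its residual degrees of freedom. As far as I understand it, this is the quantity behind a Pearson-based scale adjustment such as PSCALE, but treat that mapping as an assumption rather than a description of PROC GENMOD internals. The sketch below is plain Python; the function names are hypothetical.

```python
import math

def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square / residual df for a Poisson fit.
    y: observed counts; mu: fitted means; n_params: number of fitted parameters."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

def adjusted_se(se, dispersion):
    """Inflate a Poisson standard error by the square root of the dispersion."""
    return se * math.sqrt(dispersion)

if __name__ == "__main__":
    # Made-up observed counts and fitted means, for illustration only.
    y = [0, 7, 1, 9, 0, 12]
    mu = [2.0, 4.0, 3.0, 5.0, 2.5, 6.0]
    phi = pearson_dispersion(y, mu, n_params=2)
    print("dispersion estimate:", phi)
    print("adjusted SE for a raw SE of 0.10:", adjusted_se(0.10, phi))
```

A value near 1 is consistent with the Poisson assumption; values well above 1 suggest that the unadjusted standard errors are too small.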

NEGATIVE BINOMIAL REGRESSION

NEGATIVE BINOMIAL REGRESSION: DISTRIBUTION AND MODEL
The negative binomial distribution
• is the distribution for count data that permits the variance to exceed the mean
• enables the model to have greater flexibility in modeling the relationship between the mean and the variance of the response variable than the Poisson model

Response Variable   Distribution        Link Function   Variance Function
Count               Negative Binomial   Natural Log     $\mu + k\mu^2$

NEGATIVE BINOMIAL REGRESSION: DISPERSION PARAMETER K
• The dispersion parameter k is not allowed to vary over observations.
• The limiting case when the parameter k is equal to 0 corresponds to a Poisson regression model.
• When the parameter is greater than 0, overdispersion is evident and the standard errors will increase. The fitted values are similar, but the larger standard errors reflect the overdispersion that the Poisson model does not capture.
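
The role of k can be sketched with the variance function, assuming the usual parameterization Var(y) = mu + k*mu^2 (the slide's own formula did not survive, so this form is an assumption). The function names are illustrative plain Python.

```python
def poisson_variance(mu):
    """Poisson: the variance equals the mean."""
    return mu

def negbin_variance(mu, k):
    """Negative binomial, assuming Var(y) = mu + k * mu**2 with dispersion k >= 0."""
    return mu + k * mu ** 2

if __name__ == "__main__":
    mu = 3.0
    for k in (0.0, 0.5, 1.0):
        print("k =", k, "variance =", negbin_variance(mu, k),
              "Poisson variance =", poisson_variance(mu))
```

With k = 0 the two variance functions coincide, and the gap widens quadratically in the mean as k grows.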

NEGATIVE BINOMIAL REGRESSION: EXAMPLE
   proc genmod data=sasuser.earinfection;
      class Swimmer (param=ref ref='Freq')
            Location (param=ref ref='Beach')
            Gender (param=ref ref='Male');
      /* same model as the Poisson example, with a negative binomial response */
      model infections = swimmer location gender age*age
            / dist=negbin link=log type3;
   run;

NEGATIVE BINOMIAL REGRESSION: EXAMPLE: PROC GENMOD OUTPUT

POISSON REGRESSION FOR RATES

POISSON REGRESSION (RATES): RATES DATA: DEFINITION & EXAMPLES
• When events occur over time, space, or some other index of exposure, it is more relevant to model the rate at which they occur rather than the number of events.
• Rates provide the necessary standardization to make the outcomes comparable.
• You use the OFFSET= option in the MODEL statement in PROC GENMOD.
Examples
• How crime rates are related to the city's unemployment rate
• How melanoma incidence rates are related to demographic variables
• How the rate of loan defaults is related to region of the country
• How response rates to marketing campaigns relate to known characteristics of the recipients

POISSON REGRESSION (RATES): RATES DATA: OFFSET
…
• Log(T) is called the offset variable; it has a coefficient equal to 1.
… OFFSET = Variable …
• The offset variable makes the fitted mean count proportional to the index of exposure.
• For example, using the log of the population as an offset variable is the same as modeling the mean number of events as proportional to population size.
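
The effect of an offset can be sketched numerically, assuming the standard rate model log(mu) = log(T) + X*beta, where T is the exposure and log(T) enters with a coefficient fixed at 1. This is plain Python with hypothetical names and made-up numbers, not PROC GENMOD output.

```python
import math

def expected_count(exposure, linear_predictor):
    """Mean count under a log link with offset log(exposure):
    log(mu) = log(exposure) + linear_predictor."""
    return math.exp(math.log(exposure) + linear_predictor)

def fitted_rate(exposure, linear_predictor):
    """Mean count per unit of exposure; it does not depend on exposure."""
    return expected_count(exposure, linear_predictor) / exposure

if __name__ == "__main__":
    eta = -6.0  # made-up value of X*beta
    for population in (1000.0, 2000.0):
        print("population =", population,
              "expected count =", expected_count(population, eta),
              "rate =", fitted_rate(population, eta))
```

Doubling the exposure doubles the expected count while leaving the rate unchanged, which is the proportionality the offset provides.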

POISSON REGRESSION (RATES): SKIN CANCER IN TEXAS AND MINNESOTA
Incidence of nonmelanoma skin cancer
• City: Minneapolis-St. Paul, Dallas-Fort Worth
• Age_Group: 15-24, 25-34, 35-44, 45-54, 55-64, 65-74, 75-84, 85+

POISSON REGRESSION (RATES): EXAMPLE
   proc genmod data=sasuser.skin;
      class City (param=ref ref='MSP')
            Age (param=ref ref='85+');
      /* OFFSET= names the offset variable, here log_pop */
      model cases = city age
            / offset=log_pop dist=poisson link=log type3;
   run;

ZERO-INFLATED POISSON MODEL

ZIP: PURPOSE
• In some settings, the incidence of zero counts will be much greater than expected for the Poisson distribution.
• Poisson regression models will exhibit overdispersion when they are fit to data with an excess number of zeros.
• Zero-inflated Poisson (ZIP) models might be a better fit to the data.

ZIP: MODEL
• The population that can be modeled with the zero-inflated Poisson distribution is considered to consist of two types of responses.
• The first type gives Poisson distributed counts, which can produce the zero outcome or some other positive outcome.
• The second type always gives a zero count.
• Therefore, the relevant distribution is a mixture of a Poisson distribution and a distribution that is constant at zero.
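
The mixture described above can be written out as a probability mass function. In this plain-Python sketch, `pi_zero` is a name chosen here for the probability of belonging to the always-zero group, and `mu` is the Poisson mean of the other group; neither name comes from the slides.

```python
import math

def poisson_pmf(k, mu):
    """P(Y = k) for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

def zip_pmf(k, mu, pi_zero):
    """Zero-inflated Poisson: with probability pi_zero the count is always 0,
    otherwise it is drawn from Poisson(mu)."""
    poisson_part = (1.0 - pi_zero) * poisson_pmf(k, mu)
    return pi_zero + poisson_part if k == 0 else poisson_part

if __name__ == "__main__":
    mu, pi_zero = 2.0, 0.3
    print("P(0) under Poisson:", poisson_pmf(0, mu))
    print("P(0) under ZIP:    ", zip_pmf(0, mu, pi_zero))
```

Zeros arise from both groups, so the ZIP probability of zero exceeds the Poisson one, while the overall mean shrinks to (1 - pi_zero) * mu.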

ZIP: COMPONENTS
   proc genmod data=sasuser.roots;
      class bap photoperiod;
      /* MODEL statement: the Poisson count component */
      model roots = photoperiod | bap / dist=zip link=log type3;
      /* ZEROMODEL statement: the zero-inflation component */
      zeromodel photoperiod;
   run;

ZIP: EXAMPLE: DATA
• photoperiod (hour): 8, 16
• concentration (µM): 2.2, 4.4, 8.8, 17.6
• Response: Number of roots

ZIP: EXAMPLE
[Figure panels: 8 hours; 16 hours.]

ZIP: EXAMPLE: RESULTS
dist=zinb

GAMMA REGRESSION

GAMMA DISTRIBUTION
The gamma distribution
• is a skewed distribution for positive values
• has a variance that is proportional to the squared mean: $\mathrm{Var}(y) \propto [E(y)]^2$
• has lighter tails than a lognormal distribution
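
The proportionality between variance and squared mean follows from the shape/scale form of the gamma distribution, where the mean is shape * scale and the variance is shape * scale^2. The sketch below is plain Python with illustrative names; it holds the shape fixed and varies the scale.

```python
def gamma_mean_var(shape, scale):
    """Mean and variance of a gamma distribution in shape/scale form."""
    return shape * scale, shape * scale ** 2

def var_to_squared_mean(shape, scale):
    """Ratio Var(y) / E(y)^2, which equals 1 / shape for every scale."""
    mean, var = gamma_mean_var(shape, scale)
    return var / mean ** 2

if __name__ == "__main__":
    shape = 4.0
    for scale in (0.5, 2.0, 10.0):
        print("scale =", scale,
              "(mean, variance) =", gamma_mean_var(shape, scale),
              "variance / mean^2 =", var_to_squared_mean(shape, scale))
```

With the shape held constant, the ratio stays at 1 / shape however large the mean becomes, so the standard deviation grows in step with the mean.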

DISTRIBUTIONS COMPARISON

Distribution         Variance
Normal (truncated)   constant*
Poisson              E(Y)
Gamma                (E(Y))^2
Lognormal            (E(Y))^2

GAMMA REGRESSION: EXAMPLE
   proc univariate data=car;
      var price;
      /* histogram of price with a fitted gamma density overlaid */
      histogram / gamma (alpha=est sigma=est theta=est color=blue w=2)
                  vaxis=0 to 14 by 2
                  midpoints=8 to 50 by 2;
   run;

GAMMA REGRESSION: REG AND GENMOD RESULTS: RESIDUAL
   proc genmod data=car;
      model price = hwympg2 horsepower
            / dist=gamma link=log /*identity*/ obstats id=model;
   run;
[Figure: residual plots for PROC REG; PROC GENMOD, link=log; PROC GENMOD, link=identity.]

SUMMARY
Problem for OLS
• nonconstant variance
Approaches
• Transform the dependent variable Price (log).
• Fit a gamma regression model with the log link function.
• Fit a gamma regression model with the identity link function.

INSURANCE CASE STUDY

GENMOD: INSURANCE
• Frequency: how often claims are made
• Severity
  • A typical way to model severity (claim amount) is by using a gamma distribution with a log link function.
• Pure premium: the portion of the company's expected cost that is "purely" attributed to loss
  • does not include the general expense of doing business
  • Tweedie distribution
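
The three quantities are linked by a standard actuarial identity that the slide does not spell out: pure premium is claim frequency times average severity, which equals total loss per unit of exposure. The sketch below is plain Python; the function names and figures are made up for illustration.

```python
def claim_frequency(n_claims, exposure):
    """Claims per unit of exposure (for example, per policy-year)."""
    return n_claims / exposure

def claim_severity(total_loss, n_claims):
    """Average loss amount per claim."""
    return total_loss / n_claims

def pure_premium(n_claims, total_loss, exposure):
    """Expected loss per unit of exposure: frequency times severity."""
    return claim_frequency(n_claims, exposure) * claim_severity(total_loss, n_claims)

if __name__ == "__main__":
    # Made-up portfolio: 50 claims, 100,000 in losses, 1,000 policy-years.
    print("frequency:", claim_frequency(50, 1000.0))
    print("severity:", claim_severity(100000.0, 50))
    print("pure premium:", pure_premium(50, 100000.0, 1000.0))
```

This is why frequency and severity can be modeled separately and then combined, or the pure premium modeled directly with a single distribution such as the Tweedie.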

GLM: INSURANCE: FREQUENCY & PURE PREMIUM
• ZIP
• Tweedie distribution
• PROC SEVERITY (SAS/ETS)

THANK YOU!
sas.com