Generalized Linear Models Xuhua Xia xxiauottawa ca http

  • Slides: 14
Download presentation
Generalized Linear Models Xuhua Xia xxia@uottawa. ca http: //dambe. bio. uottawa. ca

Generalized Linear Models Xuhua Xia xxia@uottawa. ca http: //dambe. bio. uottawa. ca

Ex 1: Spaceship launch • Jan. 28, 1986, the tenth flight of Space Shuttle

Ex 1: Spaceship launch • Jan. 28, 1986, the tenth flight of Space Shuttle Challenger (OV-99): failure at 73 seconds into its flight, killing all seven crew members. The spacecraft disintegrated over the Atlantic Ocean, off the coast of Cape Canaveral, Florida, at 11: 39 EST, due to failure of an O-ring seal. • Forecasts for January 28 predicted an unusually cold morning, with temperatures close to − 1 °C (30 °F), the minimum temperature permitted for launch. • Relationship between O-ring failure and temperature Xuhua Xia

Data Failure • In the past 24 launches, the failure of at least one

Data Failure • In the past 24 launches, the failure of at least one O- Launch. T 53 1 ring is recorded (1: failure, 0: success) with 56 1 57 1 temperature in Fahrenheit. 63 0 0 • Objective: What is probability of failure (p) at a given 66 67 0 temperature (T)? Such a relationship should allow us to 67 0 0 predict p given T and decide whether we should take 67 68 0 the chance or not. 69 0 70 0 • This is a problem for logistic regression which is a 70 1 special case of generalized linear model. 70 1 0 • However, we can quickly demonstrate the relationship 72 73 0 between T and p by observing that the mean T is 64. 4 75 0 75 1 for failure and 72. 2 for sucess, which differ 76 0 significantly (p = 0. 0156, two-tailed). Thus, low 76 0 78 0 temperature is associated with O-ring failure. 79 0 0 • However, what is the probability of failure at 30°F? 80 81 Xuhua Xia 0

Logistic regression • p: probability of failure • logit: ln(p/q), where q = 1

Logistic regression • p: probability of failure • logit: ln(p/q), where q = 1 -p • linear model specifying the relationship between p and T: ln(p/q) = a+b*T, i. e. , p is a function of T with parameters a and b • How to obtain the best estimate of a and b? • There are various approaches to estimate a and b: • The simplest is just to guess: – Failure is less likely in high temperature, i. e. , p decreases with increasing T, so b should be negative. – If T = 0, then p must be close to one (O-rign is not designed to work under such low temperature), so ln(p/q) must be very large, i. e. , a >>0.

Consequences of various guesses a=7 b=-0. 12 a=10. 875 b=-0. 171 Who has the

Consequences of various guesses a=7 b=-0. 12 a=10. 875 b=-0. 171 Who has the best guess? a=10 b=-0. 1 Model and parameter estimation We want to find a and b that maximizes the likelihood or loglikelihood (ln. L) Xuhua Xia

The likelihood method E(p) To find a and b that maximizes ln. L, double-click

The likelihood method E(p) To find a and b that maximizes ln. L, double-click the EXCEL insert, copy/paste to an EXCEL sheet and use SOLVER to solve for a and b. Xuhua Xia

Analytically: Take partial derivatives of ln. L with respect to a and b, set

Analytically: Take partial derivatives of ln. L with respect to a and b, set the two partial derivatives to 0 and solve the two simultaneous equations for a and b. Xuhua Xia

Or use R: Copy the first two columns of data (Launch. T and Failure)

Or use R: Copy the first two columns of data (Launch. T and Failure) md <- read. table("clipboard", header=T) fit <- glm(Failure~Launch. T, binomial, data=md) summary(fit)$coefficients[, 2] # get s_a and s_b confint(fit) # 95% CI for the coefficients predict(fit, type="response") # predicted values Call: glm(formula = Failure ~ Launch. T, family = binomial, data = md) Deviance Residuals: Min 1 Q Median -1. 2125 -0. 8253 -0. 4706 3 Q 0. 5907 Max 2. 0512 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 10. 87535 5. 70291 1. 907 0. 0565. Launch. T -0. 17132 0. 08344 -2. 053 0. 0400 *

Fitted curve and extrapolated p Xuhua Xia

Fitted curve and extrapolated p Xuhua Xia

Still birth & toxicity N_sb 15 17 22 38 144 N_fetus 297 242 312

Still birth & toxicity N_sb 15 17 22 38 144 N_fetus 297 242 312 299 285 Concentration 0 62. 5 125 250 500 • Fit GLM in EXCEL • Compute null and residual deviance and perform LRT • Compute deviance residual md<-read. table("clipboard", header=T) attach(md) N_good<-N_fetus-N_sb Y<-cbind(N_sb, N_good) fit<-glm(Y~Concentration, family=binomial) summary(fit)$coefficients[, 1] summary(fit)$coefficients[, 2] # resid(fit, type='response')*fit$prior. weights confint(fit) # 95% CI for the coefficients predict(fit, type="response") # predicted values Xuhua Xia

GLM vs conventional transformation Take logistic regression for example (probit or logit transformation versus

GLM vs conventional transformation Take logistic regression for example (probit or logit transformation versus GLM 1. probit or logit transformation of p cannot handle proportions of 0 and 1, but GLM can. 2. using proportions obscure differences in sample size, i. e. , 0. 5 can be from 1/2 or from 100/200, but GLM incorporate the numbers in the likelihood function. Xuhua Xia

Survival data of the Titanic available in R: Titanic Class Crew does not have

Survival data of the Titanic available in R: Titanic Class Crew does not have Child category, so: df = 2 for Class: Age, but df = 3 for Class: Sex df = 2 for 3 -way interaction, e. g. if background is 1 st_Female_Adult, then 3 -way differences include: 2 nd_male_child Survive 3 rd_male_child Class Sex Age Yes No Copy the red part and read the clipboard in R 1 st 1 st 2 nd 2 nd 3 rd 3 rd Crew Female Male Female Male Adult Child Adult Child Adult 140 1 57 5 80 13 14 11 76 14 75 13 20 192 4 0 118 0 13 0 154 0 89 17 387 35 3 670

Logistic regression: Survival md<-read. table("Titanic. txt", header=T) attach(md) # full. Model <-glm(Survived~Class*Sex*Age, family=binomial, weights=Freq)

Logistic regression: Survival md<-read. table("Titanic. txt", header=T) attach(md) # full. Model <-glm(Survived~Class*Sex*Age, family=binomial, weights=Freq) Survived<-cbind(Yes, No) full. Model <-glm(Survived~Class*Sex*Age, family=binomial) confint(full. Model) # 95% CI for the coefficients predict(full. Model, type="response") # predicted p_survival values fit<-step(full. Model, direction="both") # will find the best model summary(fit)$coefficients[, 2] # get SE confint(fit) # 95% CI for the coefficients nd<-data. frame(md, Survival=predict(fit, type="response")) # predicted values Coefficients: (1 not defined because of singularities) Estimate Std. Error z value Pr(>|z|) (Intercept) 3. 55535 0. 50709 7. 011 2. 36 e-12 Class 2 nd -1. 73827 0. 58870 -2. 953 0. 00315 Class 3 rd -3. 77275 0. 52878 -7. 135 9. 69 e-13 Class. Crew -1. 65823 0. 80030 -2. 072 0. 03826 Sex. Male -4. 28298 0. 53213 -8. 049 8. 36 e-16 Age. Child 15. 28493 392. 50617 0. 039 0. 96894 Class 2 nd: Sex. Male 0. 06801 0. 67120 0. 101 0. 91929 Class 3 rd: Sex. Male 2. 89768 0. 56364 5. 141 2. 73 e-07 Class. Crew: Sex. Male 1. 13608 0. 82048 1. 385 0. 16616 Class 2 nd: Age. Child 2. 19967 520. 81278 0. 004 0. 99663 Class 3 rd: Age. Child -14. 94702 392. 50626 -0. 038 0. 96962 *** ***

Predicted survival > nd<-data. frame(md, Survival=predict(fit, type="response")) > nd Class Sex Age Yes No

Predicted survival > nd<-data. frame(md, Survival=predict(fit, type="response")) > nd Class Sex Age Yes No Survival 1 1 st Female Adult 140 4 0. 97222222 2 1 st Female Child 1 0 1. 0000 3 1 st Male Adult 57 118 0. 32571429 4 1 st Male Child 5 0 1. 0000 5 2 nd Female Adult 80 13 0. 86021505 6 2 nd Female Child 13 0 1. 0000 7 2 nd Male Adult 14 154 0. 08333333 8 2 nd Male Child 11 0 1. 0000 9 3 rd Female Adult 76 89 0. 44586174 10 3 rd Female Child 14 17 0. 53009073 11 3 rd Male Adult 75 387 0. 16760349 12 3 rd Male Child 13 35 0. 22014973 13 Crew Female Adult 20 3 0. 86956522 14 Crew Male Adult 192 670 0. 22273782 Xuhua Xia