Lab 9: Survival Analysis
Henian Chen, M.D., Ph.D.
Applied Epidemiologic Analysis P8400, Fall 2002
Description of Data

'SURVIVAL65.TXT' contains data from a study of multiple myeloma in which researchers treated 65 patients with alkylating agents. Of those patients, 48 died during the study and 17 survived. The goal of the study is to identify important prognostic factors.

    TIME      survival time in months from diagnosis
    STATUS    1 = dead, 0 = alive (censored)
    LOGBUN    log blood urea nitrogen (BUN) at diagnosis
    HGB       hemoglobin at diagnosis
    PLATELET  platelets at diagnosis: 0 = abnormal, 1 = normal
    AGE       age at diagnosis in years
    LOGWBC    log WBC at diagnosis
    FRACTURE  fractures at diagnosis: 0 = none, 1 = present
    LOGPBM    log percentage of plasma cells in bone marrow
    PROTEIN   proteinuria at diagnosis
    CALCIUM   serum calcium at diagnosis
LIFETEST procedure

Estimation of the distribution of the survival times by nonparametric methods:
1. Kaplan-Meier method (also called product-limit method)
2. Life table method
Kaplan-Meier Estimates for Total Sample

    proc import datafile='a:survival65.txt' out=survival65 dbms=tab replace;
    run;

    proc lifetest data=survival65 method=km plots=(s, lls);
      title 'Distribution of Survival Times for 65 Myeloma Patients by Kaplan-Meier Method';
      time time*status(0);   /* status = 0 marks a censored observation */
    run;

Method=KM or PL: Kaplan-Meier (KM) or product-limit (PL) estimates
Method=LT or LIFE: life table estimates
By default, Method=PL.
Plots=S: plot the survival curve, i.e. the estimated survival distribution function (SDF) against time
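For reference, the product-limit estimate requested by METHOD=KM can be written as below. The symbols t_i, d_i and n_i are notation introduced here for illustration; they do not appear on the slides.

```latex
\hat{S}(t) \;=\; \prod_{i:\; t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
```

Here t_i are the distinct observed death times, d_i is the number of deaths at t_i, and n_i is the number of patients still at risk just before t_i. A censored patient contributes no factor of its own, but counts in n_i for every death time up to and including the censoring time.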
Plots=LLS: plot log[-log(estimated SDF)] against log(time) to check the distribution of the survival times.

Exponential distribution: the hazard function is constant and does not depend on time; the graph is approximately a straight line with slope 1.

Weibull distribution: the hazard function changes with time; the graph is approximately a straight line, but the slope is not 1.
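The slope statements follow directly from the survival functions. The short derivation below assumes the common parameterization with rate \lambda and shape \gamma; these symbols are introduced here for illustration and are not used on the slides.

```latex
\text{Exponential: } S(t) = e^{-\lambda t}
  \;\Rightarrow\; \log[-\log S(t)] = \log\lambda + \log t
  \quad (\text{slope } 1)

\text{Weibull: } S(t) = e^{-(\lambda t)^{\gamma}}
  \;\Rightarrow\; \log[-\log S(t)] = \gamma\log\lambda + \gamma\log t
  \quad (\text{slope } \gamma)
```

The Weibull hazard is h(t) = \gamma \lambda^{\gamma} t^{\gamma - 1}, which is constant only when \gamma = 1. In that case the Weibull model reduces to the exponential model.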
Life Table Estimates for Total Sample

    proc lifetest data=survival65 method=lt plots=(s, lls) width=10;
      title 'Distribution of Survival Times for 65 Myeloma Patients by Life Table Method';
      time time*status(0);   /* status = 0 marks a censored observation */
    run;
Comparison of Two Survival Curves for Normal Platelet and Abnormal Platelet by Kaplan-Meier

    proc lifetest data=survival65 method=km plots=(s, lls);
      time time*status(0);
      strata platelet;       /* one survival curve per platelet group */
    run;

Log-rank test: suited to survival times that follow a Weibull distribution or, more generally, satisfy the proportional hazards assumption. It uses weight = 1, so each failure time is weighted equally.

Wilcoxon test: suited to survival times that follow a lognormal distribution. It uses weight = the total number at risk at each failure time, so earlier times receive greater weight than later times and the later failure times carry less emphasis.

-2Log(LR): likelihood ratio test, suited to survival times that follow an exponential distribution.
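The weighting described above can be made explicit. The sketch below gives the usual two-group form of the weighted statistic; the notation (d_{1j}, n_{1j}, n_{2j}, d_j, n_j, w_j) is introduced here for illustration and is not taken from the slides.

```latex
U = \sum_{j} w_j \left( d_{1j} - \frac{n_{1j}\, d_j}{n_j} \right),
\qquad
\operatorname{Var}(U) = \sum_{j} w_j^{2}\,
  \frac{n_{1j}\, n_{2j}\, d_j\, (n_j - d_j)}{n_j^{2}\,(n_j - 1)},
\qquad
\chi^2 = \frac{U^2}{\operatorname{Var}(U)}

\text{log-rank: } w_j = 1
\qquad\qquad
\text{Wilcoxon: } w_j = n_j
```

At the j-th distinct failure time, d_j deaths occur among the n_j patients at risk, of whom n_{1j} are in group 1 and n_{2j} in group 2, with d_{1j} of the deaths in group 1. With two groups, the statistic is compared with a chi-square distribution on 1 degree of freedom. Because n_j shrinks as follow-up proceeds, the Wilcoxon weights emphasize early differences between the curves.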
PHREG procedure

PHREG performs regression analysis of survival data based on the Cox proportional hazards model.

PHREG also performs conditional logistic regression analysis for matched case-control studies.
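As a reminder of what PHREG estimates, the Cox proportional hazards model can be written as below. The symbols h_0, \beta_k and x_k are standard notation introduced here for illustration.

```latex
h(t \mid x_1, \dots, x_p) \;=\; h_0(t)\, \exp\!\left( \beta_1 x_1 + \cdots + \beta_p x_p \right),
\qquad
\text{hazard ratio for a one-unit increase in } x_k \;=\; e^{\beta_k}
```

The baseline hazard h_0(t) is left unspecified, which is why no distribution for the survival times has to be assumed.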
SAS Program for Cox Model

    proc import datafile='a:survival65.txt' out=survival65 dbms=tab replace;
    run;

    proc phreg data=survival65;
      model time*status(0) = logbun hgb platelet age logwbc fracture
                             logpbm protein calcium
            / selection=stepwise details rl;   /* rl: confidence limits for hazard ratios */
    run;
TIME: survival time in months from diagnosis
STATUS: 1 = dead, 0 = alive (censored)

Cox regression model:
    model time*status(0) = nine independent variables;

Linear regression model:
    model time = nine independent variables;

Can we fit a linear regression model to these data? No!
1. We don't know the distribution of the survival times.
2. A linear regression model treats censored survival times as if they were complete (uncensored) times.
Logistic regression model:
    model status = nine independent variables;

Can we fit a logistic regression model to these data? No!

Logistic regression is not appropriate for survival data because it treats the different strata (the different time points) as a single stratum (the last time point). Discarding the "time" information when you have it is a mistake. Use logistic regression only when the survival time is not available.
Patient A: time = 10 months, status = 1 (dead)

    id  time  status  sex  age
    A     0     0     F    30.00
    A     1     0     F    30.08
    A     2     0     F    30.17
    A     3     0     F    30.25
    A     4     0     F    30.33
    A     5     0     F    30.42
    A     6     0     F    30.50
    A     7     0     F    30.58
    A     8     0     F    30.67
    A     9     0     F    30.75
    A    10     1     F    30.83

Logistic regression: one stratum (the last row only)
Survival analysis: 11 strata (all rows)
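The records above can be generated from Patient A's single record with a DATA step. This is a minimal sketch, not part of the original lab: the data set names patient_a and person_month and the variables month, event and age_now are hypothetical names chosen for illustration.

```sas
/* One record for Patient A, with the values shown on the slide. */
data patient_a;
    input id $ time status sex $ age;
    datalines;
A 10 1 F 30.00
;
run;

/* Expand to one record per month of follow-up (months 0 through 10). */
data person_month (drop=time status age);
    set patient_a;
    do month = 0 to time;
        /* event is 1 only in the final month, and only if the patient died */
        event   = (month = time and status = 1);
        /* age advances by one twelfth of a year per month */
        age_now = round(age + month / 12, 0.01);
        output;
    end;
run;

proc print data=person_month noobs;
run;
```

Each output record corresponds to one stratum (time point) in the table, which makes it easy to see how much information is discarded when only the final status is modeled.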
LIFEREG procedure

LIFEREG fits parametric models to survival data by maximum likelihood.

If you have a clear idea about the distribution of the survival times, you should use a parametric model.
SAS Program for 7 Parametric Models (using the SURVIVAL65.TXT data)

    proc lifereg data=survival65;
      class platelet fracture;
      model time*status(0) = logbun hgb platelet age logwbc fracture
                             logpbm protein calcium
            / distribution=weibull;
    run;

Values for the DISTRIBUTION= option:

    WEIBULL      Weibull distribution
    EXPONENTIAL  exponential distribution
    GAMMA        generalized gamma distribution
    LLOGISTIC    loglogistic distribution
    LNORMAL      lognormal distribution
    LOGISTIC     logistic distribution
    NORMAL       normal distribution
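To fit all seven distributions without editing the program each time, the LIFEREG step can be wrapped in a macro loop. This is a sketch under assumptions: the macro name fit_all and its macro variables are hypothetical, and the data set is assumed to be the one created by the PROC IMPORT step earlier in the lab.

```sas
%macro fit_all;
    %local i d dists;
    /* the seven DISTRIBUTION= keywords listed on the slide */
    %let dists = weibull exponential gamma llogistic lnormal logistic normal;

    %do i = 1 %to %sysfunc(countw(&dists));
        %let d = %scan(&dists, &i);

        proc lifereg data=survival65;
            class platelet fracture;
            model time*status(0) = logbun hgb platelet age logwbc fracture
                                   logpbm protein calcium
                  / distribution=&d;
            title "LIFEREG fit with distribution = &d";
        run;
    %end;
%mend fit_all;

%fit_all
```

The log likelihoods printed by each run can then be compared across the fits. Note that LOGISTIC and NORMAL model the survival time itself rather than its logarithm, so their log likelihoods are not directly comparable with the other five.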
Results of the Weibull Regression Model

    Parameter      DF  Estimate  Std Error   Lower 95%  Upper 95%  Chi-Square  Pr > ChiSq
    Intercept       1    7.0556     2.7719      1.6228    12.4883        6.48      0.0109
    LOGBUN          1   -1.5325     0.5412     -2.5932    -0.4718        8.02      0.0046
    HGB             1    0.0970     0.0621     -0.0248     0.2187        2.44      0.1185
    PLATELET        1   -0.2655     0.4557     -1.1585     0.6276                  0.5602
    AGE             1    0.0103     0.0169     -0.0228     0.0434        0.37      0.5411
    LOGWBC          1   -0.4008     0.5989     -1.5746     0.7729        0.45      0.5033
    FRACTURE        1    0.3324     0.3526     -0.3587     1.0234        0.89      0.3459
    LOGPBM          1   -0.3871     0.4290     -1.2280     0.4538        0.81      0.3670
    PROTEIN         1   -0.0092     0.0228     -0.0540     0.0356        0.16      0.6862
    CALCIUM         1   -0.0998     0.0898     -0.2758     0.0761        1.24      0.2661
    Scale                0.8671     0.0927      0.7032     1.0694
    Weibull Shape        1.1532     0.1233      0.9351     1.4222

-- LIFEREG models log survival time, so the LOGBUN estimate of -1.5325 means that a one-unit increase in LOGBUN decreases the expected log survival time by 1.5325, controlling for the other variables.

-- For the Weibull model, the log hazard ratio is the estimate with its sign reversed, divided by Scale: 1.5325 / 0.8671 = 1.767. A one-unit increase in LOGBUN therefore multiplies the hazard of dying by exp(1.767) = 5.86, an increase of about 486%. (Using exp(1.5325) = 4.63, an increase of 363%, omits the division by Scale.)

* Coefficients from parametric (LIFEREG) models are expected to have the opposite sign from the corresponding Cox (PHREG) coefficients, because LIFEREG models survival time rather than the hazard.
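The conversion from a LIFEREG Weibull estimate to a hazard ratio can be checked with a few lines of SAS. This is a minimal sketch that assumes the standard LIFEREG Weibull parameterization, in which the log hazard ratio equals the negated estimate divided by Scale. The variable names beta, scale, time_ratio and hazard_ratio are hypothetical; the two input values are copied from the output above.

```sas
data _null_;
    beta  = -1.5325;   /* LOGBUN estimate from the Weibull output */
    scale =  0.8671;   /* Scale estimate from the same output     */

    /* multiplicative effect on survival time per one-unit increase */
    time_ratio   = exp(beta);

    /* multiplicative effect on the hazard per one-unit increase */
    hazard_ratio = exp(-beta / scale);

    put time_ratio= 6.3 hazard_ratio= 6.2;
run;
```

The same two lines of arithmetic apply to any other row of the table. For the exponential model Scale is fixed at 1, so the hazard ratio is simply exp(-beta).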