- Regression analysis of survival data is very similar to linear or logistic regression, except that the dependent variable is a measure of the timing or rate of event occurrence
- Most methods of survival analysis require that the event time be measured with respect to some origin time
- Ideally the origin time is the same as the time at which observation begins, and most software programs assume that this is the case
- If observation begins after the origin time, you might need to take into account late entry (left truncation)
- Censoring is endemic to survival data
- Any report of survival analysis should discuss the type, cause and treatment of censoring
- The most common type of censoring is right censoring, which occurs when observation is terminated before an individual experiences the event
- Censoring may be informative when censoring times vary because individuals drop out of the study
- A slightly less common type is interval censoring, where the exact event time is not known, only that it falls between two points in time
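One common way to hold all of these cases in a single structure is to record each event time as a pair of bounds. The sketch below assumes that convention; the `Observation` class and its `kind()` method are hypothetical helpers, not part of any package. An exact time has equal bounds, a right-censored case has no upper bound, and an interval-censored case has two finite bounds (a lower bound of 0 is treated here as left censoring).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical record: the event time is only known to lie in [lower, upper]."""
    lower: float
    upper: float  # math.inf when no event was seen before observation stopped

    def kind(self) -> str:
        if self.lower == self.upper:
            return "exact"
        if math.isinf(self.upper):
            return "right-censored"
        if self.lower == 0:
            return "left-censored"
        return "interval-censored"


obs = [
    Observation(5.0, 5.0),       # event observed at t = 5
    Observation(8.0, math.inf),  # still event-free when observation ended at t = 8
    Observation(0.0, 3.0),       # event happened some time before t = 3
    Observation(2.0, 4.0),       # event happened between visits at t = 2 and t = 4
]
for o in obs:
    print(o, o.kind())
```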
- If you know the exact time at which an event occurs, use methods that treat time as continuous
- If not, use discrete-time methods (e.g. when you only know the month or the year of the event)
- For discrete-time methods you must choose between a logit model and a complementary log-log model, but in practice the choice is usually inconsequential
- Logit is more appropriate for truly discrete events, i.e. events that can only occur at discrete points in time
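A minimal sketch of the two ingredients of a discrete-time model, assuming the usual person-period layout (one row per individual per period at risk, with y = 1 only in the period the event occurs). The helper names are hypothetical. The two link functions map a linear predictor `eta` to a hazard; for small hazards they give nearly the same value, which is one way to see why the choice rarely matters. The complementary log-log form is the one consistent with an underlying continuous-time proportional hazards model measured in intervals.

```python
import math


def expand_person_periods(duration: int, event: bool):
    """One person -> (period, y) rows; y = 1 only in the period the event occurs."""
    return [(t, int(event and t == duration)) for t in range(1, duration + 1)]


def logit_hazard(eta: float) -> float:
    """Discrete-time hazard under a logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def cloglog_hazard(eta: float) -> float:
    """Discrete-time hazard under a complementary log-log link."""
    return 1.0 - math.exp(-math.exp(eta))


print(expand_person_periods(3, True))   # event in period 3
print(expand_person_periods(2, False))  # censored after period 2

# The two links agree closely when hazards are small and diverge as they grow
for eta in (-4.0, -2.0, 0.0):
    print(eta, round(logit_hazard(eta), 4), round(cloglog_hazard(eta), 4))
```

The person-period rows would then be fed to an ordinary binary regression with the chosen link.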
- The most popular method for regression analysis of survival data is Cox regression
- Cox regression is semi-parametric: it does not require you to specify the shape of the baseline hazard function
- However, parametric methods are much better at handling left censoring or interval censoring, and they can generate predicted times to events
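To make "semi-parametric" concrete, here is a sketch of Cox's partial likelihood for a single covariate, maximised with Newton-Raphson. The baseline hazard never appears: each event only compares the person who failed with everyone still at risk at that time. The function names and data are hypothetical, ties are handled with the Breslow approximation, and the sketch assumes the estimate is finite (no perfect separation). Real software adds step-halving, multiple covariates and other tie corrections.

```python
import math


def partial_loglik(beta, times, events, x):
    """Cox partial log-likelihood, score and information for one covariate.
    Tied event times are handled with the Breslow approximation."""
    loglik = score = info = 0.0
    for ti, di, xi in zip(times, events, x):
        if not di:
            continue  # censored cases contribute only through the risk sets
        # Risk set: everyone whose observed time is at least ti
        risk = [xj for tj, xj in zip(times, x) if tj >= ti]
        w = [math.exp(beta * xj) for xj in risk]
        s0 = sum(w)
        s1 = sum(wj * xj for wj, xj in zip(w, risk))
        s2 = sum(wj * xj * xj for wj, xj in zip(w, risk))
        loglik += beta * xi - math.log(s0)
        score += xi - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return loglik, score, info


def fit_cox(times, events, x, tol=1e-10, max_iter=50):
    """Newton-Raphson on the partial likelihood; returns (beta, standard error)."""
    beta = 0.0
    for _ in range(max_iter):
        _, score, info = partial_loglik(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    info = partial_loglik(beta, times, events, x)[2]
    return beta, 1.0 / math.sqrt(info)


# Hypothetical data: event/censoring times, event indicator, binary covariate
times = [2, 3, 5, 6, 8, 9, 11, 12]
events = [1, 1, 0, 1, 1, 0, 1, 1]
x = [1, 0, 1, 1, 0, 1, 0, 0]

beta_hat, se = fit_cox(times, events, x)
print(round(beta_hat, 3), round(se, 3), round(math.exp(beta_hat), 3))
```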
- One major difference between survival regression and conventional linear regression is the possibility of time-dependent covariates (predictors whose values change over the course of observation)
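A common way to feed a time-dependent covariate to survival software is the counting-process layout: each person's follow-up is split into (start, stop] episodes, with the covariate constant inside each one and the event flag only on the last. The same layout also handles late entry, because a person whose first episode starts after the origin is simply not in the risk set before that time. The sketch below assumes a single change point; `split_episodes` and `at_risk` are hypothetical helpers.

```python
def split_episodes(entry, exit_time, event, change_time, before, after):
    """Split one person's follow-up (entry, exit_time] where a covariate changes.

    Returns (start, stop, event, covariate) rows; only the final row can carry the event.
    A nonzero `entry` records late entry (left truncation).
    """
    if change_time <= entry:       # covariate had already changed when observation began
        return [(entry, exit_time, event, after)]
    if change_time >= exit_time:   # covariate did not change while under observation
        return [(entry, exit_time, event, before)]
    return [
        (entry, change_time, 0, before),        # no event in the first episode
        (change_time, exit_time, event, after),
    ]


def at_risk(row, t):
    """A row contributes to the risk set at time t if start < t <= stop."""
    start, stop, _, _ = row
    return start < t <= stop


print(split_episodes(0, 10, 1, 4, 0, 1))  # covariate switches from 0 to 1 at t = 4
print(split_episodes(3, 10, 1, 4, 0, 1))  # same person, but entering late at t = 3
```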
- If the data contain information on more than one event for each individual, special methods are needed to take advantage of the additional information
- Repeated events provide more statistical power
- However, there is likely to be statistical dependence among the observations for the same individual
- There are four methods to correct for this dependence: 1) robust standard errors (Huber-White or sandwich estimates), 2) generalised estimating equations (GEE), 3) random-effects (mixed) models, 4) fixed-effects methods
- Stata will estimate random-effects models for Cox regression, but SAS won't
- If event times are discrete, maximum likelihood estimation requires that the models be estimated simultaneously using the generalized (multinomial) logit model (there is no equivalent for the complementary log-log model)
- Conventional wisdom has it that there should be at least 5 (some say 10) events for each parameter in the model in order for maximum likelihood estimates to have reasonably good properties
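The rule of thumb is simple arithmetic, but it is worth applying before fitting. A sketch with hypothetical helper names: note that the count is of events, not of observations, so censored cases do not help.

```python
def events_per_parameter(n_events: int, n_parameters: int) -> float:
    """Events per estimated parameter (censored cases are not counted)."""
    return n_events / n_parameters


def max_parameters(n_events: int, min_epp: int = 5) -> int:
    """Largest number of parameters that keeps at least `min_epp` events per parameter."""
    return n_events // min_epp


# Hypothetical study: 40 events, 6 parameters
print(events_per_parameter(40, 6))  # above 5, below 10
print(max_parameters(40))           # under the rule of 5
print(max_parameters(40, 10))       # under the stricter rule of 10
```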
- For missing data, multiple imputation draws imputed values at random from the predictive distribution of the missing values. Generate several datasets (5 or more), each with slightly different imputed values, estimate the model on each, then combine the results into a single set of parameter estimates
- For survival analysis, imputation should only be done on the predictor variables. Cases with missing data on the dependent variable should simply be deleted
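The combining step is usually done with Rubin's rules: average the estimates, then add the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The sketch below assumes the model has already been fitted to each of the m imputed datasets; `pool_rubin` is a hypothetical name and the numbers are made up for illustration. Generating the imputations themselves is not shown.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Combine one coefficient across m imputed datasets using Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)                        # pooled point estimate
    w = mean([se ** 2 for se in std_errors])       # within-imputation variance
    b = variance(estimates)                        # between-imputation variance (m - 1 denominator)
    total = w + (1 + 1 / m) * b                    # total variance
    return q_bar, math.sqrt(total)


# Hypothetical coefficient and standard error from 5 imputed datasets
estimates = [0.50, 0.55, 0.45, 0.52, 0.48]
std_errors = [0.10, 0.11, 0.10, 0.12, 0.10]
print(pool_rubin(estimates, std_errors))
```

The pooled standard error is larger than the average per-dataset one, reflecting the extra uncertainty from the missing values.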
- Compare non-nested models with AIC or SBC (also called BIC)
- Preference is given to models with the lowest values of those statistics, although no p-values can be calculated
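Both criteria are the model's -2 log-likelihood plus a penalty for the number of parameters k; BIC/SBC penalises each parameter by ln(n) instead of 2. A sketch with hypothetical log-likelihoods: which n a package uses for survival models (observations or events) varies, so check your software's documentation before comparing its output with hand calculations.

```python
import math


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: -2 log L + 2k."""
    return -2 * loglik + 2 * k


def bic(loglik: float, k: int, n: int) -> float:
    """Schwarz (Bayesian) criterion: -2 log L + k ln(n)."""
    return -2 * loglik + k * math.log(n)


# Hypothetical non-nested models fitted to the same n = 200 cases
model_a = {"loglik": -120.4, "k": 3}
model_b = {"loglik": -118.9, "k": 5}
for name, mdl in (("A", model_a), ("B", model_b)):
    print(name, round(aic(mdl["loglik"], mdl["k"]), 2), round(bic(mdl["loglik"], mdl["k"], 200), 2))
```

Here model B fits slightly better but not by enough to pay for its two extra parameters, so A has the lower value on both criteria.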
- The magnitudes of the beta coefficients are difficult to interpret directly; exponentiating them gives hazard ratios
- Hazard ratios (always positive) can be confusing because a value of 1 means no effect
- A more straightforward quantity is 100(HR - 1), the percentage change in the hazard for a one-unit increase in the predictor
- Hazard ratios have asymmetric sampling distributions, so their standard errors are not very useful. Report 95% confidence intervals instead
- Other statistics to report include the chi-square test of the null hypothesis that all coefficients are zero
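A sketch of turning a fitted coefficient into the quantities above, assuming a coefficient `beta` with standard error `se` on the log-hazard scale (values are hypothetical). The confidence interval is built on the beta scale and then exponentiated, which is why it is asymmetric around the hazard ratio. The likelihood-ratio chi-square for "all coefficients are zero" is twice the gain in log-likelihood over the null model, with degrees of freedom equal to the number of coefficients; its p-value would come from a chi-square table or your statistics package.

```python
import math


def describe_coefficient(beta: float, se: float, z: float = 1.96):
    """Hazard ratio, percentage change in the hazard, and 95% CI for the hazard ratio."""
    hr = math.exp(beta)
    pct_change = 100 * (hr - 1)
    lower = math.exp(beta - z * se)  # interval built on the beta scale,
    upper = math.exp(beta + z * se)  # then exponentiated, hence asymmetric
    return hr, pct_change, (lower, upper)


def lr_chi_square(loglik_null: float, loglik_model: float) -> float:
    """Likelihood-ratio chi-square for the null hypothesis that all coefficients are zero."""
    return 2 * (loglik_model - loglik_null)


hr, pct, (lo, hi) = describe_coefficient(0.30, 0.10)
print(round(hr, 3), round(pct, 1), round(lo, 3), round(hi, 3))
print(round(lr_chi_square(-130.0, -120.4), 2))  # hypothetical log-likelihoods
```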
Survival analysis
Original by P.D. Allison, 2012, 12 pages