- Survival regression models are very similar to linear or logistic regression models, except that the dependent variable is a measure of the timing or rate of event occurrence
- Most methods of survival analysis require that the event time be measured with respect to some origin time
- Ideally, the origin time is the same as the time at which observation begins, and most software programs assume that this is the case
- If observation begins after the origin time, you may need to take into account late entry (left truncation)
- Censoring is endemic to survival data
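A minimal sketch of how origin time, late entry and right censoring turn into analysis records, assuming each case has an origin date, observation start and end dates, and an optional event date. `SurvivalRecord` and `make_record` are hypothetical names used only for illustration, not part of any package.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SurvivalRecord:
    entry: int  # days from origin to start of observation (> 0 means late entry)
    exit: int   # days from origin to the event or to the end of observation
    event: int  # 1 = event observed, 0 = right-censored


def make_record(origin: date, obs_start: date, obs_end: date,
                event_date: Optional[date]) -> SurvivalRecord:
    """Measure every time relative to the origin (hypothetical helper)."""
    entry = (obs_start - origin).days
    if event_date is not None and event_date <= obs_end:
        return SurvivalRecord(entry, (event_date - origin).days, 1)
    # No event seen before observation stopped: right-censored at obs_end.
    return SurvivalRecord(entry, (obs_end - origin).days, 0)


# Origin = 1 Jan 2020; the first case only comes under observation 30 days later.
late = make_record(date(2020, 1, 1), date(2020, 1, 31), date(2020, 12, 31), date(2020, 6, 1))
censored = make_record(date(2020, 1, 1), date(2020, 1, 1), date(2020, 12, 31), None)
print(late)      # SurvivalRecord(entry=30, exit=152, event=1)
print(censored)  # SurvivalRecord(entry=0, exit=365, event=0)
```

A case with `entry > 0` should only count as at risk from `entry` onwards; software that assumes observation starts at the origin has to be told about it.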
- **Any report of survival analysis should discuss the type, cause and treatment of censoring**
- The most common type of censoring is right censoring, when observation of an individual is terminated before the individual experiences the event
- Censoring could be informative if it occurs at varying times because individuals drop out of the study
- A slightly less common type of censoring is interval censoring, when the exact event time is not known, only that it falls between two points in time
- **If you know the exact time at which an event occurs, use methods that treat time as continuous**
- If not, use discrete-time methods (e.g. when you only know the month or the year of the event)
- For discrete-time methods you must choose between a logit model and a complementary log-log model, but in practice the choice is usually not consequential
- The logit model is more appropriate for truly discrete events; the complementary log-log model suits events that happen in continuous time but are only recorded in intervals
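A sketch of the two discrete-time links, assuming the usual person-period layout (one row per individual per period at risk, with an outcome of 1 only in the period of the event). The function names are illustrative, not from any library.

```python
import math


def logit_hazard(eta):
    """Per-period hazard under a logit model: h = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def cloglog_hazard(eta):
    """Per-period hazard under a complementary log-log model: h = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


def person_periods(subject_id, last_period, event):
    """One row per period at risk; the outcome is 1 only in the period of the event."""
    return [(subject_id, t, int(event and t == last_period))
            for t in range(1, last_period + 1)]


print(person_periods("A", 3, True))   # [('A', 1, 0), ('A', 2, 0), ('A', 3, 1)]
print(person_periods("B", 2, False))  # [('B', 1, 0), ('B', 2, 0)]

# For small hazards the two links give almost the same probability,
# which is one reason the choice rarely matters in practice.
for eta in (-4.0, -2.0, 0.0):
    print(eta, round(logit_hazard(eta), 4), round(cloglog_hazard(eta), 4))
```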
- The most popular method for regression analysis of survival data is Cox regression
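To make the Cox model concrete, here is a minimal sketch of its partial log-likelihood for a single covariate, assuming no tied event times, maximised with a simple golden-section search. Real software uses faster optimisers and handles ties; the function names and data below are hypothetical.

```python
import math


def cox_partial_loglik(beta, times, events, x):
    """Cox partial log-likelihood for one covariate (assumes no tied event times).

    Each observed event at time t_i contributes
        beta * x_i - log(sum of exp(beta * x_j) over the risk set at t_i),
    where the risk set is everyone whose recorded time is >= t_i.
    """
    ll = 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if not d_i:
            continue  # censored cases only enter through the risk sets
        risk = sum(math.exp(beta * x_j) for t_j, x_j in zip(times, x) if t_j >= t_i)
        ll += beta * x_i - math.log(risk)
    return ll


def fit_cox_1d(times, events, x, lo=-5.0, hi=5.0, tol=1e-8):
    """Maximise the (concave) partial log-likelihood by golden-section search."""
    f = lambda beta: cox_partial_loglik(beta, times, events, x)
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2


times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 0, 1]  # the case at time 5 is right-censored
x = [1, 1, 0, 1, 0, 0]
beta_hat = fit_cox_1d(times, events, x)
print(round(beta_hat, 3), round(math.exp(beta_hat), 3))  # coefficient and hazard ratio
```

Note that no baseline hazard appears anywhere in the likelihood, which is what makes the method semi-parametric.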
- **Cox regression is semi-parametric**
- However, parametric methods are much better at handling left censoring or interval censoring, and they can generate predicted times to events
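A parametric model writes down a likelihood contribution for each kind of observation, which is why interval and left censoring are easy to accommodate and why predicted times are available. The sketch below assumes an exponential model (constant hazard) purely for brevity; `exp_loglik` and `fit_exponential` are hypothetical names, and the search bounds on the rate are arbitrary assumptions.

```python
import math


def exp_loglik(lam, exact=(), right=(), interval=()):
    """Log-likelihood of an exponential model with survival S(t) = exp(-lam * t).

    exact:    observed event times t          -> log f(t) = log(lam) - lam * t
    right:    right-censoring times c         -> log S(c) = -lam * c
    interval: (a, b) pairs, event in (a, b]   -> log(S(a) - S(b));
              left censoring is the special case a = 0.
    """
    ll = sum(math.log(lam) - lam * t for t in exact)
    ll += sum(-lam * c for c in right)
    ll += sum(math.log(math.exp(-lam * a) - math.exp(-lam * b)) for a, b in interval)
    return ll


def fit_exponential(exact=(), right=(), interval=(), tol=1e-9):
    """Maximise over u = log(lam) in [-10, 3] by golden-section search (illustrative)."""
    f = lambda u: exp_loglik(math.exp(u), exact, right, interval)
    g = (math.sqrt(5) - 1) / 2
    a, b = -10.0, 3.0
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return math.exp((a + b) / 2)


lam = fit_exponential(exact=[2, 3, 5], right=[4, 6])
print(round(lam, 4))                # 0.15, i.e. events / total time in this case
print(round(1 / lam, 2))            # predicted mean time to event: 6.67
print(round(math.log(2) / lam, 2))  # predicted median time to event: 4.62

# Mixing exact, right-, left- and interval-censored observations in one fit.
lam_mixed = fit_exponential(exact=[2, 3], right=[6], interval=[(0, 4), (1, 3)])
```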
- One major difference between survival regression and conventional linear regression is the possibility of time-dependent covariates (predictors whose values change during follow-up)
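Time-dependent covariates are commonly handled by splitting each person's follow-up into (start, stop] intervals within which the covariate is constant, a layout many Cox regression routines accept. A minimal sketch, assuming covariate changes are recorded as sorted (time, value) pairs starting at time 0; `split_episodes` is a hypothetical helper.

```python
def split_episodes(subject_id, exit_time, event, covariate_changes):
    """Split one subject's follow-up into (start, stop] rows with a constant covariate.

    covariate_changes: sorted (time, value) pairs; the first must be at time 0.
    The event indicator is attached only to the interval that ends at exit_time.
    """
    rows = []
    for k, (start, value) in enumerate(covariate_changes):
        if start >= exit_time:
            break  # changes after the end of follow-up are irrelevant
        stop = covariate_changes[k + 1][0] if k + 1 < len(covariate_changes) else exit_time
        stop = min(stop, exit_time)
        rows.append((subject_id, start, stop, value, int(event and stop == exit_time)))
    return rows


rows = split_episodes(subject_id=1, exit_time=10, event=1,
                      covariate_changes=[(0, 0), (4, 1), (12, 0)])
for row in rows:
    print(row)
# (1, 0, 4, 0, 0)
# (1, 4, 10, 1, 1)
```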
- If the data contain information on more than one event for each individual, special methods are needed to take advantage of the additional information
- **Repeated events provide more statistical power**
- However, there is likely to be statistical dependence among those observations, which has to be corrected for
- There are four methods to correct for the dependence among repeated events:
  1. Robust standard errors (Huber-White or sandwich estimates)
  2. Generalised estimating equations (GEE)
  3. Random-effects (mixed) models
  4. Fixed-effects methods
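The idea behind robust (sandwich) standard errors can be shown without a Cox model. The sketch below swaps in a constant-hazard (exponential) model for repeated events, treating each individual as a cluster: the model-based standard error treats every event as independent, whereas the sandwich version sums each individual's score contributions before squaring. The function name is hypothetical and the data are made up for illustration.

```python
import math


def exponential_rate_robust(clusters):
    """Constant-hazard rate with model-based and cluster-robust (sandwich) SEs.

    clusters: list of (exposure_time, n_events), one entry per individual,
    where n_events may be greater than 1 (repeated events).
    """
    d_total = sum(d for _, d in clusters)
    t_total = sum(t for t, _ in clusters)
    lam = d_total / t_total     # maximum likelihood estimate of the hazard
    info = d_total / lam ** 2   # information, as if all events were independent
    naive_se = math.sqrt(1 / info)
    # Sum each individual's score contributions before squaring (the "meat").
    meat = sum((d / lam - t) ** 2 for t, d in clusters)
    robust_se = math.sqrt(meat) / info  # sqrt(info^-1 * meat * info^-1)
    return lam, naive_se, robust_se


# Illustrative data: two people with no events, two with many.
lam, naive_se, robust_se = exponential_rate_robust([(10, 0), (10, 0), (10, 6), (10, 4)])
print(round(lam, 3), round(naive_se, 4), round(robust_se, 4))  # 0.25 0.0791 0.1299
```

Because events cluster within a few individuals, the robust standard error is noticeably larger than the naive one.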
- Stata will estimate random-effects models for Cox regression, but SAS won't
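- If event times are discrete and there are several types of event (competing risks), maximum likelihood estimation requires that the models be estimated simultaneously using the generalized (multinomial) logit model; there is no equivalent for the complementary log-log model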
- Conventional wisdom has it that there should be at least 5 (some say 10) events for each parameter in the model for maximum likelihood estimates to have reasonably good properties
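The rule of thumb counts events, not cases, so heavily censored data support fewer parameters than the sample size suggests. A small sketch with illustrative numbers; the function names are hypothetical.

```python
def events_per_parameter(event_indicators, n_parameters):
    """Events per estimated parameter; censored cases (indicator 0) add nothing."""
    return sum(event_indicators) / n_parameters


def passes_rule(event_indicators, n_parameters, minimum=10):
    """Check the rule of thumb; pass minimum=5 for the more lenient version."""
    return events_per_parameter(event_indicators, n_parameters) >= minimum


events = [1] * 40 + [0] * 460  # 500 cases but only 40 events (illustrative)
print(round(events_per_parameter(events, 6), 2))  # 6.67
print(passes_rule(events, 6, minimum=5), passes_rule(events, 6))  # True False
```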
- Multiple imputation: impute values by random draws from the predictive distribution of the missing values. Generate several datasets (5 or more), each with slightly different imputed values, then combine the results into a single set of parameter estimates
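Combining results across imputed datasets is usually done with Rubin's rules: average the estimates, then add the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). A minimal sketch for a single coefficient; `pool_rubin` is a hypothetical name and the inputs are illustrative.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets using Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)                      # pooled point estimate
    within = mean(se ** 2 for se in std_errors)  # average within-imputation variance
    between = variance(estimates)                # between-imputation variance (m - 1 divisor)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


# Illustrative results for one coefficient from five imputed datasets.
est, se = pool_rubin([0.42, 0.47, 0.39, 0.45, 0.44], [0.11, 0.12, 0.11, 0.10, 0.12])
print(round(est, 3), round(se, 3))
```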
- **For survival analysis, imputation should only be done on the predictor variables**. Cases with missing data on the dependent variable should simply be deleted
- **Compare non-nested models with AIC or BIC (also known as SBC, Schwarz's Bayesian criterion)**
- Preference is given to models with the **lowest values of those statistics**, although no p-values can be calculated
- The magnitudes of the raw coefficients (log hazard ratios) are difficult to interpret; exponentiating them gives hazard ratios
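For the non-nested model comparison described above, both criteria penalise the maximised log-likelihood by the number of parameters; BIC's penalty grows with sample size. A sketch with hypothetical log-likelihoods for two models:

```python
import math


def aic(loglik, k):
    """Akaike information criterion: -2 log L + 2k."""
    return -2 * loglik + 2 * k


def bic(loglik, k, n):
    """Bayesian (Schwarz) information criterion: -2 log L + k log(n)."""
    return -2 * loglik + k * math.log(n)


# Hypothetical fitted models: (maximised log-likelihood, number of parameters).
models = {"A": (-520.3, 3), "B": (-515.0, 6)}
n = 400
best_aic = min(models, key=lambda m: aic(*models[m]))
best_bic = min(models, key=lambda m: bic(*models[m], n))
print(best_aic, best_bic)  # B A
```

Here the heavier BIC penalty favours the smaller model where AIC does not. For censored survival data, software may differ on whether `n` is the number of cases or the number of events.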
- **Hazard ratios (always positive) are confusing because a value of 1 means no effect**
- A more straightforward number is 100(HR - 1), the percentage change in the hazard for a one-unit increase in the predictor
- Hazard ratios have an asymmetric sampling distribution, so their standard errors are not very useful
- **Report 95% confidence intervals** instead
- Other statistics can include a chi-square test of the null hypothesis that all coefficients are zero
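The conversions above can be sketched as follows, assuming a coefficient and standard error from a fitted model (the values here are hypothetical) and z = 1.96 for a 95% interval. The interval is built on the coefficient scale and then exponentiated, which is why it is not symmetric around the hazard ratio.

```python
import math


def hazard_ratio_summary(beta, se, z=1.96):
    """Return (hazard ratio, percent change in hazard, (lower, upper) 95% CI)."""
    hr = math.exp(beta)
    pct_change = 100 * (hr - 1)
    # Build the interval on the coefficient scale, then exponentiate;
    # the resulting interval is not symmetric around the hazard ratio.
    ci = (math.exp(beta - z * se), math.exp(beta + z * se))
    return hr, pct_change, ci


hr, pct, (lo, hi) = hazard_ratio_summary(beta=-0.5, se=0.2)
print(round(hr, 3), round(pct, 1))  # 0.607 -39.3
print(round(lo, 3), round(hi, 3))
```

A hazard ratio of 0.607 therefore reads as a 39.3% reduction in the hazard per one-unit increase in the predictor.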

# Survival analysis

Original by P.D. Allison, 2012, 12 pages