Credit scoring with macroeconomic variables using survival analysis

Survival analysis studies time to failure, so it can model not just whether a borrower will default but when

Macroeconomic variables used: bank interest rates, unemployment index, house prices
Other variables: income, age, housing and employment status
Bank interest rates are the most significant variable for credit cards
Macroeconomic time series data can naturally be incorporated into a survival model as time-varying covariates (TVC)
Inclusion of macroeconomic variables gives a statistically significant explanatory model
A rise in consumer confidence is expected to increase risk, as consumers will be more likely to spend and borrow, making repayment more difficult
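
A minimal sketch of how a macroeconomic series could be attached to an account as a time-varying covariate, assuming a monthly series and one row per month on book. The function names, the row layout (start, stop, macro, event) and all numbers are illustrative assumptions, not taken from the paper.

```python
from datetime import date

def add_months(d: date, months: int) -> date:
    """Return the first day of the calendar month `months` after d's month."""
    idx = d.year * 12 + (d.month - 1) + months
    return date(idx // 12, idx % 12 + 1, 1)

def expand_account(account_id, opened, duration_months, defaulted, macro_series):
    """Build one row per month on book with the macro value for that calendar month.

    macro_series maps the first day of a calendar month to a macro value
    (for example a bank interest rate). The layout is a hypothetical one.
    """
    rows = []
    for t in range(1, duration_months + 1):
        month = add_months(opened, t)
        rows.append({
            "id": account_id,
            "start": t - 1,                 # months on book at interval start
            "stop": t,                      # months on book at interval end
            "macro": macro_series[month],   # value changes with calendar time
            "event": int(defaulted and t == duration_months),
        })
    return rows

# Illustrative values only, not data from the study
rates = {add_months(date(2001, 1, 1), k): 4.0 + 0.25 * k for k in range(7)}
rows = expand_account("A1", date(2001, 1, 15), 3, True, rates)
for r in rows:
    print(r)
```

Application variables such as income or age would simply be repeated unchanged on every row of the account.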

Use a Cox proportional hazards (PH) survival model to model the time to default of each account
Give more weight to bad cases, as the data sample contains a large proportion of good cases relative to bad ones
Censored data comes from accounts that have not defaulted yet, including recent applications
Survival data is analysed through the hazard function, which is the instantaneous rate of default at some time t after the account has been opened, given that the account has not defaulted before t
Application data is fixed with respect to time while macroeconomic variables change over time. The value of the covariate is given as the value of the macroeconomic variable at the time of failure
Because the training set was large, processing time was long, so model selection did not use forward or backward selection
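
For reference, the standard form of a Cox PH model with time-varying covariates can be written as below. The notation is an assumption made for these notes: x is the fixed application data, z(.) is the macroeconomic series indexed by calendar time, a is the account opening date, h_0 is the baseline hazard, and beta and gamma are the coefficient vectors.

```latex
h(t \mid x, z) = h_0(t)\,\exp\!\big(\beta^{\top} x + \gamma^{\top} z(a + t)\big),
\qquad
S(t \mid x, z) = \exp\!\Big(-\int_0^{t} h(u \mid x, z)\,du\Big)
```

Here S is the survival function, the probability that the account has not defaulted by time t, which is obtained from the hazard rather than being the hazard itself.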

Importance of variable measured by the standardised marginal effect: absolute value of marginal effect times the standard deviation of the variable. Gives approximate relative importance of variables
Use a cost function (a bit like the H measure) rather than the ROC curve to assess the value of predictions: a bad case wrongly predicted as good costs 20, a good case wrongly predicted as bad costs 1, and a correctly identified case costs 0
ROC curve can give misleading conclusion (Hand 2005)
Cut-off threshold computed for each model to minimise total cost of errors on the training set. Analysis repeated with cut-offs calculated on the test sample for comparison
Mean cost per observation is computed on the test set for each model as the sum of the costs of errors over all cases in the test set, divided by the number of cases. Low mean = good performance
Significance assessed using the Wald statistic (p-values below 0.05 or 0.01)
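
The cost-based evaluation above can be sketched as follows, using the 20 / 1 / 0 costs from the notes. The function names, the convention that a higher score means higher risk, the choice of candidate cut-offs and all scores are assumptions for illustration only.

```python
COST_BAD_AS_GOOD = 20  # bad case wrongly predicted as good
COST_GOOD_AS_BAD = 1   # good case wrongly predicted as bad

def total_cost(scores, is_bad, cutoff):
    """Sum the error costs when scores >= cutoff are predicted bad."""
    cost = 0
    for s, bad in zip(scores, is_bad):
        predicted_bad = s >= cutoff
        if bad and not predicted_bad:
            cost += COST_BAD_AS_GOOD
        elif not bad and predicted_bad:
            cost += COST_GOOD_AS_BAD
    return cost

def best_cutoff(scores, is_bad):
    """Pick the candidate cut-off with the lowest total cost.

    Candidates are the observed scores plus infinity (predict everything good).
    """
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates, key=lambda c: total_cost(scores, is_bad, c))

def mean_cost(scores, is_bad, cutoff):
    """Total error cost divided by the number of observations."""
    return total_cost(scores, is_bad, cutoff) / len(scores)

# Hypothetical risk scores and outcomes, not data from the study
train_scores = [0.05, 0.10, 0.20, 0.40, 0.60, 0.80]
train_bad = [False, False, False, True, False, True]
cutoff = best_cutoff(train_scores, train_bad)   # chosen on the training set

test_scores = [0.10, 0.45, 0.30, 0.90]
test_bad = [False, False, True, True]
test_mean = mean_cost(test_scores, test_bad, cutoff)  # evaluated on the test set
print(cutoff, test_mean)
```

Because a missed bad case costs 20 times as much as a rejected good case, the chosen cut-off tends to sit lower than one that simply maximises accuracy.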

In practice this model can be used for credit scoring by incorporating forecasts of macroeconomic conditions into the assessment of credit card applications over a period of 12 months
Because a forecast is used, the model can also be used for stress testing by replacing the forecast with a stressed forecast
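
A rough sketch of the stress-testing idea: the same fitted model is evaluated once with a baseline macroeconomic forecast and once with a stressed one. It assumes a piecewise-constant monthly hazard and a single macro covariate; the baseline hazards, the coefficient and both forecast paths are made-up values, not estimates from the paper.

```python
import math

def default_probability(baseline_hazards, beta, macro_path):
    """Probability of default within the horizon under a PH model.

    Assumes the hazard is constant within each month, so that
    S = exp(-sum_k h0_k * exp(beta * z_k)) and PD = 1 - S.
    """
    cumulative_hazard = 0.0
    for h0, z in zip(baseline_hazards, macro_path):
        cumulative_hazard += h0 * math.exp(beta * z)
    return 1.0 - math.exp(-cumulative_hazard)

# Hypothetical inputs for a 12-month horizon
baseline_hazards = [0.002] * 12           # assumed monthly baseline hazard
beta = 0.3                                # assumed interest-rate coefficient
forecast = [4.0] * 12                     # assumed baseline rate forecast (%)
stressed = [z + 2.0 for z in forecast]    # assumed stress: rates 2 points higher

pd_base = default_probability(baseline_hazards, beta, forecast)
pd_stress = default_probability(baseline_hazards, beta, stressed)
print(pd_base, pd_stress)
```

With a positive coefficient, the stressed path gives a higher 12-month default probability than the baseline path, which is the comparison a stress test is after.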
