Credit risk modeling is an essential process for financial institutions: it allows lenders to assess the probability that a borrower will default on their obligations. A credit scorecard, often built on top of such a model, transforms these probabilities into a simple score that can be used for decision-making. In this guide, we’ll walk through the theory, the mathematical concepts, and the practical steps, with Python code examples.
Credit risk modeling uses historical data to predict the likelihood of default or non-payment. It involves:
- Data preprocessing: Cleaning and preparing data for modeling.
- Exploratory Data Analysis (EDA): Understanding data distribution and relationships.
- Model building: Most commonly with logistic regression.
- Model evaluation: Using metrics like ROC-AUC, the confusion matrix, etc.
A credit scorecard converts the output of a predictive model into a score that is easy to interpret. Typically, it uses the odds (or probabilities) generated by the model and applies a scaling formula to convert them into points. Two key concepts here are:
- Weight of Evidence (WOE): Transforms variables to capture their predictive power (a short illustrative sketch follows this list).
- Points to Double the Odds (PDO): A scaling constant that determines how many points the score changes when the odds double.
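To make the WOE idea concrete before we move on, here is a minimal, self-contained sketch (not part of the pipeline below) that bins a numeric feature and computes WOE per bin as the log ratio of the share of good loans to the share of bad loans. The synthetic `income` data and the `compute_woe` helper are purely illustrative.

```python
import numpy as np
import pandas as pd

def compute_woe(df, feature, target, bins=5):
    """Bin a numeric feature and compute Weight of Evidence per bin.

    Assumes `target` equals 1 for good loans and 0 for bad loans.
    """
    binned = pd.qcut(df[feature], q=bins, duplicates='drop')
    counts = df.groupby(binned, observed=True)[target].agg(total='count', good='sum')
    counts['bad'] = counts['total'] - counts['good']
    # Share of all goods / all bads that fall into each bin (small constant avoids log(0))
    pct_good = counts['good'] / counts['good'].sum()
    pct_bad = counts['bad'] / counts['bad'].sum()
    counts['woe'] = np.log((pct_good + 1e-6) / (pct_bad + 1e-6))
    return counts

# Synthetic demo: higher income makes a good outcome more likely, so WOE rises across bins
rng = np.random.default_rng(0)
income = rng.normal(50_000, 15_000, 1_000)
p_good = 1 / (1 + np.exp(-(income - 45_000) / 10_000))
demo = pd.DataFrame({'income': income, 'good_bad': rng.binomial(1, p_good)})
print(compute_woe(demo, 'income', 'good_bad', bins=4))
```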
Before modeling, it’s essential to clean and transform your dataset. Below is a snippet that demonstrates reading a dataset, cleaning up columns, and creating a target variable:
```python
import pandas as pd
import numpy as np

# Load the dataset
loan_data = pd.read_csv('loan_data_2007_2014.csv')

# Drop columns with too many missing values or that are irrelevant for modeling
# (keep only columns with at least 20% non-missing values)
na_threshold = loan_data.shape[0] * 0.2
loan_data.dropna(thresh=na_threshold, axis=1, inplace=True)
loan_data.drop(columns=['id', 'member_id', 'sub_grade', 'emp_title', 'url', 'desc', 'title', 'zip_code',
                        'next_pymnt_d', 'recoveries', 'collection_recovery_fee', 'total_rec_prncp',
                        'total_rec_late_fee'], inplace=True)

# Create a binary target variable based on loan_status
loan_data['good_bad'] = np.where(loan_data['loan_status'].isin([
    'Charged Off', 'Default', 'Late (31-120 days)',
    'Does not meet the credit policy. Status:Charged Off'
]), 0, 1)

# Drop the original loan_status column
loan_data.drop(columns=['loan_status'], inplace=True)
```
### Explanation:

- Data Cleaning: Columns with high missing values or little relevance to modeling are dropped.
- Target Variable: The `good_bad` column is generated to differentiate between good (1) and bad (0) loans.
Exploratory analysis helps us understand our data’s structure and distribution. For example, visualizing the distribution of the target variable:
```python
import seaborn as sns
import matplotlib.pyplot as plt

# Visualize the target variable distribution
sns.countplot(x=loan_data['good_bad']).set_title('Distribution of Good vs. Bad Loans')
plt.xlabel("Loan Outcome (1: Good, 0: Bad)")
plt.ylabel("Number of Records")
plt.show()
```
### Explanation:

- The count plot shows how many loans fall into each class. The vast majority of loans are good (1), so the dataset is heavily imbalanced toward good loans, which is worth keeping in mind when interpreting the model’s predictions later.
Logistic regression is popular in credit risk because it models the probability of default in a way that is easy to interpret. The model is defined as:
\[P(Y=1|X) = \frac{1}{1 + e^{-(\beta_0+\beta_1x_1+\beta_2x_2+\ldots+\beta_kx_k)}}\]
Where:
- \(P(Y=1|X)\) is the probability that a borrower is a good credit risk.
- \(\beta_0\) is the intercept.
- \(\beta_1, \beta_2, \ldots, \beta_k\) are the coefficients corresponding to each feature \(x_1, x_2, \ldots, x_k\).
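As a quick numerical illustration of this formula (the coefficients and feature values below are made up, not estimates from the loan data), the snippet shows how a linear combination of features is squashed into a probability between 0 and 1:

```python
import numpy as np

# Hypothetical coefficients for a two-feature model (illustrative values only)
beta_0 = -1.0                 # intercept
beta = np.array([0.8, -0.5])  # beta_1, beta_2
x = np.array([2.0, 1.5])      # one applicant's feature values x_1, x_2

# Linear combination, then the logistic (sigmoid) transformation
z = beta_0 + np.dot(beta, x)
p_good = 1 / (1 + np.exp(-z))
print(f"z = {z:.2f}, P(Y=1|X) = {p_good:.3f}")  # z = -0.15, P(Y=1|X) ≈ 0.463
```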
```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, confusion_matrix

# Split the data into features and target
X = loan_data.drop('good_bad', axis=1)
y = loan_data['good_bad']

# Create a train/test split, stratified to maintain the distribution of the target variable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Instantiate and train the logistic regression model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict probabilities on the test set
y_pred_prob = model.predict_proba(X_test)[:, 1]

# Evaluate the model using ROC-AUC
roc_auc = roc_auc_score(y_test, y_pred_prob)
print(f"ROC-AUC Score: {roc_auc:.2f}")

# Confusion Matrix
y_pred = model.predict(X_test)
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
```
```
Confusion Matrix:
[[ 8418  1776]
 [   10 83053]]
```
### Explanation:

- Model Training: We train a logistic regression model to estimate the probability that a borrower is a good credit risk.
- Evaluation: Metrics like ROC-AUC and the confusion matrix help determine the model’s performance; the sketch after this list shows how the true and false positive rates are derived from the confusion matrix above.
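As a small aside (not part of the original walkthrough), the rates that the ROC curve is built from can be read directly off a 2×2 confusion matrix. The sketch below uses the matrix printed above; scikit-learn orders rows and columns by label, so row 0 is the bad class and row 1 the good class, and these rates correspond to the default 0.5 probability threshold:

```python
import numpy as np

# Confusion matrix from the logistic regression above (rows = actual, cols = predicted)
cm = np.array([[8418,  1776],
               [  10, 83053]])
tn, fp = cm[0]  # actual bad:  predicted bad (tn), predicted good (fp)
fn, tp = cm[1]  # actual good: predicted bad (fn), predicted good (tp)

tpr = tp / (tp + fn)             # true positive rate (recall for good loans), ~0.9999
fpr = fp / (fp + tn)             # false positive rate (bad loans classified as good), ~0.174
accuracy = (tp + tn) / cm.sum()  # ~0.981

print(f"TPR: {tpr:.4f}, FPR: {fpr:.4f}, Accuracy: {accuracy:.4f}")
```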
The Receiver Operating Characteristic (ROC) curve is a plot of the True Positive Rate (TPR) against the False Positive Rate (FPR) at various threshold settings. The Area Under the Curve (AUC) is a measure of the model’s ability to distinguish between classes.
```python
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

# Compute ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)

# Plot ROC Curve
plt.figure()
plt.plot(fpr, tpr, label=f"ROC curve (area = {roc_auc:.2f})")
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver Operating Characteristic (ROC) Curve")
plt.legend(loc="lower right")
plt.show()
```
### Explanation:

- The dashed diagonal is the line of no discrimination (AUC = 0.5); the closer the ROC curve sits to the top-left corner, the better the model separates good loans from bad ones. Here the AUC is roughly 0.96, indicating strong discrimination.
Credit scorecards convert the logistic regression output into a score. The score is often calculated using the following formula:
\[\text{Score} = \text{Offset} - (\text{Factor} \times \ln(\text{Odds}))\]
Where:
Odds is calculated as:
\[\text{Odds} = \frac{P}{1-P}\]
Factor is given by:
\[\text{Factor} = \frac{\text{PDO}}{\ln(2)}\]
Here, PDO (Points to Double the Odds) is a constant chosen by the institution (e.g., 20 points).
Offset is chosen so that a given score corresponds to chosen baseline odds. For example, if you decide that a score of 600 corresponds to odds of 1:50, the offset can be calculated accordingly.
#### a. Calculating the Factor

Suppose the PDO is 20:
```python
import numpy as np

PDO = 20  # Points to Double the Odds
factor = PDO / np.log(2)
print(f"Factor: {factor:.2f}")
```
#### b. Calculating the Offset

Let’s assume a baseline score \(S_0\) of 600 corresponds to baseline odds \(O_0\) of 1:50:
```python
baseline_score = 600
baseline_odds = 1/50  # Odds are probability of default vs. non-default
offset = baseline_score + (factor * np.log(baseline_odds))
print(f"Offset: {offset:.2f}")
```
```
Factor: 28.85
Offset: 487.12
```

#### c. Transforming Model Predictions to Scores
Convert the predicted probabilities to scores using the logistic regression outputs. For each applicant, calculate:
The odds:
\[\text{Odds} = \frac{P}{1-P}\]
The score:
\[\text{Score} = \text{Offset} - (\text{Factor} \times \ln(\text{Odds}))\]
```python
import pandas as pd

# Compute odds for each predicted probability
odds = y_pred_prob / (1 - y_pred_prob)

# Calculate score for each applicant
scores = offset - (factor * np.log(odds))

# Show summary of the credit scores
score_summary = pd.DataFrame({'Predicted_Probability': y_pred_prob, 'Credit_Score': scores})
print(score_summary.describe())
```
```
       Predicted_Probability   Credit_Score
count           93257.000000   93257.000000
mean                0.887445     339.019409
std                 0.278539     199.812793
min                 0.000001      88.491534
25%                 0.949164     157.521232
50%                 0.984298     367.720839
75%                 0.999989     402.668335
max                 0.999999     885.754219
```
### Explanation:
- Factor and Offset Calculation: These determine the scaling of the scores.
- Score Transformation: This step translates a continuous probability into an interpretable credit score.
Predicted Probabilities: The mean predicted probability is about 0.887, with most values very high (median ~0.984, 75th percentile ~0.99999). This indicates that the model is extremely confident that most loans are “good” (non-default), which is typical if the dataset is imbalanced toward good loans.
Credit Scores: The scores range from roughly 88 to 886, with a mean of 339 and a median of about 368. Since the score is computed as

\[\text{Score} = \text{Offset} - (\text{Factor} \times \ln(\text{Odds}))\]

and the odds are \(P/(1-P)\), loans with very high predicted probabilities (good loans) yield lower scores. This might seem counterintuitive if higher scores are expected to represent better credit quality. If you want higher scores for better loans, you may need to adjust the transformation, for example by inverting the score or recalibrating the offset and factor; a sketch of one such rescaling follows below.
Overall Impression: The model shows excellent discrimination (ROC-AUC ~0.96) and is very confident in its predictions. However, you might want to revisit the scorecard transformation so that the scores align with typical business expectations (i.e., higher scores indicating lower risk).
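If the business expectation is that higher scores indicate lower risk, one possible adjustment is sketched below: keep the same PDO, but add the scaled log-odds of being a good loan to the offset instead of subtracting them, anchoring 600 points at good-to-bad odds of 50:1 (the 1:50 default odds used earlier, read from the other side). This assumes, as in the code above, that `y_pred_prob` is the predicted probability of a good loan; the names `offset_rescaled` and `rescaled_scores` are illustrative.

```python
import numpy as np

PDO = 20
factor = PDO / np.log(2)

# Anchor: 600 points at good-to-bad odds of 50:1
baseline_score = 600
baseline_good_odds = 50
offset_rescaled = baseline_score - factor * np.log(baseline_good_odds)

# Higher probability of a good loan now yields a higher score,
# and every doubling of the good odds still adds PDO = 20 points
good_odds = y_pred_prob / (1 - y_pred_prob)
rescaled_scores = offset_rescaled + factor * np.log(good_odds)

# e.g. P(good) = 0.984 -> good odds ~ 61.5 -> score ~ 606 (just above the 600 anchor)
```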
Credit risk modeling and scorecard development involve:
- Data preprocessing: Cleaning, handling missing values, and engineering features.
- Modeling: Using logistic regression to predict default probabilities.
- Evaluation: Leveraging metrics like ROC-AUC to gauge model performance.
- Scorecard Construction: Transforming model outputs into actionable scores using scaling factors like PDO, the offset, and weight of evidence.
The process not only provides insights into creditworthiness but also standardizes decisions in lending. While our examples use logistic regression for its simplicity and interpretability, other techniques (like decision trees, ensemble methods, or neural networks) can also be applied depending on the complexity and requirements of the task.
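As one illustration of that last point, a tree-based ensemble from scikit-learn could be dropped into the same train/test split used earlier. This is a sketch of an alternative, not a claim about which approach performs better on this dataset, and it assumes the feature matrix is fully numeric, as the logistic regression example already requires:

```python
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import roc_auc_score

# Gradient-boosted trees as a drop-in alternative to the logistic regression model
gb_model = HistGradientBoostingClassifier(random_state=42)
gb_model.fit(X_train, y_train)

gb_pred_prob = gb_model.predict_proba(X_test)[:, 1]
print(f"Gradient boosting ROC-AUC: {roc_auc_score(y_test, gb_pred_prob):.2f}")
```

Tree ensembles can capture non-linear relationships, but they are harder to translate into a traditional points-based scorecard than a logistic regression built on WOE-encoded features.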