Credit risk modeling is an essential process for financial institutions: it allows lenders to assess the probability that a borrower will default on their obligations. A credit scorecard, often built on top of such a model, transforms these probabilities into a simple score that can be used for decision-making. In this guide, we’ll walk through the theory, the mathematical concepts, and the practical steps, with Python code examples.
Credit risk modeling uses historical data to predict the likelihood of default or non-payment. It involves:
- Data preprocessing: Cleaning and preparing data for modeling.
- Exploratory Data Analysis (EDA): Understanding data distribution and relationships.
- Model building: Most commonly with logistic regression.
- Model evaluation: Using metrics like ROC-AUC, the confusion matrix, etc.
A credit scorecard converts the output of a predictive model into a score that is easy to interpret. Typically, it uses the odds (or probabilities) generated by the model and applies a scaling formula to convert them into points. Two key concepts here are:
- Weight of Evidence (WOE): Transforms variables to capture their predictive power (a short illustrative sketch follows this list).
- Points to Double the Odds (PDO): A scaling constant that determines how many points the score changes when the odds double.
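To make the WOE idea concrete before we move on, here is a minimal, self-contained sketch (not part of the pipeline below) that bins a numeric feature and computes WOE per bin as the log ratio of the share of good loans to the share of bad loans. The synthetic `income` data and the `compute_woe` helper are purely illustrative.

```python
import numpy as np
import pandas as pd

def compute_woe(df, feature, target, bins=5):
    """Bin a numeric feature and compute Weight of Evidence per bin.

    Assumes `target` equals 1 for good loans and 0 for bad loans.
    """
    binned = pd.qcut(df[feature], q=bins, duplicates='drop')
    counts = df.groupby(binned, observed=True)[target].agg(total='count', good='sum')
    counts['bad'] = counts['total'] - counts['good']
    # Share of all goods / all bads that fall into each bin (small constant avoids log(0))
    pct_good = counts['good'] / counts['good'].sum()
    pct_bad = counts['bad'] / counts['bad'].sum()
    counts['woe'] = np.log((pct_good + 1e-6) / (pct_bad + 1e-6))
    return counts

# Synthetic demo: higher income makes a good outcome more likely, so WOE rises across bins
rng = np.random.default_rng(0)
income = rng.normal(50_000, 15_000, 1_000)
p_good = 1 / (1 + np.exp(-(income - 45_000) / 10_000))
demo = pd.DataFrame({'income': income, 'good_bad': rng.binomial(1, p_good)})
print(compute_woe(demo, 'income', 'good_bad', bins=4))
```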
Before modeling, it’s essential to clean and transform your dataset. Below is a snippet that demonstrates reading a dataset, cleaning up columns, and creating a target variable:
```python
import pandas as pd
import numpy as np

# Load the dataset
loan_data = pd.read_csv('loan_data_2007_2014.csv')

# Drop columns with too many missing values or that are irrelevant for modeling
# (keep only columns with at least 20% non-missing values)
na_threshold = loan_data.shape[0] * 0.2
loan_data.dropna(thresh=na_threshold, axis=1, inplace=True)
loan_data.drop(columns=['id', 'member_id', 'sub_grade', 'emp_title', 'url', 'desc', 'title', 'zip_code',
                        'next_pymnt_d', 'recoveries', 'collection_recovery_fee', 'total_rec_prncp',
                        'total_rec_late_fee'], inplace=True)

# Create a binary target variable based on loan_status
loan_data['good_bad'] = np.where(loan_data['loan_status'].isin([
    'Charged Off', 'Default', 'Late (31-120 days)',
    'Does not meet the credit policy. Status:Charged Off'
]), 0, 1)

# Drop the original loan_status column
loan_data.drop(columns=['loan_status'], inplace=True)
```
### Explanation:

- Data Cleaning: Columns with high missing values or little relevance to modeling are dropped.
- Target Variable: The `good_bad` column is generated to differentiate between good (1) and bad (0) loans.
Exploratory analysis helps us understand our data’s structure and distribution. For example, visualizing the distribution of the target variable:
```python
import seaborn as sns
import matplotlib.pyplot as plt

# Visualize the target variable distribution
sns.countplot(x=loan_data['good_bad']).set_title('Distribution of Good vs. Bad Loans')
plt.xlabel("Loan Outcome (1: Good, 0: Bad)")
plt.ylabel("Number of Records")
plt.show()
```
### Explanation:

- The count plot shows how many loans fall into each class. The vast majority of loans are good (1), so the dataset is heavily imbalanced toward good loans, which is worth keeping in mind when interpreting the model’s predictions later.
Logistic regression is popular in credit risk because it models the probability of default in a way that is easy to interpret. The model is defined as:
\[P(Y=1|X) = \frac{1}{1 + e^{-(\beta_0+\beta_1x_1+\beta_2x_2+\ldots+\beta_kx_k)}}\]
Where:
- \(P(Y=1|X)\) is the probability that a borrower is a good credit risk.
- \(\beta_0\) is the intercept.
- \(\beta_1, \beta_2, \ldots, \beta_k\) are the coefficients corresponding to each feature \(x_1, x_2, \ldots, x_k\).
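As a quick numerical illustration of this formula (the coefficients and feature values below are made up, not estimates from the loan data), the snippet shows how a linear combination of features is squashed into a probability between 0 and 1:

```python
import numpy as np

# Hypothetical coefficients for a two-feature model (illustrative values only)
beta_0 = -1.0                 # intercept
beta = np.array([0.8, -0.5])  # beta_1, beta_2
x = np.array([2.0, 1.5])      # one applicant's feature values x_1, x_2

# Linear combination, then the logistic (sigmoid) transformation
z = beta_0 + np.dot(beta, x)
p_good = 1 / (1 + np.exp(-z))
print(f"z = {z:.2f}, P(Y=1|X) = {p_good:.3f}")  # z = -0.15, P(Y=1|X) ≈ 0.463
```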
```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, confusion_matrix

# Split the data into features and target
X = loan_data.drop('good_bad', axis=1)
y = loan_data['good_bad']

# Create a train/test split, stratified to maintain the distribution of the target variable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Instantiate and train the logistic regression model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict probabilities on the test set
y_pred_prob = model.predict_proba(X_test)[:, 1]

# Evaluate the model using ROC-AUC
roc_auc = roc_auc_score(y_test, y_pred_prob)
print(f"ROC-AUC Score: {roc_auc:.2f}")

# Confusion Matrix
y_pred = model.predict(X_test)
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
```
```
Confusion Matrix:
[[ 8418  1776]
 [   10 83053]]
```
### Explanation:

- Model Training: We train a logistic regression model to estimate the probability that a borrower is a good credit risk.
- Evaluation: Metrics like ROC-AUC and the confusion matrix help determine the model’s performance; the sketch after this list shows how the true and false positive rates are derived from the confusion matrix above.
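As a small aside (not part of the original walkthrough), the rates that the ROC curve is built from can be read directly off a 2×2 confusion matrix. The sketch below uses the matrix printed above; scikit-learn orders rows and columns by label, so row 0 is the bad class and row 1 the good class, and these rates correspond to the default 0.5 probability threshold:

```python
import numpy as np

# Confusion matrix from the logistic regression above (rows = actual, cols = predicted)
cm = np.array([[8418,  1776],
               [  10, 83053]])
tn, fp = cm[0]  # actual bad:  predicted bad (tn), predicted good (fp)
fn, tp = cm[1]  # actual good: predicted bad (fn), predicted good (tp)

tpr = tp / (tp + fn)             # true positive rate (recall for good loans), ~0.9999
fpr = fp / (fp + tn)             # false positive rate (bad loans classified as good), ~0.174
accuracy = (tp + tn) / cm.sum()  # ~0.981

print(f"TPR: {tpr:.4f}, FPR: {fpr:.4f}, Accuracy: {accuracy:.4f}")
```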
The Receiver Operating Characteristic (ROC) curve is a plot of the True Positive Rate (TPR) against the False Positive Rate (FPR) at various threshold settings. The Area Under the Curve (AUC) is a measure of the model’s ability to distinguish between classes.
```python
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

# Compute ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)

# Plot ROC Curve
plt.figure()
plt.plot(fpr, tpr, label=f"ROC curve (area = {roc_auc:.2f})")
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver Operating Characteristic (ROC) Curve")
plt.legend(loc="lower right")
plt.show()
```
### Explanation:

- The dashed diagonal is the line of no discrimination (AUC = 0.5); the closer the ROC curve sits to the top-left corner, the better the model separates good loans from bad ones. Here the AUC is roughly 0.96, indicating strong discrimination.
Credit scorecards convert the logistic regression output into a score. The score is often calculated using the following formula:
\[\text{Score} = \text{Offset} - (\text{Factor} \times \ln(\text{Odds}))\]
Where:
Odds is calculated as:
\[\text{Odds} = \frac{P}{1-P}\]
Factor is given by:
\[\text{Factor} = \frac{\text{PDO}}{\ln(2)}\]
Here, PDO (Points to Double the Odds) is a constant chosen by the institution (e.g., 20 points).
Offset is chosen so that a given score corresponds to chosen baseline odds. For example, if you decide that a score of 600 corresponds to odds of 1:50, the offset can be calculated accordingly.
#### a. Calculating the Factor

Suppose the PDO is 20:
```python
import numpy as np

PDO = 20  # Points to Double the Odds
factor = PDO / np.log(2)
print(f"Factor: {factor:.2f}")
```
#### b. Calculating the Offset

Let’s assume a baseline score \(S_0\) of 600 corresponds to baseline odds \(O_0\) of 1:50:
```python
baseline_score = 600
baseline_odds = 1/50  # Odds are probability of default vs. non-default
offset = baseline_score + (factor * np.log(baseline_odds))
print(f"Offset: {offset:.2f}")
```
```
Factor: 28.85
Offset: 487.12
```

#### c. Transforming Model Predictions to Scores
Convert the predicted probabilities to scores using the logistic regression outputs. For each applicant, calculate:
The odds:
\[\text{Odds} = \frac{P}{1-P}\]
The score:
\[\text{Score} = \text{Offset} - (\text{Factor} \times \ln(\text{Odds}))\]
```python
import pandas as pd

# Compute odds for each predicted probability
odds = y_pred_prob / (1 - y_pred_prob)

# Calculate score for each applicant
scores = offset - (factor * np.log(odds))

# Show summary of the credit scores
score_summary = pd.DataFrame({'Predicted_Probability': y_pred_prob, 'Credit_Score': scores})
print(score_summary.describe())
```
```
       Predicted_Probability   Credit_Score
count           93257.000000   93257.000000
mean                0.887445     339.019409
std                 0.278539     199.812793
min                 0.000001      88.491534
25%                 0.949164     157.521232
50%                 0.984298     367.720839
75%                 0.999989     402.668335
max                 0.999999     885.754219
```
### Explanation:
- Factor and Offset Calculation: These determine the scaling of the scores.
- Score Transformation: This step translates a continuous probability into an interpretable credit score.
Predicted Probabilities: The mean predicted probability is about 0.887, with most values very high (median ~0.984, 75th percentile ~0.99999). This indicates that the model is extremely confident that most loans are “good” (non-default), which is typical if the dataset is imbalanced toward good loans.
Credit Scores: The scores range from roughly 88 to 886, with a mean of 339 and a median of about 368. Since the score is computed as

\[\text{Score} = \text{Offset} - (\text{Factor} \times \ln(\text{Odds}))\]

and the odds are \(P/(1-P)\), loans with very high predicted probabilities (good loans) yield lower scores. This might seem counterintuitive if higher scores are expected to represent better credit quality. If you want higher scores for better loans, you may need to adjust the transformation, for example by inverting the score or recalibrating the offset and factor; a sketch of one such rescaling follows below.
Overall Impression: The model shows excellent discrimination (ROC-AUC ~0.96) and is very confident in its predictions. However, you might want to revisit the scorecard transformation so that the scores align with typical business expectations (i.e., higher scores indicating lower risk).
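If the business expectation is that higher scores indicate lower risk, one possible adjustment is sketched below: keep the same PDO, but add the scaled log-odds of being a good loan to the offset instead of subtracting them, anchoring 600 points at good-to-bad odds of 50:1 (the 1:50 default odds used earlier, read from the other side). This assumes, as in the code above, that `y_pred_prob` is the predicted probability of a good loan; the names `offset_rescaled` and `rescaled_scores` are illustrative.

```python
import numpy as np

PDO = 20
factor = PDO / np.log(2)

# Anchor: 600 points at good-to-bad odds of 50:1
baseline_score = 600
baseline_good_odds = 50
offset_rescaled = baseline_score - factor * np.log(baseline_good_odds)

# Higher probability of a good loan now yields a higher score,
# and every doubling of the good odds still adds PDO = 20 points
good_odds = y_pred_prob / (1 - y_pred_prob)
rescaled_scores = offset_rescaled + factor * np.log(good_odds)

# e.g. P(good) = 0.984 -> good odds ~ 61.5 -> score ~ 606 (just above the 600 anchor)
```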
Credit risk modeling and scorecard development involve:
- Data preprocessing: Cleaning, handling missing values, and engineering features.
- Modeling: Using logistic regression to predict default probabilities.
- Evaluation: Leveraging metrics like ROC-AUC to gauge model performance.
- Scorecard Construction: Transforming model outputs into actionable scores using scaling factors like PDO, the offset, and weight of evidence.
The process not only provides insights into creditworthiness but also standardizes decisions in lending. While our examples use logistic regression for its simplicity and interpretability, other techniques (like decision trees, ensemble methods, or neural networks) can also be applied depending on the complexity and requirements of the task.
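As one illustration of that last point, a tree-based ensemble from scikit-learn could be dropped into the same train/test split used earlier. This is a sketch of an alternative, not a claim about which approach performs better on this dataset, and it assumes the feature matrix is fully numeric, as the logistic regression example already requires:

```python
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import roc_auc_score

# Gradient-boosted trees as a drop-in alternative to the logistic regression model
gb_model = HistGradientBoostingClassifier(random_state=42)
gb_model.fit(X_train, y_train)

gb_pred_prob = gb_model.predict_proba(X_test)[:, 1]
print(f"Gradient boosting ROC-AUC: {roc_auc_score(y_test, gb_pred_prob):.2f}")
```

Tree ensembles can capture non-linear relationships, but they are harder to translate into a traditional points-based scorecard than a logistic regression built on WOE-encoded features.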