Financial time series are noisy, exhibit regime changes, and often have non-linear dynamics. Traditional linear models can miss important interactions between variables. Decision trees and random forests are non-parametric machine learning methods that can model complex, non-linear relationships without assuming a specific functional form.
This article presents the theory and mathematical foundations of these methods, together with a full Python implementation that predicts Bitcoin’s daily returns and backtests a basic trading strategy.
Let \(P_t\) denote the adjusted closing price of Bitcoin at time \(t\). The one-day-ahead return is defined as:
\[ r_{t+1} = \frac{P_{t+1} - P_t}{P_t} \]
We aim to find a predictive function:
\[ \hat{r}_{t+1} = f(X_t) \]
where \(X_t\) is a feature vector containing historical market data available at time \(t\), and \(f(\cdot)\) is a learned mapping from features to next-day returns.
A regression decision tree partitions the predictor space into \(M\) non-overlapping regions \(R_1, R_2, \dots, R_M\). For any observation in region \(R_m\), the prediction is:
\[ \hat{r}_{t+1} = c_m \quad \text{if} \quad X_t \in R_m \]
where \(c_m\) is the mean of the target variable in region \(R_m\).
The partitioning is chosen to minimize the sum of squared errors at each split:
\[ \text{SSE} = \sum_{i \in R_{\text{left}}} (y_i - \bar{y}_{\text{left}})^2 + \sum_{i \in R_{\text{right}}} (y_i - \bar{y}_{\text{right}})^2 \]
Decision trees are interpretable but can overfit, especially in noisy financial data.
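To make the split criterion concrete, here is a minimal sketch of the greedy search over one feature (best_split is a hypothetical helper for illustration, not part of the script below; scikit-learn's DecisionTreeRegressor performs the same search across all features at every node):

import numpy as np

def best_split(x, y):
    # Illustrative sketch: exhaustively test each candidate threshold on a
    # single feature and keep the one minimizing the two-region SSE above.
    best_sse, best_thr = np.inf, None
    for thr in np.unique(x)[:-1]:  # drop the max so the right region is non-empty
        left, right = y[x <= thr], y[x > thr]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best_thr = sse, thr
    return best_thr, best_sse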
A random forest is an ensemble of decision trees:
\[ \hat{r}_{t+1}^{RF} = \frac{1}{B} \sum_{b=1}^B f_b(X_t) \]
where:
- \(B\) is the number of trees in the ensemble,
- each tree \(f_b\) is trained on a bootstrap sample of the training data (bagging), and
- at each split, each tree considers only a random subset of the features.
These mechanisms reduce correlation between trees, lowering the variance of the average prediction and improving out-of-sample performance.
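For intuition, the bagging component can be sketched by hand (this is an illustrative sketch, not how the article's pipeline is built; the per-split random feature subsetting that RandomForestRegressor also applies is omitted for brevity):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_predict(X_train, y_train, X_test, B=100, seed=42):
    # Fit B trees on bootstrap resamples of the training set and
    # average their predictions, as in the ensemble formula above.
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = np.zeros(len(X_test))
    for _ in range(B):
        idx = rng.integers(0, n, size=n)  # sample n rows with replacement
        tree = DecisionTreeRegressor(min_samples_leaf=50)
        tree.fit(X_train.iloc[idx], y_train.iloc[idx])
        preds += tree.predict(X_test)
    return preds / B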
We evaluate performance using Compound Annual Growth Rate (CAGR) and Sharpe Ratio.
CAGR:
\[ \text{CAGR} = \left( \prod_{t=1}^T (1 + r_t^{\text{strat}}) \right)^{\frac{252}{T}} - 1 \]
Sharpe Ratio:
\[ S = \frac{\mu_{\text{strat}}}{\sigma_{\text{strat}}} \sqrt{252} \]
where \(\mu_{\text{strat}}\) and \(\sigma_{\text{strat}}\) are the mean and standard deviation of daily strategy returns.
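As a quick sanity check of both formulas on toy numbers (the return values are made up for illustration):

import numpy as np

daily = np.array([0.010, -0.005, 0.020, 0.000, -0.010])  # toy daily strategy returns
T = len(daily)
cagr = (1 + daily).prod() ** (252 / T) - 1               # annualized compound growth
sharpe = np.sqrt(252) * daily.mean() / daily.std()       # annualized Sharpe, zero risk-free rate
print(f"CAGR: {cagr:.2%}, Sharpe: {sharpe:.2f}")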
Given a model prediction \(\hat{r}_{t+1}\), the strategy takes the position:
\[ \text{position}_t = \begin{cases} +1 \ (\text{long}) & \text{if } \hat{r}_{t+1} > \theta \\ -1 \ (\text{short}) & \text{if } \hat{r}_{t+1} < -\theta \\ 0 \ (\text{flat}) & \text{otherwise} \end{cases} \]
The threshold \(\theta \ge 0\) filters out weak predictions and reduces overtrading.
import yfinance as yf
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
import matplotlib.pyplot as plt

# Download BTC data
df = yf.download("BTC-USD", start="2017-01-01", end="2025-01-01", auto_adjust=False)
if isinstance(df.columns, pd.MultiIndex):
    df.columns = df.columns.droplevel(1)

# Compute daily returns
df['Return'] = df['Adj Close'].pct_change()

# Create lagged return features
lags = 5
for lag in range(1, lags + 1):
    df[f'Ret_Lag{lag}'] = df['Return'].shift(lag)

# Rolling volatility
df['Volatility'] = df['Return'].rolling(window=10).std()

df.dropna(inplace=True)

# Train/test split
train_size = int(len(df) * 0.7)
X_train = df.drop(columns=['Return']).iloc[:train_size]
y_train = df['Return'].iloc[:train_size]
X_test = df.drop(columns=['Return']).iloc[train_size:]
y_test = df['Return'].iloc[train_size:]

# Decision Tree model
tree_model = DecisionTreeRegressor(min_samples_leaf=50, random_state=42)
tree_model.fit(X_train, y_train)
pred_tree = tree_model.predict(X_test)

# Random Forest model
rf_model = RandomForestRegressor(
    n_estimators=100, min_samples_leaf=50, max_features=3, random_state=42
)
rf_model.fit(X_train, y_train)
pred_rf = rf_model.predict(X_test)

# Strategy evaluation
def evaluate_strategy(pred, actual, threshold=0.0):
    positions = np.where(pred > threshold, 1,
                         np.where(pred < -threshold, -1, 0))
    strat_returns = positions * actual
    cagr = (1 + strat_returns).prod() ** (252 / len(strat_returns)) - 1
    sharpe = np.sqrt(252) * strat_returns.mean() / strat_returns.std()
    return cagr, sharpe, strat_returns.cumsum()

cagr_tree, sharpe_tree, eq_tree = evaluate_strategy(pred_tree, y_test)
cagr_rf, sharpe_rf, eq_rf = evaluate_strategy(pred_rf, y_test)

print(f"Decision Tree - CAGR: {cagr_tree:.2%}, Sharpe: {sharpe_tree:.2f}")
print(f"Random Forest - CAGR: {cagr_rf:.2%}, Sharpe: {sharpe_rf:.2f}")

# Plot equity curves
plt.figure(figsize=(10, 6))
plt.plot(eq_tree, label="Decision Tree")
plt.plot(eq_rf, label="Random Forest")
plt.axhline(0, color='black', linestyle='--', linewidth=1)
plt.title("BTC-USD Strategy Cumulative Returns")
plt.xlabel("Date")
plt.ylabel("Cumulative Return")
plt.legend()
plt.show()

# Feature importance
importance_df = pd.DataFrame({
    'Feature': X_train.columns,
    'Importance': rf_model.feature_importances_
}).sort_values(by='Importance', ascending=False)
print(importance_df)
Sample output:

Decision Tree - CAGR: 0.11%, Sharpe: 0.21
Random Forest - CAGR: 84.47%, Sharpe: 1.69

Random Forest Feature Importance:
Feature Importance
11 Volatility 0.138639
7 Ret_Lag2 0.099175
6 Ret_Lag1 0.097282
4 Open 0.096461
5 Volume 0.089364
8 Ret_Lag3 0.088104
10 Ret_Lag5 0.079873
0 Adj Close 0.074865
1 Close 0.073104
9 Ret_Lag4 0.066029
2 High 0.050677
3 Low 0.046427
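Since evaluate_strategy already accepts a threshold argument, the \(\theta\) filter described earlier can be tested by rerunning the evaluation after the script above; the value 0.001 (10 bps) is an arbitrary choice for illustration:

# Re-evaluate the random forest, trading only on predictions above 10 bps in magnitude
cagr_thr, sharpe_thr, eq_thr = evaluate_strategy(pred_rf, y_test, threshold=0.001)
print(f"Random Forest (theta=0.001) - CAGR: {cagr_thr:.2%}, Sharpe: {sharpe_thr:.2f}")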
When applied to Bitcoin data from 2017 to early 2025:
- The single decision tree barely broke even out of sample (CAGR 0.11%, Sharpe 0.21).
- The random forest performed substantially better (CAGR 84.47%, Sharpe 1.69), consistent with the variance reduction expected from ensembling.
- Rolling volatility and the most recent lagged returns ranked among the most important features.
For quantitative analysts, these results show the value of ensemble learning for noisy, non-linear return prediction. Decision trees offer transparency, but random forests deliver the robustness required in systematic trading for volatile assets like Bitcoin. A disciplined approach combining solid feature engineering, thresholded execution, and rigorous risk-adjusted evaluation can turn these models into viable trading components.