Financial time series are noisy, exhibit regime changes, and often have non-linear dynamics. Traditional linear models can miss important interactions between variables. Decision trees and random forests are non-parametric machine learning methods that can model complex, non-linear relationships without assuming a specific functional form.
This article presents the theory and mathematical foundations of these methods, together with a full Python implementation that predicts Bitcoin’s daily returns and backtests a basic trading strategy.
Let \(P_t\) denote the adjusted closing price of Bitcoin at time \(t\). The one-day-ahead return is defined as:
\[ r_{t+1} = \frac{P_{t+1} - P_t}{P_t} \]
We aim to find a predictive function:
\[ \hat{r}_{t+1} = f(X_t) \]
where \(X_t\) is a feature vector containing historical market data available at time \(t\), and \(f(\cdot)\) is a learned mapping from features to next-day returns.
A regression decision tree partitions the predictor space into \(M\) non-overlapping regions \(R_1, R_2, \dots, R_M\). For any observation in region \(R_m\), the prediction is:
\[ \hat{r}_{t+1} = c_m \quad \text{if} \quad X_t \in R_m \]
where \(c_m\) is the mean of the target variable in region \(R_m\).
The partitioning is chosen to minimize the sum of squared errors at each split:
\[ \text{SSE} = \sum_{i \in R_{\text{left}}} (y_i - \bar{y}_{\text{left}})^2 + \sum_{i \in R_{\text{right}}} (y_i - \bar{y}_{\text{right}})^2 \]
Decision trees are interpretable but can overfit, especially in noisy financial data.
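To make the split criterion concrete, here is a minimal sketch of the greedy search over one feature (best_split is a hypothetical helper for illustration, not part of the script below; scikit-learn's DecisionTreeRegressor performs the same search across all features at every node):

import numpy as np

def best_split(x, y):
    # Illustrative sketch: exhaustively test each candidate threshold on a
    # single feature and keep the one minimizing the two-region SSE above.
    best_sse, best_thr = np.inf, None
    for thr in np.unique(x)[:-1]:  # drop the max so the right region is non-empty
        left, right = y[x <= thr], y[x > thr]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best_thr = sse, thr
    return best_thr, best_sse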
A random forest is an ensemble of decision trees:
\[ \hat{r}_{t+1}^{RF} = \frac{1}{B} \sum_{b=1}^B f_b(X_t) \]
where:
- \(B\) is the number of trees in the ensemble,
- each tree \(f_b\) is trained on a bootstrap sample of the training data (bagging), and
- at each split, each tree considers only a random subset of the features.
These mechanisms reduce correlation between trees, lowering the variance of the average prediction and improving out-of-sample performance.
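For intuition, the bagging component can be sketched by hand (this is an illustrative sketch, not how the article's pipeline is built; the per-split random feature subsetting that RandomForestRegressor also applies is omitted for brevity):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_predict(X_train, y_train, X_test, B=100, seed=42):
    # Fit B trees on bootstrap resamples of the training set and
    # average their predictions, as in the ensemble formula above.
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = np.zeros(len(X_test))
    for _ in range(B):
        idx = rng.integers(0, n, size=n)  # sample n rows with replacement
        tree = DecisionTreeRegressor(min_samples_leaf=50)
        tree.fit(X_train.iloc[idx], y_train.iloc[idx])
        preds += tree.predict(X_test)
    return preds / B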
We evaluate performance using Compound Annual Growth Rate (CAGR) and Sharpe Ratio.
CAGR:
\[ \text{CAGR} = \left( \prod_{t=1}^T (1 + r_t^{\text{strat}}) \right)^{\frac{252}{T}} - 1 \]
Sharpe Ratio:
\[ S = \frac{\mu_{\text{strat}}}{\sigma_{\text{strat}}} \sqrt{252} \]
where \(\mu_{\text{strat}}\) and \(\sigma_{\text{strat}}\) are the mean and standard deviation of daily strategy returns.
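As a quick sanity check of both formulas on toy numbers (the return values are made up for illustration):

import numpy as np

daily = np.array([0.010, -0.005, 0.020, 0.000, -0.010])  # toy daily strategy returns
T = len(daily)
cagr = (1 + daily).prod() ** (252 / T) - 1               # annualized compound growth
sharpe = np.sqrt(252) * daily.mean() / daily.std()       # annualized Sharpe, zero risk-free rate
print(f"CAGR: {cagr:.2%}, Sharpe: {sharpe:.2f}")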
Given a model prediction \(\hat{r}_{t+1}\), the strategy takes the position:
\[ \text{position}_t = \begin{cases} +1 \ (\text{long}) & \text{if } \hat{r}_{t+1} > \theta \\ -1 \ (\text{short}) & \text{if } \hat{r}_{t+1} < -\theta \\ 0 \ (\text{flat}) & \text{otherwise} \end{cases} \]
The threshold \(\theta \ge 0\) filters out weak predictions and reduces overtrading.
import yfinance as yf
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
import matplotlib.pyplot as plt

# Download BTC data
df = yf.download("BTC-USD", start="2017-01-01", end="2025-01-01", auto_adjust=False)
if isinstance(df.columns, pd.MultiIndex):
    df.columns = df.columns.droplevel(1)

# Compute daily returns
df['Return'] = df['Adj Close'].pct_change()

# Create lagged return features
lags = 5
for lag in range(1, lags + 1):
    df[f'Ret_Lag{lag}'] = df['Return'].shift(lag)

# Rolling volatility
df['Volatility'] = df['Return'].rolling(window=10).std()

df.dropna(inplace=True)

# Train/test split
train_size = int(len(df) * 0.7)
X_train = df.drop(columns=['Return']).iloc[:train_size]
y_train = df['Return'].iloc[:train_size]
X_test = df.drop(columns=['Return']).iloc[train_size:]
y_test = df['Return'].iloc[train_size:]

# Decision Tree model
tree_model = DecisionTreeRegressor(min_samples_leaf=50, random_state=42)
tree_model.fit(X_train, y_train)
pred_tree = tree_model.predict(X_test)

# Random Forest model
rf_model = RandomForestRegressor(
    n_estimators=100, min_samples_leaf=50, max_features=3, random_state=42
)
rf_model.fit(X_train, y_train)
pred_rf = rf_model.predict(X_test)

# Strategy evaluation
def evaluate_strategy(pred, actual, threshold=0.0):
    positions = np.where(pred > threshold, 1,
                         np.where(pred < -threshold, -1, 0))
    strat_returns = positions * actual
    cagr = (1 + strat_returns).prod() ** (252 / len(strat_returns)) - 1
    sharpe = np.sqrt(252) * strat_returns.mean() / strat_returns.std()
    return cagr, sharpe, strat_returns.cumsum()

cagr_tree, sharpe_tree, eq_tree = evaluate_strategy(pred_tree, y_test)
cagr_rf, sharpe_rf, eq_rf = evaluate_strategy(pred_rf, y_test)

print(f"Decision Tree - CAGR: {cagr_tree:.2%}, Sharpe: {sharpe_tree:.2f}")
print(f"Random Forest - CAGR: {cagr_rf:.2%}, Sharpe: {sharpe_rf:.2f}")

# Plot equity curves
plt.figure(figsize=(10, 6))
plt.plot(eq_tree, label="Decision Tree")
plt.plot(eq_rf, label="Random Forest")
plt.axhline(0, color='black', linestyle='--', linewidth=1)
plt.title("BTC-USD Strategy Cumulative Returns")
plt.xlabel("Date")
plt.ylabel("Cumulative Return")
plt.legend()
plt.show()

# Feature importance
importance_df = pd.DataFrame({
    'Feature': X_train.columns,
    'Importance': rf_model.feature_importances_
}).sort_values(by='Importance', ascending=False)
print(importance_df)
Sample output:

Decision Tree - CAGR: 0.11%, Sharpe: 0.21
Random Forest - CAGR: 84.47%, Sharpe: 1.69

Random Forest Feature Importance:
Feature Importance
11 Volatility 0.138639
7 Ret_Lag2 0.099175
6 Ret_Lag1 0.097282
4 Open 0.096461
5 Volume 0.089364
8 Ret_Lag3 0.088104
10 Ret_Lag5 0.079873
0 Adj Close 0.074865
1 Close 0.073104
9 Ret_Lag4 0.066029
2 High 0.050677
3 Low 0.046427
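Since evaluate_strategy already accepts a threshold argument, the \(\theta\) filter described earlier can be tested by rerunning the evaluation after the script above; the value 0.001 (10 bps) is an arbitrary choice for illustration:

# Re-evaluate the random forest, trading only on predictions above 10 bps in magnitude
cagr_thr, sharpe_thr, eq_thr = evaluate_strategy(pred_rf, y_test, threshold=0.001)
print(f"Random Forest (theta=0.001) - CAGR: {cagr_thr:.2%}, Sharpe: {sharpe_thr:.2f}")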
When applied to Bitcoin data from 2017 to early 2025:
- The single decision tree barely broke even out of sample (CAGR 0.11%, Sharpe 0.21).
- The random forest performed substantially better (CAGR 84.47%, Sharpe 1.69), consistent with the variance reduction expected from ensembling.
- Rolling volatility and the most recent lagged returns ranked among the most important features.
For quantitative analysts, these results show the value of ensemble learning for noisy, non-linear return prediction. Decision trees offer transparency, but random forests deliver the robustness required in systematic trading for volatile assets like Bitcoin. A disciplined approach combining solid feature engineering, thresholded execution, and rigorous risk-adjusted evaluation can turn these models into viable trading components.