This tutorial is a step-by-step guide to developing an algorithmic trading strategy for Bitcoin using machine learning classification techniques. We’ll cover everything from fetching historical Bitcoin price data and engineering predictive features to building and evaluating classification models, and finally, backtesting the strategy. The guide is designed to be self-contained, with all necessary Python code and explanations.
Cryptocurrency markets, known for their volatility and 24/7 trading, present unique challenges and opportunities for algorithmic trading. Machine learning, particularly classification, can be employed to predict market movements and generate trading signals (e.g., buy, sell, or hold).
The core idea is to transform the problem of predicting price movements into a classification task. For instance, we can classify the next period’s expected price movement into categories like “price will rise” (buy signal) or “price will fall” (sell signal). One powerful aspect of machine learning is feature engineering, where we create new, informative features from raw data (like price and volume) to improve model performance. Technical indicators are a common source for such features.
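As a minimal sketch of this classification framing, the snippet below labels each period by whether the next period's close is higher. The toy prices and the helper name `label_next_move` are assumptions made for illustration; the tutorial itself derives its signal from a moving-average crossover rather than from next-period returns.

```python
import pandas as pd

def label_next_move(close: pd.Series) -> pd.Series:
    """Return 1 where the next period's close is higher than the current one, else 0.

    The final row has no next period to compare against, so it is dropped.
    """
    next_close = close.shift(-1)
    labels = (next_close > close).astype(int)
    return labels.iloc[:-1]

# Toy closing prices (made up for illustration)
toy_close = pd.Series([100.0, 102.0, 101.0, 105.0, 104.0])
toy_labels = label_next_move(toy_close)
print(toy_labels.tolist())
```

Each label only uses information from one step ahead, which is what makes it a target for prediction rather than a feature.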
This tutorial will focus on:
We aim to predict whether the current trading signal for Bitcoin is to buy (1) or sell (0). This signal will be determined by comparing short-term and long-term price trends. For example, if a short-term moving average of the price is above a long-term moving average, it might indicate an uptrend (buy signal), and vice-versa.
We’ll need several Python libraries:
- yfinance: For fetching financial data (Bitcoin prices).
- pandas: For data manipulation and analysis.
- numpy: For numerical operations.
- matplotlib.pyplot and seaborn: For data visualization.
- scikit-learn: For machine learning tasks, including:
  - model_selection (for train_test_split, KFold, cross_val_score, GridSearchCV)
  - Classification models (e.g., LogisticRegression, DecisionTreeClassifier, RandomForestClassifier)
  - metrics (for accuracy_score, confusion_matrix, classification_report)

import yfinance as yf
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, KFold, cross_val_score, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, AdaBoostClassifier, ExtraTreesClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import warnings
warnings.filterwarnings(action='ignore')

# Set a consistent style for plots
plt.style.use('seaborn-v0_8-whitegrid')
pd.set_option('display.width', 100)
We will fetch Bitcoin (BTC-USD) data using yfinance. The original context uses minute-by-minute data; for simplicity, and because daily bars are the common choice for daily strategies built on yfinance, we’ll fetch daily data. The principles remain the same.
ticker = 'BTC-USD'
start_date = '2018-01-01'
end_date = pd.to_datetime('today').strftime('%Y-%m-%d')

try:
    raw_data = yf.download(ticker, start=start_date, end=end_date, auto_adjust=False, progress=False)
    if raw_data.empty:
        raise ValueError("No data downloaded. Check ticker or date range.")
    # Newer yfinance versions may return MultiIndex columns (field, ticker); keep the field level only
    if isinstance(raw_data.columns, pd.MultiIndex):
        raw_data.columns = raw_data.columns.get_level_values(0)
    dataset = raw_data[['Open', 'High', 'Low', 'Close', 'Volume']].copy()
    dataset.rename(columns={'Volume': 'Volume_(BTC)'}, inplace=True)
    print("Successfully downloaded Bitcoin data.")
except Exception as e:
    print(f"Error downloading data: {e}")
    print("Using a dummy dataset for demonstration purposes.")
    # Random placeholder prices so the rest of the script can still run
    dates = pd.date_range(start='2020-01-01', periods=1000, freq='D')
    data_dummy = {
        'Open': np.random.rand(1000) * 10000 + 30000,
        'High': np.random.rand(1000) * 10000 + 35000,
        'Low': np.random.rand(1000) * 10000 + 25000,
        'Close': np.random.rand(1000) * 10000 + 30000,
        'Volume_(BTC)': np.random.rand(1000) * 100 + 10
    }
    dataset = pd.DataFrame(data_dummy, index=dates)

print("\nDataset shape:", dataset.shape)
dataset.dropna(axis=0, how='all', inplace=True)  # Drop rows if all values are NaN (can happen with yfinance for some dates)
print("Dataset shape after dropping all-NaN rows:", dataset.shape)
# 4. Exploratory Data Analysis (EDA)
A quick look at the data structure.
print("\nDataset Info:")
dataset.info()
print("\nSummary Statistics:")
print(dataset.describe())
Visualizing the closing price helps understand its trend and volatility.
plt.figure(figsize=(14, 7))
dataset['Close'].plot(grid=True)
plt.title(f'{ticker} Closing Price ({start_date} to {end_date})')
plt.ylabel('Price (USD)')
plt.savefig('bitcoin_closing_price.png')
print("\nSaved Bitcoin closing price plot to bitcoin_closing_price.png")
# plt.show()
plt.close()
Bitcoin’s price chart typically shows significant volatility and distinct trend periods.
Financial data can have missing values, especially for less liquid assets or specific exchanges. For daily yfinance data, NaNs are less common for major assets like BTC-USD but should still be checked. The PDF uses ffill() (forward fill) to handle NaNs.
print("\nMissing values before cleaning (after initial load):")
print(dataset.isnull().sum())
# ffill()/bfill() replace fillna(method=...), which is deprecated in recent pandas
dataset.ffill(inplace=True)  # Carry the last known value forward
dataset.bfill(inplace=True)  # Fill any leading NaNs from the first valid value
print("\nMissing values after initial ffill/bfill:")
print(dataset.isnull().sum())
dataset.dropna(inplace=True)  # Drop any remaining rows with NaNs, if any
print("Dataset shape after full NaN drop:", dataset.shape)
if dataset.empty:
    print("Dataset is empty after initial cleaning. Exiting.")
    exit()
The Timestamp column in the original PDF’s dataset (minute data) was not useful for modeling and was dropped. For our daily data, the DatetimeIndex is useful and kept.
The trading signal (our target variable, signal) is generated by comparing a short-term moving average (MAVG) with a long-term MAVG.
We’ll use a 10-period rolling mean for the short-term MAVG and a 60-period rolling mean for the long-term MAVG, applied to the ‘Close’ price.
short_window = 10
long_window = 60
dataset['short_mavg'] = dataset['Close'].rolling(window=short_window, min_periods=1).mean()
dataset['long_mavg'] = dataset['Close'].rolling(window=long_window, min_periods=1).mean()
dataset['signal'] = 0.0
# Only assign signals once the long window has a full set of observations
valid_signal_idx_start = max(short_window, long_window) - 1
if len(dataset) > valid_signal_idx_start:
    dataset.loc[dataset.index[valid_signal_idx_start:], 'signal'] = np.where(
        dataset['short_mavg'].iloc[valid_signal_idx_start:] > dataset['long_mavg'].iloc[valid_signal_idx_start:], 1.0, 0.0
    )
Raw price/volume data might not be sufficient for a model to learn complex patterns. Technical indicators can extract underlying trend, momentum, volatility, and other characteristics from the market data. We will create several common indicators to use as features.
Technical Indicators to Implement:
- Exponential Moving Average (EMA): Similar to the simple moving average (SMA) but gives more weight to recent prices.
- Rate of Change (ROC): Measures the percentage change in price between the current price and the price n periods ago.
- Momentum (MOM): Measures the absolute change in price over n periods.
- Relative Strength Index (RSI): A momentum oscillator that measures the speed and change of price movements. RSI oscillates between 0 and 100.
- Stochastic Oscillator (%K and %D): Compares a particular closing price of an asset to a range of its prices over a certain period of time.
- Moving Average (MA): Simple moving average (already used for the signal, but it can serve as a feature too).
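The feature code calls helpers named EMA, ROC, MOM, RSI, STOK, STOD and MA, whose definitions do not appear in the text. The sketch below is one possible set of definitions matching those call signatures. The specific choices are assumptions rather than the original implementation: `span=n` smoothing for EMA, simple rolling averages of gains and losses for RSI, and %D computed as a rolling mean of %K.

```python
import numpy as np
import pandas as pd

def MA(series, n):
    """Simple moving average over n periods."""
    return series.rolling(window=n, min_periods=n).mean()

def EMA(series, n):
    """Exponential moving average with span n (assumed smoothing choice)."""
    return series.ewm(span=n, min_periods=n).mean()

def ROC(series, n):
    """Rate of change: percentage change versus the price n periods ago."""
    return (series / series.shift(n) - 1.0) * 100.0

def MOM(series, n):
    """Momentum: absolute price change over n periods."""
    return series.diff(n)

def RSI(series, n):
    """RSI from simple rolling averages of gains and losses (assumed variant)."""
    delta = series.diff()
    gain = delta.clip(lower=0.0)      # positive moves, zero otherwise
    loss = (-delta).clip(lower=0.0)   # magnitude of negative moves
    avg_gain = gain.rolling(window=n, min_periods=n).mean()
    avg_loss = loss.rolling(window=n, min_periods=n).mean()
    rs = avg_gain / avg_loss          # becomes inf when there were no losses
    return 100.0 - 100.0 / (1.0 + rs)

def STOK(close, low, high, n):
    """Stochastic %K: where the close sits in the n-period low-high range (0 to 100)."""
    lowest_low = low.rolling(window=n, min_periods=n).min()
    highest_high = high.rolling(window=n, min_periods=n).max()
    return (close - lowest_low) / (highest_high - lowest_low) * 100.0

def STOD(k_series, d_period):
    """Stochastic %D: simple moving average of %K over d_period."""
    return k_series.rolling(window=d_period, min_periods=d_period).mean()

# Toy data (made up) to exercise the helpers
toy = pd.Series([10.0, 11.0, 12.0, 11.0, 13.0, 14.0])
close_demo = pd.Series([1.0, 2.0, 3.0])
k_demo = STOK(close_demo, close_demo - 0.5, close_demo + 0.5, 3)
print(MA(toy, 3).iloc[-1], MOM(toy, 2).iloc[-1], ROC(toy, 1).iloc[1], k_demo.iloc[-1])
```

Each helper returns NaN until it has a full window of data, which is why rows are dropped after the features are built.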
for n_ema in [10, 30, 200]:
    dataset[f'EMA{n_ema}'] = EMA(dataset['Close'], n_ema)
for n_roc in [10, 30]:
    dataset[f'ROC{n_roc}'] = ROC(dataset['Close'], n_roc)
for n_mom in [10, 30]:
    dataset[f'MOM{n_mom}'] = MOM(dataset['Close'], n_mom)
for n_rsi in [10, 30, 200]:
    dataset[f'RSI{n_rsi}'] = RSI(dataset['Close'], n_rsi)
stoch_periods = [10, 30, 200]
d_smooth_period = 3
for n_stoch in stoch_periods:
    dataset[f'%K_{n_stoch}'] = STOK(dataset['Close'], dataset['Low'], dataset['High'], n_stoch)
    dataset[f'%D_{n_stoch}_{d_smooth_period}'] = STOD(dataset[f'%K_{n_stoch}'], d_smooth_period)
for n_ma in [21, 63, 252]:
    dataset[f'MA{n_ma}'] = MA(dataset['Close'], n_ma)

initial_rows = len(dataset)
dataset.replace([np.inf, -np.inf], np.nan, inplace=True)  # Replace infs created by indicators like RSI if loss is 0
dataset.dropna(inplace=True)
print(f"\nDropped {initial_rows - len(dataset)} rows due to NaNs/infs from feature engineering.")
if dataset.empty:
    print("Dataset is empty after feature engineering and NaN drop. Cannot proceed.")
    exit()
Let’s check the distribution of our target variable signal after all data preparation.
plt.figure(figsize=(6, 4))
# sort_index() puts class 0 first so the tick labels below match the bars
dataset['signal'].value_counts().sort_index().plot(kind='barh', color=['skyblue', 'salmon'])
plt.title('Distribution of Trading Signal (1: Buy, 0: Sell)')
plt.xlabel('Frequency')
plt.ylabel('Signal')
plt.yticks(ticks=[0, 1], labels=['Sell (0)', 'Buy (1)'])
# plt.show()
plt.savefig('bitcoin_signal_distribution.png')
print("\nSaved trading signal distribution plot to bitcoin_signal_distribution.png")
plt.close()
The distribution might be relatively balanced or slightly skewed depending on the market period and MAVG parameters. The PDF’s example shows it as relatively balanced.
Separate features (X) and target (y). Drop columns used for target creation if they are not intended as features.
if 'signal' not in dataset.columns:
    print("Error: 'signal' column is missing from the dataset before splitting.")
    exit()
features_to_drop_for_X = ['signal', 'short_mavg', 'long_mavg']
X = dataset.drop(columns=features_to_drop_for_X, errors='ignore')
y = dataset['signal']

# Force numeric dtypes, drop columns that are entirely NaN, and zero-fill what remains
X = X.apply(pd.to_numeric, errors='coerce').dropna(axis=1, how='all').fillna(0)

if X.empty or len(X) != len(y) or X.shape[1] == 0:
    print("Feature set X is empty, mismatched with y, or has no columns after final processing. Cannot proceed.")
    exit()
The PDF uses the last 100,000 observations for faster calculation. For daily data, this is a very long period. Let’s use a standard chronological split for time series, e.g., 80% for training, 20% for testing.
split_index = int(len(X) * 0.8)
if split_index < 1 or split_index >= len(X) - 1:
    print(f"Cannot perform train-test split with current data size: {len(X)}. Need more data after NaN drops.")
    exit()
# Chronological split: the earliest 80% for training, the most recent 20% for testing
X_train = X.iloc[:split_index]
X_test = X.iloc[split_index:]
y_train = y.iloc[:split_index]
y_test = y.iloc[split_index:]

if X_train.empty or X_test.empty or y_train.empty or y_test.empty:
    print("Training or testing set is empty. Cannot proceed with model evaluation.")
    exit()
Given the signal distribution, accuracy can be a reasonable starting metric if the classes are somewhat balanced. We also need to look at precision, recall, and F1-score for buy/sell signals.
scoring_metric = 'accuracy'
num_folds = 5
kfold = KFold(n_splits=num_folds, shuffle=True, random_state=42)
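A shuffled KFold mixes earlier and later observations within each fold, which can make cross-validation scores look optimistic on time-ordered data. One alternative, offered here as an assumption rather than something the tutorial's code uses, is scikit-learn's TimeSeriesSplit, where every validation fold comes strictly after its training fold. A minimal sketch on placeholder data:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Placeholder feature matrix: 10 time-ordered observations, 2 features
X_demo = np.arange(20).reshape(10, 2)

tscv = TimeSeriesSplit(n_splits=3)
folds = list(tscv.split(X_demo))
for train_idx, val_idx in folds:
    # Training indices always precede validation indices
    print("train:", train_idx.tolist(), "validate:", val_idx.tolist())
```

A splitter like `tscv` can be passed as the `cv` argument of `cross_val_score` or `GridSearchCV` in place of `kfold`.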
Spot-check various classification algorithms.
models_btc = []
models_btc.append(('LR', LogisticRegression(solver='liblinear', max_iter=200, random_state=42)))
models_btc.append(('LDA', LinearDiscriminantAnalysis()))
models_btc.append(('CART', DecisionTreeClassifier(random_state=42)))
models_btc.append(('RF', RandomForestClassifier(random_state=42, n_jobs=-1)))
models_btc.append(('GBM', GradientBoostingClassifier(random_state=42)))

results_btc = []
names_btc = []
print(f"\nSpot-checking models using {scoring_metric}:")
for name, model in models_btc:
    try:
        cv_results = cross_val_score(model, X_train, y_train, cv=kfold, scoring=scoring_metric, n_jobs=-1)
        results_btc.append(cv_results)
        names_btc.append(name)
        print(f"{name}: {cv_results.mean():.4f} ({cv_results.std():.4f})")
    except Exception as e:
        print(f"Could not evaluate {name}: {e}")
The PDF identifies Random Forest as performing well among ensemble models. Let’s assume it’s a good candidate.
We’ll tune hyperparameters for Random Forest using GridSearchCV.
best_model_btc = None
chosen_model_name_for_tuning = 'RF'
model_to_tune_proto = None
# Look up the untuned prototype of the chosen model
for name, model_proto_iter in models_btc:
    if name == chosen_model_name_for_tuning:
        model_to_tune_proto = model_proto_iter
        break

if model_to_tune_proto is not None:
    # Random Forest grid by default; a boosting-style grid otherwise
    param_grid = {
        'n_estimators': [50, 100], 'max_depth': [5, 10, None], 'criterion': ['gini', 'entropy']
    } if isinstance(model_to_tune_proto, RandomForestClassifier) else {
        'n_estimators': [50, 100], 'learning_rate': [0.05, 0.1], 'max_depth': [3, 5]
    }
    grid = GridSearchCV(estimator=model_to_tune_proto, param_grid=param_grid, scoring=scoring_metric, cv=kfold, n_jobs=-1)
    try:
        grid_result = grid.fit(X_train, y_train)
        print(f"\nBest {scoring_metric} for {chosen_model_name_for_tuning}: {grid_result.best_score_:.4f} using {grid_result.best_params_}")
        best_model_btc = grid_result.best_estimator_
    except Exception as e:
        print(f"GridSearchCV failed for {chosen_model_name_for_tuning}: {e}")
        best_model_btc = model_to_tune_proto
        print(f"Using default (untuned) {chosen_model_name_for_tuning} parameters due to GridSearchCV error.")
        best_model_btc.fit(X_train, y_train)
else:
    print(f"\nModel '{chosen_model_name_for_tuning}' not found or CV failed. Using a default RF.")
    best_model_btc = RandomForestClassifier(random_state=42, n_estimators=100, n_jobs=-1)
    if not X_train.empty and not y_train.empty:
        best_model_btc.fit(X_train, y_train)
    else:
        print("Cannot fit default model as training data is empty.")
        best_model_btc = None
Evaluate the tuned (or best chosen) model on the unseen test set.
if best_model_btc is not None and not X_test.empty and not y_test.empty:
    y_pred_test = best_model_btc.predict(X_test)
    print(f"\nPerformance of Final Model ({best_model_btc.__class__.__name__}) on Test Set:")
    print(f"Accuracy: {accuracy_score(y_test, y_pred_test):.4f}")
    cm_test = confusion_matrix(y_test, y_pred_test)
    print("\nConfusion Matrix (Test Set):\n", cm_test)
    print("\nClassification Report (Test Set):")
    print(f"Unique values in y_test: {np.unique(y_test, return_counts=True)}")
    print(f"Unique values in y_pred_test: {np.unique(y_pred_test, return_counts=True)}")
    print(classification_report(y_test, y_pred_test, target_names=['Sell (0)', 'Buy (1)'], labels=[0, 1], zero_division=0))

    if hasattr(best_model_btc, 'feature_importances_'):
        importances = best_model_btc.feature_importances_
        feature_names_original = X_train.columns

        # Column names may be tuples (MultiIndex columns); convert them to plain strings
        str_feature_names = []
        for name in feature_names_original:
            if isinstance(name, tuple):
                str_feature_names.append('_'.join(map(str, name)))
            else:
                str_feature_names.append(str(name))

        feature_importance_df = pd.DataFrame({'feature': str_feature_names, 'importance': importances})
        feature_importance_df = feature_importance_df.sort_values(by='importance', ascending=False)
        print("\nTop 15 Feature Importances (with stringified feature names):")
        print(feature_importance_df.head(15))

        plt.figure(figsize=(10, 8))
        sns.barplot(x='importance', y='feature', data=feature_importance_df.head(15), palette='viridis')
        plt.title(f'Top 15 Feature Importances - {best_model_btc.__class__.__name__}')
        plt.xlabel('Importance')
        plt.ylabel('Feature')
        plt.tight_layout()
        plt.savefig('bitcoin_feature_importance.png')
        print("\nSaved feature importance plot to bitcoin_feature_importance.png")
        plt.close()
else:
    print("\nNo model was finalized for evaluation or test set is empty.")
The model’s accuracy and other metrics on the test set give an indication of its real-world performance. For tree-based models like Random Forest or GBM, we can examine feature importances.
This helps understand which technical indicators were most influential in the model’s predictions. Momentum indicators like RSI and MOM often show high importance.
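To make the precision, recall, and F1 figures in the classification report concrete, the sketch below derives them for the buy class directly from a confusion matrix. The label arrays are made up for illustration and do not come from the tutorial's model.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Made-up labels for illustration: 1 = buy, 0 = sell
y_true_demo = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred_demo = np.array([1, 0, 1, 0, 1, 0, 1, 0])

# confusion_matrix rows are true classes, columns are predicted classes
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1]).ravel()

precision_buy = tp / (tp + fp)   # of all predicted buys, how many were real buys
recall_buy = tp / (tp + fn)      # of all real buys, how many were predicted
f1_buy = 2 * precision_buy * recall_buy / (precision_buy + recall_buy)
print(precision_buy, recall_buy, f1_buy)
```

For a trading signal, low buy precision means entering positions on false signals, while low buy recall means missing uptrends.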
Backtesting simulates how the strategy would have performed on historical data. We’ll create a simple backtest:
A predicted signal of 1 means hold (or buy if not holding); a 0 means be out of the market (or sell if holding). This is a long-only interpretation for simplicity.

if best_model_btc is not None and not X_test.empty and 'y_pred_test' in locals() and not y_test.empty:
    backtest_df = pd.DataFrame(index=X_test.index)
    if 'Close' in dataset.columns and 'signal' in dataset.columns and X_test.index.isin(dataset.index).all():
        backtest_df['Market_Returns'] = dataset.loc[X_test.index, 'Close'].pct_change()
        backtest_df['Predicted_Signal'] = y_pred_test
        # shift(1): yesterday's signal decides today's position, avoiding look-ahead
        backtest_df['Strategy_Returns'] = backtest_df['Market_Returns'] * backtest_df['Predicted_Signal'].shift(1)
        backtest_df['Actual_MAVG_Signal_Returns'] = backtest_df['Market_Returns'] * dataset.loc[X_test.index, 'signal'].shift(1)
        backtest_df.dropna(inplace=True)

        if not backtest_df.empty:
            backtest_df['Cumulative_Market_Returns'] = (1 + backtest_df['Market_Returns']).cumprod() - 1
            backtest_df['Cumulative_Strategy_Returns'] = (1 + backtest_df['Strategy_Returns']).cumprod() - 1
            backtest_df['Cumulative_Actual_MAVG_Signal_Returns'] = (1 + backtest_df['Actual_MAVG_Signal_Returns']).cumprod() - 1
            print("\nBacktesting Results (Last 5 days):\n", backtest_df.tail())

            plt.figure(figsize=(14, 7))
            backtest_df['Cumulative_Market_Returns'].plot(label='Market (Buy & Hold BTC)', color='gray', linestyle='--')
            backtest_df['Cumulative_Strategy_Returns'].plot(label='ML Strategy Returns', color='blue')
            backtest_df['Cumulative_Actual_MAVG_Signal_Returns'].plot(label='Original MAVG Signal Returns', color='orange')
            plt.title('Cumulative Returns Comparison')
            plt.ylabel('Cumulative Returns')
            plt.legend()
            plt.tight_layout()
            plt.savefig('bitcoin_backtest_returns.png')
            print("\nSaved backtesting returns plot to bitcoin_backtest_returns.png")
            plt.close()
        else:
            print("\nBacktest DataFrame is empty after processing; cannot plot returns.")
    else:
        print("\nCould not perform backtesting: 'Close' or 'signal' column missing or index mismatch.")
else:
    print("\nSkipping backtesting as no model was finalized or test/prediction data is unavailable.")

print("\n--- Tutorial: Algorithmic Bitcoin Trading Strategy Finished ---")
The plot comparing cumulative returns helps assess if the machine learning strategy added value over a simple buy-and-hold or the original MAVG crossover rule.
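The arithmetic behind the backtest is easier to see on a tiny example. The sketch below uses four made-up closing prices and signals (illustrative only, not model output) to show how lagging the signal by one period and compounding the returns yields the cumulative curves.

```python
import pandas as pd

# Made-up daily closes and signals (1 = long, 0 = out of the market)
close_demo = pd.Series([100.0, 110.0, 99.0, 108.9])
signal_demo = pd.Series([1, 0, 1, 1])

# Daily simple returns (equivalent to pct_change())
market_ret = close_demo / close_demo.shift(1) - 1.0

# The position held on day t is the signal from day t-1
strategy_ret = market_ret * signal_demo.shift(1)

demo = pd.DataFrame({"market": market_ret, "strategy": strategy_ret}).dropna()
cumulative = (1 + demo).cumprod() - 1
print(cumulative.round(4))
```

In this toy case the strategy sits out the one losing day, so it ends with a cumulative return of about 21% against about 8.9% for buy-and-hold. The example ignores transaction costs and slippage, as does the tutorial's backtest.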
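This tutorial demonstrated a complete workflow for building a Bitcoin trading strategy using machine learning classification. We covered fetching Bitcoin data with yfinance, generating a moving-average crossover signal, engineering technical-indicator features, comparing and tuning classifiers, and backtesting the predicted signals.
The results of such a strategy can vary greatly depending on the chosen period, features, model, and market conditions. Key takeaways include the importance of robust feature engineering and careful model evaluation.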
Further improvements and considerations could include:
This framework provides a solid foundation for developing and testing algorithmic trading strategies based on machine learning.