
Algorithmic Bitcoin Trading Strategy using Machine Learning Classification

This tutorial provides a comprehensive guide to developing an algorithmic trading strategy for Bitcoin using machine learning classification techniques. We’ll cover everything from fetching up-to-date historical Bitcoin data and engineering predictive features to building and evaluating classification models, and finally, backtesting the strategy. This guide is designed to be self-contained, with all necessary Python code and explanations.

1. Introduction: Classification for Trading Signals

Cryptocurrency markets, known for their volatility and 24/7 trading, present unique challenges and opportunities for algorithmic trading. Machine learning, particularly classification, can be employed to predict market movements and generate trading signals (e.g., buy, sell, or hold).

The core idea is to transform the problem of predicting price movements into a classification task. For instance, we can classify the next period’s expected price movement into categories like “price will rise” (buy signal) or “price will fall” (sell signal). One powerful aspect of machine learning is feature engineering, where we create new, informative features from raw data (like price and volume) to improve model performance. Technical indicators are a common source for such features.

This tutorial will focus on:

Building a trading strategy based on classifying buy/sell signals.
Engineering features using common technical indicators.
Developing a framework to backtest the trading strategy’s performance.
Choosing appropriate evaluation metrics for a trading strategy.

2. Problem Definition: Predicting Buy/Sell Signals

We aim to predict whether the current trading signal for Bitcoin is to buy (1) or sell (0). This signal will be determined by comparing short-term and long-term price trends. For example, if a short-term moving average of the price is above a long-term moving average, it might indicate an uptrend (buy signal), and vice-versa.

Data: We’ll use historical Bitcoin price data. We will fetch up-to-date data using yfinance.
Features: We will create various trend and momentum technical indicators from the price data to serve as input features for our classification model.
Target Variable: A binary signal (1 for buy, 0 for sell) derived from the relationship between short-term and long-term moving averages.

3. Getting Started: Setting Up the Environment

3.1. Python Packages

We’ll need several Python libraries:

yfinance: For fetching financial data (Bitcoin prices).
pandas: For data manipulation and analysis.
numpy: For numerical operations.
matplotlib.pyplot and seaborn: For data visualization.
scikit-learn: For machine learning tasks, including:
- model_selection (for train_test_split, KFold, cross_val_score, GridSearchCV)
- Various classifiers (e.g., LogisticRegression, DecisionTreeClassifier, RandomForestClassifier)
- metrics (for accuracy_score, confusion_matrix, classification_report)

import yfinance as yf
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, KFold, cross_val_score, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, AdaBoostClassifier, ExtraTreesClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import warnings
warnings.filterwarnings(action='ignore')

# Set a consistent style for plots
plt.style.use('seaborn-v0_8-whitegrid')
pd.set_option('display.width', 100)

3.2. Loading the Data

We will fetch Bitcoin (BTC-USD) data using yfinance. The original example uses minute-by-minute data; for simplicity, and because daily bars are the common choice when working with yfinance, we’ll fetch daily data. The principles remain the same.

ticker = 'BTC-USD'
start_date = '2018-01-01'
end_date = pd.to_datetime('today').strftime('%Y-%m-%d')

try:
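    raw_data = yf.download(ticker, start=start_date, end=end_date, auto_adjust=False, progress=False)
    if raw_data.empty:
        raise ValueError("No data downloaded. Check ticker or date range.")

    # Some yfinance versions return MultiIndex columns (field, ticker) even for a single ticker.
    # Flatten them to plain field names so that dataset['Close'] is a Series, not a DataFrame.
    if isinstance(raw_data.columns, pd.MultiIndex):
        raw_data.columns = raw_data.columns.get_level_values(0)

    dataset = raw_data[['Open', 'High', 'Low', 'Close', 'Volume']].copy()
    dataset.rename(columns={'Volume': 'Volume_(BTC)'}, inplace=True)
    print("Successfully downloaded Bitcoin data.")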
except Exception as e:
    print(f"Error downloading data: {e}")
    print("Using a dummy dataset for demonstration purposes.")
    dates = pd.date_range(start='2020-01-01', periods=1000, freq='D')
    data_dummy = {
        'Open': np.random.rand(1000) * 10000 + 30000,
        'High': np.random.rand(1000) * 10000 + 35000,
        'Low': np.random.rand(1000) * 10000 + 25000,
        'Close': np.random.rand(1000) * 10000 + 30000,
        'Volume_(BTC)': np.random.rand(1000) * 100 + 10
    }
    dataset = pd.DataFrame(data_dummy, index=dates)

print("\nDataset shape:", dataset.shape)
dataset.dropna(axis=0, how='all', inplace=True) # Drop rows if all values are NaN (can happen with yfinance for some dates)
print("Dataset shape after dropping all-NaN rows:", dataset.shape)


4. Exploratory Data Analysis (EDA)

A quick look at the data structure.

print("\nDataset Info:")
dataset.info()

print("\nSummary Statistics:")
print(dataset.describe())

Visualizing the closing price helps understand its trend and volatility.

plt.figure(figsize=(14, 7))
dataset['Close'].plot(grid=True)
plt.title(f'{ticker} Closing Price ({start_date} to {end_date})')
plt.ylabel('Price (USD)')
plt.savefig('bitcoin_closing_price.png')
print("\nSaved Bitcoin closing price plot to bitcoin_closing_price.png")
# plt.show()
plt.close()

Bitcoin’s price chart typically shows significant volatility and distinct trend periods.

5. Data Preparation

5.1. Data Cleaning

Financial data can have missing values, especially for less liquid assets or specific exchanges. For daily yfinance data, NaNs are less common for major assets like BTC-USD but should still be checked. The PDF uses ffill() (forward fill) to handle NaNs.

print("\nMissing values before cleaning (after initial load):")
print(dataset.isnull().sum())
dataset.ffill(inplace=True)  # forward fill; fillna(method=...) is deprecated in recent pandas
dataset.bfill(inplace=True)  # back fill any leading NaNs that a forward fill cannot reach
print("\nMissing values after initial ffill/bfill:")
print(dataset.isnull().sum())
dataset.dropna(inplace=True) # Drop any remaining rows with NaNs, if any
print("Dataset shape after full NaN drop:", dataset.shape)

if dataset.empty:
    print("Dataset is empty after initial cleaning. Exiting.")
    exit()

The Timestamp column in the original PDF’s dataset (minute data) was not useful for modeling and was dropped. For our daily data, the DatetimeIndex is useful and kept.

5.2. Preparing the Target Variable (`signal`)

The trading signal (our target variable) is generated by comparing a short-term moving average (MAVG) with a long-term MAVG.

If short-term MAVG > long-term MAVG: Buy signal (1)
Otherwise: Sell signal (0)

We’ll use a 10-period rolling mean for the short-term MAVG and a 60-period rolling mean for the long-term MAVG, applied to the ‘Close’ price.

short_window = 10
long_window = 60
dataset['short_mavg'] = dataset['Close'].rolling(window=short_window, min_periods=1).mean()
dataset['long_mavg'] = dataset['Close'].rolling(window=long_window, min_periods=1).mean()

# Default to sell (0.0); only assign signals once the long window is fully populated.
dataset['signal'] = 0.0
valid_signal_idx_start = max(short_window, long_window) - 1
if len(dataset) > valid_signal_idx_start:
    short_valid = dataset['short_mavg'].iloc[valid_signal_idx_start:]
    long_valid = dataset['long_mavg'].iloc[valid_signal_idx_start:]
    dataset.loc[dataset.index[valid_signal_idx_start:], 'signal'] = np.where(short_valid > long_valid, 1.0, 0.0)

5.3. Feature Engineering: Technical Indicators

Raw price/volume data might not be sufficient for a model to learn complex patterns. Technical indicators can extract underlying trend, momentum, volatility, and other characteristics from the market data. We will create several common indicators to use as features.

Technical Indicators to Implement:

Exponential Moving Average (EMA): Similar to a simple moving average (SMA) but gives more weight to recent prices. $\text{EMA}_{\text{today}} = (\text{Value}_{\text{today}} \times \text{Multiplier}) + \text{EMA}_{\text{yesterday}} \times (1 - \text{Multiplier})$ where $\text{Multiplier} = \frac{2}{\text{Period} + 1}$
Rate of Change (ROC): Measures the percentage change in price between the current price and the price n periods ago. $\text{ROC} = \left( \frac{\text{Close}_{\text{today}} - \text{Close}_{\text{n periods ago}}}{\text{Close}_{\text{n periods ago}}} \right) \times 100$
Momentum (MOM): Measures the absolute change in price over n periods. $\text{MOM} = \text{Close}_{\text{today}} - \text{Close}_{\text{n periods ago}}$
Relative Strength Index (RSI): A momentum oscillator that measures the speed and change of price movements. RSI oscillates between 0 and 100.
- Typically, RSI > 70 indicates overbought conditions, and RSI < 30 indicates oversold conditions.
- Calculation involves average gains and average losses over a period. $\text{RS} = \frac{\text{Average Gain}}{\text{Average Loss}}$ $\text{RSI} = 100 - \frac{100}{1 + \text{RS}}$
Stochastic Oscillator (%K and %D): Compares a particular closing price of an asset to a range of its prices over a certain period of time.
- %K Line: $\%K = \left( \frac{\text{Current Close} - \text{Lowest Low over period}}{\text{Highest High over period} - \text{Lowest Low over period}} \right) \times 100$
- %D Line: Typically a 3-period SMA of %K (slow stochastic).
Moving Average (MA): Simple moving average (SMA). It is already used to build the signal, but moving averages over other windows can serve as features too.
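The feature-engineering code that follows calls helpers named EMA, ROC, MOM, RSI, STOK, STOD, and MA, but their definitions do not appear in this article. The block below is a minimal sketch of what they could look like, assuming the conventional formulas listed above and the argument order used in the calls. The RSI here uses simple rolling averages of gains and losses, which is one common variant and an assumption on our part; other implementations use Wilder's exponential smoothing.

```python
import pandas as pd


def EMA(series, n):
    """Exponential moving average with multiplier 2 / (n + 1)."""
    return series.ewm(span=n, min_periods=n, adjust=False).mean()


def ROC(series, n):
    """Percentage change relative to the value n periods ago."""
    return series.diff(n) / series.shift(n) * 100


def MOM(series, n):
    """Absolute change over n periods."""
    return series.diff(n)


def RSI(series, n):
    """RSI from simple rolling averages of gains and losses (assumed variant)."""
    delta = series.diff()
    gain = delta.clip(lower=0)
    loss = (-delta).clip(lower=0)
    avg_gain = gain.rolling(window=n, min_periods=n).mean()
    avg_loss = loss.rolling(window=n, min_periods=n).mean()
    rs = avg_gain / avg_loss  # becomes inf when there were no losses in the window
    return 100 - 100 / (1 + rs)


def STOK(close, low, high, n):
    """Stochastic %K: position of the close within the n-period high/low range."""
    lowest_low = low.rolling(window=n, min_periods=n).min()
    highest_high = high.rolling(window=n, min_periods=n).max()
    return (close - lowest_low) / (highest_high - lowest_low) * 100


def STOD(k_series, d):
    """Stochastic %D: d-period simple moving average of %K."""
    return k_series.rolling(window=d, min_periods=d).mean()


def MA(series, n):
    """Simple moving average over n periods."""
    return series.rolling(window=n, min_periods=n).mean()
```

Each helper returns NaN until its window is full. With the longest windows used below (200 and 252 periods), most of the first year of data is therefore dropped by the later dropna() call.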

for n_ema in [10, 30, 200]:
    dataset[f'EMA{n_ema}'] = EMA(dataset['Close'], n_ema)
for n_roc in [10, 30]:
    dataset[f'ROC{n_roc}'] = ROC(dataset['Close'], n_roc)
for n_mom in [10, 30]:
    dataset[f'MOM{n_mom}'] = MOM(dataset['Close'], n_mom)
for n_rsi in [10, 30, 200]:
    dataset[f'RSI{n_rsi}'] = RSI(dataset['Close'], n_rsi)
stoch_periods = [10, 30, 200]
d_smooth_period = 3 
for n_stoch in stoch_periods:
    dataset[f'%K_{n_stoch}'] = STOK(dataset['Close'], dataset['Low'], dataset['High'], n_stoch)
    dataset[f'%D_{n_stoch}_{d_smooth_period}'] = STOD(dataset[f'%K_{n_stoch}'], d_smooth_period)
for n_ma in [21, 63, 252]:
    dataset[f'MA{n_ma}'] = MA(dataset['Close'], n_ma)

initial_rows = len(dataset)
dataset.replace([np.inf, -np.inf], np.nan, inplace=True) # Replace infs created by indicators like RSI if loss is 0
dataset.dropna(inplace=True)
print(f"\nDropped {initial_rows - len(dataset)} rows due to NaNs/infs from feature engineering.")

if dataset.empty:
    print("Dataset is empty after feature engineering and NaN drop. Cannot proceed.")
    exit()

5.4. Data Visualization (Post Feature Engineering)

Let’s check the distribution of our target variable signal after all data preparation.

plt.figure(figsize=(6, 4))
# sort_index() puts class 0 first and class 1 second, so the tick labels below always match the bars
dataset['signal'].value_counts().sort_index().plot(kind='barh', color=['skyblue', 'salmon'])
plt.title('Distribution of Trading Signal (1: Buy, 0: Sell)')
plt.xlabel('Frequency')
plt.ylabel('Signal')
plt.yticks(ticks=[0, 1], labels=['Sell (0)', 'Buy (1)'])
# plt.show()
plt.savefig('bitcoin_signal_distribution.png')
print("\nSaved trading signal distribution plot to bitcoin_signal_distribution.png")
plt.close()

The distribution might be relatively balanced or slightly skewed depending on the market period and MAVG parameters. The PDF’s example shows it as relatively balanced.

6. Evaluate Algorithms and Models

6.1. Prepare Data for Modeling

Separate features (X) and target (y). Drop columns used for target creation if they are not intended as features.

if 'signal' not in dataset.columns:
    print("Error: 'signal' column is missing from the dataset before splitting.")
    exit()

features_to_drop_for_X = ['signal', 'short_mavg', 'long_mavg']
X = dataset.drop(columns=features_to_drop_for_X, errors='ignore')
y = dataset['signal']

X = X.apply(pd.to_numeric, errors='coerce').dropna(axis=1, how='all').fillna(0) 

if X.empty or len(X) != len(y) or X.shape[1] == 0:
    print("Feature set X is empty, mismatched with y, or has no columns after final processing. Cannot proceed.")
    exit()

6.2. Train-Test Split

The PDF uses only the last 100,000 observations of its minute-level data for faster calculation. Our daily dataset has far fewer rows than that, so we use all of it with a standard chronological split for time series: the first 80% for training and the last 20% for testing.

split_index = int(len(X) * 0.8)
if split_index < 1 or split_index >= len(X) -1 : 
    print(f"Cannot perform train-test split with current data size: {len(X)}. Need more data after NaN drops.")
    exit()

X_train = X.iloc[:split_index]
X_test = X.iloc[split_index:]
y_train = y.iloc[:split_index]
y_test = y.iloc[split_index:]

if X_train.empty or X_test.empty or y_train.empty or y_test.empty:
    print("Training or testing set is empty. Cannot proceed with model evaluation.")
    exit()

6.3. Test Options and Evaluation Metric

Given the signal distribution, accuracy can be a reasonable starting metric if the classes are somewhat balanced. We also need to look at precision, recall, and F1-score for buy/sell signals.

scoring_metric = 'accuracy'
num_folds = 5 
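# Note: shuffled K-fold mixes past and future observations within each fold, which can make
# cross-validation scores optimistic for time series. The chronological test set is the stricter check.
kfold = KFold(n_splits=num_folds, shuffle=True, random_state=42)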
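
To look at precision, recall, and F1 alongside accuracy during cross-validation, scikit-learn's cross_validate accepts a list of scorers. The following is a minimal sketch on synthetic data; X_demo, y_demo, and the choice of logistic regression are illustrative assumptions, not part of the pipeline in this tutorial.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_validate

# Synthetic stand-in for the feature matrix and the buy/sell labels (illustrative only).
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=42)

metric_names = ['accuracy', 'precision', 'recall', 'f1']
cv_demo = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_validate(
    LogisticRegression(max_iter=500),
    X_demo,
    y_demo,
    cv=cv_demo,
    scoring=metric_names,
)

# cross_validate returns one array of per-fold scores per metric, keyed as 'test_<metric>'.
metric_means = {name: scores[f'test_{name}'].mean() for name in metric_names}
for name, value in metric_means.items():
    print(f"{name}: {value:.4f}")
```

In the tutorial's pipeline, the same call could be made with X_train and y_train in place of the synthetic arrays.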

6.4. Compare Models and Algorithms

Spot-check various classification algorithms.

models_btc = []
models_btc.append(('LR', LogisticRegression(solver='liblinear', max_iter=200, random_state=42)))
models_btc.append(('LDA', LinearDiscriminantAnalysis()))
models_btc.append(('CART', DecisionTreeClassifier(random_state=42)))
models_btc.append(('RF', RandomForestClassifier(random_state=42, n_jobs=-1)))
models_btc.append(('GBM', GradientBoostingClassifier(random_state=42)))

results_btc = []
names_btc = []
print(f"\nSpot-checking models using {scoring_metric}:")
for name, model in models_btc:
    try:
        cv_results = cross_val_score(model, X_train, y_train, cv=kfold, scoring=scoring_metric, n_jobs=-1)
        results_btc.append(cv_results)
        names_btc.append(name)
        print(f"{name}: {cv_results.mean():.4f} ({cv_results.std():.4f})")
    except Exception as e:
        print(f"Could not evaluate {name}: {e}")

The PDF identifies Random Forest as performing well among ensemble models. Let’s assume it’s a good candidate.

7. Model Tuning and Grid Search (Random Forest)

We’ll tune hyperparameters for Random Forest using GridSearchCV.

best_model_btc = None
chosen_model_name_for_tuning = 'RF' 
model_to_tune_proto = None
for name, model_proto_iter in models_btc:
    if name == chosen_model_name_for_tuning:
        model_to_tune_proto = model_proto_iter
        break

if model_to_tune_proto is not None:
    param_grid = {
        'n_estimators': [50, 100], 'max_depth': [5, 10, None], 'criterion': ['gini', 'entropy']
    } if isinstance(model_to_tune_proto, RandomForestClassifier) else {
        'n_estimators': [50, 100], 'learning_rate': [0.05, 0.1], 'max_depth': [3,5]
    }
    grid = GridSearchCV(estimator=model_to_tune_proto, param_grid=param_grid, scoring=scoring_metric, cv=kfold, n_jobs=-1)
    try:
        grid_result = grid.fit(X_train, y_train)
        print(f"\nBest {scoring_metric} for {chosen_model_name_for_tuning}: {grid_result.best_score_:.4f} using {grid_result.best_params_}")
        best_model_btc = grid_result.best_estimator_
    except Exception as e:
        print(f"GridSearchCV failed for {chosen_model_name_for_tuning}: {e}")
        best_model_btc = model_to_tune_proto 
        print(f"Using default (untuned) {chosen_model_name_for_tuning} parameters due to GridSearchCV error.")
        best_model_btc.fit(X_train, y_train)
else:
    print(f"\nModel '{chosen_model_name_for_tuning}' not found or CV failed. Using a default RF.")
    best_model_btc = RandomForestClassifier(random_state=42, n_estimators=100, n_jobs=-1)
    if not X_train.empty and not y_train.empty:
         best_model_btc.fit(X_train, y_train)
    else:
        print("Cannot fit default model as training data is empty.")
        best_model_btc = None

8. Finalize the Model and Evaluate

8.1. Results on the Test Dataset

Evaluate the tuned (or best chosen) model on the unseen test set.

if best_model_btc and not X_test.empty and not y_test.empty:
    y_pred_test = best_model_btc.predict(X_test)
    print(f"\nPerformance of Final Model ({best_model_btc.__class__.__name__}) on Test Set:")
    print(f"Accuracy: {accuracy_score(y_test, y_pred_test):.4f}")
    cm_test = confusion_matrix(y_test, y_pred_test)
    print("\nConfusion Matrix (Test Set):\n", cm_test)
    
    print("\nClassification Report (Test Set):")
    print(f"Unique values in y_test: {np.unique(y_test, return_counts=True)}")
    print(f"Unique values in y_pred_test: {np.unique(y_pred_test, return_counts=True)}")
    print(classification_report(y_test, y_pred_test, target_names=['Sell (0)', 'Buy (1)'], labels=[0, 1], zero_division=0))

    if hasattr(best_model_btc, 'feature_importances_'):
        importances = best_model_btc.feature_importances_
        feature_names_original = X_train.columns

        str_feature_names = []
        for name in feature_names_original:
            if isinstance(name, tuple):
                str_feature_names.append('_'.join(map(str, name))) 
            else:
                str_feature_names.append(str(name))

        feature_importance_df = pd.DataFrame({'feature': str_feature_names, 'importance': importances})
        feature_importance_df = feature_importance_df.sort_values(by='importance', ascending=False)
        print("\nTop 15 Feature Importances (with stringified feature names):")
        print(feature_importance_df.head(15))
        plt.figure(figsize=(10, 8))
        sns.barplot(x='importance', y='feature', data=feature_importance_df.head(15), palette='viridis')
        plt.title(f'Top 15 Feature Importances - {best_model_btc.__class__.__name__}')
        plt.xlabel('Importance') 
        plt.ylabel('Feature')    
        plt.tight_layout()
        plt.savefig('bitcoin_feature_importance.png')
        print("\nSaved feature importance plot to bitcoin_feature_importance.png")
        plt.close()
else:
    print("\nNo model was finalized for evaluation or test set is empty.")

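The model’s accuracy and other metrics on the test set indicate how well it reproduces the moving-average signal on unseen data. They do not by themselves measure profitability, which the backtest in the next section addresses. For tree-based models like Random Forest or GBM, we can also examine feature importances.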

This helps understand which technical indicators were most influential in the model’s predictions. Momentum indicators like RSI and MOM often show high importance.

9. Backtesting the Trading Strategy (Simplified)

Backtesting simulates how the strategy would have performed on historical data. We’ll create a simple backtest:

Calculate daily market returns.
Calculate strategy returns by multiplying market returns by the predicted signal from the previous day (since we trade on the next bar after a signal). A 1 means hold (or buy if not holding), a 0 means be out of the market (or sell if holding). This is a long-only interpretation for simplicity.
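
The effect of lagging the signal by one day can be seen in a small worked example. This is a sketch on made-up prices and signals (the four values below are invented for illustration), using the same pct_change and shift(1) logic as the backtest code that follows.

```python
import pandas as pd

# Hypothetical closing prices and predicted signals for four consecutive days.
prices = pd.Series([100.0, 110.0, 99.0, 108.9], name='Close')
signals = pd.Series([1, 0, 1, 1], name='Predicted_Signal')

market_returns = prices.pct_change()

# The signal observed on day t decides the position held over day t+1.
strategy_returns = market_returns * signals.shift(1)

demo = pd.DataFrame({
    'Market_Returns': market_returns,
    'Strategy_Returns': strategy_returns,
}).dropna()

cumulative_market = (1 + demo['Market_Returns']).cumprod() - 1
cumulative_strategy = (1 + demo['Strategy_Returns']).cumprod() - 1

print(demo)
print("Cumulative market return:  ", round(cumulative_market.iloc[-1], 4))
print("Cumulative strategy return:", round(cumulative_strategy.iloc[-1], 4))
```

Here the signal of 0 on the second day keeps the strategy out of the market during the third day's 10% drop, so the strategy compounds only the two 10% gains while buy-and-hold absorbs the loss. Without the shift, the strategy would be using a signal to trade the same bar that produced it, which is a form of lookahead.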

if best_model_btc and not X_test.empty and 'y_pred_test' in locals() and not y_test.empty:
    backtest_df = pd.DataFrame(index=X_test.index)
    if 'Close' in dataset.columns and 'signal' in dataset.columns and X_test.index.isin(dataset.index).all():
        backtest_df['Market_Returns'] = dataset.loc[X_test.index, 'Close'].pct_change()
        backtest_df['Predicted_Signal'] = y_pred_test
        backtest_df['Strategy_Returns'] = backtest_df['Market_Returns'] * backtest_df['Predicted_Signal'].shift(1)
        backtest_df['Actual_MAVG_Signal_Returns'] = backtest_df['Market_Returns'] * dataset.loc[X_test.index, 'signal'].shift(1)
        backtest_df.dropna(inplace=True)

        if not backtest_df.empty:
            backtest_df['Cumulative_Market_Returns'] = (1 + backtest_df['Market_Returns']).cumprod() - 1
            backtest_df['Cumulative_Strategy_Returns'] = (1 + backtest_df['Strategy_Returns']).cumprod() - 1
            backtest_df['Cumulative_Actual_MAVG_Signal_Returns'] = (1 + backtest_df['Actual_MAVG_Signal_Returns']).cumprod() - 1
            print("\nBacktesting Results (Last 5 days):\n", backtest_df.tail())
            plt.figure(figsize=(14, 7))
            backtest_df['Cumulative_Market_Returns'].plot(label='Market (Buy & Hold BTC)', color='gray', linestyle='--')
            backtest_df['Cumulative_Strategy_Returns'].plot(label='ML Strategy Returns', color='blue')
            backtest_df['Cumulative_Actual_MAVG_Signal_Returns'].plot(label='Original MAVG Signal Returns', color='orange')
            plt.title('Cumulative Returns Comparison')
            plt.ylabel('Cumulative Returns')
            plt.legend()
            plt.tight_layout()
            plt.savefig('bitcoin_backtest_returns.png')
            print("\nSaved backtesting returns plot to bitcoin_backtest_returns.png")
            plt.close()
        else:
            print("\nBacktest DataFrame is empty after processing; cannot plot returns.")
    else:
        print("\nCould not perform backtesting: 'Close' or 'signal' column missing or index mismatch.")
else:
    print("\nSkipping backtesting as no model was finalized or test/prediction data is unavailable.")

print("\n--- Tutorial: Algorithmic Bitcoin Trading Strategy Finished ---")

The plot comparing cumulative returns helps assess if the machine learning strategy added value over a simple buy-and-hold or the original MAVG crossover rule.

10. Conclusion and Next Steps

This tutorial demonstrated a complete workflow for building a Bitcoin trading strategy using machine learning classification. We covered:

Defining the problem as a classification task.
Fetching real market data using yfinance.
Extensive feature engineering using technical indicators.
Training, tuning, and evaluating various classification models.
Assessing feature importance.
Performing a simplified backtest.

The results of such a strategy can vary greatly depending on the chosen period, features, model, and market conditions. Key takeaways include the importance of robust feature engineering and careful model evaluation.

Further improvements and considerations could include:

More sophisticated feature engineering (e.g., volatility measures, order book data if available).
Different ways to define the target variable (e.g., predicting price change magnitude, multi-class signals like buy/sell/hold).
Advanced backtesting with considerations for transaction costs, slippage, and risk management.
Time series cross-validation techniques.
Exploring more complex models like LSTMs or other deep learning architectures, though they require more data and computational resources.
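
As an illustration of the time series cross-validation item above, scikit-learn's TimeSeriesSplit produces folds in which every training observation precedes every test observation. The sketch below runs on synthetic data; X_demo, y_demo, and the model settings are assumptions made for the example, not results from this tutorial.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic, time-ordered stand-in for features and labels (illustrative only).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(300, 5))
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)

# Each split trains on an expanding window of earlier rows and tests on the rows that follow.
tscv = TimeSeriesSplit(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X_demo), start=1):
    print(f"Fold {fold}: train rows 0-{train_idx.max()}, test rows {test_idx.min()}-{test_idx.max()}")

ts_scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=42),
    X_demo,
    y_demo,
    cv=tscv,
    scoring='accuracy',
)
print(f"Mean accuracy across folds: {ts_scores.mean():.4f}")
```

In this tutorial's pipeline, passing such a splitter as the cv argument in place of the shuffled KFold would keep future observations out of each training fold.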

This framework provides a solid foundation for developing and testing algorithmic trading strategies based on machine learning.