In the world of quantitative finance, the pursuit of alpha, meaning returns that exceed a benchmark, is the holy grail of investment strategies. At the heart of this pursuit lies the concept of alpha factors, sophisticated mathematical transformations of raw market data that aim to predict future asset price movements. These factors serve as the fundamental building blocks of algorithmic trading strategies, providing the signals that indicate when to buy or sell assets to generate superior returns.
Alpha factors are not merely academic constructs; they represent decades of research into how markets work and which features may better explain or predict price movements. From the pioneering work of Eugene Fama and Kenneth French on size and value factors to the more recent discoveries in momentum and quality investing, alpha factors embody our collective understanding of market inefficiencies and behavioral biases that create profit opportunities.
This tutorial provides a complete guide to developing, backtesting, and analyzing quantitative trading strategies for Bitcoin. We will transition from simple, rule-based strategies to a more sophisticated, data-driven approach using "alpha factors." This will be done in four clear steps: data preparation, factor generation, performance analysis using the Information Coefficient (IC), and advanced visualization.
The core of this analysis is a script written in Python, leveraging popular data science and financial libraries such as yfinance, pandas, numpy, and talib.
import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt
import seaborn as sns
import talib
from scipy import stats
from scipy.stats import spearmanr, pearsonr
import warnings

warnings.filterwarnings('ignore')

print("Bitcoin Alpha Factors from Technical Indicators")
print("="*70)
# === Load Bitcoin Data ===
print("Loading Bitcoin data...")
df = yf.download('BTC-USD', start='2020-01-01', end='2024-01-01', auto_adjust=False)
# Some yfinance versions return (field, ticker) columns; keep only the field level
if df.columns.nlevels > 1:
    df = df.droplevel(1, axis=1)

price = df['Adj Close']
volume = df['Volume'].ffill()  # fillna(method='ffill') is deprecated in recent pandas
high = df['High']
low = df['Low']
open_price = df['Open']
returns = price.pct_change().dropna()

# Convert to numpy arrays for TA-Lib
close = price.values.astype(np.float64)
high_arr = high.values.astype(np.float64)
low_arr = low.values.astype(np.float64)
volume_arr = volume.values.astype(np.float64)

print(f"Data loaded: {len(price)} days from {price.index[0].date()} to {price.index[-1].date()}")
# === Forward Returns (Our Prediction Targets) ===
print("Creating forward return targets...")
forward_returns = pd.DataFrame(index=price.index)
horizons = [1, 3, 7, 14, 30]

for h in horizons:
    # Row t holds the return from day t to day t+h
    forward_returns[f'fwd_{h}d'] = price.pct_change(h).shift(-h)

print(f"Created {len(horizons)} forward return targets")
# === ALPHA FACTORS FROM TECHNICAL INDICATORS ===
print("\nComputing technical indicators as alpha factors...")
factors = pd.DataFrame(index=price.index)

# === 1. MOMENTUM INDICATORS AS FACTORS ===
print("  Momentum-based factors...")
# RSI as continuous factor (not binary signal)
factors['rsi_14'] = pd.Series(talib.RSI(close, 14), index=price.index)
factors['rsi_7'] = pd.Series(talib.RSI(close, 7), index=price.index)
factors['rsi_30'] = pd.Series(talib.RSI(close, 30), index=price.index)

# RSI relative to its own moving average
factors['rsi_relative'] = factors['rsi_14'] / factors['rsi_14'].rolling(20).mean()

# RSI momentum (change in RSI)
factors['rsi_momentum'] = factors['rsi_14'].diff(5)

# MACD as factors
macd, macd_signal, macd_hist = talib.MACD(close)
factors['macd'] = pd.Series(macd, index=price.index)
factors['macd_signal'] = pd.Series(macd_signal, index=price.index)
factors['macd_histogram'] = pd.Series(macd_hist, index=price.index)
factors['macd_ratio'] = factors['macd'] / factors['macd_signal']

# Momentum indicators
factors['momentum_10'] = pd.Series(talib.MOM(close, 10), index=price.index) / price
factors['momentum_20'] = pd.Series(talib.MOM(close, 20), index=price.index) / price

# Rate of Change
factors['roc_10'] = pd.Series(talib.ROC(close, 10), index=price.index)
factors['roc_20'] = pd.Series(talib.ROC(close, 20), index=price.index)
# === 2. VOLATILITY INDICATORS AS FACTORS ===
print("  Volatility-based factors...")
# Bollinger Bands
bb_upper, bb_middle, bb_lower = talib.BBANDS(close)
factors['bb_upper'] = pd.Series(bb_upper, index=price.index)
factors['bb_lower'] = pd.Series(bb_lower, index=price.index)
factors['bb_width'] = (factors['bb_upper'] - factors['bb_lower']) / factors['bb_upper']
factors['bb_position'] = (price - factors['bb_lower']) / (factors['bb_upper'] - factors['bb_lower'])

# Average True Range
factors['atr'] = pd.Series(talib.ATR(high_arr, low_arr, close), index=price.index)
factors['atr_ratio'] = factors['atr'] / price

# Volatility measures (annualized with 365 days, since Bitcoin trades every day)
factors['volatility_10'] = returns.rolling(10).std() * np.sqrt(365)
factors['volatility_20'] = returns.rolling(20).std() * np.sqrt(365)
# === 3. OSCILLATOR INDICATORS AS FACTORS ===
print("  Oscillator-based factors...")
# Stochastic oscillator
stoch_k, stoch_d = talib.STOCH(high_arr, low_arr, close)
factors['stoch_k'] = pd.Series(stoch_k, index=price.index)
factors['stoch_d'] = pd.Series(stoch_d, index=price.index)
factors['stoch_diff'] = factors['stoch_k'] - factors['stoch_d']

# Williams %R
factors['williams_r'] = pd.Series(talib.WILLR(high_arr, low_arr, close), index=price.index)

# CCI
factors['cci'] = pd.Series(talib.CCI(high_arr, low_arr, close), index=price.index)
factors['cci_normalized'] = factors['cci'] / 100  # Normalize
# === 4. TREND INDICATORS AS FACTORS ===
print("  Trend-based factors...")
# ADX (trend strength)
factors['adx'] = pd.Series(talib.ADX(high_arr, low_arr, close), index=price.index)
factors['plus_di'] = pd.Series(talib.PLUS_DI(high_arr, low_arr, close), index=price.index)
factors['minus_di'] = pd.Series(talib.MINUS_DI(high_arr, low_arr, close), index=price.index)
factors['di_diff'] = factors['plus_di'] - factors['minus_di']

# Moving averages as factors
factors['sma_10'] = pd.Series(talib.SMA(close, 10), index=price.index)
factors['sma_50'] = pd.Series(talib.SMA(close, 50), index=price.index)
factors['sma_200'] = pd.Series(talib.SMA(close, 200), index=price.index)

# Price relative to moving averages
factors['price_vs_sma10'] = price / factors['sma_10'] - 1
factors['price_vs_sma50'] = price / factors['sma_50'] - 1
factors['price_vs_sma200'] = price / factors['sma_200'] - 1

# Moving average crossovers (as continuous factor)
factors['ma_cross_signal'] = (factors['sma_10'] - factors['sma_50']) / factors['sma_50']
# === 5. VOLUME INDICATORS AS FACTORS ===
print("  Volume-based factors...")
if not volume.isna().all():
    # On-Balance Volume
    factors['obv'] = pd.Series(talib.OBV(close, volume_arr), index=price.index)
    factors['obv_sma'] = factors['obv'].rolling(20).mean()
    factors['obv_ratio'] = factors['obv'] / factors['obv_sma']

    # Volume moving averages
    factors['volume_sma'] = volume.rolling(20).mean()
    factors['relative_volume'] = volume / factors['volume_sma']

    # Price Volume Trend
    factors['pvt'] = ((price.diff() / price.shift(1)) * volume).cumsum()
    factors['pvt_momentum'] = factors['pvt'].pct_change(10)
# === 6. CUSTOM FACTOR COMBINATIONS ===
print("  Custom factor combinations...")
# RSI divergence from price momentum
factors['rsi_price_divergence'] = factors['rsi_momentum'] - (factors['momentum_10'] * 1000)

# Volatility-adjusted momentum
factors['vol_adj_momentum'] = factors['momentum_20'] / factors['volatility_20']

# Trend quality (ADX * price momentum alignment)
factors['trend_quality'] = factors['adx'] * np.sign(factors['di_diff']) * factors['momentum_10']

print(f"Created {len(factors.columns)} alpha factors from technical indicators")
# === ALPHA FACTOR ANALYSIS ===
print("\nAnalyzing factor predictive power...")

def calculate_ic(factor_series, forward_returns_series):
    """Calculate Information Coefficient (correlation between factor and forward returns)"""
    # Remove NaN values
    combined = pd.concat([factor_series, forward_returns_series], axis=1).dropna()
    if len(combined) < 30:  # Need enough data
        return np.nan, np.nan
    # Spearman correlation (rank-based, more robust)
    ic, p_value = spearmanr(combined.iloc[:, 0], combined.iloc[:, 1])
    return ic, p_value

# Calculate IC for all factor-horizon combinations
ic_matrix = pd.DataFrame(index=factors.columns, columns=[f'IC_{h}d' for h in horizons])
pvalue_matrix = pd.DataFrame(index=factors.columns, columns=[f'pval_{h}d' for h in horizons])

print("  Computing Information Coefficients...")
for factor_name in factors.columns:
    for h in horizons:
        factor_series = factors[factor_name]
        fwd_ret_series = forward_returns[f'fwd_{h}d']

        ic, p_val = calculate_ic(factor_series, fwd_ret_series)
        ic_matrix.loc[factor_name, f'IC_{h}d'] = ic
        pvalue_matrix.loc[factor_name, f'pval_{h}d'] = p_val

# Convert to numeric
ic_matrix = ic_matrix.astype(float)
pvalue_matrix = pvalue_matrix.astype(float)
# === FACTOR RANKING ===
print("\nRanking alpha factors by predictive power...")
# Calculate mean absolute IC across all horizons
ic_matrix['mean_abs_ic'] = ic_matrix.abs().mean(axis=1)
ic_matrix['mean_ic'] = ic_matrix.iloc[:, :-1].mean(axis=1)  # Exclude mean_abs_ic column

# Sort by mean absolute IC
factor_ranking = ic_matrix.sort_values('mean_abs_ic', ascending=False)

print("\nTOP 15 ALPHA FACTORS:")
print("="*60)
print(f"{'Factor':<25} {'Mean |IC|':<10} {'Mean IC':<10} {'1d IC':<8} {'7d IC':<8} {'30d IC':<8}")
print("-"*60)

for factor, row in factor_ranking.head(15).iterrows():
    mean_abs_ic = row['mean_abs_ic']
    mean_ic = row['mean_ic']
    ic_1d = ic_matrix.loc[factor, 'IC_1d']
    ic_7d = ic_matrix.loc[factor, 'IC_7d']
    ic_30d = ic_matrix.loc[factor, 'IC_30d']

    print(f"{factor:<25} {mean_abs_ic:>8.3f} {mean_ic:>8.3f} {ic_1d:>6.3f} {ic_7d:>6.3f} {ic_30d:>6.3f}")
# === FACTOR VISUALIZATION ===
print("\nCreating factor analysis visualizations...")
# 1. IC Heatmap
# Note: ic_matrix is in creation order, so this plots the first 20 factors,
# not the 20 highest-ranked ones. Use factor_ranking.iloc[:20, :-2] for those.
plt.figure(figsize=(12, 10))
sns.heatmap(ic_matrix.iloc[:20, :-2], cmap='RdBu_r', center=0,
            annot=True, fmt='.3f', cbar_kws={'label': 'Information Coefficient'})
plt.title('Alpha Factor Information Coefficients Heatmap (First 20 Factors)', fontsize=14, fontweight='bold')
plt.xlabel('Forward Return Horizons')
plt.ylabel('Alpha Factors')
plt.tight_layout()
plt.show()
# 2. Factor Distribution
top_factors = factor_ranking.head(8).index
fig, axes = plt.subplots(2, 4, figsize=(20, 10))
axes = axes.flatten()

for i, factor in enumerate(top_factors):
    ax = axes[i]
    # Ratio-based factors can contain +/-inf, which ax.hist cannot bin
    factor_values = factors[factor].replace([np.inf, -np.inf], np.nan).dropna()

    ax.hist(factor_values, bins=50, alpha=0.7, color='skyblue', edgecolor='black')
    ax.set_title(f'{factor}', fontsize=12, fontweight='bold')
    ax.set_xlabel('Factor Value')
    ax.set_ylabel('Frequency')
    ax.grid(True, alpha=0.3)

plt.suptitle('Distribution of Top Alpha Factors', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()
# 3. IC Time Series for Best Factors
print("  Plotting IC time series for best factors...")
# Calculate rolling IC for top 4 factors
top_4_factors = factor_ranking.head(4).index
window = 60  # 60-day rolling IC

fig, axes = plt.subplots(2, 2, figsize=(16, 10))
axes = axes.flatten()

for i, factor in enumerate(top_4_factors):
    ax = axes[i]

    # Calculate rolling IC
    rolling_ic = pd.Series(index=factors.index, dtype=float)

    for j in range(window, len(factors)):
        factor_window = factors[factor].iloc[j-window:j]
        fwd_ret_window = forward_returns['fwd_7d'].iloc[j-window:j]

        combined = pd.concat([factor_window, fwd_ret_window], axis=1).dropna()
        if len(combined) > 20:
            ic, _ = spearmanr(combined.iloc[:, 0], combined.iloc[:, 1])
            rolling_ic.iloc[j] = ic

    ax.plot(rolling_ic.dropna(), color='darkblue', linewidth=2)
    ax.axhline(y=0, color='red', linestyle='--', alpha=0.5)
    ax.set_title(f'{factor} - Rolling 60D IC', fontsize=12, fontweight='bold')
    ax.set_ylabel('Information Coefficient')
    ax.grid(True, alpha=0.3)

plt.suptitle('Rolling Information Coefficient Time Series', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()
# === FACTOR COMBINATION ANALYSIS ===
print("\nTesting factor combinations...")
# Select top factors that are not too correlated
top_factors_for_combo = factor_ranking.head(10).index

# Calculate factor correlation matrix
factor_corr = factors[top_factors_for_combo].corr()

print("\nCorrelation Matrix of Top Factors:")
plt.figure(figsize=(10, 8))
sns.heatmap(factor_corr, annot=True, cmap='RdBu_r', center=0, fmt='.2f')
plt.title('Factor Correlation Matrix', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
# Create a simple ensemble factor
print("\nCreating ensemble alpha factor...")
# Select relatively uncorrelated top factors
uncorr_factors = []
for factor in top_factors_for_combo:
    if not uncorr_factors:
        uncorr_factors.append(factor)
    else:
        # Check if this factor is not too correlated with existing factors
        max_corr = max(abs(factor_corr.loc[factor, uf]) for uf in uncorr_factors)
        if max_corr < 0.7:  # Less than 70% correlation
            uncorr_factors.append(factor)

    if len(uncorr_factors) >= 5:  # Limit to 5 factors
        break

print(f"Selected {len(uncorr_factors)} uncorrelated factors for ensemble:")
for factor in uncorr_factors:
    print(f"  - {factor}")

# Weight factors by their IC
# Note: weights use |IC|, so the sign of each factor's IC is ignored
weights = {}
for factor in uncorr_factors:
    weights[factor] = factor_ranking.loc[factor, 'mean_abs_ic']

total_weight = sum(weights.values())
normalized_weights = {f: w/total_weight for f, w in weights.items()}

# Create ensemble factor
# Note: raw factor values are summed without rescaling, so large-scale factors dominate
ensemble_factor = pd.Series(0.0, index=factors.index)
for factor, weight in normalized_weights.items():
    ensemble_factor += factors[factor].fillna(0) * weight

# Test ensemble factor
print("\nEnsemble Factor Performance:")
for h in horizons:
    ic, p_val = calculate_ic(ensemble_factor, forward_returns[f'fwd_{h}d'])
    print(f"  {h}d IC: {ic:.4f} (p-value: {p_val:.4f})")
# === FINAL INSIGHTS ===
print("\nKEY INSIGHTS")
print("="*50)
best_factor = factor_ranking.index[0]
best_ic = factor_ranking.loc[best_factor, 'mean_abs_ic']

print(f"Best Alpha Factor: {best_factor}")
print(f"  Mean |IC|: {best_ic:.4f}")
print("\nFactor Categories Performance:")
# Group factors by category
momentum_factors = [f for f in factor_ranking.index if any(x in f.lower() for x in ['rsi', 'momentum', 'roc', 'macd'])]
volatility_factors = [f for f in factor_ranking.index if any(x in f.lower() for x in ['bb', 'atr', 'volatility'])]
oscillator_factors = [f for f in factor_ranking.index if any(x in f.lower() for x in ['stoch', 'williams', 'cci'])]

categories = {
    'Momentum': momentum_factors[:5],
    'Volatility': volatility_factors[:5],
    'Oscillators': oscillator_factors[:5]
}

for category, factor_list in categories.items():
    if factor_list:
        avg_ic = factor_ranking.loc[factor_list, 'mean_abs_ic'].mean()
        print(f"  {category}: {avg_ic:.4f}")

print("\nTechnical indicators CAN be good alpha factors!")
print("  The key is using them as continuous predictive signals,")
print("  not binary trading rules that underperform buy-and-hold.")
print("\nAlpha factor analysis complete!")
The process can be broken down into several key stages:
Data Ingestion: The script begins by downloading historical price and volume data for Bitcoin. It uses the yfinance library to get data for a specified date range and then cleans the result (flattening multi-level column labels and forward-filling missing volume) so that it is ready for analysis.
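The cleaning step can be tried without a network call. The sketch below builds a small made-up frame whose columns have the two-level (field, ticker) shape that the script guards against; whether a real download has that shape depends on the yfinance version, so treat the layout here as an assumption.

```python
import pandas as pd

# Made-up data shaped like a download with (field, ticker) column labels.
cols = pd.MultiIndex.from_product([["Adj Close", "Volume"], ["BTC-USD"]])
raw = pd.DataFrame(
    [[100.0, 10.0], [101.0, float("nan")], [102.0, 12.0]],
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
    columns=cols,
)

# Drop the ticker level so columns can be addressed by field name alone.
flat = raw.droplevel(1, axis=1) if raw.columns.nlevels > 1 else raw

# Forward-fill missing volume so array-based indicator code sees no gaps.
volume = flat["Volume"].ffill()
```

After this, flat["Adj Close"] and volume are ordinary single-level Series, which is what the rest of the script expects.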
Generating Forward Returns: To determine if an indicator has predictive power, we need to define what it is trying to predict. The script computes "forward returns" for various time horizons (e.g., 1 day, 7 days, 30 days). These future returns serve as the target variables for our analysis.
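To see how the alignment works, here is a minimal sketch on a made-up five-day price series (the prices are illustrative, not Bitcoin data). It uses the same pct_change(h).shift(-h) pattern as the script and compares it with a more explicit formulation.

```python
import pandas as pd

# Hypothetical closing prices, for illustration only.
price = pd.Series(
    [100.0, 110.0, 121.0, 108.9, 119.79],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

h = 2  # look-ahead horizon in days

# pct_change(h) on day t is price[t] / price[t-h] - 1, a backward-looking return.
# shift(-h) moves that value back to day t-h, so row t holds price[t+h] / price[t] - 1.
fwd = price.pct_change(h).shift(-h)

# The same quantity written explicitly.
fwd_explicit = price.shift(-h) / price - 1
```

The last h rows are NaN because the future prices they would need do not exist yet. The script's calculate_ic function drops those rows before computing a correlation.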
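Alpha Factor Creation: The script calculates a wide array of technical indicators using the talib library, which are then treated as our candidate "alpha factors." The indicators are grouped into categories to facilitate a structured analysis: momentum (RSI, MACD, momentum, rate of change), volatility (Bollinger Bands, ATR, rolling volatility), oscillators (Stochastic, Williams %R, CCI), trend (ADX, directional indicators, moving averages), volume (OBV, relative volume, price volume trend), and a few custom combinations.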
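The difference between a continuous factor and a binary trading rule is easy to show on a toy series. The sketch below uses a plain pandas rolling mean in place of talib.SMA, and the prices and the 3-day window are made up for illustration.

```python
import pandas as pd

# Hypothetical prices, for illustration only.
price = pd.Series(
    [10.0, 11.0, 12.0, 13.0, 12.0, 11.0],
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
)

sma_3 = price.rolling(3).mean()

# Continuous factor: how far price sits above or below its moving average.
price_vs_sma = price / sma_3 - 1

# Binary rule: only records whether price is above the average.
binary_signal = (price > sma_3).astype(int)
```

The binary rule maps both a 9% and an 8% premium over the average to the same value of 1, while the continuous factor keeps the magnitude. That magnitude is what a rank correlation against forward returns can use.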
Information Coefficient (IC) Analysis: The heart of the framework is a statistical test called the Information Coefficient (IC). The script calculates the Spearman rank correlation between each alpha factor and the future returns. The Spearman IC is a non-parametric measure that assesses the strength and direction of a monotonic relationship between two variables. An IC close to +1 suggests a strong positive correlation (higher factor values predict higher returns), while an IC near -1 suggests a strong negative correlation. An IC close to 0 indicates no predictive relationship.
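The Spearman correlation is simply the Pearson correlation of the ranks, which is why it only cares about monotonic relationships. The sketch below checks this on synthetic data. The series are randomly generated, and spearman_ic is a hypothetical helper written for this illustration, not the script's calculate_ic.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic factor and forward return with a noisy positive dependence.
factor = pd.Series(rng.normal(size=n))
fwd_ret = 0.5 * factor + pd.Series(rng.normal(size=n))

def spearman_ic(x: pd.Series, y: pd.Series) -> float:
    """Spearman IC computed as the Pearson correlation of the ranks."""
    pair = pd.concat([x, y], axis=1).dropna()
    return pair.iloc[:, 0].rank().corr(pair.iloc[:, 1].rank())

ic = spearman_ic(factor, fwd_ret)

# A monotonic transform of the factor preserves its ranks, so the IC is unchanged.
ic_transformed = spearman_ic(np.exp(factor), fwd_ret)

# A perfectly monotonic relationship gives an IC of 1, even though it is non-linear.
ic_perfect = spearman_ic(factor, factor ** 3)
```

This rank invariance is the practical reason to prefer Spearman here: indicators such as RSI (bounded between 0 and 100) and OBV (unbounded) can be compared on the same footing.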
Factor Ranking and Visualization: The script is designed to rank the alpha factors based on their IC scores. Factors with higher absolute IC values are considered more predictive. The analysis also includes visualization components to better understand the factors' behavior: a heatmap of IC values across horizons, histograms of the top factors' values, and rolling 60-day IC time series for the best factors.
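The ranking logic can be illustrated with a tiny IC table. The factor names and IC values below are invented for the example; only the mean-absolute-IC calculation mirrors the script.

```python
import pandas as pd

# Hypothetical IC values for three factors at three horizons.
ic = pd.DataFrame(
    {
        "IC_1d": [0.02, -0.05, 0.01],
        "IC_7d": [0.04, -0.08, -0.01],
        "IC_30d": [0.06, -0.11, 0.00],
    },
    index=["factor_a", "factor_b", "factor_c"],
)

summary = pd.DataFrame({
    "mean_abs_ic": ic.abs().mean(axis=1),  # strength, regardless of sign
    "mean_ic": ic.mean(axis=1),            # direction, averaged over horizons
})

ranking = summary.sort_values("mean_abs_ic", ascending=False)
```

Here factor_b ranks first even though all of its ICs are negative. Ranking by absolute IC treats a consistently negative factor as useful, on the view that its sign can be flipped, which is why mean_ic is worth keeping next to mean_abs_ic.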
Ensemble Factor Experimentation: The final part of the script explores the concept of combining multiple factors. By creating an "ensemble factor" from a selection of the top-performing, yet relatively uncorrelated, factors, the script aims to demonstrate how blending signals can potentially lead to a more robust and stable predictive model.
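The script blends raw factor values using weights based on absolute IC. A common variant, shown here as a sketch rather than as the script's method, is to z-score each factor first and to keep the sign of the IC in the weights. The function name build_ensemble and the toy inputs are assumptions made for this illustration.

```python
import pandas as pd

def build_ensemble(factors: pd.DataFrame, ic: pd.Series) -> pd.Series:
    """IC-weighted blend of z-scored factors (illustrative variant)."""
    # Put every factor on a comparable scale.
    z = (factors - factors.mean()) / factors.std()
    # Signed weights: a factor with negative IC enters with a negative sign.
    w = ic / ic.abs().sum()
    # Missing factor values contribute zero to the blend.
    return z.fillna(0.0).mul(w, axis=1).sum(axis=1)

# Toy inputs: two hypothetical factors on very different scales.
idx = pd.date_range("2024-01-01", periods=4, freq="D")
toy_factors = pd.DataFrame(
    {
        "small_scale": [0.1, 0.2, 0.3, 0.4],
        "large_scale": [4000.0, 3000.0, 2000.0, 1000.0],
    },
    index=idx,
)
toy_ic = pd.Series({"small_scale": 0.05, "large_scale": -0.05})

ensemble = build_ensemble(toy_factors, toy_ic)
```

In this toy case the two factors carry the same information with opposite signs, and the signed weights make them reinforce each other instead of cancelling. Note that the z-scores use full-sample means and standard deviations, which is fine for exploration but would leak future information in a live strategy.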
Strong Positive Predictors (Red):
rsi_14, rsi_7, rsi_30 - A positive IC means higher RSI readings tend to be followed by higher returns, which is momentum-like behavior rather than the usual contrarian reading of RSI
macd, macd_signal - MACD momentum predicts continuation
momentum_20, roc_20 - Price momentum continues
bb_position - Bollinger band position matters
Strong Negative Predictors (Blue):
bb_upper, bb_lower - When price hits Bollinger bands, expect reversal
atr - High volatility predicts lower future returns
Time Horizon Patterns:
The Real Alpha: The heatmap shows that momentum and mean-reversion both work in Bitcoin - just at different timeframes and with different indicators!
This heatmap IS the alpha research. Any quant fund would pay serious money for insights this clear!

#### Conclusion
The provided script represents a sophisticated approach to quantitative trading research. It moves beyond a simple, qualitative interpretation of technical indicators to a rigorous, data-driven methodology for identifying genuine alpha. By calculating Information Coefficients, ranking factors by their predictive power, and testing factor combinations, this framework provides a solid foundation for any aspiring quantitative trader looking to build a systematic, evidence-based strategy for a volatile asset like Bitcoin.