Bayesian Regression with Serially Correlated Errors for Modeling Alpha

Factor models are widely used in finance for explaining asset returns in terms of systematic risk factors and idiosyncratic components. The classical specification is

\[ R_t = \alpha + \beta^\top F_t + \epsilon_t \]

where \(R_t\) is the asset return, \(F_t\) is the vector of factor returns, and \(\epsilon_t\) is the error term. A common assumption is that the \(\epsilon_t\) are i.i.d. Gaussian white noise. In many return series, however, the residuals exhibit serial correlation, which invalidates the usual i.i.d. standard errors and understates uncertainty about \(\alpha\).
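A quick way to see whether this matters in practice is to fit the classical model by OLS and inspect the lag-1 autocorrelation of the residuals. The sketch below is illustrative only; the lag1_autocorr helper and the synthetic data are not part of the model code later in this post.

import numpy as np

def lag1_autocorr(e):
    # Sample lag-1 autocorrelation of a (demeaned) residual series
    e = np.asarray(e, dtype=float)
    e = e - e.mean()
    return float(e[:-1] @ e[1:] / (e @ e))

# Synthetic example: OLS ignores the AR(1) structure, the residuals reveal it
rng = np.random.default_rng(0)
n = 500
x = rng.standard_normal(n)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = 0.4 * eps[t - 1] + rng.standard_normal()   # AR(1) errors with phi = 0.4
y = 0.001 + 1.2 * x + eps

X = np.column_stack([np.ones(n), x])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_ols
print("lag-1 residual autocorrelation:", round(lag1_autocorr(resid), 3))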

An extension that addresses this issue models the residuals as an autoregressive process:

\[ \epsilon_t = \phi \epsilon_{t-1} + a_t \]

\[ a_t \sim \mathcal{N}(0, \sigma^2) \]

The result is a regression model with AR(1) errors, estimated here in a Bayesian framework using Gibbs sampling.
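
For \(|\phi| < 1\) the error process is stationary with unconditional variance

\[ \mathrm{Var}(\epsilon_t) = \frac{\sigma^2}{1 - \phi^2}, \]

which is why the prior on \(\phi\) below is restricted to \((-1, 1)\).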

Model Specification

Let \(y_t\) be the dependent variable and \(x_t\) the vector of regressors including a constant. The model is

\[ y_t = x_t^\top \beta + z_t \]

\[ z_t = \phi z_{t-1} + a_t, \quad a_t \stackrel{\text{i.i.d.}}{\sim} N(0,\sigma^2) \]

where \(\beta\) are regression coefficients, \(\phi\) is the AR(1) parameter, and \(\sigma^2\) is the innovation variance.
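
Substituting \(z_t = y_t - x_t^\top \beta\) into the AR(1) equation and conditioning on the first observation gives the quasi-differenced form

\[ y_t - \phi y_{t-1} = (x_t - \phi x_{t-1})^\top \beta + a_t, \qquad t = 2, \dots, n, \]

so for a fixed \(\phi\) the model is an ordinary Gaussian regression on transformed data. This is the transformation the sampler applies before drawing \(\beta\).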

Bayesian Estimation

We specify priors:

\[ \beta \sim N(\beta_0, V_0), \quad \phi \sim N(\phi_0, s_\phi^2) \text{ truncated to } (-1,1), \quad \sigma^2 \sim \mathrm{InvGamma}(a_\sigma, b_\sigma) \]

The Gibbs sampler cycles through the following conditional draws (closed-form expressions follow the list):

  1. Sample \(\beta\) conditional on \(\phi, \sigma^2\) from a normal distribution
  2. Sample \(\phi\) conditional on \(\beta, \sigma^2\) from a truncated normal
  3. Sample \(\sigma^2\) conditional on \(\beta, \phi\) from an inverse-gamma
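
Under these priors and the quasi-differenced likelihood (conditioning on the first observation), all three conditionals are available in closed form. Writing \(\tilde{y}_t = y_t - \phi y_{t-1}\), \(\tilde{x}_t = x_t - \phi x_{t-1}\), and \(z_t = y_t - x_t^\top \beta\):

\[ \beta \mid \phi, \sigma^2, y \sim N(m^*, V^*), \quad V^* = \left( \tfrac{\tilde{X}^\top \tilde{X}}{\sigma^2} + V_0^{-1} \right)^{-1}, \quad m^* = V^* \left( \tfrac{\tilde{X}^\top \tilde{y}}{\sigma^2} + V_0^{-1} \beta_0 \right) \]

\[ \phi \mid \beta, \sigma^2, y \sim N(m_\phi, v_\phi) \text{ truncated to } (-1,1), \quad v_\phi = \left( \tfrac{\sum_{t=2}^{n} z_{t-1}^2}{\sigma^2} + \tfrac{1}{s_\phi^2} \right)^{-1}, \quad m_\phi = v_\phi \left( \tfrac{\sum_{t=2}^{n} z_{t-1} z_t}{\sigma^2} + \tfrac{\phi_0}{s_\phi^2} \right) \]

\[ \sigma^2 \mid \beta, \phi, y \sim \mathrm{InvGamma}\!\left( a_\sigma + \tfrac{n-1}{2},\; b_\sigma + \tfrac{1}{2} \sum_{t=2}^{n} (z_t - \phi z_{t-1})^2 \right) \]

These are exactly the updates implemented in gibbs_reg_ar1 below.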

Python Implementation

# Bayesian Linear Regression with AR(1) Errors — REAL DATA via yfinance
# - Downloads prices for: target asset + factor proxies (SPY market, IWM size, VLUE value)
# - Builds daily log-return regressors
# - Runs Gibbs sampler (β, φ, σ²)
# - PLOTS traces, posteriors, and posterior predictive
# Requirements: numpy, scipy, matplotlib, yfinance  (pip install yfinance)

import sys, subprocess
def _pip(pkg): 
    try:
        __import__(pkg.split("==")[0])
    except ImportError:
        subprocess.check_call([sys.executable, "-m", "pip", "install", pkg])

_pip("yfinance")

import numpy as np
from numpy.linalg import inv, cholesky
from scipy.stats import invgamma, truncnorm
import matplotlib.pyplot as plt
import yfinance as yf
import pandas as pd

# -----------------------------
# Helper random draws
# -----------------------------
def mvn_rand(mean, cov):
    L = cholesky(cov)
    return mean + L @ np.random.randn(mean.shape[0])

def truncnorm_rand(mu, sigma2, a, b):
    sd = np.sqrt(sigma2)
    a_std = (a - mu) / sd
    b_std = (b - mu) / sd
    return truncnorm.rvs(a=a_std, b=b_std, loc=mu, scale=sd)

# -----------------------------
# Gibbs: y_t = x_t'β + z_t,  z_t = φ z_{t-1} + a_t,  a_t ~ N(0, σ²)
# -----------------------------
def gibbs_reg_ar1(
    y, X,
    n_iter=9000, burnin=3000, thin=5,
    beta0=None, V0=None,
    phi0=0.0, s2_phi0=0.5,     # prior φ ~ N(phi0, s2_phi0) truncated (-1,1)
    a_sigma=2.0, b_sigma=0.5,  # prior σ² ~ InvGamma(a_sigma, b_sigma)
    phi_bounds=(-0.999, 0.999),
    seed=123
):
    if seed is not None:
        np.random.seed(seed)

    y = np.asarray(y).reshape(-1)
    X = np.asarray(X)
    n, k = X.shape

    if beta0 is None: beta0 = np.zeros(k)
    if V0   is None: V0   = np.eye(k) * 5.0
    V0_inv = inv(V0)

    # OLS init (ignoring AR errors)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    phi = 0.0
    sigma2 = np.var(resid)

    n_keep = (n_iter - burnin) // thin
    beta_s = np.zeros((n_keep, k))
    phi_s  = np.zeros(n_keep)
    s2_s   = np.zeros(n_keep)

    keep = 0
    for it in range(n_iter):
        # 1) β | φ, σ²  — quasi-difference the data (conditioning on the first observation)
        y_o = y[1:] - phi * y[:-1]
        X_o = X[1:] - phi * X[:-1]
        XtX = X_o.T @ X_o
        Xty = X_o.T @ y_o

        V_star_inv = XtX / sigma2 + V0_inv
        V_star     = inv(V_star_inv)
        m_star     = V_star @ (Xty / sigma2 + V0_inv @ beta0)
        beta = mvn_rand(m_star, V_star)

        # 2) φ | β, σ²   (truncated normal)
        z = y - X @ beta
        zlag, zcur = z[:-1], z[1:]
        S_zz = float(zlag @ zlag)
        S_zy = float(zlag @ zcur)
        s2_phi_post = 1.0 / (S_zz / sigma2 + 1.0 / s2_phi0)
        m_phi_post  = s2_phi_post * (S_zy / sigma2 + phi0 / s2_phi0)
        phi = truncnorm_rand(m_phi_post, s2_phi_post, phi_bounds[0], phi_bounds[1])

        # 3) σ² | β, φ
        a = z[1:] - phi * z[:-1]
        shape = a_sigma + 0.5 * (n - 1)
        scale = b_sigma + 0.5 * float(a @ a)
        sigma2 = invgamma.rvs(a=shape, scale=scale)

        if it >= burnin and ((it - burnin) % thin == 0):
            beta_s[keep] = beta
            phi_s[keep]  = phi
            s2_s[keep]   = sigma2
            keep += 1

    return {
        "beta_samples": beta_s, "phi_samples": phi_s, "sigma2_samples": s2_s,
        "beta_mean": beta_s.mean(axis=0), "beta_sd": beta_s.std(axis=0, ddof=1),
        "phi_mean": phi_s.mean(), "phi_sd": phi_s.std(ddof=1),
        "sigma2_mean": s2_s.mean(), "sigma2_sd": s2_s.std(ddof=1)
    }

# -----------------------------
# Posterior predictive (1-step; can set horizon>1)
# -----------------------------
def posterior_predictive(y, X, X_new, samples, horizon=1):
    beta_s = samples["beta_samples"]
    phi_s  = samples["phi_samples"]
    s2_s   = samples["sigma2_samples"]
    m = beta_s.shape[0]

    # Residual history built with the posterior-mean beta (an approximation;
    # an exact treatment would rebuild z_hist from each sampled beta)
    beta_mean = samples["beta_mean"]
    z_hist = y - X @ beta_mean
    zT = z_hist[-1]

    preds = np.zeros(m)
    x_new = X_new.reshape(1, -1)

    for i in range(m):
        b, ph, s = beta_s[i], phi_s[i], np.sqrt(s2_s[i])
        z_next = ph * zT + np.random.randn() * s
        y_next = float(x_new @ b + z_next)
        zc, yc = z_next, y_next
        for _ in range(2, horizon + 1):
            zc = ph * zc + np.random.randn() * s
            yc = float(x_new @ b + zc)
        preds[i] = yc
    return preds

# -----------------------------
# Data: pick your target and factor proxies
# -----------------------------
if __name__ == "__main__":
    # CONFIG —— change these if you want different assets
    target = "AAPL"                     # asset to model
    factors = ["SPY", "IWM", "VLUE"]    # SPY=market, IWM=size-ish, VLUE=value-ish
    start = "2015-01-01"
    end   = None                        # None = up to today

    tickers = [target] + factors
    df = yf.download(tickers, start=start, end=end, progress=False, auto_adjust=False)["Adj Close"].dropna()

    # Daily log returns
    ret = np.log(df).diff().dropna()

    # Response: target return
    y_series = ret[target].copy()

    # Factor proxies (no risk-free subtraction here; add T-bill if you want excess returns)
    # X columns: [1, SPY, (IWM - SPY) size spread, (VLUE - SPY) value spread]
    X_df = pd.DataFrame(index=ret.index)
    X_df["intercept"] = 1.0
    X_df["mkt"] = ret["SPY"]
    X_df["size_spread"] = ret["IWM"] - ret["SPY"]
    X_df["value_spread"] = ret["VLUE"] - ret["SPY"]

    # Align to common index with no NaNs
    data = pd.concat([y_series.rename("y"), X_df], axis=1).dropna()
    y = data["y"].values
    X = data[["intercept", "mkt", "size_spread", "value_spread"]].values
    dates = data.index

    print(f"Sample: {dates[0].date()}{dates[-1].date()} | n={len(y)} obs, k={X.shape[1]} regressors")

    # -----------------------------
    # Run Gibbs
    # -----------------------------
    res = gibbs_reg_ar1(y, X, n_iter=9000, burnin=3000, thin=5, seed=123)

    # Quick text summary
    cols = ["intercept", "mkt", "size_spread", "value_spread"]
    for j, name in enumerate(cols):
        print(f"{name:>12s}: mean={res['beta_mean'][j]: .5f}, sd={res['beta_sd'][j]: .5f}")
    print(f"{'phi':>12s}: mean={res['phi_mean']: .4f}, sd={res['phi_sd']: .4f}")
    print(f"{'sigma^2':>12s}: mean={res['sigma2_mean']: .6f}, sd={res['sigma2_sd']: .6f}")

    # -----------------------------
    # PLOTS — each in its own figure (matplotlib only)
    # -----------------------------

    # Trace: phi
    plt.figure()
    plt.plot(res["phi_samples"])
    plt.title(f"Trace: phi (AR1) — {target}")
    plt.xlabel("Iteration")
    plt.ylabel("phi")
    plt.show()

    # Trace: sigma^2
    plt.figure()
    plt.plot(res["sigma2_samples"])
    plt.title(f"Trace: sigma^2 — {target}")
    plt.xlabel("Iteration")
    plt.ylabel("sigma^2")
    plt.show()

    # Trace: betas (overlay)
    plt.figure()
    for j, name in enumerate(cols):
        plt.plot(res["beta_samples"][:, j], label=name)
    plt.title(f"Trace: beta — {target}")
    plt.xlabel("Iteration")
    plt.ylabel("beta")
    plt.legend()
    plt.show()

    # Posterior: phi
    plt.figure()
    plt.hist(res["phi_samples"], bins=40, density=True)
    plt.title(f"Posterior: phi — {target}")
    plt.xlabel("phi")
    plt.ylabel("density")
    plt.show()

    # Posterior: sigma^2
    plt.figure()
    plt.hist(res["sigma2_samples"], bins=40, density=True)
    plt.title(f"Posterior: sigma^2 — {target}")
    plt.xlabel("sigma^2")
    plt.ylabel("density")
    plt.show()

  
    # Posterior: betas — split plots to avoid scale blow-up
    # (1) Intercept alone
    plt.figure()
    plt.hist(res["beta_samples"][:, 0], bins='fd', density=True)
    plt.title(f"Posterior: intercept — {target}")
    plt.xlabel("beta"); plt.ylabel("density")
    plt.show()
    
    # (2) Factors overlayed (exclude intercept)
    plt.figure()
    for j, name in enumerate(cols[1:], start=1):
        plt.hist(res["beta_samples"][:, j], bins='fd', density=True, alpha=0.5, label=name)
    plt.title(f"Posterior: factor betas — {target}")
    plt.xlabel("beta"); plt.ylabel("density")
    plt.legend(); plt.show()


    # Posterior predictive (1-step ahead) using the last row of X
    preds = posterior_predictive(y, X, X[-1], res, horizon=1)
    plt.figure()
    plt.hist(preds, bins=40, density=True)
    plt.title(f"Posterior Predictive: y_(T+1) — {target}")
    plt.xlabel("predicted return")
    plt.ylabel("density")
    plt.show()

The code first downloads price data and computes daily log returns. It builds factor proxies from the market (SPY), a size spread (IWM − SPY), and a value spread (VLUE − SPY). The Gibbs sampler is implemented in gibbs_reg_ar1, which quasi-differences the data to account for the AR(1) residual structure and samples from the conditional posteriors for β, φ, and σ². The posterior_predictive function uses posterior draws to simulate the next period's return. Plots include trace plots and posterior histograms for the parameters and the prediction.
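As a small follow-up (not part of the script above), the preds array can be summarized directly, for example to report a point forecast, a 95% credible interval, and the posterior probability of a positive return. The snippet assumes the script has already run, so preds and np are in scope.

# Illustrative summary of the posterior predictive draws
pred_mean = preds.mean()
lo, hi = np.percentile(preds, [2.5, 97.5])
print(f"E[y_(T+1)] = {pred_mean:.5f}")
print(f"95% credible interval: [{lo:.5f}, {hi:.5f}]")
print(f"P(y_(T+1) > 0) = {(preds > 0).mean():.3f}")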

Results


   Sample: 2015-01-05 → 2025-08-08 | n=2665 obs, k=4 regressors
   intercept: mean= 0.00012, sd= 0.00047
         mkt: mean= 1.22776, sd= 0.03880
 size_spread: mean=-0.16722, sd= 0.07164
value_spread: mean=-0.50466, sd= 0.10243
         phi: mean= 0.0624, sd= 0.0373
     sigma^2: mean= 0.000511, sd= 0.000014

From the results, the overall conclusion is:

  1. No meaningful alpha
    The posterior mean of the intercept is essentially zero (0.00012) with a posterior standard deviation roughly four times as large, so there is no credible evidence of excess return left unexplained by the factors.

  2. Strong market exposure
    The market beta (1.22776 ± 0.03880) is well above 1, meaning AAPL's daily returns respond more than one-for-one to market movements.

  3. Negative size exposure
    The size spread beta is −0.16722 ± 0.07164, implying AAPL behaves like a large-cap stock, tending to lag on days when small caps outperform the market.

  4. Negative value exposure
    The value spread beta is −0.50466 ± 0.10243, consistent with AAPL being a growth-oriented stock that tends to underperform when value stocks outperform.

  5. Low residual autocorrelation
    The AR(1) parameter φ is 0.0624 ± 0.0373, small enough that residual autocorrelation is minimal. This means the model already explains most predictable structure in returns, and the idiosyncratic part is close to white noise.

  6. Stable idiosyncratic volatility
    The innovation variance σ² is 0.000511, which corresponds to about 2.26% daily idiosyncratic volatility (\(\sqrt{0.000511} \approx 0.0226\)). This is small relative to AAPL's total volatility, meaning most of the variation is explained by the factors.

In short: AAPL’s returns during 2015–2025 are almost entirely explained by the chosen market, size, and value factors, with high market beta, large-cap/growth tilt, and very little residual autocorrelation.