The "Magical" 57% 📉
If you tell a quantitative researcher you can predict the S&P 500 with 57.93% directional accuracy, they'll usually laugh at you.
"Markets are efficient," they'll say. "It's a random walk."
And they're mostly right. Financial data is a mess of noise, non-stationarity, and chaos. But 57.93% is exactly what my final model achieved on out-of-sample data (2015-2025).
It wasn't easy. I didn't just throw an LSTM at some closing prices and call it a day. I had to rethink the problem from the ground up—moving away from "predicting price" to "predicting regime," and learning to incorporate the most valuable signal of all: uncertainty.
Why Most Retail Models Fail
I started this project where everyone starts: scikit-learn, a Random Forest, and some moving averages.
The results? A miserable 54% accuracy (barely better than a coin flip). The model would overfit to recent volatility and get crushed whenever the market regime changed—like the 2020 COVID crash or the 2022 inflation pivot.
I needed three things to break the 55% ceiling:
- More than just price: The market runs on fear and news, not just past returns.
- Better architecture: Simple RNNs forget too much; Transformers need too much data.
- Risk management: A model that always bets is a model that eventually goes broke.
The Architecture: A "Stacked" Approach 🧠
I settled on a multi-modal approach that fuses three distinct data streams:
- Market Data: OHLCV, RSI, MACD (the technicals).
- Macro Data: VIX, Treasury Yields, DXY (the environment).
- Sentiment: Financial news headlines processed by FinBERT.
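For the technicals, RSI and MACD are only a few lines of pandas. Here's a minimal sketch on a synthetic close series (the window/span values are the textbook defaults, not necessarily the ones I used):

```python
import numpy as np
import pandas as pd

def rsi(close: pd.Series, window: int = 14) -> pd.Series:
    """Relative Strength Index via Wilder-style exponential smoothing."""
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / window, adjust=False).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1 / window, adjust=False).mean()
    return 100 - 100 / (1 + gain / loss)

def macd(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9):
    """MACD line and its signal line from two EMAs."""
    line = (close.ewm(span=fast, adjust=False).mean()
            - close.ewm(span=slow, adjust=False).mean())
    return line, line.ewm(span=signal, adjust=False).mean()

# Stand-in for real S&P 500 closes: a random walk around 4000.
close = pd.Series(4000 + np.random.randn(500).cumsum())
features = pd.DataFrame({"rsi": rsi(close)})
features["macd"], features["macd_signal"] = macd(close)
```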
The Stacked Generalization Ensemble
Instead of trusting one model, I built a Council of Elders.
Member 1: The Transformer (127k params)
A lightweight, decoder-only transformer. It uses self-attention to look at the global context of the last 60 days. It answers the question: "Have we seen this pattern before, anywhere in history?"
Member 2: The Bi-LSTM (240k params)
A Bidirectional LSTM. It reads the sequence forwards and backwards. It creates a robust local representation of momentum. It answers: "What is the immediate trend right now?"
These two models feed their "opinions" (embeddings) into a final Meta-Learner which makes the actual call.
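To make the fusion concrete, here's a minimal PyTorch sketch of the wiring. The layer sizes are placeholders and won't reproduce the 127k/240k parameter counts above, and the causally-masked encoder is a stand-in for the decoder-only branch:

```python
import torch
import torch.nn as nn

class StackedEnsemble(nn.Module):
    def __init__(self, n_features: int, d_model: int = 64, n_classes: int = 3):
        super().__init__()
        # Branch 1: causally-masked transformer encoder attending
        # over the full 60-day window.
        self.proj = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        # Branch 2: bidirectional LSTM for the local momentum representation.
        self.bilstm = nn.LSTM(n_features, d_model // 2,
                              batch_first=True, bidirectional=True)
        # Meta-learner: classifies the concatenated branch embeddings.
        self.meta = nn.Sequential(nn.Linear(2 * d_model, 64), nn.ReLU(),
                                  nn.Linear(64, n_classes))

    def forward(self, x):  # x: (batch, 60, n_features)
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        t_emb = self.transformer(self.proj(x), mask=mask)[:, -1]  # last step
        l_out, _ = self.bilstm(x)
        emb = torch.cat([t_emb, l_out[:, -1]], dim=-1)  # both "opinions"
        return self.meta(emb)  # logits over [DOWN, FLAT, UP]

logits = StackedEnsemble(n_features=12)(torch.randn(8, 60, 12))
print(logits.shape)  # torch.Size([8, 3])
```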
The Secret Sauce: Tri-Class Labeling & Conformal Prediction
This is where the real gains came from.
1. Stop Trying to Predict Flat Days
Most models are binary: Buy or Sell. But what if the market moves 0.01%? That's effectively noise.
I switched to Tri-Class Labeling:
- UP: return > +0.3%
- DOWN: return < -0.3%
- FLAT: everything in between.
By explicitly training the model to recognize "Flat" days, I stopped it from hallucinating trends where there were none.
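The labeling itself is tiny. A sketch assuming next-day close-to-close returns (the ±0.3% threshold is the one above; everything else is illustrative):

```python
import numpy as np
import pandas as pd

THRESHOLD = 0.003  # the +/-0.3% band around zero

def tri_class_labels(close: pd.Series) -> pd.Series:
    """Label each day by its next-day return: 0 = DOWN, 1 = FLAT, 2 = UP."""
    next_ret = close.pct_change().shift(-1)  # tomorrow's return, aligned to today
    labels = np.select(
        [next_ret > THRESHOLD, next_ret < -THRESHOLD],
        [2, 0],
        default=1,  # everything in between is FLAT (drop the final NaN row)
    )
    return pd.Series(labels, index=close.index)
```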
2. Conformal Prediction (The Game Changer)
This is the most "production-grade" part of the project.
In traditional ML, model.predict() hands you a single class no matter what. Maybe the underlying probabilities are 0.51 for UP and 0.49 for DOWN. A standard model buys. A smart trader stays in cash.
I implemented Conformal Prediction, a statistical framework that outputs sets of labels with a guaranteed coverage probability.
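Concretely: pick an error rate $\alpha$, and split conformal prediction builds a set $C(x)$ that contains the true label with probability at least $1 - \alpha$ (assuming the calibration and test data are exchangeable):

$$
P\left(y_{\text{test}} \in C(x_{\text{test}})\right) \ge 1 - \alpha
$$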

If the model is uncertain, the prediction set isn't ['UP']. It's ['UP', 'DOWN', 'FLAT'].
Translation: "I have no idea."
When the model output includes multiple classes (representing high uncertainty), I abstain from trading.
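Here's a minimal split-conformal sketch with that abstention rule. The score function and finite-sample quantile are the standard textbook recipe, shown as an illustration rather than my exact implementation:

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Quantile of nonconformity scores (1 - prob of the true class) on a
    held-out calibration set, with the finite-sample correction."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, q_level, method="higher")  # numpy >= 1.22

def prediction_set(probs, qhat):
    """Every class whose nonconformity score clears the threshold."""
    return np.where(1.0 - probs <= qhat)[0]

# Illustrative calibration data: softmax rows over [DOWN, FLAT, UP].
rng = np.random.default_rng(0)
qhat = conformal_threshold(rng.dirichlet(np.ones(3), size=500),
                           rng.integers(0, 3, size=500))

pset = prediction_set(np.array([0.45, 0.35, 0.20]), qhat)
if len(pset) > 1:
    print("Abstain, the model has no idea:", pset)
```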
The Results
By trading only when the model was confident (abstaining whenever the prediction set contained more than one class), the metrics jumped:
| Model | Directional Accuracy |
|---|---|
| Random Forest | 54.71% |
| Pure LSTM | 55.20% |
| My Ensemble | 57.93% |
On "high confidence" days, the accuracy was even higher (mid-60s). Because I sat out the confusing, choppy days, the risk-adjusted returns (Sharpe Ratio) were significantly better than a buy-and-hold strategy during volatile periods.
Final Thoughts
This project taught me that in financial ML, architecture < data < problem formulation.
The Transformer didn't save me. The Conformal Prediction—the ability to say "I don't know"—did.
If you want to read the full 20-page report with all the math and backtests, you can grab the PDF below.