The "Magical" 57% 📉
If you tell a quantitative researcher you can predict the S&P 500 with 57.93% directional accuracy, they'll usually laugh at you.
"Markets are efficient," they'll say. "It's a random walk."
And they're mostly right. Financial data is a mess of noise, non-stationarity, and chaos. But 57.93% is exactly what my final model achieved on out-of-sample data (2015-2025).
It wasn't easy. I didn't just throw an LSTM at some closing prices and call it a day. I had to rethink the problem from the ground up—moving away from "predicting price" to "predicting regime," and learning to incorporate the most valuable signal of all: uncertainty.
Why Most Retail Models Fail
I started this project where everyone starts: scikit-learn, a Random Forest, and some moving averages.
The results? A miserable 54% accuracy (barely better than a coin flip). The model would overfit to recent volatility and get crushed whenever the market regime changed—like the 2020 COVID crash or the 2022 inflation pivot.
I needed three things to break the 55% ceiling:
- More than just price: The market runs on fear and news, not just past returns.
- Better architecture: Simple RNNs forget too much; Transformers need too much data.
- Risk management: A model that always bets is a model that eventually goes broke.
The Architecture: A "Stacked" Approach 🧠
I settled on a multi-modal approach that fuses three distinct data streams:
- Market Data: OHLCV, RSI, MACD (the technicals).
- Macro Data: VIX, Treasury Yields, DXY (the environment).
- Sentiment: Financial news headlines processed by FinBERT.
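For the technicals, RSI and MACD are only a few lines of pandas. Here's a minimal sketch on a synthetic close series (the window/span values are the textbook defaults, not necessarily the ones I used):

```python
import numpy as np
import pandas as pd

def rsi(close: pd.Series, window: int = 14) -> pd.Series:
    """Relative Strength Index via Wilder-style exponential smoothing."""
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / window, adjust=False).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1 / window, adjust=False).mean()
    return 100 - 100 / (1 + gain / loss)

def macd(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9):
    """MACD line and its signal line from two EMAs."""
    line = (close.ewm(span=fast, adjust=False).mean()
            - close.ewm(span=slow, adjust=False).mean())
    return line, line.ewm(span=signal, adjust=False).mean()

# Stand-in for real S&P 500 closes: a random walk around 4000.
close = pd.Series(4000 + np.random.randn(500).cumsum())
features = pd.DataFrame({"rsi": rsi(close)})
features["macd"], features["macd_signal"] = macd(close)
```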
The Stacked Generalization Ensemble
Instead of trusting one model, I built a Council of Elders.
Member 1: The Transformer (127k params)
A lightweight, decoder-only transformer. It uses self-attention to look at the global context of the last 60 days. It answers the question: "Have we seen this pattern before, anywhere in history?"
Member 2: The Bi-LSTM (240k params)
A Bidirectional LSTM. It reads the sequence forwards and backwards. It creates a robust local representation of momentum. It answers: "What is the immediate trend right now?"
These two models feed their "opinions" (embeddings) into a final Meta-Learner which makes the actual call.
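To make the fusion concrete, here's a minimal PyTorch sketch of the wiring. The layer sizes are placeholders and won't reproduce the 127k/240k parameter counts above, and the causally-masked encoder is a stand-in for the decoder-only branch:

```python
import torch
import torch.nn as nn

class StackedEnsemble(nn.Module):
    def __init__(self, n_features: int, d_model: int = 64, n_classes: int = 3):
        super().__init__()
        # Branch 1: causally-masked transformer encoder attending
        # over the full 60-day window.
        self.proj = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        # Branch 2: bidirectional LSTM for the local momentum representation.
        self.bilstm = nn.LSTM(n_features, d_model // 2,
                              batch_first=True, bidirectional=True)
        # Meta-learner: classifies the concatenated branch embeddings.
        self.meta = nn.Sequential(nn.Linear(2 * d_model, 64), nn.ReLU(),
                                  nn.Linear(64, n_classes))

    def forward(self, x):  # x: (batch, 60, n_features)
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        t_emb = self.transformer(self.proj(x), mask=mask)[:, -1]  # last step
        l_out, _ = self.bilstm(x)
        emb = torch.cat([t_emb, l_out[:, -1]], dim=-1)  # both "opinions"
        return self.meta(emb)  # logits over [DOWN, FLAT, UP]

logits = StackedEnsemble(n_features=12)(torch.randn(8, 60, 12))
print(logits.shape)  # torch.Size([8, 3])
```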
The Secret Sauce: Tri-Class Labeling & Conformal Prediction
This is where the real gains came from.
1. Stop Trying to Predict Flat Days
Most models are binary: Buy or Sell. But what if the market moves 0.01%? That's effectively noise.
I switched to Tri-Class Labeling:
- UP: return > +0.3%
- DOWN: return < -0.3%
- FLAT: everything in between.
By explicitly training the model to recognize "Flat" days, I stopped it from hallucinating trends where there were none.
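The labeling itself is tiny. A sketch assuming next-day close-to-close returns (the ±0.3% threshold is the one above; everything else is illustrative):

```python
import numpy as np
import pandas as pd

THRESHOLD = 0.003  # the +/-0.3% band around zero

def tri_class_labels(close: pd.Series) -> pd.Series:
    """Label each day by its next-day return: 0 = DOWN, 1 = FLAT, 2 = UP."""
    next_ret = close.pct_change().shift(-1)  # tomorrow's return, aligned to today
    labels = np.select(
        [next_ret > THRESHOLD, next_ret < -THRESHOLD],
        [2, 0],
        default=1,  # everything in between is FLAT (drop the final NaN row)
    )
    return pd.Series(labels, index=close.index)
```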
2. Conformal Prediction (The Game Changer)
This is the most "production-grade" part of the project.
In traditional ML, model.predict() hands you a single class no matter what. Maybe the underlying probabilities are 0.51 for UP and 0.49 for DOWN. A standard model buys. A smart trader stays in cash.
I implemented Conformal Prediction, a statistical framework that outputs sets of labels with a guaranteed coverage probability.
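Concretely: pick an error rate $\alpha$, and split conformal prediction builds a set $C(x)$ that contains the true label with probability at least $1 - \alpha$ (assuming the calibration and test data are exchangeable):

$$
P\left(y_{\text{test}} \in C(x_{\text{test}})\right) \ge 1 - \alpha
$$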

If the model is uncertain, the prediction set isn't ['UP']. It's ['UP', 'DOWN', 'FLAT'].
Translation: "I have no idea."
When the model output includes multiple classes (representing high uncertainty), I abstain from trading.
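Here's a minimal split-conformal sketch with that abstention rule. The score function and finite-sample quantile are the standard textbook recipe, shown as an illustration rather than my exact implementation:

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Quantile of nonconformity scores (1 - prob of the true class) on a
    held-out calibration set, with the finite-sample correction."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, q_level, method="higher")  # numpy >= 1.22

def prediction_set(probs, qhat):
    """Every class whose nonconformity score clears the threshold."""
    return np.where(1.0 - probs <= qhat)[0]

# Illustrative calibration data: softmax rows over [DOWN, FLAT, UP].
rng = np.random.default_rng(0)
qhat = conformal_threshold(rng.dirichlet(np.ones(3), size=500),
                           rng.integers(0, 3, size=500))

pset = prediction_set(np.array([0.45, 0.35, 0.20]), qhat)
if len(pset) > 1:
    print("Abstain, the model has no idea:", pset)
```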
The Results
By trading only when the model was confident (abstaining whenever the prediction set contained more than one class), the metrics jumped:
| Model | Directional Accuracy |
|---|---|
| Random Forest | 54.71% |
| Pure LSTM | 55.20% |
| My Ensemble | 57.93% |
On "high confidence" days, the accuracy was even higher (mid-60s). Because I sat out the confusing, choppy days, the risk-adjusted returns (Sharpe Ratio) were significantly better than a buy-and-hold strategy during volatile periods.
Final Thoughts
This project taught me that in financial ML, architecture < data < problem formulation.
The Transformer didn't save me. The Conformal Prediction—the ability to say "I don't know"—did.
If you want to read the full 20-page report with all the math and backtests, you can grab the PDF below.