SmartSignal
A leakage-aware stock-movement forecasting pipeline with engineered market features, sentiment scoring, and chronological validation.
01 / Overview
SmartSignal is an end-to-end machine learning research pipeline for predicting whether a stock's next closing price will move up or down. It combines technical indicators, volume behavior, and daily news sentiment in a leakage-aware Random Forest workflow. The project is intended for research and education only and is not investment advice.
02 / Problem
Stock-direction prediction is easy to overstate when data are randomly shuffled, future information leaks into features, or performance is not compared against a simple baseline. SmartSignal was built to test the full forecasting workflow more honestly: it preserves time order, uses chronological validation, compares against a persistence baseline, and clearly separates demo results from live-market claims.
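To make the baseline comparison concrete, here is a minimal sketch of one common persistence rule: forecast that the next move repeats the most recent one. The function name `persistence_baseline_accuracy` and the exact rule are assumptions for illustration; the project's own baseline may be defined differently.

```python
import numpy as np


def persistence_baseline_accuracy(close):
    """Accuracy of a naive 'next move repeats the last move' forecast.

    Hypothetical helper, not taken from the SmartSignal codebase.
    """
    close = np.asarray(close, dtype=float)
    # up[i] is True when the close rose from day i to day i + 1.
    up = close[1:] > close[:-1]
    y_true = up[1:]   # direction of the move being predicted
    y_pred = up[:-1]  # naive forecast: copy the previous day's direction
    return float(np.mean(y_true == y_pred))


prices = [100, 101, 102, 101, 100, 101]
baseline_accuracy = persistence_baseline_accuracy(prices)
print(f"Persistence baseline accuracy: {baseline_accuracy:.2f}")
```

A model is only worth reporting if it beats this kind of forecast on the same chronological holdout.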
03 / What I built
- Built an end-to-end Python forecasting pipeline for next-day stock direction classification.
- Implemented automated OHLCV ingestion from Yahoo Finance and support for local CSV datasets.
- Engineered 26 momentum, trend, volatility, volume, calendar, and sentiment features.
- Designed the target so each row predicts whether close[t + 1] is greater than close[t], while each predictor only uses information available at or before the close of day t.
- Implemented leakage-aware chronological validation with a newest-20% final holdout and five expanding-window validation folds inside the older training period.
- Trained and evaluated a Random Forest classifier against a naive persistence baseline.
- Added accuracy, precision, recall, F1, ROC AUC, Brier score, confusion matrix, feature importance, sentiment ablation, and illustrative strategy diagnostics.
- Built a lightweight finance-headline sentiment scoring prototype with negation handling.
- Saved model artifacts, metrics, prediction history, feature importance, and latest next-day signal outputs.
- Created a command-line interface for demo runs, ticker fetching, CSV training, headline scoring, and sentiment-enhanced training.
- Built an interactive Streamlit and Plotly dashboard to communicate model accuracy, baseline lift, ROC AUC, prediction confidence, equity curves, feature importance, and latest signal outputs.
- Added automated tests, Ruff linting, packaging, and GitHub Actions CI.
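The target definition and validation layout described in the list above can be sketched as follows. This is an illustrative outline under stated assumptions, not the project's actual code: `make_target` and `chronological_split` are hypothetical helper names, the price series is random demo data, and scikit-learn's `TimeSeriesSplit` is used here as one way to get five expanding-window folds.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit


def make_target(close: pd.Series) -> pd.Series:
    """Label row t with 1 if close[t + 1] > close[t], else 0.

    The last row is dropped because its next close is not yet known.
    """
    target = (close.shift(-1) > close).astype(int)
    return target.iloc[:-1]


def chronological_split(n_rows: int, holdout_fraction: float = 0.2):
    """Keep the newest rows as a final holdout; no shuffling."""
    n_holdout = int(round(n_rows * holdout_fraction))
    cut = n_rows - n_holdout
    return np.arange(cut), np.arange(cut, n_rows)


# Demo data only: a random walk standing in for daily closing prices.
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 501).cumsum())

y = make_target(close)
train_idx, holdout_idx = chronological_split(len(y))

# Five expanding-window folds inside the older training period.
# Each fold trains on an earlier block and validates on the block after it.
folds = list(TimeSeriesSplit(n_splits=5).split(train_idx))
for fold_number, (fit_pos, val_pos) in enumerate(folds, start=1):
    print(fold_number, len(fit_pos), len(val_pos))
```

Because every validation block sits strictly after its training block, and the holdout sits after all of them, no fold is ever scored on data older than what it was trained on. Any predictor paired with row t would likewise need to be computed only from values at or before day t, for example with backward-looking rolling windows.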
04 / Key results
- On a deterministic market-like simulation, SmartSignal achieved 63.3% five-fold walk-forward accuracy.
- The final chronological holdout reached 66.9% accuracy versus 50.3% for the persistence baseline, a lift of +16.6 percentage points, with a ROC AUC of 0.687.
- A sentiment ablation on the same untouched holdout showed 60.7% accuracy using technical indicators only and 66.9% using technical indicators plus sentiment, a gain of 6.2 percentage points.
- These results are from generated market-like data and do not represent guaranteed performance on live securities.
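As a quick check on how the comparisons above are read, the sketch below recomputes the differences from the reported accuracies. The helper `lift_pp` is a hypothetical name; the inputs are the figures quoted in this section, and differences are expressed in percentage points rather than relative percent.

```python
def lift_pp(model_accuracy_pct: float, reference_accuracy_pct: float) -> float:
    """Difference between two accuracies, in percentage points."""
    return round(model_accuracy_pct - reference_accuracy_pct, 1)


# Holdout accuracy versus the persistence baseline.
baseline_lift = lift_pp(66.9, 50.3)

# Technical indicators plus sentiment versus technical indicators only.
sentiment_gain = lift_pp(66.9, 60.7)

print(f"Lift over baseline: {baseline_lift:+.1f} points")
print(f"Gain from sentiment: {sentiment_gain:+.1f} points")
```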
05 / Technical focus
This project demonstrates practical ML engineering for time-series classification: leakage-aware target construction, chronological validation, baseline comparison, feature engineering, sentiment ablation, artifact generation, CLI design, automated testing, CI, and dashboard-based model communication.
06 / Tech stack