SmartSignal

A leakage-aware stock-movement forecasting pipeline with engineered market features, sentiment scoring, and chronological validation.

01 / Overview

SmartSignal is an end-to-end machine learning research pipeline for predicting whether a stock's next closing price will move up or down. It combines technical indicators, volume behavior, and daily news sentiment in a leakage-aware Random Forest workflow. It is intended for research and education only and is not investment advice.

02 / Problem

Stock-direction prediction is easy to overstate if data are randomly shuffled, future information leaks into features, or performance is not compared against a simple baseline. SmartSignal was built to test the full forecasting workflow more honestly by preserving time order, using chronological validation, comparing against a persistence baseline, and clearly separating demo results from live-market claims.
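
To make the baseline comparison concrete, here is a minimal sketch of one common form of persistence baseline: predict that each day's direction repeats the most recently observed direction. The function names and the toy price series are assumptions for illustration, and the project's own baseline may be defined differently.

```python
def direction_labels(closes):
    """Return 1 where the next close is higher than the current close, else 0."""
    return [int(nxt > cur) for cur, nxt in zip(closes, closes[1:])]


def persistence_predictions(labels):
    """Carry each observed direction forward as the prediction for the next day."""
    # The first label has no predecessor, so predictions align with labels[1:].
    return labels[:-1]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy closing prices (made up for this example).
closes = [100.0, 101.0, 102.0, 101.5, 101.0, 102.0, 103.0]
labels = direction_labels(closes)          # [1, 1, 0, 0, 1, 1]
y_true = labels[1:]                        # directions the baseline tries to predict
y_pred = persistence_predictions(labels)   # previous direction, carried forward
baseline_accuracy = accuracy(y_true, y_pred)
print(f"Persistence baseline accuracy: {baseline_accuracy:.2f}")
```

Any model worth keeping should beat this number on the same rows, since the baseline uses no features at all.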

03 / What I built

  • Built an end-to-end Python forecasting pipeline for next-day stock direction classification.
  • Implemented automated OHLCV ingestion from Yahoo Finance and support for local CSV datasets.
  • Engineered 26 momentum, trend, volatility, volume, calendar, and sentiment features.
  • Designed the target so each row predicts whether close[t + 1] is greater than close[t], while each predictor only uses information available at or before the close of day t.
  • Implemented leakage-aware chronological validation with a newest-20% final holdout and five expanding-window validation folds inside the older training period.
  • Trained and evaluated a Random Forest classifier against a naive persistence baseline.
  • Added accuracy, precision, recall, F1, ROC AUC, Brier score, confusion matrix, feature importance, sentiment ablation, and illustrative strategy diagnostics.
  • Built a lightweight finance-headline sentiment scoring prototype with negation handling.
  • Saved model artifacts, metrics, prediction history, feature importance, and latest next-day signal outputs.
  • Created a command-line interface for demo runs, ticker fetching, CSV training, headline scoring, and sentiment-enhanced training.
  • Built an interactive Streamlit and Plotly dashboard to communicate model accuracy, baseline lift, ROC AUC, prediction confidence, equity curves, feature importance, and latest signal outputs.
  • Added automated tests, Ruff linting, packaging, and GitHub Actions CI.
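
The target definition and validation layout described in the list above can be sketched in plain Python. This is an illustrative sketch, not the project's code: `make_target` and `chronological_split` are hypothetical names, and the way the older period is divided into equal chunks is an assumption about how five expanding-window folds might be laid out.

```python
def make_target(closes):
    """Label row t with 1 if close[t + 1] > close[t]; the last row has no label."""
    return [int(closes[t + 1] > closes[t]) for t in range(len(closes) - 1)]


def chronological_split(n_rows, holdout_frac=0.2, n_folds=5):
    """Return (folds, holdout) as index ranges that never look ahead in time.

    holdout: the newest `holdout_frac` of rows, untouched during model selection.
    folds:   expanding-window (train, validation) pairs inside the older rows.
    """
    n_holdout = int(round(n_rows * holdout_frac))
    n_dev = n_rows - n_holdout
    # Divide the older period into n_folds + 1 chunks: the first chunk seeds
    # the initial training window and each later chunk is validated once.
    chunk = n_dev // (n_folds + 1)
    if chunk == 0:
        raise ValueError("not enough rows for the requested number of folds")
    folds = []
    for k in range(1, n_folds + 1):
        train = range(0, k * chunk)
        # The last fold absorbs any remainder so no older row is left unused.
        valid_end = n_dev if k == n_folds else (k + 1) * chunk
        folds.append((train, range(k * chunk, valid_end)))
    return folds, range(n_dev, n_rows)


target = make_target([10.0, 11.0, 10.5, 10.5])   # [1, 0, 0]
folds, holdout = chronological_split(n_rows=100)
for train, valid in folds:
    print(f"train rows 0-{train[-1]}, validate rows {valid[0]}-{valid[-1]}")
print(f"final holdout rows {holdout[0]}-{holdout[-1]}")
```

Because every training window ends before its validation window begins, and every validation window ends before the holdout begins, no fold can be scored on rows older than the ones it was trained on.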

04 / Key results

  • On a deterministic market-like simulation, SmartSignal achieved 63.3% five-fold walk-forward accuracy.
  • The final chronological holdout reached 66.9% accuracy versus a 50.3% persistence baseline, a lift of 16.6 percentage points, with a ROC AUC of 0.687.
  • A sentiment ablation on the same untouched holdout showed 60.7% accuracy using technical indicators only and 66.9% accuracy using technical indicators plus sentiment.
  • These results are from generated market-like data and do not represent guaranteed performance on live securities.
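
The differences between the reported holdout figures can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are the ones listed above, all from the simulated dataset.

```python
holdout_accuracy = 66.9         # technical + sentiment features, final holdout (%)
baseline_accuracy = 50.3        # persistence baseline on the same holdout (%)
technical_only_accuracy = 60.7  # ablation without sentiment features (%)

# Differences between percentages are expressed in percentage points.
lift_over_baseline = round(holdout_accuracy - baseline_accuracy, 1)
sentiment_gain = round(holdout_accuracy - technical_only_accuracy, 1)

print(f"Lift over persistence baseline: +{lift_over_baseline} points")
print(f"Gain from adding sentiment features: +{sentiment_gain} points")
```

On these figures the sentiment features account for 6.2 of the 16.6 points of lift, with the remainder coming from the technical indicators alone.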

05 / Technical focus

This project demonstrates practical ML engineering for time-series classification: leakage-aware target construction, chronological validation, baseline comparison, feature engineering, sentiment ablation, artifact generation, CLI design, automated testing, CI, and dashboard-based model communication.
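
The headline sentiment prototype is described in section 03 only as lightweight and negation-aware, so the following is a rough sketch of how such a scorer could work, not the project's implementation. The word lists, the two-token negation window, the function name `score_headline`, and the company name "Acme" are all assumptions made for illustration.

```python
import re

# Hypothetical mini-lexicon, not the project's actual word lists.
POSITIVE = {"beats", "surge", "gain", "growth", "upgrade", "profit"}
NEGATIVE = {"miss", "misses", "plunge", "loss", "downgrade", "lawsuit"}
NEGATORS = {"not", "no", "never", "without"}


def score_headline(headline):
    """Return a score in [-1, 1]: mean polarity of matched words, 0.0 if none match."""
    tokens = re.findall(r"[a-z']+", headline.lower())
    polarities = []
    for i, token in enumerate(tokens):
        if token in POSITIVE:
            polarity = 1
        elif token in NEGATIVE:
            polarity = -1
        else:
            continue
        # Flip the polarity when a negator appears in the two preceding tokens.
        if any(t in NEGATORS for t in tokens[max(0, i - 2):i]):
            polarity = -polarity
        polarities.append(polarity)
    return sum(polarities) / len(polarities) if polarities else 0.0


print(score_headline("Acme beats earnings estimates"))
print(score_headline("Acme reports no growth in cloud unit"))
print(score_headline("Acme holds annual meeting"))
```

To keep such a feature leakage-safe, a day's score should aggregate only headlines published at or before that day's close.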

06 / Tech stack

Python, Pandas, NumPy, scikit-learn, Random Forest, yfinance, Streamlit, Plotly, joblib, pytest, Ruff, Git, GitHub Actions, Time-series validation, Chronological holdout, Expanding-window validation, Feature engineering, NLP sentiment scoring, Model persistence, Prediction artifacts