Live odds from The Odds API across 12 retail + sharp books (DraftKings, FanDuel, BetMGM, Caesars, Circa, Pinnacle, plus regional/offshore). Injury feeds from team beat writers + official sources. Weather feeds for outdoor games. Line movement history with timestamps down to the minute.
- ▸12-book odds scraper running every 5 min
- ▸Injury + lineup feeds polled every 15 min on game days
- ▸Weather (wind + precip) for outdoor sports every 30 min
- ▸Historical odds + results archive (5 seasons per major league)
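The polling cadence above can be sketched as a small scheduling check. This is an illustrative sketch only: the intervals come from the list above, while the feed labels and the `feeds_due` helper are hypothetical, and game-day gating for the injury feed is left out.

```python
from datetime import datetime, timedelta

# Intervals from the list above; the feed labels are hypothetical.
POLL_INTERVALS = {
    "odds": timedelta(minutes=5),
    "injuries": timedelta(minutes=15),
    "weather": timedelta(minutes=30),
}


def feeds_due(last_polled, now):
    """Return the feeds whose interval has elapsed since their last poll.

    `last_polled` maps a feed label to the datetime of its last poll;
    a feed that has never been polled is always due.
    """
    due = []
    for feed, interval in POLL_INTERVALS.items():
        last = last_polled.get(feed)
        if last is None or now - last >= interval:
            due.append(feed)
    return due
```

A scheduler loop would call `feeds_due` on each tick and fetch only the feeds it returns.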
Feature engineering
STAGE 2
Raw data becomes ~80 features per game per sport. Pace-adjusted efficiency, rest differential, home/road splits, usage rates, recent form weighted exponentially by recency, schedule density, opponent-adjusted metrics. Feature set is versioned and reproducible.
- ▸80+ per-game features across NFL / NBA / MLB / NHL / NCAAF / NCAAB
- ▸Pace-adjusted (not raw) efficiency numbers per team
- ▸Recency-weighted form (last 5 games > last 10 > last 20)
- ▸Opponent-adjusted variance, not just averages
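One way to picture exponential recency weighting is a weighted mean in which each older game counts a bit less. The sketch below is an assumption about the general technique, not the pipeline's actual feature code: the `half_life` parameter and the function name are hypothetical.

```python
def recency_weighted_form(results, half_life=5.0):
    """Exponentially weighted mean of per-game values, oldest game first.

    A game played `age` games ago gets weight 0.5 ** (age / half_life),
    so the most recent game (age 0) always has weight 1.
    `half_life` is a hypothetical tuning parameter.
    """
    if not results:
        raise ValueError("need at least one game")
    n = len(results)
    weights = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
    return sum(w * r for w, r in zip(weights, results)) / sum(weights)
```

With this weighting, a strong result in the latest game lifts the feature more than the same result four games ago, which is the ordering the list above describes.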
XGBoost ensemble scoring
STAGE 3
Ensemble of gradient-boosted decision trees per sport, trained on 3+ seasons of historical data with walk-forward cross-validation (never train on future, never leak results). Model outputs a win probability and confidence tier for each market per game. Sub-100ms inference.
- ▸Per-sport ensemble (not one-size-fits-all across all leagues)
- ▸Walk-forward CV prevents data leakage
- ▸Out-of-sample test on the most recent season, held out from training
- ▸Inference served via lightweight FastAPI, <100ms per prediction
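Walk-forward validation can be sketched as a generator of season splits in which the test season always comes after every training season. This is a minimal illustration, assuming splits are made at season granularity; the function name and the `min_train` default (taken from the "3+ seasons" figure above) are hypothetical.

```python
def walk_forward_splits(seasons, min_train=3):
    """Yield (train_seasons, test_season) pairs in chronological order.

    Each test season is strictly later than every season in its
    training set, so no future results can leak into training.
    """
    ordered = sorted(set(seasons))
    for i in range(min_train, len(ordered)):
        yield ordered[:i], ordered[i]


for train, test in walk_forward_splits([2021, 2022, 2023, 2024, 2025]):
    print(f"train on {train}, test on {test}")
```

The training window here expands with each fold; a fixed-length rolling window would be an equally valid variant.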
SHAP explainability
SHIPPED 2026-04-22
STAGE 4
Every prediction decomposed into per-feature SHAP values. Canonical schema stored in picks.shap_top JSONB: { base_prob, model_prob, features[] }. One explainer module renders the Discord embed, the /record Recent Edges panel, and the per-pick /record/[id] permalink page — no drift between surfaces. Falls back to legacy why_* prose when shap_top is null so historical picks stay readable.
- ▸Shipped: canonical picks.shap_top schema + shared explainer (bot + web)
- ▸Shipped: top drivers in signed basis points, color-coded by sign
- ▸Shipped: Recent Edges panel on /record with tooltip details per driver
- ▸Shipped: /record/[id] permalink page + per-pick OG card + /api/record/[id] JSON
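The idea of one shared explainer can be sketched as a single function that turns a `shap_top` payload into driver lines in signed basis points. Only the top-level keys (`base_prob`, `model_prob`, `features`) come from the text above. The per-feature keys `name` and `contribution`, the assumption that contributions are already expressed in probability units, and the `legacy_prose` argument standing in for the legacy fallback text are all hypothetical.

```python
def top_drivers(shap_top, legacy_prose="", limit=3):
    """Render the largest drivers as signed basis-point lines.

    Falls back to the legacy prose when `shap_top` is None.
    Assumes each feature is {"name": str, "contribution": float},
    with contribution in probability units (hypothetical layout).
    """
    if shap_top is None:
        return [legacy_prose] if legacy_prose else []
    ranked = sorted(
        shap_top["features"],
        key=lambda f: abs(f["contribution"]),
        reverse=True,
    )
    lines = []
    for feature in ranked[:limit]:
        bps = round(feature["contribution"] * 10_000)  # 0.01 -> 100 bps
        lines.append(f"{feature['name']}: {bps:+d} bps")
    return lines
```

Because every surface would call the same function, the Discord embed, the panel, and the permalink page cannot disagree about ordering or rounding.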
Closing Line Value tracker
SHIPPED 2026-04-22
STAGE 5
Every posted pick gets its closing line captured 2 min after kickoff by an in-process tick that reads the odds-feed cache. CLV is then computed as (decimal-at-post / decimal-at-close - 1) and aggregated on /record across 7d / 30d / 90d / all-time windows. Migration 0020 added event_id + commence_time + sport_key to picks so rows can be matched against the odds feed by identity, not fuzzy game-name parsing.
- ▸Shipped: migration 0020 adds linkage columns + partial capture-pending index
- ▸Shipped: src/bot/services/closing-line-capture.js hourly (2-min) tick
- ▸Shipped: aggregated CLV % on /record + per-pick CLV on /record/[id]
- ▸Shipped: weekly_snapshots cron freezes CLV alongside ROI/yield/hit-rate
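The CLV formula above, (decimal-at-post / decimal-at-close - 1), works out as follows. This is a sketch in Python rather than the bot's own code. The American-to-decimal conversion is the standard one, and the function names are illustrative.

```python
def american_to_decimal(american):
    """Convert American odds (e.g. -110, +150) to decimal odds."""
    if american == 0:
        raise ValueError("American odds cannot be 0")
    if american > 0:
        return 1 + american / 100
    return 1 + 100 / abs(american)


def clv(decimal_at_post, decimal_at_close):
    """Closing line value: positive when the posted price beat the close."""
    return decimal_at_post / decimal_at_close - 1


# A pick posted at 2.10 that closes at 2.00 beat the close by about 5%.
example = clv(2.10, 2.00)
```

A positive value means the pick was posted at a better price than the market's final number; a negative value means the line moved away from the pick.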
Daily retrain feedback loop
STAGE 6
Settled results feed into the training set. Model retrains on a rolling window (weekly full retrain, daily incremental for fresh data) so tomorrow's predictions are informed by yesterday's results. Over 90 days the model measurably sharpens against the closing line.
- ▸Weekly full retrain with updated feature importances
- ▸Daily incremental fine-tune on the most recent game nights
- ▸A/B shadow models evaluated before any live swap
- ▸Automatic rollback if live CLV drops 0.5+ points vs. the prior version over 50 picks
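The rollback rule in the last bullet could be expressed as a simple comparison of mean CLV. This sketch rests on assumptions: CLV values are in percentage points, the live model is judged on its most recent 50 picks, and the prior version is summarized by its mean over whatever picks are supplied. The function name and parameters are hypothetical.

```python
def should_rollback(live_clv_pct, prior_clv_pct, min_picks=50, max_drop=0.5):
    """Return True when the live model trails the prior version's mean CLV
    by `max_drop` percentage points or more over the last `min_picks` picks.

    Returns False until the live model has at least `min_picks` picks.
    """
    if len(live_clv_pct) < min_picks or not prior_clv_pct:
        return False
    window = live_clv_pct[-min_picks:]
    live_mean = sum(window) / len(window)
    prior_mean = sum(prior_clv_pct) / len(prior_clv_pct)
    return prior_mean - live_mean >= max_drop
```

Waiting for the full 50-pick window before acting keeps a handful of unlucky closes from triggering a swap.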