/ METHODOLOGY

How the AI actually generates picks

We get asked "what's the AI" a lot. Here's the full six-stage pipeline, with no hand-waving. If we don't explain the math, you shouldn't trust the picks.

Launch status (2026-04-22): Stages 1, 2, 4, and 5 have shipped. Stage 4 (SHAP) is wired end-to-end across the Discord embed, the /record web panel, and per-pick permalinks. Stage 5 (CLV) populates picks.closing_odds via a live capture service. Stage 3 (production XGBoost retrain) and Stage 6 (retrain loop) remain in progress; until they land, the pipeline stub emits the canonical SHAP schema so real model output will drop in without any surface changes.

Data intake

STAGE 1

Live odds from The Odds API across 12 retail + sharp books (DraftKings, FanDuel, BetMGM, Caesars, Circa, Pinnacle, plus regional/offshore). Injury feeds from team beat writers + official sources. Weather feeds for outdoor games. Line movement history with timestamps down to the minute.

▸12-book odds scraper running every 5 min
▸Injury + lineup feeds polled every 15 min on game days
▸Weather (wind + precip) for outdoor sports every 30 min
▸Historical odds + results archive (5 seasons per major league)

Feature engineering

STAGE 2

Raw data becomes ~80 features per game per sport. Pace-adjusted efficiency, rest differential, home/road splits, usage rates, recent form weighted exponentially by recency, schedule density, opponent-adjusted metrics. Feature set is versioned and reproducible.

▸80+ per-game features across NFL / NBA / MLB / NHL / NCAAF / NCAAB
▸Pace-adjusted (not raw) efficiency numbers per team
▸Recency-weighted form (last 5 games > last 10 games > last 20 games)
▸Opponent-adjusted variance, not just averages
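
As a rough illustration of exponential recency weighting, the sketch below computes a weighted average of per-game values in which newer games count more. The function name, the half_life parameter, and the sample margins are hypothetical; this is not the production feature code.

```python
def recency_weighted_form(values, half_life=5.0):
    """Exponentially weighted mean of per-game values, ordered oldest -> newest.

    half_life is a hypothetical tuning knob: a game played `half_life`
    games before the latest one counts half as much as the latest one.
    """
    if not values:
        raise ValueError("need at least one game")
    n = len(values)
    # Age 0 is the most recent game, age n-1 is the oldest.
    weights = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Illustrative point margins, oldest first, for a team that improved lately.
margins = [-10, -6, -2, 3, 7, 12]
form = recency_weighted_form(margins, half_life=3.0)
plain = sum(margins) / len(margins)
```

For an improving team like this one, the weighted value sits above the plain average because the latest results dominate.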

XGBoost ensemble scoring

STAGE 3

Ensemble of gradient-boosted decision trees per sport, trained on 3+ seasons of historical data with walk-forward cross-validation (never train on future, never leak results). Model outputs a win probability and confidence tier for each market per game. Sub-100ms inference.

▸Per-sport ensemble (not one-size-fits-all across all leagues)
▸Walk-forward CV prevents data leakage
▸Out-of-sample test on the most recent season, which is held out from training
▸Inference served via lightweight FastAPI, <100ms per prediction
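
A minimal sketch of how walk-forward splits can be generated, assuming folds are cut at season boundaries. The function name, the min_train parameter, and the season years are illustrative assumptions rather than the actual training code.

```python
def walk_forward_splits(seasons, min_train=3):
    """Yield (train_seasons, test_season) pairs in chronological order.

    Each fold trains only on seasons strictly before the test season,
    so no future results can leak into training.
    """
    ordered = sorted(seasons)
    for i in range(min_train, len(ordered)):
        yield ordered[:i], ordered[i]


# Illustrative seasons only.
folds = list(walk_forward_splits([2021, 2022, 2023, 2024, 2025]))
# folds[0] trains on 2021-2023 and tests on 2024;
# folds[1] trains on 2021-2024 and tests on 2025.
```

The training window only ever grows forward in time, which is the property that distinguishes this from a shuffled k-fold split.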

SHAP explainability

SHIPPED 2026-04-22 · STAGE 4

Every prediction decomposed into per-feature SHAP values. Canonical schema stored in picks.shap_top JSONB: { base_prob, model_prob, features[] }. One explainer module renders the Discord embed, the /record Recent Edges panel, and the per-pick /record/[id] permalink page — no drift between surfaces. Falls back to legacy why_* prose when shap_top is null so historical picks stay readable.

▸Shipped: canonical picks.shap_top schema + shared explainer (bot + web)
▸Shipped: top drivers in signed basis points, color-coded by sign
▸Shipped: Recent Edges panel on /record with tooltip details per driver
▸Shipped: /record/[id] permalink page + per-pick OG card + /api/record/[id] JSON
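
To make the { base_prob, model_prob, features[] } shape concrete, here is a hedged sketch of assembling such a payload. Only the three top-level keys come from the schema described above; the per-feature keys (name, bps), the function name, and the sample numbers are assumptions. The sketch also assumes contributions are already expressed in probability units and are additive, whereas raw SHAP values for boosted-tree classifiers are usually in log-odds space and would need converting first.

```python
import json


def build_shap_top(base_prob, contributions, top_n=3):
    """Assemble a { base_prob, model_prob, features[] } payload.

    `contributions` maps feature name -> signed contribution in
    probability units (assumed additive: base_prob + sum == model_prob).
    """
    model_prob = base_prob + sum(contributions.values())
    # Rank drivers by absolute impact, largest first.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    features = [
        {"name": name, "bps": round(value * 10_000)}  # 1 basis point = 0.01%
        for name, value in ranked[:top_n]
    ]
    return {
        "base_prob": round(base_prob, 4),
        "model_prob": round(model_prob, 4),
        "features": features,
    }


# Hypothetical feature names and values, for illustration only.
payload = build_shap_top(
    0.50,
    {
        "rest_differential": 0.021,
        "pace_adj_efficiency": 0.034,
        "schedule_density": -0.012,
        "home_road_split": 0.004,
    },
)
print(json.dumps(payload, indent=2))
```

Keeping the sign on each driver is what lets a renderer color-code it, and storing one payload per pick is what lets every surface read the same numbers.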

Closing Line Value tracker

SHIPPED 2026-04-22 · STAGE 5

Every posted pick gets its closing line captured 2 min after kickoff by an in-process tick that reads the odds-feed cache. CLV is then computed as (decimal-at-post / decimal-at-close - 1) and aggregated on /record across 7d / 30d / 90d / all-time windows. Migration 0020 added event_id + commence_time + sport_key to picks so rows can be matched against the odds feed by identity, not fuzzy game-name parsing.

▸Shipped: migration 0020 adds linkage columns + partial capture-pending index
▸Shipped: src/bot/services/closing-line-capture.js in-process tick (captures the close 2 min after kickoff)
▸Shipped: aggregated CLV % on /record + per-pick CLV on /record/[id]
▸Shipped: weekly_snapshots cron freezes CLV alongside ROI/yield/hit-rate
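
The CLV formula above can be sketched in a few lines. The function name and the sample odds are illustrative; the actual capture service is the JavaScript file named above, and this Python version only mirrors the arithmetic.

```python
def clv(decimal_at_post, decimal_at_close):
    """Closing Line Value as a fraction: decimal-at-post / decimal-at-close - 1.

    Positive means the pick was posted at a better price than the close.
    """
    if decimal_at_post <= 1.0 or decimal_at_close <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    return decimal_at_post / decimal_at_close - 1.0


beat_close = clv(2.10, 2.00)    # posted at a better price than the close
missed_close = clv(1.87, 1.91)  # the line moved against the pick
```

A pick posted at 2.10 that closes at 2.00 works out to roughly +5% CLV, while one posted at 1.87 that closes at 1.91 comes out negative.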

Daily retrain feedback loop

STAGE 6

Settled results feed into the training set. The model retrains on a rolling window (a full retrain weekly, an incremental update daily on fresh data) so tomorrow's predictions are informed by yesterday's results. The target is for the model to sharpen measurably against the closing line over 90 days.

▸Weekly full retrain with updated feature importances
▸Daily incremental fine-tune with the most recent game nights
▸A/B shadow models evaluated before any live swap
▸Automatic rollback if live CLV drops 0.5+ points vs the prior version over 50 picks
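
One way the rollback rule could be expressed, as a sketch only. It assumes "points" means percentage points of average CLV and that 50 picks is the minimum sample before the check applies; both readings, along with the function and parameter names, are assumptions.

```python
def should_roll_back(live_clv_pct, prior_clv_pct, picks_evaluated,
                     max_drop=0.5, min_picks=50):
    """Return True when the live model should be swapped back.

    live_clv_pct / prior_clv_pct: average CLV in percent for the live
    model and for the version it replaced (hypothetical inputs).
    """
    if picks_evaluated < min_picks:
        return False  # too small a sample to judge the new version
    return (prior_clv_pct - live_clv_pct) >= max_drop


# Live model averages 1.2% CLV over 50 picks; the prior version averaged 1.8%.
decision = should_roll_back(1.2, 1.8, picks_evaluated=50)
```

Gating on a minimum pick count keeps a short unlucky run from triggering a swap.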

/ RECEIPTS — the methodology isn't marketing, it's verifiable

Every claim above is backed by a public surface you can inspect without an account. These are the artifacts:

picks table locked on insert via picks_immutable_guard trigger · weekly_snapshots append-only via weekly_snapshots_immutable_guard · closing line captured via src/bot/services/closing-line-capture.js