# Statistical Edge Analysis

## Dataset
- **293 entry signals** across **12 sessions** (~32 hours of live market data)
- 7 exchanges: Binance, Bybit, OKX, Coinbase, Kraken, Hyperliquid, dYdX
- Average per session: 3.56M trades and 53k+ orderbook snapshots
- All signals passed `shouldEnter()` (absorption or momentum criteria met)
- Forward returns measured at 60s, 120s, 300s, 600s, 1200s horizons
- Welch's t-test used throughout; significance at |t| > 2.0
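For reference, Welch's t-statistic compares two sample means without assuming equal variances. The sketch below is a minimal TypeScript version; the function names and sample inputs are illustrative and not taken from the analysis code. It also includes the one-sample form (mean tested against zero), on the assumption that per-band t-stats for a single mean return are computed that way.

```typescript
// Sample mean and unbiased sample variance (n - 1 denominator).
function sampleMean(xs: number[]): number {
  return xs.reduce((acc, x) => acc + x, 0) / xs.length;
}

function sampleVariance(xs: number[]): number {
  const m = sampleMean(xs);
  return xs.reduce((acc, x) => acc + (x - m) * (x - m), 0) / (xs.length - 1);
}

// Welch's t-statistic for two independent samples with unequal variances.
function welchT(a: number[], b: number[]): number {
  const se = Math.sqrt(sampleVariance(a) / a.length + sampleVariance(b) / b.length);
  return (sampleMean(a) - sampleMean(b)) / se;
}

// One-sample t-statistic of the mean against zero (assumed form for single-mean rows).
function oneSampleT(xs: number[]): number {
  return sampleMean(xs) / Math.sqrt(sampleVariance(xs) / xs.length);
}

// Illustrative returns in bps (made-up numbers, not from the dataset).
const groupA = [4, 6, 8, 10, 12];
const groupB = [1, 2, 3, 4, 5];
const tWelch = welchT(groupA, groupB);      // about 3.16
const significant = Math.abs(tWelch) > 2.0; // threshold used in this document
```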

## Aggregate Edge

| Horizon | Mean return | t-stat | Best excursion |
|---------|-----------|--------|----------------|
| 60s     | +2.33 bps | 2.91   | 20.9 bps       |
| 120s    | +2.47 bps | 2.58   | 24.7 bps       |
| 300s    | +5.77 bps | **5.38** | 31.5 bps     |
| 600s    | +6.62 bps | 4.61   | 38.6 bps       |
| 1200s   | +2.95 bps | 1.64   | 46.9 bps       |

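The edge is statistically significant (t=4.61 at 10min). Significance peaks at **5 minutes**
(t=5.38); the mean return is highest at 10 minutes (+6.62 bps) and then decays.
At 20 minutes the edge is no longer significant (t=1.64). This constrains optimal hold time.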
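A forward return at a fixed horizon can be expressed in basis points and signed by trade direction. The helper below is a sketch: `forwardReturnBps` and its parameters are hypothetical names, and the sign convention (positive means price moved in the signal's favor) is an assumption.

```typescript
type Side = "long" | "short";

// Signed forward return in basis points between entry and the horizon price.
// Positive means the price moved in the direction of the signal.
function forwardReturnBps(entryPrice: number, horizonPrice: number, side: Side): number {
  const raw = ((horizonPrice - entryPrice) / entryPrice) * 10000;
  return side === "long" ? raw : -raw;
}

// Illustrative prices: a 34-point move on 68000 is about 5 bps.
const longRet = forwardReturnBps(68000, 68034, "long");   // about +5 bps
const shortRet = forwardReturnBps(68000, 68034, "short"); // about -5 bps
```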

## Signal Type: Absorption vs Momentum

| Type | n | 10min mean | t-stat | Best (bps) | Worst (bps) | R:R |
|------|---|-----------|--------|------|-------|-----|
| ABS  | 74 | +6.11 bps | 2.22 | 55.0 | -30.3 | 1.82 |
| MOM  | 219 | +6.79 bps | 4.03 | 33.0 | -29.8 | 1.11 |

Both are significant (|t| > 2.0). Absorption has better per-signal reward:risk (1.82 vs 1.11)
but fewer opportunities. Momentum has more statistical power because of its larger sample.
**Decision: keep both entry types.**
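The R:R column is consistent with best excursion divided by the magnitude of the worst excursion. That reading is an inference from the table, not a stated definition, but it reproduces both rows:

```typescript
// Reward:risk as best excursion over absolute worst excursion (both in bps).
function rewardRisk(bestBps: number, worstBps: number): number {
  return bestBps / Math.abs(worstBps);
}

const rrAbs = rewardRisk(55.0, -30.3); // about 1.82
const rrMom = rewardRisk(33.0, -29.8); // about 1.11
```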

## Delta Magnitude

| \|delta\| band | n | 10min mean | t-stat | Win >10bps |
|-------------|---|-----------|--------|------------|
| 0.25–0.35   | 12 | -6.86 bps | -0.53 | 75% |
| 0.35–0.45   | 35 | +5.53 bps | 1.51  | 77% |
| 0.45–0.55   | 103 | **+9.30 bps** | **3.28** | 82% |
| 0.55–0.65   | 65 | **+11.71 bps** | **4.10** | 83% |
| 0.65+        | 78 | +1.40 bps | 0.93  | 69% |

Clear monotonic rise from 0.25 to 0.65, then **collapse** above 0.65.

The edge at |delta| > 0.65 (t=0.93, n=78) is indistinguishable from noise.
Likely cause: an extreme delta usually means a single exchange is dominating flow,
which is noise rather than cross-exchange consensus.

**Action: cap |delta| at 0.65 for momentum entries.** This removes 78 signals
with no edge. Absorption entries already require lower delta (0.30), and the
absorption-specific delta range (0.25–0.50) overlaps the productive zone.
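As a sketch, the proposed momentum cap could be expressed as a simple predicate. The function and constant names are hypothetical, the 0.40 floor is the current MOM entry threshold, and treating exactly 0.65 as allowed is an assumption about the boundary.

```typescript
const MOM_DELTA_MIN = 0.40; // current momentum entry threshold
const MOM_DELTA_CAP = 0.65; // proposed cap (boundary treated as inclusive here)

// True when |delta| falls inside the band where momentum entries showed an edge.
function passesMomentumDeltaGate(delta: number): boolean {
  const mag = Math.abs(delta);
  return mag >= MOM_DELTA_MIN && mag <= MOM_DELTA_CAP;
}

const okMid = passesMomentumDeltaGate(0.5);      // true
const okNegative = passesMomentumDeltaGate(-0.6); // true: sign is ignored
const tooExtreme = passesMomentumDeltaGate(0.7); // false: above the cap
```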

## Realized Volatility (bimodal)

| rv band (bps) | n | 10min mean | t-stat | Win >10bps |
|---------|---|-----------|--------|------------|
| 2–3     | 13 | +1.11 bps | 0.74  | 69% |
| **3–4** | 131 | **+9.45 bps** | **3.99** | 68% |
| 4–5     | 32 | -2.31 bps | -0.42 | 72% |
| **5–7** | 104 | **+8.55 bps** | **4.56** | 93% |
| 7+      | 13 | -9.93 bps | -1.74 | 77% |

Edge is **bimodal**: present at rv 3–4 and rv 5–7, absent at rv 4–5, and
**negative** above rv 7.

- rv 3–4: Moderate vol, directional moves. Highest mean (+9.45bps).
- rv 4–5: Transition zone. No edge — vol rising but not yet decisive.
- rv 5–7: High vol, strong moves. Highest win rate (93%).
- rv 7+: Extreme vol. Negative expectancy — two-way chop.

The current rv≥5 gate captures the 5–7 sweet spot and blocks the 4–5 dead zone,
but it also blocks the 3–4 peak (t=3.99, n=131) and admits the negative 7+ band.
A bimodal gate `(rv 3–4) OR (rv 5–7)` would capture both peaks, but with only
32 observations in the dead zone this risks overfitting.

**Action: keep rv≥5 for now, add rv cap at 7.** The rv 7+ band has negative
expectancy across 13 observations — small sample but the mechanism is clear
(extreme chop).
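To make the trade-off concrete, the sketch below counts how many of the 293 signals each gate would admit, using the band counts from the rv table. The gate function names are hypothetical, each band is represented by one rv value inside it, and the handling of exact band edges is an assumption.

```typescript
// Proposed gate: keep the existing rv >= 5 floor and add a cap at 7.
function rvGateCapped(rv: number): boolean {
  return rv >= 5 && rv <= 7;
}

// Alternative bimodal gate discussed above (not adopted): rv 3-4 or rv 5-7.
function rvGateBimodal(rv: number): boolean {
  return (rv >= 3 && rv < 4) || (rv >= 5 && rv <= 7);
}

// Band counts from the rv table; `rep` is a representative rv inside each band.
const rvBands = [
  { label: "2-3", rep: 2.5, n: 13 },
  { label: "3-4", rep: 3.5, n: 131 },
  { label: "4-5", rep: 4.5, n: 32 },
  { label: "5-7", rep: 6.0, n: 104 },
  { label: "7+", rep: 8.0, n: 13 },
];

// Sum the signal counts of every band whose representative rv passes the gate.
function admitted(gate: (rv: number) => boolean): number {
  return rvBands.filter((b) => gate(b.rep)).reduce((acc, b) => acc + b.n, 0);
}

const nCurrent = admitted((rv) => rv >= 5); // 117 = 104 + 13
const nCapped = admitted(rvGateCapped);     // 104
const nBimodal = admitted(rvGateBimodal);   // 235 = 131 + 104
```

The cap drops only the 13 signals in the 7+ band, while the bimodal gate would roughly double the admitted sample at the cost of a more heavily fitted rule.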

## Range Regime

| Range band | n | 10min mean | t-stat | Win >10bps |
|-----------|---|-----------|--------|------------|
| 0–20 bps  | 57 | +1.92 bps | 1.17  | 53% |
| **20–40 bps** | 107 | **+10.27 bps** | **3.49** | 83% |
| **40–60 bps** | 62 | **+14.40 bps** | **5.23** | 81% |
| 60–100 bps | 39 | -4.78 bps | -1.29 | 87% |
| 100+ bps   | 22 | -0.67 bps | -0.16 | 86% |

Range 20–60 bps is the edge zone. Below 20 bps: targets too tight, noise
dominates. Above 60 bps: chop too violent, mean-reversion too fast.

Note that the bands above 60 bps show a high Win >10bps rate (86–87%) but a negative
mean 10-minute return. Price reaches 10 bps in the trade's direction but then reverses
past the entry, so the best excursion looks good while the close is negative: classic chop.
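The gap between an excursion-based win and a close-based return can be illustrated with a toy path. The sketch assumes that "Win >10bps" means the best excursion within the horizon exceeded 10 bps; the function name and the path values are made up for illustration.

```typescript
// Summarize a post-entry path given as returns in bps relative to the entry price.
function summarizePath(pathBps: number[]): { best: number; close: number; hit10: boolean } {
  let best = -Infinity;
  let close = 0;
  for (const r of pathBps) {
    if (r > best) best = r;
    close = r; // the last value seen is the close at the horizon
  }
  return { best, close, hit10: best > 10 };
}

// Illustrative choppy path: reaches +12 bps, then reverses through the entry.
const chop = summarizePath([3, 8, 12, 6, -2, -7]);
// chop.hit10 is true (counts as a win >10bps) while chop.close is -7 bps.
```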

**Action: raise DEAD_RANGE from 10 to 20 bps. Add MAX_RANGE cap at 60 bps.**

## Surge Factor

| Surge band | n | 10min mean | t-stat | Win >10bps |
|-----------|---|-----------|--------|------------|
| 2–3       | 34 | +4.43 bps | 1.33  | 85% |
| 3–4       | 187 | +5.65 bps | 3.37 | 78% |
| 4–5       | 28 | +10.44 bps | 1.43 | 86% |
| 5+        | 29 | +12.60 bps | 2.41 | 72% |

All bands above 2 show a positive mean, although only 3–4 (t=3.37) and 5+ (t=2.41)
clear |t| > 2.0. There is no clear discrimination point.
The current thresholds (2.5 for ABS, 3.0 for MOM) are well-placed.
**No change needed.**

## Score Magnitude

| \|score\| band | n | 10min mean | t-stat | Win >10bps |
|-------------|---|-----------|--------|------------|
| 0.3–0.5     | 59 | +7.48 bps | 3.12  | 90% |
| 0.5–0.6     | 22 | +0.83 bps | 0.36  | 68% |
| 0.6–0.7     | 48 | +8.58 bps | 2.65  | 71% |
| 0.7–0.8     | 34 | +15.18 bps | 2.59 | 68% |
| 0.8+        | 115 | +4.70 bps | 2.07 | 79% |

Scores 0.3–0.5 are mostly absorption signals, which have lower composite scores
but show delta-price divergence. This band has the highest win rate (90%), which
supports absorption's edge. Score 0.7–0.8 has the highest mean (+15.18 bps).
No clear threshold to add. **No change needed.**

## Serial Correlation

| Context | n | 5min mean | t-stat |
|---------|---|----------|--------|
| After previous win  | 198 | +7.63 bps | **5.40** |
| After previous loss | 84 | +1.47 bps | 0.92 |

Strong clustering of wins. After a losing signal, the next signal has
no edge (t=0.92). After a winning signal, the next signal has strong
edge (t=5.40).

**This validates the conviction-weighted sizing.** The streak multiplier
correctly sizes up after wins and down after losses.
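The conditional split in the table can be reproduced with a short pass over the ordered signal returns. This is a sketch under two assumptions: a "win" is a return above zero, and the input is one chronologically ordered list (a real run would likely reset at session boundaries). The function name and sample values are illustrative.

```typescript
// Split each signal's return by whether the preceding signal won, then average.
function meanAfterPrev(returnsBps: number[]): { afterWin: number; afterLoss: number } {
  const afterWin: number[] = [];
  const afterLoss: number[] = [];
  let prev: number | undefined;
  for (const r of returnsBps) {
    if (prev !== undefined) {
      if (prev > 0) {
        afterWin.push(r);
      } else {
        afterLoss.push(r);
      }
    }
    prev = r;
  }
  const avg = (xs: number[]): number =>
    xs.length > 0 ? xs.reduce((a, x) => a + x, 0) / xs.length : NaN;
  return { afterWin: avg(afterWin), afterLoss: avg(afterLoss) };
}

// Made-up 5-minute returns in bps.
const seq = [5, 8, -3, 1, 6, 7, -2, -4];
const split = meanAfterPrev(seq); // afterWin = 3.2, afterLoss = -1.5
```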

## Optimal Hold Time

Edge peaks at 5 minutes (t=5.38) and is gone by 20 minutes (t=1.64).
Current `maxHold = 120s` (2 minutes) is **too short** — it exits before
the edge fully develops. The trailing stop (5bps activation, 45% distance)
partially compensates by letting winners run, but the time exit cuts
positions that haven't yet reached trail activation.

**Action: increase maxHold from 120s to 300s.**

## Fill Probability Gate (backtest artifact)

All 293 signals "fill" in the backtest's fill probability simulation.
The 10-second bar granularity is too coarse — the bar's high/low always
touches the entry price within 3 bars (30 seconds). The 35% fill probability
parameter in MAKER_OPT is effectively 100%.

This means the backtest's +$187.80 result assumes every signal gets a limit fill,
which will not happen in live trading. The fill rate observed on demo/live will
be the true test.
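The artifact is easy to see with a touch-based fill check. The sketch below is an assumed simplification, not the backtest's actual fill model: a limit order counts as filled if any of the first three bars spans its price, and queue position is ignored. `Bar` and `touchFill` are hypothetical names.

```typescript
interface Bar {
  high: number;
  low: number;
}

// Touch-based fill: a limit order at `price` is treated as filled if any of the
// first `maxBars` bars spans that price. Queue position is ignored.
function touchFill(bars: Bar[], price: number, maxBars = 3): boolean {
  return bars.slice(0, maxBars).some((b) => b.low <= price && price <= b.high);
}

// Illustrative 10-second bars after a signal with a limit entry at 68100.
const nextBars: Bar[] = [
  { high: 68110, low: 68103 },
  { high: 68107, low: 68098 },
  { high: 68104, low: 68099 },
];
const filledWithin3 = touchFill(nextBars, 68100);    // true: second bar spans 68100
const filledWithin1 = touchFill(nextBars, 68100, 1); // false: first bar stays above
```

With bars this wide relative to the distance to the limit price, nearly every order is touched within three bars, which is why a nominal fill probability can collapse to an effective 100%.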

**Action: the 1-hour demo trading run will benchmark actual fill rates.**

## Summary of Changes

| Parameter | Before | After | Evidence |
|-----------|--------|-------|----------|
| DEAD_RANGE | 10 bps | 20 bps | t=1.17 below 20bps, n=57 |
| MAX_RANGE (new) | none | 60 bps | t=-1.29 above 60bps, n=39 |
| \|delta\| cap (MOM) | none | 0.65 | t=0.93 above 0.65, n=78 |
| rv cap (new) | none | 7 bps | t=-1.74 above 7, n=13 |
| maxHoldMs | 120s | 300s | Peak t=5.38 at 300s horizon |
| shouldEnter ABS delta | 0.30 | 0.30 | No change (works in 0.25–0.50 zone) |
| shouldEnter MOM delta | 0.40 | 0.45 | Borderline at 0.35–0.45, solid at 0.45+ |

## Strategy-Level Ablation Check

The raw signal analysis above identifies feature-level edges, but those do not
necessarily improve the **portfolio** once sizing, fees, cooldowns, and target
logic interact. A multi-session ablation study was run on the full strategy.

| Variant | Return | Max DD | Trades | WR | Fees |
|---------|--------|--------|--------|----|------|
| Baseline MAKER_OPT | +$187.80 | 1.55% | 29 | 96.5% | $103.85 |
| Delta cap (<=0.65) | +$128.75 | 0.16% | 21 | 100.0% | $70.57 |
| Range 20–60 only | +$133.90 | 1.35% | 17 | 76.5% | $50.09 |
| **rv cap at 7** | **+$199.56** | 1.56% | 22 | 95.5% | $76.44 |
| maxHold = 300s | +$180.10 | 1.55% | 29 | 96.5% | $104.02 |
| Combined changes | +$54.21 | 1.35% | 13 | 76.9% | $34.60 |

### Interpretation

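Only **rv cap at 7** improves return over the baseline (+$199.56 vs +$187.80). The
other changes looked promising in raw signal analysis but lowered return in the
integrated strategy after fees, position sizing, cooldowns, and exits were applied.
The delta cap did cut max drawdown from 1.55% to 0.16%, but it gave up $59.05 of return.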

This is exactly why we do not ship feature-level findings directly into the
strategy without full backtest validation.

### Final production choice

Keep only:
- `rv >= 5`
- `rv <= 7`

Do **not** change:
- dead range (keep 10bps)
- delta thresholds (keep ABS 0.30+, MOM 0.40+)
- maxHold (keep 120s)

This preserves the strong baseline and improves aggregate return from +$187.80
to **+$199.56** with essentially unchanged drawdown (1.55% to 1.56%).

## Critical Bug: rv Measurement Mismatch (found 2026-04-08)

### The Problem

The backtest and live runner computed realized volatility from **different data sources**
using the **same threshold** (5.0 bps):

| | Backtest | Live runner |
|---|---|---|
| **rv source** | Multi-exchange 10s bar closes (7 venues) | Bybit OB mid (1 venue) |
| **mean rv** | 4.60 bps | 2.12 bps |
| **rv ≥ 5 frequency** | 34.2% of time | 4.6% of time |

Multi-exchange bar closes produce a much higher rv than the single-venue OB mid
(mean 4.60 vs 2.12 bps, roughly 2.2x), because cross-exchange price dispersion
(e.g. Binance 68100, Coinbase 68108, OKX 68095) inflates the stdev of the return series.
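The effect can be reproduced with a toy series. The sketch assumes rv is the sample standard deviation of successive simple returns in bps, and that a multi-venue bar close takes the last print from whichever venue traded last; both are assumptions, and `realizedVolBps` is a hypothetical helper rather than the project's `computeRealizedVolFromBars()`.

```typescript
// Realized vol as the sample standard deviation of successive simple returns, in bps.
function realizedVolBps(prices: number[]): number {
  const rets: number[] = [];
  let prev: number | undefined;
  for (const p of prices) {
    if (prev !== undefined) {
      rets.push(((p - prev) / prev) * 10000);
    }
    prev = p;
  }
  if (rets.length < 2) return 0;
  const m = rets.reduce((a, r) => a + r, 0) / rets.length;
  const v = rets.reduce((a, r) => a + (r - m) * (r - m), 0) / (rets.length - 1);
  return Math.sqrt(v);
}

// Single venue: the mid barely moves between samples.
const singleVenue = [68100, 68100.5, 68100, 68100.5, 68100];
// Multi-venue bar closes: the last print rotates between venues quoting
// 68100, 68108 and 68095 even though no single venue's price has moved.
const multiVenue = [68100, 68108, 68095, 68100, 68108];

const rvSingle = realizedVolBps(singleVenue);
const rvMulti = realizedVolBps(multiVenue);
// rvMulti is far larger than rvSingle purely because of venue dispersion.
```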

### Consequence

The backtest's rv≥5 gate meant "tradeable 34% of the time."
The live runner's rv≥5 gate meant "tradeable 4.6% of the time."
**8 hours of live demo trading produced ZERO eligible signals.**

### The Fix

Live runner now computes rv from `computeRealizedVolFromBars()` using the
multi-exchange `BarAggregator` bar closes — identical to the backtest.
Same data source, same threshold, same behavior.

### Equivalent thresholds (for reference)

| Bar-close rv | OB-mid rv equivalent | Tradeable % |
|---|---|---|
| 3.0 | 1.03 | 95.6% |
| 5.0 | 2.21 | 34.2% |
| 7.0 | 3.57 | ~10% |