# Live Run Analysis — 2026-04-07

## Summary

Two live demo runs were conducted on 2026-04-07, totalling 5h43m of live market data.
Both runs executed simulated fills against the Bybit L2 orderbook while recording
trades from 7 exchanges and orderbook snapshots; run 2 alone recorded 3.5M+ trades and 53k+ snapshots.

### Run 1: 16:01–17:44 UTC (1h43m)

| Metric        | Value                      |
|---------------|----------------------------|
| BTC range     | 68,189 → 68,650 (+68bps)  |
| Starting eq.  | $500.00                    |
| Final eq.     | $474.56                    |
| Return        | **-$25.44 (-5.1%)**        |
| Trades        | 7 (2 wins, 5 SL)          |
| Fees          | $18.48                     |
| Max drawdown  | 5.1%                       |
| rv threshold  | 0.5 bps (too loose)        |

### Run 2: 17:56–21:56 UTC (4h00m)

| Metric        | Value                      |
|---------------|----------------------------|
| BTC range     | 68,694 → 69,932 (+180bps) |
| Starting eq.  | $500.00                    |
| Final eq.     | $459.03                    |
| Return        | **-$40.97 (-8.2%)**        |
| Trades        | 11 (4 wins, 7 SL)         |
| Fees          | $26.24                     |
| Max drawdown  | 8.2%                       |
| rv threshold  | 3.0 bps                    |
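
The bps moves and percentage returns in the two tables follow from simple ratios. A minimal sketch of that arithmetic, using hypothetical helper names that are not taken from the codebase:

```typescript
// Hypothetical helpers; names are illustrative and not from the codebase.

/** Price move from `start` to `end`, in basis points (1 bp = 0.01%). */
function moveBps(start: number, end: number): number {
  return ((end - start) / start) * 10000;
}

/** Return on starting equity, in percent. */
function returnPct(startEquity: number, finalEquity: number): number {
  return ((finalEquity - startEquity) / startEquity) * 100;
}

// Run 1: BTC 68,189 -> 68,650, equity $500.00 -> $474.56
console.log(moveBps(68189, 68650).toFixed(1)); // "67.6"
console.log(returnPct(500, 474.56).toFixed(1)); // "-5.1"

// Run 2: BTC 68,694 -> 69,932, equity $500.00 -> $459.03
console.log(moveBps(68694, 69932).toFixed(1)); // "180.2"
console.log(returnPct(500, 459.03).toFixed(1)); // "-8.2"
```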

---

## Root Cause Analysis

### Problem 1: All trades were SHORT in a strong uptrend

Both runs took place during a sustained BTC uptrend (+68bps in run 1, +180bps in
run 2). Every single trade — 18 in total — was a SHORT.

**Why:** The absorption signal detects when delta opposes price direction:
- BTC rising + negative delta = "sellers absorbing buying pressure"
- The system reads this as "go short" (side with delta)

But in an uptrend, sell-delta being absorbed by rising price means the **trend is
absorbing selling**. The correct interpretation would be to trade WITH the trend
(long), not against it.
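
One way to picture the difference between the two readings is the sketch below. It is only an illustration: the input fields, the function names, and the 50bps "strong trend" cutoff are assumptions, and it is not the codebase's `detectAbsorption`. It blocks counter-trend signals rather than flipping them to the trend side, which is the more conservative of the two options.

```typescript
// Illustrative sketch only: names, inputs, and the trend cutoff are assumptions,
// not the codebase's `detectAbsorption` implementation.
type Side = "long" | "short";

interface AbsorptionInput {
  priceChangeBps: number; // short-window price change
  delta: number; // signed order-flow delta (buy minus sell), any scale
  trendBps: number; // medium-term price change, assumed to be available
}

/** Absorption as described above: delta opposes the price direction. */
function isAbsorption(x: AbsorptionInput): boolean {
  return x.priceChangeBps * x.delta < 0;
}

/** Reading used during the runs: side with delta. */
function sideWithDelta(x: AbsorptionInput): Side | null {
  if (!isAbsorption(x)) return null;
  return x.delta < 0 ? "short" : "long";
}

/** Alternative reading: drop signals that oppose a strong trend. */
function sideWithTrendFilter(x: AbsorptionInput, strongTrendBps = 50): Side | null {
  const side = sideWithDelta(x);
  if (side === null) return null;
  if (x.trendBps >= strongTrendBps && side === "short") return null;
  if (x.trendBps <= -strongTrendBps && side === "long") return null;
  return side;
}

// Rising price with negative delta during a +68bps uptrend (as in run 1):
const signal: AbsorptionInput = { priceChangeBps: 4, delta: -0.5, trendBps: 68 };
console.log(sideWithDelta(signal)); // "short"
console.log(sideWithTrendFilter(signal)); // null
```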

### Problem 2: Volatility gate was miscalibrated

**Run 1 (rv threshold 0.5bps):** Let 100% of signals through. The 7 trades were taken
mostly in the low-vol normal regime (rv ~1-2bps), and most stopped out on noise.

**Run 2 (rv threshold 3.0bps):** Better filtering — blocked 253 low-vol signals.
But still allowed entry in "volatile" regime during the trending market. The 3.0bps
threshold was passed ~50% of the time.

For reference, the backtest uses 5.0bps on 10-second bars. At the OB-sampled scale:
- rv < 2.0: quiet, ~50% of time → should never trade
- rv 2.0–3.0: mild activity, ~33% → marginal
- rv 3.0–5.0: active, ~9% → selective
- rv > 5.0: volatile, ~8% → best conditions
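
The bands above can be written as a small classifier. This is a sketch with hypothetical names; how the exact boundary values (2.0, 3.0, 5.0) are assigned is an assumption, since the list does not say which band owns them.

```typescript
// Hypothetical classifier for the rv bands listed above (rv in bps, OB-sampled scale).
type RvBand = "quiet" | "mild" | "active" | "volatile";

function classifyRv(rvBps: number): RvBand {
  if (rvBps < 2.0) return "quiet"; // should never trade
  if (rvBps < 3.0) return "mild"; // marginal
  if (rvBps <= 5.0) return "active"; // selective
  return "volatile"; // best conditions
}

// Approximate share of time spent in each band, as quoted above.
const approxTimeShare: Record<RvBand, number> = {
  quiet: 0.5,
  mild: 0.33,
  active: 0.09,
  volatile: 0.08,
};

console.log(classifyRv(1.5)); // "quiet"
console.log(classifyRv(6.7)); // "volatile"
console.log(approxTimeShare[classifyRv(6.7)]); // 0.08
```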

### Problem 3: Normal regime stops too tight

5 of the 7 trades in run 1 (all five of its losses) and 4 of the 7 losses in run 2
were normal-regime trades with 8–9.5bps stops. In a trending market, these get
clipped by noise almost immediately: SL exits averaged a hold time of 23–80 seconds.

The one volatile-regime trade in run 1 (16:27, 40bps range, 16bps stop) held for
30 minutes and exited profitably via time exit.

### Problem 4: Missed the best opportunity

At 17:36–17:40 in run 1, the range spiked to 80bps and rv hit 6.7bps — exactly
the volatile conditions where the backtest shows edge. But the system was in a
direction cooldown after 2 consecutive short losses. By the time cooldown expired,
the move was over.

---

## What Worked

1. **Conviction sizing** — After 2 consecutive losses, position size dropped from
   0.067 BTC to 0.012 BTC (conv 1.56 → 0.35). This limited damage.

2. **Trailing stop** — Captured the best trade in run 2 (trade 8): +$4.09 on TP1,
   then +$1.35 on the runner via trailing stop at 63.6bps peak.

3. **Time exit** — Saved trade 1 in both runs by closing after 30min with small profit
   instead of letting the position reverse.

4. **Guard system** — Direction cooldowns prevented unlimited losses in one direction.
   3 cooldowns triggered across both runs.
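
A direction cooldown of this kind could look roughly like the sketch below. The class and method names are hypothetical; the loss limit of 2 matches the behaviour described in this report, while the 10-minute cooldown length is an assumed placeholder.

```typescript
// Hypothetical guard sketch. The loss limit of 2 matches the text; the
// cooldown length is an assumed placeholder, not a value from the codebase.
type Side = "long" | "short";

class DirectionCooldown {
  private streak: Record<Side, number> = { long: 0, short: 0 };
  private blockedUntil: Record<Side, number> = { long: 0, short: 0 };
  private maxConsecutiveLosses: number;
  private cooldownMs: number;

  constructor(maxConsecutiveLosses = 2, cooldownMs = 10 * 60 * 1000) {
    this.maxConsecutiveLosses = maxConsecutiveLosses;
    this.cooldownMs = cooldownMs;
  }

  /** Record a closed trade; a losing streak in one direction blocks that direction. */
  recordTrade(side: Side, pnl: number, nowMs: number): void {
    if (pnl >= 0) {
      this.streak[side] = 0;
      return;
    }
    this.streak[side] += 1;
    if (this.streak[side] >= this.maxConsecutiveLosses) {
      this.blockedUntil[side] = nowMs + this.cooldownMs;
      this.streak[side] = 0;
    }
  }

  /** True when new entries on `side` are allowed at `nowMs`. */
  canEnter(side: Side, nowMs: number): boolean {
    return nowMs >= this.blockedUntil[side];
  }
}
```

The trade-off is the one seen in Problem 4: a guard like this caps losses in one direction, but it is blind to how conditions change while the cooldown is active.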

---

## Sim vs Backtest Divergence

At that time, the old live runner and the backtest used **different strategy logic**:

| Aspect                | Backtest (MAKER_OPT)              | Sim-runner (live)                  |
|-----------------------|------------------------------------|------------------------------------|
| rv threshold          | 5.0 bps (10s bars)                | 3.0 bps (10s sampled OB)          |
| Dead range            | 10 bps                            | 10 bps ✓                          |
| Entry signal          | Composite (absorption + momentum) | Actionables pipeline               |
| Fill simulation       | 35% fill prob, chase 3x           | Market fill against L2 book        |
| Trailing activation   | 5 bps                             | 5 bps ✓                           |
| Max hold              | 120s                              | 120s (was 1800s in run 1/2)       |
| Fees                  | 2 bps maker                       | 5.5 bps taker                     |
| Coinbase weight       | 1.2x                              | 0.85x via config                  |

**Key difference**: The backtest's MAKER_OPT would have taken **0 trades** in run 1
and **0–2 trades** in run 2 at the 5.0bps rv threshold. The old live runner's 3.0bps
threshold allowed 11 trades through — most of which were losers.
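
The fee row alone accounts for a sizeable gap per trade. A rough sketch of the arithmetic, assuming the fee is charged per side on notional and using an illustrative price of 68,500 with the 0.067 BTC starting size (the helper name is hypothetical):

```typescript
// Assumes fees are charged on notional for both entry and exit.
function roundTripFeeUsd(notionalUsd: number, feeBpsPerSide: number): number {
  return notionalUsd * (feeBpsPerSide / 10000) * 2;
}

// Illustrative position: 0.067 BTC at an assumed price of 68,500.
const notionalUsd = 0.067 * 68500;

console.log(roundTripFeeUsd(notionalUsd, 5.5).toFixed(2)); // taker: "5.05"
console.log(roundTripFeeUsd(notionalUsd, 2).toFixed(2)); // maker: "1.84"
```

Under these assumptions a taker round trip costs 11 bps of notional versus 4 bps for maker.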

---

## Architecture Fix Applied

After these runs, the following was refactored:

1. **`src/core/strategy.ts`** — Single source of truth for:
   - All strategy constants (capital, leverage, fees, cooldowns, rv threshold)
   - `ObRangeTracker` and `RealizedVolTracker`
   - `computeAdaptiveTargets`, `computeConvictionMultiplier`, `computePositionSize`
   - `detectAbsorption` utility
   - `SimStateBroadcast` IPC message shape

2. **`src/runners/replay.ts`** — Replay runner using the shared engine

3. **`src/research/compare-profiles.ts`** — Batch profile comparison using the shared engine

4. **`src/runners/dashboard.ts`** — Receives `engine_state` IPC messages and renders real
   equity, positions, and closed trades from the active runner instead of running
   its own independent paper-trade simulation

---

## Recorded Sessions

Both runs recorded data for future backtest comparison:

- **Run 1**: `data/sessions/2026-04-07-15-55-34/` (1h43m, ~1.5M trades)
- **Run 2**: `data/sessions/2026-04-07-17-56-12/` (4h00m, 3.56M trades, 53.7k OB)

---

## Next Steps

1. **Run backtest on latest session** — Compare TAKER vs MAKER_OPT vs MAKER_FAST
   on the newly recorded data to validate that the backtest would have performed better.

2. **Unify rv gate** — Make the live-facing runner use the same 5.0bps threshold as MAKER_OPT
   backtest, now that the measurement scale is aligned (10s sampled).

3. **Trend context for absorption** — The absorption signal needs to consider the
   medium-term trend direction to avoid trading against a strong trend.

4. **Switch to maker execution** — Build the demo execution runner that places
   actual PostOnly limit orders on `api-demo.bybit.com` using `DemoExecutor`.

---

## Backtest Comparison (run after both live sessions)

Backtest run on all 12 valid sessions (including both live runs):

### Session 2026-04-07-17-56-12 (same data as live run 2)

|                | **LIVE DEMO/SIM** | TAKER bt | MAKER_OPT bt | MAKER_FAST bt |
|----------------|-------------|----------|--------------|---------------|
| Return         | -$40.97     | +$26.20  | +$7.76       | +$0.22        |
| Trades         | 11          | 9        | 2            | 3             |
| Win rate       | 36%         | 56%      | 100%         | 100%          |
| Max DD         | 8.2%        | 1.48%    | 0%           | 0.29%         |
| Fees           | $26.24      | $36.59   | $6.20        | $14.65        |

**Key**: MAKER_OPT took 2 trades where the live sim took 11. Both were winners.
The rv gate (5bps in backtest) and maker fill probability (35%) filtered out
all the noise trades that caused losses live.

### Aggregate (12 sessions, ~32 hours)

| Profile    | Return   | Max DD | Trades | Win Rate | Fees    |
|------------|----------|--------|--------|----------|---------|
| TAKER      | +$111.52 | 4.85%  | 62     | 53.2%    | $266.62 |
| MAKER_OPT  | +$187.80 | 1.55%  | 29     | 96.6%    | $103.85 |
| MAKER_FAST | +$47.42  | 1.33%  | 34     | 76.5%    | $174.09 |

MAKER_OPT is the clear winner: 68% more return than TAKER, with 53% fewer
trades, 3x lower max drawdown, and a near-perfect 96.6% win rate.
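
These ratios can be reproduced directly from the aggregate table. A short check, with the figures copied from the table and variable names chosen for illustration:

```typescript
// Figures copied from the aggregate table above.
const taker = { ret: 111.52, maxDdPct: 4.85, trades: 62 };
const makerOpt = { ret: 187.8, maxDdPct: 1.55, trades: 29 };

const extraReturnPct = (makerOpt.ret / taker.ret - 1) * 100;
const fewerTradesPct = (1 - makerOpt.trades / taker.trades) * 100;
const drawdownRatio = taker.maxDdPct / makerOpt.maxDdPct;

console.log(extraReturnPct.toFixed(1)); // "68.4"
console.log(fewerTradesPct.toFixed(1)); // "53.2"
console.log(drawdownRatio.toFixed(2)); // "3.13"

// Return per trade, for context.
console.log((makerOpt.ret / makerOpt.trades).toFixed(2)); // "6.48"
console.log((taker.ret / taker.trades).toFixed(2)); // "1.80"
```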

### Root cause of live vs backtest divergence

This section is historical.

At the time of these runs, the live path still used a different entry pipeline
than the replay/backtest path. That discrepancy has since been removed.

The current codebase now routes replay and live-facing runners through the same
`StrategyEngine`, with only data injection and fill providers differing by mode.
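
As a rough picture of that design, the sketch below shows one engine class that takes its fills from an injected provider. All names and shapes here are hypothetical: the real `StrategyEngine` API is not shown in this document, and the half-spread provider is only a stand-in for a fill against the L2 book.

```typescript
// Hypothetical shapes for illustration; the real `StrategyEngine` API is not
// shown in this document and may look quite different.
interface Tick { tsMs: number; price: number }
interface Fill { price: number; qty: number }
type OrderSide = "buy" | "sell";

interface FillProvider {
  // Returns the resulting fill, or null if the order would not execute.
  tryFill(side: OrderSide, qty: number, tick: Tick): Fill | null;
}

// Replay mode (assumed behaviour): fill at the recorded price.
class ReplayFillProvider implements FillProvider {
  tryFill(_side: OrderSide, qty: number, tick: Tick): Fill | null {
    return { price: tick.price, qty };
  }
}

// Live-facing sim (assumed behaviour): pay a fixed half-spread as a stand-in
// for walking the L2 book.
class BookFillProvider implements FillProvider {
  private halfSpread: number;
  constructor(halfSpread: number) {
    this.halfSpread = halfSpread;
  }
  tryFill(side: OrderSide, qty: number, tick: Tick): Fill | null {
    const price = side === "buy" ? tick.price + this.halfSpread : tick.price - this.halfSpread;
    return { price, qty };
  }
}

// One engine, parameterized only by where its fills come from.
class EngineSketch {
  readonly fills: Fill[] = [];
  private provider: FillProvider;
  constructor(provider: FillProvider) {
    this.provider = provider;
  }
  onTick(tick: Tick): void {
    // Placeholder rule so the sketch runs: buy once on the first tick.
    if (this.fills.length > 0) return;
    const fill = this.provider.tryFill("buy", 0.01, tick);
    if (fill !== null) this.fills.push(fill);
  }
}
```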

---

## Deep Statistical Analysis (293 signals, 12 sessions)

### Method
Every `shouldEnter()` signal across all 12 sessions was collected with forward
returns at 60s, 120s, 300s, 600s, and 1200s horizons. Each observation includes
signal features (delta, surge, score, absorption, rv, range) and outcome (best
excursion, worst excursion, close-to-close). Welch's t-test is used throughout.
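
For reference, Welch's t-statistic and its Welch–Satterthwaite degrees of freedom can be computed as in the sketch below. The document does not say which two samples each test compares, so the inputs here are simply two arrays of forward returns (for example, signals inside a bucket versus the rest); the function names are illustrative.

```typescript
function mean(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

/** Unbiased sample variance (divides by n - 1). */
function sampleVariance(xs: number[]): number {
  const m = mean(xs);
  return xs.reduce((sum, x) => sum + (x - m) ** 2, 0) / (xs.length - 1);
}

/** Welch's unequal-variance t-statistic with Welch–Satterthwaite degrees of freedom. */
function welchT(a: number[], b: number[]): { t: number; df: number } {
  if (a.length < 2 || b.length < 2) {
    throw new Error("each sample needs at least 2 observations");
  }
  const va = sampleVariance(a) / a.length; // squared standard error of mean(a)
  const vb = sampleVariance(b) / b.length; // squared standard error of mean(b)
  const t = (mean(a) - mean(b)) / Math.sqrt(va + vb);
  const df = (va + vb) ** 2 / (va ** 2 / (a.length - 1) + vb ** 2 / (b.length - 1));
  return { t, df };
}

const result = welchT([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]);
console.log(result.t.toFixed(3), result.df.toFixed(2)); // "-1.897" "5.88"
```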

### Key Findings

| Feature | Finding | t-stat | n | Actionable? |
|---------|---------|--------|---|-------------|
| \|delta\| 0.45-0.65 | Strongest edge zone | 3.28–4.10 | 168 | Cap MOM at 0.65 |
| \|delta\| > 0.65 | Edge collapses | 0.93 | 78 | Yes: remove |
| Range 20-60bps | Sweet spot | 3.49–5.23 | 169 | Raise dead range to 20 |
| Range > 60bps | Negative expectancy | -1.29 | 39 | Cap range at 60 |
| rv 5-7 | Clean edge | 4.56 | 104 | Keep rv≥5 |
| rv 3-4 | Also has edge | 3.99 | 131 | Interesting but bimodal |
| rv 4-5 | Dead zone | -0.42 | 32 | Avoid |
| rv > 7 | Negative | -1.74 | 13 | Cap at 7 |
| Horizon 300s | Peak t-stat | 5.38 | 293 | maxHold→300s |
| ABS vs MOM | Both significant | 2.22 / 4.03 | 74/219 | Keep both |
| Serial corr | Wins cluster | 5.40 vs 0.92 | 198/84 | Conviction sizing justified |
| Fill gate | Not filtering (bar too coarse) | — | 293/293 | Need tick-level sim |

### Bimodal RV Structure
```
rv 2-3:  t=0.74  (noise)
rv 3-4:  t=3.99  (edge) ← blocked by rv≥5
rv 4-5:  t=-0.42 (dead) ← would be included by rv≥3
rv 5-7:  t=4.56  (edge) ← captured by rv≥5
rv 7+:   t=-1.74 (negative)
```
The edge is bimodal, with peaks at rv 3-4 and rv 5-7. The rv≥5 gate captures one peak
and blocks the dead zone (4-5). Lowering to rv≥3 would add the 3-4 peak
but also the 4-5 dead zone. A smarter gate would be `(rv≥3 AND rv<4) OR (rv≥5 AND rv<7)`,
but that risks overfitting without more data.
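
For comparison, a gate that admits only the two edge buckets (rv 3-4 and rv 5-7) and excludes the 4-5 dead zone can be sketched next to the simple threshold gate. Function names are hypothetical, and the band edges come straight from the bucket table above, so the same overfitting caveat applies.

```typescript
/** Simple threshold gate, as used by the backtest (rv in bps). */
function rvThresholdGate(rv: number, threshold = 5): boolean {
  return rv >= threshold;
}

/** Two-band gate: admits rv 3-4 and rv 5-7, rejects the 4-5 dead zone and rv >= 7. */
function rvBandGate(rv: number): boolean {
  return (rv >= 3 && rv < 4) || (rv >= 5 && rv < 7);
}

for (const rv of [2.5, 3.5, 4.5, 6.0, 7.5]) {
  console.log(rv, rvThresholdGate(rv), rvBandGate(rv));
}
// 2.5 false false
// 3.5 false true
// 4.5 false false
// 6 true true
// 7.5 true false
```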

### Fill Probability Is Illusory
All 293 signals "fill" in the backtest because the 10-second bar high/low
always touches the entry price within 3 bars, so the 35% fill probability in
MAKER_OPT never actually rejects an order at this resolution. Real limit-order fills
will be much harder to get, which means the backtest's +$187.80 likely overstates live performance.
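
The gap between a bar-touch rule and a stricter fill rule can be illustrated as follows. This is not the project's backtest code: the function names are hypothetical, and the trade-through margin is one common conservative heuristic, used here only as an assumed example of a tighter check.

```typescript
// Illustrative only: not the project's backtest code.
interface Bar { high: number; low: number }
type OrderSide = "buy" | "sell";

/** Coarse rule: filled if any of the next `maxBars` bars touches the limit price. */
function touchFill(limit: number, bars: Bar[], maxBars = 3): boolean {
  return bars.slice(0, maxBars).some((b) => b.low <= limit && limit <= b.high);
}

/** Stricter rule (assumed heuristic): price must trade through the limit by `throughPx`. */
function tradeThroughFill(
  side: OrderSide,
  limit: number,
  bars: Bar[],
  throughPx: number,
  maxBars = 3,
): boolean {
  return bars
    .slice(0, maxBars)
    .some((b) => (side === "sell" ? b.high >= limit + throughPx : b.low <= limit - throughPx));
}

const bars: Bar[] = [
  { high: 68510, low: 68490 },
  { high: 68512, low: 68495 },
  { high: 68508, low: 68488 },
];

console.log(touchFill(68505, bars)); // true
console.log(tradeThroughFill("sell", 68505, bars, 10)); // false
```

A tick-level simulation with queue position would be stricter still; the point of the sketch is only that a touch rule on 10-second bars accepts almost everything.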