# txocap — 4-Year Research: Cross-Regime Validation & Signal Improvements

**Research period:** April 2026  
**Dataset:** 2,115,233 bars · Apr 2022 → Apr 2026 · Bybit BTCUSDT perpetual 1-minute  
**Relates to:** `docs/strategy-session-notes.md` (1-year foundational research)

---

## Table of Contents

1. [Why We Needed 4 Years](#1-why-we-needed-4-years)
2. [4-Year Backtest Results](#2-4-year-backtest-results)
3. [4-Year Parameter Sweep](#3-4-year-parameter-sweep)
4. [What Parameters Changed — and Why](#4-what-parameters-changed--and-why)
5. [Secondary Signal Research](#5-secondary-signal-research)
6. [Momentum Filter: Finding the Right Horizon](#6-momentum-filter-finding-the-right-horizon)
7. [The Blend Experiment](#7-the-blend-experiment)
8. [The Horizon Sweep](#8-the-horizon-sweep)
9. [Symmetry Analysis: Does the Filter Work Both Ways?](#9-symmetry-analysis-does-the-filter-work-both-ways)
10. [Final Verdict: 1-Year vs 4-Year Config](#10-final-verdict-1-year-vs-4-year-config)
11. [Decision](#11-decision)
12. [Appendix: Key Numbers](#appendix-key-numbers)

---

## 1. Why We Needed 4 Years

The original strategy research (see `strategy-session-notes.md`) was conducted on a single year of data: **Apr 2025 → Apr 2026**. On that window the SMOOTH config produced:

- t = 4.56\*\*\*\*, 13/13 positive months, moIR = 2.11, maxDD = 32.4%

That looked strong. But the sample included a selection bias we hadn't checked: the optimization window *happened to miss* two of the worst months in recent BTC history:

| Missed month | What happened | Impact |
|---|---|---|
| **Jan 2025** | BTC fell from $97k → $76k (−18% in 2 weeks) | −$267 fixed PnL |
| **Feb 2025** | BTC continued lower, choppy | −$140 fixed PnL |

These months were just outside the Apr 2025 start date. Both would have materially reduced the 1-year metrics. Additionally, 2022, 2023, and 2024 covered completely different market regimes:

- **2022**: Bear market, FTX collapse (Nov 2022 −40% in 2 weeks)
- **2023**: Sideways consolidation, BTC $16k → $45k with extended chop
- **2024**: Post-halving bull run, with significant Aug 2024 correction (−30% from peak)

A strategy tuned on one year of bullish-but-choppy 2025 data needed validation across all of these.

### Data acquisition

Downloaded via Bybit REST API in two batches:
- `data/klines/BTCUSDT-1m-2022-2025.jsonl` — Apr 2022 → Dec 22 2022 (382,000 bars; Bybit API limit hit)
- `data/klines/BTCUSDT-1m-2022-2025b.jsonl` — Dec 23 2022 → Apr 9 2025 (1,208,158 bars)
- `data/klines/BTCUSDT-1m.jsonl` — Apr 10 2025 → Apr 9 2026 (525,075 bars, existing)

One data gap: 17 hours on Dec 22-23 2022 (Bybit API dead zone around FTX aftermath). Noted, not filled.

Total: **2,115,233 bars** · 4.02 years.
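
A gap like the one above can be found with a simple scan over bar open times. The following is a minimal sketch, assuming each bar carries its open time in Unix milliseconds; the `BarTime` shape, the `openTimeMs` field name, and the `findGaps` helper are hypothetical and not taken from the research scripts.

```typescript
// Hypothetical bar shape: only the open time matters for gap detection.
interface BarTime {
  openTimeMs: number; // bar open time, Unix epoch milliseconds (assumed field name)
}

interface Gap {
  fromMs: number;      // open time of the last bar before the gap
  toMs: number;        // open time of the first bar after the gap
  missingBars: number; // number of 1-minute bars absent in between
}

const ONE_MINUTE_MS = 60000;

// Scan consecutive bars and report every place where more than one minute elapses.
function findGaps(bars: BarTime[]): Gap[] {
  const gaps: Gap[] = [];
  let prev: BarTime | undefined;
  for (const bar of bars) {
    if (prev !== undefined) {
      const delta = bar.openTimeMs - prev.openTimeMs;
      if (delta > ONE_MINUTE_MS) {
        gaps.push({
          fromMs: prev.openTimeMs,
          toMs: bar.openTimeMs,
          missingBars: Math.round(delta / ONE_MINUTE_MS) - 1,
        });
      }
    }
    prev = bar;
  }
  return gaps;
}
```

The input is assumed to be sorted by open time; the three files would be concatenated in date order before scanning.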

---

## 2. 4-Year Backtest Results

Backtested the SMOOTH config (current runner) across all 4 years at MEXC 0% fee:

```
SMOOTH config: LE=10, SE=8, XT=10, maxH=240, cap=2, fee=0
```

### Aggregate (MEXC 0%)

| Metric | 1-year (Apr25–Apr26) | 4-year (Apr22–Apr26) |
|---|---:|---:|
| Trades | 1,055 | 4,417 |
| Mean/trade | +16.59 bps | +8.53 bps |
| **t-stat** | 4.56\*\*\*\* | **4.57\*\*\*\*** |
| WR | 51.3% | 45.3% |
| Monthly IR | 2.11 | **0.70** |
| Monthly SD | $96 | **$166** |
| Months positive | **13/13** | 37/49 (75.5%) |
| Max DD (compounded) | 32.4% | **74.3%** |

**Key observation:** The aggregate t-stat is nearly identical (4.56 vs 4.57), so the edge is real and consistent. But the monthly IR collapses from 2.11 → 0.70 and maxDD more than doubles (32.4% → 74.3%). The 1-year period was genuinely smoother than the broader history, not just lucky.
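
For reference, the headline statistics in the table can be computed roughly as follows. This is a sketch under assumptions: the t-stat is taken to be the one-sample t of per-trade returns against zero, the monthly IR is the plain mean/SD of monthly PnL with no annualisation, and maxDD is measured peak-to-trough on the equity curve. The function names are illustrative, not the ones used in the research scripts.

```typescript
// Sample mean of a numeric series (assumes a non-empty array).
function mean(xs: number[]): number {
  return xs.reduce((acc, x) => acc + x, 0) / xs.length;
}

// Sample standard deviation with an n - 1 denominator (assumes at least two values).
function sampleSd(xs: number[]): number {
  const m = mean(xs);
  const ss = xs.reduce((acc, x) => acc + (x - m) * (x - m), 0);
  return Math.sqrt(ss / (xs.length - 1));
}

// One-sample t-statistic of per-trade returns against a zero mean.
function tStat(tradeReturnsBps: number[]): number {
  const stdErr = sampleSd(tradeReturnsBps) / Math.sqrt(tradeReturnsBps.length);
  return mean(tradeReturnsBps) / stdErr;
}

// Monthly information ratio: mean monthly PnL divided by its standard deviation.
function monthlyIR(monthlyPnl: number[]): number {
  return mean(monthlyPnl) / sampleSd(monthlyPnl);
}

// Maximum peak-to-trough drawdown of a positive equity curve, as a fraction of the peak.
function maxDrawdown(equity: number[]): number {
  let peak = -Infinity;
  let worst = 0;
  for (const e of equity) {
    peak = Math.max(peak, e);
    worst = Math.max(worst, (peak - e) / peak);
  }
  return worst;
}
```

Because the t-stat grows with the square root of the trade count while the monthly IR does not, a 4-year sample can match the 1-year t-stat with roughly half the per-trade mean, which is the pattern the table shows.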

### Annual breakdown

| Year | n | Mean | t-stat | WR | Months+ | Market regime |
|---|---:|---:|---:|---:|---:|---|
| 2022 | 841 | +7.07 bps | 1.53 | 43.6% | 8/9 | Bear market / FTX crash |
| 2023 | 1,109 | +5.95 bps | 1.80 | 45.3% | 7/12 | Sideways / slow recovery |
| 2024 | 1,119 | +6.89 bps | 1.78 | 42.5% | 8/12 | Bull run + Aug correction |
| **2025** | 1,064 | **+10.92 bps** | **3.03\*\*** | 48.3% | 10/12 | Volatile bull/consolidation |
| 2026 (partial) | 284 | +20.48 bps | 2.39\* | 50.0% | 4/4 | Recovery |

Every year has positive expectancy and t > 1.5. But 2022–2024 are materially weaker than 2025–2026. **2025 was the strongest full year in the sample** (2026 has a higher mean but only four months of data), and it covers most of the window the strategy was optimized on. The apparent strength of the 1-year backtest is partly real edge, partly regime luck.

### Walk-forward (train 1yr → test next yr)

| Fold | Train t | Test t | OOS held? |
|---|---:|---:|---|
| 2022 → 2023 | 1.54 | 1.77 | ✓ |
| 2023 → 2024 | 1.89 | 1.68 | ✓ (−11%) |

Test t stays within about 15% of train t in both folds (+15% in the first, −11% in the second), so the edge generalises. The deterioration in the second fold is modest and expected.

### Why 2023/2024 are harder years

The calendar signals (DOW, hour effects) work best when:
1. BTC is in a moderate volatility regime — trending enough for signals to develop, not so extreme that macro overrides everything
2. Options expiry cycles and institutional patterns are stable

In Nov 2022 (FTX), Aug 2024 (rapid correction), and Jan 2025 (sharp sell-off), **macro events overwhelmed the calendar patterns**. The strategy still made money on balance, but those specific shock events created negative months.

---

## 3. 4-Year Parameter Sweep

Ran 720 configs on all 4 years, optimising for monthly IR (mean/SD of monthly returns):

```
longEntry  ∈ {10, 12, 15}
shortEntry ∈ {10, 12, 15, 18, 20}
extThresh  ∈ {8, 10, 12, 15}
maxHold    ∈ {120, 180, 240}
stopCap    ∈ {0, 1, 2, 3}
fee        = 0 (MEXC)
```
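
The grid above multiplies out to 3 × 5 × 4 × 3 × 4 = 720 configs. A minimal sketch of enumerating it is shown below; the `SweepConfig` type and `buildGrid` function are illustrative names, not taken from the sweep script.

```typescript
// One point in the parameter sweep (field names mirror the grid above).
interface SweepConfig {
  longEntry: number;
  shortEntry: number;
  extThresh: number;
  maxHold: number;
  stopCap: number;
}

const GRID = {
  longEntry: [10, 12, 15],
  shortEntry: [10, 12, 15, 18, 20],
  extThresh: [8, 10, 12, 15],
  maxHold: [120, 180, 240],
  stopCap: [0, 1, 2, 3],
};

// Cartesian product of all parameter values.
function buildGrid(): SweepConfig[] {
  const configs: SweepConfig[] = [];
  for (const longEntry of GRID.longEntry) {
    for (const shortEntry of GRID.shortEntry) {
      for (const extThresh of GRID.extThresh) {
        for (const maxHold of GRID.maxHold) {
          for (const stopCap of GRID.stopCap) {
            configs.push({ longEntry, shortEntry, extThresh, maxHold, stopCap });
          }
        }
      }
    }
  }
  return configs;
}
```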

### Best configs by objective

| Objective | Best config | t | moIR | minHt | months+ | maxDD |
|---|---|---:|---:|---:|---:|---:|
| **Highest t** | LE12 SE12 XT15 h180 cap0 | **5.61\*\*\*\*** | 0.87 | **3.90** | 41/49 | 72.6% |
| **Best moIR** | LE12 SE10 XT15 h180 cap0 | 5.58\*\*\*\* | **0.90** | 3.85 | 39/49 | 71.3% |
| **Most months+** | LE10 SE10 XT8 h240 SS75 cap0 | 3.39\*\*\* | 0.87 | 2.82 | **43/49** | — |
| Best stability (minHt) | LE12 SE12 XT15 h180 cap0 | 5.61\*\*\*\* | 0.87 | **3.90** | 41/49 | 72.6% |

**"STABLE_4yr" config** (LE12 SE12 XT15 h180 cap0) wins on t-stat and stability simultaneously — both halves show high t:
- H1 (Apr 2022 – Oct 2024): t = **3.90\*\*\*\***
- H2 (Oct 2024 – Apr 2026): t = **3.85\*\*\*\***

Compare to 1-year SMOOTH: H1 = 3.23, H2 = 3.33 — both are actually lower individually.

### The stopCap reversal

On the 1-year dataset, `cap=2` was critical: monthly SD dropped from $129 → $96.  
On the 4-year dataset, `cap=0` is consistently better.

**Why:** The 1-year period (Apr 2025–Apr 2026) had specific stop-spiral events that the cap cleanly cut. Over 4 years with bull/bear/chop cycles, the cap occasionally blocks **good re-entries** after stops in trending markets — the signal genuinely fires again because the regime is still active, and blocking it costs more than the stop-spiral protection saves.
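
To make the trade-off concrete, here is a minimal sketch of a daily stop cap. The exact semantics are an assumption: stops are counted per UTC day and per direction, entries in that direction are blocked once the count reaches the cap, and `cap = 0` disables the check. The `DailyStopCap` class is hypothetical and may differ from how the runner implements it.

```typescript
type Direction = "long" | "short";

// Assumed semantics: block further entries in a direction after `cap` stop-outs
// in that direction on the same UTC day. cap = 0 means "no cap".
class DailyStopCap {
  private counts = new Map<string, number>();
  private cap: number;

  constructor(cap: number) {
    this.cap = cap;
  }

  // Key by UTC calendar date and direction, e.g. "2025-01-10:long".
  private key(tsMs: number, dir: Direction): string {
    const day = new Date(tsMs).toISOString().slice(0, 10);
    return day + ":" + dir;
  }

  // Call whenever a position is closed by a stop.
  recordStop(tsMs: number, dir: Direction): void {
    const k = this.key(tsMs, dir);
    this.counts.set(k, (this.counts.get(k) ?? 0) + 1);
  }

  // Call before opening a new position.
  allowsEntry(tsMs: number, dir: Direction): boolean {
    if (this.cap === 0) return true;
    return (this.counts.get(this.key(tsMs, dir)) ?? 0) < this.cap;
  }
}
```

Under these semantics the cost described above is visible directly: after two stops, a third signal on the same day is rejected even if the regime is still active.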

---

## 4. What Parameters Changed — and Why

Comparing 1-year SMOOTH vs 4-year STABLE_4yr:

| Parameter | SMOOTH (1yr) | STABLE_4yr (4yr) | Direction | Mechanism |
|---|---|---|---|---|
| `longEntry` | 10 | **12** | Higher | More selective entries in chop regimes |
| `shortEntry` | 8 | **12** | Higher | Removes weak short singletons at lower scores |
| `extThresh` | 10 | **15** | Higher | Don't extend through regime changes |
| `maxHold` | 240 min | **180 min** | Shorter | Less exposure per trade in trending markets |
| `stopCap` | 2/day | **0** | Removed | Cap blocks good re-entries across 4yr regimes |

### Each change explained

**longEntry 10 → 12**: Over 4 years there are many "noise" entries at scores 10-12 that worked fine in the gentle 2025 environment but struggled in 2022-2024 chop. Raising the threshold reduces these.

**shortEntry 8 → 12**: The 1-year research showed weak short singletons were negative on average. But on 1 year this was outweighed by the overall short edge. On 4 years, the junk short entries add variance without proportional return.

**extThresh 10 → 15**: Extensions re-use the entry hold calculation — they're free when the signal is strong. At extThresh=10, we extend when score is barely above the entry level. At extThresh=15, we only extend when the signal is clearly still active. In volatile regimes (2022, 2024), extending at score=10-15 means holding through a fading signal, which loses.

**maxHold 240 → 180**: Shorter holds mean the position closes sooner rather than sitting through adverse intraday moves. In 2025's trending environment, 240 min holds captured more profit. In 2022-2024 chop, they exposed the position to mean-reversion against the trade. 180 min is the 4-year optimum.

**stopCap 2 → 0**: See above. The cap was solving a 2025-specific problem.

---

## 5. Secondary Signal Research

After establishing the 4-year parameters, the next question was: **can we add secondary signals that confirm the calendar edge, particularly to avoid the macro-shock months?**

### Hypothesis

The calendar signals (DOW, hour effects) fire mechanically based on the day/time. They don't know whether BTC is in the middle of a crash. If a large macro event is underway, the calendar pattern may be overwhelmed.

The worst months in the 4-year data:
- **Nov 2022**: FTX collapse → −$371 (calendar said LONG during a crash)
- **Aug 2024**: BTC correction −30% from peak → −$252
- **Jan 2025**: BTC fell $97k → $76k → −$267

All three were **LONG-side losses** — the calendar fired BUY while BTC was in a confirmed macro downtrend.

### What we tested

For each of 5,970 trades on the STABLE_4yr config, recorded:
- Short-term momentum: 3h, 6h, 12h returns
- Medium-term momentum: 3-day, 7-day, 14-day returns
- Volatility: ATR(60)/ATR(240) ratio
- Price vs 24h/7d SMA
- RSI(240) — 4-hour RSI
- Signal composition: how many signals are voting

### Correlations with outcome

All per-trade linear correlations were essentially zero (|r| < 0.035). This means there is **no single number at bar-close that linearly predicts trade outcome**. The secondary signals only matter as **regime classifiers**, not per-trade predictors.
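
The correlation check is assumed here to be a plain Pearson coefficient between each recorded feature and the per-trade return. A minimal sketch, with `pearson` as an illustrative helper name:

```typescript
// Pearson linear correlation between a feature series and trade outcomes.
// Both arrays must have the same length (at least two points).
function pearson(x: number[], y: number[]): number {
  if (x.length !== y.length || x.length < 2) {
    throw new Error("pearson: arrays must have equal length >= 2");
  }
  const n = x.length;
  const mx = x.reduce((a, b) => a + b, 0) / n;
  const my = y.reduce((a, b) => a + b, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - mx;
    const dy = y[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}
```

A coefficient this close to zero only rules out a linear per-trade relationship; it says nothing about threshold effects, which is why the bucketed tables below are more informative.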

### The key findings

**1. Signal composition: singletons are weak**

| Signal count | n | Mean | t |
|---|---:|---:|---:|
| 1 signal (singleton) | 746 | +4.4 bps | 1.57 |
| 2 signals | 3,602 | +8.4 bps | 4.16\*\*\*\* |
| 3 signals | 1,399 | +8.8 bps | 3.02\*\* |
| 4+ signals | 223 | +14.9 bps | 2.00\* |

**DOW + at least one non-calendar signal** (BUYP, Rev24h, RSI, USgap): n=3,756, mean=+10.4 bps, t=4.83\*\*\*\*.

**2. Volatility context**

| ATR60/ATR240 ratio | Mean | t | Stop rate |
|---|---:|---:|---:|
| Compressing (<0.7) | +7.0 bps | 2.52\* | 17% |
| Normal (0.7–1.0) | **+9.6 bps** | **4.35\*\*\*\*** | 18% |
| Expanding (1.0–1.5) | +5.5 bps | 2.10\* | 24% |
| **Spike (>1.5)** | **−5.5 bps** | **−0.40** | **36%** |

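Volatility spikes are bad for the strategy: the mean turns negative and the stop rate doubles to 36%, although at t=−0.40 the loss itself is not statistically distinguishable from zero. The good news is that vol spikes are relatively rare.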
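
The volatility buckets can be reproduced along these lines. This is a sketch under assumptions: ATR is computed as a simple average of true range (not Wilder smoothing), and the handling of exact bucket boundaries is a guess. The `Ohlc` type and the `atr` and `classifyVol` helpers are hypothetical names.

```typescript
interface Ohlc {
  high: number;
  low: number;
  close: number;
}

type VolRegime = "compressing" | "normal" | "expanding" | "spike";

// Simple-average ATR over the n bars ending at index i.
// Needs bar i - n as well, to supply the first previous close.
function atr(bars: Ohlc[], i: number, n: number): number {
  if (i - n < 0 || i >= bars.length) {
    throw new RangeError("atr: not enough history");
  }
  let sum = 0;
  for (let k = i - n + 1; k <= i; k++) {
    const bar = bars[k];
    const prevClose = bars[k - 1].close;
    const trueRange = Math.max(
      bar.high - bar.low,
      Math.abs(bar.high - prevClose),
      Math.abs(bar.low - prevClose)
    );
    sum += trueRange;
  }
  return sum / n;
}

// Map an ATR(60)/ATR(240) ratio onto the four buckets from the table.
function classifyVol(ratio: number): VolRegime {
  if (ratio < 0.7) return "compressing";
  if (ratio <= 1.0) return "normal";
  if (ratio <= 1.5) return "expanding";
  return "spike";
}
```

At an entry bar `i`, the regime would then be `classifyVol(atr(bars, i, 60) / atr(bars, i, 240))`.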

**3. Medium-term momentum (the most actionable finding)**

| 14-day momentum vs trade direction | n | Mean | t |
|---|---:|---:|---:|
| Aligned >+1000 bps | 742 | +11.4 | 2.35\* |
| Aligned 300–1000 bps | 1,256 | +9.4 | 3.04\*\* |
| Neutral −300 to +300 | 1,920 | **+9.1** | **4.23\*\*\*\*** |
| Opposed 300–1000 bps | 1,253 | +6.7 | 2.09\* |
| **Opposed >1000 bps** | **799** | **+4.0** | **0.80** |

When the **14-day return is more than 1000 bps (10%) against the calendar direction**, the edge essentially vanishes (t=0.80). These are exactly the crash/spike events: FTX collapse, Aug 2024 correction, Jan 2025 drawdown.

### What the secondary signals can and cannot do

**Can do:**
- Identify regime-shock periods where calendar signals fail (macro overwhelms pattern)
- Improve monthly IR by reducing exposure during crash periods

**Cannot do:**
- Predict individual trade outcome (all per-trade correlations near zero)
- Eliminate all losing months (some losing months are just noisy, not regime-shock)

---

## 6. Momentum Filter: Finding the Right Horizon

### First test: 14-day filter (skip when opposed >1000 bps)

Tested "skip when 14-day momentum opposes by >1000 bps":

| Metric | STABLE_4yr | + 14d skip/1000 |
|---|---:|---:|
| t-stat | 5.61\*\*\*\* | 5.86\*\*\*\* |
| moIR | 0.87 | **0.90** |
| maxDD | 72.6% | **59.5%** |
| Months+ | 41/49 | **43/49** |

Better on every metric. But **14 days is too slow**. By the time 14 days of adverse momentum has accumulated, the crash is often nearly over. The filter triggers *after* the damage is already done for many bad months.

### Skip vs Penalty vs Flip

Also tested three mechanisms:
- **Skip**: completely ignore trades where momentum opposes
- **Penalty**: reduce the score when momentum opposes (lower conviction → shorter hold, or skip if score falls below threshold)
- **Flip**: when momentum strongly opposes the calendar, trade in the *momentum direction* instead
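
The three mechanisms can be sketched as a single decision function. This is an illustration under assumptions: the score is signed (positive = long), a single `entryThresh` is used for both sides even though the strategy has separate long and short thresholds, and the `applyMomentumFilter` name and `FilterDecision` shape are hypothetical.

```typescript
type FilterMode = "skip" | "penalty" | "flip";

interface FilterDecision {
  trade: boolean; // whether an entry is taken at all
  score: number;  // signed score after the filter (positive = long)
}

// score:       signed calendar score (positive = long, negative = short)
// momentumBps: lookback return in bps (e.g. the 14-day return)
// threshBps:   how far momentum must oppose the trade before the filter acts
// entryThresh: minimum |score| required to enter (simplification: same for both sides)
// penaltyPts:  points removed from |score| in penalty mode
function applyMomentumFilter(
  mode: FilterMode,
  score: number,
  momentumBps: number,
  threshBps: number,
  entryThresh: number,
  penaltyPts: number = 20
): FilterDecision {
  const dir = score > 0 ? 1 : -1;
  const opposed = dir * momentumBps < -threshBps;
  if (!opposed) {
    return { trade: Math.abs(score) >= entryThresh, score };
  }
  if (mode === "skip") {
    // Ignore the signal entirely.
    return { trade: false, score };
  }
  if (mode === "penalty") {
    // Lower conviction; the trade survives only if it still clears the entry bar.
    const reduced = Math.max(0, Math.abs(score) - penaltyPts);
    return { trade: reduced >= entryThresh, score: dir * reduced };
  }
  // Flip: trade in the momentum direction instead of the calendar direction.
  return { trade: true, score: -score };
}
```

Note that penalty mode degenerates into skip for any signal whose score is less than `penaltyPts` above the entry threshold, which is why the two behave similarly for weak signals.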

| Config | moIR | minHt | maxDD | Comment |
|---|---:|---:|---:|---|
| Skip >1000 | 0.90 | 4.03 | 59.5% | Clean, justified |
| Penalty-20 | 0.91 | 3.81 | 55.8% | Soft skip for weak signals |
| **Flip >750** | **0.99** | 3.94 | 56.0% | High moIR with minHt intact |
| Flip >500 | 1.01 | 3.55 | 53.3% | Highest moIR but fragile |

**Flip result explained:** The flip converts bad calendar-against-crash trades into momentum trades. Nov 2022 (FTX): calendar said LONG, flip said SHORT, BTC kept falling → +$173 vs −$371. Aug 2024: same pattern → +$221 vs −$252.

**Why we didn't adopt the flip:** The flipped trades themselves have t=0.87 — not statistically significant. The apparent moIR improvement is driven by 2-3 specific macro crash events where the flip happened to work. Across all 2,525 flipped trades in the best config, the momentum edge is essentially noise. This is **in-sample selection** of rare events, not a replicable strategy.

**Verdict:** Use **skip** mode. It has a clear mechanistic justification ("don't trade calendar patterns during macro shocks"), doesn't require an independent edge in the filtered trades, and produces t=5.86 with maxDD=59.5%.

---

## 7. The Blend Experiment

**Question:** Instead of a single 14-day lookback, would a weighted blend of 3-day, 6-day, and 9-day returns give a more responsive filter?

### Setup

Tested 20 weight combinations (w3, w6, w9) × 7 thresholds × 2 modes (skip/penalty) = 280 configs.

Blend formula: `blendBps = (w3×r3d + w6×r6d + w9×r9d) / (w3+w6+w9)`
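
The formula translates directly into code. In this minimal sketch the function and parameter names are illustrative; only the weighted-average formula itself comes from the notes.

```typescript
// Weighted blend of 3-, 6-, and 9-day returns (all in bps).
// Weights are non-negative and are normalised by their sum.
function blendBps(
  r3d: number,
  r6d: number,
  r9d: number,
  w3: number,
  w6: number,
  w9: number
): number {
  const wSum = w3 + w6 + w9;
  if (wSum <= 0) {
    throw new Error("blendBps: weights must sum to a positive value");
  }
  return (w3 * r3d + w6 * r6d + w9 * r9d) / wSum;
}
```

For example, with `r3d = -600`, `r6d = -300`, `r9d = 0`, weights (1/0/0) give −600 bps, (1/1/0) give −450 bps, and (1/1/1) give −300 bps: adding the slower returns pulls the blend toward zero during the active phase of a sharp move.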

### Result

**The best performing "blend" was pure r3d alone (w3=1, w6=0, w9=0).**

| Config | moIR | t | Notable |
|---|---:|---:|---|
| Pure r3d (1/0/0) | **1.00** | 6.14 | Winner |
| r3d+r6d (1/1/0) | 0.91 | 6.12 | Good |
| Equal blend (1/1/1) | 0.90 | 5.89 | Diminishing returns |
| r3d+r9d (1/0/1) | 0.92 | 5.86 | Minor improvement |

Adding the 6-day and 9-day returns to the blend dilutes the 3-day signal. They are correlated with r3d but lagged — they add noise, not independent information. The 3-day return alone captures the active crash phase better than any blend.

The best overall config from this sweep: `pen20(1/0/0)>=500` — penalise by 20 score points when the 3-day return opposes by more than 500 bps. This gives moIR=1.00, t=6.14\*\*\*\*.

---

## 8. The Horizon Sweep

**Question:** Is 3 days (72 hours, 4320 bars) the right lookback, or is there a better horizon? 36 hours was specifically requested as a candidate.

### Methodology

Swept 17 horizons from 12h to 336h (14 days), spaced more finely at the short end; the table below shows a representative subset. For each horizon, tested both skip and penalty-20 modes with thresholds scaled to typical BTC volatility at that horizon (σ = √(h/24) × 200 bps) plus fixed values.
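
The volatility scaling can be sketched as below. The σ formula is the one stated above; the specific multiples of σ used in the sweep are not recorded in these notes, so the `multiples` default here is an assumption, as are the function names.

```typescript
const DAILY_SIGMA_BPS = 200; // assumed typical 24h BTC move, per the formula above

// Typical move at a given horizon under square-root-of-time scaling.
function sigmaBps(horizonHours: number): number {
  return Math.sqrt(horizonHours / 24) * DAILY_SIGMA_BPS;
}

// Candidate thresholds for one horizon: sigma multiples, rounded to whole bps.
// The multiples are illustrative, not the sweep's actual list.
function scaledThresholds(horizonHours: number, multiples: number[] = [1, 1.5, 2]): number[] {
  return multiples.map((m) => Math.round(m * sigmaBps(horizonHours)));
}
```

This reproduces several thresholds that appear in the results table: 1σ at 18h is 173 bps, and 1.5σ is 474 bps at 60h and 794 bps at 168h. At 72h, 1σ comes out at about 346 bps.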

### Results: the moIR curve

| Horizon | Best mode/thresh | **moIR** | t | months+ |
|---|---|---:|---:|---:|
| 12h | skip/500 | 0.87 | 5.47 | 40/49 |
| 18h | skip/173 | 0.87 | 5.30 | 41/49 |
| 24h | pen20/300 | 0.81 | 5.50 | 41/49 |
| 30h | skip/300 | 0.81 | 5.32 | 41/49 |
| **36h** | **skip/300** | **0.97** | **5.98** | **43/49** |
| 48h | skip/200 | 0.92 | 5.86 | 42/49 |
| 60h | skip/474 | 0.94 | 5.83 | 41/49 |
| **72h** | **skip/300** | **0.97** | **6.06** | **43/49** |
| 96h | skip/600 | 0.96 | 6.07 | 41/49 |
| 168h | skip/794 | 0.93 | 6.09 | 42/49 |
| 336h (14d) | skip/500 | 0.95 | 5.67 | 42/49 |

**Two peaks: 36h and 72h both reach moIR=0.97**, with 72h also having the higher t-stat (6.06 vs 5.98).

### Why 24-30h is a trough (moIR=0.81)

The strategy already contains **Rev24h** — a fade signal that fires when the 24-hour return exceeds 30 bps in one direction. The 24h and 30h momentum filters overlap with Rev24h, partially cancelling the signal. Rev24h says "buy the dip," a 24h momentum filter says "skip buying the dip" — they fight each other.

### Why 36h works

36 hours is the gap after Rev24h's influence fades but before the "noise floor" of shorter windows dominates. A 36-hour sustained adverse move is genuine directional momentum, not just an intrabar spike. At 300 bps (3% adverse in 36h), we're filtering a meaningful regime signal.

### Why 72h also works (and slightly better)

3 days (72h = 4320 bars) is the canonical short-term momentum window. A 3-day adverse move of 300+ bps (3%) indicates an established regime that the calendar signals are unlikely to overcome in the next 1-4 hours.

### The global winner: 72h/skip/300

Compared to 72h/penalty-20/500 (the best from the blend sweep):

| Metric | 72h/pen20/500 | **72h/skip/300** |
|---|---:|---:|
| t-stat | 5.74 | **6.06\*\*\*\*** |
| moIR | 0.88 | **0.97** |
| mSD | $158 | **$144** |
| maxDD | 66% | **52.9%** |
| months+ | 42/49 | **43/49** |

Using a hard skip at 300 bps (just under 1σ of a 3-day BTC move; the formula above gives σ ≈ 346 bps at 72h) outperforms the penalty approach. When 72h momentum is roughly 1σ or more adverse to the calendar signal, it's more reliable to skip entirely than to trade at reduced conviction.

---

## 9. Symmetry Analysis: Does the Filter Work Both Ways?

**Question:** The filter is symmetric in structure — skip LONG when 72h is down 300+ bps, skip SHORT when 72h is up 300+ bps. But does it actually help on both sides?

### The 72h bucket analysis

**LONG trades by 72h return:**

| 72h return | n | Mean | t | What it means |
|---|---:|---:|---:|---|
| < −600 bps (crash) | 365 | **+3.7 bps** | **0.39** | Edge gone — filter blocks this correctly |
| −600 to −300 bps | 510 | **+13.8 bps** | **2.58\*** | **Filter wrongly blocks these — they're good trades** |
| −300 to −100 bps | 615 | +7.6 bps | 1.97\* | Kept, fine |
| −100 to 0 | 473 | +4.0 bps | 0.95 | Kept, marginal |
| 0 to +100 | 490 | −0.5 bps | −0.15 | Kept, noise |
| +100 to +300 | 641 | +10.9 bps | 2.86\*\* | Kept, good |
| > +300 (rally) | 729 | +11.4 bps | 2.78\*\* | Kept, good |

**SHORT trades by 72h return:**

| 72h return | n | Mean | t | What it means |
|---|---:|---:|---:|---|
| > +600 bps (rally) | 272 | **−0.2 bps** | **−0.02** | Edge gone — filter blocks this correctly |
| +300 to +600 bps | 328 | **+15.9 bps** | **2.30\*** | **Filter wrongly blocks these — they're good trades** |
| Neutral ±300 bps | n/a | positive | significant | Kept, working |
| < −300 bps (decline) | 377 | +8.8 bps | 1.23 | Kept, fine |

### Critical finding: the 300 bps threshold is too tight

The filter at 300 bps captures two distinct zones:

1. **Crash/spike zone** (>600 bps adverse): t=0.39 for longs, t=−0.02 for shorts — the edge genuinely breaks down. **Should block.**

2. **Moderate dip/rally zone** (300–600 bps adverse): t=2.3–2.6 — these trades are *profitable*. They represent the **calendar + Rev24h working together**: BTC has pulled back 3–6% in 72h, the Rev24h signal fires (fade the move), DOW/hour confirms. This is exactly the setup the strategy is designed for. **Should keep.**

**The correct threshold is 600 bps, not 300 bps.** The 300 bps threshold in the backtest sweep showed better moIR due to the cooldown chain effect — blocking some trades at 300 bps creates room for different subsequent entries — but at the individual trade level, the 300–600 bps zone is clearly worth keeping.

### Asymmetry between sides

| Filtered group (gate fired) | Mean | t | Assessment |
|---|---:|---:|---|
| Longs: gate36 fired | +6.9 bps | 1.02 | Weak → removing helps |
| **Longs: both gates fired** | +4.7 bps | **0.61** | Near noise → correct to skip |
| Shorts: gate36 fired | +9.2 bps | 1.42 | Below sig → marginal benefit |
| **Shorts: both gates fired** | +7.1 bps | **0.98** | Marginal → filter helps less here |

The filter helps **longs significantly more than shorts**. The catastrophic months (Nov 2022, Aug 2024, Jan 2025) were all long-side losses during crashes. Rising markets causing short losses are rarer and slower — they give the strategy time to adapt via extensions and natural exits.

### Correct implementation

```typescript
// Skip entry if 72h return is more than 600 bps against calendar direction
// (not 300 bps — that blocks the good moderate-dip Rev24h setups)
const r72h = lbRet(bars, i, 4320)          // 72-hour return in bps
const dir  = score > 0 ? +1 : -1           // calendar direction
if (dir * r72h < -600) continue            // skip: 6%+ adverse 3-day move

// Optional: also add 36h gate for faster crash detection
// (catches intraday/overnight crashes before they compound to 72h threshold)
const r36h = lbRet(bars, i, 2160)          // 36-hour return in bps
if (dir * r36h < -600) continue            // same threshold: conservative
```

Both gates at 600 bps means:
- LONG skipped when BTC fell >6% in last 36 OR 72 hours
- SHORT skipped when BTC rose >6% in last 36 OR 72 hours
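
For testing the gates in isolation, the same logic can be wrapped in a standalone function. This is a sketch under assumptions: `lbRet` is taken to be a close-to-close return over `lookback` bars in bps (its real implementation is not shown in these notes), and a gate is treated as passed when there is not enough history to evaluate it. The `Bar` shape and `passesMomentumGates` name are hypothetical.

```typescript
interface Bar {
  close: number;
}

// Assumed behaviour of lbRet: close-to-close return over `lookback` bars, in bps.
// Returns NaN when there is not enough history.
function lbRet(bars: Bar[], i: number, lookback: number): number {
  if (i - lookback < 0 || i >= bars.length) return NaN;
  const past = bars[i - lookback].close;
  return (bars[i].close / past - 1) * 10000;
}

// True when neither the 36h nor the 72h gate fires against the signed score.
function passesMomentumGates(
  bars: Bar[],
  i: number,
  score: number,
  threshBps: number = 600
): boolean {
  const dir = score > 0 ? 1 : -1;
  for (const lookback of [2160, 4320]) {
    const r = lbRet(bars, i, lookback);
    // Missing history (NaN) is treated as "gate does not fire" (assumption).
    if (Number.isFinite(r) && dir * r < -threshBps) return false;
  }
  return true;
}
```

Inside the backtest loop this would replace the two inline checks with `if (!passesMomentumGates(bars, i, score)) continue`.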

---

## 10. Final Verdict: 1-Year vs 4-Year Config

Backtested the 1-year dataset (Apr 2025–Apr 2026) at MEXC 0% fee, comparing:
- **Baseline** (current runner): LE=10, SE=8, XT=10, maxH=240, cap=2
- **New** (4yr optimised + 72h/skip/600): LE=12, SE=12, XT=15, maxH=180, cap=0, 72h/skip/600

### Results

| Metric | Baseline (runner) | New (4yr) |
|---|---:|---:|
| Trades | 1,046 | 1,313 |
| Mean/trade | **+16.91 bps** | +10.34 bps |
| t-stat | **4.62\*\*\*\*** | 3.74\*\*\* |
| WR | 51.5% | **52.3%** |
| **Monthly IR** | **2.20** | 0.99 |
| Monthly SD | **$93** | $159 |
| Months positive | **13/13** | 11/13 |
| End equity ($500 start) | **$53,531** | $16,678 |
| Max DD | **32.4%** | 51.5% |

**The baseline wins on every metric except win rate (51.5% vs 52.3%) on the 1-year dataset.**

### Why the 4yr config underperforms on 1 year

**1. More trades but lower per-trade edge.** The new config takes more trades on this dataset (1,313 vs 1,046), but the additional entries are lower-quality. Mean drops from 16.9 to 10.3 bps.

**2. The 72h/600 filter removes good Rev24h setups.** The 300–600 bps zone (moderate dip setups) was the sweet spot for calendar + Rev24h in 2025. The 600 bps threshold theoretically keeps these, but the filter still removes some entries due to cooldown chain effects.

**3. maxH=180 misses late-profit phases.** In 2025's trending environment, positions held to 240 min captured extra profit in the final 60 minutes. With maxH=180, these gains are left on the table.

**4. cap=0 allows stop spirals.** Without cap=2, some months (Feb 2026: −$39 vs baseline +$134) saw multiple same-direction stops that the cap would have broken.

**5. extThresh=15 extends less often.** In the window's strong directional months (Oct 2025, Mar 2026), the 10 threshold allowed more profitable extensions. The 15 threshold exits prematurely.

### Month-by-month

| Month | Baseline | 4yr config | Winner |
|---|---:|---:|---|
| 2025-05 | +$149 | −$43 | **BASE** |
| 2025-06 | +$274 | +$346 | NEW |
| 2025-09 | +$206 | +$305 | NEW |
| 2025-10 | **+$326** | +$127 | **BASE** |
| 2025-12 | **+$158** | $0 | **BASE** |
| 2026-02 | **+$134** | −$39 | **BASE** |
| 2026-03 | +$404 | +$468 | NEW |

The 4yr config wins in months with strong directional reversals (June, Sep). Baseline wins in high-volatility months (Oct, Dec 2025, Feb 2026).

### What this means

The results are not a contradiction — they're the fundamental overfitting tension:

> **The baseline (1yr SMOOTH) is calibrated for the Apr 2025–Apr 2026 regime.**  
> **The 4yr STABLE config is calibrated for cross-regime robustness.**

Both are correct for their respective objectives. The right choice depends on whether you believe the 2025 regime will persist or whether you need resilience across multiple regimes.

---

## 11. Decision

### Keep the baseline running

The current `combined-demo.ts` runner uses:
```
LE=10, SE=8, XT=10, maxH=240, cap=2, softSL=100, hardSL=125
```

This produced **13/13 positive months, moIR=2.20, maxDD=32.4%** on its validation year. It is demonstrably working. Do not change it mid-run without a completed OOS validation.

### Research conclusions for future deployment

When preparing the MEXC live runner or tuning for a new year:

1. **The 4yr parameters are the right starting point for a new deployment.** If we were starting fresh today (no prior year of optimisation), STABLE_4yr would be the safer base.

2. **The 72h momentum filter at 600 bps is valid and worth including.** Clear mechanism (don't trade calendar patterns into macro shocks), validated at the correct threshold (600 bps, not 300 bps, to avoid blocking the good Rev24h moderate-dip setups).

3. **The 36h filter provides additional early-warning value.** Use both 36h and 72h as gates, both at 600 bps. 36h catches the initial crash phase; 72h confirms a sustained regime.

4. **Signal count ≥ 2 is an implicit consequence of LE/SE=12.** Most trades at score ≥ 12 already have 2+ contributing signals. No explicit signal-count filter needed.

5. **The flip mechanism is not worth implementing.** moIR improvement from flip trades is driven by 2-3 macro crash events in-sample. The flipped trades themselves have t=0.87 — no independent edge.

### Recommended config for MEXC live deployment

Once funded and validated with test trades:

```typescript
// Signal parameters — calibrated for cross-regime robustness (4yr validated)
const LONG_ENTRY_THRESH  = 12    // raised from 10 — removes noise entries
const SHORT_ENTRY_THRESH = 12    // raised from 8 — removes junk short singletons
const EXT_THRESH         = 15    // raised from 10 — don't extend fading signals
const MAX_HOLD_BARS      = 180   // reduced from 240 — less exposure per trade

// Risk/stop — unchanged
const SOFT_SL_BPS = 100
const HARD_SL_BPS = 125
const RISK_PCT    = 0.03

// Momentum filter — skip entries into macro shocks
const R36H_SKIP_THRESH = 600  // skip if 36h adverse move > 6%
const R72H_SKIP_THRESH = 600  // skip if 72h adverse move > 6%
// Implementation: if (dir * lbRet(bars, i, 2160) < -R36H_SKIP_THRESH) continue
//                 if (dir * lbRet(bars, i, 4320) < -R72H_SKIP_THRESH) continue
```

**Expected 4-year performance (MEXC 0%):**
- t = 5.86\*\*\*\* (with 72h/skip/600)
- moIR = 0.90
- maxDD = 59.5%
- months+ = 43/49 (88%)

---

## Appendix: Key Numbers

### Research files

| File | What it tests | Key result |
|---|---|---|
| `src/research/fouryear-backtest.ts` | Full 4-year backtest | t=4.57 across all regimes |
| `src/research/wr-month-analysis.ts` | 720-config 4yr sweep | STABLE_4yr identified |
| `src/research/rolling-walkforward.ts` | Rolling OOS validation | Baseline slightly stronger OOS |
| `src/research/flip-exit-backtest.ts` | Skip vs penalty vs flip | Skip wins; flip is in-sample |

### Data files

| File | Coverage | Bars |
|---|---|---|
| `data/klines/BTCUSDT-1m-2022-2025.jsonl` | Apr 2022 → Dec 22 2022 | 382,000 |
| `data/klines/BTCUSDT-1m-2022-2025b.jsonl` | Dec 23 2022 → Apr 9 2025 | 1,208,158 |
| `data/klines/BTCUSDT-1m.jsonl` | Apr 10 2025 → Apr 9 2026 | 525,075 |

### Momentum filter thresholds: what they mean

| Threshold | Horizon | BTC context | Typical frequency |
|---|---|---|---|
| 300 bps | 72h | 3% in 3 days — moderate dip | ~17% of entries |
| **600 bps** | **72h** | **6% in 3 days — real correction** | **~9% of entries** |
| 1000 bps | 14d | 10% in 2 weeks — macro shock | ~13% of entries |

600 bps / 72h is the correct threshold because:
- It blocks only the **crash zone** (72h < −600: t=0.39 for longs)
- It keeps the **moderate dip zone** (72h −300 to −600: t=2.58\* — good trades)
- The 300 bps threshold blocks both zones, unnecessarily cutting profitable Rev24h setups

### Why 24h is the trough

```
12h: moIR=0.87 (good — responsive)
18h: moIR=0.87 (good)
24h: moIR=0.81 ← trough (conflicts with Rev24h signal)
30h: moIR=0.81 ← trough
36h: moIR=0.97 ← peak (gap after Rev24h zone)
72h: moIR=0.97 ← peak (canonical 3d momentum)
```

Rev24h is a **fade signal** (buy when price fell last 24h). A 24h momentum filter says "skip when price fell last 24h." They directly oppose each other. The filter must use a horizon long enough to escape this conflict — hence 36h as the minimum useful horizon.
