# GPU Training Runbook
*Execute from the licensed IIW/Banijay English episode masters.*
*Last updated: 2026-05-03.*

---

## Status

| Item | Status |
|---|---|
| Visual bible (13 episodes) | ✅ Done |
| Training captions — `training_caption` | ✅ Done, 1,551 clips |
| Dataset packages (`wan21_metadata.json`) | ✅ Done |
| Pipeline scripts | ✅ All in devenv |
| Models ingested (ASR, Qwen, Qwen-VL) | ✅ Done |
| GPU builder | Provision on demand (~5 min) |
| Licensed English episode files | ✅ Received and indexed in `materials/training-data/iiw_english_source_manifest.json` |

---

## Model: Wan2.2 (Apache 2.0)

**Why Wan2.2, not the alternatives:**

| Model | Blocker |
|---|---|
| FLUX.1 / FLUX.2 [dev] | **Non-commercial weights.** BFL license: *"revenue-generating activity is NOT a Non-Commercial Purpose."* Needs paid agreement. |
| LTX-Video (Lightricks) | **$10M revenue threshold.** Banijay ~€3B/year triggers it. Penalty: double damages. |
| HunyuanVideo (Tencent) | **EU excluded.** License: *"does not apply in the EU, UK, South Korea."* Unlicensed in France. |
| Wan2.7 | **No open weights.** API-only via Atlas Cloud. Cannot be fine-tuned. |

**Wan2.2 facts:**
- Apache 2.0: no revenue threshold, no geographic restriction
- Hybrid: same fine-tuned weights → images and video
- `TI2V-5B`: RTX 4090 (24 GB VRAM), ~4–6h training → image + short video
- `T2V-A14B`: A100 80GB, ~8–12h training → high quality video

Full analysis: `docs/research/model-licensing.md`

---

## Hardware requirements

### Training (one-off)

| Tier | GPU | VRAM | RAM | Storage | Time |
|---|---|---|---|---|---|
| Image / TI2V-5B | 1× RTX 4090 class | 24 GB | 32 GB | 200 GB NVMe | ~4–6 h |
| Video / T2V-A14B | 1–2× A100 80GB or H100 | 80 GB+ | 128 GB | 500 GB NVMe | ~8–12 h |

OS: Linux; NixOS preferred. The build system handles all dependencies automatically.
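
Before starting a run, confirm the visible GPU and VRAM actually match the tier. This is a standard `nvidia-smi` query with no assumptions beyond the NVIDIA driver being installed:

```bash
# list each visible GPU with its total memory, e.g. "NVIDIA A100 80GB PCIe, 81920 MiB"
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
```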

### Inference (ongoing)

| Use case | GPU | VRAM | Per clip (5s, 720p) |
|---|---|---|---|
| Light / testing | 1× RTX 4090 | 24 GB | ~1–2 min |
| Regular production | 1× A100 40GB | 40 GB | ~2–4 min |
| High volume | 2× A100 80GB | 160 GB | ~1 min |

---

## Step 0 — Provision GPU builder

```bash
devenv tasks run builder:server:find-cheapest-gpu   # find options
devenv tasks run builder:server:order:execute-cheapest  # provision
devenv tasks run builder:server:order:wait          # wait for ready (~5 min)
devenv tasks run builder:local:register-remote      # register as Nix builder
devenv tasks run builder:validate:remote            # verify CUDA + NixOS
```

Hetzner GPU cloud instances:
- `GX2-4`: 4× NVIDIA A30 (24GB each) — good for TI2V-5B
- `GX2-10`: 2× A100 80GB — recommended for T2V-A14B
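
`builder:validate:remote` covers CUDA and NixOS verification, but a manual spot-check that the machine is reachable as a Nix builder is cheap. `<builder-host>` is a placeholder for the address the order step returns; this requires the flakes-style `nix` CLI:

```bash
# succeeds only if the remote Nix daemon answers over SSH
nix store ping --store ssh://root@<builder-host>
```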

---

## Step 1 — Ingest licensed English episodes

Start from the English-title IIW production masters indexed in:

- `materials/training-data/iiw_english_source_manifest.json`
- `docs/internal/iiw-english-episode-source-manifest.csv`

Do not train from both English and French title variants until a duplicate/variant pass confirms they are meaningfully different. For now, one canonical English master per production episode is enough.

```bash
python tools/build_iiw_english_source_manifest.py
# next: run the IIW-specific extraction job to replace training-data/clips and first_frames
# then package with train:step3-wan22-package
```

The existing `training_caption` fields from the 13-episode set are reusable as bootstrap labels wherever the manifest maps to the old YouTube bible. New production episodes need fresh bible/caption metadata.
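
A quick way to gauge how much of the manifest can reuse old captions is to count its entries. The schema of `iiw_english_source_manifest.json` isn't documented here, so the filter below assumes a top-level JSON array and is only a sketch:

```bash
# assumes the manifest is a top-level JSON array of episode records
jq 'length' materials/training-data/iiw_english_source_manifest.json
```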

---

## Step 2 — Transcription and captioning (optional)

The existing captions are high quality (wiki-anchored, conservative VLM, no false gadget names). If you want to regenerate them with local GPU inference:

```bash
# GPU ASR (whisper-large-v3-turbo)
devenv tasks run train:step2-transcript-base

# Transcript disambiguation using Qwen2.5-Omni
devenv tasks run train:step2-transcript-disambiguate

# Scene-level context (story bible + shot reference)
devenv tasks run train:step2-bible
devenv tasks run train:step2-shot-reference
devenv tasks run train:step2-scene-context

# VLM captioning (Qwen2.5-VL-7B, GPU)
devenv tasks run train:step2-caption

# Apply any manual caption overrides
devenv tasks run train:step2-reviewed
```

**Skip this block** if you want to use the existing `training_caption` field — just proceed to Step 3.
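
Either way, spot-check a handful of captions before packaging. The path and top-level layout of `wan21_metadata.json` are assumptions here (an array of clip records carrying the `training_caption` field); adjust to your package location:

```bash
# print 5 random captions for a sanity read; path and layout are assumptions
jq -r '.[].training_caption' materials/training-data/wan21_metadata.json | shuf -n 5
```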

---

## Step 3 — Package dataset

```bash
# For Wan2.2 (recommended)
devenv tasks run train:step3-wan22-package

# For LTX-2 (alternative)
devenv tasks run train:step3-ltx2-data
```
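
After packaging, confirm the clip count matches the 1,551 captioned clips in the status table. The output directory below is a guess; point it at wherever `train:step3-wan22-package` actually writes:

```bash
# expect 1551 for the full reviewed dataset; directory and extension are assumptions
find materials/training-data/wan22_package -name '*.mp4' | wc -l
```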

---

## Step 4 — Latent encoding (LTX-2 only)

```bash
devenv tasks run train:step4-ltx2-preprocess   # ~2–4h on A100
```

Not needed for Wan2.2; skip straight to Step 5.

---

## Step 5 — LoRA training

For the licensed IIW rebuild, start with the video-only smoke subset rather than the legacy pipeline task:

```bash
python tools/run_wan22_train.py \
  --training-data-dir materials/training-data/iiw-english-smoke-video-only \
  --output-dir materials/training-data/iiw-english-smoke-video-only/wan22_checkpoints \
  --diffsynth-path /path/to/DiffSynth-Studio \
  --wan22-model /path/to/Wan2.2-TI2V-5B \
  --model-variant ti2v-5b \
  --lora-rank 16 \
  --epochs 1 \
  --dataset-repeat 20 \
  --learning-rate 2e-5 \
  --num-frames 81 \
  --height 480 \
  --width 832 \
  --gradient-accumulation-steps 4
```

Validate the same configuration locally with `--dry-run` before committing GPU time:

```bash
python tools/run_wan22_train.py \
  --training-data-dir materials/training-data/iiw-english-smoke-video-only \
  --output-dir materials/training-data/iiw-english-smoke-video-only/wan22_checkpoints \
  --model-variant ti2v-5b \
  --lora-rank 16 \
  --epochs 1 \
  --dataset-repeat 20 \
  --learning-rate 2e-5 \
  --num-frames 81 \
  --height 480 \
  --width 832 \
  --gradient-accumulation-steps 4 \
  --dry-run
```

The older derivation remains available for the legacy reviewed dataset only:

```bash
# Wan2.2 TI2V-5B legacy pipeline task
devenv tasks run train:step5-wan22-train

# LTX-2 text-to-video alternative
devenv tasks run train:step5-ltx2-train
```

Smoke settings: LoRA rank 16, learning rate 2e-5, 1 epoch, dataset repeat 20, batch size 1 + gradient accumulation 4.

Smoke output: `materials/training-data/iiw-english-smoke-video-only/wan22_checkpoints/`
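
For runtime planning, the optimizer step count follows directly from the smoke settings above. The clip count `N` is a placeholder; substitute whatever the smoke subset actually contains:

```bash
# steps = (clips × repeat × epochs) / (batch × grad_accum); N=100 is illustrative
N=100; REPEAT=20; EPOCHS=1; BATCH=1; ACCUM=4
echo $(( N * REPEAT * EPOCHS / (BATCH * ACCUM) ))   # → 500 optimizer steps
```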

---

## Step 6 — Decommission builder

```bash
devenv tasks run builder:server:cancellation:execute-now
```

Do not forget this step. Hetzner GPU instances are billed hourly.
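
If provisioning and training happen in one sitting, one option is to queue the teardown right after kicking off the run. This is a sketch using `at`, assuming it is installed on the workstation; pick a delay comfortably longer than the expected training time:

```bash
# queue the teardown 12 hours out; `atq` lists pending jobs, `atrm` cancels
echo 'devenv tasks run builder:server:cancellation:execute-now' | at now + 12 hours
```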

---

## Expected output

After training, the LoRA checkpoint enables:

- **Text → image**: `"Sam (red-orange hair, green catsuit) in WOOHP HQ, medium shot"` → keyframe
- **Text → video**: `"trio confronts villain on Singapore rooftop, mid-morning"` → 5–8s clip
- **Image → video**: storyboard frame → animated clip (character identity locked by input)

The model knows the franchise natively — it does not need to be told "make it look like Totally Spies."
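
This runbook references no inference wrapper, so the call below is purely illustrative: `tools/run_wan22_infer.py` is a hypothetical script, and its flags just mirror the training CLI's naming style.

```bash
# hypothetical wrapper script; flag names are illustrative, not an existing CLI
python tools/run_wan22_infer.py \
  --wan22-model /path/to/Wan2.2-TI2V-5B \
  --lora materials/training-data/iiw-english-smoke-video-only/wan22_checkpoints/<checkpoint> \
  --prompt "trio confronts villain on Singapore rooftop, mid-morning" \
  --num-frames 81 --height 480 --width 832
```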

---

## Troubleshooting

**`__noChroot` error on local machine:**
GPU pipeline steps require CUDA and can only run on the builder, so this error is expected when a GPU step is launched locally. CPU-only steps (step1, step2-bible, step2-shot-reference) run locally without it.

**Model not found:**
```bash
devenv tasks run models:ingest:wan22   # download Wan2.2-TI2V-5B to Nix store
devenv tasks run models:ingest:qwen    # Qwen2.5-VL for captioning
devenv tasks run models:ingest:asr     # whisper-large-v3-turbo
```

**Out of memory during training:**
Reduce the memory footprint (fewer frames, lower resolution) or switch from T2V-A14B (80 GB class) to TI2V-5B (24 GB class).
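
The training CLI already exposes the relevant knobs, so an OOM retry can stay on the same script. The reduced values below are illustrative, not tuned recommendations; keep whatever divisibility constraints your pipeline enforces on frames and resolution:

```bash
# same invocation as Step 5 with a lighter memory footprint (values illustrative)
python tools/run_wan22_train.py \
  --training-data-dir materials/training-data/iiw-english-smoke-video-only \
  --output-dir materials/training-data/iiw-english-smoke-video-only/wan22_checkpoints \
  --diffsynth-path /path/to/DiffSynth-Studio \
  --wan22-model /path/to/Wan2.2-TI2V-5B \
  --model-variant ti2v-5b \
  --lora-rank 16 --epochs 1 --dataset-repeat 20 --learning-rate 2e-5 \
  --num-frames 49 \
  --height 352 --width 608 \
  --gradient-accumulation-steps 4
```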
