# Totally Spies AI Pipeline — Executive Summary
*For: Banijay / Laurent meeting*
*Licensing verified April 2026 — full citations in `model-licensing.md`*

---

## What we are doing

- Building a **franchise-specific AI production pipeline** for Totally Spies Season 7
- Output: a fine-tuned model that generates S7-accurate marketing video and stills on demand
- Approach: **Bible-First Fine-Tuning** — catalogue the franchise in depth, use that catalogue as the training signal, fine-tune one open model
- Model: **Wan2.2** (Wan-AI, Apache 2.0) — genuinely hybrid, generates images and video from the same weights, fully commercial, no geographic or revenue restrictions
- Status: data work complete, pipeline ready, waiting for licensed episode files

---

## Why generic AI won't work for this franchise

- Sam, Clover and Alex share the same body proportions — **hair colour and suit colour are the only reliable differentiators**
- Without explicit anchoring, a 235-billion-parameter vision model confuses Sam and Clover in ~30% of shots — a generation model will do the same, turning roughly one shot in three into a brand error
- Season 7 is a visual reboot — different art style from S1–6 (cleaner linework, flat colour fills, anime-influenced proportions) — a model trained on the back catalogue produces the wrong look
- New S7 characters — Zerlina Lewis, Toby, Cyberchac, the WOOHP-e device, the Singapore setting — exist nowhere in any public training dataset; they cannot be prompted into existence

---

## How — three phases

### Phase 1 — Visual bible ✅ Done
- Catalogued every shot across 13 S7 episodes before touching any generation model
- Cross-validated everything against the official Totally Spies Wiki:
  - Villain identities corrected across all 13 episodes
  - Character anchors locked: Sam = orange hair + green suit, Clover = blonde + red suit, Alex = black hair + yellow suit
  - ~30 gadgets named by exact wiki name (WOOHP-e, Ultra-fixative Structural Foam, etc.)
  - 18-category canonical location hierarchy built (WOOHP HQ, Singapore City, AIYA Academy, Bubble Spy Café, etc.)
- The bible is the **training signal** — the quality of what you train on determines the quality of what comes out
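
Concretely, each bible entry reduces to a small structured record that the later phases consume verbatim. A minimal Python sketch, assuming a schema invented here for illustration (the field and type names are hypothetical, not the pipeline's actual schema; the values are the locked bible data):

```python
# Illustrative bible schema: names are hypothetical, values are canonical.
from dataclasses import dataclass

@dataclass(frozen=True)
class CharacterAnchor:
    name: str
    hair: str  # primary differentiator between the three spies
    suit: str  # secondary differentiator

# Locked anchors from the Phase 1 wiki cross-validation
ANCHORS = {
    "Sam":    CharacterAnchor("Sam",    hair="orange", suit="green"),
    "Clover": CharacterAnchor("Clover", hair="blonde", suit="red"),
    "Alex":   CharacterAnchor("Alex",   hair="black",  suit="yellow"),
}

# Gadgets keyed by exact wiki name (~30 in total); locations from the
# 18-category canonical hierarchy.
GADGETS = ("WOOHP-e", "Ultra-fixative Structural Foam")
LOCATIONS = ("WOOHP HQ", "Singapore City", "AIYA Academy", "Bubble Spy Café")
```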

### Phase 2 — Caption-supervised training dataset ✅ Done
- 1,551 video clips extracted from 13 episodes — 106 minutes at 720p
- 6,645 reference frames extracted for image training
- VLM caption generated for every clip, with full bible context injected into each prompt (assembly sketched after this list):
  - Character visual anchors in every prompt → model cannot confuse Sam and Clover
  - Episode-specific villain and gadget data per clip
  - Speaker-attributed dialogue as additional grounding
- Result: 96% of clips have correctly identified characters, and 49% name gadgets by their exact wiki name
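
To make the injection concrete, here is a sketch of the per-clip prompt assembly, reusing the hypothetical `CharacterAnchor` records from the Phase 1 sketch. The function name and metadata keys are assumptions; the design it illustrates (anchors in every prompt, episode-specific villain and gadget data, attributed dialogue) is the actual one:

```python
def build_caption_prompt(clip_meta: dict, anchors: dict) -> str:
    """Assemble the bible context sent to the VLM alongside one clip.

    `clip_meta` keys are hypothetical stand-ins for the real per-clip
    metadata pulled from the visual bible.
    """
    anchor_lines = [
        f"- {a.name}: {a.hair} hair, {a.suit} suit" for a in anchors.values()
    ]
    return "\n".join([
        "Character anchors (identify every spy by hair and suit colour):",
        *anchor_lines,
        f"Episode villain: {clip_meta['villain']}",
        f"Gadgets in play (use exact wiki names): {', '.join(clip_meta['gadgets'])}",
        f"Speaker-attributed dialogue: {clip_meta['dialogue']}",
        "Caption the clip, naming each character explicitly.",
    ])
```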

### Phase 3 — LoRA fine-tuning ⏳ Ready to run
- **LoRA** (Low-Rank Adaptation): a thin franchise-specific layer added on top of an existing model
  - Base model keeps everything it knows about motion, physics, animation
  - LoRA layer learns the S7 visual style, the characters, the locations
  - Training time: 8–12 hours on one GPU
  - Output: small portable weights, versioned, updatable (adapter configuration sketched after this list)
- **One remaining dependency: licensed episode files from Banijay**
- When they arrive → pipeline runs end to end in 12–16 hours
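
For scale, the adapter itself is a few lines of configuration. A minimal sketch using Hugging Face `peft`; the rank, alpha, and target-module names below are illustrative assumptions, not the locked training settings:

```python
from peft import LoraConfig

# The base Wan2.2 weights stay frozen; only these small low-rank matrices
# train, which is why one GPU and 8-12 hours suffice and the output is a
# small, portable, versionable weight file.
lora_config = LoraConfig(
    r=32,            # adapter rank (assumed value)
    lora_alpha=32,   # scaling applied to the adapter's contribution
    lora_dropout=0.05,
    # Attention projection names assumed from diffusers conventions;
    # the real module list depends on the Wan2.2 architecture.
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
```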

---

## What it produces

- Director gives a **brief**: "trio confronts villain on Singapore rooftop, Clover leads, mid-morning light"
- Brief → structured prompt via our prompt bible (see the sketch after this list)
- **Wan2.2 T2V mode**: text prompt → 5–8 second clip directly
- **Wan2.2 I2V mode**: storyboard frame → animated clip — character identity locked by the input image, no ambiguity
- **Wan2.2 image mode**: text → marketing still, key art, social post
- One fine-tuned model covers all three outputs
- The model knows the franchise — it does not need to be told "make it look like Totally Spies"
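
A sketch of the brief-to-prompt step, again reusing the hypothetical `ANCHORS` from the Phase 1 sketch; `expand_brief` and its logic are illustrative, not the prompt bible's actual implementation:

```python
def expand_brief(brief: str, anchors: dict, location: str) -> str:
    """Expand a director's brief into a structured generation prompt.

    Illustrative only: inject locked visual anchors for each named spy
    (and all three when the brief says "trio"), plus a canonical
    location tag from the bible.
    """
    prompt = brief
    for a in anchors.values():
        if a.name in prompt:
            prompt = prompt.replace(a.name, f"{a.name} ({a.hair} hair, {a.suit} suit)")
    if "trio" in prompt:
        trio = ", ".join(f"{a.name} ({a.hair} hair, {a.suit} suit)"
                         for a in anchors.values())
        prompt += f". The trio: {trio}"
    return f"{prompt}. Location: {location}"

# expand_brief("trio confronts villain on Singapore rooftop, Clover leads, "
#              "mid-morning light", ANCHORS, "Singapore City")
```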

---

## The model — why Wan2.2

- **Only two open models** are commercially usable, without restriction, for a French production: Wan2.1 and Wan2.2
- Every other candidate has a verified legal blocker:
  - **FLUX.1 / FLUX.2 [dev]**: non-commercial weights — *"use for revenue-generating activity is NOT a Non-Commercial Purpose"* — commercial use requires a paid BFL agreement
  - **LTX-Video (Lightricks)**: $10M annual revenue threshold — Banijay at ~€3B/year triggers it — penalty for breach is double damages
  - **HunyuanVideo (Tencent)**: license text reads *"this agreement does not apply in the European Union, United Kingdom and South Korea"* — using it in France is unauthorized
  - **Wan2.7**: no open weights, API-only, cannot be fine-tuned
- **Wan2.2 is Apache 2.0** — no revenue threshold, no geography, no MAU limit, Wan-AI explicitly states *"we claim no rights over your generated contents"*
- **Wan2.2 is a genuine hybrid model** — same weights, same fine-tune, same LoRA:
  - Text → image (stills, key art)
  - Text → video (generate a scene from a brief)
  - Image → video (animate a reference frame or storyboard)

---

## Budget fallback — if video gets cut

- Use **Wan2.2 image mode** — same model, same fine-tune, same Apache 2.0 license
- Training data already built: 6,645 frames + 1,551 captions — nothing extra needed
- Produces: key art, character sheets, social stills, storyboard frames, thumbnails
- Does not produce: motion / video
- **Upgrade path**: fine-tune Wan2.2 once — use image mode now, switch to video mode when budget opens — same weights, no restart (sketched after the table)

| | Image tier | Video tier |
|---|---|---|
| Model | Wan2.2 image mode | Wan2.2 video mode |
| Training | Same LoRA | Same LoRA |
| Training time | ~4–6 h | ~8–12 h |
| Output | Stills, key art, social | Clips, promos, animated content |
| Data prep | ✅ Done | ✅ Done |
| License | Apache 2.0 | Apache 2.0 |
| Upgrade | → video when ready | — |
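
"Same weights, no restart" in code terms: a hedged sketch assuming a diffusers-style loader for Wan2.2 with the standard `load_lora_weights` mixin; the repo IDs and adapter path are assumptions, not verified pins:

```python
from diffusers import DiffusionPipeline

LORA_PATH = "totally-spies-s7-lora"  # hypothetical path to the one fine-tune

# Image tier today: generate stills with the adapter applied ...
image_pipe = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.2-T2V-A14B-Diffusers")  # repo ID assumed
image_pipe.load_lora_weights(LORA_PATH)

# ... video tier later: the same adapter file, loaded unchanged.
video_pipe = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.2-I2V-A14B-Diffusers")  # repo ID assumed
video_pipe.load_lora_weights(LORA_PATH)
```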

---

## Pipeline state right now

| What | Status |
|---|---|
| Visual bible — 2,852 shots, 13 episodes | ✅ Done |
| Wiki cross-validation — villains, gadgets, characters | ✅ Done |
| Training captions — 1,551 clips, 100% complete | ✅ Done |
| Training frames — 6,645 at 720p | ✅ Done |
| Dataset packages — Wan2.2 format | ✅ Ready |
| GPU training infrastructure | ✅ Ready |
| Licensed episode files | ⏳ Needed from Banijay |
| Episodes 14–26 | ⏳ Behind paywall, monitored |

---

## Key numbers

- 13 episodes analysed (S7 first half)
- 2,852 shots catalogued
- 1,551 training clips — 106 minutes at 720p
- 6,645 frames extracted
- 29,731 words transcribed, 51% speaker-attributed
- 1,551 / 1,551 VLM captions — 100%, wiki-anchored
- ~30 gadgets wiki-verified
- 18 canonical locations
- 1 model: Wan2.2, Apache 2.0
- Training time once licensed files arrive: 8–12 h on one GPU
