# Totally Spies S7 evaluation rubric

## Purpose

This rubric is for reviewing future generated clips, shots, or test reels
against the official Season 7 benchmark packs.

Use it after generation, not just during planning.

## Benchmark packs

Primary pack paths:

- `materials/benchmark/youtube-s7-validation/packs/character-costume-consistency/`
- `materials/benchmark/youtube-s7-validation/packs/dialogue-hold/`
- `materials/benchmark/youtube-s7-validation/packs/action-gadget/`

## Evaluation order

Always score in this order:

1. **character-costume-consistency**
2. **dialogue-hold**
3. **action-gadget**

Rationale:

- if identity fails, stop: later packs cannot rescue an off-model character
- if holds fail, stop
- only then evaluate action and gadget readability under motion
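The stop-early order can be sketched as a short-circuit loop. This is a minimal illustration, not an existing tool: `gate_passes` is a hypothetical reviewer callback that returns whether a clip clears the gate for a given pack.

```python
from typing import Callable

# Pack names in the mandated scoring order.
PACK_ORDER = [
    "character-costume-consistency",
    "dialogue-hold",
    "action-gadget",
]


def evaluate_in_order(gate_passes: Callable[[str], bool]) -> list[str]:
    """Return the packs actually evaluated, stopping at the first gate failure.

    `gate_passes` is a hypothetical reviewer callback: True means the clip
    clears the gate for that pack.
    """
    evaluated = []
    for pack in PACK_ORDER:
        evaluated.append(pack)
        if not gate_passes(pack):
            break  # a failed gate means later packs are never scored
    return evaluated
```

For example, a clip that keeps identity but fails the holds is scored on the first two packs only; action-gadget is never reached.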

## Scoring scale

Use a 1–5 scale per criterion:

- **1** = unusable / obviously off-brand
- **2** = weak / frequent failures
- **3** = mixed / partially usable with visible issues
- **4** = strong / minor issues only
- **5** = benchmark-level / convincingly on-style

## Gate criteria

### Gate 1 — identity and costume

Reject immediately if any of these are true:

- face identity drifts across adjacent frames
- hair shape or silhouette changes unpredictably
- costume colors or shapes mutate between shots
- trio coherence breaks in shared shots
- supporting characters lose their distinct anchors

### Gate 2 — held / restrained motion

Reject immediately if any of these are true:

- held shots shimmer, breathe, or jitter
- backgrounds drift during supposed holds
- linework shows texture boiling / crawling
- motion bleed appears where stillness is expected
- facial changes spread into the rest of the body instead of staying local to the face

### Gate 3 — action / gadget readability

Reject immediately if any of these are true:

- gadgets become unreadable
- props melt into hands or sleeves
- color boundaries bleed across costume edges
- silhouettes collapse under motion
- action becomes generic fluid realism instead of controlled cutout-style motion

## Detailed rubric

### A. Character / costume consistency pack

| Criterion | What to look for | Weight |
| --- | --- | ---: |
| Face identity lock | same face, eye placement, nose/mouth structure across frames | 5 |
| Hair / silhouette stability | no drifting hair mass or outline changes | 4 |
| Costume continuity | same colors, seams, collars, gloves, hero-mission details | 5 |
| Trio coherence | Sam/Clover/Alex remain distinct and stable together | 4 |
| Supporting-cast anchors | Jerry, Mandy, Zerlina, Toby, etc. remain recognizable | 3 |

### B. Dialogue / hold pack

| Criterion | What to look for | Weight |
| --- | --- | ---: |
| Hold stability | body and pose remain stable across low-motion beats | 5 |
| Facial micro-change control | only intended expression / mouth changes occur | 4 |
| Background stability | walls, consoles, screens, city backgrounds do not breathe | 5 |
| Line stability | no shimmer, texture boiling, or crawling edges | 5 |
| Prop / gadget stability | held props remain locked unless intentionally animated | 3 |

### C. Action / gadget pack

| Criterion | What to look for | Weight |
| --- | --- | ---: |
| Gadget readability | gadgets remain clear, attached, and recognizable | 5 |
| Pose clarity under motion | action remains legible and cutout-like | 4 |
| Edge preservation | costume / body edges stay crisp under movement | 4 |
| Background separation | character motion does not drag the world with it | 4 |
| Motion restraint | action stays controlled rather than over-fluid or physics-heavy | 5 |

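The weights in the tables above can be combined into a single weighted average, sum(score × weight) / sum(weight). The rubric does not say how packs are aggregated, so this sketch assumes every scored criterion is pooled into one average. The snake_case criterion keys are hypothetical labels for the table rows.

```python
# Criterion weights copied from the rubric tables; key names are hypothetical.
WEIGHTS = {
    "character-costume-consistency": {
        "face_identity_lock": 5,
        "hair_silhouette_stability": 4,
        "costume_continuity": 5,
        "trio_coherence": 4,
        "supporting_cast_anchors": 3,
    },
    "dialogue-hold": {
        "hold_stability": 5,
        "facial_micro_change_control": 4,
        "background_stability": 5,
        "line_stability": 5,
        "prop_gadget_stability": 3,
    },
    "action-gadget": {
        "gadget_readability": 5,
        "pose_clarity_under_motion": 4,
        "edge_preservation": 4,
        "background_separation": 4,
        "motion_restraint": 5,
    },
}


def weighted_average(scores: dict[str, dict[str, int]]) -> float:
    """Pool every scored criterion (assumption) and return sum(s*w) / sum(w)."""
    total = 0
    weight_sum = 0
    for pack, criteria in scores.items():
        for criterion, score in criteria.items():
            if not 1 <= score <= 5:
                raise ValueError(f"{pack}/{criterion}: score {score} outside 1-5")
            weight = WEIGHTS[pack][criterion]
            total += score * weight
            weight_sum += weight
    return total / weight_sum
```

Because only the packs that were actually scored are passed in, a run that stopped early is averaged over the criteria it reached.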
## Named failure modes to track

These terms should be used consistently in review notes:

- **identity drift** — face or character anchor changes across frames
- **costume morphing** — wardrobe details shift or mutate
- **silhouette degradation** — character outline loses clarity under motion
- **texture boiling** — static lines or fills shimmer / crawl frame to frame
- **background breathing** — static background subtly drifts or pulses
- **motion bleed** — motion appears in areas that should remain still
- **gadget melting** — prop geometry collapses into hands, sleeves, or effects
- **color bleeding** — neighboring color regions merge during motion
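To keep review notes consistent, the failure-mode vocabulary can be fixed in code so that unknown terms are rejected. This is a hypothetical helper for illustration; `FailureMode` and `parse_failure_tags` are not part of any existing tooling.

```python
from enum import Enum


class FailureMode(str, Enum):
    """Canonical failure-mode terms from the rubric (hypothetical helper)."""

    IDENTITY_DRIFT = "identity drift"
    COSTUME_MORPHING = "costume morphing"
    SILHOUETTE_DEGRADATION = "silhouette degradation"
    TEXTURE_BOILING = "texture boiling"
    BACKGROUND_BREATHING = "background breathing"
    MOTION_BLEED = "motion bleed"
    GADGET_MELTING = "gadget melting"
    COLOR_BLEEDING = "color bleeding"


def parse_failure_tags(tags: list[str]) -> list[FailureMode]:
    """Map free-text tags onto canonical modes, rejecting anything unknown."""
    lookup = {mode.value: mode for mode in FailureMode}
    result = []
    for tag in tags:
        key = tag.strip().lower()  # tolerate case and stray whitespace
        if key not in lookup:
            raise ValueError(f"unknown failure mode: {tag!r}")
        result.append(lookup[key])
    return result
```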

## Suggested pass thresholds

### Hard pass

- no gate failures
- average weighted score **>= 4.2**
- no criterion below **3** in character/costume or dialogue/hold

### Conditional pass

- no gate failures
- average weighted score **>= 3.5**
- only limited, isolated issues that can be fixed without changing the whole pipeline

### Fail

- any gate failure
- or average weighted score **< 3.5**
- or repeated instability in identity / holds / gadget readability
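The numeric part of these thresholds can be expressed as a small decision function. This sketch covers only the checks stated above; whether remaining issues are "limited, isolated, and fixable" is a reviewer judgment it does not attempt. The parameter names are hypothetical, and `min_core_score` means the lowest criterion score across the character/costume and dialogue/hold packs.

```python
def classify(
    gate_failure: bool,
    weighted_avg: float,
    min_core_score: int,
    repeated_instability: bool,
) -> str:
    """Apply the suggested pass thresholds (numeric checks only)."""
    # Fail: any gate failure, average below 3.5, or repeated instability.
    if gate_failure or weighted_avg < 3.5 or repeated_instability:
        return "fail"
    # Hard pass: average >= 4.2 and no core-pack criterion below 3.
    if weighted_avg >= 4.2 and min_core_score >= 3:
        return "hard pass"
    # Otherwise the average is in [3.5, 4.2) or a core criterion scored below 3.
    return "conditional pass"
```

Checking "fail" first mirrors the gate logic: a gate failure overrides any average, however high.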

## Review template

Use this structure for every tested output:

- **Test asset:**
- **Model / pipeline:**
- **Prompt / input:**
- **Pack used:**
- **Score summary:**
- **Gate result:** hard pass / conditional pass / fail
- **Top failures observed:**
- **Would this be reviewable by Laurent / Banijay?** yes / no
- **Would this justify more training?** yes / no
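If reviews are collected programmatically, the template can be mirrored as a record that renders back to the same bullet list. This is a hypothetical structure, not an existing schema; field names are assumptions, and the allowed results follow the threshold names in "Suggested pass thresholds".

```python
from dataclasses import dataclass, field

# Result labels taken from the "Suggested pass thresholds" section.
ALLOWED_RESULTS = {"hard pass", "conditional pass", "fail"}


@dataclass
class ReviewRecord:
    """One filled-in review (hypothetical field names)."""

    test_asset: str
    model_pipeline: str
    prompt_input: str
    pack_used: str
    score_summary: str
    gate_result: str
    top_failures: list[str] = field(default_factory=list)
    reviewable_by_laurent_banijay: bool = False
    justifies_more_training: bool = False

    def __post_init__(self) -> None:
        if self.gate_result not in ALLOWED_RESULTS:
            raise ValueError(f"unexpected gate result: {self.gate_result!r}")

    def to_markdown(self) -> str:
        """Render the record in the same bullet layout as the template."""

        def yes_no(flag: bool) -> str:
            return "yes" if flag else "no"

        failures = ", ".join(self.top_failures) or "none"
        lines = [
            f"- **Test asset:** {self.test_asset}",
            f"- **Model / pipeline:** {self.model_pipeline}",
            f"- **Prompt / input:** {self.prompt_input}",
            f"- **Pack used:** {self.pack_used}",
            f"- **Score summary:** {self.score_summary}",
            f"- **Gate result:** {self.gate_result}",
            f"- **Top failures observed:** {failures}",
            "- **Would this be reviewable by Laurent / Banijay?** "
            + yes_no(self.reviewable_by_laurent_banijay),
            "- **Would this justify more training?** "
            + yes_no(self.justifies_more_training),
        ]
        return "\n".join(lines)
```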

## Practical note

If time is limited, evaluate only these three questions first:

1. Do the characters stay themselves?
2. Do the holds actually hold?
3. Do gadgets and action remain readable without turning mushy?

If the answer to any one of those is "no," the output is not ready.
