# IIW material impact review

Date: 2026-05-03

Purpose: compare the newly synced IIW/Banijay Totally Spies S7 production package against the work already done from official YouTube/reference sources, and decide what it completes, enhances, or invalidates.

Primary inputs:

- `docs/internal/iiw-totallyspies-sync-inventory.md`
- `docs/internal/iiw-totallyspies-sync-file-manifest.csv`
- `docs/internal/iiw-totallyspies-archive-file-manifest.csv`
- `docs/research/dataset-state.md`
- `docs/research/s7-current-understanding.md`
- `materials/training-data/manifest.json`
- `materials/benchmark/youtube-s7-validation/bible/master-index.json`

Method note: this review is based on filesystem inspection, metadata, filenames, video technical inspection, existing manifests, and archive contents. It does not claim visual semantics beyond what the filenames/metadata and previous bible files support.

---

## Executive read

The IIW package does **not** invalidate the core approach. It validates the project direction and gives us the missing licensed material. It does, however, make several existing docs and some assumptions obsolete.

The old work remains valuable as a metadata, captioning, shot-taxonomy, diarization, and evaluation scaffold. The old media files should no longer be treated as training source material because they are 720p YouTube reference clips and explicitly marked non-licensed.

The biggest changes are:

1. We now have all 26 S7 episodes in licensed form, not only 13 official YouTube episodes.
2. Production episode numbering differs from our YouTube/wiki-derived numbering in important ways.
3. We now have canonical production design sheets for characters, outfits, props, gadgets, vehicles, backgrounds, and style guides.
4. `03_PROPS.zip` is the complete props package; the extracted props folder is only a subset.
5. Some current “remaining gaps” are now solved, while other gaps remain: scripts, storyboards, animatics, audio stems, 3D assets, and high-bitrate masters are still absent.

---

## What the IIW package completes

### 1. Licensed source dependency

Previous state in `docs/research/s7-current-understanding.md`:

- 13 S7 episodes available from official YouTube.
- 13 episodes behind paywall.
- Licensed episode files not yet received from Banijay.
- Licensed files were the single remaining dependency.

New state:

- Licensed package is received and synced.
- All 26 S7 episodes are present as `.mov` masters.
- Existing YouTube-derived clips can be replaced as training media.

Impact:

- The “When licensed episodes arrive…” instruction in `materials/training-data/README.md` now applies; it is no longer a future step.
- The current `materials/training-data/clips/` package is now a reference/bootstrap dataset, not the final training dataset.
- Training should be rebuilt from IIW masters.

### 2. Full season coverage

Existing training dataset covers 13 episodes and 1,551 clips.

IIW package covers the full season:

| Production EP | IIW English-title file |
| --- | --- |
| 01 | PANDAPOCALYPSE |
| 02 | IT TAKES A SLOB |
| 03 | TOTALLY VINTAGE |
| 04 | STINK-O-RAMA |
| 05 | CREEPY CRAWLY CREATURE CATCHER |
| 06 | TOTALLY TROLLING MUCH |
| 07 | OVER-SIMULATED |
| 08 | IT'S TOTALLY A TEST |
| 09 | TERRIBLE TODDLER TOYS |
| 10 | TOTALLY TALENTED |
| 11 | THE DAH WHO |
| 12 | MEGA MOON CHEESE |
| 13 | THE WILD LIFE |
| 14 | WHAT WOOLLY MAMMOTH |
| 15 | MYSTERY ON THE WOOHP EXPRESS |
| 16 | PUMPKIN PARTICLE PERIL V2 |
| 17 | UNDERCOVER SUPERVILLAINS |
| 18 | MANDYS MIND-BLOWING MAINFRAME |
| 19 | OLDIES AND GOODIES |
| 20 | TOTALLY PAWSOME |
| 21 | A DOG GONE DAY |
| 22 | SOMETHINGS FISHY |
| 23 | FOREVER LIPTASTIC |
| 24 | GLITTERSPY |
| 25 | LOCKED IN SPACE PERIL |
| 26 | CYBER SWEETHEART |

New-to-us episodes compared with the 13-episode YouTube training set:

- EP04 `STINK-O-RAMA`
- EP09 `TERRIBLE TODDLER TOYS`
- EP13 `THE WILD LIFE`
- EP15 `MYSTERY ON THE WOOHP EXPRESS`
- EP16 `PUMPKIN PARTICLE PERIL V2`
- EP18 `MANDYS MIND-BLOWING MAINFRAME`
- EP19 `OLDIES AND GOODIES`
- EP21 `A DOG GONE DAY`
- EP22 `SOMETHINGS FISHY`
- EP23 `FOREVER LIPTASTIC`
- EP24 `GLITTERSPY`
- EP25 `LOCKED IN SPACE PERIL`
- EP26 `CYBER SWEETHEART`

### 3. Official design bible material

The previous pipeline inferred character identity, outfit, props, gadgets, and location labels from VLM captions, wiki data, and YouTube frames.

The IIW package adds canonical production art:

- 2,015 character design files, 24.0 GB
- 1,876 background design files, 87.9 GB
- 660 extracted prop/gadget files, 6.6 GB
- 4,003 complete prop archive members in `03_PROPS.zip`, 59.8 GB
- 260 style-guide/key-art files, 3.2 GB
- 2,604 layered Photoshop assets overall
- 97 Illustrator vector assets
- 16 PDFs

Impact:

- Character and outfit identification no longer needs to rely only on frame-level VLM inference.
- The official art can become the canonical visual dictionary for Sam, Clover, Alex, Zerlina, Toby, Jerry, Mandy, Glitterstar, Cyberchac, WOOHP agents, guests, extras, animals, workers, and monsters.
- Prop/gadget matching can shift from fuzzy wiki text matching to filename-backed production prop IDs.
- Background/location conditioning can use production colour-card backgrounds rather than only episode frame crops.

---

## What the IIW package enhances

### 1. Captions and entity grounding

Existing captions are still valuable because they encode:

- shot type
- location category
- character names
- outfit descriptions
- dialogue context
- villain/gadget context
- story context

But the IIW package enhances the source of truth:

| Existing method | New enhancement |
| --- | --- |
| VLM guesses identities from frames | Official character sheets by named folder |
| Hair/suit rules for Sam/Clover/Alex | Official turnarounds and outfit variants |
| Wiki gadget names + F1 matching | Production prop filenames and PSD/JPG pairs |
| Canonical location categories from VLM/location pass | Production background colour cards and style-guide location plates |
| 13-episode metadata | Full 26-episode production coverage |

Recommended update:

- Keep `training_caption` structure.
- Rebuild its entity dictionary from IIW production filenames.
- Add new fields: `production_episode`, `production_code`, `source_master`, `production_asset_refs`, `official_design_refs`.
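
A minimal sketch of how an existing caption record could gain the new fields without touching `training_caption`. The record layout, the helper name `extend_caption_record`, and the default values are assumptions for illustration; only the five field names come from the list above.

```python
from copy import deepcopy

# Field names come from the recommended update; the defaults are assumptions.
NEW_FIELD_DEFAULTS = {
    "production_episode": None,
    "production_code": None,
    "source_master": None,
    "production_asset_refs": [],
    "official_design_refs": [],
}


def extend_caption_record(record, **values):
    """Return a copy of `record` with the production fields added."""
    unknown = set(values) - set(NEW_FIELD_DEFAULTS)
    if unknown:
        raise KeyError(f"unexpected fields: {sorted(unknown)}")
    extended = deepcopy(record)
    for name, default in NEW_FIELD_DEFAULTS.items():
        # Existing values win; missing fields get an independent default.
        extended.setdefault(name, deepcopy(default))
    extended.update(values)
    return extended


# Hypothetical record: the caption text is a placeholder, not real data.
original = {"training_caption": "placeholder caption text"}
extended = extend_caption_record(
    original,
    production_episode="01",
    production_asset_refs=["TS_700_PR_COMPOWDER_color"],
)
```

Returning a copy keeps the old YouTube-derived records intact, so they can still serve as the label seed while the extended records are reviewed.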

### 2. Outfit database

The existing outfit pipeline now provides outfit data for 856 / 1,551 clips.

The IIW character folders contain many named outfit files, for example:

- Alex casual outfits across production episodes
- Alex reporter, botanist student, bodyguard, camping, snowboard, luxury, pyjama, wedding, race, sportswear variants
- Clover casual outfits, scientist, stylist, surf, winter, vintage, wingsuit, moto suit, snowboard, wedding variants
- Similar folders exist for Sam and secondary characters

Impact:

- Outfit labels can be canonicalized from filenames instead of inferred from frames only.
- This directly improves promptability: “Alex in camping outfit”, “Clover in vintage spy suit”, etc.
- This can offset Alex’s lower frequency in episode footage by supplying more canonical Alex references.

Caveat:

- Design sheets should not be over-weighted in image/video training, or the model may learn to output reference-sheet compositions instead of cinematic frames.

### 3. Gadget/prop database

The current docs say only 79 / 1,551 clips have wiki-named gadgets, partly because gadget VLM identification was intentionally conservative.

The IIW package changes this significantly:

- `03_PROPS.zip` has 4,003 files.
- 3,344 of those are not in the extracted sync folder.
- It includes WOOHP/spies gadgets, vehicles, phones, remotes, robots, weapons, technology, chemistry, tools, security, makeup, clothing accessories, food, furniture, and general props.

Examples where production filenames match or clarify current gadget concepts:

- Compowder: `TS_700_PR_COMPOWDER_color`, `TS_703_PR_COMPOWDER_DISMANTLED_Color`
- Ultra-fixing foam: `TS_701_PR_ULTRA-FIXING_FOAM_v1_Color`
- Explosive eyebrow pencil: `TS_701_PR_GADGET_EXPLOSIVE_EYEBROW_PENCIL_V1_Color`
- Laser lipstick: `TS_703_PR_GADGET_LASER_LIPSTICK_Color`
- Laser parasol: `TS_703_PR_GADGET_LASER_PARASOL_Color`
- Magnetic boots: `TS_709_PR_GADGET_MAGNETIC_BOOTS_Color`
- Plasma jetpack: `TS_717_PR_GADGET_PLASMA_JETPACK_V1_Color`
- Moo box: `TS_714_PR_GADGET_MOO_BOX_V1_Color`, `TS_714_PR_GADGET_MOO_BOX_ICE_V1_Color`
- Drones: multiple `Drone` assets and expression/update files

Impact:

- The current gadget database is not wrong, but it is incomplete and too wiki/frame-derived.
- It should be merged with production filenames and treated as a crosswalk, not as the canonical list.
- `03_PROPS.zip` should be unpacked or streamed as canonical prop source for ingestion.
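
As a rough sketch, the example filenames above could be split into structured fields with a regular expression. The pattern is inferred only from the handful of names listed in this section and will not cover every file (the `Drone` assets, for instance). The three-digit number is kept as an opaque code, because its meaning is not confirmed here. `parse_prop_name` is a hypothetical helper.

```python
import re

# Pattern inferred from the example names in this section (an assumption).
PROP_NAME = re.compile(
    r"^TS_(?P<code>\d{3})_PR_(?P<gadget>GADGET_)?(?P<name>.+?)"
    r"(?:_(?P<version>V\d+))?_COLOR$",
    re.IGNORECASE,
)


def parse_prop_name(stem):
    """Split a production prop file stem into fields, or return None."""
    match = PROP_NAME.match(stem)
    if match is None:
        return None  # leave unmatched names for manual review
    version = match.group("version")
    return {
        "code": match.group("code"),
        "is_gadget": match.group("gadget") is not None,
        "name": match.group("name").replace("_", " ").lower(),
        "version": version.lower() if version else None,
    }


parsed = parse_prop_name("TS_701_PR_GADGET_EXPLOSIVE_EYEBROW_PENCIL_V1_Color")
```

Names that return `None` are better routed to a review list than forced into the pattern, since the full archive has far more naming variety than the examples shown.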

### 4. Background and location grounding

The current dataset has a known location for 100% of clips, using 18 canonical categories.

The IIW package enhances this with:

- `02_BG/COLOR_CARD`: 931 files, 54.0 GB
- `02_BG/SPEEDLINES`: 945 files, 33.9 GB
- style-guide location plates for Dubai, Seoul, Siberia, New York, Paris, Singapore

Impact:

- The 18-location taxonomy remains useful for model conditioning.
- Production background files can improve exact environment style and compositional consistency.
- Speedlines should be downweighted or separated, because they are stylized action backgrounds and may bias the model toward abstract motion-line outputs if overused.

### 5. Evaluation and holdout design

The old benchmark remains very useful. It should now be repurposed:

- YouTube clips: reference/evaluation only
- IIW masters: training source
- Design assets: identity/prop/location source
- Existing captions: bootstrap labels

Recommended evaluation setup:

- Hold out complete production episodes/sequences from the IIW masters.
- Keep the existing YouTube-derived clips as a reference/eval benchmark built from public sources, not as training media.
- Create separate eval sets for identity, outfit, prop, background, and motion.

---

## What the IIW package invalidates or makes obsolete

### 1. “Licensed files not yet received”

Invalidated.

Update all docs that say licensed files are still missing. The package has arrived.

Affected docs:

- `docs/research/s7-current-understanding.md`
- `docs/research/dataset-state.md`
- `materials/training-data/README.md`
- possibly `docs/research/gpu-training-runbook.md`

### 2. “13 episodes available, 13 unavailable” as current project state

Invalidated as current state.

It remains historically true for the YouTube discovery phase, but not for the project now.

### 3. Episode numbering from YouTube/wiki context

Partially invalidated.

The 13 known episode names are mostly confirmed, but their production episode numbers differ from the previous YouTube/wiki-derived assumptions.

| Existing known episode | IIW production EP/title |
| --- | --- |
| Frankenpanda | EP01 `PANDAPOCALYPSE` / alternate title `FRANKENPANDA V2` |
| It Takes A Slob | EP02 `IT TAKES A SLOB` |
| Totally Vintage | EP03 `TOTALLY VINTAGE` |
| Creepy Crawly Creature Catcher | EP05 `CREEPY CRAWLY CREATURE CATCHER` |
| Totally Trolling, Much? | EP06 `TOTALLY TROLLING MUCH` |
| Over | EP07 `OVER-SIMULATED` |
| It's Totally a Test | EP08 `IT'S TOTALLY A TEST` |
| Totally Talented | EP10 `TOTALLY TALENTED` |
| The DAH | EP11 `THE DAH WHO` |
| Mega Moon Cheese | EP12 `MEGA MOON CHEESE` |
| What Woolly Mammoth | EP14 `WHAT WOOLLY MAMMOTH` |
| Undercover Supervillains | EP17 `UNDERCOVER SUPERVILLAINS` |
| Totally Pawsome | EP20 `TOTALLY PAWSOME` |

Impact:

- Do not use YouTube episode number as production episode number.
- Add `production_episode` alongside existing `episode_id` and `episode_name`.
- Any gadget/villain mapping keyed by episode number needs review.
- Existing mapping keyed by episode name is probably salvageable.
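
The name-keyed mapping can be written down directly from the table above. This is a sketch: the record shape and the helper `add_production_episode` are assumptions, and names missing from the table deliberately resolve to `None` rather than a guessed number.

```python
# Existing bible name -> IIW production episode, copied from the table above.
PRODUCTION_EPISODE_BY_NAME = {
    "Frankenpanda": "01",
    "It Takes A Slob": "02",
    "Totally Vintage": "03",
    "Creepy Crawly Creature Catcher": "05",
    "Totally Trolling, Much?": "06",
    "Over": "07",
    "It's Totally a Test": "08",
    "Totally Talented": "10",
    "The DAH": "11",
    "Mega Moon Cheese": "12",
    "What Woolly Mammoth": "14",
    "Undercover Supervillains": "17",
    "Totally Pawsome": "20",
}

# Case-insensitive lookup so minor capitalisation drift does not break joins.
_LOOKUP = {name.casefold(): ep for name, ep in PRODUCTION_EPISODE_BY_NAME.items()}


def add_production_episode(record):
    """Return a copy of `record` with `production_episode` added."""
    updated = dict(record)
    name = record["episode_name"].strip().casefold()
    updated["production_episode"] = _LOOKUP.get(name)  # None if unknown
    return updated


# Hypothetical record shape; `episode_id` is left untouched.
record = add_production_episode(
    {"episode_id": "7lA-b6ou8yc", "episode_name": "Frankenpanda"}
)
```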

### 4. English/French duplicate assumption

Needs correction.

There are two title variants per episode. The filenames look like English/French pairs, but the paired `.mov` files share the same video technical spec and appeared visually equivalent in spot technical comparisons, so they should not be assumed to be independent visual variants. Audio tracks are dual mono AAC in every inspected master.

Impact:

- Do not blindly train both title variants as independent visuals.
- Choose one canonical visual master per production episode.
- Keep alternate-title files as fallback and for audio/title metadata review.
- Add a video-frame-hash duplicate pass before final extraction.
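
A frame-hash pass could look roughly like the sketch below. It assumes frames have already been decoded, converted to grayscale, and downscaled to a small grid (for example with ffmpeg) at the same timestamps in both files. The average-hash technique, the thresholds, and the function names are all assumptions, not an established project method.

```python
def average_hash(gray):
    """Hash a small grayscale frame (2D list of 0-255 ints) to an integer."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def duplicate_status(frames_a, frames_b, max_bits=2, min_match=0.98):
    """Compare two equally sampled frame lists from a title-variant pair."""
    if not frames_a or len(frames_a) != len(frames_b):
        raise ValueError("sample both masters at the same timestamps")
    matches = sum(
        hamming(average_hash(a), average_hash(b)) <= max_bits
        for a, b in zip(frames_a, frames_b)
    )
    # Thresholds are illustrative; title cards may legitimately differ.
    ratio = matches / len(frames_a)
    return "confirmed_duplicate" if ratio >= min_match else "variant"


# Tiny synthetic 2x2 frames, for illustration only.
frame_x = [[0, 255], [255, 0]]
frame_y = [[255, 0], [0, 255]]
same = duplicate_status([frame_x] * 3, [frame_x] * 3)
different = duplicate_status([frame_x] * 3, [frame_x, frame_x, frame_y])
```

Pairs that come back as `variant` are candidates for manual review before one master is chosen as canonical.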

### 5. Props folder completeness

Invalidated.

The extracted `02_Elements/DESIGN/03_PROPS/03_PROPS/` folder is not the complete prop package.

Observed:

- `03_PROPS.zip`: 4,003 files, 59.8 GB uncompressed
- Extracted props folder: 659 matching files from the ZIP
- Only-in-archive: 3,344 files

Impact:

- Treat `03_PROPS.zip` as source of truth.
- Do not train only from extracted `03_PROPS/03_PROPS/` or the model will miss most props.
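
The archive-versus-folder comparison can be reproduced without unpacking the archive, since listing ZIP members only reads its central directory. The sketch below matches on base filename, which is an assumption about how the extracted folder relates to the archive. The demo builds a throwaway archive instead of touching the real package.

```python
import tempfile
import zipfile
from pathlib import Path, PurePosixPath


def archive_only_members(zip_path, extracted_dir):
    """List ZIP members whose base name is missing from the extracted folder."""
    extracted = {
        p.name.lower() for p in Path(extracted_dir).rglob("*") if p.is_file()
    }
    with zipfile.ZipFile(zip_path) as archive:
        members = [i.filename for i in archive.infolist() if not i.is_dir()]
    return sorted(
        m for m in members if PurePosixPath(m).name.lower() not in extracted
    )


def demo():
    """Build a tiny throwaway archive and folder to exercise the comparison."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        zip_path = root / "props.zip"
        with zipfile.ZipFile(zip_path, "w") as archive:
            archive.writestr("props/a_color.jpg", b"a")
            archive.writestr("props/b_color.jpg", b"b")
        extracted = root / "extracted"
        extracted.mkdir()
        (extracted / "a_color.jpg").write_bytes(b"a")
        return archive_only_members(zip_path, extracted)


demo_result = demo()
```

If base names turn out not to be unique across subfolders, matching on the full relative path would be the safer key.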

### 6. “Alex under-represented cannot improve”

Partially invalidated.

Existing claim in `dataset-state.md`: Alex under-representation is show bias, not data bias.

This remains true for episode-frame frequency, but the IIW design sheets provide many Alex canonical references.

Impact:

- We cannot change Alex’s actual screen time in motion training.
- But we can improve Alex identity fidelity in image/keyframe generation by balancing design-sheet sampling.
- Keep motion distribution truthful; balance identity adapters separately.
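
For the identity-adapter lane, balancing could be as simple as weighting each design sheet inversely to its character's sheet count, so every character receives the same total sampling probability. This is a sketch under that assumption; the `character` key and both helper names are hypothetical, and it is meant for design-sheet sampling only, not for motion clips.

```python
import random
from collections import Counter


def balanced_weights(items, key="character"):
    """Per-item weights that give each character equal total probability."""
    counts = Counter(item[key] for item in items)
    n_groups = len(counts)
    return [1.0 / (n_groups * counts[item[key]]) for item in items]


def sample_balanced(items, k, seed=0):
    """Draw `k` items with replacement using the balanced weights."""
    rng = random.Random(seed)
    return rng.choices(items, weights=balanced_weights(items), k=k)


# Hypothetical sheet list: four Sam sheets and one Alex sheet.
sheets = [{"character": "Sam"}] * 4 + [{"character": "Alex"}]
weights = balanced_weights(sheets)
batch = sample_balanced(sheets, k=8)
```

With these inputs each Sam sheet gets weight 0.125 and the single Alex sheet gets 0.5, so both characters are equally likely per draw.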

### 7. Wiki-only character/gadget ground truth

Partially invalidated.

Wiki remains useful for names, plot context, villains, and public labels. But production files are more authoritative for visual asset identity.

Impact:

- Build an official production crosswalk:
  - wiki name
  - production filename/name
  - production episode code
  - visual category
  - canonical prompt token

---

## What remains valid

### Wan2.2 choice

Still valid.

Nothing in the IIW material changes the model selection logic. Wan2.2 remains the practical Apache-licensed model for image/video LoRA training.

### Caption schema

Still valid.

The `training_caption` idea is sound: visual description + structured context. It should be extended, not discarded.

### Location taxonomy

Still useful.

The 18 location categories are still useful as conditioning labels. They should now be connected to background production assets.

### Character anchors

Still valid but should be superseded by production references where possible.

Sam/Clover/Alex hair and suit anchors remain useful for frame captions, but production character sheets provide stronger identity references and outfit variants.

### Shot/scene extraction pipeline

Still valid.

The existing scene detection, shot taxonomy, clip extraction, manifest-building, and packaging logic should be reused against the IIW masters.

### Diarization/transcript work

Still useful, but optional for the first keyframe/image pass.

For video/action generation, audio and transcript are less central than visual shot labels. For semantic prompt alignment, the transcript/diarization work remains valuable.

---

## Recommended migration plan

### Phase 1: Update source-of-truth docs

Change current-state language:

- licensed files received
- full season present
- YouTube clips are reference only
- IIW masters are canonical training media
- production episode numbering supersedes YouTube numbering
- props source is `03_PROPS.zip`

### Phase 2: Build a production source manifest

Create a new manifest, for example:

`materials/training-data/iiw_source_manifest.json`

Fields:

```json
{
  "production_episode": "01",
  "canonical_title": "PANDAPOCALYPSE",
  "alternate_title": "FRANKENPANDA V2",
  "source_master": ".../Totally_Spies_S7_EP01_PANDAPOCALYPSE.mov",
  "alternate_master": ".../Totally_Spies_S7_EP01_FRANKENPANDA_V2.mov",
  "video_duplicate_status": "pending | confirmed_duplicate | variant",
  "duration_s": 1317.269,
  "fps": 25,
  "width": 1920,
  "height": 1080,
  "bitrate_bps": 3553728,
  "existing_youtube_episode_id": "7lA-b6ou8yc",
  "existing_bible_name": "Frankenpanda"
}
```
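
The technical fields in this record could be filled from `ffprobe` JSON output. The sketch below assumes `ffprobe` is installed and on the PATH. `probe` and `technical_fields` are hypothetical helper names, and only the parsing step is exercised here, using a hand-written sample that mirrors the values in the record above.

```python
import json
import subprocess
from fractions import Fraction


def probe(path):
    """Run ffprobe on a media file and return its parsed JSON report."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-print_format", "json",
         "-show_format", "-show_streams", str(path)],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def technical_fields(report):
    """Extract the manifest's technical fields from an ffprobe report."""
    video = next(s for s in report["streams"] if s.get("codec_type") == "video")
    fmt = report["format"]
    return {
        "duration_s": float(fmt["duration"]),
        "fps": float(Fraction(video["r_frame_rate"])),  # e.g. "25/1"
        "width": int(video["width"]),
        "height": int(video["height"]),
        "bitrate_bps": int(fmt["bit_rate"]),
    }


# Hand-written sample shaped like ffprobe output, not a real probe result.
sample_report = {
    "streams": [
        {"codec_type": "audio"},
        {"codec_type": "video", "width": 1920, "height": 1080,
         "r_frame_rate": "25/1"},
    ],
    "format": {"duration": "1317.269000", "bit_rate": "3553728"},
}
fields = technical_fields(sample_report)
```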

### Phase 3: Rebuild training media from IIW masters

Replace:

- `materials/training-data/clips/`
- `materials/training-data/first_frames/`
- `wan21_metadata.json` (or its renamed Wan2.2 equivalent)

With licensed IIW-derived media.

Keep:

- old manifest as `youtube_reference_manifest.json`
- old captions as label seed
- old shot IDs as cross-reference where episode/title matches

### Phase 4: Create official design dictionaries

Generate:

- `official_characters.json`
- `official_outfits.json`
- `official_props.json`
- `official_backgrounds.json`
- `official_style_guides.json`

From:

- `01_CH/`
- `02_BG/`
- `03_PROPS.zip`
- `STYLE GUIDE/`
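
A first pass at these dictionaries could group files by stem and record which formats exist for each, which also surfaces PSD/JPG pairs. The sketch works on relative path strings, so the same function could take a folder listing or ZIP member names. The source-to-output mapping is an assumption, `official_outfits.json` would need an extra rule that is not shown, and the example paths are hypothetical.

```python
from pathlib import PurePosixPath

# Assumed pairing of source roots to output dictionaries.
SOURCE_TO_DICTIONARY = {
    "01_CH": "official_characters.json",
    "02_BG": "official_backgrounds.json",
    "03_PROPS": "official_props.json",
    "STYLE GUIDE": "official_style_guides.json",
}


def index_design_files(relative_paths):
    """Group design files by output dictionary and stem, listing formats."""
    index = {}
    for raw in relative_paths:
        path = PurePosixPath(raw)
        if len(path.parts) < 2 or not path.suffix:
            continue  # skip bare names and extensionless entries
        target = SOURCE_TO_DICTIONARY.get(path.parts[0])
        if target is None:
            continue  # skip files outside the known source roots
        formats = index.setdefault(target, {}).setdefault(path.stem, [])
        extension = path.suffix.lstrip(".").lower()
        if extension not in formats:
            formats.append(extension)
    for entries in index.values():
        for formats in entries.values():
            formats.sort()
    return index


# Hypothetical paths; the real folder layout may differ.
design_index = index_design_files([
    "03_PROPS/TS_700_PR_COMPOWDER_color.psd",
    "03_PROPS/TS_700_PR_COMPOWDER_color.jpg",
    "01_CH/ALEX/example_sheet.psd",
    "notes/readme.txt",
])
```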

### Phase 5: Merge wiki + production references

Create a `production_wiki_crosswalk.json`:

```json
{
  "wiki_name": "Compowder",
  "production_names": [
    "TS_700_PR_COMPOWDER_color",
    "TS_703_PR_COMPOWDER_DISMANTLED_Color"
  ],
  "category": "spy gadget",
  "prompt_token": "compowder",
  "confidence": "high"
}
```
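
Entries of this shape could be seeded by token matching between wiki names and production filenames, then reviewed by hand. In the sketch below, the matching rule (every wiki-name token must appear in the filename), the `"unmatched"` confidence value, and the helper name `build_crosswalk_entry` are assumptions; `category` is supplied by the caller.

```python
import re


def _tokens(text):
    """Lowercase alphanumeric tokens, split on any other character."""
    return set(re.split(r"[^a-z0-9]+", text.lower())) - {""}


def build_crosswalk_entry(wiki_name, production_names, category):
    """Seed one crosswalk entry by matching all wiki-name tokens."""
    wanted = _tokens(wiki_name)
    matches = [
        name for name in production_names
        if wanted and wanted <= _tokens(name)
    ]
    return {
        "wiki_name": wiki_name,
        "production_names": matches,
        "category": category,
        "prompt_token": wiki_name.lower(),
        # "high" here only means an exact token match, pending human review.
        "confidence": "high" if matches else "unmatched",
    }


names = [
    "TS_700_PR_COMPOWDER_color",
    "TS_703_PR_COMPOWDER_DISMANTLED_Color",
    "TS_703_PR_GADGET_LASER_LIPSTICK_Color",
    "TS_703_PR_GADGET_LASER_PARASOL_Color",
]
entry = build_crosswalk_entry("Compowder", names, category="spy gadget")
```

Token matching keeps `Laser lipstick` from also claiming the laser parasol file, which a plain substring search on `laser` would do.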

### Phase 6: Retrain as a two-lane system

Do not simply dump all assets into one Wan2.2 LoRA.

Use two main lanes, plus optional focused adapters:

1. **Episode-frame/video LoRA** for finished show look and motion.
2. **Design-reference LoRA or adapters** for character, outfit, prop, and background fidelity.
3. Optional focused adapters for main trio, WOOHP gadgets, and specific campaign looks.

---

## Training implications

### For keyframes/images

The IIW package is a major upgrade. Use:

- episode frames for cinematic composition
- character sheets for identity
- outfit sheets for promptability
- props/gadgets for object fidelity
- background colour cards for environment style
- key art/style guide for brand composition, lightly weighted

Risk:

- Too many design sheets can make the model produce reference-sheet outputs.
- Use design assets as identity anchors and captions, not as the majority of frame-style training.
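
The cap on design sheets can be turned into a concrete sample budget. If design sheets may be at most p percent of the mix and there are N episode frames, the largest allowed design count is N·p / (100 − p), rounded down. The helper below is a sketch; the 20 percent figure in the example is purely illustrative, not a recommended ratio.

```python
def design_sheet_budget(n_episode_frames, max_design_percent):
    """Largest design-sheet count that keeps their share within the cap."""
    if not 0 <= max_design_percent < 100:
        raise ValueError("max_design_percent must be in [0, 100)")
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return n_episode_frames * max_design_percent // (100 - max_design_percent)


# Illustrative numbers only: 800 episode frames and a 20 percent cap.
budget = design_sheet_budget(800, 20)
share = budget / (800 + budget)
```

With 800 episode frames and a 20 percent cap, the budget is 200 design sheets, which is exactly 20 percent of the resulting 1,000-sample mix.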

### For video

Use only episode masters for motion LoRA.

Do not train video from static character sheets or props.

Use the design assets for conditioning/reference or image pretraining, not temporal learning.

### For evaluation

Existing YouTube-derived dataset should become evaluation/reference material rather than training material.

Use it to test:

- whether the new licensed-trained model still reproduces known S7 style
- whether it generalizes across YouTube benchmark prompts
- whether production episode numbering changes break any captions or labels

---

## Immediate action items

1. Update docs that describe licensed files as missing.
2. Build canonical IIW episode source manifest.
3. Add production episode number mappings to existing bible records.
4. Run visual duplicate/hash pass across the two `.mov` title variants per episode.
5. Re-extract clips and first frames from one canonical IIW master per production episode.
6. Unpack or stream `03_PROPS.zip` for prop dictionary construction.
7. Generate official design dictionaries from file/folder names.
8. Create a production/wiki crosswalk for gadgets, props, characters, outfits, and villains.
9. Rebuild Wan2.2 training package from licensed media.
10. Keep the old YouTube training dataset as a validation/reference set.

---

## Bottom line

The previous work is **not wasted**. It is the scaffold: captions, taxonomy, pipeline scripts, evaluation habits, location categories, shot segmentation, and training packaging.

The IIW sync changes the source hierarchy:

1. IIW/Banijay licensed masters and production art become canonical.
2. Existing YouTube-derived media becomes reference/evaluation only.
3. Wiki remains useful for names and context, but production filenames and design sheets become stronger visual ground truth.
4. Some docs are now stale and should be updated before the next training run.
