# Totally Spies S7 Bible — Remaining Gaps (current)

Generated after: all gap passes complete through April 2026.

## Summary table

| # | Gap | Status | Priority | Fix effort |
|---|---|---|---|---|
| 1 | Episodes 14–26 missing | **BLOCKED** — paywall | 🔴 HIGH | Waiting on YouTube uploads |
| 2 | Per-shot outfit descriptions | ✅ **CLOSED** — full re-extraction done | — | — |
| 3 | Gadget canonical names | 🟡 45% resolved | 🟡 MED | ~4h |
| 4 | Dialogue speaker attribution | ✅ **CLOSED** — 51% attributed | — | — |
| 5 | Supporting char outfits | ✅ **CLOSED** — 119 descriptions | — | — |
| 6 | Location naming within scenes | 🟡 Hierarchy built, naming still fuzzy | 🟡 MED | ~3h |
| 7 | Empty character shots | ✅ **CLOSED** — all closeable shots recovered | — | — |
| 8 | Props normalization | ✅ **CLOSED** — 1,484 → 1,003 canonical | — | — |
| 9 | 2 missing villains | ✅ **CLOSED** — Maya + Muscles Malone found | — | — |
| 10 | Location "Other" bucket | ✅ **CLOSED** — 46/50 reclassified | — | — |
| 11 | Scene type accuracy | ⚠️ Only 36% spot-check accuracy | 🟡 MED | ~1h |
| 12 | S1–6 comparison bible | 🟡 Partial (3 frames analyzed) | 🟢 LOW | ~8h |
| PLDA | Speaker clustering quality | See section below | 🟡 MED | 5 min to unlock |

## Open gaps

### Gap 1: Episodes 14–26 (BLOCKED)

**Status:** Cannot download. Episodes 14–26 are behind the Cartoon Network / Max
paywall and are not available on YouTube as full episodes.

**What's missing:**
- Cyberchac's full villain arc (resolves in eps 20–26)
- Additional Mandy/Glitterstar screen time
- Episode-specific gadgets and locations
- The season finale

**Fix:** Wait for official YouTube uploads (the channel uploads gradually) or
access the episodes via a Max subscription plus browser recording.

### Gap 3: Gadget names (45% resolved)

**Status:** 22 of 62 wiki-canonical gadget names mapped at high confidence;
33 at medium or low confidence; 7 unmatched.

**Root cause:** The VLM described gadgets visually ("purple handheld device")
but could not name them. The wiki holds the canonical names.

**Fix:** For each episode, pair the gadget frames with the wiki gadget list and
run a VLM confirmation pass, similar to the villain lockdown pass.
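A minimal sketch of the name-mapping half of that pass. The gadget names and
VLM labels below are illustrative, and stdlib fuzzy string matching stands in
for the VLM confirmation step:

```python
import difflib

# Hypothetical sample data: wiki canonical names vs. VLM visual descriptions.
wiki_names = ["Laser Lipstick", "Compowder", "Jetpack Backpack"]
vlm_labels = ["laser lipstick gadget", "compact powder device", "purple handheld device"]

def match_gadget(label, canon, cutoff=0.4):
    """Return the closest wiki name for a VLM label, or None below the cutoff."""
    lowered = [c.lower() for c in canon]
    hits = difflib.get_close_matches(label.lower(), lowered, n=1, cutoff=cutoff)
    if not hits:
        return None  # unmatched — would fall to the VLM confirmation pass
    return canon[lowered.index(hits[0])]

for label in vlm_labels:
    print(label, "->", match_gadget(label, wiki_names))
```

Labels that fall below the cutoff (like purely visual descriptions) are the
ones that still need the frame-plus-wiki-list VLM pass.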

### Gap 6: Location naming within scenes

**Status:** 18 canonical categories built, with sub-spaces named for the top 5
categories. Consecutive-shot location consistency is still only 38.6%.
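For reference, a minimal sketch of how a consistency figure like 38.6% can be
computed, assuming each episode is reduced to an ordered list of per-shot
location labels (the labels below are illustrative):

```python
def consecutive_consistency(labels):
    """Fraction of adjacent shot pairs that share the same location label."""
    pairs = list(zip(labels, labels[1:]))
    if not pairs:
        return 1.0  # a single shot is trivially consistent
    return sum(a == b for a, b in pairs) / len(pairs)

shots = ["Tech Lab", "Tech Lab", "Server Room", "Tech Lab", "Tech Lab"]
print(consecutive_consistency(shots))  # 2 of 4 adjacent pairs agree -> 0.5
```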

**What's done:**
- WOOHP HQ → Tech Lab, Jerry's Briefing Room, Vehicle Bay, Server Room
- AIYA Academy → School Library, Cafeteria, Hallway, Outdoor Campus
- Singapore City → Rooftop, Shopping District, Street Market, Port, Night Market
- Villain Lair → Main Chamber, Control Room, Lab, Exterior
- Snowy Environment → Snowy Forest, Ski Resort, Ice Cave, Mountain Path

**What's still needed:** Consistent single-room labeling within a continuous
scene. Currently the same room gets different labels in adjacent shots.
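One low-effort fix is to smooth labels within a scene after classification. A
minimal sketch, assuming shots have already been grouped into continuous
scenes upstream (the scene-grouping step itself is not shown):

```python
from collections import Counter

def smooth_scene_labels(labels):
    """Relabel every shot in a continuous scene with the scene's majority label."""
    if not labels:
        return []
    majority, _ = Counter(labels).most_common(1)[0]
    return [majority] * len(labels)

scene = ["Jerry's Briefing Room", "Tech Lab", "Jerry's Briefing Room"]
print(smooth_scene_labels(scene))  # all three shots get the majority label
```

Majority voting assumes a scene really does stay in one room; scenes that
legitimately cut between rooms would need the scene boundaries fixed first.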

### Gap 11: Scene type accuracy

**Status:** Spot-check showed only 36% agreement between original labels and
blind re-classification. Most common error: `dialogue` and `comedy-reaction`
mislabeled as `location-establish`.

**Implication:** Scene type labels are the least reliable field in the catalog.
Use with caution for training data filtering.

**Fix:** Re-classify all shots using a multi-frame context window (3 consecutive
frames at once) rather than single-frame classification.
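The windowing step is simple to sketch. Assuming an episode's shots are an
ordered list of frame paths (filenames below are illustrative), each
overlapping 3-frame window would go into one classification prompt:

```python
def frame_windows(frames, size=3):
    """Group consecutive frames into overlapping windows of `size`,
    so the classifier sees local context instead of a single frame."""
    if len(frames) < size:
        return [frames]  # short episodes fall back to one smaller window
    return [frames[i:i + size] for i in range(len(frames) - size + 1)]

frames = ["shot_001.jpg", "shot_002.jpg", "shot_003.jpg", "shot_004.jpg"]
for window in frame_windows(frames):
    print(window)  # one classification prompt per window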

---

## PLDA gap — speaker clustering quality

### What it is

The pyannote/speaker-diarization-3.1 pipeline uses a PLDA (Probabilistic
Linear Discriminant Analysis) model from `pyannote/speaker-diarization-community-1`
to improve speaker clustering. Without it, speakers who sound similar may be
split into multiple SPEAKER_N labels, reducing attribution quality.

**Current result without PLDA:** 51% of transcript segments attributed.
**Expected result with PLDA:** 60–70% attribution (estimated improvement).
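The 51% figure is just the share of transcript segments that received a
speaker label. A minimal sketch, assuming each segment is a dict with an
optional `speaker` key (the schema and sample lines below are illustrative,
not the pipeline's actual JSON):

```python
def attribution_rate(segments):
    """Share of transcript segments with a non-null speaker label."""
    if not segments:
        return 0.0
    attributed = sum(1 for s in segments if s.get("speaker"))
    return attributed / len(segments)

segments = [
    {"text": "Ladies, we have a situation.", "speaker": "Jerry"},
    {"text": "Not again!", "speaker": "Clover"},
    {"text": "(alarm blares)", "speaker": None},
    {"text": "Let's go.", "speaker": None},
]
print(attribution_rate(segments))  # 2 of 4 segments attributed -> 0.5
```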

### What's blocking it

The `pyannote/speaker-diarization-community-1` model has a gated access form
on HuggingFace that requires:
- Company or university affiliation
- Use case description

The model page: https://huggingface.co/pyannote/speaker-diarization-community-1

### What you need to do

1. Log into HuggingFace with the account associated with the token in
   `~/.config/pi-secrets/hf-token`

2. Visit: **https://huggingface.co/pyannote/speaker-diarization-community-1**

3. Fill in the gated access form:
   - **Company/university:** Cultscale (or similar)
   - **Use case:** Research — speaker diarization for animated TV series
     character attribution in a production bible pipeline

4. Click **"Submit"** or **"Request access"**

5. Once approved (usually immediate for the community model), re-run:
   ```bash
   devenv tasks run --show-output bible:diarize -- \
     .devenv/state/tmp/ep1-audio.wav \
     .devenv/state/diarization/ep1-with-plda.json
   ```

6. The `spies-diarize` script will automatically download the PLDA files
   on the next run and use them for improved clustering.

### How to re-run the full 13-episode diarization with PLDA

Once access is granted:

```bash
# Delete the old diarization files to force a re-run
for ep_id in 7lA-b6ou8yc DgZSBwIyP4o fdC6OBDmQGM Jl77Vup5yHw vpHCmQEXdCI \
             LcD5zxm4vKM YBoXsfO1y1Q FiCclhRQbiw 4L1cJkYeaT4 xUo79ZckeK0 \
             8MtYWoNOL3Y xxc4GQCan0U pDydzueEOJw; do
  rm -f materials/benchmark/youtube-s7-validation/bible/episodes/$ep_id/diarization.json
  rm -f materials/benchmark/youtube-s7-validation/bible/episodes/$ep_id/transcript-with-speakers.json
done

# Re-run the diarization pipeline
python3 /tmp/ts_gap4_diarize_all.py

# Re-run the speaker mapping
python3 /tmp/ts_gap4_speaker_map.py
```

Expected improvement: attribution rate from 51% → 60–70%.

---

## What is reliable right now

| Data | Confidence | Notes |
|---|---|---|
| Shot timing + frame paths | **HIGH** | Core pipeline is solid |
| Character presence (trio) | **HIGH** | Wiki-anchored re-identification |
| Catsuit colors (Sam/Clover/Alex) | **HIGH** | Verified against wiki |
| Villain names | **HIGH** | Wiki-corrected, 89% visually confirmed |
| Transcripts (text) | **HIGH** | whisper.cpp large-v3-turbo |
| Speaker attribution (51%) | **MEDIUM** | Works, PLDA would improve it |
| Location categories (18 buckets) | **MEDIUM-HIGH** | Good for coarse filtering |
| Gadget identifications | **MEDIUM** | Visual descriptions correct, proper names 45% |
| Scene types | **LOW-MEDIUM** | 36% spot-check accuracy |
| Supporting char outfits | **MEDIUM** | 119 targeted extractions |
| Props | **LOW** | NLP clustering only, no wiki ground truth |
