# CNN Workspace — Agent Instructions

## What this workspace does

Captures CNN International live streams from FAST/Pluto TV feeds, records them
as MP4, transcribes audio via Whisper, and produces cleaned interview excerpts.
Finished media is organized under `archives/`.

## Common tasks an agent should offer

### Capture a new recording session
```bash
./capture-cnni-all.sh <seconds>
```
Suggest a duration (e.g. 1800 for 30 min). Output lands in `recordings/<timestamp>/`.
After review, move finished sessions to `archives/recordings/<timestamp>/`.

### Transcribe a recording
Use the Pi `transcribe` skill on an MP4 from `recordings/` or `archives/recordings/`.
Save the transcript as `<basename>.transcript.txt` alongside the source file.

For files exceeding Whisper's size limit, first extract audio and split:
```bash
ffmpeg -i input.mp4 -vn -acodec libmp3lame -q:a 4 input.mp3
# then split into ≤25 MB chunks if needed
```
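
A minimal splitting sketch using ffmpeg's segment muxer; the 900 s chunk length is an assumption sized for `-q:a 4` VBR output (~165 kbps ≈ 18 MB per 15 min), so adjust it if chunks land near the 25 MB limit:
```bash
# Split the MP3 into ~15-minute chunks without re-encoding
ffmpeg -i input.mp3 -f segment -segment_time 900 -c copy input_%03d.mp3
```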

### Extract a clip by timestamp
```bash
ffmpeg -y -ss HH:MM:SS -to HH:MM:SS -i source.mp4 \
  -c copy -movflags +faststart output_clip.mp4
```
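
Note that `-c copy` snaps to keyframes, so clip boundaries can drift by a few seconds. When frame accuracy matters, re-encode instead; the x264/AAC settings below are an assumption, not a workspace convention:
```bash
# Frame-accurate cut: slower, but boundaries land exactly where requested
ffmpeg -y -ss HH:MM:SS -to HH:MM:SS -i source.mp4 \
  -c:v libx264 -crf 20 -c:a aac -movflags +faststart output_clip.mp4
```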

### Clean up a transcript into an excerpt
- Add speaker labels (`[Becky Anderson]`, `[Fuad Siniora]`, etc.)
- Remove ASR artefacts, filler, and ad breaks
- Preserve the interviewee's meaning faithfully; light grammar polish only
- Save as `<basename>_<speakers>_excerpt.txt`
- Move completed clip/excerpt bundles into `archives/recordings/<timestamp>/`

### Compare clips in the browser
Serve the repo root, then open the clip viewer:
```bash
cd /home/mnm/workspaces/cnn
python3 -m http.server 8000
# then open http://localhost:8000/clips/index.html
```

### Refresh CNN FAST stream info
```bash
cnn-fast-list    # check available formats
cnn-fast-info    # full JSON metadata
```

## Conventions

- **File naming**: `YYYYMMDD_HHMMSS_<source_key>` for timestamped captures.
- **Metadata files** (`.txt`): contain label, gateway URL, resolved URL, start time.
- **Transcripts**: `<name>.transcript.txt` — raw ASR output.
- **Excerpts**: `<name>_<speakers>_excerpt.txt` — cleaned, speaker-labelled text.
- **No large files in git**: MP4s, recordings, logs, and archives are gitignored.
- **Gateway URLs expire**: if `capture-cnni-all.sh` returns 403, the URLs in the
  script need updating from the source playlist.
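
One way to pull a fresh resolved URL, assuming the FAST channel page is still resolvable by `yt-dlp` (the page URL below is a placeholder; check a metadata `.txt` for the real gateway URL):
```bash
# Print the currently resolved stream URL for a channel page
yt-dlp -g '<fast-channel-page-url>'
# Then paste the fresh URL into capture-cnni-all.sh
```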

## Tools available in devenv

`yt-dlp`, `ffmpeg`, `curl`, `jq` — enter via `direnv allow` or `devenv shell`.

## Visual frame inspection — image size rules

The model API returns **HTTP 413** when the total image payload of a single request is too large.
Keep all images sent to the model under ~150 KB each and no more than 2–3 per turn.
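
A quick pre-send size check (plain GNU `stat`, nothing workspace-specific):
```bash
stat -c '%s bytes' /tmp/frame.jpg   # keep each image under ~150 KB
```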

### Rule 1 — Always use JPEG, never PNG for video stills

```bash
# Good: small JPEG
ffmpeg -ss 00:19:30 -i source.mp4 -vframes 1 \
  -vf scale=640:-1 -q:v 3 /tmp/frame.jpg        # ~60 K

# Bad: lossless PNG
ffmpeg -ss 00:19:30 -i source.mp4 -vframes 1 /tmp/frame.png  # ~800 K — will 413
```

### Rule 2 — Use contact sheets for multi-frame timeline scans

Compose multiple frames into one tiled JPEG so only one image is sent:

```bash
# Extract 1 frame every 5 seconds as small JPEGs, then tile into a sheet
mkdir -p /tmp/frames   # ffmpeg will not create the output directory itself
ffmpeg -ss START -t DURATION -i source.mp4 \
  -vf "fps=1/5,scale=320:-1" -q:v 4 /tmp/frames/f%04d.jpg

ffmpeg -y -pattern_type glob -i '/tmp/frames/*.jpg' \
  -vf "tile=4x4" -q:v 4 /tmp/sheet.jpg            # typically 100–200 K for 16 frames
```
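
A `tile=4x4` grid wants 16 input frames; with fewer, ffmpeg flushes a partial sheet with empty cells. A quick count check before tiling:
```bash
ls /tmp/frames/*.jpg | wc -l   # 16 fills a 4x4 sheet exactly
```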

This is the approach that worked in session 204e1b54 for the 20-minute and 24-minute
window scans (`stills_20m_sheet.jpg` 167 K, `stills_24m_sheet.jpg` 98 K). Do not
switch to reading individual frames when the contact-sheet approach is already working.

### Rule 3 — Crop to the lower-third for chyron / name-plate reading

CNN lower-thirds occupy the bottom ~25% of the frame. Crop before sending:

```bash
ffmpeg -ss 00:19:30 -i source.mp4 -vframes 1 \
  -vf "crop=iw:ih*0.25:0:ih*0.75,scale=640:-1" \
  -q:v 3 /tmp/chyron.jpg                          # ~15–30 K
```

### Rule 4 — Use OCR instead of visual inspection for chyron text

For reading guest names, titles, or lower-third text, OCR is more reliable
than asking the model to read a frame and does not consume any image quota:

```bash
nix run nixpkgs#tesseract -- /tmp/chyron.jpg stdout 2>/dev/null
```

Tesseract works well on CNN-style high-contrast chyrons.
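
If OCR misreads a soft or low-bitrate frame, upscaling and flattening to grayscale first usually helps; this preprocessing step is a suggestion, not an existing workspace convention:
```bash
# 2x upscale + grayscale before OCR
ffmpeg -y -i /tmp/chyron.jpg -vf "scale=iw*2:-1,format=gray" /tmp/chyron_ocr.jpg
nix run nixpkgs#tesseract -- /tmp/chyron_ocr.jpg stdout 2>/dev/null
```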

### Quick reference

| Task | Correct approach | Avoid |
|---|---|---|
| Timeline scan (many frames) | ffmpeg → tile → 1 JPEG sheet | Reading N individual PNGs |
| Single frame check | JPEG `-q:v 3 scale=640:-1` (~60 K) | Full-res PNG (~800 K) |
| Chyron / lower-third text | Crop bottom 25% → OCR (tesseract) | Full-frame upload |
| Segment boundary inspection | 1 contact sheet (4×2 grid) over window | 8 separate reads in one turn |

## Ollama models — subagent use to avoid context overload

When a session accumulates a large context (long transcripts, many tool results, multiple
recording passes), offload isolated tasks to a fresh Pi subagent rather than continuing
in the same context. Use `pi --model <id> --non-interactive` or the Pi SDK.

### Available ollama models

#### Local (on-machine, no cloud)

| Pi model ID | Parameters | Vision | Notes |
|---|---|---|---|
| `ollama/qwen2.5vl:7b` | 8.3 B (Q4_K_M) | ✓ | Best for quick frame/chyron reads; low latency; limited context |

#### Cloud-proxied via local ollama (`:cloud` suffix)

These route through `localhost:11434` but are served by cloud infrastructure.

| Pi model ID | Vision | Reasoning | Context | Best for |
|---|---|---|---|---|
| `ollama/qwen3.5:397b-cloud` | ✓ | ✓ | 262 K | Main workhorse; long transcripts; complex multi-step tasks |
| `ollama/kimi-k2.5:cloud` | ✓ | ✓ | 262 K | Reasoning + vision; good alternative to qwen3.5 |
| `ollama/qwen3-coder-next:cloud` | — | — | — | Code-heavy tasks (script generation, ffmpeg pipelines) |
| `ollama/qwen3-vl:235b-cloud` | ✓ | ✓ | — | High-quality visual analysis when 7b local isn't accurate enough |
| `ollama/minimax-m2.7:cloud` | — | ✓ | 204 K | Large-context text reasoning; alternative when qwen3.5 is busy |
| `ollama/deepseek-v3.2:cloud` | — | ✓ | — | General reasoning; good for transcript cleanup |
| `ollama/gemini-3-flash-preview:cloud` | ✓ | ✓ | — | Fast vision + reasoning; flash-tier speed |

#### Enabled in Pi settings (immediately selectable with `/model`)

`ollama/qwen3.5:397b-cloud`, `ollama/qwen3-coder-next:cloud`,
`ollama/kimi-k2.5:cloud`, `ollama/qwen2.5vl:7b`

Others listed above are available in `~/.pi/agent/models.json` but not in
`enabledModels`; add them to `settings.json` if needed.
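
A sketch for enabling an extra model, assuming `settings.json` lives alongside `models.json` in `~/.pi/agent/` and holds an `enabledModels` array (verify the actual schema first):
```bash
# Append a model ID to enabledModels, writing via a temp file
jq '.enabledModels += ["ollama/deepseek-v3.2:cloud"]' ~/.pi/agent/settings.json \
  > /tmp/settings.json && mv /tmp/settings.json ~/.pi/agent/settings.json
```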

### Subagent patterns for this workspace

#### Chyron / lower-third OCR
Use the **local** `qwen2.5vl:7b` — fast, no cloud round-trip, accurate on
high-contrast CNN text:
```bash
# Crop to lower-third, send to local vision model
ffmpeg -ss 00:19:30 -i source.mp4 -vframes 1 \
  -vf "crop=iw:ih*0.25:0:ih*0.75,scale=640:-1" -q:v 3 /tmp/chyron.jpg
pi --model ollama/qwen2.5vl:7b --non-interactive \
  "What name and title appear in this lower-third? /tmp/chyron.jpg"
```

#### Transcript search in a long file
Open a fresh subagent pointed directly at the transcript file rather than
pasting it into an already-large context:
```bash
pi --model ollama/qwen3.5:397b-cloud --non-interactive \
  "Search /path/to/recording.transcript.txt for the first mention of
   'Khalil Helou' and return the surrounding paragraph with its timestamp."
```
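
For an exact string like a name, plain `grep` is cheaper and deterministic; save the subagent for fuzzy or semantic queries:
```bash
grep -n -m1 'Khalil Helou' /path/to/recording.transcript.txt
```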

#### Frame-range visual inspection (contact sheet)
Generate the sheet locally, then use the local VL model to read it — no
image goes to a cloud API, no 413 risk:
```bash
ffmpeg -y -pattern_type glob -i '/tmp/frames/*.jpg' \
  -vf "tile=4x4" -q:v 4 /tmp/sheet.jpg
pi --model ollama/qwen2.5vl:7b --non-interactive \
  "Identify any guest lower-thirds visible in this CNN contact sheet. /tmp/sheet.jpg"
```

#### Audio repair / transcription of a broken stream
When the source has HE-AAC with a mismatched sample-rate header (as seen in
the `us_cnn_international_hd` capture), patch the ADTS headers (one per
frame) before transcribing; do not just re-run Whisper on the raw file:
```bash
# See session 204e1b54 for the full ADTS 48→24 kHz patch script
# Short version: extract the bad-zone AAC, rewrite sample_freq_index=6 (24 kHz),
# decode with faad2, convert to MP3, then transcribe with the Pi transcribe skill
```

## Pi skills relevant to this workspace

| Skill          | Use for                                      |
|----------------|----------------------------------------------|
| `transcribe`   | Whisper ASR on captured audio/video           |
| `brave-search` | Look up context on interviewees or events     |
| `browser-tools`| Discover new stream URLs if FAST page changes |
