# CNN Stream Capture & Analysis Workspace

Capture CNN International live streams, transcribe them, extract notable segments, and keep the finished media in local archives.

## Quick start

```bash
# Enter the Nix devenv (provides yt-dlp, ffmpeg, curl, jq)
direnv allow          # once; afterwards .envrc loads the env automatically on cd
# — or —
devenv shell          # manual
```

## Tools provided by devenv

| Tool     | Purpose                              |
|----------|--------------------------------------|
| `yt-dlp` | Stream discovery and download        |
| `ffmpeg` | Stream recording and clip extraction |
| `curl`   | URL resolution and HTTP probing      |
| `jq`     | JSON processing                      |
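
Before capturing, it can help to confirm that these tools are actually on `PATH`, for example when `direnv` has not loaded yet. The following is a minimal Bash sketch; `check_tools` is a hypothetical helper and is not shipped by devenv.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which of the given tools are missing from PATH.
# Returns 0 if all are present, 1 otherwise.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Typical use: `check_tools yt-dlp ffmpeg curl jq || echo "enter the devenv first"`.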

## Capture modes

### 1. CNN FAST streams (via yt-dlp)

Devenv exposes three helper scripts that target the CNN FAST page
(`https://edition.cnn.com/videos/fast/cnni-fast`):

```bash
cnn-fast-list              # list available stream formats
cnn-fast-info              # dump extractor JSON metadata
cnn-fast-grab              # download from the default CNN FAST URL
cnn-fast-grab <url>        # override the target URL
cnn-fast-grab <url> -f 'bv*+ba/b' # pass extra yt-dlp flags (quote the selector so the shell does not glob it)
```
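
The helpers are defined in `devenv.nix`, and their exact bodies are not reproduced here. The sketch below is an assumption about how `cnn-fast-grab` behaves: a leading URL overrides the default page, and all other arguments are forwarded to `yt-dlp`. `cnn-fast-list` and `cnn-fast-info` presumably wrap yt-dlp's `-F` (list formats) and `-J` (dump JSON), though this is unconfirmed. `cnn_fast_grab_sketch` and the `YTDLP` override are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical approximation of the devenv `cnn-fast-grab` helper;
# the real definition lives in devenv.nix and may differ.
CNN_FAST_URL="https://edition.cnn.com/videos/fast/cnni-fast"

cnn_fast_grab_sketch() {
  local url=$CNN_FAST_URL
  # A leading http(s) argument overrides the default page URL.
  if [[ ${1:-} == http://* || ${1:-} == https://* ]]; then
    url=$1
    shift
  fi
  # Everything else is forwarded verbatim to yt-dlp (e.g. -f 'bv*+ba/b').
  # YTDLP lets you substitute another command, such as echo, for a dry run.
  "${YTDLP:-yt-dlp}" "$@" "$url"
}
```

For a dry run, `YTDLP=echo cnn_fast_grab_sketch https://example.com/live -f 'bv*+ba/b'` prints the arguments that would be handed to `yt-dlp` instead of downloading.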

### 2. Multi-stream gateway recording (via ffmpeg)

`capture-cnni-all.sh` records three CNNi feeds simultaneously from
gateway URLs (gohyperspeed / SceneTime-sourced Pluto TV feeds):

| Key                  | Label                  |
|----------------------|------------------------|
| `usa_cnni_hd_east`   | USA: CNNi HD East      |
| `pluto_germany_cnni` | (PLUTO Germany) CNNi   |
| `pluto_uk_cnni`      | (PLUTO UK) CNNi        |

```bash
./capture-cnni-all.sh                  # record until Ctrl-C
./capture-cnni-all.sh 1800             # stop after 1800 s
OUTDIR=./recordings ./capture-cnni-all.sh 600
```
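
`capture-cnni-all.sh` is not reproduced here. The function below only illustrates one ffmpeg invocation per feed that would yield the `.mp4` and `.log` files described next. It re-encodes with libx264/AAC as an assumption. If the feeds already carry H.264/AAC, `-c copy` would be much cheaper, and the real script may use that instead. `record_feed` and the `FFMPEG` override are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical per-feed recorder; the real script's flags may differ.
record_feed() {
  local key=$1 url=$2 outdir=$3 duration=${4:-}
  local -a args=(-hide_banner -nostdin -i "$url")
  # Optional duration limit in seconds; omitted means "record until stopped".
  if [[ -n $duration ]]; then
    args+=(-t "$duration")
  fi
  # H.264 + AAC, with the moov index moved to the front of the file
  # (faststart) so browsers can begin playback before the download finishes.
  args+=(-c:v libx264 -c:a aac -movflags +faststart "$outdir/$key.mp4")
  # ffmpeg logs to stderr; keep that next to the recording as <key>.log.
  "${FFMPEG:-ffmpeg}" "${args[@]}" 2>"$outdir/$key.log"
}
```

To record several feeds at once, start one `record_feed ... &` per feed and then `wait`. Stop ffmpeg with Ctrl-C (SIGINT) or a `-t` limit rather than `kill -9`. Otherwise the MP4 muxer never writes its index, and the file is not playable.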

Each run creates a timestamped directory under `recordings/` containing:
- `<key>.mp4` — browser-playable recording (H.264 + AAC, faststart)
- `<key>.log` — ffmpeg stderr
- `<key>.txt` — metadata (label, gateway URL, resolved URL, start time)

Finished sessions are moved to `archives/recordings/<timestamp>/`.
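
The directory naming and the archive move can be sketched as follows. The `YYYYMMDD_HHMMSS` format is inferred from the example session `20260403_145807`. It is also assumed that the move is a plain `mv`, since this README does not say whether the script does it or it is done by hand. `new_session_dir` and `archive_session` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the real script's naming and move logic may differ.

# Create and print a session directory named YYYYMMDD_HHMMSS (local time).
new_session_dir() {
  local base=${1:-recordings}
  local dir
  dir="$base/$(date +%Y%m%d_%H%M%S)"
  mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Move a finished session into archives/recordings/<timestamp>/,
# refusing to overwrite an existing archive with the same timestamp.
archive_session() {
  local session=${1%/} root=${2:-.}
  local ts dest
  ts=$(basename "$session")
  dest="$root/archives/recordings/$ts"
  if [[ -e $dest ]]; then
    echo "already archived: $dest" >&2
    return 1
  fi
  mkdir -p "$root/archives/recordings"
  mv "$session" "$dest"
}
```

For example, `archive_session recordings/20260403_145807` run from the repository root moves that session into `archives/recordings/`.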

## Directory structure

```
.
├── AGENTS.md                  # workspace-specific agent guidance
├── archives/                  # finished captures and supporting artifacts
│   ├── clips/<timestamp>/     # archived comparison clips
│   └── recordings/<timestamp>/ # archived full capture sessions
├── capture-cnni-all.sh        # multi-stream recorder
├── clips/
│   └── index.html             # side-by-side viewer for archived clips
├── devenv.nix                 # Nix devenv config (packages + helper scripts)
├── devenv.yaml / devenv.lock  # devenv inputs
└── recordings/                # active capture landing zone for new runs
```

## Post-capture workflow

1. **Transcribe** a recording with the Pi `transcribe` skill (Groq Whisper).
2. **Extract** notable segments with ffmpeg timestamp trimming.
3. **Clean up** the raw ASR output into a readable, speaker-labelled interview excerpt.
4. **Move finished artifacts into `archives/`** so the working tree stays tidy.
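
Step 2 can be sketched as a small Bash function. It assumes that the transcript gives whole-second timestamps such as `00:12:00`. `to_seconds`, `extract_clip`, and the `FFMPEG` override are hypothetical names. Re-encoding is a deliberate choice here: `-c copy` would be faster, but its cuts snap to keyframes. With input seeking (`-ss` before `-i`), ffmpeg decodes and discards frames up to the exact start point when it transcodes.

```bash
#!/usr/bin/env bash
# Convert HH:MM:SS, MM:SS or SS (whole seconds) to a number of seconds.
to_seconds() {
  local re='^[0-9]+(:[0-9]+){0,2}$'
  [[ $1 =~ $re ]] || { echo "bad timestamp: $1" >&2; return 1; }
  local -a parts
  local p total=0
  IFS=: read -ra parts <<<"$1"
  for p in "${parts[@]}"; do
    total=$(( total * 60 + 10#$p ))   # 10# avoids octal parsing of "08"
  done
  echo "$total"
}

# Cut [start, end) out of a recording into a browser-playable MP4.
extract_clip() {
  local input=$1 start=$2 end=$3 out=$4 s e
  s=$(to_seconds "$start") || return 1
  e=$(to_seconds "$end") || return 1
  if (( e <= s )); then
    echo "end ($end) must be after start ($start)" >&2
    return 1
  fi
  "${FFMPEG:-ffmpeg}" -hide_banner -nostdin -ss "$s" -i "$input" -t "$(( e - s ))" \
    -c:v libx264 -c:a aac -movflags +faststart "$out"
}
```

For example, `extract_clip recording.mp4 00:12:00 00:17:30 clip.mp4` would produce a 330-second clip. The timestamps are illustrative only.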

Example (from `archives/recordings/20260403_145807`):
- Full recording → `pluto_germany_cnni.mp4` (464 MB, ~30 min)
- Transcript → `pluto_germany_cnni.transcript.txt`
- Interview clip → `pluto_germany_cnni_becky-anderson_fuad-siniora_interview.mp4` (~85 MB)
- Cleaned excerpt → `pluto_germany_cnni_becky-anderson_fuad-siniora_excerpt.txt`
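
The clip and excerpt names above appear to follow the pattern `<key>_<speaker-slugs>_<kind>.<ext>`. This is inferred from this single example, not from a documented convention. A hypothetical helper that reproduces it:

```bash
#!/usr/bin/env bash
# Hypothetical naming helper inferred from one archived example.

# Lower-case a name and collapse every non-alphanumeric run into "-".
slugify() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-' |
    sed 's/^-*//; s/-*$//'
}

# artifact_name <key> <kind> <ext> <speaker>...
artifact_name() {
  local key=$1 kind=$2 ext=$3
  shift 3
  (( $# > 0 )) || { echo "need at least one speaker name" >&2; return 1; }
  local name slugs=()
  for name in "$@"; do
    slugs+=("$(slugify "$name")")
  done
  local IFS=_   # join the speaker slugs with underscores
  printf '%s_%s_%s.%s\n' "$key" "${slugs[*]}" "$kind" "$ext"
}
```

`artifact_name pluto_germany_cnni interview mp4 'Becky Anderson' 'Fuad Siniora'` yields the interview clip name listed above.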

## Viewing clips side-by-side

Serve the repository root so the viewer can reach `archives/`:

```bash
cd /home/mnm/workspaces/cnn   # repository root; adjust to your checkout
python3 -m http.server 8000
# open http://localhost:8000/clips/index.html in a browser
```

## Notes

- Gateway URLs may expire or return 403; re-discover from the source playlist if needed.
- Large media and archive artifacts are `.gitignore`d — only scripts and config are versioned.
- The archived clip set is a one-time survey of CNN channels across Pluto TV regions (USA, CA, UK, France, Germany, Italy) and US direct feeds.
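
Both the first note above and the "resolved URL" stored in each `<key>.txt` depend on HTTP probing with `curl`. The following is a minimal sketch. It assumes that a gateway answers a plain GET and redirects to the real stream. The method the script actually uses is not documented here. `probe_url` and `status_hint` are hypothetical names, and the mapping from HTTP codes to hints is a heuristic.

```bash
#!/usr/bin/env bash
# Hypothetical probe: print "<http_code> <final_url>" after following
# redirects. --max-time bounds the request in case the URL streams forever;
# curl still prints the -w line (000 on connection or DNS failure).
probe_url() {
  curl -s -L -o /dev/null --max-time "${2:-10}" \
    -w '%{http_code} %{url_effective}\n' "$1" || true
}

# Heuristic interpretation of the HTTP status code.
status_hint() {
  case $1 in
    2??) echo "ok" ;;
    401|403|404|410) echo "expired or blocked: re-discover from the source playlist" ;;
    000) echo "unreachable (DNS, connection or timeout error)" ;;
    *) echo "unexpected HTTP $1" ;;
  esac
}
```

For example: `read -r code final < <(probe_url "$GATEWAY_URL"); status_hint "$code"`. Here `GATEWAY_URL` is a placeholder for one of the feed URLs.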
