# Builder operator runbook

Canonical workflow for a Hetzner auction builder in this repo.

## Preconditions

- An active `devenv shell` session
- Hetzner Robot Webservice credentials loaded in `.env`
- Local SSH key present at `~/.ssh/id_ed25519`

## 1. Find and order a server

Preview candidates:

```bash
devenv tasks run --show-output builder:server:find-cheapest
devenv tasks run --show-output builder:server:find-cheapest-gpu
devenv tasks run --show-output builder:server:find-cheapest-ada
```

Plan or execute an order:

```bash
devenv tasks run --show-output builder:server:order:plan-cheapest
devenv tasks run --show-output builder:server:order:execute-cheapest
devenv tasks run --show-output builder:server:order:wait
devenv tasks run --show-output builder:server:select-latest
```

## 2. Rebuild the host through rescue into the canonical builder

Full end-to-end path:

```bash
devenv tasks run --show-output builder:validate:end-to-end
```

Manual canonical step-by-step path:

```bash
devenv tasks run --show-output builder:host:rescue:plan
devenv tasks run --show-output builder:host:rescue:execute
devenv tasks run --show-output builder:host:inspect
devenv tasks run --show-output builder:host:rootfs:prepare
devenv tasks run --show-output builder:host:boot:plan
devenv tasks run --show-output builder:host:boot:execute
devenv tasks run --show-output builder:local:register-remote
devenv tasks run --show-output builder:local:state-check
devenv tasks run --show-output builder:validate:remote-fresh
```
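When running the manual path unattended, it can help to stop at the first failing step instead of continuing onto a half-prepared host. The sketch below is one way to do that, not part of the repo tooling: `DRY_RUN` and `run_steps` are hypothetical names introduced here, and only the task names come from the list above.

```bash
set -eu

# Task names copied from the manual path above, in order.
STEPS="
builder:host:rescue:plan
builder:host:rescue:execute
builder:host:inspect
builder:host:rootfs:prepare
builder:host:boot:plan
builder:host:boot:execute
builder:local:register-remote
builder:local:state-check
builder:validate:remote-fresh
"

# DRY_RUN is a hypothetical switch for this sketch. It defaults to 1, so
# nothing touches the server until it is explicitly set to 0.
run_steps() {
  local step
  for step in $STEPS; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "would run: devenv tasks run --show-output ${step}"
    else
      echo "==> ${step}" >&2
      # Stop at the first failing step and report which one it was.
      devenv tasks run --show-output "${step}" || {
        echo "step failed: ${step}" >&2
        return 1
      }
    fi
  done
}

run_steps
```

Running it as-is only prints the commands; setting `DRY_RUN=0` would execute them in sequence.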

## 3. Use the remote builder for repo builds

```bash
export SPY_NIX_BUILD_OPTS="--builders 'ssh-ng://builder@[<builder-host>] x86_64-linux ~/.ssh/id_ed25519 24 48 benchmark,big-parallel,cuda - <host-key-base64>'"

devenv tasks run --show-output train:step1-scenes
```

**Do not add `--max-jobs 0`.**

Without that flag, Nix builds CPU-only steps locally and dispatches only
GPU-tagged derivations (`requiredSystemFeatures = ["cuda"]`) to the remote
builder, so GPU server time is not spent on work that doesn't need it.
Setting `--max-jobs 0` disables local building and forces every derivation
onto the remote builder.

On real GPU hosts, `builder:local:register-remote` advertises the registered features saved in state, typically including `cuda`.
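The `--builders` value in the export above packs eight space-separated fields into one string, which is easy to misorder by hand. The sketch below assembles the same string field by field, following the field order of Nix's `builders` setting. The variable names, including the `BUILDER_HOST` and `BUILDER_HOST_KEY_B64` environment variables, are hypothetical and exist only for this illustration; the placeholder defaults must be replaced with real values.

```bash
set -eu

# Hypothetical inputs: substitute the real builder address and host key.
builder_host="${BUILDER_HOST:-<builder-host>}"
host_key_b64="${BUILDER_HOST_KEY_B64:-<host-key-base64>}"

store_uri="ssh-ng://builder@[${builder_host}]"   # 1. remote store URI
system="x86_64-linux"                            # 2. platform the builder serves
ssh_key="~/.ssh/id_ed25519"                      # 3. SSH identity file (kept literal, as in the command above)
max_jobs="24"                                    # 4. maximum parallel builds on the builder
speed_factor="48"                                # 5. relative speed factor
supported_features="benchmark,big-parallel,cuda" # 6. features the builder supports
mandatory_features="-"                           # 7. mandatory features ("-" leaves the field empty)
                                                 # 8. base64-encoded public host key (host_key_b64)

spec="${store_uri} ${system} ${ssh_key} ${max_jobs} ${speed_factor} ${supported_features} ${mandatory_features} ${host_key_b64}"

export SPY_NIX_BUILD_OPTS="--builders '${spec}'"
printf '%s\n' "$SPY_NIX_BUILD_OPTS"
```

Because field 7 is `-`, the builder does not require any feature of the derivations it accepts; field 6 only lists what it is able to provide.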

## 4. Check builder state

```bash
devenv tasks run --show-output builder:local:state-check
```

This checks:

- canonical state files exist and are internally consistent
- registered features are a subset of supported features
- `builder-remote.conf` matches the saved host and host key
- stale legacy state files are absent
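The feature check in the list above is a plain subset test. As a rough illustration of the idea, and not the actual implementation behind `builder:local:state-check`, the sketch below assumes both feature sets are available as comma-separated strings; `features_subset` is a hypothetical helper name.

```bash
set -eu

# features_subset REGISTERED SUPPORTED
# Both arguments are comma-separated feature lists (an assumed format).
# Returns 0 when every registered feature also appears in the supported list.
features_subset() {
  local registered="$1"
  local supported="$2"
  local old_ifs="$IFS"
  local feature
  IFS=','
  for feature in $registered; do
    # Wrap both sides in commas so "cuda" cannot match inside "cuda-extra".
    case ",${supported}," in
      *",${feature},"*) ;;
      *)
        IFS="$old_ifs"
        echo "unsupported feature: ${feature}" >&2
        return 1
        ;;
    esac
  done
  IFS="$old_ifs"
  return 0
}

if features_subset "big-parallel,cuda" "benchmark,big-parallel,cuda"; then
  echo "registered features OK"
fi
```

A registered feature missing from the supported list makes the function return non-zero and names the offending feature on stderr.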

## 5. Monitor running builds

```bash
# One-shot GPU/CPU/RAM/disk status
bash tools/builder-progress-check.sh

# Continuous (every 15s)
bash tools/builder-progress-check.sh --watch

# Custom interval
bash tools/builder-progress-check.sh --watch --interval 5
```
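`--watch` runs until interrupted. Where a bounded run is preferable, for example in a scripted check, the one-shot mode can be wrapped in a small loop. This is a sketch only: `poll` is a hypothetical helper, not something the repo provides.

```bash
set -eu

# poll INTERVAL COUNT COMMAND [ARGS...]
# Runs COMMAND COUNT times, INTERVAL seconds apart, and stops early if it fails.
poll() {
  local interval="$1"
  local count="$2"
  local i=1
  shift 2
  while [ "$i" -le "$count" ]; do
    "$@" || return 1
    # No need to sleep after the final run.
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}

# Example: twenty one-shot checks, 15 seconds apart (just under five minutes).
# poll 15 20 bash tools/builder-progress-check.sh
```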

## 6. Cancellation / teardown

Show cancellation status:

```bash
devenv tasks run --show-output builder:server:cancellation:show
```

Plan or execute immediate cancellation:

```bash
devenv tasks run --show-output builder:server:cancellation:plan-now
devenv tasks run --show-output builder:server:cancellation:execute-now
```

Revoke cancellation:

```bash
devenv tasks run --show-output builder:server:cancellation:revoke
devenv tasks run --show-output builder:server:cancellation:revoke-execute
```

