# runpod-devenv-module

A reusable [devenv](https://devenv.sh) module that provides scripted access to a generic RunPod remote GPU job runner.

The `devenv` shell includes `runpodctl`, and the module delegates to it wherever it is a clean fit rather than re-implementing the same API surface.

## Usage

**`devenv.yaml`**
```yaml
inputs:
  devenv-runpod:
    url: github:cultscale/runpod-devenv-module
```

**`devenv.nix`**
```nix
{ inputs, pkgs, ... }: {
  imports = [ inputs.devenv-runpod.devenvModules.runpod ];

  runpod = {
    enable = true;

    defaults = {
      image      = "docker.io/myorg/trainer:latest";
      diskGb     = 100;
      minVramGb  = 48;
    };

    jobs.my-job = {
      gpuPriority = [ "NVIDIA RTX 6000 Ada Generation" ];
      entrypoint  = "./train.sh";
      ports       = "22/tcp,8000/http";
      passEnvVars = [ "HF_TOKEN" ];
    };
  };
}
```

This generates a `runpod-submit-my-job` script, along with the shared `runpod-status`, `runpod-logs`, `runpod-download`, `runpod-destroy`, and `runpod-search` scripts.

When the entrypoint is part of the uploaded workspace, use a relative path such as `./train.sh`. The runner changes into the remote copy of the workspace before launching it.

If your image already contains the job code, omit `workspace` and use an absolute entrypoint such as `/opt/runpod/bin/serve.sh`.

Inside the shell you can also use `runpodctl` directly for lower-level RunPod tasks such as listing GPUs, inspecting pods, or managing SSH keys.

## Environment variables

| Variable | Description |
|---|---|
| `RUNPOD_API_KEY` | RunPod API key |
| `RUNPOD_SSH_KEY_PATH` | Path to SSH private key |
| `RUNPOD_IMAGE` | Docker image override (optional) |
| `RUNPOD_SSH_USER` | SSH user override (default: `root`) |

Set `RUNPOD_API_KEY` and `RUNPOD_SSH_KEY_PATH` in the calling shell, `.envrc`, or inline on the command you run. The module intentionally leaves credential values unset so inherited secrets continue to work inside `devenv`.
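
For example, a minimal `.envrc` for use with `direnv` (the key values below are placeholders):

```bash
# .envrc — loaded by direnv before the devenv shell starts
export RUNPOD_API_KEY="rpa_..."                     # placeholder; use your real key
export RUNPOD_SSH_KEY_PATH="$HOME/.ssh/id_ed25519"  # private key registered with RunPod
```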

## Module options

See [`default.nix`](./default.nix) for all typed options under `runpod.*`.

## Workflow

```bash
# Submit a job with an uploaded workspace
runpod-submit-my-job --workspace /path/to/prepared-workspace

# Or run directly from the container image
runpod-submit-my-job

# Monitor
runpod-status --run <run-id>
runpod-logs --run <run-id>

# Download artifacts
runpod-download --run <run-id>

# Clean up
runpod-destroy --run <run-id>
```

## `runpodctl` integration

`runpodctl` is included in the `devenv` shell, and the module prefers it wherever it maps cleanly onto the workflow:

- `runpod-search` prefers `runpodctl gpu list`
- on-demand pod creation prefers `runpodctl pod create`
- pod summary and teardown prefer `runpodctl pod list` and `runpodctl pod delete`
- direct `runpodctl` use stays available for manual pod and SSH-key management
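
As a rough sketch, that delegation corresponds to direct invocations like these (subcommand names as listed above; exact flags vary by `runpodctl` version):

```bash
runpodctl gpu list             # availability query behind runpod-search
runpodctl pod list             # pod summary behind runpod-status
runpodctl pod delete <pod-id>  # teardown behind runpod-destroy
```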

Before creating a pod, the runner resolves the configured GPU preference list down to a single available GPU. The custom Python runner still covers the higher-level parts that `runpodctl` does not replace cleanly in this repo yet: workspace bundling, remote entrypoint orchestration, artifact sync, the SSH readiness and runtime-port poll, and spot pod creation.

## Example templates

The repository includes reference templates under `templates/`:

- `templates/llm-inference` — vLLM-style OpenAI-compatible API deployment
- `templates/lora-trainer` — PEFT / Unsloth-flavoured LoRA or QLoRA training
- `templates/video-finetuner` — Diffusers-style video finetuning

Each template's `devenv.yaml` imports this repository's root `default.nix`, so it exercises the local module directly. Treat the templates as starting points for real implementations: each one shows the job shape, env contract, artifact layout, and a corresponding image-first path.

The training-oriented templates default to `SIMULATE_ONLY=1` so you can validate the module workflow before wiring in a real image and training command. The `llm-inference` template instead defaults to a real vLLM startup plus a probe request against a small instruct model, with `SIMULATE_ONLY=1` still available for smoke tests.
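
For example, to switch a training template from simulation to a real run (the `runpod-submit-train` script name and the env-var override are illustrative; check the submit script the template actually defines):

```bash
# SIMULATE_ONLY=1 is the template default; override it once a real image
# and training command are wired in
SIMULATE_ONLY=0 runpod-submit-train --workspace ./workspace
```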

Inside a template directory, `devenv.local.nix` is auto-loaded by `devenv` and is a good place for local-only secrets such as `RUNPOD_API_KEY`, `RUNPOD_SSH_KEY_PATH`, and `HF_TOKEN`, or for a temporary `runpod.defaults.autoDestroy = false;` override while debugging.
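
A minimal sketch (every value below is a placeholder):

```nix
# devenv.local.nix — auto-loaded by devenv, typically kept out of version control
{ ... }: {
  env.RUNPOD_API_KEY = "rpa_...";                       # local-only secret
  env.RUNPOD_SSH_KEY_PATH = "/home/me/.ssh/id_ed25519";
  env.HF_TOKEN = "hf_...";
  runpod.defaults.autoDestroy = false;                  # keep the pod alive while debugging
}
```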

## Image-first workflow

For repeatable production-style runs, prefer a prebuilt image that already contains your app code and stable runtime dependencies. In that mode:

- bake your entrypoint into the image
- omit `workspace`
- use an absolute `entrypoint`
- pass only runtime config and secrets via environment variables

Example:

```nix
jobs.serve = {
  image = "ghcr.io/acme/llm-inference:2026-03-19";
  workspace = "";
  entrypoint = "/opt/runpod/bin/serve.sh";
  ports = "22/tcp,8000/http";
  passEnvVars = [ "HF_TOKEN" ];
};
```

The repository supports both patterns:

- uploaded workspace + relative entrypoint for fast iteration
- image-owned absolute entrypoint for prebuilt images

Each sample template follows the same progression:

- start with the checked-in workspace script while iterating
- bake that script and its stable dependencies into your image
- set `RUNPOD_IMAGE` plus `RUNPOD_IMAGE_ENTRYPOINT` locally to switch the template into image-owned mode
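
For example, switching from the calling shell (the image tag and entrypoint path reuse the illustrative values from the image-first example above):

```bash
export RUNPOD_IMAGE="ghcr.io/acme/llm-inference:2026-03-19"
export RUNPOD_IMAGE_ENTRYPOINT="/opt/runpod/bin/serve.sh"
runpod-submit-serve   # no workspace upload; the image-owned entrypoint runs directly
```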

## Live validation notes

A real RunPod smoke test was completed with `templates/llm-inference` in `SIMULATE_ONLY=1` mode. The successful pod used:

- image `runpod/pytorch:1.0.2-cu1281-torch280-ubuntu2404`
- GPU `NVIDIA RTX A5000`

The uploaded workspace completed end-to-end and synced back:

- `status.json`
- `artifacts/connection.txt`
- `artifacts/launch-plan.txt`

For one-off commands from a template directory, pass credentials from the calling environment:

```bash
cd templates/llm-inference
RUNPOD_API_KEY=... RUNPOD_SSH_KEY_PATH=~/.ssh/id_ed25519 \
  devenv shell -- runpod-search --limit 3
```

Use a RunPod-ready image that keeps SSH available. During live testing, plain `docker.io/library/ubuntu:22.04` exited immediately, while the RunPod PyTorch image above worked reliably for SSH, bundle upload, and remote script launch.

The `llm-inference` template defaults to that same RunPod PyTorch image and requests `ports = "22/tcp,8000/http"` so the uploaded workspace can start vLLM, expose an OpenAI-compatible API on port `8000`, and sync back probe artifacts such as `artifacts/models.json` and `artifacts/probe-response.json`.
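
For a manual probe against a running pod (the host and model id are placeholders; `/v1/models` and `/v1/chat/completions` are the standard OpenAI-compatible routes vLLM exposes):

```bash
POD_HOST="<pod-public-host>"   # e.g. from runpod-status or the synced connection.txt
curl -s "http://$POD_HOST:8000/v1/models"
curl -s "http://$POD_HOST:8000/v1/chat/completions" \
  -H 'Content-Type: application/json' \
  -d '{"model": "<model-id>", "messages": [{"role": "user", "content": "Hello"}]}'
```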

A fresh end-to-end `runpod-submit-serve` run also completed successfully against the real inference template, returning a synced OpenAI-compatible probe response with assistant content `Hello! How can I assist you today?`.

The RunPod PyTorch base image is PEP 668-managed, so the inference template bootstraps `vllm` inside a workspace-local virtualenv (`.venv`) before starting the server. Expect the first cold start on a fresh pod to spend several minutes installing that runtime.
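
Roughly, the bootstrap reduces to the following (an illustrative sketch, not the template's exact script; `$MODEL_ID` stands in for the configured model):

```bash
# PEP 668 blocks system-wide pip installs on the base image,
# so provision a workspace-local virtualenv instead
python3 -m venv .venv
.venv/bin/pip install vllm
.venv/bin/python -m vllm.entrypoints.openai.api_server \
  --model "$MODEL_ID" --port 8000
```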

If you want to switch the `llm-inference` template to a prebuilt image without editing committed files, set `RUNPOD_IMAGE` and `RUNPOD_IMAGE_ENTRYPOINT` in `templates/llm-inference/devenv.local.nix`. When `RUNPOD_IMAGE_ENTRYPOINT` is non-empty, the template stops uploading `workspace/` and launches the image-owned entrypoint directly.

The repository includes `templates/llm-inference/image/Dockerfile`, `templates/lora-trainer/image/Dockerfile`, and `templates/video-finetuner/image/Dockerfile` as starting points for baking the sample entrypoints into your own images. Build them from the repository root so Docker can copy the template assets:

```bash
docker build -f templates/llm-inference/image/Dockerfile -t ghcr.io/acme/llm-inference:latest .
```

After adding image-first support, both launch modes were exercised live:

- image mode completed with no uploaded workspace using an absolute entrypoint baked into the image (`launchMode = "image"`)
- workspace mode still completed successfully with `SIMULATE_ONLY=1` (`launchMode = "workspace"`)

`runner.py` sends an explicit `User-Agent` header to `api.runpod.io/graphql`. In live testing, Python's default `urllib` user agent was rejected by the RunPod edge with Cloudflare `error code: 1010`, while `curl` and `urllib` with an explicit user agent both succeeded.
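
A quick way to reproduce the check from the shell (the query body is illustrative; RunPod's GraphQL endpoint accepts the API key as a query parameter):

```bash
# An explicit User-Agent avoids the Cloudflare 1010 rejection seen with urllib's default
curl -s "https://api.runpod.io/graphql?api_key=$RUNPOD_API_KEY" \
  -H 'Content-Type: application/json' \
  -H 'User-Agent: runpod-devenv-module' \
  -d '{"query": "query { myself { id } }"}'
```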

## License

MIT
