Template: llm-inference
Model: Qwen/Qwen2.5-0.5B-Instruct
Port: 8000

This template now defaults to a real vLLM startup and probe request. Set SIMULATE_ONLY=1 to skip the server launch and validate only the RunPod module wiring.

OpenAI-compatible checks from inside the pod:

curl http://127.0.0.1:8000/v1/models

curl http://127.0.0.1:8000/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{"model":"Qwen/Qwen2.5-0.5B-Instruct","messages":[{"role":"user","content":"Hello from RunPod"}]}'
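The same probe can be scripted from Python. This is a minimal sketch assuming the openai client package is installed in the pod (it is not a stated dependency of this template); it targets the same local endpoint and model as the curl examples above.

# Minimal Python equivalent of the curl checks, assuming `pip install openai`.
from openai import OpenAI

# vLLM's OpenAI-compatible server accepts any key unless one was configured.
client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="EMPTY")

# List served models; the output should include Qwen/Qwen2.5-0.5B-Instruct.
print([m.id for m in client.models.list().data])

# Send the same probe chat request as the second curl example.
resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    messages=[{"role": "user", "content": "Hello from RunPod"}],
)
print(resp.choices[0].message.content)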