Template: llm-inference
Model: Qwen/Qwen2.5-0.5B-Instruct
Port: 8000

This template now defaults to a real vLLM startup and probe request. Set SIMULATE_ONLY=1 to skip the server launch and validate only the RunPod module wiring.

OpenAI-compatible checks from inside the pod:

curl http://127.0.0.1:8000/v1/models

curl http://127.0.0.1:8000/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{"model":"Qwen/Qwen2.5-0.5B-Instruct","messages":[{"role":"user","content":"Hello from RunPod"}]}'
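The same probe can be scripted from Python. This is a minimal sketch assuming the openai client package is installed in the pod (it is not a stated dependency of this template); it targets the same local endpoint and model as the curl examples above.

# Minimal Python equivalent of the curl checks, assuming `pip install openai`.
from openai import OpenAI

# vLLM's OpenAI-compatible server accepts any key unless one was configured.
client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="EMPTY")

# List served models; the output should include Qwen/Qwen2.5-0.5B-Instruct.
print([m.id for m in client.models.list().data])

# Send the same probe chat request as the second curl example.
resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    messages=[{"role": "user", "content": "Hello from RunPod"}],
)
print(resp.choices[0].message.content)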