What this does

Exposes a running vLLM instance through REST endpoints that follow the OpenAI Chat Completions and Completions APIs. A client library built for the OpenAI API can usually target the local vLLM server by changing only its base URL (for example, to http://localhost:8000/v1) and its API key setting.
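In practice the configuration change is mostly the base URL. The sketch below uses only the Python standard library, so it does not assume any particular OpenAI SDK. The helper name build_chat_request and the default base URL http://localhost:8000/v1 are illustrative assumptions. The payload fields are the ones used in the curl examples in this guide.

```python
import json
import urllib.request

# Assumption: vLLM listening on localhost:8000 (as in `vllm serve ... --port 8000`).
LOCAL_BASE_URL = "http://localhost:8000/v1"


def build_chat_request(base_url, model, messages, api_key=None, **params):
    """Build (but do not send) an OpenAI-style Chat Completions request.

    Only base_url differs between a hosted OpenAI endpoint and a local vLLM
    server; the path, headers and JSON body keep the same shape.
    """
    body = {"model": model, "messages": messages, **params}
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        # Only needed if the server enforces a key (e.g. started with --api-key).
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request(
        LOCAL_BASE_URL,
        "<model-id>",  # placeholder: use an id returned by /v1/models
        [{"role": "user", "content": "What is 2+2?"}],
        max_tokens=20,
        temperature=0,
    )
    print(req.full_url)  # http://localhost:8000/v1/chat/completions
    # Sending it: urllib.request.urlopen(req) once the server is ready.
```

Official SDKs expose the same idea as a base URL option. The exact option name depends on the client, so check its documentation.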

Steps

Start the vLLM server. --host sets the network interface to bind (0.0.0.0 accepts connections on all interfaces); --port sets the TCP port.
```
vllm serve <model> --host 0.0.0.0 --port 8000
```
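Expected output: a log line reading Uvicorn running on http://0.0.0.0:8000. The line appears only after the model weights have loaded, which can take a while for large models.

Check server readiness. The /v1/models endpoint confirms the API is responding.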
```
curl -s http://localhost:8000/v1/models | python -m json.tool
```
Expected output: a JSON object with a data array containing model metadata.
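For scripts that start the server and then send requests, a small polling loop avoids racing the model load. This is a minimal sketch. It assumes the base URL http://localhost:8000/v1 and the response shape described above: a data array whose entries carry an id field. wait_until_ready and list_model_ids are hypothetical helper names.

```python
import json
import time
import urllib.request


def list_model_ids(models_payload):
    """Return the model ids from a parsed /v1/models response body."""
    return [entry["id"] for entry in models_payload.get("data", [])]


def wait_until_ready(base_url="http://localhost:8000/v1", timeout_s=300.0, interval_s=2.0):
    """Poll /v1/models until it answers with HTTP 200 or timeout_s elapses.

    interval_s is used both as the per-request timeout and as the pause
    between attempts. Returns the served model ids, or None on timeout.
    """
    deadline = time.monotonic() + timeout_s
    url = base_url.rstrip("/") + "/models"
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return list_model_ids(json.load(resp))
        except OSError:
            # Connection refused, socket timeouts and HTTP errors
            # (URLError/HTTPError subclass OSError) all mean "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_s)
```

For example, ids = wait_until_ready() blocks until the server answers and returns a list such as the one printed by the curl command above. The first entry is then a valid value for the model field.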

Send a chat completions request. The headers and JSON body follow the OpenAI Chat Completions format; replace <model-id> with an id returned by /v1/models.

```
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model-id>",
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    "max_tokens": 20,
    "temperature": 0
  }'
```

Expected output: JSON with choices[0].message.content containing the model's response.
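When a script consumes the response, the reply text lives at the path named above. This is a minimal parsing sketch using the standard json module. extract_reply is a hypothetical helper, and the sample body is illustrative, not captured server output.

```python
import json


def extract_reply(response_body):
    """Return choices[0].message.content from a Chat Completions response.

    response_body may be raw JSON text/bytes or an already-parsed dict.
    Raises ValueError when the body is not a successful completion
    (for example an error object), so failures are not mistaken for empty text.
    """
    if isinstance(response_body, (str, bytes)):
        data = json.loads(response_body)
    else:
        data = response_body
    choices = data.get("choices")
    if not choices:
        raise ValueError(f"no choices in response: {data!r}")
    message = choices[0].get("message") or {}
    content = message.get("content")
    if content is None:
        raise ValueError(f"first choice has no message content: {choices[0]!r}")
    return content


if __name__ == "__main__":
    # Field layout follows the path above; the values are made up.
    sample = '{"choices": [{"index": 0, "message": {"role": "assistant", "content": "4"}, "finish_reason": "stop"}]}'
    print(extract_reply(sample))  # 4
```

Raising instead of returning an empty string keeps a misnamed model or malformed request from passing silently as a blank answer.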

Record the run for reproducibility. Save the exact command, the vLLM and Python versions, the model name, and the observed output so the result can be reproduced later.
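One way to capture these details consistently is to write them as JSON next to the saved output. This is a minimal sketch. The run_record helper and its field names are hypothetical. vllm_version is read from installed Python package metadata, so it is None when vLLM is not installed in the current environment.

```python
import json
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata


def package_version(name):
    """Installed version of a Python distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def run_record(command, model, observed_output):
    """Bundle the details this step asks for into a JSON-serialisable dict."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "command": command,
        "model": model,
        "vllm_version": package_version("vllm"),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "observed_output": observed_output,
    }


if __name__ == "__main__":
    record = run_record(
        command="vllm serve <model> --host 0.0.0.0 --port 8000",
        model="<model>",  # placeholder, as in the serve command above
        observed_output="Uvicorn running on http://0.0.0.0:8000",
    )
    json.dump(record, sys.stdout, indent=2)
```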

Verification

```
curl -s -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"<model-id>","messages":[{"role":"user","content":"ping"}],"max_tokens":5}'
# Expected: a short text response (not an error)
```
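For CI, the same ping check can be turned into a pass/fail exit code. This is a sketch under assumptions. smoke_test and http_post_json are hypothetical names. The transport is passed in as post so the pass/fail logic can be exercised without a running server. The check treats any string in choices[0].message.content as a pass and anything else, including an error object, as a failure.

```python
import json
import sys
import urllib.request


def http_post_json(url, payload, timeout_s=30.0):
    """POST a JSON payload and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp)


def smoke_test(model, base_url="http://localhost:8000/v1", post=http_post_json):
    """Send the 'ping' request; return 0 if a text reply comes back, else 1."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 5,
    }
    try:
        body = post(base_url.rstrip("/") + "/chat/completions", payload)
        content = body["choices"][0]["message"]["content"]
    except (OSError, ValueError, KeyError, IndexError, TypeError) as exc:
        # Network errors, HTTP errors, bad JSON, or an error object instead of choices.
        print(f"FAIL: {exc!r}", file=sys.stderr)
        return 1
    if not isinstance(content, str):
        print(f"FAIL: unexpected content {content!r}", file=sys.stderr)
        return 1
    print(f"OK: {content!r}")
    return 0
```

In a CI job, raise SystemExit(smoke_test("<model-id>")) turns the result into the process exit code.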

Common failures

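404 Not Found on /v1/chat/completions: the model field does not match a served model id (the error message reports that the model does not exist), or the URL path is wrong, for example missing the /v1 prefix. List the served ids with curl http://localhost:8000/v1/models and use the exact string; a name set with --served-model-name replaces the original model path.
Connection refused: the server is not running, is still loading the model, or is listening on a different port. Confirm what is bound to the port with lsof -i :8000.
400 Bad Request (invalid request error): the payload is malformed, for example invalid JSON or a missing messages field, or the prompt length plus max_tokens exceeds the model's context window. Validate the JSON and lower max_tokens if needed.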
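When these checks run from a script, the same triage can be encoded as a small lookup. This is a sketch only, and diagnose is a hypothetical helper. It tells a bad model id from a bad path by looking for the word "model" in the 404 body. That is an assumption about the error text, not a documented contract.

```python
def diagnose(status, body=""):
    """Map a failed request to a likely cause from the failures listed above.

    status: HTTP status code, or None when no HTTP response arrived
            (for example, the connection was refused).
    body:   response text; only used to split the two 404 cases (heuristic).
    """
    if status is None:
        return ("Server not reachable: check that vLLM is running and "
                "listening on the expected port (lsof -i :8000).")
    if status == 404:
        if "model" in body.lower():
            return "Unknown model id: copy the exact id from /v1/models."
        return "Unknown route: check the URL path, including the /v1 prefix."
    if status == 400:
        return ("Invalid request: validate the JSON body and lower max_tokens "
                "if the prompt plus max_tokens exceeds the context window.")
    return f"Unrecognised failure (HTTP {status}): inspect the response body and server log."
```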

How to run vLLM with an OpenAI-compatible API
