What this does

Launches vLLM's OpenAI-compatible API server inside a Docker container with GPU acceleration, providing high-throughput inference via a familiar REST interface.

Steps

Start the vLLM container with GPU passthrough and a mounted model volume. The --gpus all flag requires the NVIDIA Container Toolkit on the host.

docker run -d \
  --gpus all \
  -v /path/to/models:/models \
  -p 8000:8000 \
  --shm-size=1g \
  --name vllm-server \
  vllm/vllm-openai:latest \
  --model /models/llama-model \
  --gpu-memory-utilization 0.9

Expected output: Docker prints the new container ID. Startup progress, including model loading, appears in docker logs -f vllm-server.

Confirm the API server is responding.
```
curl -i http://localhost:8000/health
```
Expected output: an HTTP 200 status line with an empty body once the server is ready. Until the model has finished loading, the connection is refused.
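Because model loading can take several minutes, a single health check often runs too early. The sketch below polls the endpoint until it returns HTTP 200. The function name wait_for_health and its attempt and delay defaults are illustrative assumptions, not part of vLLM.

```shell
# Poll a health URL until it returns HTTP 200 or the attempts run out.
# Arguments (all optional): URL, maximum attempts, seconds between attempts.
wait_for_health() {
  url="${1:-http://localhost:8000/health}"
  max_attempts="${2:-60}"
  sleep_seconds="${3:-5}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # -s hides progress, -o /dev/null drops the body, -w prints only the status code.
    # A refused connection makes curl exit non-zero, so the failure is ignored here.
    code="$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)"
    if [ "$code" = "200" ]; then
      echo "ready after ${attempt} attempt(s)"
      return 0
    fi
    sleep "$sleep_seconds"
    attempt=$((attempt + 1))
  done
  echo "not ready after ${max_attempts} attempt(s)" >&2
  return 1
}
```

Calling wait_for_health with no arguments checks http://localhost:8000/health for up to five minutes; a non-zero return suggests looking at docker logs vllm-server.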

Send a completion request to the OpenAI-compatible endpoint.

curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"/models/llama-model","prompt":"What is machine learning?","max_tokens":128}'

Expected output: JSON response with the generated text in choices[0].text and token counts in usage.
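A plain curl call prints error bodies the same way as successful ones, which is easy to miss in scripts. The sketch below wraps the request so that any non-200 status becomes a non-zero return. The function name request_completion is a hypothetical helper, and the payload is passed through unchanged.

```shell
# Send one completion request. Print the response body on HTTP 200;
# otherwise print the status and body to stderr and return non-zero.
# Arguments: base URL (e.g. http://localhost:8000), JSON request body.
request_completion() {
  base_url="$1"
  payload="$2"
  # -w appends the status code on its own final line so it can be split off.
  response="$(curl -s -w '\n%{http_code}' -X POST "${base_url}/v1/completions" \
    -H "Content-Type: application/json" -d "$payload")"
  status="$(printf '%s\n' "$response" | tail -n 1)"
  body="$(printf '%s\n' "$response" | sed '$d')"
  if [ "$status" = "200" ]; then
    printf '%s\n' "$body"
  else
    printf 'HTTP %s: %s\n' "$status" "$body" >&2
    return 1
  fi
}
```

For example, request_completion http://localhost:8000 '{"model":"/models/llama-model","prompt":"What is machine learning?","max_tokens":128}' sends the same request as the step above.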

Record the local run evidence. Save the exact command, runtime or package version, model name if applicable, and observed output so the result can be reproduced later.
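One way to capture this evidence is a small helper that writes the relevant facts to a file. This is a sketch under assumptions: the function name record_run_evidence and the output layout are illustrative, and it expects the container name vllm-server and port 8000 used in the steps above.

```shell
# Write a plain-text evidence record for the current run.
# Arguments: output file path, model identifier used in requests.
record_run_evidence() {
  out_file="$1"
  model_id="$2"
  {
    echo "date: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "docker: $(docker --version)"
    # .Config.Image is the image reference the container was started from;
    # .Image is the resolved image ID, which pins what "latest" pointed to.
    echo "image: $(docker inspect --format '{{.Config.Image}}' vllm-server)"
    echo "image_id: $(docker inspect --format '{{.Image}}' vllm-server)"
    echo "model: ${model_id}"
    echo "models_response: $(curl -s http://localhost:8000/v1/models)"
  } > "$out_file"
}
```

Recording the image ID matters here because the latest tag can resolve to a different vLLM release on a later pull.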

Verification

curl -s http://localhost:8000/v1/models
# Expected: a JSON list whose data[0].id is /models/llama-model
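To turn this check into a pass/fail result, the response can be searched for the expected model identifier. The sketch below avoids extra tools such as jq by stripping whitespace and matching a fixed string, which is a simplification and not a full JSON parse. The function name model_is_served is a hypothetical helper.

```shell
# Return 0 if the server lists the given model ID, non-zero otherwise.
# Arguments: base URL (e.g. http://localhost:8000), expected model ID.
model_is_served() {
  base_url="$1"
  model_id="$2"
  # Remove whitespace so the match does not depend on JSON formatting, then
  # look for the exact "id":"<model_id>" pair as a fixed string.
  curl -s "${base_url}/v1/models" | tr -d '[:space:]' | grep -qF "\"id\":\"${model_id}\""
}
```

For example, model_is_served http://localhost:8000 /models/llama-model && echo "model available" prints the message only when the identifier matches.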

Common failures

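Shared memory size too small: vLLM relies on PyTorch, which shares data between processes through shared memory, notably for tensor-parallel inference. Increase --shm-size to 4g or 8g, or pass --ipc=host instead.
GPU memory exhausted at startup: Lower --gpu-memory-utilization to 0.7 if other processes already hold GPU memory, reduce --max-model-len to shrink the KV cache, or use a quantized model variant.
Model path not accessible inside container: Verify with docker exec vllm-server ls /models. If the container has already exited, check docker logs vllm-server instead.
Port 8000 conflict: Use -p 8001:8000 and update the client URL accordingly.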
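Most of these fixes amount to changing one value in the docker run command. The sketch below collects those values in environment variables so a retry needs only an override. The variable names (MODEL_DIR, MODEL_NAME, HOST_PORT, SHM_SIZE, GPU_UTIL) and the function name launch_vllm are illustrative assumptions; the defaults mirror the command in Steps.

```shell
# Tunable values with defaults taken from the original docker run command.
MODEL_DIR="${MODEL_DIR:-/path/to/models}"
MODEL_NAME="${MODEL_NAME:-llama-model}"
HOST_PORT="${HOST_PORT:-8000}"
SHM_SIZE="${SHM_SIZE:-1g}"
GPU_UTIL="${GPU_UTIL:-0.9}"

# Start the container using the current values of the variables above.
launch_vllm() {
  docker run -d \
    --gpus all \
    -v "${MODEL_DIR}:/models" \
    -p "${HOST_PORT}:8000" \
    --shm-size="${SHM_SIZE}" \
    --name vllm-server \
    vllm/vllm-openai:latest \
    --model "/models/${MODEL_NAME}" \
    --gpu-memory-utilization "${GPU_UTIL}"
}
```

Saved as a script that ends by calling launch_vllm, this could be retried with, for example, HOST_PORT=8001 SHM_SIZE=4g GPU_UTIL=0.7 set in the environment. A previous container with the same name has to be removed first with docker rm -f vllm-server.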

Operator checkpoint

Before treating this as solved, write down the local runtime, model or package version, hardware/backend if relevant, and the verification output. This keeps the guide useful as a Will-It-Run style decision instead of a one-off command transcript.

How to run vLLM in Docker
