What this does

Measures end-to-end latency, from dispatching an HTTP request to receiving the complete response, using command-line timing tools and the timing metadata returned by the Ollama API. By the end of this guide you will have a reproducible wall-clock benchmark and a tokens-per-second figure for any model on your current hardware.

Steps

Send a request and measure the total wall-clock time. In bash and zsh, time measures the whole pipeline, so the result covers the request, the full response, and jq's (negligible) formatting time.
```
time curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3:q4_K_M",
  "prompt": "Explain quantum entanglement in one sentence.",
  "stream": false
}' | jq .
```
Expected output: the JSON response body, followed by the timing summary; the real value is the total elapsed wall-clock time.
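If you want a figure that excludes jq, curl can time the request itself. The sketch below is an illustration, not part of Ollama: make_payload, time_generate, and the OLLAMA_URL variable are hypothetical names, and it assumes the default endpoint on localhost:11434. It relies on curl's --write-out variable %{time_total} (total transfer time in seconds) and builds the payload with jq -n --arg so quotes in the prompt are escaped correctly.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Ollama). Assumes Ollama's default endpoint
# unless OLLAMA_URL is set.

# make_payload MODEL PROMPT -> compact, correctly escaped JSON request body.
make_payload() {
  jq -cn --arg m "$1" --arg p "$2" '{model: $m, prompt: $p, stream: false}'
}

# time_generate MODEL PROMPT -> prints curl's total transfer time in seconds.
time_generate() {
  local url=${OLLAMA_URL:-http://localhost:11434}
  local payload
  payload=$(make_payload "$1" "$2") || return 1
  # -o /dev/null discards the body, -w prints curl's own timer,
  # --fail turns HTTP errors (for example an unknown model) into a non-zero exit.
  curl -s --fail -o /dev/null -w '%{time_total}\n' \
    "$url/api/generate" -d "$payload"
}
```

Usage: time_generate llama3:q4_K_M "Explain quantum entanglement in one sentence." prints a single number of seconds, which is easier to collect across repeated runs than the output of time.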
Parse timing fields from the API response. With stream set to false, the Ollama API response includes eval_count (the number of tokens generated) and eval_duration (the time spent generating them, in nanoseconds).
```
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3:q4_K_M",
  "prompt": "Explain quantum entanglement in one sentence.",
  "stream": false
}' | jq '{eval_count, eval_duration}'
```
Expected output: a JSON object with integer values for eval_count and eval_duration.
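The same response carries more timing detail. The Ollama API documentation lists total_duration, load_duration, prompt_eval_count, and prompt_eval_duration alongside eval_count and eval_duration, with all durations in nanoseconds. The sketch below (timing_summary is a hypothetical name) converts them to seconds in one jq pass and passes a missing field through as null instead of aborting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one /api/generate response (stream: false) on stdin
# and print its timing fields, with nanosecond durations converted to seconds.
timing_summary() {
  jq '
    # Convert nanoseconds to seconds; absent fields stay null.
    def secs(f): f as $v | if $v == null then null else $v / 1e9 end;
    {
      total_s:       secs(.total_duration),
      load_s:        secs(.load_duration),
      prompt_tokens: .prompt_eval_count,
      prompt_eval_s: secs(.prompt_eval_duration),
      eval_tokens:   .eval_count,
      eval_s:        secs(.eval_duration)
    }'
}
```

Piping the step 2 request into timing_summary instead of jq shows how much of the wall-clock time went to loading the model and evaluating the prompt rather than generating tokens.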
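Calculate tokens per second from these fields by converting eval_duration from nanoseconds to seconds and dividing eval_count by the result. This is generation throughput only; it excludes model loading and prompt evaluation time.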
```
curl -s http://localhost:11434/api/generate -d '{"model":"llama3:q4_K_M","prompt":"Count from one to ten.","stream":false}' | jq '.eval_count / (.eval_duration / 1e9)'
```
Expected output: A floating-point number representing tokens per second.
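As a worked example of the formula, 100 tokens generated in 2,000,000,000 ns is 100 / 2 = 50 tokens per second. The jq one-liner stops with a division error if eval_duration is 0 or missing, and prints an unrounded float. A guarded variant in awk is sketched below; tokens_per_second is a hypothetical helper that takes the two fields as arguments and rounds to two decimals.

```shell
#!/usr/bin/env bash
# Hypothetical helper.
# tokens_per_second EVAL_COUNT EVAL_DURATION_NS -> tokens per second, two decimals.
tokens_per_second() {
  awk -v n="$1" -v ns="$2" 'BEGIN {
    if (ns + 0 <= 0) {                       # guard against a zero or empty duration
      print "error: eval_duration must be a positive number of nanoseconds" > "/dev/stderr"
      exit 1
    }
    printf "%.2f\n", n / (ns / 1e9)          # nanoseconds -> seconds, then divide
  }'
}
```

Usage: save the response once, for example r=$(curl -s ... ), then run tokens_per_second "$(echo "$r" | jq .eval_count)" "$(echo "$r" | jq .eval_duration)".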

Record the run. Save the exact command, the Ollama version (ollama --version), the model tag, the hardware, and the observed output so the result can be reproduced later.
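One way to keep that evidence is to append each run as a single JSON line. In the sketch below, record_run and the log file name are hypothetical; the caller passes in the Ollama version string (for example the output of ollama --version), and the host description comes from uname.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append one JSON line describing a benchmark run.
# record_run LOGFILE MODEL VERSION COMMAND OUTPUT
record_run() {
  local log=$1
  jq -cn \
    --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --arg host "$(uname -srm)" \
    --arg model "$2" --arg version "$3" \
    --arg command "$4" --arg output "$5" \
    '{timestamp: $ts, host: $host, model: $model,
      ollama_version: $version, command: $command, output: $output}' \
    >> "$log"
}
```

Usage: record_run runs.jsonl llama3:q4_K_M "$(ollama --version)" "$cmd" "$result", where cmd holds the exact command string and result its output. One line per run keeps the file easy to filter later with jq.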

Verification

```
time curl -s http://localhost:11434/api/generate -d '{"model":"llama3:q4_K_M","prompt":"Count from one to five.","stream":false}' | jq .
# Expected: JSON response with a "response" field, followed by the wall-clock time reported by time
```
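To make the check scriptable rather than visual, jq -e can turn the response into an exit status. The sketch assumes a successful non-streaming response has a string response field, done set to true, and a positive eval_count, while a failed request typically returns an error object instead; verify_response is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 only if stdin holds a complete, successful
# non-streaming /api/generate response.
verify_response() {
  # jq -e sets the exit status from the last output: false or null -> 1.
  jq -e '(.response | type) == "string" and .done == true and .eval_count > 0' >/dev/null
}
```

Usage: pipe the verification request into verify_response instead of jq . and follow it with && echo ok, so a missing model or a malformed payload fails the check.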

Common failures

connection refused - The Ollama server is not running or the URL is wrong; start it with ollama serve and confirm it listens on port 11434.
error or empty response - The model tag is not available locally or the JSON payload is malformed; check installed tags with ollama list and verify the payload keys.
jq command not found - Install jq with your package manager, or parse the JSON with Python instead.
null timing fields - In streaming mode only the final chunk carries eval_count and eval_duration; set "stream": false for benchmark runs so a single response object contains them.
high variance across runs - Cold starts (the first request may have to load the model into memory) and background system load produce outliers; run three iterations and discard the first.
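The three-iteration advice can be scripted. In the sketch below, summarize_runs is a hypothetical helper that reads one measurement per line (for example the tokens-per-second figure from the step 3 command), drops the first, cold-start value, and prints the count, mean, minimum, and maximum of the remaining runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize repeated measurements, discarding the first.
summarize_runs() {
  awk '
    NR == 1 { next }                         # discard the cold-start run
    {
      n++; sum += $1
      if (n == 1 || $1 < min) min = $1
      if (n == 1 || $1 > max) max = $1
    }
    END {
      if (n == 0) { print "error: need at least two runs" > "/dev/stderr"; exit 1 }
      printf "runs=%d mean=%.2f min=%.2f max=%.2f\n", n, sum / n, min, max
    }'
}
```

Usage: wrap the step 3 command in a loop, for i in 1 2 3; do curl ... | jq '.eval_count / (.eval_duration / 1e9)'; done | summarize_runs. Reporting min and max next to the mean makes remaining outliers visible instead of hiding them in the average.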

How to benchmark model response time using the Ollama API
