What this does

Repeat penalty lowers the probability of tokens that have already appeared in the recent context. This helps keep the model from getting stuck in repetitive loops, a common problem in long generations.
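As a rough illustration, the sketch below applies the rule commonly used by llama.cpp-style samplers: a recently seen token's logit is divided by the penalty when it is positive and multiplied by it when it is negative, so either way the token becomes less likely. Treating this as exactly what Ollama runs for every model is an assumption; the function names and toy logits are hypothetical.

```python
import math


def apply_repeat_penalty(logits: dict[str, float], seen: set[str], penalty: float) -> dict[str, float]:
    """Return a copy of `logits` with tokens in `seen` penalized (llama.cpp-style rule, assumed)."""
    adjusted = {}
    for token, logit in logits.items():
        if token in seen:
            # Positive logits shrink toward 0; negative logits become more negative.
            adjusted[token] = logit / penalty if logit > 0 else logit * penalty
        else:
            adjusted[token] = logit
    return adjusted


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Convert logits to probabilities (max-subtracted for numerical stability)."""
    peak = max(logits.values())
    exps = {t: math.exp(v - peak) for t, v in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}


# Hypothetical next-token logits after the model has already written "dogs" once.
logits = {"dogs": 3.0, "cats": 2.5, "birds": 1.0}
before = softmax(logits)
after = softmax(apply_repeat_penalty(logits, seen={"dogs"}, penalty=1.2))
print(f"P(dogs) before: {before['dogs']:.3f}, after: {after['dogs']:.3f}")
```

With a penalty of 1.2, the logit for "dogs" drops from 3.0 to 2.5, level with "cats", so the model is no longer strongly pulled back to the word it just used.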

Steps

Apply repeat penalty through the options object of an API request. A value of 1.0 applies no penalty; Ollama's documented default is 1.1. Values of 1.1-1.2 give mild correction.

curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3.2", "prompt": "List 20 things about dogs: 1.",
       "options": {"repeat_penalty": 1.2, "temperature": 0.7}, "stream": false}' \
  | jq -r '.response'

Use frequency_penalty and presence_penalty for finer control.

frequency_penalty: Lowers a token's probability in proportion to how many times it has already appeared.
presence_penalty: Lowers a token's probability by a fixed amount if it has appeared at least once.
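In llama.cpp-style samplers these two options act as additive offsets on the logit: a token seen `count` times loses `count * frequency_penalty`, plus a flat `presence_penalty` if `count > 0`. The sketch below assumes that formula; whether Ollama applies it identically for every model is an assumption, and the helper name and sample history are hypothetical.

```python
from collections import Counter


def apply_count_penalties(
    logits: dict[str, float],
    history: list[str],
    frequency_penalty: float,
    presence_penalty: float,
) -> dict[str, float]:
    """Subtract frequency and presence penalties from the logits of previously seen tokens."""
    counts = Counter(history)
    adjusted = {}
    for token, logit in logits.items():
        count = counts[token]  # Counter returns 0 for tokens never seen
        # Frequency term grows with every repeat; presence term is a one-off flat charge.
        presence = presence_penalty if count > 0 else 0.0
        adjusted[token] = logit - count * frequency_penalty - presence
    return adjusted


history = "the tree the tree the root".split()
logits = {"the": 2.0, "tree": 2.0, "root": 2.0, "leaf": 2.0}
for token, value in apply_count_penalties(logits, history, 0.3, 0.2).items():
    print(f"{token:>5}: {value:.2f}")
```

Unlike repeat_penalty, which is a multiplicative factor where 1.0 is neutral, both penalties in this formula are neutral at 0.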

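import requests

# "stream": False returns one JSON object; by default the endpoint streams
# newline-delimited JSON chunks, which response.json() cannot parse.
response = requests.post("http://localhost:11434/api/generate", json={
    "model": "llama3.2",
    "prompt": "Write a 500-word essay on trees.",
    "stream": False,
    "options": {
        "repeat_penalty": 1.15,
        "frequency_penalty": 0.3,
        "presence_penalty": 0.2,
        "temperature": 0.7
    }
}, timeout=300)
response.raise_for_status()
print(response.json()["response"])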

Create a Modelfile with anti-loop defaults.

FROM llama3.2
PARAMETER repeat_penalty 1.2
PARAMETER frequency_penalty 0.2
PARAMETER presence_penalty 0.1

ollama create anti-loop-llama -f Modelfile
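If you maintain several variants, generating the Modelfile from a dictionary keeps the values in one place. This is a minimal sketch; `render_modelfile` and `create_model` are hypothetical helpers, and the only external command used is the `ollama create ... -f Modelfile` invocation shown above.

```python
import subprocess
from pathlib import Path


def render_modelfile(base_model: str, params: dict[str, float]) -> str:
    """Build Modelfile text: a FROM line plus one PARAMETER line per option."""
    lines = [f"FROM {base_model}"]
    lines += [f"PARAMETER {name} {value}" for name, value in params.items()]
    return "\n".join(lines) + "\n"


def create_model(name: str, base_model: str, params: dict[str, float], path: str = "Modelfile") -> None:
    """Write the Modelfile, then run `ollama create <name> -f <path>` (needs the ollama CLI)."""
    Path(path).write_text(render_modelfile(base_model, params))
    subprocess.run(["ollama", "create", name, "-f", path], check=True)


ANTI_LOOP = {"repeat_penalty": 1.2, "frequency_penalty": 0.2, "presence_penalty": 0.1}

if __name__ == "__main__":
    print(render_modelfile("llama3.2", ANTI_LOOP))
    # create_model("anti-loop-llama", "llama3.2", ANTI_LOOP)  # uncomment to build the model
```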

Test with a loop-prone prompt.

curl -s http://localhost:11434/api/generate \
  -d '{"model": "anti-loop-llama", "prompt": "Tell me about AI: Artificial Intelligence is",
       "stream": false}' | jq -r '.response'

Expected: The output should naturally terminate without repeating the same phrases.
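To check for repeated phrases more objectively than by eye, you can count word n-grams that recur in the response. This is one simple heuristic, not something Ollama provides; the function name, the default n-gram size of 4, and the sample strings are assumptions.

```python
from collections import Counter


def repeated_ngrams(text: str, n: int = 4, min_count: int = 2) -> dict[str, int]:
    """Return word n-grams that occur at least `min_count` times, with their counts."""
    words = text.lower().split()
    grams = Counter(" ".join(words[i : i + n]) for i in range(len(words) - n + 1))
    return {gram: count for gram, count in grams.items() if count >= min_count}


looping = "AI is a field of study. AI is a field of study. AI is a field of study."
varied = "AI is a field of study that builds systems able to learn, reason, and act."
print(repeated_ngrams(looping))
print(repeated_ngrams(varied))  # {} because no 4-word phrase repeats
```

Pipe the output of the curl command into a file and pass its contents to `repeated_ngrams`; an empty result is a reasonable sign the penalty is doing its job.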

Verification

# Test with and without repeat penalty
curl -s ... -d '{"options":{"repeat_penalty":1.0}}' | jq -r '.response' > baseline.txt
curl -s ... -d '{"options":{"repeat_penalty":1.2}}' | jq -r '.response' > penalized.txt
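# Expected: baseline.txt is more likely to repeat phrases; penalized.txt should use more varied vocabulary.
# Sampling is random, so compare several runs or set the same "seed" option in both requests.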
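The elided commands above can be fleshed out as a standalone script. This sketch assumes the local endpoint, the base llama3.2 model, and the loop-prone prompt used earlier in this guide, and pins a `seed` so the two runs differ mainly in `repeat_penalty`. The distinct-bigram ratio is an assumed stand-in for "diverse vocabulary", not an Ollama metric, and the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
PROMPT = "Tell me about AI: Artificial Intelligence is"


def build_payload(repeat_penalty: float, seed: int = 42) -> dict:
    """Request body that changes only repeat_penalty between runs."""
    return {
        "model": "llama3.2",
        "prompt": PROMPT,
        "stream": False,
        "options": {"repeat_penalty": repeat_penalty, "temperature": 0.7, "seed": seed},
    }


def generate(payload: dict) -> str:
    """POST the payload to the generate endpoint and return the response text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read())["response"]


def distinct_bigram_ratio(text: str) -> float:
    """Unique word bigrams / total bigrams; lower values suggest more repetition."""
    words = text.lower().split()
    bigrams = list(zip(words, words[1:]))
    return len(set(bigrams)) / len(bigrams) if bigrams else 1.0


if __name__ == "__main__":
    try:
        for name, penalty in [("baseline", 1.0), ("penalized", 1.2)]:
            text = generate(build_payload(penalty))
            with open(f"{name}.txt", "w", encoding="utf-8") as f:
                f.write(text)
            print(f"{name}: distinct-2 = {distinct_bigram_ratio(text):.2f}")
    except OSError as exc:  # URLError subclasses OSError, e.g. when the server is not running
        print(f"Could not reach Ollama at {OLLAMA_URL}: {exc}")
```

A higher distinct-2 score for penalized.txt than for baseline.txt is consistent with the expectation above, but a single pair of runs is weak evidence; repeat with a few different seeds.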

Common failures

Too high a repeat penalty causes incoherence: Values above 1.5 force unnatural word choices. Stay in the 1.05-1.3 range.
Repeat penalty has little visible effect on short responses: Looping is mainly a problem in long generations (more than about 100 tokens), so short outputs rarely give the penalty much to act on.
Penalty interacts with temperature: Combining a high temperature with a high repeat penalty produces unstable output. Balance the two; a low temperature (0.5) with a moderate penalty (1.15) is a safe starting point.
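The rules of thumb above can be encoded as a pre-flight check on an options dictionary before sending a request. This is a hypothetical helper: the 1.05-1.3 range and the 1.5 ceiling come from this guide, the fallback values mirror Ollama's documented defaults (repeat_penalty 1.1, temperature 0.8), and treating temperature above 1.0 with penalty above 1.3 as "high" is an assumption you should tune.

```python
def check_penalty_options(options: dict) -> list[str]:
    """Return warnings for repeat_penalty/temperature settings this guide flags as risky."""
    warnings = []
    penalty = options.get("repeat_penalty", 1.1)  # fallback: Ollama's documented default
    temperature = options.get("temperature", 0.8)  # fallback: Ollama's documented default
    if penalty > 1.5:
        warnings.append(f"repeat_penalty {penalty} > 1.5: expect unnatural, incoherent wording")
    elif not 1.05 <= penalty <= 1.3:
        warnings.append(f"repeat_penalty {penalty} is outside the suggested 1.05-1.3 range")
    # The 'high' thresholds below are assumptions, not values from Ollama.
    if temperature > 1.0 and penalty > 1.3:
        warnings.append("high temperature with high repeat_penalty tends to give unstable output")
    return warnings


print(check_penalty_options({"repeat_penalty": 1.15, "temperature": 0.5}))  # []
print(check_penalty_options({"repeat_penalty": 1.6, "temperature": 1.2}))
```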

How to configure repeat penalty to reduce looping responses
