How to configure repeat penalty to reduce looping responses
PREREQUISITES
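Ollama installed and running (the examples call http://localhost:11434)
The llama3.2 model pulled: ollama pull llama3.2
jq for the curl examples, and Python with the requests package for the Python example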
What this does
Repeat penalty lowers the probability of tokens that already appear in the recent context (Ollama looks back over the last repeat_last_n tokens, 64 by default). This discourages the model from getting stuck in repetitive loops, a common problem in long generations.
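To make "lowers the probability" concrete, here is a minimal sketch of how a repeat penalty can be applied to a model's raw scores (logits) before sampling. It assumes the rule used by llama.cpp's sampler: a positive logit is divided by the penalty and a non-positive one is multiplied by it. Ollama's own engine may implement the details differently, and the toy vocabulary and logit values are made up for illustration.

```python
import math


def apply_repeat_penalty(logits, recent_tokens, penalty):
    """Return a copy of `logits` with every token in `recent_tokens` penalized.

    Assumes the llama.cpp-style rule: a positive logit is divided by the
    penalty and a non-positive logit is multiplied by it, so either way the
    token becomes less likely. penalty == 1.0 leaves the logits unchanged.
    """
    out = dict(logits)
    for tok in set(recent_tokens):
        if tok not in out:
            continue
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out


def softmax(logits):
    """Convert a token -> logit mapping into token -> probability."""
    top = max(logits.values())
    exps = {tok: math.exp(v - top) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Toy vocabulary (made-up numbers): "dogs" sits in the recent context.
logits = {"dogs": 2.0, "cats": 1.8, "birds": 0.5}
recent = ["I", "like", "dogs"]

before = softmax(logits)
after = softmax(apply_repeat_penalty(logits, recent, 1.2))
print(f"P(dogs): {before['dogs']:.3f} -> {after['dogs']:.3f}")
```

With these numbers the probability of "dogs" drops from about 0.49 to about 0.41, while tokens that have not appeared recently are untouched.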
Steps
Apply repeat penalty via the API. A value of 1.0 applies no penalty; Ollama's documented default is 1.1. Values of 1.1 to 1.2 give mild correction.
curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3.2", "prompt": "List 20 things about dogs: 1.", "options": {"repeat_penalty": 1.2, "temperature": 0.7}, "stream": false}' \
  | jq -r '.response'
Use frequency_penalty and presence_penalty for finer control.
frequency_penalty: lowers a token's score in proportion to how many times it has already appeared.
presence_penalty: lowers a token's score by a fixed amount once it has appeared at all, regardless of how often.
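The two definitions above match the additive formula OpenAI documents for these parameters, which llama.cpp also uses: for a token seen c > 0 times, logit -= c * frequency_penalty + presence_penalty. The sketch below assumes that formula; Ollama may differ in details, and the logits are made up.

```python
from collections import Counter


def apply_frequency_presence(logits, generated_tokens, frequency_penalty, presence_penalty):
    """Return a copy of `logits` with additive frequency/presence penalties.

    For each token that has appeared c > 0 times:
        logit -= c * frequency_penalty + presence_penalty
    Tokens that have not appeared are left untouched.
    """
    counts = Counter(generated_tokens)
    out = dict(logits)
    for tok, count in counts.items():
        if tok in out:
            out[tok] -= count * frequency_penalty + presence_penalty
    return out


# Made-up logits; "the" has appeared three times, "tree" once.
logits = {"the": 3.0, "tree": 2.5, "forest": 2.0}
generated = ["the", "tree", "the", "the"]
penalized = apply_frequency_presence(logits, generated, 0.3, 0.2)
print({tok: round(v, 2) for tok, v in penalized.items()})
```

Here "the" loses 3 * 0.3 + 0.2 = 1.1, "tree" loses 0.3 + 0.2 = 0.5, and "forest" is unchanged. The frequency term keeps growing with each repeat, while the presence term is paid only once.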
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",
        "prompt": "Write a 500-word essay on trees.",
        "stream": False,  # return one JSON object instead of a stream of chunks
        "options": {
            "repeat_penalty": 1.15,
            "frequency_penalty": 0.3,
            "presence_penalty": 0.2,
            "temperature": 0.7,
        },
    },
    timeout=300,  # long generations can take a while
)
response.raise_for_status()
print(response.json()["response"])
Create a Modelfile with anti-loop defaults.
FROM llama3.2
PARAMETER repeat_penalty 1.2
PARAMETER frequency_penalty 0.2
PARAMETER presence_penalty 0.1
Save this as a file named Modelfile, then build the model:
ollama create anti-loop-llama -f Modelfile
Test with a loop-prone prompt.
curl -s http://localhost:11434/api/generate \
  -d '{"model": "anti-loop-llama", "prompt": "Tell me about AI: Artificial Intelligence is", "stream": false}' \
  | jq -r '.response'
Expected: the output should terminate naturally without repeating the same phrases.
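To check that expectation mechanically rather than by eye, here is a minimal sketch built around a hypothetical helper, trailing_loop_length, that reports whether a response ends in the same run of words repeated back to back. It works on whitespace-separated words, not model tokens, and its defaults (loops up to 20 words long, at least 3 repetitions) are assumptions rather than Ollama settings.

```python
def trailing_loop_length(text, max_period=20, min_repeats=3):
    """Return the length (in words) of a loop at the end of `text`, or 0.

    A loop means the final `period` words appear `min_repeats` times back to
    back, e.g. "... AI is great. AI is great. AI is great." has period 3.
    """
    words = text.split()
    for period in range(1, max_period + 1):
        span = period * min_repeats
        if span > len(words):
            break  # not enough words left for a longer loop
        tail = words[-span:]
        unit = tail[-period:]
        if all(tail[i:i + period] == unit for i in range(0, span, period)):
            return period
    return 0


print(trailing_loop_length("Artificial Intelligence is the study of agents."))
print(trailing_loop_length("AI is great. AI is great. AI is great."))
```

A result of 0 means no trailing loop was found; a nonzero result on the anti-loop-llama output suggests the penalties need raising.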
Verification
# Test with and without repeat penalty; keep model, prompt and "stream": false identical in both requests
curl -s ... -d '{"options":{"repeat_penalty":1.0}}' | jq -r '.response' > baseline.txt
curl -s ... -d '{"options":{"repeat_penalty":1.2}}' | jq -r '.response' > penalized.txt
# Expected: baseline.txt typically shows repetitive patterns; penalized.txt shows more varied vocabulary
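Comparing the two files by eye works, but a number makes the comparison repeatable. This sketch uses a hypothetical helper, repeated_ngram_ratio, that computes the fraction of word trigrams repeating an earlier trigram; lower means more varied text. The choice of trigrams is an assumption, and because sampling is random it is worth scoring several runs rather than one.

```python
import os


def repeated_ngram_ratio(text, n=3):
    """Fraction of word n-grams in `text` that repeat an earlier n-gram.

    0.0 means every n-gram is unique; values near 1.0 mean heavy looping.
    """
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    return 1 - len(set(ngrams)) / len(ngrams)


# Score the two files written by the curl commands above, if present.
for path in ("baseline.txt", "penalized.txt"):
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            print(f"{path}: {repeated_ngram_ratio(f.read()):.3f}")
```

If the penalty is doing its job, penalized.txt should usually score lower than baseline.txt.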
Common failures
- Too high a repeat penalty causes incoherence: values above 1.5 force unnatural word choices. Stay in the 1.05 to 1.3 range.
- Repeat penalty has little effect on short responses: looping is mainly a problem in long generations (more than about 100 tokens). In short outputs there is rarely enough repetition for the penalty to correct.
- Penalty interacts with temperature: a high temperature combined with a high repeat penalty produces unstable output. Balance the two; a low temperature (0.5) with a moderate penalty (1.15) is a safe combination.
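The rules of thumb above can be turned into a quick pre-flight check on an options dictionary before sending a request. This is a minimal sketch with a hypothetical helper, check_sampling_options. The 1.5 and 1.05 to 1.3 thresholds come from this guide, the fallback values are Ollama's documented defaults (repeat_penalty 1.1, temperature 0.8), and the cutoffs for "high temperature" (above 1.0) and "high penalty" (above 1.3) are assumptions, since the guide does not define them.

```python
def check_sampling_options(options):
    """Return warnings for option combinations this guide flags as risky."""
    HIGH_TEMPERATURE = 1.0  # assumed cutoff; the guide does not define "high"
    HIGH_PENALTY = 1.3      # assumed: anything above the suggested range
    warnings = []
    # Fall back to Ollama's documented defaults when a value is not set.
    penalty = options.get("repeat_penalty", 1.1)
    temperature = options.get("temperature", 0.8)
    if penalty > 1.5:
        warnings.append("repeat_penalty above 1.5 tends to force unnatural word choices")
    elif not 1.05 <= penalty <= 1.3:
        warnings.append("repeat_penalty is outside the suggested 1.05 to 1.3 range")
    if temperature > HIGH_TEMPERATURE and penalty > HIGH_PENALTY:
        warnings.append("high temperature plus a high repeat_penalty can be unstable")
    return warnings


print(check_sampling_options({"repeat_penalty": 1.15, "temperature": 0.5}))
print(check_sampling_options({"repeat_penalty": 1.4, "temperature": 1.2}))
```

The first call returns an empty list (the guide's safe combination); the second returns two warnings.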
Related guides