RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

© 2026 runlocalai.co · Independently operated
HOW-TO · INF

How to configure repeat penalty to reduce looping responses

intermediate · 10 min · By Fredoline Eruo
PREREQUISITES

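Ollama installed and serving on localhost:11434, with the llama3.2 model pulled. The examples also use jq and the Python requests package.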

What this does

Repeat penalty lowers the probability of tokens that already appear in the recent context (Ollama's repeat_last_n option sets how far back it looks). This helps keep the model from getting stuck in repetitive loops, a common issue in long generations.
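To make the mechanism concrete, here is a minimal sketch of one common formulation of repeat penalty: scores of already-seen tokens are divided by the penalty when positive and multiplied by it when negative. The toy logits dictionary and the function names are illustrative assumptions, and the exact arithmetic inside Ollama may differ.

```python
import math

def apply_repeat_penalty(logits, recent_tokens, penalty=1.1):
    """Return a copy of logits with the penalty applied to recently seen tokens.

    logits is a toy dict of token -> raw score (an assumption for illustration).
    """
    penalized = dict(logits)
    for tok in set(recent_tokens):
        if tok not in penalized:
            continue
        score = penalized[tok]
        # Divide positive scores, multiply negative ones, so the token
        # always becomes less likely when penalty > 1.
        penalized[tok] = score / penalty if score > 0 else score * penalty
    return penalized

def softmax(logits):
    """Convert raw scores to probabilities."""
    top = max(logits.values())
    exps = {tok: math.exp(score - top) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}

logits = {"dog": 2.0, "cat": 1.0, "the": -0.5}
before = softmax(logits)
after = softmax(apply_repeat_penalty(logits, ["dog", "the"], penalty=1.2))
print(f"P(dog) before: {before['dog']:.3f}, after: {after['dog']:.3f}")
```

With a penalty of 1.0 the scores are unchanged, which is why 1.0 is the "off" setting.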

Steps

  1. Apply repeat penalty via the API. A value of 1.0 means no penalty; Ollama's documentation lists 1.1 as the default. Try 1.1-1.2 for mild correction.

    curl -s http://localhost:11434/api/generate \
      -d '{"model": "llama3.2", "prompt": "List 20 things about dogs: 1.",
           "options": {"repeat_penalty": 1.2, "temperature": 0.7}, "stream": false}' \
      | jq -r '.response'
    
  2. Use frequency_penalty and presence_penalty for finer control.

    • frequency_penalty: Reduces probability proportional to how often a token has appeared.
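    • presence_penalty: Reduces probability if a token has appeared at all, regardless of how often.

    import requests

    # "stream": False makes Ollama return a single JSON object instead of
    # newline-delimited chunks, so response.json() can parse the body.
    response = requests.post("http://localhost:11434/api/generate", json={
        "model": "llama3.2",
        "prompt": "Write a 500-word essay on trees.",
        "stream": False,
        "options": {
            "repeat_penalty": 1.15,
            "frequency_penalty": 0.3,
            "presence_penalty": 0.2,
            "temperature": 0.7
        }
    }, timeout=300)
    response.raise_for_status()  # fail loudly on HTTP errors
    print(response.json()["response"])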
    
  3. Create a Modelfile with anti-loop defaults, then build a model from it with ollama create.

    FROM llama3.2
    PARAMETER repeat_penalty 1.2
    PARAMETER frequency_penalty 0.2
    PARAMETER presence_penalty 0.1
    
    ollama create anti-loop-llama -f Modelfile
    
  4. Test with a loop-prone prompt.

    curl -s http://localhost:11434/api/generate \
      -d '{"model": "anti-loop-llama", "prompt": "Tell me about AI: Artificial Intelligence is",
           "stream": false}' | jq -r '.response'
    

    Expected: The output should naturally terminate without repeating the same phrases.
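The difference between frequency_penalty and presence_penalty in step 2 can be sketched with the widely documented additive formulation: each seen token's score drops by its count times the frequency penalty, plus the presence penalty once. This is a sketch under that assumption; the function name and toy logits are hypothetical, and Ollama's internal implementation may not match it exactly.

```python
from collections import Counter

def apply_count_penalties(logits, generated, frequency_penalty=0.0, presence_penalty=0.0):
    """Return a copy of logits adjusted by frequency and presence penalties.

    logits is a toy dict of token -> raw score (an assumption for illustration).
    """
    counts = Counter(generated)
    adjusted = dict(logits)
    for tok, count in counts.items():
        if tok in adjusted:
            # The frequency term grows with each repeat; the presence term is flat.
            adjusted[tok] -= count * frequency_penalty + presence_penalty
    return adjusted

logits = {"tree": 1.0, "leaf": 1.0, "root": 1.0}
generated = ["tree", "tree", "tree", "leaf"]
adjusted = apply_count_penalties(logits, generated,
                                 frequency_penalty=0.3, presence_penalty=0.2)
for tok, score in adjusted.items():
    print(f"{tok}: {score:.2f}")
```

A token used three times is pushed down much further than one used once, while an unused token keeps its original score.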

Verification

# Test with and without repeat penalty
curl -s ... -d '{"options":{"repeat_penalty":1.0}}' | jq -r '.response' > baseline.txt
curl -s ... -d '{"options":{"repeat_penalty":1.2}}' | jq -r '.response' > penalized.txt
# Expected: baseline.txt is more likely to show repeated phrases; penalized.txt should show more varied vocabulary
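Comparing the two files by eye is subjective, so a rough number can help. The sketch below computes the share of word 3-grams that are repeats. The metric and the function name are assumptions chosen for illustration, not part of Ollama; it reads baseline.txt and penalized.txt only if they exist.

```python
from pathlib import Path

def repeated_ngram_ratio(text, n=3):
    """Return the fraction of word n-grams that duplicate an earlier n-gram.

    0.0 means every n-gram is unique; values near 1.0 mean heavy looping.
    """
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)

# Score the files written by the verification commands, if present.
for name in ("baseline.txt", "penalized.txt"):
    path = Path(name)
    if path.exists():
        ratio = repeated_ngram_ratio(path.read_text(encoding="utf-8"))
        print(f"{name}: {ratio:.1%} repeated 3-grams")
    else:
        print(f"{name}: not found, run the curl commands first")
```

A clearly lower ratio for penalized.txt suggests the penalty is working. Sampling is random at nonzero temperature, so compare several runs before drawing conclusions.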

Common failures

  • Repeat penalty set too high causes incoherence: Values above 1.5 force unnatural word choices. Stay in the 1.05-1.3 range.
  • Repeat penalty has little effect on short responses: Looping is primarily an issue in long generations (>100 tokens), so short outputs rarely change.
  • Penalty interacts with temperature: A high temperature combined with a high repeat penalty tends to produce unstable output. Balance both: a low temperature (0.5) with a moderate penalty (1.15) is a safe combination.
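The rules of thumb above can be turned into a small pre-flight check before sending options to the API. This is a hypothetical helper, not an Ollama feature. The 1.5 ceiling and the 1.05-1.3 range come from this guide, while the cutoff for "high" temperature (1.0) and for "high" penalty in the combined check (above 1.3) are assumptions.

```python
def check_sampling_options(repeat_penalty, temperature, high_temperature=1.0):
    """Return a list of warnings for settings this guide considers risky.

    high_temperature is an assumed cutoff; the guide does not define one.
    """
    warnings = []
    if repeat_penalty > 1.5:
        warnings.append("repeat_penalty above 1.5: expect unnatural word choices")
    elif not 1.05 <= repeat_penalty <= 1.3:
        warnings.append("repeat_penalty outside the suggested 1.05-1.3 range")
    # Assumed definition of the risky combination: both values at the high end.
    if temperature >= high_temperature and repeat_penalty > 1.3:
        warnings.append("high temperature with high repeat_penalty: output may be unstable")
    return warnings

for penalty, temp in [(1.15, 0.5), (1.6, 1.2)]:
    problems = check_sampling_options(penalty, temp)
    status = "; ".join(problems) if problems else "ok"
    print(f"repeat_penalty={penalty}, temperature={temp}: {status}")
```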

Related guides

  • How to set temperature parameters for creative output
  • How to set temperature to zero for deterministic responses