RUNLOCALAIv38
->Will it run?Best GPUCompareTroubleshootStartLearnPulseModelsHardwareToolsBench
Run check
RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DIR
  • Models
  • Hardware
  • Tools
  • Benchmarks
TOOLS
  • Will it run?
  • Compare hardware
  • Cost vs cloud
  • Choose my GPU
  • Prompting kits
  • Quick answers
REF
  • All buyer guides
  • Learn local AI
  • Methodology
  • Glossary
  • Errors KB
  • Trust
EDITOR
  • About
  • Author
  • How we make money
  • Editorial policy
  • Contact
LEGAL
  • Privacy
  • Terms
  • Sitemap
MAIL · MONTHLY DIGEST
Get monthly local AI changes
Monthly recap. No spam.
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend. Read more →

© 2026 runlocalai.coIndependently operated
RUNLOCALAI · v38
Glossary / Large language models / System Prompt
Large language models

System Prompt

A system prompt is the initial instruction or context prepended to a conversation with an LLM. It sets the model's behavior, persona, output format, or constraints before any user input. Operators use it to steer the model without retraining—for example, telling a model to 'answer concisely' or 'act as a code reviewer.' The system prompt is part of the context window, so its length consumes VRAM and reduces available space for user messages. In local inference, a long system prompt can force a smaller usable context or require offloading, slowing generation.

Practical example

A local operator running Llama 3.1 8B on an RTX 3060 12GB with 4K context might use a system prompt like 'You are a helpful assistant that answers in one sentence.' This consumes ~200 tokens of context. If the operator instead uses a 2K-token system prompt with detailed instructions, the remaining context for user messages shrinks to 2K tokens, potentially truncating a long conversation or requiring a larger context window that exceeds VRAM.

Workflow example

In Ollama, the system prompt is set via the SYSTEM directive in a Modelfile: SYSTEM "You are a concise assistant." or passed at runtime with ollama run model --system "...". In LM Studio, it's entered in the 'System Prompt' field in the chat UI. In llama.cpp, it's prepended to the prompt string before inference. Operators often tune the system prompt to reduce verbose output or enforce JSON formatting, directly affecting tokens-per-second and VRAM usage.

Reviewed by Fredoline Eruo. See our editorial policy.

Buyer guides
  • Best GPU for local AI →
  • Best laptop for local AI →
  • Best Mac for local AI →
When it doesn't work
  • CUDA out of memory →
  • Ollama running slowly →
  • ROCm not detected →