System Prompt

A system prompt is the initial instruction or context prepended to a conversation with an LLM. It sets the model's behavior, persona, output format, or constraints before any user input. Operators use it to steer the model without retraining—for example, telling a model to 'answer concisely' or 'act as a code reviewer.' The system prompt is part of the context window, so its length consumes VRAM and reduces available space for user messages. In local inference, a long system prompt can force a smaller usable context or require offloading, slowing generation.

A local operator running Llama 3.1 8B on an RTX 3060 12GB with 4K context might use a system prompt like 'You are a helpful assistant that answers in one sentence.' This consumes ~200 tokens of context. If the operator instead uses a 2K-token system prompt with detailed instructions, the remaining context for user messages shrinks to 2K tokens, potentially truncating a long conversation or requiring a larger context window that exceeds VRAM.

In Ollama, the system prompt is set via the SYSTEM directive in a Modelfile: SYSTEM "You are a concise assistant." or passed at runtime with ollama run model --system "...". In LM Studio, it's entered in the 'System Prompt' field in the chat UI. In llama.cpp, it's prepended to the prompt string before inference. Operators often tune the system prompt to reduce verbose output or enforce JSON formatting, directly affecting tokens-per-second and VRAM usage.

Reviewed by Fredoline Eruo. See our editorial policy.

When it doesn't work

Practical example

Workflow example