Local LLM Prompting Guide — system prompts, chat templates, tool calling

Cloud models (GPT-5, Claude, Gemini Pro) are massive and forgiving. They tolerate vague system prompts, recover gracefully from schema-violating tool calls, and adapt to whatever format you throw at them. Local open-weight models, even good ones, don’t. Llama 3 70B follows Llama 3 chat-template tokens correctly and loses coherence the moment you feed it ChatML. Qwen 3 expects its own /think toggle; DeepSeek R1 silently degrades if you add a system prompt at all.
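For a model like DeepSeek R1 that does better with no system prompt, one workaround is to move the system text into the first user turn before the prompt is rendered. The helper below is a minimal sketch of that idea; the function name `fold_system_prompt` and the OpenAI-style list of `role`/`content` dicts are assumptions made for illustration, not part of any model's tooling.

```python
def fold_system_prompt(messages):
    """Return a copy of `messages` with system text merged into the first user turn.

    `messages` is assumed to be a list of {"role": ..., "content": ...} dicts.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    rest = [dict(m) for m in messages if m["role"] != "system"]
    if not system_parts:
        return rest

    preamble = "\n\n".join(system_parts)
    for m in rest:
        if m["role"] == "user":
            # Prepend the instructions to the first user message only.
            m["content"] = f"{preamble}\n\n{m['content']}"
            return rest

    # No user turn yet: send the instructions as a user message of their own.
    return [{"role": "user", "content": preamble}] + rest
```

The input list is left unmodified, so the same conversation can still be sent with its system turn to a model that accepts one.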

Three things change every time you switch model family:

Chat template tokens. Llama 3 uses <|begin_of_text|> / <|start_header_id|>. Qwen and Phi-4 use ChatML’s <|im_start|>. Gemma uses <start_of_turn> with no native system role. Mistral uses [INST]...[/INST]. With the wrong template, the model loses 20-50% of its instruction-following quality.
Tool-calling format. Hermes-style (Qwen, R1) uses <tool_call>{...}</tool_call> blocks. Llama 3 emits raw JSON in the assistant turn. Mistral is OpenAI-compatible. Same-shape input, three different output formats.
Sampler defaults. Qwen ships with temperature 0.7, top_p 0.8, top_k 20. Mistral 3.2 wants temperature 0.15 / top_p 1.0 for tool calls. Phi-4 expects temperature 0.7 / top_p 0.95. Use the wrong defaults and you get either repetition loops or incoherent output.
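To make the template difference concrete, here is a rough sketch that renders the same two-message conversation in the Llama 3 format and in ChatML as Qwen uses it. The function names are hypothetical. The closing tokens (<|end_header_id|>, <|eot_id|>, <|im_end|>) come from the published formats, not from the list above. In practice the template bundled with the model's tokenizer is the safer source of truth; the hand-rolled strings here only show how far apart the two layouts are.

```python
def render_llama3(messages):
    """Render messages in the Llama 3 chat layout, ending with an open assistant turn."""
    out = "<|begin_of_text|>"
    for m in messages:
        out += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Open the assistant header so the model writes the reply.
    return out + "<|start_header_id|>assistant<|end_header_id|>\n\n"


def render_chatml(messages):
    """Render messages in ChatML, ending with an open assistant turn."""
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    return out + "<|im_start|>assistant\n"


if __name__ == "__main__":
    conversation = [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hi"},
    ]
    print(render_llama3(conversation))
    print(render_chatml(conversation))
```

The two outputs share no control tokens at all, which is why a prompt built for one family reads as noise to the other.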

Every kit below lists what the vendor model card actually specifies for that model. Where we’ve verified the behavior on our own hardware, the badge flips from blue “From model card” to green “Tested by runlocalai” with date and rig.
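One way to picture a kit is as a small record per model family: which tool-call format to expect, which sampler values to send, and where those values came from. The sketch below is an illustration only. The `Kit` class, the `KITS` table, the `"hermes"` / `"raw_json"` labels, and `extract_tool_calls` are all hypothetical names; the Qwen sampler values are the ones quoted above, and the Llama 3 entry leaves the sampler unset because this guide does not state one.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Kit:
    family: str
    tool_format: str              # "hermes" or "raw_json" (labels invented here)
    sampler: Optional[dict]       # None means no values are stated in this guide
    source: str = "From model card"


KITS = {
    "qwen": Kit("qwen", "hermes", {"temperature": 0.7, "top_p": 0.8, "top_k": 20}),
    "llama3": Kit("llama3", "raw_json", None),
}

_HERMES_BLOCK = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(text, kit):
    """Return the tool calls found in an assistant turn as a list of dicts."""
    if kit.tool_format == "hermes":
        payloads = _HERMES_BLOCK.findall(text)
    elif kit.tool_format == "raw_json":
        payloads = [text]         # the whole turn is expected to be one JSON object
    else:
        raise ValueError(f"unknown tool format: {kit.tool_format}")

    calls = []
    for payload in payloads:
        try:
            obj = json.loads(payload.strip())
        except json.JSONDecodeError:
            continue              # not a tool call, e.g. an ordinary text answer
        if isinstance(obj, dict):
            calls.append(obj)
    return calls
```

Keeping the format label in the kit means the calling code stays the same when the model changes; only the table entry does.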

Local LLM prompting kits

Why local prompting isn’t the same as cloud prompting

Qwen

Llama

DeepSeek

Mistral

Phi

Gemma

Coverage grows as we test locally.