RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts: there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend.

© 2026 runlocalai.co · Independently operated
BLK · PROMPTING HUB

Local LLM prompting kits

System prompts, chat templates, tool-calling formats, and sampler defaults for each kitted local LLM in the directory, sourced from vendor model cards and tested on our own hardware where noted. Local models are pickier about prompt structure than cloud models: what works on Claude or GPT-5 often fails on Llama or Qwen. This is the catalog of what actually works, per model, with source attribution.

  • Models kitted: 17
  • Families covered: 6
  • Tool calling: 11/17
  • Multimodal: 2/17

Why local prompting isn’t the same as cloud prompting

Cloud models (GPT-5, Claude, Gemini Pro) are massive and forgiving. They tolerate vague system prompts, recover gracefully from schema-violating tool calls, and adapt to whatever format you throw at them. Local open-weight models — even good ones — don’t. Llama 3 70B will follow Llama 3 chat-template tokens correctly and drop coherence the moment you feed it ChatML. Qwen 3 expects its own /think toggle; DeepSeek R1 silently degrades if you add a system prompt at all.
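One way to handle the two quirks named above is to rewrite the message list before it reaches the chat template. The sketch below is illustrative only: `fold_system_into_user` and `with_qwen3_toggle` are hypothetical helper names, and the `/no_think` counterpart to `/think` is our assumption based on the Qwen 3 model card, not something stated on this page.

```python
from typing import Dict, List

# One chat turn: {"role": "system" | "user" | "assistant", "content": "..."}
Message = Dict[str, str]


def fold_system_into_user(messages: List[Message]) -> List[Message]:
    """Move all system text into the first user turn.

    Hypothetical workaround for models that degrade when given a
    system prompt. Returns a new list; the input is not mutated.
    """
    system_text = "\n\n".join(
        m["content"] for m in messages if m["role"] == "system"
    )
    rest = [dict(m) for m in messages if m["role"] != "system"]
    if not system_text:
        return rest
    for m in rest:
        if m["role"] == "user":
            m["content"] = f"{system_text}\n\n{m['content']}"
            return rest
    # No user turn at all: send the system text as the user turn.
    return [{"role": "user", "content": system_text}] + rest


def with_qwen3_toggle(messages: List[Message], thinking: bool) -> List[Message]:
    """Append a thinking toggle to the last user turn.

    Assumes the soft switches are spelled "/think" and "/no_think".
    """
    out = [dict(m) for m in messages]
    for m in reversed(out):
        if m["role"] == "user":
            m["content"] += " /think" if thinking else " /no_think"
            break
    return out
```

For a DeepSeek R1 request, `fold_system_into_user` would run first so that no system turn is ever rendered. Check the individual kit before relying on either helper.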

Three things change every time you switch model family:

  • Chat template tokens. Llama 3 uses <|begin_of_text|> / <|start_header_id|>. Qwen and Phi-4 use ChatML’s <|im_start|>. Gemma uses <start_of_turn> with no native system role. Mistral uses [INST]...[/INST]. With the wrong template, the model loses 20-50% of its instruction-following quality.
  • Tool-calling format. Hermes-style (Qwen) uses <tool_call>{...}</tool_call> blocks. Llama 3 emits raw JSON in the assistant turn. Mistral is OpenAI-compatible. Same-shape input, three different output formats.
  • Sampler defaults. Qwen 3 ships with temperature 0.7, top_p 0.8, top_k 20. Mistral Small 3.2 wants 0.15 / 1.0 for tool calls. Phi-4 expects 0.7 / 0.95. Use the wrong defaults and you get either repetition loops or incoherent outputs.
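To make the template differences in the list above concrete, here is a simplified sketch that renders the same conversation three ways. The function names are hypothetical, and the layouts cover only the basic turn markers. BOS handling, Phi-4's separator token, and tool turns are left out, so treat this as an illustration rather than a replacement for the template shipped with each model.

```python
from typing import Dict, List

Message = Dict[str, str]  # {"role": ..., "content": ...}


def render_llama3(messages: List[Message]) -> str:
    """Llama 3 style: header tokens around the role, <|eot_id|> after each turn."""
    out = "<|begin_of_text|>"
    for m in messages:
        out += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Open the assistant turn so the model starts generating its reply.
    return out + "<|start_header_id|>assistant<|end_header_id|>\n\n"


def render_chatml(messages: List[Message]) -> str:
    """Generic ChatML: <|im_start|>role ... <|im_end|> per turn."""
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    return out + "<|im_start|>assistant\n"


def render_gemma(messages: List[Message]) -> str:
    """Gemma style: no system role, so system text is folded into the first user turn."""
    pending = "\n\n".join(m["content"] for m in messages if m["role"] == "system")
    out = ""
    for m in messages:
        if m["role"] == "system":
            continue
        role = "model" if m["role"] == "assistant" else "user"
        content = m["content"]
        if pending and role == "user":
            content = f"{pending}\n\n{content}"
            pending = ""
        out += f"<start_of_turn>{role}\n{content}<end_of_turn>\n"
    return out + "<start_of_turn>model\n"


if __name__ == "__main__":
    chat = [
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Name one local LLM."},
    ]
    for render in (render_llama3, render_chatml, render_gemma):
        print(f"--- {render.__name__} ---\n{render(chat)}\n")
```

In practice most runtimes apply the template stored with the model (for example, `tokenizer.apply_chat_template` in Hugging Face Transformers). Hand-rolled rendering like this is mainly useful for checking what the model actually receives.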

Every kit below lists what the vendor model card actually specifies for that model. Where we’ve verified the behavior on our own hardware, the badge flips from blue “From model card” to green “Tested by runlocalai” with date and rig.
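Since each kit records a tool-calling format, client code can dispatch on that field when reading the assistant turn. The sketch below is a minimal illustration under assumptions: the `Kit` record and `parse_tool_calls` are hypothetical names that mirror the card fields on this page, and the JSON shapes are simplified. It is not the site's data model or any runtime's actual parser.

```python
import json
import re
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Kit:
    """Hypothetical record mirroring the fields shown on a kit card."""
    model: str
    chat_template: str
    tool_calling: str  # e.g. "hermes-style", "json-function-calls", "not supported"
    default_temperature: float
    badge: str  # "From model card" or "Tested by runlocalai"


# Hermes-style calls are wrapped in <tool_call>...</tool_call> blocks.
_TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_calls(kit: Kit, assistant_text: str) -> List[Dict[str, Any]]:
    """Extract tool calls from raw assistant text according to the kit's format."""
    if kit.tool_calling == "hermes-style":
        return [json.loads(body) for body in _TOOL_CALL_RE.findall(assistant_text)]
    if kit.tool_calling == "json-function-calls":
        # The whole assistant turn is expected to be a single JSON object.
        try:
            call = json.loads(assistant_text.strip())
        except json.JSONDecodeError:
            return []  # ordinary text answer, not a tool call
        return [call] if isinstance(call, dict) and "name" in call else []
    # "openai-compatible" servers return structured tool calls in the API
    # response, and "not supported" models have nothing to parse, so both
    # fall through here. "prompted-convention" is not handled in this sketch.
    return []


if __name__ == "__main__":
    qwen = Kit("Qwen 3 8B", "ChatML (Qwen3 variant)", "hermes-style", 0.7, "From model card")
    reply = '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Oslo"}}\n</tool_call>'
    print(parse_tool_calls(qwen, reply))
```

Keeping the format as data on the kit, rather than hard-coding it per model name, means a newly added kit needs no new parsing code as long as it uses one of the known formats.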

FAM · QWEN

Qwen

5 kits
Qwen 3 30B-A3B
30B params · Alibaba · Model card
  • Chat template: ChatML (Qwen3 variant)
  • Tool calling: hermes-style
  • Default temperature: 0.7
  • Documented quirks: 5

Qwen 2.5 Coder 32B Instruct
32B params · Alibaba · Model card
  • Chat template: ChatML (Qwen 2.5 variant)
  • Tool calling: hermes-style
  • Default temperature: 0.2
  • Documented quirks: 5

Qwen 3 32B
32B params · Alibaba · Model card
  • Chat template: ChatML (Qwen3 variant)
  • Tool calling: hermes-style
  • Default temperature: 0.7
  • Documented quirks: 5

Qwen 3 8B
8B params · Alibaba · Model card
  • Chat template: ChatML (Qwen3 variant)
  • Tool calling: hermes-style
  • Default temperature: 0.7
  • Documented quirks: 5

Qwen 3 14B
14B params · Alibaba · Model card
  • Chat template: ChatML (Qwen3 variant)
  • Tool calling: hermes-style
  • Default temperature: 0.7
  • Documented quirks: 5

FAM · LLAMA

Llama

3 kits
Llama 3.1 8B Instruct
8B params · Meta · Model card
  • Chat template: Llama 3
  • Tool calling: json-function-calls
  • Default temperature: 0.6
  • Documented quirks: 5

Llama 3.3 70B Instruct
70B params · Meta · Model card
  • Chat template: Llama 3
  • Tool calling: json-function-calls
  • Default temperature: 0.6
  • Documented quirks: 5

Llama 3.3 8B Instruct
8B params · Meta · Model card
  • Chat template: Llama 3
  • Tool calling: json-function-calls
  • Default temperature: 0.6
  • Documented quirks: 5

FAM · DEEPSEEK

DeepSeek

3 kits
DeepSeek R1 (671B reasoning)
671B params · DeepSeek · Model card
  • Chat template: DeepSeek (User/Assistant markers)
  • Tool calling: not supported
  • Default temperature: 0.6
  • Documented quirks: 5

DeepSeek R1 Distill Llama 70B
70B params · DeepSeek · Model card
  • Chat template: Llama 3
  • Tool calling: not supported
  • Default temperature: 0.6
  • Documented quirks: 5

DeepSeek R1 Distill Qwen 32B
32B params · DeepSeek · Model card
  • Chat template: ChatML (Qwen2.5)
  • Tool calling: not supported
  • Default temperature: 0.6
  • Documented quirks: 5

FAM · MISTRAL

Mistral

2 kits
Mistral Small 3 24B
24B params · Mistral AI · Model card
  • Chat template: Mistral Instruct v3
  • Tool calling: openai-compatible
  • Default temperature: 0.15
  • Documented quirks: 5

Mistral Small 3.2 24B
24B params · Mistral AI · Model card
  • Chat template: Mistral Instruct v7
  • Tool calling: openai-compatible
  • Default temperature: 0.15
  • Documented quirks: 5

FAM · PHI

Phi

2 kits
Phi-4 14B
14B params · Microsoft · Model card
  • Chat template: ChatML
  • Tool calling: not supported
  • Default temperature: 0.7
  • Documented quirks: 5

Phi-4 Reasoning 14B
14B params · Microsoft · Model card
  • Chat template: ChatML
  • Tool calling: not supported
  • Default temperature: 0.7
  • Documented quirks: 5

FAM · GEMMA

Gemma

2 kits
Gemma 3 27B
27B params · Google · Model card
  • Chat template: Gemma 3
  • Tool calling: prompted-convention
  • Default temperature: 1
  • Documented quirks: 5

Trendyol LLM Asure 12B
11.8B params · Trendyol · ✓ Tested
  • Chat template: Gemma 3
  • Tool calling: not supported
  • Default temperature: 0
  • Documented quirks: 3

DON’T SEE YOUR MODEL?

Coverage grows as we test locally.

Every kit on this page is sourced either from the vendor’s official model card (blue badge) or verified on our own hardware (green badge). We refuse to invent system prompts or fabricate quirks. If you want a model added, tell us which one — popular requests move up the seed queue. The full model directory lives at /models; models without a kit yet just render the base spec page.