RUNLOCALAI · v38
Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Advanced NLP with Local Models

03. NER Prompting vs Fine-Tuning

Chapter 3 of 18 · 15 min
KEY INSIGHT

Start with prompting when the NER schema is unstable or entity types change often. Switch to fine-tuning only once the schema has stabilized and inference volume justifies the training investment.

The decision between prompting-based NER and fine-tuning an LLM involves tradeoffs across cost, latency, accuracy, and maintenance overhead. Each approach suits different operational contexts.

Prompting minimizes implementation complexity. No training data is required beyond a handful of examples for prompt refinement, and schema changes happen through prompt edits rather than model retraining. This flexibility has a cost: the instructions, schema description, and any few-shot examples must be processed on every inference call, which raises latency. The same prompt text also consumes context window space, reducing the input length available for the text being tagged.
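
As a rough illustration of why schema changes are cheap under prompting, the sketch below builds an NER prompt from a list of entity types and validates a JSON reply against that list. The function names, the prompt wording, and the JSON output format are assumptions made for this example, not part of any specific library. The model call is left out so the block runs offline; the reply string is a hypothetical stand-in.

```python
import json

def build_ner_prompt(text, entity_types):
    """Build an instruction prompt; the schema lives entirely in this string."""
    types = ", ".join(entity_types)
    return (
        "Extract named entities from the text below.\n"
        f"Allowed entity types: {types}.\n"
        'Reply with a JSON list of objects like {"text": "...", "type": "..."}.\n\n'
        f"Text: {text}\n"
        "Entities:"
    )

def parse_entities(raw_reply, entity_types):
    """Parse the model reply and drop anything outside the schema."""
    try:
        items = json.loads(raw_reply)
    except json.JSONDecodeError:
        return []  # malformed output is treated as "no entities"
    if not isinstance(items, list):
        return []
    allowed = set(entity_types)
    return [
        (item["text"], item["type"])
        for item in items
        if isinstance(item, dict)
        and isinstance(item.get("text"), str)
        and isinstance(item.get("type"), str)
        and item["type"] in allowed
    ]

schema_v1 = ["PERSON", "ORG"]
schema_v2 = schema_v1 + ["PRODUCT"]  # a schema change is a one-line edit

prompt = build_ner_prompt("Ada joined Acme to work on RocketOS.", schema_v2)

# Hypothetical model reply, used here instead of a real inference call.
reply = '[{"text": "Ada", "type": "PERSON"}, {"text": "RocketOS", "type": "PRODUCT"}]'
entities_v2 = parse_entities(reply, schema_v2)
entities_v1 = parse_entities(reply, schema_v1)  # PRODUCT is filtered out
```

Note that every call pays for the full prompt string again, which is the latency and context cost described above.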

Fine-tuning produces specialized models optimized for specific entity types and output formats. Once trained, the model needs only a minimal input format at inference rather than long instructions and few-shot examples, which shortens each request and typically reduces latency. For high-volume NER pipelines processing thousands of documents per minute, fine-tuned models often provide better cost-per-inference economics.
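
The cost-per-inference argument can be made concrete with a break-even estimate. The sketch below assumes inference cost is roughly proportional to tokens processed and that fine-tuning removes a fixed number of prompt tokens per call. Every figure is hypothetical and chosen only to show the arithmetic.

```python
def prompt_overhead_cost(docs, overhead_tokens, price_per_million_tokens):
    """Cost of the instruction and few-shot tokens repeated on every call."""
    return docs * overhead_tokens * price_per_million_tokens / 1_000_000

def break_even_docs(training_cost, overhead_tokens, price_per_million_tokens):
    """Number of documents after which saved prompt tokens repay training."""
    return training_cost * 1_000_000 / (overhead_tokens * price_per_million_tokens)

# Hypothetical numbers for illustration only.
overhead_tokens = 800   # instructions + few-shot examples removed by fine-tuning
price = 0.50            # compute cost per million processed tokens
training_cost = 200.0   # annotation effort plus GPU time for one fine-tune

docs_needed = break_even_docs(training_cost, overhead_tokens, price)
cost_at_break_even = prompt_overhead_cost(docs_needed, overhead_tokens, price)
```

With these made-up figures the break-even point is 500,000 documents. Below that volume, or if the schema changes often enough to force retraining, prompting remains the cheaper option.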

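import json
import subprocess

# LLaMA-Factory is driven by a config file passed to its CLI, not by a Python
# factory class. Key names follow LLaMA-Factory's example SFT configs; check
# them against the version you have installed.
config = {
    # Model and method (LoRA is an assumption; it matches the 2e-4 learning rate)
    "model_name_or_path": "meta-llama/Meta-Llama-3-8B-Instruct",  # Hugging Face ID, not an Ollama tag
    "stage": "sft",
    "do_train": True,
    "finetuning_type": "lora",
    "lora_target": "all",
    # Data: "ner_dataset" must be registered in data/dataset_info.json
    "dataset": "ner_dataset",
    "template": "llama3",  # chat template of the base model
    "output_dir": "./ner_finetuned",
    # Fine-tuning configuration
    "per_device_train_batch_size": 4,
    "learning_rate": 2e-4,
    "num_train_epochs": 3,
    "warmup_ratio": 0.1,
}

with open("ner_sft.json", "w") as f:
    json.dump(config, f, indent=2)

# Launch training; requires LLaMA-Factory installed and a GPU with enough memory.
subprocess.run(["llamafactory-cli", "train", "ner_sft.json"], check=True)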

Training data requirements for fine-tuning depend on entity type complexity and base model size. Effective fine-tuning typically requires 500-2000 annotated examples per entity type. Data quality matters more than quantity; consistent annotation guidelines produce better results than large noisy datasets.
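
One way to check a dataset against the per-type guideline above is to count how many examples contain each entity type while converting annotations into training records. The sketch below is an assumption-laden example: the annotation layout, the `INSTRUCTION` text, and the helper names are hypothetical, and the `instruction`/`input`/`output` record shape is the Alpaca-style layout that many supervised fine-tuning tools accept.

```python
import json
from collections import Counter

# Hypothetical annotated examples: text plus (span text, entity type) pairs.
annotated = [
    {"text": "Ada joined Acme.", "entities": [("Ada", "PERSON"), ("Acme", "ORG")]},
    {"text": "Acme hired Grace.", "entities": [("Acme", "ORG"), ("Grace", "PERSON")]},
    {"text": "Nothing to tag here.", "entities": []},
]

INSTRUCTION = "Extract PERSON and ORG entities as a JSON list."

def to_sft_record(example):
    """Turn one annotated example into an instruction/input/output record."""
    target = [{"text": span, "type": etype} for span, etype in example["entities"]]
    return {
        "instruction": INSTRUCTION,
        "input": example["text"],
        "output": json.dumps(target),
    }

def examples_per_type(examples):
    """Count how many examples contain each entity type at least once."""
    counts = Counter()
    for ex in examples:
        counts.update({etype for _, etype in ex["entities"]})
    return counts

records = [to_sft_record(ex) for ex in annotated]
coverage = examples_per_type(annotated)

# Flag entity types below the lower bound of the 500-2000 guideline.
under_target = sorted(t for t, n in coverage.items() if n < 500)
```

Keeping examples with no entities, like the third one here, gives the model evidence for when to return an empty list.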

Evaluation methodologies differ between approaches. Prompting allows rapid A/B testing of instruction variations on held-out examples. Fine-tuning evaluation requires monitoring validation metrics throughout training so that overfitting, where validation loss rises while training loss keeps falling, is caught early and the best checkpoint is kept.
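
Both evaluation styles can be sketched in a few lines. The block below scores two prompt variants with micro-averaged precision, recall, and F1 over exact (text, type) matches, then picks the best epoch from a validation-loss curve using a simple patience rule. The helper names, the exact-match criterion, the predictions, and the loss values are all assumptions for illustration.

```python
def entity_prf(gold, predicted):
    """Micro precision, recall, and F1 over exact (text, type) matches."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        g_set, p_set = set(g), set(p)
        tp += len(g_set & p_set)
        fp += len(p_set - g_set)
        fn += len(g_set - p_set)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def best_checkpoint_epoch(val_losses, patience=2):
    """Return the 1-based epoch with the lowest validation loss once the loss
    has failed to improve for `patience` epochs; None if that never happens."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch
    return None

# Held-out gold labels and hypothetical outputs from two prompt variants.
gold = [[("Ada", "PERSON"), ("Acme", "ORG")], [("Grace", "PERSON")]]
variant_a = [[("Ada", "PERSON")], [("Grace", "PERSON"), ("Navy", "ORG")]]
variant_b = [[("Ada", "PERSON"), ("Acme", "ORG")], [("Grace", "PERSON")]]

_, _, f1_a = entity_prf(gold, variant_a)
_, _, f1_b = entity_prf(gold, variant_b)

# Hypothetical per-epoch validation losses from a fine-tuning run.
best_epoch = best_checkpoint_epoch([0.90, 0.62, 0.55, 0.58, 0.61])
```

Exact matching is strict; partial-overlap scoring is a common alternative when span boundaries are ambiguous.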

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.

EXERCISE

Implement both prompting and fine-tuning pipelines for the same entity schema. Measure inference latency, annotation cost for training data, and accuracy metrics across three domain variations.
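
For the latency part of the exercise, a small timing harness keeps the two pipelines comparable. This is a minimal sketch: `measure_latency` and `dummy_extractor` are hypothetical names, and the stand-in extractor should be replaced with the real prompting or fine-tuned model call.

```python
import statistics
import time

def measure_latency(extract_fn, texts, warmup=1):
    """Time extract_fn on each text; return median and max latency in seconds."""
    for text in texts[:warmup]:
        extract_fn(text)  # warm-up calls are not timed
    timings = []
    for text in texts:
        start = time.perf_counter()
        extract_fn(text)
        timings.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(timings),
        "max_s": max(timings),
        "n": len(timings),
    }

# Hypothetical stand-in; swap in a real model call for either pipeline.
def dummy_extractor(text):
    return [(word, "PERSON") for word in text.split() if word.istitle()]

stats = measure_latency(dummy_extractor, ["Ada met Grace.", "no names here"])
```

Run both pipelines on the same texts and the same hardware, and record the backend and model size next to the numbers.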
