Pulling structured data (entities, dates, prices, relationships) from unstructured text. Strong instruction-following + JSON-mode capability matters.
Pull the model first with `ollama pull llama3.1:8b` (~5 GB), then:

```python
import ollama

text = "John Smith (john@example.com) is the CEO of Acme Corp. His office is at 123 Main St, San Francisco, CA 94105. Phone: (415) 555-0123."

resp = ollama.chat(model="llama3.1:8b", messages=[{
    "role": "user",
    "content": f"Extract from this text into JSON: name, email, job_title, company, street_address, city, state, zip, phone.\n\nText: {text}\n\nOutput ONLY valid JSON, no explanation:"
}], format="json")
print(resp["message"]["content"])
# {"name": "John Smith", "email": "john@example.com", "job_title": "CEO", "company": "Acme Corp", "street_address": "123 Main St", "city": "San Francisco", "state": "CA", "zip": "94105", "phone": "(415) 555-0123"}
```
`format="json"` constrains the output to valid JSON. Structured extraction is friendly to batch processing even on modest hardware. Llama 3.1 8B runs at 50-80 tok/s on a used RTX 3060 12 GB (~$200-250, see /hardware/rtx-3060-12gb), fast enough to extract entities from 100+ documents per minute. For a business automating invoice data extraction, contract clause identification, or email parsing, roughly $400 of hardware handles thousands of documents per day: pair the GPU with a Ryzen 5 5600, 16 GB DDR4, and a 512 GB NVMe drive for a total of ~$360-405. For CPU-only setups, llama.cpp runs 7B models at 20-40 tok/s; slower, but adequate for nightly batch jobs like the sketch below.
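As a concrete shape for such a batch job, here is a minimal sketch that loops the same Ollama call over a corpus and collects parsed results. The `load_documents` helper, the `documents.jsonl` path, and the field list are hypothetical placeholders; adapt them to your own data.

```python
import json
import ollama

FIELDS = "name, email, job_title, company, phone"  # hypothetical schema

def extract(text: str, model: str = "llama3.1:8b") -> dict:
    """Run one document through the model and parse the JSON reply."""
    resp = ollama.chat(model=model, messages=[{
        "role": "user",
        "content": f"Extract from this text into JSON: {FIELDS}.\n\n"
                   f"Text: {text}\n\nOutput ONLY valid JSON, no explanation:",
    }], format="json")
    return json.loads(resp["message"]["content"])

def load_documents():
    # Hypothetical corpus loader: one {"text": ...} record per line.
    with open("documents.jsonl") as f:
        for line in f:
            yield json.loads(line)["text"]

results, failures = [], 0
for doc in load_documents():
    try:
        results.append(extract(doc))
    except json.JSONDecodeError:
        failures += 1  # format="json" makes this rare, but count it anyway
print(f"extracted {len(results)} records, {failures} failures")
```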
Used RTX 3090 24 GB (~$700-900, see /hardware/rtx-3090). Runs Qwen 2.5 32B or Llama 3.3 70B Q4 for complex extraction tasks: multi-entity, nested JSON, and relational extraction from legal or financial documents. For enterprise document processing (100K+ documents/day), vLLM serves the model with continuous batching and processes 500+ documents per minute; a schema-guided request against such a server is sketched below. For grammar-constrained generation (zero malformed JSON), llama.cpp with JSON schema grammars delivers production-grade reliability. Total: ~$1,800-2,200. For maximum throughput, dual RTX 3090s with vLLM can serve an extraction API for an entire organization.
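A minimal sketch of a schema-guided extraction request against a vLLM OpenAI-compatible server, assuming it was started with something like `vllm serve Qwen/Qwen2.5-32B-Instruct --port 8000`. The `guided_json` extra-body parameter is vLLM's structured-output hook; its exact name has shifted across vLLM versions, so check your release. The schema fields and `contract.txt` path are illustrative.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

# Illustrative nested schema for a contract-extraction task.
schema = {
    "type": "object",
    "properties": {
        "parties": {"type": "array", "items": {"type": "string"}},
        "effective_date": {"type": "string"},
        "renewal_clause": {"type": "string"},
    },
    "required": ["parties", "effective_date"],
}

contract_text = open("contract.txt").read()  # your document

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-32B-Instruct",
    messages=[{
        "role": "user",
        "content": f"Extract the contract fields as JSON.\n\n{contract_text}",
    }],
    # vLLM-specific: constrain decoding to the schema (newer releases also
    # accept a response_format json_schema; check your version).
    extra_body={"guided_json": schema},
)
print(resp.choices[0].message.content)
```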
The mistake: using a 7B LLM to extract standard entities (names, dates, addresses) from 100K documents when spaCy plus a transformer model would do it 100× faster with 99% accuracy. Why it fails: LLMs are generalists; they can extract anything, but slowly. Named entity recognition (NER) for standard types (PERSON, ORG, DATE, GPE) is a solved problem: spaCy's en_core_web_trf model achieves 95%+ F1 on these entities, and spaCy pipelines run at 10,000+ documents/second on CPU. An LLM achieves maybe 97% at 10 documents/second. The fix: use the right tool for the entity type. Standard entities (PERSON, ORG, DATE, LOC, MONEY, PERCENT): spaCy or GLiNER. Custom entities ("product_defect_type", "contract_renewal_clause"): LLMs with JSON mode. Hybrid pipeline, sketched below: spaCy extracts standard entities (90% of fields) → LLM extracts custom entities (10% of fields) → merge. That gets you nearly all of spaCy's speed plus the LLM's flexibility. Don't use a sledgehammer when a scalpel is faster and more precise.
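A minimal sketch of that hybrid pipeline, assuming spaCy with a downloaded English model and a local Ollama install; the custom field names are the illustrative ones above.

```python
import json
import ollama
import spacy

# CPU-optimized pipeline; swap in "en_core_web_trf" for higher accuracy.
nlp = spacy.load("en_core_web_sm")

STANDARD = {"PERSON", "ORG", "DATE", "GPE", "MONEY", "PERCENT"}

def extract_standard(text: str) -> dict:
    """spaCy covers the solved entity types at high throughput."""
    out = {}
    for ent in nlp(text).ents:
        if ent.label_ in STANDARD:
            out.setdefault(ent.label_, []).append(ent.text)
    return out

def extract_custom(text: str) -> dict:
    """The LLM covers the custom fields spaCy has no label for."""
    resp = ollama.chat(model="llama3.1:8b", messages=[{
        "role": "user",
        "content": "Extract into JSON: product_defect_type, "
                   f"contract_renewal_clause.\n\nText: {text}\n\n"
                   "Output ONLY valid JSON, no explanation:",
    }], format="json")
    return json.loads(resp["message"]["content"])

def extract(text: str) -> dict:
    # Merge: standard fields from spaCy, custom fields from the LLM.
    return {**extract_standard(text), **extract_custom(text)}
```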
Browse all tools for runtimes that fit this workload.
Local AI workloads have real hardware constraints that vary by task type. VRAM ceiling decides what model fits; bandwidth decides decode speed; compute decides prefill speed. Pick the GPU tier that fits your actual workload, not the spec sheet.
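The bandwidth rule is easy to sanity-check with arithmetic: each generated token streams the full weight set through memory, so decode speed tops out near bandwidth divided by model size. Using spec-sheet numbers (RTX 3060 12 GB at roughly 360 GB/s; an 8B model at Q4 is roughly 5 GB), the ceiling lands right where the 50-80 tok/s figure above sits.

```python
# Back-of-envelope decode ceiling: bandwidth / bytes read per token.
bandwidth_gb_s = 360   # RTX 3060 12 GB, spec-sheet memory bandwidth
model_gb = 5.0         # Llama 3.1 8B at Q4 quantization, approximate

print(f"decode ceiling: ~{bandwidth_gb_s / model_gb:.0f} tok/s")  # ~72 tok/s
```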
The errors most operators hit when running data extraction locally. Each links to a diagnose+fix walkthrough.
Verify your specific hardware can handle data extraction before committing money.