RUNLOCALAI

Operator-grade instrument for local-AI hardware intelligence. Hand-written verdicts. Real benchmarks. Reproducible commands.


Data Extraction

Pulling structured data (entities, dates, prices, relationships) from unstructured text. Strong instruction-following + JSON-mode capability matters.

Setup walkthrough

  1. Install Ollama → ollama pull llama3.1:8b (~5 GB).
  2. For JSON-mode extraction (structured output from unstructured text):
import ollama  # pip install ollama; assumes `ollama pull llama3.1:8b` has been run

text = "John Smith (john@example.com) is the CEO of Acme Corp. His office is at 123 Main St, San Francisco, CA 94105. Phone: (415) 555-0123."

# format="json" makes Ollama constrain decoding to valid JSON
resp = ollama.chat(model="llama3.1:8b", messages=[{
    "role": "user",
    "content": f"Extract from this text into JSON: name, email, job_title, company, street_address, city, state, zip, phone.\n\nText: {text}\n\nOutput ONLY valid JSON, no explanation:"
}], format="json")
print(resp["message"]["content"])
# {"name": "John Smith", "email": "john@example.com", "job_title": "CEO", "company": "Acme Corp", "street_address": "123 Main St", "city": "San Francisco", "state": "CA", "zip": "94105", "phone": "(415) 555-0123"}
  3. First extraction lands in 2-5 seconds. Ollama's format="json" constrains the output to valid JSON.
  4. For grammar-constrained generation (guaranteed valid JSON): use llama.cpp with a JSON schema grammar. llama.cpp forces every token to conform to the schema, so it cannot emit malformed JSON (see the sketch after this list).
  5. For high-throughput extraction (1000s of documents): batch with vLLM — 100+ documents/minute on a 12 GB GPU (sketch under "The serious setup" below).
  6. For NER (named entity recognition): spaCy + transformer models (en_core_web_trf) are faster and more accurate than LLMs for standard entity types (PERSON, ORG, DATE). Reserve LLMs for custom entity types (hybrid sketch under "Common beginner mistake" below).
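A minimal sketch of the grammar-constrained route, using the llama-cpp-python bindings. The model path and the schema fields are placeholders; passing a schema via response_format gets compiled into a GBNF grammar under the hood.

from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholder path: any instruction-tuned GGUF works here.
llm = Llama(model_path="./llama-3.1-8b-instruct-q4_k_m.gguf", n_ctx=4096)

# The schema is compiled to a grammar, so every sampled token
# must keep the output valid against it.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "company": {"type": "string"},
    },
    "required": ["name", "email", "company"],
}

resp = llm.create_chat_completion(
    messages=[{"role": "user", "content":
        "Extract name, email, company as JSON: "
        "John Smith (john@example.com) is the CEO of Acme Corp."}],
    response_format={"type": "json_object", "schema": schema},
    temperature=0,
)
print(resp["choices"][0]["message"]["content"])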

The cheap setup

Structured extraction is CPU-friendly for batch processing. Llama 3.1 8B runs at 50-80 tok/s on a used RTX 3060 12 GB (~$200-250, see /hardware/rtx-3060-12gb) — extracts entities from 100+ documents/minute. For a business automating invoice data extraction, contract clause identification, or email parsing: $400 handles thousands of documents/day. Pair with Ryzen 5 5600 + 16 GB DDR4 + 512 GB NVMe. Total: ~$360-405. For CPU-only: llama.cpp with 7B models at 20-40 tok/s — slower but adequate for nightly batch jobs.

The serious setup

Used RTX 3090 24 GB (~$700-900, see /hardware/rtx-3090). Runs Qwen 2.5 32B or Llama 3.3 70B Q4 for complex extraction tasks — multi-entity, nested JSON, relational extraction from legal/financial documents. For enterprise document processing (100K+ documents/day): vLLM serves the model with continuous batching, processing 500+ documents/minute. For grammar-constrained generation (zero malformed JSON): llama.cpp with JSON schema grammars ensures production-grade reliability. Total: ~$1,800-2,200. For maximum throughput: dual RTX 3090 with vLLM serves extraction API for an entire organization.
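A rough sketch of that vLLM batch path, using its offline inference API. The model name and the load_documents() helper are placeholders; temperature 0 keeps extraction deterministic.

from vllm import LLM, SamplingParams  # pip install vllm

# Placeholder model: an AWQ-quantized 32B fits in a 24 GB card.
llm = LLM(model="Qwen/Qwen2.5-32B-Instruct-AWQ", quantization="awq")

docs = load_documents()  # hypothetical loader for your corpus
prompts = [
    f"Extract parties, dates, and amounts as JSON.\n\n{d}\n\nJSON:"
    for d in docs
]

# vLLM schedules the whole batch itself (continuous batching);
# throughput comes from submitting the full corpus at once.
for out in llm.generate(prompts, SamplingParams(temperature=0, max_tokens=256)):
    print(out.outputs[0].text)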

Common beginner mistake

The mistake: Using a 7B LLM to extract standard entities (names, dates, addresses) from 100K documents, when spaCy + a transformer model would do it 100× faster with 99% accuracy.

Why it fails: LLMs are generalists: they can extract anything, but slowly. Named entity recognition (NER) for standard types (PERSON, ORG, DATE, GPE) is a solved problem. spaCy's en_core_web_trf model achieves 95%+ F1 on these entities and processes documents orders of magnitude faster than an LLM, which reaches maybe 97% at around 10 documents/second.

The fix: Use the right tool for the entity type. Standard entities (PERSON, ORG, DATE, LOC, MONEY, PERCENT): spaCy or GLiNER. Custom entities ("product_defect_type", "contract_renewal_clause"): LLMs with JSON mode. Hybrid pipeline: spaCy extracts the standard entities (90% of fields) → LLM extracts the custom entities (10% of fields) → merge. This gives you spaCy's throughput on most fields plus LLM flexibility where you need it. Don't use a sledgehammer when a scalpel is faster and more precise.
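A sketch of that hybrid pipeline, assuming en_core_web_trf is installed (python -m spacy download en_core_web_trf) and the Ollama model from the walkthrough above; the custom field names are illustrative.

import json
import ollama
import spacy

nlp = spacy.load("en_core_web_trf")

def extract(text: str) -> dict:
    # Pass 1: spaCy covers the standard entity types.
    doc = nlp(text)
    record = {
        "persons": [e.text for e in doc.ents if e.label_ == "PERSON"],
        "orgs": [e.text for e in doc.ents if e.label_ == "ORG"],
        "dates": [e.text for e in doc.ents if e.label_ == "DATE"],
    }
    # Pass 2: the LLM covers custom fields spaCy has no label for.
    resp = ollama.chat(model="llama3.1:8b", messages=[{
        "role": "user",
        "content": "Extract contract_renewal_clause and product_defect_type "
                   f"as JSON (use null if absent):\n\n{text}",
    }], format="json")
    record.update(json.loads(resp["message"]["content"]))  # merge both passes
    return record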

Recommended setup for data extraction

  • Recommended hardware: Best GPU for local AI → (all workloads ranked across VRAM tiers)
  • Recommended runtimes: browse all tools for runtimes that fit this workload
  • Budget build: AI PC under $1,000 →
  • Best GPU for this task: Best GPU for local AI →

Reality check

Local AI workloads have real hardware constraints that vary by task type. VRAM ceiling decides what model fits; bandwidth decides decode speed; compute decides prefill speed. Pick the GPU tier that fits your actual workload, not the spec sheet.

Common mistakes

  • Buying for spec-sheet VRAM without modeling KV cache + activation overhead (see the sizing sketch after this list)
  • Underestimating quantization quality loss below Q4
  • Skipping flash-attention support (real perf gap on long context)
  • Ignoring sustained-load thermals (laptops thermal-throttle within 30 min)
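As a sketch of that first point: KV cache can be estimated straight from a model's config. The numbers below are Llama 3.1 8B's published architecture (32 layers, 8 KV heads via GQA, head dim 128), with an fp16 cache assumed.

# Per token, the cache stores K and V for every layer:
# 2 * layers * kv_heads * head_dim * bytes_per_value
layers, kv_heads, head_dim, fp16 = 32, 8, 128, 2  # Llama 3.1 8B

per_token = 2 * layers * kv_heads * head_dim * fp16  # 128 KiB/token
for ctx in (4096, 8192, 32768, 131072):
    print(f"{ctx:>7} tokens -> {per_token * ctx / 2**30:.1f} GiB KV cache")
# 8K context costs ~1 GiB; 128K costs ~16 GiB, on top of ~5 GiB of Q4 weights.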

What breaks first

The errors most operators hit when running data extraction locally. Each links to a diagnose+fix walkthrough.

  • CUDA out of memory →
  • Model keeps crashing →
  • Ollama running slow →
  • llama.cpp too slow →

Before you buy

Verify your specific hardware can handle data extraction before committing money.

  • Will it run on my hardware? →
  • Custom compatibility check →
  • GPU recommender (4 questions) →

Featured models

Qwen 3 32B

Related tasks

Structured Output Generation
Buyer guides
  • Best GPU for local AI →
  • Best laptop for local AI →
  • Best Mac for local AI →
  • Best used GPU for local AI →
  • Will it run on my hardware? →
Compare hardware
  • Curated head-to-heads →
  • Custom comparison tool →
  • RTX 4090 vs RTX 5090 →
  • RTX 3090 vs RTX 4090 →
Troubleshooting
  • CUDA out of memory →
  • Ollama running slowly →
  • ROCm not detected →
  • Model keeps crashing →
Specialized buyer guides
  • GPU for ComfyUI (image-gen) →
  • GPU for KoboldCpp (RP/long-context) →
  • GPU for AI agents →
  • GPU for local OCR →
  • GPU for voice cloning →
  • Upgrade from RTX 3060 →
  • Beginner setup →
  • AI PC for students →
Updated 2026 roundup
  • Best free local AI tools (2026) →