
Chart & Graph Reading

Extracting data from charts, graphs, plots, and infographics. This is a specialized capability of vision-language models, distinct from raw OCR.

Setup walkthrough

  1. Install Ollama, then run ollama pull minicpm-v (~8 GB — strong multimodal model with chart understanding).
  2. Save a chart image (bar chart, line graph, pie chart) as chart.png.
  3. Run this Python script (requires the ollama Python package: pip install ollama):
import ollama

# Read the chart image as raw bytes; the ollama client also accepts file paths.
with open("chart.png", "rb") as f:
    img_data = f.read()

# Send the image plus an extraction prompt to the local MiniCPM-V model.
resp = ollama.chat(model="minicpm-v", messages=[{
    "role": "user",
    "content": "Extract all data points from this chart. What is the X axis? Y axis? Give me the approximate values for each data series.",
    "images": [img_data],
}])
print(resp["message"]["content"])
  4. First chart reading completes in 5-15 seconds. For better accuracy: ollama pull qwen2.5-vl:7b — stronger chart understanding, 128K context, handles multi-chart documents.
  5. For programmatic chart data extraction, use Google's DePlot (a chart-to-table model, SOTA for simple charts). It ships as a Hugging Face checkpoint (google/deplot) rather than a standalone pip package; a minimal loading sketch follows this list.
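
A minimal DePlot sketch, assuming the transformers and Pillow packages are installed (pip install transformers pillow torch); the prompt string follows the model card's chart-to-table convention:

from PIL import Image
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

# DePlot is a Pix2Struct variant fine-tuned for chart-to-table conversion.
processor = Pix2StructProcessor.from_pretrained("google/deplot")
model = Pix2StructForConditionalGeneration.from_pretrained("google/deplot")

image = Image.open("chart.png")
inputs = processor(
    images=image,
    text="Generate underlying data table of the figure below:",
    return_tensors="pt",
)
out = model.generate(**inputs, max_new_tokens=512)
# Output is a linearized table: cells separated by " | ", rows by "<0x0A>".
print(processor.decode(out[0], skip_special_tokens=True))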

The cheap setup

A used RTX 3060 12 GB (~$200-250, see /hardware/rtx-3060-12gb). Runs MiniCPM-V 8B at 5-10 seconds per chart and Qwen2-VL 7B at 5-15 seconds per chart. Handles bar charts, line plots, scatter plots, and simple pie charts with reasonable accuracy. Pair with a Ryzen 5 5600 + 16 GB DDR4 + 512 GB NVMe. Total: ~$360-405. Chart reading is viable at this budget — the models run comfortably in 12 GB. DePlot (pixel-to-table mapping without LLM overhead) runs on CPU at 1-3 seconds per chart.

The serious setup

A used RTX 3090 24 GB ($700-900, see /hardware/rtx-3090). Runs Qwen2-VL 72B at 10-20 seconds per chart (weights are ~50 GB, so it must be quantized). The 72B variant handles complex charts (multi-axis, stacked bars, logarithmic scales) with near-human accuracy. For batch chart processing (hundreds of charts per day), the 7B-8B models at 2-5 seconds per chart via vLLM are the throughput play; a batch sketch follows. Total: ~$1,800-2,200. Chart reading benefits more from model quality than GPU speed — a 72B model on a 3090 is more accurate than a 7B model on a 4090.
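
A minimal batch sketch, assuming vLLM's offline multimodal API and Qwen2-VL's chat template (the model name, prompt format, and file names here are assumptions for illustration):

from PIL import Image
from vllm import LLM, SamplingParams

# One engine instance; vLLM batches the requests internally.
llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct", max_model_len=8192)
params = SamplingParams(temperature=0.0, max_tokens=512)

# Qwen2-VL prompt template: the image placeholder tokens mark where the
# vision embeddings are spliced into the sequence.
prompt = (
    "<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>"
    "Extract all data points from this chart as CSV.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
requests = [
    {"prompt": prompt, "multi_modal_data": {"image": Image.open(p)}}
    for p in ["chart1.png", "chart2.png"]  # illustrative file names
]
for out in llm.generate(requests, params):
    print(out.outputs[0].text)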

Common beginner mistake

The mistake: Using a text-only LLM (with OCR pre-processing) to "read" charts by extracting axis labels and guessing values.

Why it fails: Text-only LLMs can't see spatial relationships — they can't understand that bar height maps to the Y-axis scale, or that a scatter plot's trend line implies correlation. OCR extracts "Sales: 100, 200, 300" but can't tell which bar is which.

The fix: Use a vision-language model (MiniCPM-V, Qwen2-VL, or DePlot). These models see the actual chart image and understand visual encodings (position, length, area, color). They correctly extract data because they process the chart as pixels, not text. If you need structured output (CSV/JSON), use DePlot — it's trained specifically for chart-to-table conversion — or constrain a VLM's output to JSON, as sketched below.
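
For JSON without DePlot, a minimal sketch using Ollama's JSON output mode (the schema described in the prompt is illustrative, not a fixed API):

import json
import ollama

with open("chart.png", "rb") as f:
    img = f.read()

# format="json" asks Ollama to constrain decoding to valid JSON.
resp = ollama.chat(
    model="minicpm-v",
    format="json",
    messages=[{
        "role": "user",
        "content": ('Extract this chart as JSON with keys: '
                    '"x_label", "y_label", and "series" '
                    '(a list of {"name", "points"} objects).'),
        "images": [img],
    }],
)
data = json.loads(resp["message"]["content"])
print(data.get("x_label"), data.get("y_label"))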

Recommended setup for chart & graph reading

Recommended runtimes

Browse all tools for runtimes that fit this workload.

Reality check

Local AI workloads have real hardware constraints that vary by task type. VRAM ceiling decides what model fits; bandwidth decides decode speed; compute decides prefill speed. Pick the GPU tier that fits your actual workload, not the spec sheet.
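
A back-of-envelope VRAM estimate covering weights plus KV cache (FP16 cache at 2 bytes per element; activations excluded, and the layer/head counts in the example are illustrative, not a specific model's config):

def vram_estimate_gb(params_b: float, bytes_per_param: float,
                     layers: int, kv_heads: int, head_dim: int,
                     context: int, kv_bytes: float = 2.0) -> float:
    """Rough VRAM need: model weights + KV cache."""
    weights = params_b * 1e9 * bytes_per_param
    # Factor of 2 covers both the K and V tensors per layer.
    kv = 2 * layers * kv_heads * head_dim * context * kv_bytes
    return (weights + kv) / 1e9

# e.g. an 8B model at Q4 (~0.5 bytes/param) with a 32K-token context:
# ~4 GB weights + ~4.3 GB cache, comfortable on a 12 GB card.
print(f"{vram_estimate_gb(8, 0.5, 32, 8, 128, 32_768):.1f} GB")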

Common mistakes

  • Buying for spec-sheet VRAM without modeling KV cache + activation overhead
  • Underestimating quantization quality loss below Q4
  • Skipping flash-attention support (real perf gap on long context)
  • Ignoring sustained-load thermals (laptops thermal-throttle within 30 min)

What breaks first

The errors most operators hit when running chart & graph reading locally. Each links to a diagnose+fix walkthrough.

Before you buy

Verify your specific hardware can handle chart & graph reading before committing money.
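
One cheap verification is a timing smoke test against the per-chart numbers quoted above (assumes the Ollama setup from the walkthrough):

import time
import ollama

with open("chart.png", "rb") as f:
    img = f.read()

t0 = time.time()
ollama.chat(model="minicpm-v", messages=[{
    "role": "user",
    "content": "Describe this chart in one sentence.",
    "images": [img],
}])
# Compare against the 5-15 s per-chart targets quoted above; a much slower
# result usually means the model is spilling out of VRAM into system RAM.
print(f"One chart took {time.time() - t0:.1f}s")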

Specialized buyer guides