RunLocalAI Public API

JSON endpoints for the model directory, hardware specs, tool reviews, and benchmark data. Free tier: 1,000 calls per month with email signup. Build on top of us — every integration becomes a backlink, and the data improves as we run more benchmarks.

Quick start

Three steps from zero to your first response.

1. Get a free API key
POST your name, email, and intended use case. Returns your key once — save it.
curl -X POST https://runlocalai.co/api/v1/keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Your Name",
    "email": "you@example.com",
    "use_case": "Building a hardware compatibility checker for our internal devs"
  }'
2. Make a request
curl https://runlocalai.co/api/v1/models \
  -H "Authorization: Bearer rla_xxx..."
3. Read the rate-limit headers
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Tier: free
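If you want to track quota client-side, the three headers above can be parsed into a small dict. A minimal sketch (the helper name is ours, not part of the API):

```python
def parse_rate_limit(headers: dict) -> dict:
    """Extract RunLocalAI rate-limit info from response headers."""
    return {
        "limit": int(headers.get("X-RateLimit-Limit", 0)),
        "remaining": int(headers.get("X-RateLimit-Remaining", 0)),
        "tier": headers.get("X-RateLimit-Tier", "unknown"),
    }

# Values from the example response above
info = parse_rate_limit({
    "X-RateLimit-Limit": "1000",
    "X-RateLimit-Remaining": "999",
    "X-RateLimit-Tier": "free",
})
print(info)  # → {'limit': 1000, 'remaining': 999, 'tier': 'free'}
```

Checking `remaining` before each batch of calls lets you stop cleanly instead of hitting a 429.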

Endpoints

GET /api/v1/models

List all published open-weight models.

Query params
  • family — llama, qwen, mistral, phi, gemma, deepseek, etc.
  • min_params_b / max_params_b — bracket by parameter count
  • limit — default 100, max 200
curl 'https://runlocalai.co/api/v1/models?family=qwen&min_params_b=7&limit=20' \
  -H "Authorization: Bearer $RLA_KEY"
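Query URLs like the one above can be assembled with the standard library so only the parameters you actually set are sent. A sketch using the documented params (`family`, `min_params_b`, `max_params_b`, `limit`):

```python
from urllib.parse import urlencode

BASE = "https://runlocalai.co/api/v1"

def models_url(family=None, min_params_b=None, max_params_b=None, limit=None):
    """Build a /models query URL, omitting any unset parameter."""
    params = {k: v for k, v in {
        "family": family,
        "min_params_b": min_params_b,
        "max_params_b": max_params_b,
        "limit": limit,
    }.items() if v is not None}
    return f"{BASE}/models?{urlencode(params)}"

print(models_url(family="qwen", min_params_b=7, limit=20))
# → https://runlocalai.co/api/v1/models?family=qwen&min_params_b=7&limit=20
```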
GET /api/v1/models/{slug}

Single model with full architecture metadata + benchmarks.

curl https://runlocalai.co/api/v1/models/llama-3.1-8b-instruct \
  -H "Authorization: Bearer $RLA_KEY"

Includes per-variant quantization sizes, KV-cache architecture (num_kv_heads, head_dim, etc.), and every benchmark we have for this model.
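The KV-cache fields are enough to estimate cache memory at a given context length with the standard formula (2 tensors, K and V, per layer). A sketch — `num_layers` is our assumed field name, and the example values are Llama-3.1-8B-style figures, not pulled from the API:

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Approximate KV-cache size: 2 (K and V) x layers x kv_heads x head_dim x tokens x dtype width."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem

# 32 layers, 8 KV heads (GQA), head_dim 128, FP16 cache, 8k context
gib = kv_cache_bytes(32, 8, 128, seq_len=8192) / 1024**3
print(gib)  # → 1.0
```

This is why GQA models (few KV heads) fit much longer contexts than older multi-head designs at the same VRAM budget.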

GET /api/v1/hardware

List all GPUs, SoCs, and laptops in the database.

Query params
  • vendor — nvidia, amd, intel, apple, qualcomm
  • type — gpu, cpu, apu, soc, laptop, desktop
  • min_vram_gb — filter by minimum VRAM
curl 'https://runlocalai.co/api/v1/hardware?vendor=nvidia&min_vram_gb=16' \
  -H "Authorization: Bearer $RLA_KEY"

Returns memory bandwidth, FP16 TFLOPS, backend support, and current street price.
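Combining `/hardware` with the per-variant quantization sizes from `/models/{slug}` gives a rough "does it fit" check. A sketch with our own assumed margins (KV cache and runtime overhead), not values the API prescribes:

```python
def fits_in_vram(model_size_gb, vram_gb, kv_cache_gb=1.0, overhead_gb=0.5):
    """Rough check: quantized weights + KV cache + runtime overhead vs. available VRAM."""
    return model_size_gb + kv_cache_gb + overhead_gb <= vram_gb

print(fits_in_vram(model_size_gb=4.9, vram_gb=16))   # ~Q4 8B on a 16 GB card → True
print(fits_in_vram(model_size_gb=40.0, vram_gb=16))  # large quant on 16 GB → False
```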

GET /api/v1/tools

Runners, GUIs, agents — the local-AI tooling landscape.

curl 'https://runlocalai.co/api/v1/tools?category=runner' \
  -H "Authorization: Bearer $RLA_KEY"

Categories: runner, gui, server, orchestrator, finetuner, quantizer, agent, ide.

GET /api/v1/benchmarks

Tokens-per-second measurements with full provenance.

Query params
  • model, hardware, tool — filter by entity slug
  • verified=true or owner_only=true — only owner-run benchmarks
  • limit — default 100, max 500
curl 'https://runlocalai.co/api/v1/benchmarks?hardware=rtx-5080&owner_only=true' \
  -H "Authorization: Bearer $RLA_KEY"

Each benchmark includes source ('owner' / 'community' / 'official'), source URL when community-sourced, and run date.
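Since every record carries a `source` bucket, you can summarize throughput per provenance before trusting a number. A sketch — the `tokens_per_sec` field name is our assumption for the measurement described above:

```python
from statistics import median

def summarize(benchmarks):
    """Median tokens/sec per provenance bucket ('owner' / 'community' / 'official')."""
    buckets = {}
    for b in benchmarks:
        buckets.setdefault(b["source"], []).append(b["tokens_per_sec"])
    return {src: median(vals) for src, vals in buckets.items()}

sample = [
    {"source": "owner", "tokens_per_sec": 62.1},
    {"source": "owner", "tokens_per_sec": 58.7},
    {"source": "community", "tokens_per_sec": 71.4},
]
print(summarize(sample))
```

Median is a deliberate choice here: community-sourced runs vary in methodology, and a median resists single outlier submissions better than a mean.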

Rate limits & tiers

Free
1,000 calls/month · email signup · all read endpoints
Pro (coming soon)
100,000 calls/month · webhooks for new benchmarks · CSV bulk export · $19/month
Enterprise
Custom quota · white-label data licensing · email hello@runlocalai.co

Quota resets on the first of each month UTC. Burst rate limit is 10 requests per second. Responses are cached at the CDN edge for 5 minutes.
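To stay under the 10 requests/second burst cap, space calls client-side rather than reacting to rejections. A minimal throttle sketch (our own helper, not an SDK feature):

```python
import time

class Throttle:
    """Spaces calls so at most `rate` happen per second (matches the 10 rps burst cap)."""
    def __init__(self, rate=10):
        self.min_interval = 1.0 / rate
        self.last = 0.0

    def wait(self):
        """Block until at least min_interval has passed since the previous call."""
        now = time.monotonic()
        delay = self.last + self.min_interval - now
        if delay > 0:
            time.sleep(delay)
        self.last = time.monotonic()

throttle = Throttle(rate=10)
start = time.monotonic()
for _ in range(5):
    throttle.wait()  # first call is immediate, the rest wait ~0.1 s each
elapsed = time.monotonic() - start
print(elapsed >= 0.4)  # → True: 4 spaced gaps at 10 rps take at least ~0.4 s
```

Since responses are edge-cached for 5 minutes, repeating an identical query inside that window returns the same payload anyway; cache on your side too if you poll.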

Attribution

We don't require attribution to use the API, but we appreciate it. A simple link works:

Data: <a href="https://runlocalai.co">RunLocalAI</a>

For research papers, please cite the page URL of the relevant model/hardware/benchmark and include the access date.

Errors

  • 401 — Missing or invalid API key
  • 403 — Key revoked
  • 404 — Entity not found by slug
  • 429 — Monthly quota exhausted (resets first of next month UTC)
  • 500 — Database or upstream issue. Email hello@runlocalai.co with the request ID.
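These codes split cleanly into "retryable" and "fix your request": only 5xx is worth an automatic retry. A sketch mapping the codes above to a client-side action (action names are ours):

```python
def retry_action(status: int) -> str:
    """Map RunLocalAI error codes (from the list above) to a client-side action."""
    if status in (401, 403):
        return "fix-credentials"     # missing, invalid, or revoked key: retrying won't help
    if status == 404:
        return "check-slug"          # entity not found: verify the model/hardware slug
    if status == 429:
        return "wait-for-reset"      # monthly quota: back off until the 1st, UTC
    if status >= 500:
        return "retry-with-backoff"  # transient server issue: retry, then email with request ID
    return "ok"

print(retry_action(429))  # → wait-for-reset
```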

Building something with this API? Tell us — we'll feature good integrations on the homepage.