Eight inputs (use case, budget, scale, privacy posture, and four more) and we compose the full rig: GPU + runtime + 1–3 model picks + first-run workflow + cost rollup + ready-to-paste install script. Three tiers side by side so the upgrade path stays visible.
Every recommendation is grounded in rule-based scoring; measured tok/s figures carry a confidence chip when surfaced. We don't invent numbers: when the data isn't there, we say so.
The URL updates as you change fields, so you can share or bookmark any result. Showing the balanced default; change any field to refine.
One step down on budget. What you give up; what you keep.
Your inputs, our recommendation. Read the full card below.
One step up on budget. What you'd gain; what it costs.
Default pick for most operators: one binary, automatic GPU detection, OpenAI-compatible HTTP API at `:11434`. Sufficient for solo + small-team workloads.
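A minimal smoke test of that API once a model is pulled (install and pull commands follow below); the model tag matches the primary pick, and `/v1` is Ollama's OpenAI-compatible surface:

```bash
# Chat once via Ollama's OpenAI-compatible endpoint.
# Assumes qwen3:14b has already been pulled (see below).
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen3:14b",
        "messages": [{"role": "user", "content": "Reply with one short sentence."}]
      }'
```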
Install: `curl -fsSL https://ollama.com/install.sh | sh`
Pull: `ollama pull <model>:<tag>`
Endpoint: `http://localhost:11434`

Strongest general-purpose model at 14B in 2026. Multilingual tokenizer (1.7× more efficient on Turkish/Asian languages than Llama). Reasoning mode available.
Microsoft's reasoning-focused 14B, trained on heavy synthetic data. Beats Llama 3.1 8B on math/code benchmarks; weaker at creative writing.
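If this alternate pick is Microsoft's Phi-4 (our inference from the description; the card doesn't name it), pulling it follows the same pattern:

```bash
# Assumption: the reasoning-focused 14B described above is Phi-4.
ollama pull phi4
```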
1. Pull: `ollama pull qwen3:14b`
2. Run: `ollama run qwen3:14b`, then type a test prompt.
3. API: `http://localhost:11434` (Ollama) or `:8000` (vLLM).

The ready-to-paste install script:

```bash
#!/usr/bin/env bash
# RunLocalAI stack installer — generated by /stack-builder
# Use case: coding
# Hardware: NVIDIA GeForce RTX 3090
# Runtime: Ollama
set -e
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# 2. Pull the primary model
ollama pull qwen3:14b
# 3. Sanity check
ollama run qwen3:14b "Hello — respond in one short sentence to confirm you're running."
# 4. Verify the HTTP API
curl http://localhost:11434/api/tags
```

Just the hardware-pick question, with side-by-side compare, price/perf scatter, and score breakdown per dimension.
Reverse direction: I have this hardware — what fits? Use this to validate the recommendation against your actual rig.
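If you're not sure what your rig actually has, one command reports the GPU name and VRAM to feed in (NVIDIA only; other vendors need different tooling):

```bash
# Print GPU model and total VRAM, e.g. "NVIDIA GeForce RTX 3090, 24576 MiB".
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
```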
Drill into the model picks: Q4_K_M vs Q5_K_M vs Q8 on your specific VRAM, with quality curve + VRAM fit visualization.
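The fit math is easy to sanity-check by hand; a rough sketch assuming approximate effective bits per weight for each quant, counting weights only (KV cache and runtime overhead come on top):

```bash
# Weights-only VRAM estimate for a 14B-parameter model at common GGUF quants.
# Bits-per-weight values are approximations, not exact llama.cpp figures.
for q in "Q4_K_M 4.85" "Q5_K_M 5.69" "Q8_0 8.50"; do
  set -- $q
  awk -v quant="$1" -v bpw="$2" 'BEGIN {
    printf "%s: ~%.1f GB of a 24 GB card\n", quant, 14e9 * bpw / 8 / 1e9
  }'
done
```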
Tune every assumption: utilization, electricity rate, cloud equivalent rate, amortization horizon.
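A minimal sketch of the rollup those knobs feed, with an assumed formula and illustrative numbers rather than the calculator's actual defaults:

```bash
# Monthly cost = hardware amortization + electricity at a given duty cycle.
awk 'BEGIN {
  price = 900;  horizon = 36;   # hardware price ($), amortization horizon (months)
  watts = 350;  util = 0.25;    # load power draw (W), utilization (duty cycle)
  rate  = 0.15;                 # electricity rate ($/kWh)
  amort = price / horizon
  power = (watts / 1000) * 24 * 30 * util * rate
  printf "~$%.0f/mo amortized + ~$%.0f/mo electricity\n", amort, power
}'
```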
18 hand-curated stack recipes for specific outcomes (coding agent, offline RAG, dual-3090, Mac cluster, iPhone, etc.).