Build a fully offline coding stack (May 2026)
An autonomous coding agent that runs entirely on a workstation with no outbound network egress. Pre-staged models, audited dependency chain, network-monitored verification — the stack that holds up to real air-gap audits.
- 01 · Tool · Coding agent (offline-verified): openhands
OpenHands has the cleanest offline path of the autonomous-agent leaders. Docker container + filesystem MCP + provider-abstracted memory all work without internet once dependencies are pre-staged. OpenClaw works too but its faster release cadence makes the dependency-pinning audit harder.
- 02 · Tool · Inference engine: vllm
vLLM with pre-pulled Docker image + pre-staged HuggingFace cache runs entirely offline. Continuous batching matters because the agent makes 5-15 tool calls per task. The OpenAI-compatible API plugs into OpenHands with no adapter.
- 03 · Model · Coding model: qwen-2.5-coder-32b-instruct
Qwen 2.5 Coder 32B AWQ-INT4 fits 24GB with 32K context — strongest open coding model in the 32B class as of May 2026. Apache 2.0 license: usable in any environment without licensing surprises. Pre-stage the AWQ weights locally before egress lockdown.
- 04 · Tool · MCP filesystem (strict allowlist): mcp-server-filesystem
Reference Anthropic filesystem MCP. Strict directory allowlisting limits the agent's blast radius — non-optional for offline deployments where the network can't catch a destructive mistake.
- 05 · Tool · MCP git (read-side only): mcp-server-git
Read-side git operations give the agent commit history awareness. Combined with filesystem MCP, full repo grounding without network access.
- 06 · Tool · Memory (local-only via LanceDB): mem0
Mem0 with LanceDB backend — no hosted memory service in the loop. All consolidation runs on the local LLM (vLLM endpoint); no third-party API calls. Cross-session memory works fully offline.
- 07 · Hardware · GPU: rtx-4090
RTX 4090 24GB is the workstation default. Same hardware constraint as /stacks/local-coding-agent; the offline pivot is software + network, not GPU choice.
Why fully-offline matters
The general /stacks/local-coding-agent recipe is private — all data stays on your machine — but it isn't fully offline. Docker pulls images on first run; the HuggingFace cache downloads models on first load; npm install pulls MCP server packages. Once that initial setup is done, the stack runs locally — but in regulated environments the audit trail of “where did these dependencies come from” matters.
Fully offline means a different threat model: no outbound network calls during operation, with verifiable audit trail of every dependency. The pivot from local-coding-agent to fully-offline is in the dependency staging + network-egress verification, not the model + agent choice.
Industries where this matters: regulated finance (SOX compliance), healthcare (HIPAA-protected codebases), defense / aerospace (classified networks), legal (privileged discovery). Less common but real: export-controlled-research environments, internal company policy, contractual data-residency requirements.
Pre-staging workflow (CRITICAL)
The single most important step. Pre-stage all dependencies on a network-connected machine, then transfer to the air-gapped target. Never download dependencies on the air-gapped machine, even “just once.”
# On a network-connected staging machine, pull everything you need:
# 1. Pull the vLLM Docker image
docker pull vllm/vllm-openai:v0.17.1
docker save vllm/vllm-openai:v0.17.1 | gzip > vllm-image.tar.gz
# 2. Pull the model weights
hf download Qwen/Qwen2.5-Coder-32B-Instruct-AWQ \
--local-dir ./qwen-coder-32b
# 3. Pull MCP server packages (npm offline cache)
mkdir mcp-cache
cd mcp-cache
npm pack @modelcontextprotocol/server-filesystem
npm pack @modelcontextprotocol/server-git
cd ..
# 4. Pull OpenHands Docker image
docker pull ghcr.io/all-hands-ai/openhands:latest
docker save ghcr.io/all-hands-ai/openhands:latest | gzip > openhands-image.tar.gz
# 5. Pull Open WebUI (frontend)
docker pull ghcr.io/open-webui/open-webui:latest
docker save ghcr.io/open-webui/open-webui:latest | gzip > openwebui-image.tar.gz
# 6. Generate dependency manifest with checksums
sha256sum *.tar.gz qwen-coder-32b/*.safetensors mcp-cache/*.tgz \
> dependency-manifest.txt
Transfer the resulting bundle (Docker images + model weights + MCP packages + manifest) to the air-gapped machine via USB / one-way file transfer / approved data-diode. Verify checksums on arrival.
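The arrival check can be as simple as replaying the manifest on the air-gapped side. A minimal sketch — the /opt/staging landing path is an assumption, substitute wherever your bundle lands:

```shell
# On the air-gapped machine, after the bundle arrives:
cd /opt/staging    # assumed landing directory for the transferred bundle
# sha256sum -c recomputes every checksum listed in the manifest and
# returns non-zero if any file mismatches or is missing.
sha256sum -c dependency-manifest.txt \
  && echo "manifest verified -- safe to proceed" \
  || echo "CHECKSUM MISMATCH -- do not proceed with setup" >&2
```

Run this before loading anything into Docker; a single mismatched tarball means the whole bundle is suspect.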
Step-by-step setup on the air-gapped machine
1. Block egress before anything else
# Set the iptables firewall to block all outbound traffic except
# loopback and the specific local subnets you allow:
sudo iptables -P OUTPUT DROP
sudo iptables -A OUTPUT -o lo -j ACCEPT
sudo iptables -A OUTPUT -d 192.168.0.0/16 -j ACCEPT # local LAN
sudo iptables -A OUTPUT -d 10.0.0.0/8 -j ACCEPT # private RFC1918
# Verify with packet capture during a normal task — see Network
# egress verification section below.
sudo iptables -L OUTPUT -v -n
2. Load Docker images from the staging bundle
# Load images locally from the pre-staged tarballs
gunzip -c vllm-image.tar.gz | docker load
gunzip -c openhands-image.tar.gz | docker load
gunzip -c openwebui-image.tar.gz | docker load
# Verify they loaded
docker images | grep -E "vllm|openhands|openwebui"
3. Bring up vLLM with pre-staged model
# Mount the pre-staged model directory; vLLM uses it directly
docker run --gpus all -d --name vllm \
-p 127.0.0.1:8000:8000 \
-v /opt/models/qwen-coder-32b:/model \
--restart unless-stopped \
vllm/vllm-openai:v0.17.1 \
--model /model \
--gpu-memory-utilization 0.85 \
--max-model-len 32768 \
--enable-chunked-prefill
# Bind to 127.0.0.1 ONLY — never expose to the LAN unless you've
# audited what hits the endpoint.
4. Wire OpenHands with offline MCP servers
# Install MCP servers from pre-staged npm cache (NOT npm registry):
mkdir -p ~/.npm-offline
cp mcp-cache/*.tgz ~/.npm-offline/
npm install --offline -g \
~/.npm-offline/modelcontextprotocol-server-filesystem-*.tgz \
~/.npm-offline/modelcontextprotocol-server-git-*.tgz
# OpenHands config — same as /stacks/local-coding-agent but verify
# every URL is local
[llm]
model = "openai//model"
api_base = "http://localhost:8000/v1"
api_key = "anything"
[mcp]
servers = [
{ command = "mcp-server-filesystem", args = ["/home/you/projects/active"] },
{ command = "mcp-server-git", args = ["--repository", "/home/you/projects/active"] }
]
[memory]
provider = "mem0"
config = { vector_store = { provider = "lancedb", path = "/home/you/.mem0/lancedb" } }
Network egress verification
The audit step. Every deployment that claims to be offline should have a repeatable verification that produces a network capture during normal operation. The capture should be empty: no packets to non-loopback destinations.
# Run packet capture during a smoke-test query
sudo tcpdump -i any -w session-capture.pcap \
'not host 127.0.0.1 and not host ::1 and not net 192.168.0.0/16' &
TCPDUMP_PID=$!
# Run a representative agent task
openhands run --task "Find the bug in tests/auth.test.ts and fix it"
# Stop capture
sudo kill $TCPDUMP_PID
# Inspect the capture — should be empty or near-empty
tcpdump -r session-capture.pcap -nn | head -20
# Expected: no packets, or only DHCP / ARP / multicast-DNS
# (which are local-network only). If you see DNS lookups for
# external domains, telemetry to api-something.com, or HuggingFace
# Hub URLs — the stack is leaking. Find the leak before declaring
# the deployment audit-clean.
Failure modes you'll hit
- Docker DNS lookups for image-pull on first run. Even with images pre-loaded, Docker may attempt registry lookups for tag verification. Set --pull never on every docker run and configure the Docker daemon for offline mode.
- HuggingFace cache phone-home. Some transformers code attempts to verify model hashes via the HF Hub even when the cache is local. Set the HF_HUB_OFFLINE=1 and TRANSFORMERS_OFFLINE=1 environment variables in every container that touches the model.
- npm registry lookups during MCP server start. Some MCP servers verify their own version on startup. Pin npm config set registry http://localhost at the OS level to make this fail loudly rather than reach out.
- Time synchronization drift. Air-gapped machines often run NTP against an internal time source; without one, the system clock drifts and SSL handshakes inside the local network can break. Run a local NTP server.
- Container update prompts. Some containers display update-available banners that imply a network check happened. Investigate every banner; some are local comparisons against bundled metadata, some are real network calls.
- Mem0 consolidation pass leaks. Mem0 with cloud LLM provider for consolidation = data leaving the network. Always configure Mem0 with the local vLLM endpoint for both inference AND consolidation.
- VS Code extension auto-update. If you're using Cline or Continue, IDE extensions will attempt auto-update. Set extensions.autoUpdate: false in VS Code settings.
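The failure modes above convert naturally into a pre-flight script run before every agent session. A minimal sketch — the check names are illustrative, the iptables check assumes the default-DROP OUTPUT policy from step 1 and needs root, and the npm check assumes the localhost registry pin described above:

```shell
#!/bin/sh
# offline-preflight.sh -- report loudly if any known leak vector is open.
fail=0

check() {
    # check <description> <command...>: run the probe, report, track failures
    desc=$1; shift
    if "$@" >/dev/null 2>&1; then
        echo "ok:   $desc"
    else
        echo "FAIL: $desc"
        fail=1
    fi
}

# HuggingFace / transformers must be pinned offline
check "HF_HUB_OFFLINE=1 set"       [ "${HF_HUB_OFFLINE:-}" = "1" ]
check "TRANSFORMERS_OFFLINE=1 set" [ "${TRANSFORMERS_OFFLINE:-}" = "1" ]

# npm registry must point at a dead-end local address
check "npm registry pinned locally" \
    sh -c 'npm config get registry | grep -q "^http://localhost"'

# firewall OUTPUT policy must be DROP (requires root)
check "iptables OUTPUT policy DROP" \
    sh -c 'iptables -L OUTPUT -n | head -1 | grep -q "policy DROP"'

# In a real deployment, exit $fail here so CI / cron can alert on it.
[ "$fail" -eq 0 ] && echo "PREFLIGHT PASS" || echo "PREFLIGHT FAIL -- fix before running the agent"
```

Wire it into whatever launches OpenHands so a drifted configuration blocks the session instead of silently leaking.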
Variations and alternatives
Apple Silicon variation. Replace vLLM + RTX 4090 with MLX-LM + M3 Max. The same offline discipline applies — pre-stage MLX-format models, then verify no network egress during operation.
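A sketch of the same staging discipline on Apple Silicon. The mlx-community repo name and the mlx_lm.server flags are assumptions — verify both against the mlx-lm version you pin before lockdown:

```shell
# On the network-connected staging machine: pre-stage an MLX-quantized
# build of the same model family (repo name assumed; verify it exists).
hf download mlx-community/Qwen2.5-Coder-32B-Instruct-4bit \
  --local-dir ./qwen-coder-32b-mlx

# On the air-gapped Mac, after egress lockdown: serve from the local
# directory with the HF cache pinned offline.
HF_HUB_OFFLINE=1 mlx_lm.server \
  --model /opt/models/qwen-coder-32b-mlx \
  --host 127.0.0.1 \
  --port 8000
# mlx_lm.server exposes an OpenAI-compatible endpoint, so the
# OpenHands [llm] config from step 4 should carry over unchanged.
```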
RAG-instead variation. If the workflow is document-search rather than coding, see /stacks/offline-rag-workstation. Same air-gap discipline; different application surface.
OpenClaw alternative. Possible but more difficult to audit because OpenClaw moves faster than OpenHands. The dependency-pinning surface is bigger. OpenHands is the more conservative pick for offline deployments where audit time is finite.
Aider variation. If your workflow is surgical-edit-only (not autonomous), Aider is simpler to audit (smaller dependency surface). Trade autonomous-task quality for fewer audit surfaces.
Who should avoid this stack
- Anyone whose privacy needs are softer than stated. If “cloud-friendly with reasonable controls” is acceptable, the cloud-API path or the general /stacks/local-coding-agent is faster to set up and operationally cheaper. This stack costs you ergonomics for a guarantee you may not actually need.
- Anyone unable to allocate audit cycles. Offline stacks need periodic re-verification — dependencies update, OS patches need staging, configurations drift. Without monthly audit cycles, the stack stays nominally offline but slowly accretes unverified dependencies.
- Anyone who needs reasoning models. Their dependency surfaces (sometimes including external CDNs for tokenizer files) make offline staging harder. Achievable but adds significant audit overhead.
- Anyone needing IDE integration. VS Code + Cline / Continue + Copilot all phone home in various ways. Disabling all of it produces a degraded developer experience. Acceptable for some teams; deal-breaker for others.
Going deeper
- /stacks/local-coding-agent — the local-but-not-fully-offline counterpart. Same agent + runtime; different network discipline.
- /stacks/offline-rag-workstation — air-gapped RAG, with the same discipline but for document workflows.
- /systems/agent-execution-systems — the architectural depth on what the agent is actually doing.
- OpenHands operational review — the full L1.5 review covering offline-friendly architecture choices.