I want my AI conversations to stay private — what's the realistic local-first setup?

Reviewed May 15, 2026 · 2 min read
privacy · local-first · air-gapped · claude-migration · rag

The answer

One paragraph. No hedging beyond what the data actually warrants.

Local AI for privacy is real and shipping today — but you have to know which workloads it actually covers and which it doesn't.

The framing matters: "local AI" doesn't just mean "running a model". It means the full inference loop, prompt in and tokens out, happens on your hardware, with no network calls after setup. That is the distinction privacy hinges on.
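
To make that concrete, here's a minimal sketch of the entire loop as one loopback HTTP call, assuming Ollama is running on its default port (11434) and a model has been pulled; the model tag is just an example, substitute whatever you run.

```python
# Minimal sketch: the whole inference round-trip is a single localhost call.
# Assumes Ollama on its default bind (127.0.0.1:11434) and a pulled model,
# e.g. `ollama pull llama3.1:8b`.
import json
import urllib.request

req = urllib.request.Request(
    "http://127.0.0.1:11434/api/generate",
    data=json.dumps({
        "model": "llama3.1:8b",   # example tag; use whatever you pulled
        "prompt": "Why does local inference protect privacy?",
        "stream": False,          # one JSON response instead of a token stream
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```

Everything in that exchange rides loopback; pull the Ethernet cable after the model download and it still works.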

What stays local (genuinely):

  • The model weights (downloaded once, then loaded into VRAM)
  • Your prompts (never leave the box)
  • The model's output (rendered locally)
  • Your chat history (your filesystem, not a vendor account)
  • Embeddings of your documents (local vector store like ChromaDB or LanceDB; see the sketch after this list)
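
On that last bullet, here's a minimal sketch of a local vector store using ChromaDB's persistent client (one option among several; LanceDB works similarly). One caveat worth knowing: Chroma's default embedding function fetches a small ONNX model on first use, so run it once while online, or supply your own embedder, before air-gapping.

```python
# Minimal local vector store sketch using ChromaDB (pip install chromadb).
# Caveat: the default embedder downloads a small ONNX model on FIRST use;
# run once online, or pass your own embedding_function, before air-gapping.
import chromadb

client = chromadb.PersistentClient(path="./my_vector_store")  # data lives on disk here
docs = client.get_or_create_collection("docs")

docs.add(
    ids=["note-1", "note-2"],
    documents=[
        "Ollama serves models over localhost by default.",
        "Jan is an open-source desktop chat UI that can run air-gapped.",
    ],
)

hits = docs.query(query_texts=["which chat UI works offline?"], n_results=1)
print(hits["documents"][0][0])  # nearest stored document
```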

What can still leak (without you noticing):

  • Telemetry from the runtime (Ollama sends nothing; some other apps do; the audit sketch after this list is one way to check)
  • Update checks (most apps phone home for version checks)
  • Cloud-backed features (Open WebUI's web search needs a cloud search API)
  • Browser-extension AI assistants that route your text to a vendor
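
These leaks are catchable. A rough audit sketch using psutil (an assumption; the process names in WATCH are examples to adapt): it flags any socket a local-AI process holds to a non-loopback address.

```python
# Rough leak-audit sketch (pip install psutil). Flags any socket a local-AI
# process holds to a non-loopback address. The WATCH names are examples;
# adapt them to the runtimes and UIs you actually run.
# Note: Process.net_connections() needs psutil >= 6.0; older releases call
# the same method .connections().
import psutil

WATCH = {"ollama", "jan", "anythingllm"}  # example process names, adjust
LOOPBACK = ("127.", "::1")

for proc in psutil.process_iter(["name"]):
    name = (proc.info["name"] or "").lower()
    if not any(w in name for w in WATCH):
        continue
    try:
        conns = proc.net_connections(kind="inet")
    except (psutil.AccessDenied, psutil.NoSuchProcess):
        continue
    for c in conns:
        if c.raddr and not str(c.raddr.ip).startswith(LOOPBACK):
            print(f"{name} (pid {proc.pid}) -> {c.raddr.ip}:{c.raddr.port} [{c.status}]")
```

An empty run doesn't prove air-tightness (update checks can be transient), but anything it does print deserves an explanation.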

The realistic privacy-first stack:

| Layer        | Pick                                       | Why                                              |
|--------------|--------------------------------------------|--------------------------------------------------|
| Runtime      | Ollama or llama.cpp                        | Zero telemetry, source-buildable                 |
| Chat UI      | Jan, AnythingLLM, or PrivateGPT            | All open-source, all air-gappable                |
| Coding agent | Aider                                      | Terminal-native, no cloud dependency             |
| Document RAG | Khoj or PrivateGPT                         | Local embedder + local vector index              |
| Voice        | MacWhisper / Buzz                          | Local Whisper, no cloud STT                      |
| Mobile       | Enchanted (iOS, talks to your home server) | No vendor account, no chat upload (sketch below) |
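
For the mobile row: before pointing Enchanted at your home server, the server has to bind beyond loopback (Ollama reads the documented OLLAMA_HOST environment variable for this) and be reachable from your LAN. A quick check; the IP below is a placeholder:

```python
# Quick reachability check before pointing a phone client (e.g. Enchanted)
# at your home server. Assumes the server was started with OLLAMA_HOST=0.0.0.0
# so it binds beyond loopback. 192.168.1.50 is a placeholder; use your box's IP.
import json
import urllib.request

SERVER = "http://192.168.1.50:11434"  # placeholder LAN address

with urllib.request.urlopen(SERVER + "/api/tags", timeout=5) as resp:
    for m in json.loads(resp.read())["models"]:
        print(m["name"])  # every model the phone client will be able to select
```

Keep that bind behind your LAN firewall: 0.0.0.0 means anything on the network can reach the server.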

The migration playbook (3 steps):

  1. Set up Ollama + Jan on your daily workstation. 30 minutes. Pull Llama 3.1 8B Q4_K_M as your default model (5GB). You now have a private chat with no vendor account.

  2. Move your "ask AI about my docs" workflow to PrivateGPT or Khoj. Point it at the folders you'd otherwise paste into Claude/ChatGPT. Local embeddings + local generation = nothing leaves your machine; the sketch after these steps shows that loop end to end.

  3. Move coding agent work to Aider + Qwen 2.5 Coder, assuming you have a 24GB GPU for the larger variants. If you don't, Aider against Qwen 2.5 Coder 7B is still meaningfully better than nothing, and it's all local.
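
Under the hood, the step-2 tools run roughly the loop below. This is an illustrative sketch, not PrivateGPT's or Khoj's actual code: it uses Ollama's embeddings endpoint (assuming an embedding model like nomic-embed-text has been pulled) plus the generate endpoint, with plain cosine similarity standing in for a real vector index.

```python
# Illustrative local RAG loop: embed -> retrieve -> generate, all over loopback.
# NOT PrivateGPT's or Khoj's actual code; a sketch of the idea they implement.
# Assumes Ollama on 127.0.0.1:11434 with `ollama pull nomic-embed-text` and
# `ollama pull llama3.1:8b` done beforehand (both are example model choices).
import json
import math
import urllib.request

BASE = "http://127.0.0.1:11434"

def post(path: str, payload: dict) -> dict:
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def embed(text: str) -> list[float]:
    # /api/embeddings is Ollama's long-standing endpoint; newer builds also
    # expose /api/embed with a slightly different shape.
    return post("/api/embeddings", {"model": "nomic-embed-text", "prompt": text})["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

notes = [  # stand-ins for the folders you'd point PrivateGPT/Khoj at
    "Our VPN config lives in infra/wireguard/ and rotates keys monthly.",
    "The Q3 budget spreadsheet is owned by finance, not engineering.",
]
index = [(n, embed(n)) for n in notes]  # tiny in-memory "vector store"

question = "Who owns the Q3 budget?"
q_vec = embed(question)
context = max(index, key=lambda pair: cosine(q_vec, pair[1]))[0]  # top-1 retrieval

answer = post("/api/generate", {
    "model": "llama3.1:8b",
    "prompt": f"Context:\n{context}\n\nQuestion: {question}\nAnswer briefly.",
    "stream": False,
})["response"]
print(answer)
```

Every byte of that loop stays on loopback; the real tools add chunking, a persistent index, and a UI on top of the same idea.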

What stays cloud (be honest):

  • One-shot complex reasoning that genuinely needs a frontier model (Claude Opus, GPT-5)
  • Vision tasks at high quality (open-weight vision models are catching up but not there yet)
  • 405B+ model capabilities (you can't run those locally without a datacenter)

The privacy reality check: if your threat model is "I don't want Anthropic / OpenAI / Google reading my prompts," local AI solves it completely for ~80% of typical use. If your threat model is "I want zero electronic trace of my AI use," that's a different problem (network isolation, file-system encryption, etc.) — local AI is necessary but not sufficient.

Where we got the numbers

Ollama telemetry policy: ollama.com docs + source code review. Jan license + air-gap design: janhq/jan README. The 'AI conversations never leave the box' threat model framing maps to what r/privacy threads typically ask for in 2026.

Found this via a forum search? Bookmark the URL — we update these pages as new data lands. Have a question that should live here? Open a GitHub issue.