I want my AI conversations to stay private — what's the realistic local-first setup?
The answer
One paragraph. No hedging beyond what the data actually warrants.
Local AI for privacy is real and shipping today — but you have to know which workloads it actually covers and which it doesn't.
The framing matters: "local AI" doesn't just mean "running a model"; it means the full inference loop happens on your hardware, with no network calls after setup. That distinction is what carries the privacy guarantee.
What stays local (genuinely):
- The model weights (downloaded once, then loaded into VRAM)
- Your prompts (never leave the box)
- The model's output (rendered locally)
- Your chat history (your filesystem, not a vendor account)
- Embeddings of your documents (local vector store like ChromaDB or LanceDB)
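To make the last point concrete, here is a minimal sketch of a local vector store, assuming the chromadb Python package; the path, collection name, and sample documents are placeholders. Chroma's default embedder is a small model it downloads once and then runs locally, so indexing and querying never leave your filesystem.

```python
import chromadb

# A persistent store on your own filesystem; nothing is uploaded anywhere.
client = chromadb.PersistentClient(path="./private_index")  # example path

collection = client.get_or_create_collection("my_docs")

# Chroma's default embedder is a small sentence-embedding model it downloads
# once and then runs locally; documents and vectors live under ./private_index.
collection.add(
    ids=["note-1", "note-2"],
    documents=[
        "Quarterly plan draft: ship the local RAG prototype by March.",
        "Meeting notes: no cloud inference for anything in the HR folder.",
    ],
)

hits = collection.query(query_texts=["what did we decide about HR documents?"], n_results=1)
print(hits["documents"][0][0])
```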
What can still leak (without you noticing):
- Telemetry from the runtime (Ollama sends nothing; some other apps do)
- Update checks (most apps phone home for version checks)
- Cloud-backed features (Open WebUI's web search needs a cloud search API)
- Browser-extension AI assistants that route your text to a vendor
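You don't have to take "sends nothing" claims on faith; you can spot-check what is actually talking to the network. A rough sketch using the psutil package (an assumption, not part of any of these apps); the process names in WATCHED are examples, and a real audit would pair this with a packet capture or an outbound firewall.

```python
import psutil

# Process names to watch; adjust to the AI apps you actually run.
WATCHED = {"ollama", "jan", "anythingllm", "aider"}

# Listing other processes' sockets may need elevated privileges on macOS/Windows.
for conn in psutil.net_connections(kind="inet"):
    if not (conn.raddr and conn.pid):
        continue  # skip listening sockets and entries without a PID
    try:
        name = psutil.Process(conn.pid).name().lower()
    except psutil.NoSuchProcess:
        continue
    if any(w in name for w in WATCHED):
        # Anything here that isn't 127.0.0.1 / ::1 deserves an explanation.
        print(f"{name} (pid {conn.pid}) -> {conn.raddr.ip}:{conn.raddr.port} [{conn.status}]")
```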
The realistic privacy-first stack:
| Layer | Pick | Why |
|---|---|---|
| Runtime | Ollama or llama.cpp | Zero telemetry, source-buildable |
| Chat UI | Jan, AnythingLLM, or PrivateGPT | All open-source, all air-gappable |
| Coding agent | Aider | Terminal-native, no cloud dependency |
| Document RAG | Khoj or PrivateGPT | Local embedder + local vector index |
| Voice | MacWhisper / Buzz | Local Whisper, no cloud STT |
| Mobile | Enchanted (iOS, talks to your home server) | No vendor account, no chat upload |
The migration playbook (3 steps):
Step 1: Set up Ollama + Jan on your daily workstation. 30 minutes. Pull Llama 3.1 8B Q4_K_M as your default model (5GB). You now have a private chat with no vendor account.
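If you want to verify the loop from code rather than the Jan UI, here is a minimal sketch assuming the official ollama Python client (pip install ollama); the model tag is an example, so swap in whatever `ollama list` reports. After the one-time pull, the chat call only talks to the local server.

```python
import ollama

MODEL = "llama3.1:8b"  # swap for the exact tag you pulled (e.g. a Q4_K_M build)

# The pull is the one network call (weights download); everything after runs locally.
ollama.pull(MODEL)

reply = ollama.chat(
    model=MODEL,
    messages=[{"role": "user", "content": "In two sentences: why does local inference help privacy?"}],
)
print(reply["message"]["content"])
```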
Step 2: Move your "ask AI about my docs" workflow to PrivateGPT or Khoj. Point it at the folders you'd otherwise paste into Claude/ChatGPT. Local embeddings + local generation = nothing leaves your machine.
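Under the hood, PrivateGPT and Khoj both implement roughly this loop: embed locally, retrieve locally, generate locally. A hedged sketch using chromadb plus Ollama for both embeddings and generation; the nomic-embed-text embedder, the chunk placeholders, and the paths are assumptions, so swap in whatever models you have actually pulled.

```python
import chromadb
import ollama

EMBED_MODEL = "nomic-embed-text"   # a local embedding model served by Ollama
CHAT_MODEL = "llama3.1:8b"

client = chromadb.PersistentClient(path="./docs_index")
docs = client.get_or_create_collection("my_private_docs")

def embed(text: str) -> list[float]:
    # Embedding runs against the local Ollama server, not a cloud API.
    return ollama.embeddings(model=EMBED_MODEL, prompt=text)["embedding"]

# Indexing: a real setup would walk your folders and chunk each file.
for i, chunk in enumerate(["example chunk of a local document", "another example chunk"]):
    docs.add(ids=[f"chunk-{i}"], embeddings=[embed(chunk)], documents=[chunk])

# Query: retrieve the closest chunks, then let the local model answer from them.
question = "What does the contract say about data retention?"
hits = docs.query(query_embeddings=[embed(question)], n_results=3)
context = "\n\n".join(hits["documents"][0])

answer = ollama.chat(
    model=CHAT_MODEL,
    messages=[{"role": "user", "content": f"Answer from this context only:\n{context}\n\nQ: {question}"}],
)
print(answer["message"]["content"])
```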
Step 3: Move coding agent work to Aider + Qwen 2.5 Coder, if you have a 24GB GPU. If you don't, Aider against Qwen 2.5 Coder 7B is still meaningfully better than nothing, and it's all local.
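Pointing Aider at a local Ollama model is mostly a matter of its --model flag. The sketch below (Python for consistency with the other examples) assumes the "ollama/" model-prefix convention and the OLLAMA_API_BASE variable, both worth double-checking against Aider's current docs for your version; the repo path is a placeholder.

```python
import os
import subprocess

# Assumption: Aider reaches the local Ollama server through these settings.
# Verify the exact model prefix ("ollama/" vs "ollama_chat/") in Aider's docs.
env = {**os.environ, "OLLAMA_API_BASE": "http://127.0.0.1:11434"}

subprocess.run(
    ["aider", "--model", "ollama/qwen2.5-coder:7b"],  # use a larger Qwen tag if you have the VRAM
    cwd="/path/to/your/repo",  # placeholder: run from the project you want Aider to edit
    env=env,
)
```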
What stays cloud (be honest):
- One-shot complex reasoning that genuinely needs a frontier model (Claude Opus, GPT-5)
- Vision tasks at high quality (open-weight vision models are catching up but not there yet)
- 405B+ model capabilities (you can't run those locally without a datacenter)
The privacy reality check: if your threat model is "I don't want Anthropic / OpenAI / Google reading my prompts," local AI solves it completely for ~80% of typical use. If your threat model is "I want zero electronic trace of my AI use," that's a different problem (network isolation, file-system encryption, etc.) — local AI is necessary but not sufficient.
Explore the numbers for your specific stack
Where we got the numbers
Ollama telemetry policy: ollama.com docs + source code review. Jan license + air-gap design: janhq/jan README. The 'AI conversations never leave the box' threat model framing maps to what r/privacy threads typically ask for in 2026.
Also see
- The cost-driven migration path (same destination as the privacy-driven one, different reason for going).
- The OG offline-RAG project. The Khoj alternative when you want maximum auditable open-source.
- Single-binary desktop AI. Curated model catalog. AGPL-3.0.
- Hardware + runtime + model picks tuned to fully-offline operation.
Other questions in this thread
Other /q/ landings on the same topic — same editorial discipline.
- Why doesn't my local LLM have web search — and what are the actual offline alternatives?
- Is fine-tuning dead in 2026? RAG vs distillation vs prompting — when does fine-tuning actually win?
- Persistent KV cache vs RAG — which one should I use for 'chat with my docs'?
- Should I fine-tune, or just use a better prompt?
Found this via a forum search? Bookmark the URL — we update these pages as new data lands. Have a question that should live here? Open a GitHub issue.