Privacy - What Stays Yours — What is Local AI — And Why It Matters (Chapter 6)

The Privacy Problem with Cloud AI

When you use cloud AI, your prompts travel to a server, get processed, and a response comes back. What happens to your data after that varies by provider:

OpenAI:

ChatGPT Free and Plus: your conversations may be used for training unless you opt out
API and business plans: your data is not used for training by default
API data is retained for up to 30 days for abuse monitoring

Anthropic:

No training on user data by default for any tier
90-day retention of conversation data (configurable)

Google:

Gemini consumer: your data may be used for training (opt-out available)
Gemini Enterprise: no training on user data

The policies are complex, they change over time, and the fine print matters. The fundamental issue: even if a company doesn't train on your data, it still receives it. Your prompts can sit in its logs, its employees could potentially access them, and its servers can be subject to subpoenas, depending on the jurisdiction.

For casual, non-sensitive use, this may not matter. For sensitive work, it's a real risk.

What Local AI Actually Guarantees

When you run a model locally, the data flow is different:

Your input → Processed by local model → Response
             ↳ Never leaves your machine

What stays private:

Everything you type
Your conversation history
Any documents you paste or upload
System prompts you use
Any files the model processes

What doesn't automatically stay private:

If you use an interface that sends telemetry, that telemetry may include metadata (not content)
If you connect local AI to other services (APIs, databases), those connections have their own privacy implications
If the model file itself is compromised (the file on your disk, not the data passing through it), the guarantees above may no longer hold

The key point: local AI gives you strong guarantees about the primary data flow. You control the infrastructure. There's no third party.
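The data flow described in this section can be sketched in a few lines of Python. This is a minimal illustration, not a definitive client: it assumes an Ollama server is already running on its default local port (11434) and that a model has been pulled. The model name `llama3.2` is only a placeholder, and the helper names `is_loopback_url`, `build_request`, and `ask_local` are made up for this example. The guard refuses to send a prompt anywhere except the local machine.

```python
import ipaddress
import json
import urllib.request
from urllib.parse import urlparse

# Assumption: an Ollama server is listening on its default local port.
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"


def is_loopback_url(url: str) -> bool:
    """Return True only if the URL's host refers to this machine."""
    host = urlparse(url).hostname or ""
    if host == "localhost":
        # Assumes "localhost" has not been remapped in the hosts file.
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname needs DNS and could resolve to a remote server.
        return False


def build_request(prompt: str, model: str = "llama3.2",
                  url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a generate request, refusing any endpoint that is not local."""
    if not is_loopback_url(url):
        raise ValueError(f"refusing to send prompt to non-local endpoint: {url}")
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local(prompt: str, model: str = "llama3.2") -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `ask_local("Summarize this contract clause: ...")` would send the prompt to the loopback address only. The same call pointed at a remote URL fails before any data is sent, which mirrors the guarantee about the primary data flow.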

Real Privacy Scenarios

Scenario 1: Drafting a sensitive email

Cloud: Your email content goes to the provider's servers. It may be logged, retained, or used for training.
Local: The model processes it on your machine. Nothing leaves.

Scenario 2: Analyzing a legal document

Cloud: Legal content (potentially attorney-client privileged) goes to a third party.
Local: The document stays on your machine. Only you see the results.

Scenario 3: Code review with a proprietary codebase

Cloud: Your proprietary code is sent to external servers.
Local: Code stays local. The risk of leakage through an AI provider is eliminated.

Scenario 4: Mental health or medical questions

Cloud: Personal health information leaves your control.
Local: Questions and responses stay on your device.

Interface Privacy Considerations

Not all local AI interfaces are equally private:

Ollama (command line): Minimal telemetry, runs fully locally. Among the most private options.

Jan: Local-first, can run fully offline. Privacy-preserving by default.

LM Studio: Similar to Jan, runs locally. Check settings for telemetry.

Anything with "cloud sync": May send data elsewhere. Read the settings.

The good news: you can audit the code of open-source local AI tools. You can run network monitors to verify no data leaves. You have control.
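As a rough sketch of what such a network check looks for, the snippet below filters a list of connections down to those that reach beyond the local machine. It assumes you have already collected the connections with an operating system tool (for example `lsof -i` or `ss`). The `Connection` type, the helper names, and the sample snapshot are all hypothetical, made up for illustration.

```python
import ipaddress
from typing import Iterable, NamedTuple


class Connection(NamedTuple):
    process: str    # name of the process that owns the socket
    remote_ip: str  # address on the other end of the connection


def is_external(ip: str) -> bool:
    """True if the address is neither loopback nor the unspecified address."""
    addr = ipaddress.ip_address(ip)
    return not (addr.is_loopback or addr.is_unspecified)


def external_connections(conns: Iterable[Connection],
                         process: str) -> list[Connection]:
    """Return connections owned by `process` that reach beyond this machine."""
    return [c for c in conns
            if c.process == process and is_external(c.remote_ip)]


# Hypothetical snapshot, invented for this example (not real tool output).
snapshot = [
    Connection("ollama", "127.0.0.1"),
    Connection("ollama", "::1"),
    Connection("ollama", "203.0.113.7"),    # documentation-range address
    Connection("browser", "198.51.100.20"),
]

for conn in external_connections(snapshot, "ollama"):
    print(f"{conn.process} -> {conn.remote_ip}")
```

An external connection is not proof that content is leaving. It could be an update check, a model download, or telemetry. It does tell you where to look next.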