06. Privacy - What Stays Yours
The Privacy Problem with Cloud AI
When you use cloud AI, your prompts travel to a server, get processed, and a response comes back. What happens to your data after that varies by provider:
OpenAI:
- Free tier: your data may be used for training unless you opt out
- Paid tier: your data is not used for training (by default)
- 30-day retention of API calls
Anthropic:
- No training on user data by default for any tier
- 90-day retention of conversation data (configurable)
Google:
- Gemini consumer: your data may be used for training (opt-out available)
- Gemini Enterprise: no training on user data
The policies are complex and change over time, and the fine print matters. The fundamental issue: even if a company doesn't train on your data, it still receives it. The data sits in its logs. Its employees could potentially access it. Its servers are subject to subpoenas in certain jurisdictions.
For casual, non-sensitive use, this may not matter. For sensitive work, it's a real risk.
What Local AI Actually Guarantees
When you run a model locally, the data flow is different:
Your input → Processed by local model → Response
↳ Never leaves your machine
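To make this flow concrete, here is a minimal sketch of sending a prompt to a model server on the same machine. It assumes Ollama is running and listening on its default local endpoint (`http://localhost:11434/api/generate`) and that a model has already been pulled; the model name `llama3.2` and the helper names `is_loopback`, `build_request`, and `ask_local_model` are illustrative assumptions, not something this lesson prescribes.

```python
import ipaddress
import json
import urllib.request
from urllib.parse import urlparse

# Assumption: Ollama's default local endpoint. Adjust if your setup differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def is_loopback(url: str) -> bool:
    """Return True if the URL's host refers to this machine."""
    host = urlparse(url).hostname or ""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal and not "localhost": treat it as remote.
        return False


def build_request(prompt: str, model: str = "llama3.2",
                  url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build the HTTP request, refusing any endpoint that is not local."""
    if not is_loopback(url):
        raise ValueError(f"refusing to send prompt to non-local host: {url}")
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt: str) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `ask_local_model("Draft a polite reply to this email: ...")` returns the model's text. The loopback check is a deliberate guard in this sketch: if the URL is ever changed to a remote host, the request is refused instead of silently sending the prompt off the machine.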
What stays private:
- Everything you type
- Your conversation history
- Any documents you paste or upload
- System prompts you use
- Any files the model processes
What doesn't automatically stay private:
- If you use an interface that sends telemetry, that telemetry may include metadata about your usage, even if it does not include the content itself
- If you connect local AI to other services (APIs, databases), those connections have their own privacy implications
- If the model file itself is compromised (the file on your disk, not the data that passes through it), running it locally does not protect you
The key point: local AI gives you strong guarantees about the primary data flow. You control the infrastructure. There's no third party.
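One way to reduce the model-file risk mentioned above is to compare the downloaded file's SHA-256 hash with the hash published by the model's source. The sketch below assumes the publisher provides such a hash; the function names are hypothetical. A matching hash only shows the file is the one that was published, not that the model itself is trustworthy.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a multi-gigabyte model need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare the local file's hash with the hash listed by the model's source."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

In use, you would pass the path of the downloaded model file and the hash string copied from the download page, and treat a `False` result as a reason not to load the file.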
Real Privacy Scenarios
Scenario 1: Drafting a sensitive email
- Cloud: Your email content goes to the provider's servers. It may be logged, retained, or used for training.
- Local: The model processes it on your machine. Nothing leaves.
Scenario 2: Analyzing a legal document
- Cloud: Legal content (potentially attorney-client privileged) goes to a third party.
- Local: The document stays on your machine. Only you see the results.
Scenario 3: Code review with a proprietary codebase
- Cloud: Your proprietary code is sent to external servers.
- Local: Code stays local. The risk of leaking code through an AI provider is eliminated.
Scenario 4: Mental health or medical questions
- Cloud: Personal health information leaves your control.
- Local: Questions and responses stay on your device.
Interface Privacy Considerations
Not all local AI interfaces are equally private:
Ollama (command line): Minimal telemetry, runs fully locally. Most private option.
Jan: Local-first, can run fully offline. Privacy-preserving by default.
LM Studio: Similar to Jan, runs locally. Check settings for telemetry.
Anything with "cloud sync": May send data elsewhere. Read the settings.
The good news: you can audit the code of open-source local AI tools. You can run network monitors to verify no data leaves. You have control.
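As a rough illustration of the network-monitor check, the sketch below classifies a list of remote addresses and flags any that are not loopback. It assumes you have already collected the remote addresses of your AI tool's open connections with an OS tool of your choice (for example `lsof -i` or `ss`); the function name and the sample addresses are made up for illustration.

```python
import ipaddress


def non_local_addresses(remote_addrs: list[str]) -> list[str]:
    """Return the addresses that are not loopback, i.e. traffic leaving this machine."""
    flagged = []
    for addr in remote_addrs:
        try:
            if not ipaddress.ip_address(addr).is_loopback:
                flagged.append(addr)
        except ValueError:
            # Unparseable entries are flagged so they get a manual look.
            flagged.append(addr)
    return flagged


# Hypothetical sample: two loopback connections and one external address.
sample = ["127.0.0.1", "::1", "203.0.113.7"]
print(non_local_addresses(sample))
```

An empty result while the tool is generating a response suggests the traffic stays on your machine. This sketch flags LAN addresses too, since anything other than loopback means data is leaving the device.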
Check the privacy policy of your current cloud AI provider. Specifically look for: (1) whether they train on your data, (2) how long they retain conversation data, and (3) whether employees can access your data. Write down the answers. You might be surprised by what you find.