Theorem Proving
AI-assisted formal theorem proving in Lean, Coq, and Isabelle: DeepSeek-Prover, Lean Copilot, and the AlphaProof lineage.
Capability notes
If you just want to try this
Lowest-friction path to a working setup.
For production deployment
Operator-grade recommendation.
What breaks
Failure modes operators see in the wild.
Hardware guidance
Runtime guidance
Setup walkthrough
- Install Lean 4: follow the installation guide at lean-lang.org (VS Code extension + elan toolchain manager). Takes ~10 minutes.
- Clone Lean's mathematical library: git clone https://github.com/leanprover-community/mathlib4
- For AI-assisted proving: install Lean Copilot (VS Code extension) — uses a local or remote LLM to suggest proof steps.
- Write a simple theorem:
theorem add_comm (a b : Nat) : a + b = b + a := by { ... }
Place the cursor after by; Lean Copilot suggests the induction + rewrite steps.
- First AI-assisted proof in under 30 minutes of setup — you need basic Lean syntax knowledge first (1-2 hours of learning).
- For stronger proving models: DeepSeek Prover V2 can be run locally via Ollama/VLLM and called from Lean via the Lean REPL + LLM bridge.
- Alternative: Coq + CoqPilot (VS Code extension) for Coq-based formal verification.
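For reference, here is roughly where the suggested induction + rewrite steps land: a by-hand proof of commutativity against core Lean 4 only. This is a sketch (the theorem is renamed to avoid clashing with the built-in Nat.add_comm, and simp lemma coverage can vary slightly across toolchain versions):

```lean
-- Commutativity of Nat addition, proved by induction on b.
theorem my_add_comm (a b : Nat) : a + b = b + a := by
  induction b with
  | zero => simp                                      -- a + 0 = 0 + a
  | succ n ih => rw [Nat.add_succ, ih, Nat.succ_add]  -- push succ through both sides
```

The Lean kernel checks every step; whether the tactic script came from you or from a model, a proof that elaborates without errors is correct.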
The cheap setup
Theorem proving is CPU-bound and RAM-light. Lean 4 + mathlib4 runs on any $300 laptop (Ryzen 5/Intel i5 + 16 GB RAM). The proofs themselves compile in milliseconds. For AI-assisted proving on a budget: use a cloud API (DeepSeek API, $0.50 per 1M tokens) for proof suggestions, or run a distilled reasoning model (DeepSeek R1 Distill 7B) on a used GTX 1060 6 GB ($60). The LLM is a suggestion engine — the proof checker (Lean kernel) is the authority and it's computationally trivial. $300 + free cloud API tier is genuinely viable.
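The cloud-API route is a few lines of code. A minimal sketch of building a next-tactic request for an OpenAI-compatible endpoint — the endpoint URL and model name here are assumptions, so check the provider's current docs before sending anything:

```python
import json

# Assumed endpoint/model for an OpenAI-compatible chat API (verify against
# the provider's documentation before use).
API_URL = "https://api.deepseek.com/chat/completions"

def build_suggestion_request(goal_state: str, model: str = "deepseek-chat") -> dict:
    """Build a chat-completions payload asking for the next Lean tactic."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "You are a Lean 4 tactic suggester. "
                        "Reply with a single tactic and nothing else."},
            {"role": "user",
             "content": f"Current goal:\n{goal_state}\nSuggest the next tactic."},
        ],
        "temperature": 0.0,  # deterministic suggestions
        "max_tokens": 64,    # a single tactic is short
    }

payload = build_suggestion_request("a b : Nat ⊢ a + b = b + a")
# Send with e.g.: requests.post(API_URL, json=payload,
#                               headers={"Authorization": "Bearer <key>"})
print(json.dumps(payload, ensure_ascii=False)[:80])
```

The returned tactic is only a suggestion; paste it into the proof and let the Lean kernel accept or reject it.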
The serious setup
Used RTX 3090 24 GB (~$700-900, see /hardware/rtx-3090). Runs DeepSeek Prover V2 locally — the strongest open-weight theorem-proving model. Generates full Lean proofs for undergraduate-to-graduate-level mathematics. Pair with a Ryzen 7 7700X + 64 GB DDR5 + 1 TB NVMe. Total: ~$1,800-2,200. For research-grade proving (IMO-level problems), the field is still dominated by closed-source frontier models, but DeepSeek Prover V2 + Lean Copilot on an RTX 3090 handles most undergraduate pure math problems. Formal verification (checking proofs, not discovering them) runs on CPU alone.
Common beginner mistake
The mistake: Expecting an LLM to "auto-prove" a theorem without learning Lean or Coq syntax first. Why it fails: LLMs generate proof text, but you need to understand the proof assistant's error messages to iterate. The model says rw [add_comm] — if Lean rejects it, you can't fix it without knowing what rw does. Theorem proving with AI is a collaboration, not automation. The fix: Spend 2-4 hours learning basic Lean syntax (Natural Number Game is the canonical intro — lean-lang.org/nng). Learn what intro, apply, rw, induction, cases do. Then the LLM becomes a powerful autocomplete for proofs rather than a black box you can't debug. The LLM's job is suggesting steps, not guaranteeing correctness.
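For orientation, a minimal sketch of what those core tactics do, in standalone Lean 4 (no mathlib required):

```lean
-- intro / apply / exact in one tiny proof:
example (P Q : Prop) (h : P → Q) : P → Q := by
  intro hp   -- intro: move the hypothesis P into the local context
  apply h    -- apply: reduce the goal Q to h's premise P
  exact hp   -- close the goal with the hypothesis

-- rw: rewrite with equations, left to right.
example (a b : Nat) (h : a = b) : a + 0 = b := by
  rw [Nat.add_zero, h]  -- a + 0 ⇒ a, then a ⇒ b; b = b closes by rfl
```

Once these read naturally, an LLM's suggested tactic is something you can evaluate and repair rather than blindly retry.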
Recommended setup for theorem proving
Browse all tools for runtimes that fit this workload.
Reality check
Local AI workloads have real hardware constraints that vary by task type. VRAM ceiling decides what model fits; bandwidth decides decode speed; compute decides prefill speed. Pick the GPU tier that fits your actual workload, not the spec sheet.
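The VRAM-ceiling check is simple arithmetic. A back-of-envelope sketch — the model shapes below (7B params, Q4 at ~0.5 bytes/weight, GQA with 8 KV heads) are illustrative assumptions, not measured figures for any specific model:

```python
def fits_in_vram(params_b: float, bytes_per_weight: float,
                 n_layers: int, kv_heads: int, head_dim: int,
                 ctx_len: int, kv_bytes: int = 2,
                 vram_gb: float = 24.0, overhead_gb: float = 1.5) -> bool:
    """Rough check: weights + KV cache + fixed overhead vs. a VRAM budget."""
    weights_gb = params_b * bytes_per_weight  # params in billions -> GB
    # KV cache: 2 (K and V) * layers * kv_heads * head_dim * context * bytes
    kv_gb = 2 * n_layers * kv_heads * head_dim * ctx_len * kv_bytes / 1e9
    return weights_gb + kv_gb + overhead_gb <= vram_gb

# A 7B model at Q4 with a 32k context fits a 24 GB card with room to spare:
print(fits_in_vram(7, 0.5, 32, 8, 128, 32_768))   # -> True
# A 70B model at fp16 does not, regardless of context length:
print(fits_in_vram(70, 2.0, 80, 8, 128, 4_096))   # -> False
```

Activation memory and fragmentation add on top of this, which is why the overhead term exists and why cutting it close on the spec sheet fails in practice.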
Common mistakes
- Buying for spec-sheet VRAM without modeling KV cache + activation overhead
- Underestimating quantization quality loss below Q4
- Skipping flash-attention support (real perf gap on long context)
- Ignoring sustained-load thermals (laptops thermal-throttle within 30 min)
What breaks first
The errors most operators hit when running theorem proving locally. Each links to a diagnose+fix walkthrough.
Before you buy
Verify your specific hardware can handle theorem proving before committing money.