AI-assisted formal theorem proving in Lean, Coq, Isabelle. DeepSeek-Prover, Lean Copilot, AlphaProof-lineage.
Lowest-friction path to a working setup.
Operator-grade recommendation.
Failure modes operators see in the wild.
Clone mathlib4, Lean's mathematical library:

git clone https://github.com/leanprover-community/mathlib4

Then open a theorem such as:

theorem add_comm (a b : Nat) : a + b = b + a := by { ... }

Place the cursor after by, and Lean Copilot suggests the induction and rewrite steps.

Theorem proving is CPU-bound and RAM-light. Lean 4 + mathlib4 runs on any $300 laptop (Ryzen 5/Intel i5 + 16 GB RAM). Individual proofs like the one above check in milliseconds. For AI-assisted proving on a budget, either use a cloud API (DeepSeek API, $0.50 per 1M tokens) for proof suggestions, or run a distilled reasoning model (DeepSeek R1 Distill 7B) on a used GTX 1060 6 GB ($60). The LLM is only a suggestion engine: the proof checker (the Lean kernel) is the authority, and checking is computationally trivial. A $300 laptop plus a free cloud API tier is genuinely viable.
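The split between "suggestion engine" and "authority" can be sketched as a small generate-and-check loop. This is a minimal illustration, not the way Lean Copilot or any specific tool works: the function names `lean_accepts` and `first_verified` are hypothetical, the candidate proof bodies are assumed to come from whatever model or API you use, and the checker assumes `lake` is on PATH and is run inside a Lake project. Because Lean reports `sorry` as a warning rather than an error, the sketch rejects any candidate containing it.

```python
import subprocess
import tempfile
import textwrap
from pathlib import Path
from typing import Callable, Iterable, Optional


def lean_accepts(source: str, project_dir: str = ".") -> bool:
    """Return True if Lean elaborates `source` with exit code 0.

    Assumption: `lake` is on PATH and `project_dir` is a Lake project
    (for example, one that depends on mathlib4).
    """
    with tempfile.NamedTemporaryFile(
        "w", suffix=".lean", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(source)
        path = Path(handle.name)
    try:
        result = subprocess.run(
            ["lake", "env", "lean", str(path)],
            cwd=project_dir,
            capture_output=True,
            text=True,
        )
        return result.returncode == 0
    finally:
        path.unlink(missing_ok=True)


def first_verified(
    statement: str,
    candidates: Iterable[str],
    checker: Callable[[str], bool] = lean_accepts,
) -> Optional[str]:
    """Return the first candidate tactic body that the checker accepts.

    `statement` is a theorem header ending in `:= by`; each candidate is
    a tactic block suggested by a model. The model is never trusted:
    only the checker's verdict counts.
    """
    for body in candidates:
        # `sorry` only produces a warning in Lean, so filter it out here.
        if "sorry" in body:
            continue
        source = (
            statement.rstrip()
            + "\n"
            + textwrap.indent(body.strip("\n"), "  ")
            + "\n"
        )
        if checker(source):
            return body
    return None
```

Calling `first_verified("theorem t (a b : Nat) : a + b = b + a := by", suggestions)` returns the first suggestion Lean accepts, or `None` if every candidate is rejected. The checker is passed in as a parameter so the loop can be exercised without a Lean installation.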
Used RTX 3090 24 GB (~$700-900, see /hardware/rtx-3090). It runs DeepSeek Prover V2 locally, the strongest open-weight theorem proving model, which generates full Lean proofs for undergraduate-to-graduate-level mathematics. Pair it with a Ryzen 7 7700X, 64 GB DDR5, and a 1 TB NVMe drive. Total: ~$1,800-2,200. Research-grade proving (IMO-level problems) is still dominated by closed-source frontier models, but DeepSeek Prover V2 + Lean Copilot on an RTX 3090 handles most undergraduate pure math problems. Formal verification (checking a proof, as opposed to discovering one) runs on CPU alone.
The mistake: Expecting an LLM to "auto-prove" a theorem without learning Lean or Coq syntax first.

Why it fails: LLMs generate proof text, but you need to understand the proof assistant's error messages to iterate. The model says rw [add_comm]; if Lean rejects it, you can't fix it without knowing what rw does. Theorem proving with AI is a collaboration, not automation.

The fix: Spend 2-4 hours learning basic Lean syntax (the Natural Number Game is the canonical intro: lean-lang.org/nng). Learn what intro, apply, rw, induction, and cases do. Then the LLM becomes a powerful autocomplete for proofs rather than a black box you can't debug. The LLM's job is suggesting steps, not guaranteeing correctness.
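As a rough orientation, the five tactics named above can each be seen in a few lines of Lean 4. This is a sketch assuming a recent Lean 4 toolchain and only core lemmas (`Nat.add_zero`, `Nat.zero_add`, `Nat.add_succ`, `Nat.succ_add`, `Nat.add_comm`); the theorem name `my_add_comm` is chosen here for illustration so it does not clash with library names, and the exact form of intermediate goals may differ between versions.

```lean
-- intro: move the premise of an implication into the local context.
example (p : Prop) : p → p := by
  intro hp
  exact hp

-- apply: reduce the goal `q` to the premise of `hpq : p → q`.
example (p q : Prop) (hp : p) (hpq : p → q) : q := by
  apply hpq
  exact hp

-- rw: rewrite the goal using an equation, left to right.
example (a b c : Nat) (h : a = b) : a + c = b + c := by
  rw [h]

-- rw with explicit arguments: a bare `rw [Nat.add_comm]` would rewrite the
-- first match it finds (here the outer sum), which does not close the goal.
example (a b c : Nat) : a + (b + c) = a + (c + b) := by
  rw [Nat.add_comm b c]

-- cases: split a hypothesis into one goal per constructor.
example (p q : Prop) (h : p ∨ q) : q ∨ p := by
  cases h with
  | inl hp => exact Or.inr hp
  | inr hq => exact Or.inl hq

-- induction + rw: commutativity of addition on Nat.
theorem my_add_comm (a b : Nat) : a + b = b + a := by
  induction b with
  | zero => rw [Nat.add_zero, Nat.zero_add]
  | succ n ih => rw [Nat.add_succ, Nat.succ_add, ih]
```

Knowing that `rw` rewrites the first matching subterm is what lets you turn a rejected suggestion like the bare rewrite into the version with explicit arguments.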
Local AI workloads have real hardware constraints that vary by task type. VRAM ceiling decides what model fits; bandwidth decides decode speed; compute decides prefill speed. Pick the GPU tier that fits your actual workload, not the spec sheet.
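The VRAM and bandwidth rules of thumb can be turned into back-of-the-envelope arithmetic. This is a hedged sketch: the function names are hypothetical, the 2 GiB default overhead for KV cache and runtime buffers is an assumed placeholder rather than a measured figure, and the decode estimate is only an upper bound that assumes every generated token reads all weights once from VRAM.

```python
GIB = 2**30  # bytes per GiB


def weights_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / GIB


def fits_in_vram(
    params_billions: float,
    bits_per_weight: float,
    vram_gib: float,
    overhead_gib: float = 2.0,  # assumed allowance for KV cache and buffers
) -> bool:
    """Rough check: weights plus a fixed overhead must fit in VRAM."""
    return weights_gib(params_billions, bits_per_weight) + overhead_gib <= vram_gib


def decode_tokens_per_s_bound(
    bandwidth_gb_s: float, params_billions: float, bits_per_weight: float
) -> float:
    """Upper bound on decode speed for a memory-bound model.

    Each token requires reading all weights once, so the bound is
    memory bandwidth divided by weight size (both in units of 1e9 bytes).
    """
    weight_gb = params_billions * bits_per_weight / 8
    return bandwidth_gb_s / weight_gb
```

Under these assumptions, a 7B model at 4 bits per weight (about 3.3 GiB of weights) fits on a 6 GB card, while the same model at 16 bits (about 13 GiB) needs a 24 GB card. Real usage depends on context length, quantization format, and runtime, so treat the result as a first filter only.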