Mem0 (agent memory API)
Drop-in memory layer for LLM agents. Ships in two variants: vector memory, and graph memory (Mem0g), which builds a directed, labeled knowledge graph alongside the vector store and detects conflicts between contradictory facts. Leads the 2026 agent-memory benchmarks at 68.4% LLM Score on multi-hop questions. Works with a wide range of LLM providers, including local Ollama models.
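The graph variant rests on two ideas, directed labeled edges and conflict detection, and a toy model makes both concrete. This is not Mem0g's code or API. The ToyGraphMemory class, the hypothetical set of single-valued relations, and the newest-fact-wins rule are assumptions for illustration, and edges are passed in directly instead of being extracted from conversation text by an LLM.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGraphMemory:
    """Illustrative only: a directed, labeled graph of (subject, relation, object) edges.
    Not Mem0g's implementation or API."""

    # Hypothetical: relations assumed to hold one value per subject.
    single_valued: set[str] = field(default_factory=lambda: {"lives_in", "works_at"})
    # (subject, relation) -> set of objects; each entry is a directed labeled edge.
    edges: dict[tuple[str, str], set[str]] = field(default_factory=dict)

    def add(self, subject: str, relation: str, obj: str) -> list[tuple[str, str, str]]:
        """Add an edge and return any existing edges it contradicts (they are replaced)."""
        current = self.edges.setdefault((subject, relation), set())
        conflicts: list[tuple[str, str, str]] = []
        if relation in self.single_valued:
            conflicts = [(subject, relation, o) for o in sorted(current) if o != obj]
            current.clear()  # newest fact wins for single-valued relations
        current.add(obj)
        return conflicts

    def neighbors(self, subject: str) -> list[tuple[str, str]]:
        """Outgoing (relation, object) pairs for a node, sorted for stable output."""
        return sorted(
            (rel, o) for (s, rel), objs in self.edges.items() if s == subject for o in objs
        )
```

Adding ("alice", "lives_in", "Boston") after ("alice", "lives_in", "New York") reports the New York edge as a conflict and replaces it, while a multi-valued relation such as likes simply accumulates objects. Deciding which relations are single-valued is the hard part in a real system; hard-coding them is the simplification made here.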
Setup guidance
Install via pip: pip install mem0ai. Requires Python 3.10+. Mem0 is a memory layer for LLM applications: it stores, retrieves, and updates per-user memories so you don't have to build that infrastructure yourself.
Quick start: from mem0 import Memory; m = Memory(); m.add("I like pizza and live in New York", user_id="alice"); results = m.search("Where does Alice live?", user_id="alice"). With no configuration, Memory() auto-configures a local, development-grade vector store and calls OpenAI for memory extraction and embeddings, so OPENAI_API_KEY must be set.
For production, either use the managed cloud service (set MEM0_API_KEY and use MemoryClient instead of Memory), or pass a config dict naming a self-hosted vector store (Qdrant, Chroma) to Memory.from_config().
Mem0 auto-extracts memories from conversation messages: messages = [{"role": "user", "content": "I'm a vegetarian"}, {"role": "assistant", "content": "Got it, I'll remember that"}]; m.add(messages, user_id="alice") extracts the vegetarian preference without any extraction code on your side.
To serve Mem0 as a REST API, wrap FastAPI endpoints around m.add(), m.search(), and m.get_all().
First run: ~30 seconds, mostly package install. The default configuration uses hosted OpenAI models (gpt-4o-mini for memory extraction, text-embedding-3-small for embeddings), so no local model is downloaded unless you configure one. Verify by running the search example above; it should return the stored memory. Time-to-first-memory: ~5 seconds including embedding computation.
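Mem0's Python library accepts a configuration dict through Memory.from_config(). Below is a minimal sketch of a fully local setup, assuming Qdrant on localhost:6333 and Ollama on localhost:11434. The provider/config key layout follows Mem0's documented configuration format as far as we know; the model names, collection name, and the build_local_config helper are illustrative choices, so check the keys against the Mem0 version you install. The sketch only builds the dict, so it runs without mem0ai installed.

```python
import json

# Assumed local endpoint for Ollama; adjust to your environment.
OLLAMA_URL = "http://localhost:11434"


def build_local_config(
    llm_model: str = "llama3.1:latest",         # hypothetical choice of local chat model
    embed_model: str = "nomic-embed-text:latest",  # hypothetical choice of local embedder
    embed_dims: int = 768,                      # nomic-embed-text outputs 768-dim vectors
    qdrant_host: str = "localhost",
    qdrant_port: int = 6333,
) -> dict:
    """Hypothetical helper: config dict for Memory.from_config() using local Qdrant + Ollama.

    Key names reflect Mem0's provider/config layout as we understand it; verify them.
    """
    return {
        "vector_store": {
            "provider": "qdrant",
            "config": {
                "collection_name": "mem0",
                "host": qdrant_host,
                "port": qdrant_port,
                # Must equal the embedder's output size.
                "embedding_model_dims": embed_dims,
            },
        },
        "llm": {  # memory extraction runs on the local model, not OpenAI
            "provider": "ollama",
            "config": {"model": llm_model, "temperature": 0, "ollama_base_url": OLLAMA_URL},
        },
        "embedder": {
            "provider": "ollama",
            "config": {"model": embed_model, "ollama_base_url": OLLAMA_URL},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_local_config(), indent=2))
```

With mem0ai installed and both services running, pass the result to Memory.from_config(build_local_config()) and use the returned object like Memory() above. If embedding_model_dims does not match the embedder's output size, Qdrant will reject the inserted vectors.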
Workload fit
Best for:
- adding user-level memory to LLM chatbots and applications without building memory infrastructure
- personalization, where the LLM should remember user preferences across sessions
- customer support bots that recall user history and context
- AI companions that learn about the user over time
- any application where "the chat should remember what the user said last week" is a product requirement
Not suited for:
- document-level RAG over files and databases (use LlamaIndex)
- full autonomous agents with memory-driven behavior (use Letta)
- applications that can't afford an extraction API call on every write (build a direct vector store + embedding integration)
- strict data sovereignty requirements under the default config, whose extraction model is OpenAI-hosted (configure a local extraction model instead)
- applications that need graph traversal as the primary memory model (Zep is built around that; Mem0g's graph is an addition to vector memory)
Alternatives
Use Mem0 when you need memory as a service: add a user_id to your API calls and Mem0 handles storage, retrieval, deduplication, and memory updates without you building memory infrastructure. It's the lightest way to add user-level memory to an LLM application. Switch to Letta when you need a full agent framework with memory as one component; Letta's OS-inspired memory paging is more sophisticated but more opinionated. Use Zep for an alternative memory service with built-in conversation summarization and graph memory. Use LlamaIndex when memory is for document retrieval (RAG) rather than user preferences: Mem0 is user-centric memory, LlamaIndex is document-centric. Build your own memory layer on a vector DB when you need maximum control and minimal dependencies. Mem0's strength: the extraction LLM automatically identifies what's worth remembering from conversations, so you don't write memory extraction logic. Its weakness: extraction is an LLM call (OpenAI by default), adding latency and cost to every .add() call.
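For the build-your-own route, the core of a memory layer is small: embed each memory, scope it by user, and rank by similarity at query time. The sketch below is a self-contained illustration, not a production design. The bag-of-words embed() is a stand-in for a real embedding model, and MinimalMemory is a hypothetical class whose search() exposes user scoping and a similarity threshold.

```python
import math
import re
from collections import Counter, defaultdict


def embed(text: str) -> Counter:
    """Stand-in embedding: bag of lowercase words. A real layer would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class MinimalMemory:
    """Hypothetical per-user memory store: add text, search by similarity.
    No LLM extraction, deduplication, or conflict handling."""

    def __init__(self) -> None:
        self._store: dict[str, list[tuple[str, Counter]]] = defaultdict(list)

    def add(self, text: str, user_id: str) -> None:
        self._store[user_id].append((text, embed(text)))

    def search(
        self, query: str, user_id: str, limit: int = 3, threshold: float = 0.0
    ) -> list[tuple[str, float]]:
        """Return up to `limit` (text, score) pairs for this user scoring above `threshold`."""
        q = embed(query)
        scored = [(text, cosine(q, vec)) for text, vec in self._store[user_id]]
        hits = [(t, s) for t, s in scored if s > threshold]
        return sorted(hits, key=lambda ts: ts[1], reverse=True)[:limit]
```

After adding "I live in New York" and "I like pizza" for alice and "I live in Boston" for bob, searching "where do I live" as alice ranks the New York memory first and never returns bob's. What this omits is what Mem0 supplies: LLM extraction of facts from raw conversation, deduplication, and update/delete decisions when a new fact contradicts an old one.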
Troubleshooting + when to switch
Problem: Memory().add() hangs or fails silently. Fix: the default .add() sends messages to OpenAI (gpt-4o-mini) for memory extraction, so a missing OPENAI_API_KEY breaks it; set export OPENAI_API_KEY=sk-.... Alternatively, configure a local provider (for example Ollama) in the "llm" section of the config passed to Memory.from_config(), so extraction never leaves the machine.
Problem: Memory search returns irrelevant or empty results. Fix: search ranks by semantic similarity, so very generic memories may sit too close together in embedding space to separate. Check that every .add() and .search() call uses the same user_id="alice" (on the hosted client, the equivalent filters); memories stored under one ID are not visible under another. Tune threshold=0.3 in .search(): a lower threshold returns more, weaker matches; a higher one returns fewer, closer matches.
Problem: Duplicate or contradictory memories accumulate. Fix: at .add() time, Mem0 retrieves existing memories similar to each new fact and its extraction LLM decides whether to add, update, delete, or skip, which catches most near-duplicates. Contradictions (e.g., "I live in New York" then "I live in Boston") should trigger an update, but the decision is LLM-driven and not guaranteed, so both versions can end up stored. Use m.update(memory_id, "I live in Boston") to fix one manually; for systematic cleanup, run a periodic memory reconciliation pipeline on your side.
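A minimal sketch of a reconciliation pass, assuming you have fetched a user's memories (for example via m.get_all(user_id=...)) as dicts with "id", "memory", and ISO-8601 "created_at" fields; check those field names against your Mem0 version. The slot function is hypothetical: it maps memory text to a single-valued attribute such as residence, and within each slot only the newest memory survives.

```python
from datetime import datetime
from typing import Callable, Optional


def reconcile(memories: list[dict], slot_of: Callable[[str], Optional[str]]) -> list[str]:
    """Return IDs of memories superseded by a newer memory in the same slot.

    memories: dicts with "id", "memory", and ISO-8601 "created_at" (assumed shape).
    slot_of:  maps memory text to a slot key, or None if the memory is not single-valued.
    """
    newest: dict[str, dict] = {}
    stale: list[str] = []
    # Walk memories oldest-first so each later memory supersedes the earlier one.
    for mem in sorted(memories, key=lambda m: datetime.fromisoformat(m["created_at"])):
        slot = slot_of(mem["memory"])
        if slot is None:
            continue
        if slot in newest:
            stale.append(newest[slot]["id"])
        newest[slot] = mem
    return stale


def residence_slot(text: str) -> Optional[str]:
    """Hypothetical keyword rule: memories starting with 'lives in' share one slot."""
    return "residence" if text.lower().startswith("lives in") else None
```

Each returned ID can then be deleted or overwritten through Mem0's Memory methods that operate by memory ID (delete, update). The slot function is the brittle part; a keyword rule like residence_slot is only a placeholder, and an LLM classifier is one option for replacing it. All created_at values must be consistently timezone-aware or consistently naive, or the sort will raise.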
Stack & relationships
How Mem0 (agent memory API) relates to other entries in the catalog — recommended pairings, alternatives, dependencies, and edges to avoid. Each edge carries a one-line operator note from our editorial team.
Recommended stack
- Pairs with OpenHands
The default memory pairing for OpenHands. 20 lines of config; works out of the box. The /stacks/local-coding-agent stack uses this pairing.
- Pairs with Claude Code
Mem0 hooks into Claude Code's MCP layer for cross-session memory. The integration is community-maintained; works but expect occasional config churn.
Works with
- Works with LanceDB
Embedded vector backend for Mem0. LanceDB's embedded architecture pairs naturally with Mem0's single-process design; there is no additional service to firewall.
- Works with Qdrant
Mem0's production-tier vector backend. Switch from LanceDB to Qdrant when memory store grows past ~500K vectors per agent.
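Raw vector storage puts the ~500K-vectors-per-agent threshold in perspective: it is simply count × dimensions × bytes per value. The dimensions used below are assumptions tied to common embedders (1536 for OpenAI text-embedding-3-small, 768 for nomic-embed-text), with float32 storage; ANN index overhead and the stored memory text come on top.

```python
def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw size of dense float32 vectors; excludes ANN index overhead and payload text."""
    return num_vectors * dims * bytes_per_value


if __name__ == "__main__":
    # Assumed embedding sizes: 1536 (text-embedding-3-small) and 768 (nomic-embed-text).
    for dims in (1536, 768):
        size = raw_vector_bytes(500_000, dims)
        print(f"{dims} dims: {size / 1e9:.2f} GB ({size / 2**30:.2f} GiB)")
```

At 1536 dimensions, 500,000 × 1,536 × 4 bytes = 3,072,000,000 bytes, about 3.07 GB of raw vectors per agent before any index structures.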
Alternatives
- Competes with Letta (memory framework)
Mem0 is drop-in agent memory; Letta is OS-style explicit memory management. Pick Mem0 for fast wiring; Letta when you need to reason about memory state explicitly.
- Competes with Zep (memory platform)
Mem0 emphasises drop-in API; Zep emphasises temporal knowledge-graph memory. Different mental models — pick by whether you want graph traversal or vector retrieval.
- Alternative to MCP Memory Server
MCP Memory is JSON-on-disk knowledge-graph memory — entry-tier. Mem0 is a richer drop-in API. Pick MCP Memory for trivial setup; Mem0 for production-grade memory.
- Alternative to Letta (memory framework)
Letta is OS-style explicit memory management (paging, archival, working memory split); Mem0 is drop-in vector memory. Pick Letta when you need deterministic memory behavior; Mem0 when you want fast wiring.
- Alternative to Zep (memory platform)
Zep's temporal-graph approach handles 'what did Bob decide three sessions ago and why' better than Mem0's flat vector retrieval. Trade slower lookup for stronger multi-hop reasoning.
- Alternative to Letta (memory framework)
Different abstractions for the same need. Mem0: drop-in API with implicit memory. Letta: explicit OS-like memory hierarchy. The right choice depends on whether you want to control memory state or just have it work.
Featured in these stacks
The L3 execution stacks that pick this tool as a recommended component, with the one-line note explaining the role it plays in each.
- Stack · L3 · Workstation tier · Role: Persistent memory (codebase context across sessions) · Build a local coding-agent stack (May 2026)
Mem0 over Letta or Zep for this stack: dropping a memory layer into OpenHands takes 20 lines of config; Letta's OS-style explicit memory management is overkill for a single-user coding agent; Zep's temporal knowledge graph is strong but slower to wire.
- Stack · L3 · Workstation tier · Role: Episodic + semantic memory layer · Build a memory-enabled local agent stack (May 2026)
Mem0 over Letta for the default memory pick: drop-in API, less ceremony, faster to wire. Letta wins when you need OS-style explicit memory management (paging in/out memory blocks for long-horizon tasks) — promote to Letta only when you've outgrown Mem0's memory model.
- Stack · L3 · Workstation tier · Role: Memory (local-only via LanceDB) · Build a fully offline coding stack (May 2026)
Mem0 with LanceDB backend — no hosted memory service in the loop. All consolidation runs on the local LLM (vLLM endpoint); no third-party API calls. Cross-session memory works fully offline.
Pros
- Drop-in API — minutes to integrate
- Graph memory (Mem0g) leads 2026 benchmarks
- Conflict detection on contradictory facts
- Works with local LLMs
Cons
- Production scale means either the managed cloud tier or operating your own vector store
- Graph extraction is LLM-cost heavy
- Less control than Letta's explicit OS-style approach
Compatibility
| Attribute | Value |
|---|---|
| Operating systems | macOS, Linux, Windows |
| GPU backends | n/a |
| License | Open source · free (OSS) + managed cloud tiers |
Runtime health
Operator-grade signals on how actively Mem0 (agent memory API) is being maintained, how fresh its measurements are, and what failure classes operators have flagged. Every label below is anchored to a real date or count — we never infer maintainer activity we can't show.
Release cadence
Derived from the most recent editorial signal on this row.
8 days since last refresh · source: lastUpdated
Benchmark freshness
How recent the editorial measurements on this runtime are.
No editorial benchmarks for this runtime yet.
Community reproduction
Submissions that match an editorial measurement on similar hardware.
No community reproductions on file yet.
Ecosystem stability
Editorial rating from RunLocalAI — qualitative, not measured.
Frequently asked
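Is Mem0 (agent memory API) free?
The open-source library is free; the managed cloud service is offered in separate tiers.
What operating systems does Mem0 (agent memory API) support?
macOS, Linux, and Windows.
Does Mem0 (agent memory API) need a GPU?
No. Mem0 itself has no GPU backend; a GPU only matters if you run the extraction LLM or embedding model locally.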
Reviewed by RunLocalAI Editorial. See our editorial policy for how we evaluate tools.