What this does

Short-term memory holds the current conversation within the context window. Long-term memory persists important facts across sessions using a vector store, enabling the agent to recall user preferences and past interactions.

Steps

Implement short-term memory as a sliding window. Keep the most recent N messages.

from collections import deque

class ShortTermMemory:
    def __init__(self, max_messages: int = 10):
        self.messages = deque(maxlen=max_messages)

    def add(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})

    def get_context(self) -> list[dict]:
        return list(self.messages)

    def summarize(self, llm) -> str:
        text = "\n".join(m["content"] for m in self.messages)
        summary = llm.invoke(f"Summarize this conversation:\n{text}")
        return summary.content

Build long-term memory with a vector store. Store facts as embeddings.

from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings
from langchain.schema import Document

class LongTermMemory:
    def __init__(self, collection_name: str = "agent_memory"):
        self.embeddings = OllamaEmbeddings(model="nomic-embed-text")
        self.store = Chroma(
            embedding_function=self.embeddings,
            collection_name=collection_name
        )

    def remember(self, fact: str, metadata: dict = None):
        doc = Document(page_content=fact, metadata=metadata or {})
        self.store.add_documents([doc])

    def recall(self, query: str, k: int = 5) -> list[str]:
        docs = self.store.similarity_search(query, k=k)
        return [d.page_content for d in docs]

Decide what to store long-term. Use the LLM to extract salient facts.

def extract_facts(conversation: str, llm) -> list[str]:
    prompt = f"""Extract important facts from this conversation that should be remembered long-term.
Return each fact on a new line.

Conversation:
{conversation}"""
    response = llm.invoke(prompt)
    return [f.strip() for f in response.content.split("\n") if f.strip()]

Integrate both memory types into the agent loop.

class MemoryAugmentedAgent:
    def __init__(self, llm, tools, stm: ShortTermMemory, ltm: LongTermMemory):
        self.llm = llm
        self.tools = tools
        self.stm = stm
        self.ltm = ltm

    def run(self, user_input: str) -> str:
        # Retrieve relevant long-term memories
        memories = self.ltm.recall(user_input)
        memory_context = "\n".join(f"Past fact: {m}" for m in memories)

        # Build prompt with both memory types
        prompt = f"{memory_context}\n\nConversation:\n"
        for msg in self.stm.get_context():
            prompt += f"{msg['role']}: {msg['content']}\n"
        prompt += f"user: {user_input}"

        response = self.llm.invoke(prompt)

        # Update short-term memory
        self.stm.add("user", user_input)
        self.stm.add("assistant", response.content)

        # Periodically extract long-term facts
        if len(self.stm.messages) >= 5:
            facts = extract_facts(prompt, self.llm)
            for fact in facts:
                self.ltm.remember(fact)

        return response.content

Verification

python -c "
from collections import deque
stm = deque(maxlen=3)
stm.append('msg1')
stm.append('msg2')
stm.append('msg3')
stm.append('msg4')
print(len(stm))
# Expected: 3 (sliding window)
"

Common failures

Memory staleness. Long-term memory stores facts that become outdated. Add an expiration timestamp or allow the user to delete memories.
Context window overflow from short-term memory. Even a sliding window of 20 messages can overflow a small context. Store summaries instead of raw messages.
Fact extraction captures trivial details. The LLM stores "User said 'OK'" as a fact. Use a stricter extraction prompt that requires informational value.
Version mismatch - The installed package or runtime differs from the command shown; check the version first and rerun the smallest verification command.
Local environment drift - Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing the guide steps.

Related guides

How to Use Vector Store as Agent Memory
How to Manage Agent Context Window Limits