Memory: ConversationBuffer — LangChain for Local AI (Chapter 9)

Memory components maintain state across multiple LLM calls within a session. LangChain provides several memory types: ConversationBufferMemory (raw message history), ConversationBufferWindowMemory (rolling window of recent exchanges), ConversationEntityMemory (tracks facts about named entities), ConversationSummaryMemory (condensed running summary), and more. This chapter covers ConversationBufferMemory, which stores the full message list and inserts it into the prompt on every call.

The pattern: create a memory object, attach it to a ConversationChain (or pass it manually to any chain), and call the chain multiple times. The message history accumulates:

from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.2:3b", temperature=0.7)

# return_messages=True makes the memory return message objects, not one string
memory = ConversationBufferMemory(
    return_messages=True,
    memory_key="history",  # key used in prompt template
)

# ConversationChain wraps memory + chain logic
chain = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True,
)

# Turn 1
result1 = chain.invoke({"input": "My name is Alex and I work in finance."})
print(result1["response"])

# Turn 2
result2 = chain.invoke({"input": "What is my name and what field do I work in?"})
print(result2["response"])

# Turn 3
result3 = chain.invoke({"input": "Summarize everything I've told you."})
print(result3["response"])

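With return_messages=True, the memory returns the history as a list of HumanMessage and AIMessage objects, which is the format a ChatPromptTemplate expects. With return_messages=False (the default), it returns a single formatted string with "Human:" and "AI:" prefixes, suitable for a PromptTemplate. In both cases the underlying messages are stored the same way; the flag only changes what is handed to the prompt.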
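
To make the two return shapes concrete, here is a minimal sketch in plain Python. It does not use LangChain: Message and load_history are hypothetical stand-ins that only mimic what the return_messages flag switches between.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for HumanMessage / AIMessage."""
    role: str      # "Human" or "AI"
    content: str


def load_history(messages, return_messages):
    """Return the history as message objects or as one formatted string."""
    if return_messages:
        return list(messages)
    return "\n".join(f"{m.role}: {m.content}" for m in messages)


stored = [
    Message("Human", "My name is Alex and I work in finance."),
    Message("AI", "Nice to meet you, Alex."),
]

as_objects = load_history(stored, return_messages=True)   # list of Message
as_text = load_history(stored, return_messages=False)     # one string

print(type(as_objects).__name__, len(as_objects))
print(as_text)
```

The stored data is identical in both calls; only the shape handed to the prompt differs, which is why the flag has to match the kind of prompt template you use.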

Inspect the memory state directly:

print(memory.buffer)
# [HumanMessage(content='My name is Alex and I work in finance.'),
#  AIMessage(content="Hi Alex! It's nice to meet someone from the finance...),
#  HumanMessage(content='What is my name and what field do I work in?'),
#  AIMessage(content='Your name is Alex and you work in the finance...)]

Save and load memory for session persistence:

import json

from langchain_core.messages import messages_from_dict, messages_to_dict

# Record one more exchange in memory (this does not write to disk yet)
memory.save_context(
    {"input": "My favorite color is teal."},
    {"output": "Teal is a great choice!"}
)

# Save to disk: messages_to_dict keeps each message's type ("human" or "ai")
with open("session_memory.json", "w") as f:
    json.dump(messages_to_dict(memory.chat_memory.messages), f)

# Load into a new memory instance
with open("session_memory.json") as f:
    loaded_messages = json.load(f)

new_memory = ConversationBufferMemory(
    return_messages=True,
    memory_key="history",
)
# messages_from_dict rebuilds HumanMessage and AIMessage objects by type,
# so AI turns are not restored as human turns
for message in messages_from_dict(loaded_messages):
    new_memory.chat_memory.add_message(message)

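memory.load_memory_variables({}) returns a dict keyed by your memory_key. Its value follows return_messages: a list of message objects when True, a formatted string when False. If you need the full history as a string for a custom prompt regardless of that setting, use the buffer_as_str property:

memory_vars = memory.load_memory_variables({})
print(memory_vars["history"])  # list of message objects, since return_messages=True

print(memory.buffer_as_str)
# Human: My name is Alex...
# AI: Hi Alex...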

Common failure mode: context window overflow. ConversationBufferMemory stores every message indefinitely, so a long conversation eventually exceeds the context window the model is running with. That limit is often smaller than the model's maximum: Llama 3.2 3B supports up to 128,000 tokens, but Ollama applies a default of only a few thousand tokens unless you raise num_ctx. When the prompt overflows, it is silently truncated and the model no longer sees part of the earlier conversation. Response quality degrades without an explicit error. Use ConversationBufferWindowMemory with k=10 to keep only the last 10 exchanges, or ConversationSummaryMemory to condense history into a summary string.
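
Because overflow raises no error, it can help to check the history size yourself before each call. The following is a rough sketch in plain Python, not a LangChain API: estimate_tokens and history_fits are hypothetical helpers, and the ratio of 4 characters per token is an assumed heuristic, not a real tokenizer count.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Rough token estimate; the 4-characters-per-token ratio is an assumption."""
    return math.ceil(len(text) / chars_per_token)


def history_fits(messages, context_limit, reserve_for_reply=512):
    """Check whether the history plus room for a reply fits in the context window."""
    used = sum(estimate_tokens(text) for text in messages)
    return used + reserve_for_reply <= context_limit


history = ["x" * 4000]                                  # about 1,000 estimated tokens
print(history_fits(history, context_limit=2048))        # True
print(history_fits(history * 2, context_limit=2048))    # False
```

For accurate numbers, swap the heuristic for the tokenizer that matches your model. The point of the sketch is the budget check: history plus reply headroom against the configured context length.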

Example with ConversationBufferWindowMemory:

from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(
    k=3,                          # keep only last 3 exchanges
    return_messages=True,
    memory_key="history",
)

# After 5+ turns, only the last 3 are in the memory buffer
chain = ConversationChain(llm=llm, memory=memory)

for i in range(6):
    result = chain.invoke({"input": f"This is exchange number {i}."})
    print(f"Turn {i}: buffer has {len(memory.buffer) // 2} exchanges")
# Turn 0: buffer has 1 exchanges
# Turn 5: buffer has 3 exchanges  (capped at k=3)
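
The windowing itself is simple to picture. Here is a minimal sketch in plain Python, assuming each exchange is exactly one human message followed by one AI message. last_k_exchanges is a hypothetical helper, not part of LangChain, and slicing on read is just one way to implement a window.

```python
def last_k_exchanges(messages, k):
    """Return the final k exchanges, assuming two messages per exchange."""
    return messages[-2 * k:] if k > 0 else []


history = []
for i in range(6):
    history.append(("human", f"This is exchange number {i}."))
    history.append(("ai", f"Noted exchange {i}."))

window = last_k_exchanges(history, k=3)
print(len(history) // 2)   # 6 exchanges stored
print(len(window) // 2)    # 3 exchanges returned
print(window[0][1])        # This is exchange number 3.
```

Note that k counts exchanges, not individual messages, so k=3 yields six messages in the prompt.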

When manually adding messages to memory outside the chain (if you're calling llm.invoke() directly rather than through the chain), use memory.save_context():

memory.save_context(
    {"input": "User message here"},
    {"output": "Assistant response here"}
)

This is necessary because ConversationChain calls save_context internally after each turn; a direct llm.invoke() call bypasses the memory entirely, so nothing is recorded unless you save it yourself.