09. Memory: ConversationBuffer
Memory components maintain state across multiple LLM calls within a session. LangChain provides several memory types: ConversationBufferMemory (raw message history), ConversationBufferWindowMemory (rolling window over recent exchanges), ConversationEntityMemory (tracks facts about entities mentioned in the conversation), ConversationSummaryMemory (condensed running summary), and more. This chapter covers ConversationBufferMemory, which stores the full message list and inserts it into the prompt on every call.
The pattern: create a memory object, attach it to a ConversationChain (or pass it manually to any chain), and call the chain multiple times. The message history accumulates:
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
from langchain_ollama import ChatOllama
llm = ChatOllama(model="llama3.2:3b", temperature=0.7)
# Initialize memory with return_messages=True for ChatMessage format
memory = ConversationBufferMemory(
    return_messages=True,
    memory_key="history",  # key used in prompt template
)
# ConversationChain wraps memory + chain logic
chain = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True,
)
# Turn 1
result1 = chain.invoke({"input": "My name is Alex and I work in finance."})
print(result1["response"])
# Turn 2
result2 = chain.invoke({"input": "What is my name and what field do I work in?"})
print(result2["response"])
# Turn 3
result3 = chain.invoke({"input": "Summarize everything I've told you."})
print(result3["response"])
With return_messages=True, memory returns the history as a list of HumanMessage and AIMessage objects rather than plain text, which is the format a ChatPromptTemplate with a MessagesPlaceholder expects. With return_messages=False, it returns a single formatted string with "Human:" and "AI:" prefixes, suitable for a string PromptTemplate.
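To make the two return formats concrete, here is a minimal pure-Python sketch of a buffer that can hand back its history either as role-tagged message objects or as one formatted string. It only illustrates the idea and is not LangChain's implementation; the names TinyBuffer and Message are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical stand-in for HumanMessage / AIMessage."""
    role: str      # "Human" or "AI"
    content: str


@dataclass
class TinyBuffer:
    """Illustrative buffer memory; not the LangChain class."""
    return_messages: bool = False
    memory_key: str = "history"
    messages: list = field(default_factory=list)

    def save_context(self, inputs: dict, outputs: dict) -> None:
        # One exchange = one human message followed by one AI message.
        self.messages.append(Message("Human", inputs["input"]))
        self.messages.append(Message("AI", outputs["output"]))

    def load_memory_variables(self, _inputs: dict) -> dict:
        if self.return_messages:
            # Chat-style prompts consume the message objects directly.
            return {self.memory_key: list(self.messages)}
        # String prompts get one "Role: content" line per message.
        text = "\n".join(f"{m.role}: {m.content}" for m in self.messages)
        return {self.memory_key: text}


as_objects = TinyBuffer(return_messages=True)
as_string = TinyBuffer(return_messages=False)
for buf in (as_objects, as_string):
    buf.save_context({"input": "My name is Alex."}, {"output": "Hi Alex!"})

print(as_objects.load_memory_variables({})["history"])
print(as_string.load_memory_variables({})["history"])
```

Both buffers store the same two messages; only the shape of what is returned under the memory key differs.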
Inspect the memory state directly:
print(memory.buffer)
# [HumanMessage(content='My name is Alex and I work in finance.'),
# AIMessage(content="Hi Alex! It's nice to meet someone from the finance...),
# HumanMessage(content='What is my name and what field do I work in?'),
# AIMessage(content='Your name is Alex and you work in the finance...)]
Save and load memory for session persistence:
import json
from langchain_core.messages import messages_from_dict, messages_to_dict
# Record one more exchange by hand, then save to disk
memory.save_context(
    {"input": "My favorite color is teal."},
    {"output": "Teal is a great choice!"},
)
with open("session_memory.json", "w") as f:
    # messages_to_dict keeps each message's type (human / ai) alongside its content
    json.dump(messages_to_dict(memory.chat_memory.messages), f)
# Load into a new memory instance
with open("session_memory.json") as f:
    loaded_messages = messages_from_dict(json.load(f))
new_memory = ConversationBufferMemory(
    return_messages=True,
    memory_key="history",
)
for message in loaded_messages:
    # Each message comes back as its original class (HumanMessage or AIMessage)
    new_memory.chat_memory.add_message(message)
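If you need the full history as a string for a custom prompt (instead of passing message objects), note that memory.load_memory_variables({}) returns a dict keyed by your memory_key whose value follows return_messages: a list of message objects when it is True, a formatted string when it is False. With the return_messages=True memory created above, format the messages yourself with get_buffer_string:
from langchain_core.messages import get_buffer_string
memory_vars = memory.load_memory_variables({})
print(get_buffer_string(memory_vars["history"]))
# Human: My name is Alex...
# AI: Hi Alex...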
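Common failure mode: context window overflow. ConversationBufferMemory stores every message indefinitely, so a long conversation eventually exceeds the context window available to the model. The limit in effect is often the runtime's setting rather than the model's maximum: Llama 3.2 3B supports up to ~128,000 tokens, but Ollama typically applies a much smaller default context length (a few thousand tokens) unless you raise num_ctx. When the prompt overflows, the oldest part is typically truncated without an explicit error, so the model no longer sees early messages and response quality degrades silently. Use ConversationBufferWindowMemory with k=10 to keep only the last 10 exchanges, or ConversationSummaryMemory to condense the history into a summary string.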
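One way to catch overflow before it silently degrades answers is to estimate the size of the history before each call. The sketch below uses a crude rule of thumb of about four characters per token, which is an assumption and not a real tokenizer; the function names, the 4,096-token budget, and the 512-token reserve are all illustrative.

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Rough token estimate; assumes ~4 characters per token (heuristic only)."""
    return -(-len(text) // chars_per_token)  # ceiling division


def fits_budget(history: list[str], budget: int, reserve: int = 512) -> bool:
    """True if the history still leaves `reserve` tokens free for the reply."""
    used = sum(estimate_tokens(message) for message in history)
    return used + reserve <= budget


history = ["My name is Alex and I work in finance."] * 50
print(fits_budget(history, budget=4096))   # short conversation

history = history * 10
print(fits_budget(history, budget=4096))   # ten times longer
```

For real budgeting, swap the heuristic for the tokenizer that matches your model; the check itself stays the same.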
Example with ConversationBufferWindowMemory:
from langchain.memory import ConversationBufferWindowMemory
memory = ConversationBufferWindowMemory(
    k=3,  # keep only last 3 exchanges
    return_messages=True,
    memory_key="history",
)
# After 5+ turns, only the last 3 exchanges are passed to the model
chain = ConversationChain(llm=llm, memory=memory)
for i in range(6):
    result = chain.invoke({"input": f"This is exchange number {i}."})
    # Count what the chain will actually receive on the next call
    visible = memory.load_memory_variables({})["history"]
    print(f"Turn {i}: buffer has {len(visible) // 2} exchanges")
# Turn 0: buffer has 1 exchanges
# Turn 5: buffer has 3 exchanges (capped at k=3)
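The windowing itself is just a slice over the message list. The following is a minimal sketch, assuming the window is applied when the history is read while the stored list keeps growing; the function name window and the plain-string messages are hypothetical stand-ins.

```python
def window(messages: list[str], k: int) -> list[str]:
    """Return the last k exchanges (2 * k messages); empty if k is 0."""
    # The k > 0 guard matters: messages[-0:] would return the whole list.
    return messages[-2 * k:] if k > 0 else []


stored: list[str] = []
for i in range(6):
    stored += [f"human {i}", f"ai {i}"]
    visible = window(stored, k=3)
    print(f"Turn {i}: stored={len(stored) // 2}, visible={len(visible) // 2}")
```

After six turns the stored list holds all six exchanges, but only exchanges 3 through 5 are visible to the next call.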
To add messages to memory manually (for example, when you call llm.invoke() directly rather than through the chain), use memory.save_context():
memory.save_context(
    {"input": "User message here"},
    {"output": "Assistant response here"},
)
This is necessary because ConversationChain calls save_context internally after each turn; a direct llm.invoke() call bypasses memory entirely, so nothing is recorded unless you save it yourself.
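The manual pattern is a read, call, write cycle. Here is a minimal sketch with a stubbed model so the flow is visible without a running LLM; fake_llm is a hypothetical stand-in for llm.invoke(), and the plain list stands in for the memory object.

```python
def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for llm.invoke(); reports how much context it saw."""
    return f"(saw {len(prompt.splitlines())} prompt lines)"


history: list[str] = []


def chat(user_input: str) -> str:
    # 1. Read: prepend the accumulated history to the new input.
    prompt = "\n".join(history + [f"Human: {user_input}"])
    # 2. Call the model directly.
    reply = fake_llm(prompt)
    # 3. Write: without this step the next call would not see this turn.
    history.append(f"Human: {user_input}")
    history.append(f"AI: {reply}")
    return reply


print(chat("My name is Alex."))
print(chat("What is my name?"))
```

Dropping step 3 reproduces the failure described above: every call would see only the current input.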
Run a five-turn conversation with ConversationBufferMemory. Print the buffer after each turn. Then switch to ConversationBufferWindowMemory with k=2 and observe which messages are retained after five turns. Document which messages were dropped.