RUNLOCALAI · v38
Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.coIndependently operated
LangChain for Local AI

09. Memory: ConversationBuffer

Chapter 9 of 18 · 25 min
KEY INSIGHT

`ConversationBufferMemory` persists message history across turns by storing and replaying it with each chain invocation, enabling stateful multi-turn conversations.

Memory components maintain state across multiple LLM calls within a session. LangChain provides several memory types: ConversationBufferMemory (raw message history), ConversationBufferWindowMemory (rolling window), ConversationEntityMemory (tracks facts about named entities), ConversationSummaryMemory (condensed summaries), and more. This chapter covers ConversationBufferMemory, which stores the full message list and inserts it into the prompt as context on every call.
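Before touching LangChain, the store-and-replay idea can be sketched in plain Python. This is an illustration only: `ToyBufferMemory` and `build_prompt` are hypothetical names, not LangChain classes, and the "Human:" / "AI:" transcript format is assumed for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBufferMemory:
    """Illustrative stand-in for a buffer memory; not a LangChain class."""

    turns: list[tuple[str, str]] = field(default_factory=list)  # (role, text)

    def save_context(self, user_text: str, ai_text: str) -> None:
        # Append both sides of one exchange; nothing is ever dropped.
        self.turns.append(("Human", user_text))
        self.turns.append(("AI", ai_text))

    def as_string(self) -> str:
        # Flatten the history into the transcript replayed in the next prompt.
        return "\n".join(f"{role}: {text}" for role, text in self.turns)


def build_prompt(memory: ToyBufferMemory, user_text: str) -> str:
    # Replay the full history, then add the new user message.
    history = memory.as_string()
    prefix = history + "\n" if history else ""
    return f"{prefix}Human: {user_text}\nAI:"


memory = ToyBufferMemory()
memory.save_context("My name is Alex.", "Hi Alex!")
prompt = build_prompt(memory, "What is my name?")
print(prompt)
```

The model itself is stateless: it can answer the second question only because the first exchange is sent again as part of the prompt.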

The pattern: create a memory object, attach it to a ConversationChain (or pass it manually to any chain), and call the chain multiple times. The message history accumulates:

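from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.2:3b", temperature=0.7)

# Initialize memory with return_messages=True so history is a list of messages
memory = ConversationBufferMemory(
    return_messages=True,
    memory_key="history",  # must match the placeholder name in the prompt
)

# Chat prompt that accepts message objects; the default ConversationChain
# prompt is a string template and would render the list as its repr
prompt = ChatPromptTemplate.from_messages([
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

# ConversationChain wraps memory + chain logic
chain = ConversationChain(
    llm=llm,
    memory=memory,
    prompt=prompt,
    verbose=True,
)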

# Turn 1
result1 = chain.invoke({"input": "My name is Alex and I work in finance."})
print(result1["response"])

# Turn 2
result2 = chain.invoke({"input": "What is my name and what field do I work in?"})
print(result2["response"])

# Turn 3
result3 = chain.invoke({"input": "Summarize everything I've told you."})
print(result3["response"])

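With return_messages=True, memory returns the history as a list of HumanMessage and AIMessage objects, which is the format a ChatPromptTemplate expects. With return_messages=False (the default), it returns the history as a single formatted string, suitable for a PromptTemplate. In both cases the underlying store holds message objects; the flag only controls how they are handed to the prompt.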

Inspect the memory state directly:

print(memory.buffer)
# [HumanMessage(content='My name is Alex and I work in finance.'),
#  AIMessage(content="Hi Alex! It's nice to meet someone from the finance...),
#  HumanMessage(content='What is my name and what field do I work in?'),
#  AIMessage(content='Your name is Alex and you work in the finance...)]

Save and load memory for session persistence:

import json
from langchain_core.messages import messages_from_dict, messages_to_dict

# Record one more exchange in memory (this does not write to disk)
memory.save_context(
    {"input": "My favorite color is teal."},
    {"output": "Teal is a great choice!"}
)

# Save to disk: convert message objects to JSON-serializable dicts
with open("session_memory.json", "w") as f:
    json.dump(messages_to_dict(memory.chat_memory.messages), f)

# Load into a new memory instance
with open("session_memory.json") as f:
    loaded_messages = json.load(f)

new_memory = ConversationBufferMemory(
    return_messages=True,
    memory_key="history",
)
# messages_from_dict restores each message with its original type (human or AI)
for message in messages_from_dict(loaded_messages):
    new_memory.chat_memory.add_message(message)

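If you need the full history as a string for a custom prompt (instead of passing message objects), call memory.load_memory_variables({}), which returns a dict keyed by your memory_key. The value is a string only when the memory was created with return_messages=False; with return_messages=True, as above, it is the message list, and memory.buffer_as_str gives the string form:

history_vars = memory.load_memory_variables({})
print(history_vars["history"])  # list of messages here, since return_messages=True
print(memory.buffer_as_str)     # same history as one formatted string
# Human: My name is Alex...
# AI: Hi Alex...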

Common failure mode: context window overflow. ConversationBufferMemory stores every message indefinitely, so a long conversation eventually exceeds the context length the model is served with. That limit is often smaller than the model's maximum: Llama 3.2 3B supports up to 128K tokens, but Ollama serves models with a much smaller default context (a few thousand tokens, depending on the version) unless you raise num_ctx. When the history overflows, the oldest messages are truncated or ignored, and response quality degrades without an explicit error. Use ConversationBufferWindowMemory with k=10 to keep only the last 10 exchanges, or ConversationSummaryMemory to condense history into a summary string.
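Because the overflow is silent, it can help to estimate the history size yourself before each call. The sketch below is a rough guard, not a real tokenizer: the 4-characters-per-token ratio, the 4,096-token limit, and the 512-token reply reserve are all assumptions, and `estimate_tokens` and `history_fits` are hypothetical helpers.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token for English text.
    return max(1, len(text) // 4)


def history_fits(messages: list[str], context_limit: int, reserve_for_reply: int = 512) -> bool:
    # True when the replayed history still leaves room for the model's reply.
    used = sum(estimate_tokens(m) for m in messages)
    return used + reserve_for_reply <= context_limit


# Simulate a long session by repeating one short message 500 times.
history = ["My name is Alex and I work in finance."] * 500

print(history_fits(history[:10], context_limit=4096))  # short history
print(history_fits(history, context_limit=4096))       # long history
```

When the check fails, that is the point to switch to a windowed or summarizing memory. For an accurate count, use the tokenizer that matches your model instead of the character heuristic.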

Example with ConversationBufferWindowMemory:

from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(
    k=3,                          # keep only last 3 exchanges
    return_messages=True,
    memory_key="history",
)

# After 5+ turns, only the last 3 exchanges are replayed to the model
chain = ConversationChain(llm=llm, memory=memory)

for i in range(6):
    chain.invoke({"input": f"This is exchange number {i}."})
    # Count what the memory hands to the prompt (the windowed view)
    windowed = memory.load_memory_variables({})["history"]
    print(f"Turn {i}: window holds {len(windowed) // 2} exchange(s)")
# Turn 0: window holds 1 exchange(s)
# Turn 5: window holds 3 exchange(s)  (capped at k=3)
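Conceptually, the window is a slice over the full message list. The plain-Python sketch below shows which messages survive for k=3 after six exchanges. It is an illustration rather than LangChain's code; `window` is a hypothetical helper and the AI replies are made up.

```python
def window(messages: list[str], k: int) -> list[str]:
    # One exchange is two messages (human + AI), so keep the last 2*k entries.
    return messages[-2 * k:] if k > 0 else []


messages = []
for i in range(6):
    messages.append(f"Human: This is exchange number {i}.")
    messages.append(f"AI: Noted exchange {i}.")

kept = window(messages, k=3)
print(len(kept) // 2)  # 3
print(kept[0])         # Human: This is exchange number 3.
```

Exchanges 0 through 2 fall outside the slice, so the model never sees them again even though they happened earlier in the session.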

When manually adding messages to memory outside the chain (if you're calling llm.invoke() directly rather than through the chain), use memory.save_context():

memory.save_context(
    {"input": "User message here"},
    {"output": "Assistant response here"}
)

This is necessary because ConversationChain calls save_context internally after every turn, whereas a direct llm.invoke() call bypasses the chain entirely, so nothing is recorded unless you save the exchange yourself.
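A manual turn therefore has three steps: read the history, call the model, record the exchange. The sketch below shows that loop with stand-ins so it runs without a model server. `StubMemory`, `chat_turn`, and `fake_llm` are hypothetical; `StubMemory` only mimics the load_memory_variables and save_context methods used in this chapter and assumes a string history.

```python
from typing import Callable


class StubMemory:
    """Hypothetical stand-in exposing the two memory methods used in this chapter."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def load_memory_variables(self, inputs: dict) -> dict:
        return {"history": "\n".join(self.lines)}

    def save_context(self, inputs: dict, outputs: dict) -> None:
        self.lines.append(f"Human: {inputs['input']}")
        self.lines.append(f"AI: {outputs['output']}")


def chat_turn(generate: Callable[[str], str], memory, user_input: str) -> str:
    # 1. Read the history, 2. call the model, 3. record the exchange yourself.
    history = memory.load_memory_variables({})["history"]
    prompt = f"{history}\nHuman: {user_input}\nAI:".lstrip()
    reply = generate(prompt)
    memory.save_context({"input": user_input}, {"output": reply})
    return reply


def fake_llm(prompt: str) -> str:
    # Stand-in for a model call; reports how much context it received.
    return f"(reply to a {len(prompt.splitlines())}-line prompt)"


memory = StubMemory()
chat_turn(fake_llm, memory, "My name is Alex.")
chat_turn(fake_llm, memory, "What is my name?")
print(memory.load_memory_variables({})["history"])
```

With the real objects, `generate` could be something like `lambda p: llm.invoke(p).content`, and `memory` a ConversationBufferMemory created with return_messages=False so that the history is a string. If step 3 is skipped, the second turn receives an empty history.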

EXERCISE

Run a five-turn conversation with ConversationBufferMemory. Print the buffer after each turn. Then switch to ConversationBufferWindowMemory with k=2 and observe which messages are retained after five turns. Document which messages dropped.
