07. Conversation Memory
Conversation memory means sending the full message history with every request, because the model keeps no state between requests and only sees what is in the current prompt. Store messages in a JavaScript array:
let conversationHistory = [];

// Append one message; role is "user" or "assistant".
function addMessage(role, content) {
  conversationHistory.push({ role, content });
}
When the user sends a message, add it to the history, send it to the backend, then append the assistant's response:
addMessage("user", userText);
// Assumes sendMessage resolves to the assistant's reply text.
const assistantText = await sendMessage(userText);
addMessage("assistant", assistantText);
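The turn above can be sketched so that the whole history, not just the latest message, travels with each request. This is a minimal sketch under stated assumptions: buildChatRequest and runTurn are hypothetical helper names, the send argument stands in for whatever function actually posts to your backend, and the request shape (model, messages, stream) is assumed to mirror the chat payload the backend forwards to Ollama.

```javascript
// Hypothetical helper: builds the body for one chat request.
// The full history is copied into `messages` so the model sees prior turns.
function buildChatRequest(history, model) {
  return {
    model,
    messages: history.map(({ role, content }) => ({ role, content })),
    stream: false,
  };
}

// Hypothetical helper: runs one user turn against an injected `send` function.
// `send` is assumed to take the request body and resolve to the reply text.
async function runTurn(history, userText, send) {
  history.push({ role: "user", content: userText });
  try {
    const assistantText = await send(buildChatRequest(history, "llama3"));
    history.push({ role: "assistant", content: assistantText });
    return assistantText;
  } catch (err) {
    // Roll back the user message so a failed request leaves history unchanged.
    history.pop();
    throw err;
  }
}
```

Passing send in as a parameter keeps the history logic separate from the network call, which makes it easy to exercise with a fake function before the backend exists.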
The frontend array lives only in page memory, so refreshing the page clears it. Chapter 8 moves history to the backend, stored server-side per session, so that it survives a refresh. For now, the frontend just maintains a local array.
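A failure mode: without a message limit, the history eventually outgrows the model's context window. Depending on the Ollama version and settings, an over-long prompt is either rejected with an error or silently truncated; in both cases the model loses part of the conversation. For llama3 8B, the context limit is 8192 tokens, roughly 6000 words of conversation history (the context size Ollama is configured with, the num_ctx option, may be lower than this). Implement a simple truncation: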
const MAX_MESSAGES = 20; // ~10 turns, manageable for most models

// Keep only the most recent MAX_MESSAGES entries.
if (conversationHistory.length > MAX_MESSAGES) {
  conversationHistory = conversationHistory.slice(-MAX_MESSAGES);
}
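A fixed message count ignores how long each message is, so twenty long messages can still exceed the context window. As an alternative, here is a rough sketch that trims by an estimated token budget instead. The names estimateTokens and trimToTokenBudget are hypothetical, and the estimate simply reuses the ratio implied above (8192 tokens for about 6000 words, or roughly 0.75 words per token); it is a heuristic, not the model's real tokenizer.

```javascript
// Rough heuristic: about 0.75 words per token (an assumption, not a tokenizer).
function estimateTokens(text) {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return Math.ceil(words / 0.75);
}

// Walk backwards from the newest message, keeping messages while the
// estimated total stays within maxTokens. The newest message is always
// kept, even if it alone exceeds the budget.
function trimToTokenBudget(history, maxTokens) {
  const kept = [];
  let total = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i].content);
    if (kept.length > 0 && total + cost > maxTokens) break;
    total += cost;
    kept.unshift(history[i]);
  }
  return kept;
}
```

A call such as conversationHistory = trimToTokenBudget(conversationHistory, 6000) would leave headroom for the model's reply; the budget of 6000 is only an illustrative value and should be tuned to the context size you actually configure.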
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Add a message counter to the UI showing Messages: N / 20. When it hits 20, show a warning that older messages will be dropped on the next send.
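One possible starting point for this exercise is to keep the counter logic in a small pure function and leave the DOM wiring to the page. The function name counterState and the element id in the usage comment are hypothetical; adapt them to the markup from earlier chapters.

```javascript
// Hypothetical helper: derives the counter label and warning flag
// from the current history length and the message limit.
function counterState(count, max) {
  return {
    label: `Messages: ${count} / ${max}`,
    warn: count >= max, // true once the next send will drop older messages
  };
}

// Example wiring (assumes an element with id "msg-counter" exists):
// const { label, warn } = counterState(conversationHistory.length, MAX_MESSAGES);
// const el = document.getElementById("msg-counter");
// el.textContent = warn ? `${label} (older messages will be dropped)` : label;
```

Keeping the calculation separate from the DOM update means the label and warning can be checked without a browser.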