08. Semantic Memory

Chapter 8 of 24 · 15 min

Semantic memory stores facts, knowledge, and learned patterns—the "what the agent knows" rather than "what the agent did." It's the longest-lived memory tier, designed for retrieval when the agent needs domain knowledge or learned procedures.

Design: knowledge triples and embeddings

@dataclass
class KnowledgeEntry:
    id: str
    content: str
    embedding: list[float]
    metadata: dict[str, Any]  # source, timestamp, confidence, category
    tags: list[str]

class SemanticMemory:
    def __init__(self, storage: KnowledgeStorage, embedder: Embedder):
        self.storage = storage
        self.embedder = embedder
    
    async def add(
        self, 
        content: str, 
        metadata: dict[str, Any] = None,
        tags: list[str] = None
    ) -> KnowledgeEntry:
        embedding = await self.embedder.embed(content)
        entry = KnowledgeEntry(
            id=str(uuid.uuid4()),
            content=content,
            embedding=embedding,
            metadata=metadata or {},
            tags=tags or []
        )
        await self.storage.save(entry)
        return entry
    
    async def retrieve(self, query: str, limit: int = 5) -> list[KnowledgeEntry]:
        query_embedding = await self.embedder.embed(query)
        return await self.storage.search(query_embedding, limit)
    
    async def retrieve_by_tags(
        self, 
        tags: list[str], 
        limit: int = 10
    ) -> list[KnowledgeEntry]:
        return await self.storage.search_by_tags(tags, limit)

Embedding abstraction:

from abc import ABC, abstractmethod

class Embedder(ABC):
    @abstractmethod
    async def embed(self, text: str) -> list[float]:
        pass

class OpenAIEmbedder(Embedder):
    def __init__(self, client: OpenAIClient, model: str = "text-embedding-3-small"):
        self.client = client
        self.model = model
    
    async def embed(self, text: str) -> list[float]:
        response = await self.client.embeddings.create(
            model=self.model,
            input=text
        )
        return response.data[0].embedding

Failure mode: embedding drift. If you switch embedding models, old embeddings are incompatible with new queries. Either version your embeddings or implement a re-embedding pipeline.

Failure mode: semantic drift. Over time, similar concepts get stored as separate entries with slightly different phrasings. Implement periodic deduplication based on cosine similarity.

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.

EXERCISE

Implement a semantic memory store for your agent's domain. Add 20 knowledge entries. Test retrieval with three queries and evaluate whether the most relevant entries come back first.