08. Semantic Memory
Semantic memory stores facts, knowledge, and learned patterns: "what the agent knows" rather than "what the agent did." It is the longest-lived memory tier, designed for retrieval when the agent needs domain knowledge or learned procedures.
Design: knowledge triples and embeddings
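# Deferred annotation evaluation lets the type hints name Embedder and
# KnowledgeStorage before those classes are defined.
from __future__ import annotations

import uuid
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class KnowledgeEntry:
    id: str
    content: str
    embedding: list[float]
    metadata: dict[str, Any]  # source, timestamp, confidence, category
    tags: list[str]


class SemanticMemory:
    # KnowledgeStorage is the storage interface (save, search, search_by_tags);
    # it is not defined in this section.
    def __init__(self, storage: KnowledgeStorage, embedder: Embedder):
        self.storage = storage
        self.embedder = embedder

    async def add(
        self,
        content: str,
        metadata: Optional[dict[str, Any]] = None,
        tags: Optional[list[str]] = None,
    ) -> KnowledgeEntry:
        # Embed before building the entry, so a failed embedding call saves nothing.
        embedding = await self.embedder.embed(content)
        entry = KnowledgeEntry(
            id=str(uuid.uuid4()),
            content=content,
            embedding=embedding,
            metadata=metadata or {},
            tags=tags or [],
        )
        await self.storage.save(entry)
        return entry

    async def retrieve(self, query: str, limit: int = 5) -> list[KnowledgeEntry]:
        # The query goes through the same embedder as the stored entries.
        query_embedding = await self.embedder.embed(query)
        return await self.storage.search(query_embedding, limit)

    async def retrieve_by_tags(
        self,
        tags: list[str],
        limit: int = 10,
    ) -> list[KnowledgeEntry]:
        return await self.storage.search_by_tags(tags, limit)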
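SemanticMemory depends on a KnowledgeStorage that this section does not define. One way to satisfy it for local experiments is a brute-force in-memory store, sketched below. The class name InMemoryKnowledgeStorage, the cosine_similarity helper, and the "match any tag" behaviour are assumptions made for illustration, not part of the original design. KnowledgeEntry is repeated here (with defaults added) so the block runs alone.

```python
import asyncio
import math
from dataclasses import dataclass, field
from typing import Any


@dataclass
class KnowledgeEntry:
    # Same fields as the chapter's entry, repeated so this sketch is self-contained.
    id: str
    content: str
    embedding: list[float]
    metadata: dict[str, Any] = field(default_factory=dict)
    tags: list[str] = field(default_factory=list)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryKnowledgeStorage:
    """Hypothetical backend: keeps entries in a dict and scans all of them."""

    def __init__(self) -> None:
        self._entries: dict[str, KnowledgeEntry] = {}

    async def save(self, entry: KnowledgeEntry) -> None:
        self._entries[entry.id] = entry

    async def search(self, query_embedding: list[float], limit: int) -> list[KnowledgeEntry]:
        # Rank every stored entry by similarity to the query, best first.
        ranked = sorted(
            self._entries.values(),
            key=lambda e: cosine_similarity(query_embedding, e.embedding),
            reverse=True,
        )
        return ranked[:limit]

    async def search_by_tags(self, tags: list[str], limit: int) -> list[KnowledgeEntry]:
        # Assumption: an entry matches if it shares at least one tag with the request.
        wanted = set(tags)
        matches = [e for e in self._entries.values() if wanted & set(e.tags)]
        return matches[:limit]


async def demo() -> list[str]:
    storage = InMemoryKnowledgeStorage()
    await storage.save(KnowledgeEntry("a", "cats purr", [1.0, 0.0], tags=["animals"]))
    await storage.save(KnowledgeEntry("b", "stocks fall", [0.0, 1.0], tags=["finance"]))
    hits = await storage.search([0.9, 0.1], limit=1)
    return [h.id for h in hits]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A linear scan like this is only reasonable for small stores; a larger store would typically swap in a vector index behind the same three methods.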
Embedding abstraction:
from abc import ABC, abstractmethod

from openai import AsyncOpenAI  # async client, so embeddings.create() can be awaited


class Embedder(ABC):
    @abstractmethod
    async def embed(self, text: str) -> list[float]:
        """Return the embedding vector for text."""


class OpenAIEmbedder(Embedder):
    def __init__(self, client: AsyncOpenAI, model: str = "text-embedding-3-small"):
        self.client = client
        self.model = model

    async def embed(self, text: str) -> list[float]:
        response = await self.client.embeddings.create(
            model=self.model,
            input=text,
        )
        # One input string yields one item in response.data.
        return response.data[0].embedding
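Because callers only see the Embedder interface, a deterministic offline implementation can stand in for the API-backed one during tests. The sketch below is one such stand-in. HashingEmbedder and its dimensions parameter are hypothetical names. It produces a hashed bag-of-words vector, which captures word overlap only and is not a substitute for a learned embedding model.

```python
import asyncio
import hashlib
import math
from abc import ABC, abstractmethod


class Embedder(ABC):
    # Repeated from the chapter so this block runs on its own.
    @abstractmethod
    async def embed(self, text: str) -> list[float]:
        """Return the embedding vector for text."""


class HashingEmbedder(Embedder):
    """Hypothetical offline embedder: hashed bag-of-words, L2-normalised."""

    def __init__(self, dimensions: int = 64):
        self.dimensions = dimensions

    async def embed(self, text: str) -> list[float]:
        vector = [0.0] * self.dimensions
        for token in text.lower().split():
            # A stable hash keeps the output identical across runs and machines.
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "big") % self.dimensions
            vector[bucket] += 1.0
        norm = math.sqrt(sum(v * v for v in vector))
        # Empty text has no tokens, so the zero vector is returned unchanged.
        return [v / norm for v in vector] if norm else vector


if __name__ == "__main__":
    vec = asyncio.run(HashingEmbedder().embed("semantic memory stores facts"))
    print(len(vec))
```

Tests built on a stand-in like this check plumbing (save, search, ranking order), not retrieval quality, which still has to be measured with the real model.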
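Failure mode: embedding drift. If you switch embedding models, the stored vectors and the new query vectors come from different vector spaces, often with different dimensions, so similarity scores between them are meaningless. Either version your embeddings by recording the model that produced each one, or implement a re-embedding pipeline that regenerates every stored vector when the model changes.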
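The versioning option can be sketched by recording the model name in each entry's metadata and re-embedding whatever does not match the current model. Everything named here is an assumption made for illustration: the "embedding_model" metadata key, the reembed_stale function, and FakeEmbedder, which returns placeholder vectors rather than real embeddings.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass
class KnowledgeEntry:
    # Reduced copy of the chapter's entry so the sketch is self-contained.
    id: str
    content: str
    embedding: list[float]
    metadata: dict[str, Any] = field(default_factory=dict)


class FakeEmbedder:
    """Stand-in embedder; `model` names the vector space it produces."""

    def __init__(self, model: str, dimensions: int):
        self.model = model
        self.dimensions = dimensions

    async def embed(self, text: str) -> list[float]:
        # Placeholder values only: a vector of the right length, not a real embedding.
        return [float(len(text))] * self.dimensions


async def reembed_stale(entries: list[KnowledgeEntry], embedder: FakeEmbedder) -> int:
    """Re-embed each entry whose recorded model differs from the current one."""
    updated = 0
    for entry in entries:
        if entry.metadata.get("embedding_model") != embedder.model:
            entry.embedding = await embedder.embed(entry.content)
            entry.metadata["embedding_model"] = embedder.model
            updated += 1
    return updated


async def demo() -> tuple[int, list[int]]:
    entries = [
        KnowledgeEntry("a", "hello", [5.0, 5.0], {"embedding_model": "model-v1"}),
        KnowledgeEntry("b", "hi", [2.0, 2.0, 2.0], {"embedding_model": "model-v2"}),
    ]
    new_embedder = FakeEmbedder(model="model-v2", dimensions=3)
    updated = await reembed_stale(entries, new_embedder)
    return updated, [len(e.embedding) for e in entries]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Until a migration like this finishes, searches should probably be restricted to entries whose recorded model matches the query embedder, so that old and new vectors are never compared.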
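Failure mode: semantic drift. Over time, the same concept gets stored as several entries with slightly different phrasing, and these near-duplicates can crowd the top retrieval results. Implement periodic deduplication that merges or drops entries whose embeddings exceed a cosine-similarity threshold.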
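A minimal sketch of cosine-similarity deduplication follows. It is a greedy pass that keeps the first entry seen and drops later ones that are too close to anything already kept. The deduplicate function and the 0.95 threshold are assumptions chosen for illustration; a suitable threshold depends on the embedding model and should be tuned on real data.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(
    entries: list[tuple[str, list[float]]],
    threshold: float = 0.95,  # assumed value, not taken from the chapter
) -> list[str]:
    """Return the ids to keep, given (id, embedding) pairs in priority order."""
    kept: list[tuple[str, list[float]]] = []
    for entry_id, embedding in entries:
        # Keep the entry only if it is not a near-duplicate of anything kept so far.
        if all(cosine_similarity(embedding, kept_emb) < threshold for _, kept_emb in kept):
            kept.append((entry_id, embedding))
    return [entry_id for entry_id, _ in kept]


if __name__ == "__main__":
    sample = [
        ("a", [1.0, 0.0]),
        ("b", [0.99, 0.05]),  # nearly parallel to "a"
        ("c", [0.0, 1.0]),
    ]
    print(deduplicate(sample))
```

The pass compares every entry against every kept entry, so cost grows quadratically with store size. Input order decides which duplicate survives, so sorting by confidence or recency first is one way to control that.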
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Implement a semantic memory store for your agent's domain. Add 20 knowledge entries. Test retrieval with three queries and evaluate whether the most relevant entries come back first.
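To judge whether the most relevant entries come back first, it helps to score the three queries the same way each time. The sketch below computes top-1 accuracy and mean reciprocal rank from ranked result ids. The function names, the query strings, and the entry ids are all hypothetical, and the sketch assumes you have labelled one expected entry per query by hand.

```python
def reciprocal_rank(ranked_ids: list[str], relevant_id: str) -> float:
    """1/position of the relevant entry in the ranking, or 0.0 if it is absent."""
    for position, entry_id in enumerate(ranked_ids, start=1):
        if entry_id == relevant_id:
            return 1.0 / position
    return 0.0


def evaluate(results: dict[str, list[str]], expected: dict[str, str]) -> dict[str, float]:
    """Score ranked results per query against one hand-labelled relevant id each."""
    ranks = [reciprocal_rank(results[query], expected[query]) for query in expected]
    return {
        "top1_accuracy": sum(1 for r in ranks if r == 1.0) / len(ranks),
        "mrr": sum(ranks) / len(ranks),
    }


if __name__ == "__main__":
    # Hypothetical ids, as if returned by retrieve() for three test queries.
    results = {
        "refund policy": ["e1", "e2"],
        "api rate limits": ["e5", "e3"],
        "oncall rotation": ["e9"],
    }
    expected = {
        "refund policy": "e1",
        "api rate limits": "e3",
        "oncall rotation": "e7",
    }
    print(evaluate(results, expected))
```

Three queries give only a coarse signal, so recording these scores alongside the package version and model name makes later runs comparable.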