10. Row-Level Security
Row-level security (RLS) goes deeper than document-level access. Within a single document, different users see different content. A manager sees all rows in a performance report; an individual contributor sees only their own rows. A regional manager sees only rows for their region.
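RLS in vector databases works best when permission boundaries are built into the index itself. Post-filtering a single shared index retrieves the nearest vectors first and only then discards the ones the user may not see, which can leave few or no results and spends work on forbidden rows. Instead, create separate indexes per security boundary, or store permission attributes in each vector's metadata so the store can pre-filter before the similarity search.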
# Row-level security metadata embedding
# VectorChunk, embedding_model, generate_chunk_id, and calculate_visibility
# are assumed to be defined elsewhere.
def chunk_with_rls(
    content: str,
    row_identifiers: dict,
    permission_dimensions: list[str]
) -> VectorChunk:
    """
    Row identifiers: {organization_id, employee_ids, departments, regions}
        (employee_ids, departments, and regions are lists)
    Permission dimensions: dimensions used for access control
    """
    # Embed permission context in metadata for fast filtering
    metadata = {
        "content": content,
        # Required: every rewritten query filters on organization_id
        "organization_id": row_identifiers["organization_id"],
        "employee_ids": row_identifiers.get("employee_ids", []),
        "departments": row_identifiers.get("departments", []),
        "regions": row_identifiers.get("regions", []),
        "visibility_level": calculate_visibility(row_identifiers)
    }
    return VectorChunk(
        id=generate_chunk_id(),
        vector=embedding_model.embed(content),
        metadata=metadata,
        searchable_dimensions=permission_dimensions
    )
Query rewriting translates user context into filter predicates. When a user queries "performance metrics," the system automatically adds filters derived from their organization and role, such as their employee ID, department, or region. This is transparent to the user: they see only the results they are permitted to see.
# Query rewriting for RLS
# User and Filter are assumed to be defined elsewhere.
def rewrite_query_with_rls(query: str, user: User) -> tuple[str, Filter]:
    filter_predicates = []
    # Always filter by organization
    filter_predicates.append(Filter(
        field="organization_id",
        operator="equals",
        value=user.organization_id
    ))
    # Add role-based filters
    if user.role == "employee":
        filter_predicates.append(Filter(
            field="employee_ids",
            operator="contains",
            value=user.employee_id
        ))
    elif user.role == "department_head":
        filter_predicates.append(Filter(
            field="departments",
            operator="contains",
            value=user.department
        ))
    elif user.role == "executive":
        # No additional filters: executives see everything in their organization
        pass
    else:
        # Fail closed: an unrecognized role must not fall through to
        # organization-wide access
        raise PermissionError(f"Unknown role: {user.role!r}")
    return query, Filter.and_all(filter_predicates)
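The predicates produced by query rewriting only protect data if they are evaluated before ranking. The following is a minimal sketch of that evaluation, not the API of any particular vector store. The Filter dataclass, matches_all, and prefilter are hypothetical names, the real Filter class is not shown in this section so its shape here is an assumption, and chunks are represented as plain dicts for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Filter:
    """Hypothetical predicate type, assumed for illustration only."""
    field: str
    operator: str  # "equals" or "contains"
    value: Any

    def matches(self, metadata: dict) -> bool:
        actual = metadata.get(self.field)
        if actual is None:
            # A missing field never grants access (fail closed).
            return False
        if self.operator == "equals":
            return actual == self.value
        if self.operator == "contains":
            # Membership in a collection of exact values, not substring matching.
            return isinstance(actual, (list, tuple, set, frozenset)) and self.value in actual
        raise ValueError(f"unsupported operator: {self.operator}")


def matches_all(filters: list[Filter], metadata: dict) -> bool:
    """AND-combine predicates; an empty predicate list matches nothing."""
    return bool(filters) and all(f.matches(metadata) for f in filters)


def prefilter(chunks: list[dict], filters: list[Filter]) -> list[dict]:
    """Keep only chunks the caller may see, before any similarity ranking."""
    return [c for c in chunks if matches_all(filters, c["metadata"])]


# Made-up sample data: two organizations, two employees.
chunks = [
    {"id": "a", "metadata": {"organization_id": "org1", "employee_ids": ["e1"], "departments": ["sales"]}},
    {"id": "b", "metadata": {"organization_id": "org1", "employee_ids": ["e2"], "departments": ["sales"]}},
    {"id": "c", "metadata": {"organization_id": "org2", "employee_ids": ["e1"], "departments": ["sales"]}},
]
employee_filters = [
    Filter("organization_id", "equals", "org1"),
    Filter("employee_ids", "contains", "e1"),
]
visible = [c["id"] for c in prefilter(chunks, employee_filters)]
```

Two choices in this sketch are deliberate: a chunk with a missing permission field is hidden, and an empty filter list matches nothing. Both make mistakes in filter construction show up as missing results instead of leaked rows.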
Performance at scale is the hard problem. A table with 1 million rows becomes 1 million vectors with overlapping permission metadata. A query from a global executive must scan vectors tagged with hundreds of regions. A query from a field employee must narrow to their exact subset.
Index design patterns for RLS include: hierarchical indexes (region → department → team → employee), bitmask permissions (each row stores a permission bitmap checked against user groups), and inverted permission indexes (maintain a reverse index from user to visible rows).
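Two of these patterns, bitmask permissions and the inverted permission index, can be sketched together. This is an illustration under stated assumptions: the group names, the GROUP_BITS assignment, and the "visible if the user shares at least one group with the row" rule are all hypothetical, and a real system would persist the bit assignment and choose its own matching rule.

```python
# Hypothetical group-to-bit assignment; a real system would persist this mapping.
GROUP_BITS = {
    "region:emea": 1 << 0,
    "region:apac": 1 << 1,
    "dept:sales": 1 << 2,
    "dept:hr": 1 << 3,
}


def mask_for(groups: list[str]) -> int:
    """OR together the bits of the named groups; unknown groups raise KeyError."""
    mask = 0
    for group in groups:
        mask |= GROUP_BITS[group]
    return mask


def can_see(row_mask: int, user_mask: int) -> bool:
    """Assumed rule: a row is visible if the user is in at least one of its groups."""
    return (row_mask & user_mask) != 0


def build_inverted_index(
    row_masks: dict[str, int], user_masks: dict[str, int]
) -> dict[str, set[str]]:
    """Precompute user -> visible row IDs so queries skip the per-row check."""
    return {
        user: {row for row, row_mask in row_masks.items() if can_see(row_mask, user_mask)}
        for user, user_mask in user_masks.items()
    }


row_masks = {
    "row1": mask_for(["region:emea", "dept:sales"]),
    "row2": mask_for(["region:apac", "dept:hr"]),
}
user_masks = {
    "alice": mask_for(["dept:sales"]),
    "bob": mask_for(["region:apac"]),
}
visible_rows = build_inverted_index(row_masks, user_masks)
```

The bitmask check costs a single AND per row, which suits filtering inside a scan. The inverted index trades that per-query work for storage and must be rebuilt or updated whenever group membership or row tags change. If access should require every group on the row, the check would instead be `(row_mask & user_mask) == row_mask`.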
The failure mode most likely to cause a security incident is an incorrect permission boundary, where two employees' visible data overlaps. For example, an employee sees another employee's sensitive information because their department codes differ by one character and the filter uses a loose string match, such as a substring or prefix test, where it should require exact equality.
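The department-code failure can be reproduced in a few lines. In this sketch the codes "ENG1" and "ENG10" and both function names are hypothetical, chosen only to show how a substring test differs from exact membership.

```python
def buggy_department_match(row_department: str, user_department: str) -> bool:
    # BUG: substring test, so user "ENG1" also matches row "ENG10".
    return user_department in row_department


def exact_department_match(row_departments: list[str], user_department: str) -> bool:
    # Compare whole codes against a collection of exact values.
    return user_department in set(row_departments)


leaked = buggy_department_match("ENG10", "ENG1")             # True: the boundary overlaps
blocked = exact_department_match(["ENG10"], "ENG1")          # False: whole codes differ
allowed = exact_department_match(["ENG1", "ENG10"], "ENG1")  # True: exact code present
```

The same care applies to whatever operator the vector store exposes: a "contains" filter over a list of values is safe only if it means list membership, not substring matching on a concatenated string.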
Design an RLS strategy for a healthcare RAG system where patients see their own records, doctors see records for patients they are assigned to, and compliance officers can see de-identified summaries only. Consider how to handle a query about "all patients with diabetes" across these permission boundaries.