Row-Level Security — Enterprise-Scale RAG (Chapter 10)

Row-level security (RLS) goes deeper than document-level access. Within a single document, different users see different content. A manager sees all rows in a performance report; an individual contributor sees only their own rows. A regional manager sees only rows for their region.

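RLS in a vector database means permission boundaries must be part of the retrieval path itself, not a check applied afterward. There are two common designs. You can create separate indexes per security boundary, or you can store permission attributes in each vector's metadata so the database can pre-filter before similarity search. Post-filtering a single shared index is the weaker option. If unauthorized chunks crowd the top-k results, removing them afterward can leave the user with few or no results, even when relevant authorized chunks exist.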
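To illustrate why pre-filtering is preferred, here is a toy sketch that compares both orderings on four chunks. It assumes brute-force dot-product ranking in place of a real vector index. The function names and sample data are hypothetical and do not represent any vector database's API.

```python
from typing import Callable


def score(a: list[float], b: list[float]) -> float:
    # Dot product stands in for the index's similarity metric.
    return sum(x * y for x, y in zip(a, b))


def post_filter_search(query, chunks, allowed: Callable[[dict], bool], k: int) -> list[str]:
    # Rank everything, keep the global top-k, THEN drop unauthorized chunks.
    ranked = sorted(chunks, key=lambda c: score(query, c["vector"]), reverse=True)
    return [c["id"] for c in ranked[:k] if allowed(c)]


def pre_filter_search(query, chunks, allowed: Callable[[dict], bool], k: int) -> list[str]:
    # Drop unauthorized chunks first, then rank only what the user may see.
    visible = [c for c in chunks if allowed(c)]
    ranked = sorted(visible, key=lambda c: score(query, c["vector"]), reverse=True)
    return [c["id"] for c in ranked[:k]]


chunks = [
    {"id": "alice-q1", "vector": [0.9, 0.1], "employee_ids": ["alice"]},
    {"id": "alice-q2", "vector": [0.8, 0.2], "employee_ids": ["alice"]},
    {"id": "bob-q1", "vector": [0.3, 0.7], "employee_ids": ["bob"]},
    {"id": "bob-q2", "vector": [0.2, 0.8], "employee_ids": ["bob"]},
]
query = [1.0, 0.0]


def bob_can_see(chunk: dict) -> bool:
    return "bob" in chunk["employee_ids"]


print(post_filter_search(query, chunks, bob_can_see, k=2))  # []
print(pre_filter_search(query, chunks, bob_can_see, k=2))   # ['bob-q1', 'bob-q2']
```

Alice's chunks are the closest matches, so they take both top-2 slots. Post-filtering then leaves Bob with nothing, while pre-filtering returns his own two chunks.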

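# Row-level security metadata embedding
import uuid
from dataclasses import dataclass, field


@dataclass
class VectorChunk:
    id: str
    vector: list[float]
    metadata: dict
    searchable_dimensions: list[str] = field(default_factory=list)


def chunk_with_rls(
    content: str,
    row_identifiers: dict,
    permission_dimensions: list[str],
) -> VectorChunk:
    """
    row_identifiers: access-control attributes of the rows in this chunk, e.g.
        {"organization_id": ..., "employee_ids": [...],
         "departments": [...], "regions": [...]}
    permission_dimensions: metadata fields the store should index for
        pre-filtering (e.g. ["employee_ids", "departments"])
    """
    # Store permission context in metadata so the store can pre-filter.
    # organization_id is required because every query filters on it;
    # missing list fields default to [] so the chunk fails closed.
    metadata = {
        "content": content,
        "organization_id": row_identifiers["organization_id"],
        "employee_ids": row_identifiers.get("employee_ids", []),
        "departments": row_identifiers.get("departments", []),
        "regions": row_identifiers.get("regions", []),
        # calculate_visibility is assumed to be defined elsewhere
        "visibility_level": calculate_visibility(row_identifiers),
    }

    return VectorChunk(
        id=uuid.uuid4().hex,  # unique chunk id
        vector=embedding_model.embed(content),  # embedding_model assumed configured elsewhere
        metadata=metadata,
        searchable_dimensions=permission_dimensions,
    )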

Query rewriting translates user context into filter predicates. When a user asks about "performance metrics," the query text stays the same, but the system attaches filters derived from the user's organization and role, such as their employee ID, department, or region. The rewrite is transparent to the user. They simply see only the results they are authorized to see.

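# Query rewriting for RLS
# Filter and User are assumed to come from the vector-store client and the
# auth layer; Filter.and_all is assumed to combine predicates with AND.
def rewrite_query_with_rls(query: str, user: User) -> tuple[str, Filter]:
    filter_predicates = []

    # Always filter by organization
    filter_predicates.append(Filter(
        field="organization_id",
        operator="equals",
        value=user.organization_id
    ))

    # Add role-based filters
    if user.role == "employee":
        filter_predicates.append(Filter(
            field="employee_ids",
            operator="contains",
            value=user.employee_id
        ))
    elif user.role == "department_head":
        filter_predicates.append(Filter(
            field="departments",
            operator="contains",
            value=user.department
        ))
    elif user.role == "regional_manager":  # illustrative role name
        filter_predicates.append(Filter(
            field="regions",
            operator="contains",
            value=user.region
        ))
    elif user.role == "executive":
        # No additional filters: executives see everything in their organization
        pass
    else:
        # Fail closed: an unknown role must not fall through to org-wide access
        raise PermissionError(f"no RLS policy for role {user.role!r}")

    # The query text is unchanged; access control lives entirely in the filter
    return query, Filter.and_all(filter_predicates)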
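How these predicates are evaluated depends on the vector store. The following is a minimal in-memory sketch. It assumes `Filter.and_all` means logical AND and that `contains` means exact membership in a list-valued metadata field. `Predicate`, `matches`, and `matches_all` are hypothetical stand-ins, not a real client API.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Predicate:
    # Hypothetical stand-in for the Filter type used in the rewrite step.
    field: str
    operator: str  # "equals" or "contains"
    value: Any


def matches(metadata: dict, predicate: Predicate) -> bool:
    actual = metadata.get(predicate.field)
    if actual is None:
        # Missing permission metadata fails closed.
        return False
    if predicate.operator == "equals":
        return actual == predicate.value
    if predicate.operator == "contains":
        # List membership with exact element equality, never substring search.
        return isinstance(actual, (list, tuple, set)) and predicate.value in actual
    raise ValueError(f"unsupported operator: {predicate.operator}")


def matches_all(metadata: dict, predicates: list[Predicate]) -> bool:
    # AND semantics. An empty list matches everything, which is why the
    # organization predicate must always be present.
    return all(matches(metadata, p) for p in predicates)


chunk_metadata = {
    "organization_id": "acme",
    "employee_ids": ["e-1001", "e-1002"],
    "departments": ["sales"],
}
employee_filter = [
    Predicate("organization_id", "equals", "acme"),
    Predicate("employee_ids", "contains", "e-1001"),
]
print(matches_all(chunk_metadata, employee_filter))  # True
print(matches_all(chunk_metadata, [Predicate("organization_id", "equals", "globex")]))  # False
```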

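Performance at scale is the hard problem. If each row becomes its own chunk, a table with 1 million rows becomes 1 million vectors with overlapping permission metadata. A query from a global executive may be authorized for vectors tagged with hundreds of regions, which is nearly the whole index. A query from a field employee, by contrast, must narrow to their exact subset, often only a handful of vectors. Broad filters are cheap to satisfy during approximate nearest-neighbor search, but highly selective ones are not. A graph-based index may traverse many non-matching neighbors before it finds enough permitted vectors, so some vector stores switch to an exact scan of the small permitted subset instead.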
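One way to handle both extremes is to choose the search strategy based on how many vectors the user's filter lets through. The sketch below is hypothetical and assumes the filter's cardinality can be estimated before searching. The `exact_scan_limit` threshold and the example counts are illustrative values, not product settings or measurements.

```python
def selectivity(matching_count: int, total_count: int) -> float:
    """Fraction of the index that a user's permission filter lets through."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return matching_count / total_count


def choose_strategy(matching_count: int, total_count: int,
                    exact_scan_limit: int = 10_000) -> str:
    """Pick a search strategy from estimated filter cardinality.

    exact_scan_limit is a hypothetical tuning knob.
    """
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    if matching_count <= exact_scan_limit:
        # Few permitted vectors: scoring all of them exactly is cheap and avoids
        # walking an ANN graph whose neighbors are mostly filtered out.
        return "exact_scan"
    # Broad filter: most graph neighbors pass, so filtered ANN stays efficient.
    return "filtered_ann"


TOTAL = 1_000_000
field_employee = 40          # illustrative: vectors tagged with one employee's ID
global_executive = 950_000   # illustrative: vectors across hundreds of regions

print(choose_strategy(field_employee, TOTAL))    # exact_scan
print(choose_strategy(global_executive, TOTAL))  # filtered_ann
```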

Index design patterns for RLS include: hierarchical indexes (region → department → team → employee), bitmask permissions (each row stores a permission bitmap checked against user groups), and inverted permission indexes (maintain a reverse index from user to visible rows).
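The bitmask and inverted-index patterns can be combined. The following minimal sketch stores a group bitmap on each row and then precomputes a user-to-visible-rows index from those bitmaps. The group names, users, and row IDs are hypothetical, and hierarchical indexes are not shown.

```python
# Hypothetical registry: each access group owns one bit position.
GROUP_BITS = {"sales-emea": 0, "sales-apac": 1, "hr": 2, "exec": 3}


def to_mask(groups: set[str]) -> int:
    """Pack group names into an integer bitmap (unknown groups raise KeyError)."""
    mask = 0
    for group in groups:
        mask |= 1 << GROUP_BITS[group]
    return mask


def can_read(row_mask: int, user_mask: int) -> bool:
    """A row is visible if the user shares at least one group with it."""
    return (row_mask & user_mask) != 0


# Bitmask permissions: each row stores the bitmap of groups allowed to read it.
row_masks = {
    "row-1": to_mask({"sales-emea", "exec"}),
    "row-2": to_mask({"sales-apac", "exec"}),
    "row-3": to_mask({"hr"}),
}

# Hypothetical user -> group memberships from the identity provider.
user_groups = {"ana": {"sales-emea"}, "raj": {"sales-apac"}, "cho": {"exec"}}


def build_user_index(masks: dict[str, int],
                     memberships: dict[str, set[str]]) -> dict[str, set[str]]:
    """Inverted permission index: precompute user -> visible row ids.

    A lookup at query time is a single dict access, but entries must be
    rebuilt whenever group memberships or row permissions change.
    """
    index: dict[str, set[str]] = {}
    for user, groups in memberships.items():
        user_mask = to_mask(groups)
        index[user] = {row for row, mask in masks.items() if can_read(mask, user_mask)}
    return index


user_index = build_user_index(row_masks, user_groups)
print(sorted(user_index["cho"]))  # ['row-1', 'row-2']
```

The bitmap check is a single AND per row, which makes it cheap to evaluate during a scan. The inverted index moves that work to write time, so it performs best when permissions change much less often than queries arrive.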

A failure mode that commonly causes security incidents is an incorrect permission boundary, where two employees' data overlaps. For example, an employee sees a colleague's sensitive information because their department codes differ by a single character (SALES1 versus SALES10) and the access check uses a prefix or substring match instead of exact equality.
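To make the department-code bug concrete, here is a minimal sketch that contrasts a buggy prefix match with an exact membership check. It assumes department codes are plain strings, and SALES1 and SALES10 are hypothetical values. The safer approach compares exact codes against a list and fails closed when data is missing.

```python
def buggy_department_match(user_dept: str, row_dept: str) -> bool:
    # BUG: prefix match, so "SALES1" also matches "SALES10", "SALES11", ...
    return row_dept.startswith(user_dept)


def exact_department_match(user_dept: str, row_depts: list[str]) -> bool:
    # Exact equality against each tagged code. An empty user code or an
    # untagged row (empty list) matches nothing, so missing data fails closed.
    return bool(user_dept) and user_dept in row_depts


print(buggy_department_match("SALES1", "SALES10"))         # True (leak)
print(exact_department_match("SALES1", ["SALES10"]))       # False
print(exact_department_match("SALES1", ["SALES1", "HR"]))  # True
```

Keeping `row_depts` typed as a list is important. If a single string were passed by mistake, Python's `in` would fall back to substring search and reintroduce the same bug.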