12. GDPR Compliance

Chapter 12 of 16 · 15 min

The EU General Data Protection Regulation (GDPR) can apply to a local AI deployment that processes personal data of people in the EU, even when the deployment is physically located elsewhere. Compliance requires understanding what personal data you collect, how you process it, and what rights individuals have.

Under Article 3, GDPR applies to AI systems that:

  • Process personal data as part of the activities of an organization established in the EU (data of employees, customers, website visitors)
  • Offer goods or services to people in the EU
  • Monitor the behavior of people in the EU

Key GDPR requirements for AI:

Lawful basis (Article 6): You need a valid legal basis for processing. Article 6 lists six bases; the ones most often relied on for AI processing are consent, contract performance, legitimate interest, and legal obligation. Document which basis applies to each processing activity.
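One way to document the basis per activity is a small register kept next to the code. The sketch below is illustrative only: the names LawfulBasis, ProcessingActivity, and undocumented, and the three sample activities, are assumptions and not part of any library or of the regulation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LawfulBasis(Enum):
    # The four bases named in the text; Article 6 also lists
    # vital interests and public task.
    CONSENT = "consent"
    CONTRACT = "contract performance"
    LEGITIMATE_INTEREST = "legitimate interest"
    LEGAL_OBLIGATION = "legal obligation"


@dataclass(frozen=True)
class ProcessingActivity:
    name: str
    purpose: str
    basis: Optional[LawfulBasis]  # None means "not yet documented"


def undocumented(activities: list) -> list:
    """Return the names of activities that have no lawful basis recorded."""
    return [a.name for a in activities if a.basis is None]


# Hypothetical register for a small deployment.
register = [
    ProcessingActivity("support_chat", "answer customer questions", LawfulBasis.CONTRACT),
    ProcessingActivity("prompt_logging", "debug model failures", LawfulBasis.LEGITIMATE_INTEREST),
    ProcessingActivity("fine_tuning", "improve model quality", None),
]

print(undocumented(register))  # ['fine_tuning']
```

Running a check like this in CI is one option for catching a new processing activity that ships without a documented basis.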

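Data minimization (Article 5(1)(c)): Collect only the personal data necessary for the purpose. The related storage limitation principle (Article 5(1)(e)) restricts how long you keep it. For AI, this means: don't store prompts longer than needed, don't retain conversation logs indefinitely, and don't use personal data in training unless you have a documented lawful basis for that use.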
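One possible way to apply minimization is to redact obvious identifiers before a prompt is ever written to storage. The sketch below is a rough illustration: the function name minimize and both regular expressions are assumptions, and patterns this simple will miss names, addresses, and many phone formats.

```python
import re

# Deliberately simple patterns, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def minimize(prompt: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    redacted = EMAIL_RE.sub("[EMAIL]", prompt)
    redacted = PHONE_RE.sub("[PHONE]", redacted)
    return redacted


print(minimize("Contact jane.doe@example.com or +49 30 1234567 about my order."))
# Contact [EMAIL] or [PHONE] about my order.
```

Redacting before storage means the retention and erasure logic has less personal data to track in the first place.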

Purpose limitation (Article 5(1)(b)): Use data only for the stated purpose or one compatible with it. If you collect customer queries for support, you cannot later use them to fine-tune models without separate consent or another valid lawful basis for that new purpose.
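Purpose limitation can be enforced in code by tagging each record with the purposes it was collected for and filtering on that tag before any secondary use. The sketch below assumes a hypothetical Record type and purpose labels ("support", "fine_tuning"); neither comes from a real library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    user_id: str
    text: str
    allowed_purposes: frozenset  # purposes the user was told about or agreed to


def select_for_purpose(records: list, purpose: str) -> list:
    """Keep only records whose allowed purposes include the requested one."""
    return [r for r in records if purpose in r.allowed_purposes]


# Hypothetical support queries; only u2 also agreed to fine-tuning.
records = [
    Record("u1", "How do I reset my password?", frozenset({"support"})),
    Record("u2", "My invoice is wrong.", frozenset({"support", "fine_tuning"})),
]

training_set = select_for_purpose(records, "fine_tuning")
print([r.user_id for r in training_set])  # ['u2']
```

Filtering at the point where a training set is assembled keeps support-only data out of fine-tuning by default.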

Right to erasure (Article 17): Individuals can request deletion of their personal data, subject to the exceptions in Article 17(3). In practice, you must be able to identify and delete their conversation history, retrieved documents containing their data, and any derived data such as embeddings or summaries.
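Because a user's data usually lives in more than one place, an erasure request has to fan out to every store. The sketch below shows one way to do that and to keep a per-store count as evidence. DictStore is an in-memory stand-in, and the store names (including "embeddings" as an example of derived data) are assumptions.

```python
from typing import Protocol


class ErasableStore(Protocol):
    name: str

    def erase_user(self, user_id: str) -> int: ...


class DictStore:
    """In-memory stand-in for a real store; maps user_id to a list of items."""

    def __init__(self, name: str, data: dict):
        self.name = name
        self._data = data

    def erase_user(self, user_id: str) -> int:
        # pop() removes the user's entry and returns it; missing users yield [].
        return len(self._data.pop(user_id, []))


def erase_everywhere(user_id: str, stores: list) -> dict:
    """Erase one user from every store and report how many items each removed."""
    return {store.name: store.erase_user(user_id) for store in stores}


stores = [
    DictStore("conversations", {"u1": ["hi", "bye"], "u2": ["hello"]}),
    DictStore("retrieved_documents", {"u1": ["doc-7"]}),
    DictStore("embeddings", {"u2": [[0.1, 0.2]]}),
]

report = erase_everywhere("u1", stores)
print(report)  # {'conversations': 2, 'retrieved_documents': 1, 'embeddings': 0}
```

The returned report can be logged (without the deleted content) to show that the request was carried out in each store.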

Data protection impact assessment (Article 35): Required when processing is likely to result in a high risk to individuals' rights and freedoms. AI systems making automated decisions, processing sensitive data, or profiling individuals likely require a DPIA.
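The three indicators above can be turned into a first-pass screening check. The sketch below only flags which indicators apply to a system; the SystemProfile type and the two sample systems are hypothetical, and a flagged result means "look closer", not a legal conclusion.

```python
from dataclasses import dataclass


@dataclass
class SystemProfile:
    name: str
    automated_decisions: bool
    sensitive_data: bool
    profiling: bool


def dpia_triggers(profile: SystemProfile) -> list:
    """Return which of the three risk indicators from the text apply."""
    checks = {
        "automated decisions": profile.automated_decisions,
        "sensitive data": profile.sensitive_data,
        "profiling": profile.profiling,
    }
    return [label for label, present in checks.items() if present]


# Hypothetical systems used only to exercise the check.
chatbot = SystemProfile("internal_docs_chatbot", False, False, False)
screening = SystemProfile("cv_screening", True, False, True)

print(dpia_triggers(chatbot))    # []
print(dpia_triggers(screening))  # ['automated decisions', 'profiling']
```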

Implementing GDPR controls:

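# Example: Data retention policy enforcement
import re
from datetime import datetime, timedelta, timezone

# Placeholder heuristic (an assumption for this example): flags email
# addresses only. Real personal-data detection needs far more than one regex.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

class ConversationStore:
    def __init__(self, db, retention_days: int = 90):
        # `db` is assumed to be any client exposing insert(table, record)
        # and delete(table, where=...) returning the number of rows removed.
        self.db = db
        self.retention_days = retention_days

    def contains_personal_data(self, text: str) -> bool:
        return EMAIL_PATTERN.search(text) is not None

    def store(self, user_id: str, prompt: str, response: str):
        # Flag records that appear to contain personal data; retention is
        # enforced separately by cleanup_old_records().
        record = {
            "user_id": user_id,
            "prompt": prompt,
            "response": response,
            "created_at": datetime.now(timezone.utc),  # utcnow() is deprecated
            "personal_data": self.contains_personal_data(prompt),
        }
        self.db.insert("conversations", record)

    def delete_user_data(self, user_id: str) -> int:
        """GDPR Article 17: Right to erasure"""
        return self.db.delete("conversations", where={"user_id": user_id})

    def cleanup_old_records(self) -> int:
        """Automated retention enforcement"""
        cutoff = datetime.now(timezone.utc) - timedelta(days=self.retention_days)
        # Return the count so scheduled runs can be logged and audited.
        return self.db.delete("conversations", where={"created_at": {"<": cutoff}})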
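The example above calls insert and delete on a database client that is never shown. To try the retention and erasure logic without a real database, a tiny in-memory stand-in is enough. The InMemoryDB class below is hypothetical; it only mimics the two calls and the where syntax used above (plain equality, plus comparison operators such as {"<": value}).

```python
import operator
from datetime import datetime, timedelta, timezone

_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


class InMemoryDB:
    """Hypothetical stand-in exposing insert(table, record) and delete(table, where)."""

    def __init__(self):
        self.tables = {}

    def insert(self, table: str, record: dict) -> None:
        self.tables.setdefault(table, []).append(record)

    def _matches(self, record: dict, where: dict) -> bool:
        for key, cond in where.items():
            value = record.get(key)
            if isinstance(cond, dict):
                # Operator form, e.g. {"<": cutoff}
                if not all(_OPS[op](value, bound) for op, bound in cond.items()):
                    return False
            elif value != cond:
                return False
        return True

    def delete(self, table: str, where: dict) -> int:
        rows = self.tables.get(table, [])
        kept = [r for r in rows if not self._matches(r, where)]
        self.tables[table] = kept
        return len(rows) - len(kept)  # number of rows removed


db = InMemoryDB()
now = datetime.now(timezone.utc)
db.insert("conversations", {"user_id": "u1", "created_at": now - timedelta(days=120)})
db.insert("conversations", {"user_id": "u1", "created_at": now})
db.insert("conversations", {"user_id": "u2", "created_at": now})

# Retention: remove anything older than 90 days, then erase user u1.
expired = db.delete("conversations", where={"created_at": {"<": now - timedelta(days=90)}})
erased = db.delete("conversations", where={"user_id": "u1"})
print(expired, erased, len(db.tables["conversations"]))  # 1 1 1
```

A stand-in like this is also convenient in unit tests, where it lets the retention cutoff and erasure paths be checked without touching production data.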

EXERCISE

Map your AI data flows: where does data enter, where is it stored, who can access it, and when is it deleted? Identify any flows that lack a clear legal basis or retention policy.