RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

© 2026 runlocalai.co · Independently operated
HOW-TO · OPS

How to implement token-level logging for LLM prompt auditing and compliance

advanced · 25 min · By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

LLM with token-level access, compliance logging target

What this does

This guide captures every input and output token exchanged with an LLM, along with metadata for each interaction, into an append-only audit log. The log records the exact token sequences sent and received, model version, user identifier, timestamp, and session context. This enables post-hoc compliance audits, data-subject access requests, and forensic analysis of model behavior for regulated environments.

Steps

  1. Define the audit record schema. Each record must contain:

    import hashlib
    import json
    import uuid
    from datetime import datetime, timezone

    audit_record = {
        "event_id": str(uuid.uuid4()),
        # Timezone-aware UTC; datetime.utcnow() is deprecated since Python 3.12.
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "user_id": user_id,
        "model": model_name,
        "model_version": model_version,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        # Hashes over the JSON-encoded token lists let step 6 detect tampering.
        "input_hash": hashlib.sha256(json.dumps(input_tokens).encode()).hexdigest(),
        "output_hash": hashlib.sha256(json.dumps(output_tokens).encode()).hexdigest(),
        "request_duration_ms": request_duration_ms,
        "stop_reason": stop_reason,
    }
    
  2. Capture token sequences. If the API returns token-level data, store the full sequences in input_tokens and output_tokens. If it exposes only token counts, store the prompt and response text instead, each with a SHA-256 hash for integrity:

    audit_record["input_text"] = prompt_text
    audit_record["output_text"] = response_text
    audit_record["input_text_hash"] = hashlib.sha256(prompt_text.encode()).hexdigest()
    audit_record["output_text_hash"] = hashlib.sha256(response_text.encode()).hexdigest()
    
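  3. Write records under a dedicated prefix in an S3 bucket with Object Lock enabled, so stored object versions cannot be overwritten or deleted during the retention period. The configuration below sets a default retention of 2555 days (about seven years). GOVERNANCE mode can be bypassed by principals that hold the s3:BypassGovernanceRetention permission; COMPLIANCE mode cannot be bypassed by any user, so pick the mode your regulator requires. Configure the bucket: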

    aws s3api put-object-lock-configuration --bucket audit-logs --object-lock-configuration '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"GOVERNANCE","Days":2555}}}'
    
  4. Implement an atomic write function:

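    import json
    from datetime import datetime, timedelta, timezone

    import aioboto3

    async def write_audit_record(record):
        # One object per record, partitioned by UTC date (first 10 chars of the timestamp).
        key = f"audit/{record['timestamp'][:10]}/{record['event_id']}.json"
        session = aioboto3.Session()
        async with session.client("s3") as s3:
            await s3.put_object(
                Bucket="audit-logs",
                Key=key,
                Body=json.dumps(record).encode("utf-8"),
                ContentType="application/json",
                ObjectLockMode="GOVERNANCE",
                # Timezone-aware UTC; datetime.utcnow() is deprecated since Python 3.12.
                ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=2555),
            )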
    
  5. Add a synchronous fallback for deployment environments without asyncio support. Use boto3 with ThreadPoolExecutor for non-blocking behavior.

  6. Verify audit trail integrity. Write a validation script that recomputes hashes:

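    # Assumes hashlib and json are imported; fetch_audit_records is a helper you supply.
    for record in fetch_audit_records(start_date, end_date):
        for field in ("input", "output"):
            digest = hashlib.sha256(json.dumps(record[f"{field}_tokens"]).encode()).hexdigest()
            if digest != record[f"{field}_hash"]:  # raise instead of assert: python -O strips asserts
                raise ValueError(f"Integrity violation at {record['event_id']} ({field})")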
    
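  7. Run the integrity check as a daily cron job (06:00 in the system time zone). Ubuntu 24.04 ships python3 and has no python command by default, and cron starts jobs in the user's home directory, so use an absolute path if the script lives elsewhere:

    0 6 * * * python3 validate_audit.py --days 1 >> /var/log/audit-validation.log 2>&1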
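
The cron entry in step 7 assumes a validate_audit.py script that the steps do not spell out. The following is a minimal sketch of one possible shape, not a definitive implementation. The names sha256_json, find_violations, day_prefixes, and parse_args are hypothetical, and fetching records from S3 is left to the caller so the checking logic can run without AWS access. It assumes records follow the schema from step 1. Whether --days 1 should cover the current day or the previous one is not stated above, so the sketch simply counts back from a date the caller passes in.

```python
import argparse
import hashlib
import json
from datetime import date, timedelta


def sha256_json(value):
    """Hash a JSON-serializable value the same way the step 1 schema does."""
    return hashlib.sha256(json.dumps(value).encode()).hexdigest()


def find_violations(records):
    """Return (event_id, field) pairs whose stored hash no longer matches."""
    violations = []
    for record in records:
        for field in ("input", "output"):
            if sha256_json(record[f"{field}_tokens"]) != record[f"{field}_hash"]:
                violations.append((record["event_id"], field))
    return violations


def day_prefixes(days, today):
    """Build audit/YYYY-MM-DD/ prefixes for `days` days, counting back from `today`."""
    return [f"audit/{(today - timedelta(days=n)).isoformat()}/" for n in range(days)]


def parse_args(argv=None):
    """Parse the --days flag used by the cron entry."""
    parser = argparse.ArgumentParser(description="Recompute audit record hashes.")
    parser.add_argument("--days", type=int, default=1, help="number of days to check")
    return parser.parse_args(argv)
```

A main function would call parse_args, load the JSON objects under each prefix from day_prefixes, pass them to find_violations, and exit non-zero if the returned list is not empty, so that the failure shows up in the cron log.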
    

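Step 5 describes the synchronous fallback without showing it. A minimal sketch follows, under stated assumptions: write_audit_record_sync and submit_audit_record are hypothetical names, and the S3 client is passed in rather than created inside the function, so any object with a boto3-style put_object method works. The key layout and Object Lock parameters mirror the async version in step 4.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 2555  # same retention as steps 3 and 4


def write_audit_record_sync(s3_client, record, bucket="audit-logs"):
    """Blocking write of one audit record; returns the object key."""
    key = f"audit/{record['timestamp'][:10]}/{record['event_id']}.json"
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(record).encode("utf-8"),
        ContentType="application/json",
        ObjectLockMode="GOVERNANCE",
        ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=RETENTION_DAYS),
    )
    return key


def submit_audit_record(executor, s3_client, record):
    """Queue the write on a worker thread and return a Future immediately."""
    return executor.submit(write_audit_record_sync, s3_client, record)


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   executor = ThreadPoolExecutor(max_workers=4)
#   future = submit_audit_record(executor, boto3.client("s3"), audit_record)
#   future.result()  # or attach a done-callback, so write failures are not lost
```

The request path only pays for the cost of queuing the task. Because an exception raised in the worker thread is stored on the Future, something has to call result() or register a callback; otherwise a failed audit write goes unnoticed.
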
Verification

aws s3 ls s3://audit-logs/audit/$(date -u +%Y-%m-%d)/ --recursive | wc -l

Expected output: number of audit records written today, greater than 0.
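
If the check needs to run from Python instead of the shell, for example inside a monitoring job, the same count can be sketched as follows. This is an illustration under assumptions: count_audit_records is a hypothetical name, and the client is any object exposing boto3's list_objects_v2 call, which returns at most 1000 keys per page and therefore needs the continuation loop.

```python
from datetime import datetime, timezone


def count_audit_records(s3_client, bucket="audit-logs", day=None):
    """Count objects under audit/YYYY-MM-DD/, following pagination."""
    day = day or datetime.now(timezone.utc).strftime("%Y-%m-%d")
    kwargs = {"Bucket": bucket, "Prefix": f"audit/{day}/"}
    total = 0
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        total += page.get("KeyCount", 0)
        if not page.get("IsTruncated"):
            return total
        # More keys remain; ask for the next page.
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
```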

Common failures

  • Duplicate audit records on retry: generate a deterministic event_id by hashing the request body, or reuse an idempotency key supplied by the calling service.
  • S3 Object Lock fails: Object Lock requires S3 Versioning, and the simplest route is to enable it when the bucket is created with aws s3api create-bucket --bucket audit-logs --object-lock-enabled-for-bucket (outside us-east-1, also pass --create-bucket-configuration LocationConstraint=<region>). AWS has supported enabling Object Lock on existing versioned buckets since late 2023, so check the current S3 documentation before creating a new bucket only for this.
  • Token-level data not available from the API: some LLM providers do not expose per-token output. In that case, store the text with a cryptographic hash as a substitute.
  • High write costs at scale: batch audit records into larger objects (one per minute) to reduce S3 PUT costs. Batching delays durability, so flush each batch promptly and on shutdown; records held only in memory are lost if the process crashes.
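
The first item above suggests deriving event_id from the request body. A minimal sketch of that idea, assuming the request body is a JSON-serializable dict: deterministic_event_id and AUDIT_NAMESPACE are hypothetical names, and the namespace string is a placeholder. Canonical JSON (sorted keys, fixed separators) keeps the ID stable regardless of key order.

```python
import json
import uuid

# Hypothetical fixed namespace for this audit log; any UUID works as long as it never changes.
AUDIT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/audit-log")


def deterministic_event_id(request_body):
    """Derive a stable UUIDv5 from the canonical JSON form of the request body."""
    canonical = json.dumps(request_body, sort_keys=True, separators=(",", ":"))
    return str(uuid.uuid5(AUDIT_NAMESPACE, canonical))
```

Two caveats apply. Genuinely distinct requests with identical bodies would share an ID, so the body should include something that separates them, such as a caller-supplied idempotency key. Also, because Object Lock requires versioning, a retried PUT to the same key adds a new object version instead of replacing the first, so deduplication still has to happen when records are read.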

Related guides

  • Implement async prompt logging to S3 without blocking inference latency
  • Build a structured prompt/response logging pipeline with Fluentd
  • Log conversational context windows for AI agent debugging