Guardrails
Guardrails are runtime constraints or filters applied to an LLM's input and output to enforce safety, compliance, or formatting rules. In local AI, guardrails are typically implemented as a separate validation layer that intercepts prompts and responses before they reach the user. Operators configure guardrails to block harmful content, enforce output structure (e.g., JSON-only), or prevent prompt injection. Unlike cloud APIs where guardrails are built-in, local setups require operators to add them via tools like Guardrails AI, NVIDIA NeMo Guardrails, or custom scripts that run alongside the inference server.
Deeper dive
Guardrails operate at two points: pre-processing (input guardrails) and post-processing (output guardrails). Input guardrails scan prompts for disallowed topics, jailbreak attempts, or sensitive data (e.g., PII). Output guardrails validate the model's response against rules—for example, ensuring it doesn't contain profanity, hallucinated facts, or non-compliant formatting. In local deployments, guardrails are often implemented as a middleware layer between the user interface and the LLM runtime (e.g., Ollama, vLLM). Operators can use open-source libraries like Guardrails AI, which allows defining 'rails' as XML-like specs, or NeMo Guardrails, which uses Colang scripting. Performance matters: each guardrail check adds latency (typically 10-100 ms per check), so operators must balance safety with throughput. Guardrails are not a substitute for fine-tuning; they are a runtime safety net.
Practical example
An operator runs a local chatbot for customer support using Llama 3.1 8B via Ollama. They want to block the model from generating refund amounts above $500. They add a guardrail using Guardrails AI: a Python script that intercepts the response, parses any dollar amounts, and if >500, replaces the response with 'I cannot process that amount. Please contact a supervisor.' This guardrail runs on the same machine, adding ~50 ms latency per request.
Workflow example
In a typical local setup, the operator runs Ollama serving the model on port 11434. They then run a separate Python script that uses the guardrails-ai library. The script defines a rail (e.g., output_rail = Rail.from_string(...)) and wraps the Ollama API call: guard = Guard.from_rail(...), then guard(ollama.chat(model='llama3.1', messages=[...])). The guard checks the output before returning it to the user. If it fails, the script returns a fallback message instead of the model's raw output.
Reviewed by Fredoline Eruo. See our editorial policy.