AI
    May 17, 2026

    Securing LLM Apps: Guardrails for Production

    LLM features open new attack surfaces — prompt injection, data leakage, unsafe tool use. A practical guardrails checklist for shipping AI to production safely.

    Share

    An LLM in production is a new kind of attack surface: it takes untrusted natural-language input, reasons over sensitive context, and — increasingly — calls tools that do things. Traditional input validation doesn't cover it. Shipping AI safely means adding guardrails at the boundaries, with the same rigor you'd apply to any other untrusted-input system.

    The threats, concretely

    • Prompt injection — user (or retrieved) text that hijacks the model's instructions ("ignore previous instructions and…"). The #1 LLM-specific risk.
    • Data leakage — the model exposes secrets, other users' data, or internal context it shouldn't.
    • Unsafe tool use — an agent calls a destructive or money-moving tool with attacker-influenced arguments.
    • Harmful / off-policy output — content that violates your policy or brand.
    • Cost/DoS abuse — adversarial inputs that drive huge token usage.

    Guardrails by layer

    Input. Treat all input as untrusted — including retrieved documents (indirect injection hides there). Separate instructions from data with clear delimiters, keep the trusted system prompt isolated, and don't blindly concatenate user text into a position of authority.

    Tool / action. This is where injection turns into damage. Enforce permissions in code, not the prompt: allowlist callable tools, validate every argument, require confirmation/human approval for irreversible actions, and make writes idempotent. The model proposes; your code decides.

    Output. Validate before use — schema-check structured output, scan for leaked secrets/PII, and apply content filtering. Never render model output as trusted HTML or execute it without sanitization.

    Context / data. Scope what the model can see to the current user (no cross-tenant context), redact PII before it enters the prompt where possible, and keep provenance so you can audit what informed an answer.

    Operational. Rate-limit and budget per user to blunt cost/DoS abuse, and log every prompt, tool call, and output for audit and incident response.

    A pre-launch checklist

    • Is user and retrieved text treated as untrusted (injection-aware)?
    • Are tool permissions enforced in code, with validation + approval gates on risky actions?
    • Is structured output schema-validated and scanned for secrets/PII before use?
    • Is context scoped per user (no cross-tenant leakage)?
    • Are there per-user rate limits and token budgets?
    • Are prompts, tool calls, and outputs logged for audit?
    • Do you have a human-escalation path for low-confidence / high-stakes cases?

    Wrap-up

    LLM security isn't a model setting — it's guardrails at every boundary: untrusted input handling, code-enforced tool permissions, validated output, scoped context, and full auditability. Add them before launch; bolting them on after an incident is the expensive path.

    Ask about this article

    Get answers grounded in this post. AI-generated — based on this article, and may be imperfect.

    Scaled AI Weekly

    Enjoyed this? Get more like it every Monday.

    Real architecture decisions, LLMOps patterns that survive production, and engineering leadership advice — from 12+ years of building at enterprise scale. Free. No spam. Unsubscribe anytime.

    Join engineers building production AI systems