Agent firewall

The guardrail proxy decides whether prompt/response text is safe. The agent firewall decides whether an agent action — a tool call your model wants to make — is allowed. It runs inline on the hosted proxy, per target, and is off by default.

What it gates

When you point an app at a target’s guardrail proxy and enable the firewall, every tool call the model returns is evaluated against a policy:

Action           | Effect
---------------- | ------
allow            | forwarded unchanged
block            | the whole response is refused (403), so your app never receives the dangerous tool call
require_approval | held for human confirmation (refused at the gateway)
redact_args      | credential-shaped argument values are masked, then forwarded
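
As a rough illustration of what redact_args does to a tool call, the sketch below masks credential-shaped substrings anywhere in an arguments object. Everything here is an assumption made for illustration: the regexes, the `[REDACTED]` marker, and the `redact_args` function name are hypothetical and are not the patterns or API of the actual engine.

```python
import re

# Illustrative credential shapes only. The engine's real patterns are not
# shown in this page, so these regexes are assumptions.
SECRET_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{20,}"),          # GitHub-style token prefix
    re.compile(r"AKIA[A-Z0-9]{16}"),              # AWS-style access key id
    re.compile(r"sk-[A-Za-z0-9_-]{16,}"),         # "sk-" style API key
    re.compile(r"xox[baprs]-[A-Za-z0-9-]{10,}"),  # Slack-style token
]

MASK = "[REDACTED]"  # hypothetical mask marker


def redact_args(value):
    """Return a copy of value with credential-shaped substrings masked."""
    if isinstance(value, str):
        for pattern in SECRET_PATTERNS:
            value = pattern.sub(MASK, value)
        return value
    if isinstance(value, dict):
        return {key: redact_args(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact_args(item) for item in value]
    return value  # numbers, booleans and None pass through unchanged
```

Walking the structure recursively means a secret nested inside a header map or a list is masked the same way as a top-level string, while the rest of the call is left intact.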

Built-in policy (always on)

Rule                | Action      | Catches
------------------- | ----------- | -------
ssrf-cloud-metadata | block       | 169.254.169.254, metadata.google.internal, … (SSRF / credential theft)
sensitive-file-read | block       | /etc/shadow, ~/.ssh/id_*, ~/.aws/credentials, .env
destructive-shell   | block       | rm -rf, mkfs, dd if=, shutdown, …
secret-in-args      | redact_args | ghp_…, AKIA…, sk-…, xox[baprs]-…

Rules are evaluated in order and the first match wins, so a custom rule with action allow, placed before these built-in rules, can whitelist a specific case.
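
First-match-wins evaluation can be sketched as follows. This is a minimal illustration, not the engine's code: the `decide` and `rule_matches` names are hypothetical, the rule fields mirror the `tools` and `arg_patterns` keys used in the API payload, and the choice to require both conditions when a rule declares both is an assumption.

```python
import re
from fnmatch import fnmatchcase


def rule_matches(rule, tool_name, arguments_json):
    """True when every condition the rule declares is satisfied (assumed AND)."""
    tools = rule.get("tools") or []
    patterns = rule.get("arg_patterns") or []
    if tools and not any(fnmatchcase(tool_name, glob) for glob in tools):
        return False
    if patterns and not any(re.search(p, arguments_json) for p in patterns):
        return False
    return bool(tools or patterns)  # a rule with no conditions never matches


def decide(rules, default_action, tool_name, arguments_json):
    """Return (action, rule_id) for the first matching rule, else the default."""
    for rule in rules:
        if rule_matches(rule, tool_name, arguments_json):
            return rule["action"], rule["id"]
    return default_action, None


rules = [
    # Hypothetical custom allow rule, listed before the built-in style rule.
    {"id": "allow-health-check", "action": "allow",
     "tools": ["http_get"], "arg_patterns": [r"internal-health"]},
    {"id": "ssrf-cloud-metadata", "action": "block",
     "arg_patterns": [r"169\.254\.169\.254"]},
]

print(decide(rules, "allow", "http_get", '{"url": "http://169.254.169.254/latest"}'))
# -> ('block', 'ssrf-cloud-metadata')
```

Because evaluation stops at the first match, the order of the list is part of the policy: moving the allow rule below the block rule would change the outcome for any call that both rules match.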

Enforcement seam (read this)

The hosted proxy is a chat-completions gateway. It sees tool calls only in the model’s response and tool results in the next request. So the gateway firewall gates the model’s intent to call a tool and the data flowing back — it cannot stop an app from executing a tool it never routed through the proxy. True execution-time action blocking is the job of the in-process SDK / sidecar (on the roadmap). The same policy engine powers both seams.
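
To make the seam concrete, the sketch below pulls the tool calls out of a buffered response, which is the only place the gateway can see them. It assumes the common OpenAI-style chat-completions shape (`choices[].message.tool_calls[].function`); the `extract_tool_calls` helper is hypothetical and is not part of any Pencheff API.

```python
import json


def extract_tool_calls(response):
    """Yield (tool_name, parsed_arguments) for each tool call in a response."""
    for choice in response.get("choices", []):
        message = choice.get("message") or {}
        for call in message.get("tool_calls") or []:
            function = call.get("function") or {}
            raw_args = function.get("arguments") or "{}"
            try:
                arguments = json.loads(raw_args)
            except ValueError:
                # Keep unparsable arguments visible instead of dropping them.
                arguments = {"_raw": raw_args}
            yield function.get("name", ""), arguments
```

Anything your app executes without it first appearing in a response like this (for example a tool invoked from local logic) never reaches the gateway, which is the gap the in-process SDK / sidecar is meant to close.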

Configure

UI: Targets → (your LLM target) → Edit → Agent firewall — toggle it on, set the default action for unmatched calls, and add custom rules.

API:

curl -X PUT https://api.pencheff.com/targets/<TARGET_ID>/firewall \
  -H "Authorization: Bearer <PENCHEFF_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "firewall": {
      "enabled": true,
      "default_action": "allow",
      "rules": [
        {"id": "no-prod-deletes", "action": "require_approval",
         "tools": ["delete_*", "drop_*"], "arg_patterns": ["prod"],
         "reason": "destructive op on prod needs sign-off"}
      ]
    }
  }'

Each rule needs at least one tools glob or arg_patterns regex. Invalid regexes are rejected at write time (400) so the proxy never faults at request time.
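
If you generate rules programmatically, a pre-flight check can catch the same problems before the PUT. The sketch below is an assumption-laden convenience, not the server's validator: the `validate_rule` name is hypothetical, and it compiles patterns with Python's `re` module on the assumption that the Python engine uses the same regex dialect. The API's 400 response remains authoritative.

```python
import re

# The four actions listed in the action table above.
ACTIONS = {"allow", "block", "require_approval", "redact_args"}


def validate_rule(rule):
    """Return a list of problems; an empty list means the rule looks valid."""
    problems = []
    if rule.get("action") not in ACTIONS:
        problems.append(f"unknown action: {rule.get('action')!r}")
    tools = rule.get("tools") or []
    patterns = rule.get("arg_patterns") or []
    if not tools and not patterns:
        problems.append("rule needs at least one tools glob or arg_patterns regex")
    for pattern in patterns:
        try:
            re.compile(pattern)
        except re.error as exc:
            problems.append(f"invalid regex {pattern!r}: {exc}")
    return problems
```

For example, the `no-prod-deletes` rule from the request above passes this check, while a rule whose only pattern is `"("` is reported as an invalid regex.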

Block response

A blocked tool call returns a uniform 403:

{
  "error": {
    "message": "Pencheff Sentry blocked: tool call 'http_get' blocked by the agent firewall: targets a cloud metadata endpoint (SSRF / credential theft)",
    "type": "guardrail_block",
    "code": "sentry_blocked_response",
    "pencheff_sentry": { "category": "LLM06", "detector": "firewall:ssrf-cloud-metadata" }
  }
}
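
On the client side, an app can tell a firewall block apart from other 403s by inspecting this body. The sketch below is one way to do that, based only on the sample above: it assumes that firewall blocks always carry a `detector` value of the form `firewall:<rule id>`, and the `parse_firewall_block` helper is hypothetical.

```python
import json


def parse_firewall_block(status_code, body_text):
    """Return block details for a firewall 403, or None for anything else."""
    if status_code != 403:
        return None
    try:
        payload = json.loads(body_text)
    except ValueError:
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return None
    sentry = error.get("pencheff_sentry") or {}
    detector = sentry.get("detector", "")
    # Assumption: firewall decisions are tagged "firewall:<rule id>".
    if not detector.startswith("firewall:"):
        return None
    return {
        "rule": detector.split(":", 1)[1],
        "category": sentry.get("category"),
        "message": error.get("message", ""),
    }
```

Returning None for every other response lets the caller fall through to its normal error handling, and the extracted rule id can be logged next to the matching firewall.block span.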

Firewall decisions also show up in runtime traces as a firewall.block span.

Notes

  • Same buffered-only limitation as the response detectors: streaming (SSE) responses are forwarded without tool-call gating today.
  • Engine: pencheff_sentry/firewall.py — pure Python, MIT.
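
Given the streaming limitation, one way to keep tool calls gated is to send requests that offer tools as buffered (non-streaming) calls. A minimal sketch, assuming an OpenAI-style request payload in which `tools` lists the available tools and `stream` turns on SSE; the `buffered_if_tools` helper is hypothetical.

```python
def buffered_if_tools(payload):
    """Return a copy of the payload with streaming off when tools are offered."""
    gated = dict(payload)  # shallow copy so the caller's payload is untouched
    if gated.get("tools"):
        gated["stream"] = False
    return gated
```

Requests without tools keep whatever streaming setting they had, so plain chat traffic is not affected by this workaround.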