Agent firewall

The guardrail proxy decides whether prompt/response text is safe. The agent firewall decides whether an agent action — a tool call your model wants to make — is allowed. It runs inline on the hosted proxy, per target, and is off by default.

What it gates

When you point an app at a target’s guardrail proxy and enable the firewall, every tool call the model returns is evaluated against a policy, and the matching rule resolves to one of four actions:

Action	Effect
allow	forwarded unchanged
block	the whole response is refused (`403`), so your app never receives the dangerous tool call
require_approval	needs human confirmation; at the gateway the call is refused rather than forwarded
redact_args	credential-shaped argument values are masked, then forwarded
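
The table above can be read as a small dispatch over four outcomes. The following is a minimal sketch of that dispatch, not the engine's actual code. The `apply_action` function, the `{"name", "arguments"}` tool-call shape, the `[REDACTED]` mask token, and the `SECRET_RE` pattern are all hypothetical. It also assumes `require_approval` is refused with the same `403` as `block`, which the table does not state explicitly.

```python
import copy
import re
from enum import Enum


class Action(str, Enum):
    ALLOW = "allow"
    BLOCK = "block"
    REQUIRE_APPROVAL = "require_approval"
    REDACT_ARGS = "redact_args"


# Hypothetical credential shapes for illustration; the real patterns may differ.
SECRET_RE = re.compile(r"(ghp_|AKIA|sk-|xox[baprs]-)[A-Za-z0-9_-]+")


def apply_action(action, tool_call):
    """Return (http_status, tool_call_or_None) for one evaluated tool call."""
    action = Action(action)  # raises ValueError on an unknown action name
    if action is Action.ALLOW:
        return 200, tool_call  # forwarded unchanged
    if action in (Action.BLOCK, Action.REQUIRE_APPROVAL):
        return 403, None  # assumption: both are refused with 403
    # Remaining case is REDACT_ARGS: mask credential-shaped string values.
    redacted = copy.deepcopy(tool_call)
    redacted["arguments"] = {
        key: SECRET_RE.sub("[REDACTED]", value) if isinstance(value, str) else value
        for key, value in tool_call["arguments"].items()
    }
    return 200, redacted
```

This sketch masks only top-level string arguments; a real implementation would presumably also walk nested values.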

Built-in policy (always on)

Rule	Action	Catches
`ssrf-cloud-metadata`	block	`169.254.169.254`, `metadata.google.internal`, … (SSRF / credential theft)
`sensitive-file-read`	block	`/etc/shadow`, `~/.ssh/id_*`, `~/.aws/credentials`, `.env`
`destructive-shell`	block	`rm -rf`, `mkfs`, `dd if=`, `shutdown`, …
`secret-in-args`	redact_args	`ghp_…`, `AKIA…`, `sk-…`, `xox[baprs]-…`

Rules are evaluated in order and the first match wins, so a custom rule with action `allow`, placed before the built-in rules, can exempt a specific case.
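
First-match-wins ordering can be sketched as follows. This is an illustration under stated assumptions, not the engine itself: `evaluate` and `rule_matches` are hypothetical names, `BUILTIN_RULES` is a one-rule stand-in for the built-in policy, and the sketch assumes that when a rule has both `tools` and `arg_patterns`, both must match.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical stand-in for one built-in rule; the real list is longer.
BUILTIN_RULES = [
    {"id": "ssrf-cloud-metadata", "action": "block",
     "arg_patterns": [r"169\.254\.169\.254", r"metadata\.google\.internal"]},
]


def rule_matches(rule, tool_name, args_text):
    """True if the rule's tool globs and argument regexes both accept the call."""
    tools = rule.get("tools") or []
    patterns = rule.get("arg_patterns") or []
    if tools and not any(fnmatchcase(tool_name, glob) for glob in tools):
        return False
    if patterns and not any(re.search(p, args_text) for p in patterns):
        return False
    return bool(tools or patterns)  # a rule with no criteria never matches


def evaluate(tool_name, args_text, custom_rules, default_action="allow"):
    """Return (action, rule_id) from the first matching rule, custom rules first."""
    for rule in list(custom_rules) + BUILTIN_RULES:
        if rule_matches(rule, tool_name, args_text):
            return rule["action"], rule["id"]
    return default_action, None
```

With a custom `allow` rule scoped to a hypothetical `health_check` tool, that one tool may reach the metadata address while every other tool still hits the built-in block:

```python
custom = [{"id": "allow-metadata-probe", "action": "allow",
           "tools": ["health_check"], "arg_patterns": [r"169\.254\.169\.254"]}]
evaluate("health_check", '{"url": "http://169.254.169.254/"}', custom)
evaluate("http_get", '{"url": "http://169.254.169.254/"}', custom)
```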

Enforcement seam (read this)

The hosted proxy is a chat-completions gateway. It sees tool calls only in the model’s response, and tool results only in the next request. The gateway firewall therefore gates the model’s intent to call a tool and the data flowing back; it cannot stop an app from executing a tool call that was never routed through the proxy. True execution-time action blocking is the job of the in-process SDK / sidecar (on the roadmap). The same policy engine powers both seams.
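
To make the seam concrete, here is a rough sketch of what "seeing tool calls in the model’s response" could look like. It assumes an OpenAI-style chat-completions body, where tool calls sit under `choices[].message.tool_calls[]` and `function.arguments` is a JSON-encoded string. `extract_tool_calls` is a hypothetical helper, not part of the product.

```python
import json


def extract_tool_calls(response):
    """Return a list of (tool_name, arguments_dict) found in a buffered response."""
    calls = []
    for choice in response.get("choices", []):
        message = choice.get("message") or {}
        for call in message.get("tool_calls") or []:
            function = call.get("function") or {}
            raw = function.get("arguments") or "{}"
            try:
                args = json.loads(raw)
            except json.JSONDecodeError:
                # Keep unparseable arguments so they can still be pattern-matched.
                args = {"_raw": raw}
            calls.append((function.get("name"), args))
    return calls
```

Anything the app does outside this response path, such as calling a tool directly, never appears here, which is why the gateway cannot block it.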

Configure

UI: Targets → (your LLM target) → Edit → Agent firewall — toggle it on, set the default action for unmatched calls, and add custom rules.

API:

curl -X PUT https://api.pencheff.com/targets/<TARGET_ID>/firewall \
  -H "Authorization: Bearer <PENCHEFF_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "firewall": {
      "enabled": true,
      "default_action": "allow",
      "rules": [
        {"id": "no-prod-deletes", "action": "require_approval",
         "tools": ["delete_*", "drop_*"], "arg_patterns": ["prod"],
         "reason": "destructive op on prod needs sign-off"}
      ]
    }
  }'

Each rule needs at least one `tools` glob or `arg_patterns` regex. Invalid regexes are rejected at write time (`400`), so the proxy never faults at request time.
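
The two write-time checks described above can be approximated locally before sending the `PUT`. This is a sketch only: `validate_rule` is a hypothetical helper, and it assumes the server's regex dialect is compatible with Python's `re` module, which is plausible for a pure-Python engine but not confirmed here.

```python
import re


def validate_rule(rule):
    """Return a list of error strings; an empty list means the rule passed both checks."""
    errors = []
    tools = rule.get("tools") or []
    patterns = rule.get("arg_patterns") or []
    if not tools and not patterns:
        errors.append("rule needs at least one tools glob or arg_patterns regex")
    for pattern in patterns:
        try:
            re.compile(pattern)
        except re.error as exc:
            errors.append(f"invalid regex {pattern!r}: {exc}")
    return errors
```

Running this over each entry in `rules` before the request catches the same class of mistake the API reports as a `400`.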

Block response

A blocked tool call returns a uniform 403:

{
  "error": {
    "message": "Pencheff Sentry blocked: tool call 'http_get' blocked by the agent firewall: targets a cloud metadata endpoint (SSRF / credential theft)",
    "type": "guardrail_block",
    "code": "sentry_blocked_response",
    "pencheff_sentry": { "category": "LLM06", "detector": "firewall:ssrf-cloud-metadata" }
  }
}

Firewall decisions also show up in runtime traces as a firewall.block span.
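
On the client side, an app may want to tell a firewall block apart from other `403` responses. The sketch below relies only on the fields shown in the example body. `firewall_rule_from_error` is a hypothetical helper, and it assumes the `detector` value always has the form `firewall:<rule id>`, which is inferred from the single example above.

```python
def firewall_rule_from_error(status, body):
    """Return the firewall rule id if this response is a firewall block, else None."""
    if status != 403:
        return None
    error = body.get("error") or {}
    if error.get("type") != "guardrail_block":
        return None
    detector = (error.get("pencheff_sentry") or {}).get("detector") or ""
    prefix = "firewall:"
    if not detector.startswith(prefix):
        return None  # blocked by some other detector, not the agent firewall
    return detector[len(prefix):]
```

An app could log the returned rule id and fall back to a response without tool use, rather than retrying the same request.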

Notes

Same buffered-only limitation as the response detectors: streaming (SSE) responses are forwarded without tool-call gating today.
Engine: pencheff_sentry/firewall.py — pure Python, MIT.
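
Because streamed responses bypass tool-call gating, one client-side workaround is to request buffered responses for any call that includes tools. The sketch below assumes an OpenAI-style request body in which `stream` and `stream_options` control streaming. `force_buffered` is a hypothetical helper, not a product feature.

```python
def force_buffered(payload):
    """Return a copy of a chat-completions request body with streaming disabled."""
    buffered = dict(payload)  # shallow copy, so the caller's dict is left untouched
    buffered["stream"] = False
    # stream_options only applies to streamed responses, so drop it if present.
    buffered.pop("stream_options", None)
    return buffered
```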
