Pencheff Sentry
Sentry is a runtime LLM guardrail that sits between your application and the model provider. It blocks prompt injection, PII / secret exfiltration, unsafe HTML in model output, and unbounded consumption as they happen, instead of catching them post-hoc on the next Pencheff red-team scan.
Same OWASP-LLM-Top-10 (2025) taxonomy as the offline scanner. Same detector library. Inline.
Modes
| Mode | What it is | Best for |
|---|---|---|
| HTTP proxy sidecar | A FastAPI service in front of an OpenAI-compatible upstream | Drop-in URL change for any OpenAI-compatible provider |
| LiteLLM plugin | pre_call / post_call hooks | Stacks already running LiteLLM |
| MCP middleware | Wraps the MCP tool-call path | LLM agents that call tools — blocks unsafe tool args inline |
The Cloudflare Worker mode (edge deployment) is on the v0.8 roadmap.
Hosted gateway (per-target)
If you’d rather not run the sidecar, register an LLM target in Pencheff, configure its guardrails, and point your app at the hosted gateway — no install, policy managed in the UI:
POST https://api.pencheff.com/proxy/<TARGET_ID>/v1/chat/completions
Authorization: Bearer <PENCHEFF_API_KEY>

The gateway runs the same OWASP-LLM detector chain on the prompt and response, plus two capabilities that build on it:
- Agent firewall — gate the tool calls the model makes (block SSRF / secret exfil / destructive actions, require approval, or redact credential-shaped args). Off by default, per target.
- Runtime traces — every request recorded as a span tree (LLM call · detector verdict · firewall decision), viewable on the target page.
Configure guardrails at Targets → (LLM target) → Edit → Guardrails, and the firewall just below it. See also the memory scanner for auditing agent memory / vector stores.
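The hosted gateway endpoint shown earlier in this section can be called from Python. The following is a minimal sketch, not an official client. It assumes the gateway accepts the standard OpenAI chat-completions JSON body (suggested by the /v1/chat/completions path, but not stated on this page), and build_gateway_request is a hypothetical helper name. It only builds the request; sending it is left to the caller.

```python
import json
import urllib.request

# Base path taken from the endpoint documented in this section.
GATEWAY_BASE = "https://api.pencheff.com/proxy"


def build_gateway_request(target_id, api_key, messages, model="gpt-4"):
    """Build (but do not send) a POST request for the hosted gateway.

    Assumption: the request body follows the OpenAI chat-completions schema.
    """
    url = f"{GATEWAY_BASE}/{target_id}/v1/chat/completions"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# To send it: urllib.request.urlopen(build_gateway_request(...))
```

Separating request construction from sending keeps the sketch easy to inspect and test without network access.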
Quick start
pip install pencheff-sentry
pencheff-sentry serve \
  --upstream https://api.openai.com/v1 \
  --port 4242 \
  --max-output-tokens 4000

Then change your application’s OpenAI base URL from
https://api.openai.com/v1 to http://localhost:4242. Sentry forwards
allowed requests verbatim and blocks unsafe ones with a clean
403 sentry_blocked response that includes the OWASP-LLM category.
{
  "error": {
    "message": "Pencheff Sentry blocked: prompt injection (direct-override)",
    "type": "guardrail_block",
    "code": "sentry_blocked",
    "pencheff_sentry": {
      "category": "LLM01",
      "detector": "direct-override"
    }
  }
}

What it detects
| OWASP LLM | Detector | Examples |
|---|---|---|
| LLM01 | Prompt injection | ignore previous instructions, pretend to be DAN, print your system prompt, encoded variants |
| LLM02 | PII / secrets | SSN, credit card, email, phone, AWS access key, OpenAI sk-, GitHub PAT shapes |
| LLM05 | Unsafe output handling | <script> / <iframe> / javascript: / inline event handlers in model response |
| LLM10 | Unbounded consumption | Output token ceiling configurable via --max-output-tokens |
The full pattern set lives in pencheff_sentry/core.py: pure Python, no I/O, easy to extend.
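Combining the table above with the 403 body shown in Quick start, a client can turn a block into a readable label. This is a sketch under stated assumptions: describe_block is a hypothetical helper, the labels are copied from the table rather than from the library, and the response shape is taken from the single example on this page.

```python
import json

# Labels copied from the detector table above (not read from the library).
OWASP_LABELS = {
    "LLM01": "Prompt injection",
    "LLM02": "PII / secrets",
    "LLM05": "Unsafe output handling",
    "LLM10": "Unbounded consumption",
}


def describe_block(status, body):
    """Return a readable label if the response is a Sentry block, else None."""
    if status != 403:
        return None
    try:
        error = json.loads(body).get("error", {})
    except (json.JSONDecodeError, AttributeError):
        # Body is not JSON, or not a JSON object.
        return None
    if not isinstance(error, dict) or error.get("code") != "sentry_blocked":
        return None
    info = error.get("pencheff_sentry")
    if not isinstance(info, dict):
        info = {}
    category = info.get("category", "unknown")
    label = OWASP_LABELS.get(category, "Unrecognised category")
    detector = info.get("detector", "unknown")
    return f"{category} ({label}), detector={detector}"
```

Checking error.code rather than the status alone avoids mistaking an upstream 403 (for example, a provider authentication failure) for a guardrail block.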
LiteLLM plugin
import litellm
from pencheff_sentry.litellm_plugin import register
register(litellm)
# Sentry now intercepts every litellm.completion() call.
response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "..."}],
)

pre_call raises litellm.BadRequestError on a blocked prompt. The
post_call hook mutates a blocked response into a safe refusal
string and stamps response.pencheff_sentry = {blocked, category, detector, reason} so downstream code can distinguish a
guardrail-driven refusal from a model-native refusal.
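Downstream code can use that stamp to branch on the refusal's origin. The sketch below assumes pencheff_sentry is a dict with the four keys listed above and is absent on responses Sentry did not touch; neither detail is confirmed here. guardrail_block_info is a hypothetical helper, and SimpleNamespace stands in for a real LiteLLM response object.

```python
from types import SimpleNamespace


def guardrail_block_info(response):
    """Return Sentry's block metadata, or None for an ordinary response.

    Assumption: response.pencheff_sentry is a dict with a truthy "blocked"
    key when the post_call hook replaced the model output.
    """
    info = getattr(response, "pencheff_sentry", None)
    if isinstance(info, dict) and info.get("blocked"):
        return info
    return None


# Stand-in objects for illustration only.
blocked = SimpleNamespace(
    pencheff_sentry={
        "blocked": True,
        "category": "LLM05",
        "detector": "example-detector",  # hypothetical detector name
        "reason": "unsafe output",
    }
)
untouched = SimpleNamespace()

for resp in (blocked, untouched):
    info = guardrail_block_info(resp)
    if info is not None:
        print("guardrail refusal:", info["category"], info["reason"])
    else:
        print("model response (or model-native refusal)")
```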
Audit log
Sentry never persists prompt or response bodies by default —
auditors asking “did you log my customer’s prompt?” get a clean
answer. The opt-in audit log (--audit-log path.jsonl) records
decisions only: verdict, detector, category, plus a SHA-256
hash of the prompt/response for correlation. Never the body itself.
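Because only hashes are stored, correlating a log entry with a prompt means hashing your own copy and comparing. The sketch below assumes prompt_hash is the full hex SHA-256 digest of the prompt's UTF-8 bytes; the exact encoding is not documented on this page, so verify it against a real log before relying on this. prompt_digest and matching_entries are hypothetical helper names.

```python
import hashlib
import json


def prompt_digest(prompt):
    # Assumption: hex SHA-256 over the UTF-8 bytes of the prompt.
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def matching_entries(lines, prompt):
    """Yield audit-log entries (dicts) whose prompt_hash matches the prompt."""
    digest = prompt_digest(prompt)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL file
        entry = json.loads(line)
        if entry.get("prompt_hash") == digest:
            yield entry


# Usage, with a log file written via --audit-log path.jsonl:
#     with open("path.jsonl", encoding="utf-8") as fh:
#         for entry in matching_entries(fh, known_prompt):
#             print(entry["ts"], entry["verdict"], entry["detector"])
```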
{"ts":"2026-05-08T15:00:01Z","side":"prompt","verdict":"block","category":"LLM01","detector":"direct-override","reason":"prompt injection (direct-override)","prompt_hash":"a7c2..."}

Default judge
The default judge is IBM Granite Guardian (Apache-2.0).
Llama Guard 3 is opt-in via PENCHEFF_LLAMA_GUARD_ENABLED=1 — it
ships under the Llama Community License (≤700 M MAU + attribution
required), and Pencheff surfaces the license notice in every
JudgeResult.reason so downstream consumers can reproduce it.
See features/llm-redteam for the
full judge ensemble.
Extending the detector chain
from pencheff_sentry.core import GuardrailConfig, evaluate_prompt

cfg = GuardrailConfig(
    extra_patterns=[
        # (regex, detector_name, owasp_category)
        (r"(?i)\binternal[- ]doc:[a-z0-9-]+\b", "internal-doc-leak", "LLM02"),
    ],
)

# user_prompt and refuse() are supplied by your application.
decision = evaluate_prompt(user_prompt, config=cfg)
if decision.verdict == "block":
    refuse(decision.reason)

Source
- Package: pencheff-sentry on PyPI (separate from the main pencheff package).
- Source tree: plugins/sentry/.
- License: MIT.