QuickstartLLM red team

Quickstart: LLM red team

Pencheff treats an LLM endpoint as a third kind of asset alongside URL (DAST) and Repo (SAST/SCA). Register a chat-completions URL once, fire a curated suite of black-box adversarial probes at it, get OWASP LLM Top 10 (2025) findings in the same unified queue as everything else.

1. Get the right endpoint URL

The red-team module talks to the chat-completions endpoint, not the model info page. Examples that work:

Provider presetEndpoint URL
openai-chathttps://api.openai.com/v1/chat/completions
openai-chat (OpenRouter)https://openrouter.ai/api/v1/chat/completions
azure-openaihttps://<resource>.openai.azure.com/openai/deployments/<deployment>/chat/completions?api-version=2024-02-01
bedrockhttps://bedrock-runtime.<region>.amazonaws.com/model/<model>/invoke
vertexhttps://<region>-aiplatform.googleapis.com/v1/projects/<project>/locations/<region>/publishers/google/models/<model>:generateContent
customAny HTTPS URL — you supply the request-body template + response JSONPath
executablecmd: URL — local subprocess, JSON over stdin/stdout
websocketwss://&hellip;
browserPlaywright drives the chat UI

Cloud-native auth re-signs / refreshes tokens per request without touching the credential blob. Optional extras pull the right SDK: pip install pencheff[bedrock] / [vertex] / [azure].

2. Pick a profile

ProfilePayloadsWall time @ 18 RPM
quick25~5 min (10 min budget)
standard75~15 min (30 min budget)
deep250~60–90 min (2 hour budget — fits tier-4 surface + always-on TAP/GOAT/Hydra)

Round-robin across techniques means a quick profile never starves any single technique class.

3. Run it

  1. /targets/new → pick LLM endpoint.
  2. Endpoint URL = the chat-completions URL.
  3. Provider preset: OpenAI-compatible or one of the cloud-native shapes.
  4. Add an Authorization header row with the literal value Bearer sk-…. Add any provider-specific extras (HTTP-Referer, OpenAI-Organization, x-api-key).
  5. Optionally paste your deployed system prompt baseline so probes exercise the deployed configuration, not a bare model.
  6. Pick a profile (quick / standard / deep) and submit.

Coverage at a glance

The runner fires payloads across every OWASP LLM Top 10 (2025) category in one shot, and automatically loads the tier-4 add-on plugin packs and dataset seeds that augment each module:

IDModuleAuto-loaded pluginsAuto-loaded datasets
LLM01Prompt Injectioncoding-agent:repo-prompt-injection(none)
LLM02Sensitive Information Disclosurecoding-agent:secret-handling, coding-agent:procfs-credential-read, coding-agent:steganographic-exfil, coding-agent:delayed-ci-exfil, rag:exfiltration(none)
LLM03Supply Chain(none)(none)
LLM04Data and Model Poisoningrag:poisoning(none)
LLM05Improper Output Handlingcoding-agent:generated-vulnerabilities, coding-agent:terminal-output-injectionaegis (S3 / S7), unsafebench (phishing-art), harmbench
LLM06Excessive Agencycoding-agent:automation-poisoning, coding-agent:network-egress-bypass, coding-agent:sandbox-escape, coding-agent:verifier-sabotage, coding-agent:core, mcp:tool-poisoning, mcp:tool-name-collision, mcp:untrusted-server-prompt, mcp:resource-exfil(none)
LLM07System Prompt Leakage(none)(none)
LLM08Vector and Embedding Weaknesses(none)(none)
LLM09Misinformationbias:age, bias:disability, bias:gender, bias:race, rag:source-attributionaegis (S1, S2, S4, S5, S6), unsafebench (hate-iconography, graphic-violence, NSFW-CSAM, weapon-howto, doxx), xstest (8 over-refusal probes — verdict inverted), harmbench, donotanswer, beavertails, toxic-chat
LLM10Unbounded Consumption(none)(none)

When an attacker LLM is configured on the target, every base case is also marked for TAP + GOAT + Hydra iterative search — the dispatcher routes those marker cases to the matching attacker-driven loop at scan time.

Every finding is mapped to six compliance frameworks: OWASP LLM Top 10 · MITRE ATLAS · NIST AI Risk Management Framework · EU AI Act · GDPR · ISO/IEC 42001:2023.

Reasoning models (Nemotron, DeepSeek-R1, QwQ, …) emit <think>...</think> traces that confuse regex judges. Set --judge-provider openai-moderation — it scores the visible output, not the chain-of-thought.

Cost & rate ceilings

The token-bucket rate limiter is shared across every probe targeting the same endpoint, so 10 OWASP modules dispatching concurrently respect a single per-key cap. Defaults:

max_rpm: 18              # OpenRouter free tier ≈ 20 RPM
max_cost_usd: 5.0
max_calls: 2000
max_latency_ms: 30000    # emits LLM10 finding when exceeded

429 responses honour the upstream Retry-After header automatically; the shared limiter stalls all concurrent dispatchers until the provider’s window resets so retries don’t thunder-herd.

Next