Release notes
v0.7.0 — IP-clean expansion (2026-05-08)
Closes the four IP-risk surfaces that existed in v0.6 (CodeQL CLI on
customer code, Semgrep --config=auto, Llama Guard licence
acknowledgement, no DCO / license-audit CI) and ships the
twelve-category gap matrix from the strategic plan: vuln-DB
aggregator with AI enrichment, partner-pentest integrations, OSS
probe + DAST rule libraries, runtime LLM guardrail, runtime API
discovery, GitHub Check Run + SARIF, container admission webhook,
and supporting docs/UI for everything.
Phase 0 — IP-risk fixes
- CodeQL ripped and replaced — Semgrep OSS (pinned packs only) + Bandit + gosec + Brakeman + ESLint-security as the new SAST pack.
- Semgrep config tightened to an explicit OSS Registry pack list;
override via
PENCHEFF_SEMGREP_PACKS. - Llama Guard 3 hardened: opt-in only via
PENCHEFF_LLAMA_GUARD_ENABLED=1, license notice surfaced in everyJudgeResult.reason, default judge falls through to Granite Guardian (Apache-2.0). - DCO bot enforced on every commit (
.github/workflows/dco.yml). - License-audit CI + auto-generated
THIRD_PARTY_NOTICES.md(tools/license_audit.py). - SPDX header check for new/changed files
(
tools/spdx_check.py --changed-only). NOTICEandCONTRIBUTING.mdpublished.
Phase 1 — Foundation
- Refactored CVE feed to a pluggable
BulkFeedSourceprotocol; new RustSec (CC0) and GoVulnDB (BSD-3) feeds via theOsvBulkSourceskeleton (more ecosystems trivial to add). GET /advisories/{id}andGET /advisories?package=&ecosystem=with AI-enriched exploit walkthrough + fix recipe (Pencheff’s answer to Snyk’s curated DB; provenance JSONL on every run).- Partner pentest integrations — HackerOne / Bugcrowd / Cobalt — with HMAC webhook signing primitive shared with the generic webhook integration.
- Per-release SBOM published to GitHub Releases on every
v*.*.*tag, signed with cosign keyless via Sigstore.
Phase 2 — Probe & rule libraries
pencheff-probescommunity LLM red-team corpus with permissive- only JSONL schema + DoNotAnswer importer (tools/import_donotanswer_probes.py); HarmBench / AgentHarm / BeaverTails explicitly excluded for license reasons.pencheff-rulescommunity DAST rule library — Pencheff Pulse JSON format with the Nuclei→Pulse converter (tools/nuclei2pulse.py) plus AI rule synthesiser with strict validator (rejects destructive payloads, disallowed methods, non-permissive PoCs).- SAST tree-sitter pack with Solidity sub-pack (4 hand-curated rules); Lua / Scala / Dart / Kotlin / Swift / COBOL / Erlang scaffolded.
Phase 3 — Runtime + integration surfaces
- Pencheff Sentry — runtime LLM guardrail. HTTP proxy sidecar +
LiteLLM plugin + MCP middleware. Blocks prompt injection / PII /
unsafe HTML / token-ceiling violations inline. Separate package
pencheff-sentryon PyPI. (Docs) - API discovery from runtime traffic — synthesises OpenAPI 3.1
from captured
ProxyFlowrows; drift detector emitsapi_driftfindings (shadow / phantom / method-drift). (Docs) - GitHub Check Run + SARIF + Pencheff Suggest — Check Run with inline annotations on every PR scan, SARIF upload to Security → Code scanning, PR-comment suppression command parser. (Docs)
Phase 4 — Container, support, certs
- Container registry push webhooks for DockerHub / ECR / GCR / ACR (Pub/Sub envelope auto-decoded, Event Grid validation handshake handled). Each push enqueues a Trivy scan.
- Kubernetes
ValidatingAdmissionWebhook(Go) — refuses pods whose images carry unfixed critical CVEs. Helm chart published tooci://ghcr.io/balasriharsha-ch/charts/pencheff-admission. Fail-closed by default. (Docs) - “Verify with humans” finding-card flow — submit any finding to
HackerOne / Bugcrowd / Cobalt; partner callback flips
verification_statusbased on the triager’s verdict. (Docs) - Procedural items (trademark searches, GitHub Secret-Scanning
Partner program application, SOC 2 + ISO 27001:2022, support-
tier hires) tracked in
docs/procedural-checklist.md.
Migration — what to do when upgrading
- Repo-scan stats keys shift:
stats.codeql→stats.semgrep,stats.bandit,stats.gosec,stats.brakeman,stats.eslint. Oldstats.codeqlrows from pre-v0.7 scans stay in the DB; the UI filters them as legacy SAST. - If you opted in to Llama Guard before v0.7, set
PENCHEFF_LLAMA_GUARD_ENABLED=1to keep using it — the default is now Granite Guardian. - The toolchain Docker image picks up Bandit / gosec / Brakeman / ESLint-security on next rebuild. CodeQL artefacts are dropped.
- Run
tools/license_audit.py --write-noticesbefore your first PR — the auto-generatedTHIRD_PARTY_NOTICES.mdis now the source of truth. - New env vars:
PENCHEFF_SEMGREP_PACKS(override SAST pack list),PENCHEFF_LLAMA_GUARD_ENABLED(opt-in Llama Guard judge).
v0.8.6 — Threat model on every scan, automatically (2026-05-08)
The v0.8.5 work made threat modeling a reusable engagement asset, but operators still had to manually generate a model before they got the adaptive scan benefit. This release closes the loop: every scan now gets a threat model, with two paths chosen by profile.
Auto-engagement on the deep profile
Every --profile deep scan against a URL with no engagement_id:
- Finds or creates an engagement keyed by
deep-{target_id[:8]}— one canonical engagement per target, deterministic slug. - Generates and persists a DREAD threat model on that engagement on first run.
- Pins the scan to that engagement and uses the model for module priority biasing.
Subsequent deep scans of the same target reuse the same engagement and the same threat model — findings accumulate, threat-model edits stick across runs.
Fly-by threat model on every other scan
quick, standard, api-only, compliance, cicd: when no engagement
is supplied, the dispatcher synthesises a DREAD model from the target
URL on the fly (~1 ms — pure-Python matrix lookup), uses it for the
module priority bias, and does not persist it. The bias is stamped
into Scan.summary.threat_model_bias for the dashboard, but no
engagement is touched.
Source label on every scan
Scan.summary.threat_model_source records which path generated the
bias for forensic clarity:
"engagement"— operator-supplied engagement carried a model."auto_engagement"— deep scan auto-created or reused the engagement."fly_by"— non-deep scan, no persistence.
5 new tests (apps/api/tests/test_auto_threat_model.py) cover the
helper that finds-or-creates the deep-scan engagement, slug-collision
safety, closed-engagement skipping, and missing-target-metadata
fallbacks.
v0.8.5 — Threat modeling, ThreatModelAgent, markdown viewer (2026-05-08)
Threat modeling — engagement-scoped STRIDE / DREAD with adaptive scan profile
- New:
POST /engagements/{id}/threat-modelgenerates a deterministic STRIDE or DREAD model from a target URL or explicit asset list.GET/PUT/DELETEcomplete the CRUD. - New:
Engagement.threat_modelJSONB column (migration 0040) andEngagement.threat_model_updated_atfor staleness signals. - Adaptive scan profile — when a scan is started against an
engagement that has a threat model, the dispatcher reorders the
profile’s modules so highest-DREAD categories run first. The chosen
bias is stamped into
Scan.summary.threat_model_biasso the dashboard can show why a particular module fired first. ThreatModelAgentadded to the swarm’s Phase 2 — runs in parallel with the breaker agents as a “lens” (no exclusive scan tools, only the sharedget_findings/test_endpoint). Emits an INFO-severity finding summarising threat coverage per asset.- Web UI at
/engagements/[id]/threat-model— table view (STRIDE rows or DREAD scored threats), markdown view, raw-JSON view; one-click Generate / Regenerate / Clear; surfaces the module priority bias. - Report inclusion — markdown report renders a
## Threat modelsection between executive summary and findings when the underlying scan was scoped to an engagement with a model. - 18 service tests — STRIDE/DREAD output shape, asset inference, scoring thresholds, module-bias deterministic ordering, markdown rendering, matrix completeness check.
Markdown viewer in the dashboard
Finding descriptions, executive summaries, and threat-model output now render as proper Markdown:
- GitHub-flavoured tables, strikethrough, task lists (via
remark-gfm). - Fenced code blocks with syntax highlighting (via
rehype-highlight). ```mermaidblocks render as SVG diagrams (viamermaidv11, dynamic-imported on the client so SSR is unaffected).<Markdown>is a reusable component (apps/web/components/markdown.tsx) used on the scan-detail and finding-detail pages.
Fixes the bug where the Assessments view rendered ## Proof of impact,
pipe-delimited tables, and bullet lists as plain text.
Pre-existing test fix as a side-effect
ActiveDirectoryAgent and MobileAppAgent from v0.8.0 were missing
entries in BREAKER_TOOL_ALLOCATIONS, which made
test_admin_access_agent.py fail with KeyError: 'ActiveDirectoryAgent'.
Empty allocations added; the swarm orchestrator + session-cleanup
tests are updated for the new total of 13 breakers.
v0.8.4 — Live CVE / NVD / EPSS / KEV data on every SCA scan (2026-05-08)
The SCA module already queried OSV.dev live per dependency, but EPSS and
KEV feeds were only refreshed when an operator manually called
refresh_cve_feed, and per-package OSV results were cached forever once
seen. Now every scan pulls live:
- NVD 2.0 enrichment per CVE — CWE list, CPE URIs, NVD-issued CVSS
v3.1 score & vector, canonical advisory URL. Cached 14 days
(
PENCHEFF_NVD_TTL_DAYS). SetNVD_API_KEYto raise the rate limit from 5/30 s to 50/30 s. - OSV per-package cache now has a 24 h TTL (
PENCHEFF_OSV_TTL_HOURS, set to0for always-live). - EPSS + CISA KEV are auto-refreshed at the start of every SCA scan
when the local cache is older than
PENCHEFF_FEED_TTL_HOURS(default 24 h, set to0for always-live). - Fail-open semantics — a network failure during refresh returns the stale-but-known row rather than dropping all SCA findings. Live-data intent fails open, not closed.
- Structured finding fields —
epss,epss_percentile,kev,kev_short_desc,kev_due_date,cwe_ids,advisory_url,nvd_cvss_score,nvd_cvss_vector,fix_version,package,ecosystemare now onFinding.metadata(no longer buried in description text). The canonical NVD URL is promoted to position 0 ofreferencesso DOCX / PR comment / finding card renderers link to NVD before OSV.
36 unit tests cover the NVD parser, TTL caching, fail-open paths, and the SCA scan-time refresh contract.
v0.8.3 — pencheff CLI is the canonical entry point (2026-05-08)
After pip install pencheff the package installer now puts a
pencheff executable on the user’s PATH — the same shape as aws
or kubectl. The [project.scripts] entry was already present; this
release makes it the documented form everywhere.
- Added
pencheff --version/-Vfor parity withaws --version. Reads the installed package metadata viaimportlib.metadata. - Replaced every
python -m pencheff …reference across the GitHub Action, GitLab CI template, Azure DevOps pipeline, Jenkins doc, root- plugin READMEs, and 17 doc pages with the bare
pencheffform.
- plugin READMEs, and 17 doc pages with the bare
- The legacy
python -m pencheff …invocation continues to work unchanged — the package keeps a valid__main__module. - Installation docs now show
which pencheff+pencheff --versionas the post-install verification.
v0.8.2 — API key scope coverage to every public router (2026-05-08)
The default-deny scope layer introduced in v0.8.1 is now wired into
every public-facing FastAPI router — repos, sboms,
dependencies, repeater, intruder, proxy, traffic,
engagements, schedules, notes, comments, fix-proposals,
dashboard, and unified-findings join the v0.8.1 set
(scans, findings, targets, reports, assets, integrations).
The advertised scope catalog (37 scopes, 20 categories) now matches exactly what the dependency layer enforces — no silent 403s on a route that didn’t opt in.
last_used_atwrites are debounced to one update per 60 s per key — a busy CI key polling every few seconds no longer issues a write per request.- Auth-flow integration tests added (21 cases) covering revoked,
expired, cross-org, detached-membership, and mismatched-workspace
paths, plus
require_scopeandsession_onlyinvariants. /repos/install-urlis correctly marked session-only (interactive GitHub App handshake); the/repos/callbackredirect was already unauthenticated.
v0.8.1 — Programmatic access: PENCHEFF_API_KEY with scoped permissions (2026-05-07)
PENCHEFF_API_KEY — per-user API keys with fine-grained permissions
Every user can now mint API keys for scripts, CI pipelines, and scheduled jobs. Manage them at Settings → API keys in the dashboard.
- Format —
pcf_live_<43-char-secret>. Stored as SHA-256; the plaintext is shown exactly once at creation. - Org-pinned — every key names exactly one organisation.
- Workspace-pinned — keys may be scoped to a specific workspace
(any member can mint these), or left org-wide (
workspace_id: null, owners and admins only). - Fine-grained scopes —
category:actionstrings. Wildcards:scans:*,*:read,*:*. - Default-deny — endpoints opt in to scope checks; routers without
a
require_scopedeclaration reject API-keyed callers regardless of scopes held. - Session-only endpoints — billing, branding, org admin / member management, and the API-key router itself never accept a key. A leaked key cannot mint more keys, change billing, or modify membership.
- Membership re-check on every request — if the issuing user is removed from the org, all of their keys for that org stop working immediately (no cache).
- Audit logged —
api_key.create,api_key.update,api_key.revokeare written toaudit_logswith the key ID and prefix.
See the API keys reference for the full scope catalog, recipes (CI/CD, SIEM forwarders, fan-out automation), and security notes.
v0.8.0 — AD/mobile/ASM MCP tools, production hardening, GitLab CI & Azure DevOps (2026-05-07)
New MCP tools (3)
-
scan_active_directory(session_id, domain, username, password, dc_ip?, modules?)— Orchestrated Active Directory enumeration: BloodHound relationship graph, Certipy ESC1–ESC15 certificate template abuse, CrackMapExec/NetExec SMB enumeration, Impacket secretsdump/Kerberoast/AS-REP roast. Selectable via themoduleslist — run one or all four. See Active Directory docs. -
scan_mobile_app(session_id, apk_path, platform?, modules?, mobsf_url?)— Static analysis of Android APKs and iOS IPAs: MobSF REST API enrichment, apktool decompile, AndroidManifest.xml security checks (debuggable, allowBackup, cleartext, exported components, minSdkVersion), and jadx-based secrets sweep (15+ patterns including AWS, GCP, Firebase, Stripe, GitHub, JWTs, PEM keys). See Mobile Security docs. -
scan_asm(session_id, org, root_domain, modules?)— Continuous Attack Surface Monitoring: passive subdomain discovery (subfinder- crt.sh), certificate transparency log watch (new issuances in last 7 days),
and asset inventory change detection (diffs vs. last snapshot). Results
persisted to
~/.pencheff/asm_inventory.db.
- crt.sh), certificate transparency log watch (new issuances in last 7 days),
and asset inventory change detection (diffs vs. last snapshot). Results
persisted to
Agent swarm: 10 → 12 Phase 2 breakers
-
ActiveDirectoryAgent— firesscan_active_directorywhen AD credentials are present; analyses BloodHound attack paths, Certipy ESC chains, and SMB share exposure; emits structured findings with step-by-step PoC commands. -
MobileAppAgent— firesscan_mobile_appagainst any APK/IPA supplied at session creation; triages MobSF findings by severity; flags hardcoded secrets with smali/Java class path and line number.
Production API hardening
-
The FastAPI app now refuses to start in
ENVIRONMENT=productionmode ifJWT_SECRETis still the insecure default orFERNET_KEYis empty. This prevents silent misconfiguration in operator deployments. -
Unhandled exception handler now returns
"Internal server error."in production instead of the fullExceptionType: messagestring, preventing internal stack details from leaking to clients.
CI/CD integrations
-
GitLab CI — reusable
.gitlab-ci.ymltemplate inapps/gitlab-ci/. Include it in any GitLab project; configure viaPENCHEFF_*CI/CD variables. Runs on MR events and default-branch pushes; report artifact retained 30 days. See GitLab CI docs. -
Azure DevOps — parameterized
azure-pipelines.ymltask inapps/azure-devops/. Use viaextends:or copy thesteps:section inline. Publishes the report as a build artifact. See Azure DevOps docs.
ASM dashboard tab
- New
/asmroute in the web dashboard (apps/web/app/asm/page.tsx) — shows total asset count, new subdomains in last 24 h, expiring certs, and an asset table with type badges. “Run Discovery” button ready for backend wiring.
PyPI
- Published as
pencheff==0.5.0—pip install --upgrade pencheff. - MCP tool count: 49 → 52.
v0.7.0 — AI agent swarm, consent screen, LLM trace persistence, evidence screenshots (2026-05-06)
Pencheff’s single-agent loop is replaced as the default execution path by a 17-agent parallel swarm. Every scan now requires explicit operator consent, and every LLM call made by every agent is persisted for audit and reproduction.
AI agent swarm
- New default scan mode: one
ReconAgent→ 10 parallel breaker agents → 6 parallel synthesis agents, all coordinated by the swarm orchestrator inapps/api/pencheff_api/services/agent_runner.py. - The 10 Phase 2 breakers fan out concurrently from a frozen
ReconSnapshot:InjectionAgent,ClientSideAgent,AuthAgent,AuthzAgent,APIAgent,InfraAgent,CloudAgent,LLMRedTeamAgent,SupplyChainAgent,K8sAgent. - The 6 Phase 3 synthesis agents read the merged findings in parallel:
ChainAgent,ComplianceAgent,ProofOfImpactAgent,PayloadCraftingAgent,EvidenceCaptureAgent,AdminAccessAgent. - Typical deep-scan numbers: ~33 min wallclock, ~411 K input / ~86 K output tokens, ~109 LLM calls.
- See AI agent swarm for full operator documentation.
Consent screen at scan creation
- Every
POST /scansnow requires aconsent_payloadfield: an authorization statement (≥ 50 chars) and an acknowledged checkbox. The API returns422if either is absent. - Consent is stored on
Scan.consent_payload(JSONB) and included in audit exports. - The scan-creation UI in the web dashboard presents the disclosed-actions catalogue per agent class before accept.
LLM trace persistence
- Every LLM call made by every swarm agent is written to the new
scan_llm_tracestable (agent name, turn, request messages, response, token counts, optional reasoning block). - New endpoint
GET /scans/{id}/llm-tracesreturns the full trace array for a completed scan. Useful for cost auditing, reproduction, and debugging. - Compact summary lines appear in the assessment log per call.
Evidence screenshots
EvidenceCaptureAgent(Phase 3) takes a Playwright screenshot per verified high/critical finding with PII redacted.- Stored at
~/.pencheff/evidence/<scan_id>/<finding_id>.pnginside the worker container; served viaGET /scans/{id}/evidence/{finding_id}.png(auth required, 404 if missing).
New pencheff MCP tools
capture_evidence— Playwright screenshot of a vulnerable URL with PII redaction.scan_llm_red_team— probe an AI/LLM endpoint for prompt injection, jailbreak, and system-prompt extraction using the OWASP LLM Top-10 payload library.playwright_navigate— GET-only page navigation inheriting session auth cookies.playwright_screenshot— screenshot the current page state.playwright_enumerate_links— read-only enumeration of visible links on the active page.playwright_logout— log out and close the browser context.set_auth_state(orchestrator-internal),attach_oast(orchestrator-internal),import_endpoints(orchestrator-internal),copy_finding(orchestrator-internal),pentest_destroy(orchestrator-internal) — used by the swarm orchestrator to manage breaker sessions; not callable by agents.
Killswitch
- Set
SWARM_ENABLED=falseon the API container to revert all new scans to the legacy single-agent path immediately. In-flight scans are unaffected.
What didn’t change
- No breaking changes to the scan creation API request shape beyond the new
required
consent_payloadfield. Existing integrations (CI scripts, SDK callers) need to add this field; all other fields and defaults are unchanged. - The
GET /scans,GET /scans/{id},GET /scans/{id}/findings,GET /scans/{id}/progress, andDELETE /scans/{id}endpoints are unchanged. - Deterministic scan profiles (
deterministic_only) are unaffected — the swarm only replaces the LLM-driven phase.
v0.6.0 — Auto-fix PRs, IDE extensions, Triage 2.0, unified findings (2026-05-02)
Closes the Snyk-parity gap on the defensive surface while keeping Pencheff’s offensive lead.
Auto-fix PRs for SCA
- New deterministic version-bump patcher across 9 manifest formats:
requirements.txt,pyproject.toml,Pipfile,package.json,go.mod,Cargo.toml,Gemfile,composer.json,pom.xml. SCA findings flow through the existingpropose_fix→apply→ PR pipeline with no LLM cost. Lockfiles deliberately not edited — the PR body instructs the developer to run the right installer. - See Auto-fix PRs.
IDE extensions (VSCode + JetBrains)
- New
pencheff lspCLI command starts a hand-rolled Language Server over stdio. Tails~/.pencheff/history/*.jsonand republishes diagnostics whenever scan results change. - VSCode extension at
apps/vscode/; JetBrains plugin atapps/jetbrains/(Kotlin + LSP4IJ). Any LSP-aware editor (Neovim, Emacs, …) works viapencheff lspdirectly. - See IDE extensions.
EPSS + KEV + SSVC + reachability prioritisation
- Every finding gets
risk_score(0–100),ssvc_decision(act/attend/track_star/track), andreachability(exploited/reachable/present/unknown) computed at insert from CVSS × EPSS × KEV × SSVC × reachability. - Dashboard sorts by
risk_score DESC NULLS LAST. The Priority Strip surfaces the components inline on every finding card. - See EPSS, KEV & SSVC and Reachability classifier.
Triage 2.0
- Pro-tier
POST /findings/{id}/triagereturns a structured walkthrough —walkthrough/blast_radius/exploit_scenario/fix_outline/confidence— anchored on the live evidence on the finding (DAST request/response, taint trace, EPSS/KEV/SSVC). - Cached on
finding.ai_triage. Reuses theFIX_LLM_API_KEYalready configured for the auto-fix proposer. - See Triage 2.0.
Unified findings stream
- New
GET /unified-findingsmerges DAST / SAST / SCA / IaC / secrets into a single sortable, filterable queue. Replaces the scan-by-scan navigation for the “what should I fix first” use case. - New dashboard page at
/findings. Filter chips for source, severity, reachability; pagination with stable order across pages. - See Unified findings stream and the API reference.
Repository SBOMs
- New
POST /repos/{repo_id}/sbomgenerates an SBOM for the latest commit on the repository’s default branch and stores it on the repository. - New
GET /repos/{repo_id}/sbomreturns the latest stored SBOM. - Repository pages display the SBOM in both a Table view and a raw JSON view, with one-click JSON download.
- A new generation replaces any previous SBOM for that repository.
- See SBOM generation and the Repos API.
Migrations
0026_ssvc_decision—findings.ssvc_decision+ index.0027_reachability—findings.reachability+ composite index.0028_ai_triage—findings.ai_triageJSONB.0029_drop_unused_tables— drops legacy tables (no-op for fresh deploys; safety net for partial-migration recovery).
Run alembic upgrade head (or rebuild the API container — it runs
the migration step automatically).
v0.5.0 — LLM red team: OWASP LLM Top 10 + Crescendo + PAIR + judges + cloud auth (2026-04-29)
A major release. Pencheff gains a third target kind — llm — that
turns a chat-completions endpoint into a fully-instrumented red-team
target with full OWASP LLM Top 10 (2025) coverage, multi-turn
escalation, iterative attacker-driven search, optional judge models
(Llama Guard / Granite Guardian / OpenAI Moderation / executable),
embedding-similarity grading, KB-grounded factuality checks, and
mappings to MITRE ATLAS / NIST AI RMF / EU AI Act alongside OWASP.
New target kind: llm
POST /targetsacceptskind: "llm"with anllm_configblock. Provider presets:openai-chat,custom(request body template + response JSONPath),executable(local command, JSON over stdin/stdout),websocket,bedrock(SigV4 via boto3),vertex(Google ADC token caching),azure-openai(Entra OAuth),browser(Playwright drives a chat UI). Auth headers ride undercredentials.headers— any number of arbitrary K-V pairs, Fernet-encrypted.- The web UI’s
/targets/newand/targets/{id}/editboth expose the full LLM form: provider preset, model, system-prompt baseline, dynamic header rows, redteam config, judge / attacker / embedder JSON blocks, thresholds, budget, retries, RPS/RPM caps.
OWASP LLM Top 10 (2025) coverage
- New MCP tool
scan_llm_red_team(session_id, categories?, techniques?, max_payloads?). Runs all 10 categories: LLM01 prompt injection, LLM02 sensitive information disclosure, LLM03 supply chain, LLM04 data and model poisoning, LLM05 improper output handling, LLM06 excessive agency, LLM07 system prompt leakage, LLM08 vector / embedding weaknesses, LLM09 misinformation, LLM10 unbounded consumption. Each category ships a curated YAML payload library; each finding aggregates by(category, technique)so reports show one Finding per technique with up to 5 evidence rows rather than N near-duplicate clones. - New scan profile shape for LLM kind:
quick= 25 payloads,standard= 75,deep= 250. Round-robin across techniques so quick profiles never starve any single technique class.
Multi-turn Crescendo + PAIR iterative search
- The
crescendostrategy is now a real 5-turn TestCase that builds context turn-by-turn. The dispatcher carries assistant replies forward asmessages[]history; an optional judge can short- circuit a clearly-refusing escalation to save budget. - New
redteam.iterative: "pair"mode — Prompt Automatic Iterative Refinement. With an attacker LLM configured, the loop sends the base prompt, reads the target’s reply, asks the attacker to refine, and re-sends until VULNERABLE orpair_iterationsexhausted. Static-template fallback (iterative: "static") remains for air-gapped environments.
Strategies + composite stacking
- 21 deterministic prompt transforms:
base64,hex,rot13,morse,leetspeak,homoglyph,jailbreak,authoritative- markup,citation,best-of-n,ascii-smuggling,emoji-smuggling,image-markdown,audio-transcript,video-transcript,camelcase,pig-latin,crescendo, plus user-defined plugin strategies. composite_strategieschains transforms left-to-right (base64+leetspeak,jailbreak+ascii-smuggling, …). Languages wrap each prompt with a target-language directive — non-English locales typically have weaker safeguards.
Judges + grading
LlmJudgesupports five providers:openai-chat(any OpenAI-compatible JSON-grading model),executable(local command),llama-guard(Llama Guard 3 with the officialsafe/unsafe S1..S14parser → OWASP LLM mapping),granite-guardian(IBM Granite Guardian 3.x Yes/No protocol), andopenai-moderation(OpenAI/moderationsAPI — recommended for reasoning-model targets because it scores the visible output rather than the chain-of-thought).- New
redteam.embedderblock adds embedding-similarity grading. TestCases declaresuccess_embeddings: [...]; cosine match against any anchor at ≥ threshold promotes AMBIGUOUS verdicts to VULNERABLE. v1 supports OpenAI-compat/embeddingsand Cohereembed. - New
redteam.factualityblock (LLM09 only). KB-grounded contradiction check via the configured judge. KB can be inline,file://path, or HTTP URL.
Attacker-LLM driven synthesis
redteam.llm_synthesis: { enabled: true, n: 10 }plus anattackerblock generates novel TestCases targeted at the discovered profile — purpose, limitations, tools, user context. One attacker call per scan; cached by profile hash.
Datasets, guardrails, variables, intents
- Built-in datasets:
donotanswer,harmbench,beavertails,cyberseceval,toxic-chat. External datasets viafile://or HTTPS URL (JSON / YAML list). - Built-in guardrails:
pii,secrets,unsafe-code,tool-authz.guardrail_bypass: trueadds active bypass-template variants. redteam.variables: {...}substitutes{{var}}placeholders in prompts, turns, system, success indicators, refusal patterns, description, remediation. Useful for application-specific probes.redteam.policiesandredteam.intentsaccept user-defined policy violations and (multi-turn) intent strings — first-class TestCases dispatched alongside the OWASP modules.
Operational / cost controls
- Token-bucket rate limiter is shared per (endpoint, RPS) so 10
OWASP modules dispatching concurrently respect a single per-key
cap. 429 responses honour the upstream
Retry-Afterheader automatically and stall every concurrent dispatcher to prevent thundering-herd retries. - Per-scan budget:
max_calls,max_tokens,max_cost_usd— hard kill switch. Per-callmax_latency_msandmax_tokens_per_callthresholds emit explicit LLM10 findings when violated. - Retry with exponential backoff (
retries,backoff_s) on 429 / 500 / 502 / 503 / 504. In-process LRU cache deduplicates identical probes (cache,cache_size). - New CRITICAL finding
LLM endpoint unreachable / unauthorisedfires when ≥50% of probes return non-2xx (401/403 → CRITICAL, 404/429 → HIGH, others → MEDIUM). Closes the “Grade A despite every probe 401’d” silent-fail bug. - PII redaction: emails, SSNs, cards, phone numbers, common API
key patterns (
sk-…,xoxb-…) are masked in evidence snippets before they reach Findings or the share-by-link route.
Compliance: AI frameworks
- Every LLM finding maps to MITRE ATLAS, NIST AI RMF, and EU AI
Act alongside OWASP LLM Top 10. Tables in
plugins/pencheff/pencheff/config.py(MITRE_ATLAS_MAP,NIST_AI_RMF_MAP,EU_AI_ACT_MAP).
Reporting
- New renderers:
render_html(self-contained, embedded CSS, no JS — email-able),render_csv(stable columns, Excel-friendly),render_red_team_markdown,render_junit_xml,render_prometheus_metrics. Diff helperdiff_red_team_findingspowers regression detection across runs. - New API route
GET /scans/{a}/compare/{b}returns the structured diff (regressions, fixes, common failures) plus per-side summaries. Web UI at/scans/compare?a=…&b=…includes a JUnit-XML download for the regressions list. - New API route
POST /scans/{id}/share?ttl_seconds=Nissues a Fernet-encrypted token. Public routeGET /share/llm/{token}renders HTML / Markdown / CSV / JSON without auth — only valid forkind: "llm"scans. - Canonical Grafana dashboard at
docs/grafana/pencheff-llm-redteam.json— eight panels consuming the Prometheus exporter.
Integrations
- Slack / webhook / Jira payloads now include a per-OWASP-LLM
category breakdown and the top failed techniques when
target.kind == "llm". The same generic integration matchers apply (per-target scoping, per-event filtering, severity gating). - Scheduled scans now accept LLM targets (validates
llm_configon schedule create).
Plugin SDK
- Three new discovery directories under
~/.pencheff/:custom_llm_strategies/,custom_llm_judges/,custom_llm_providers/. Drop a Python file with anameclass attribute and a method matching the protocol; gate discovery onPENCHEFF_ENABLE_CUSTOM_MODULES=1. Plugins win over built-ins on name collision so a deployment can override the canonicaljailbreaktemplate with a deployment-specific one.
CLI
- New subcommand
pencheff llm-redteamwith--strategies,--datasets,--guardrails,--judge-{provider,endpoint,model},--max-rps,--max-cost-usd,--retries,--fail-on,--output-format {markdown,json,junit,csv,html,prometheus},--output-file, and--compare-to PRIOR_JSONfor CI-friendly regression gating.
Bug fixes
- Headers from the
Credentials.headersschema field now flow correctly into LLM probes. Previously,CredentialStore.add_from_dictread from thecustom_headersdict key but the API schema exposed it asheaders, causing every LLM probe to ship with no Authorization header → silent 401s on every request.
Schema migration
- Migration
0022addskind(string, indexed) andllm_config(JSONB) to thetargetstable; backfillskind = 'repo'for any row whoserepository_id IS NOT NULL. Existing URL targets remainkind = 'url'. Adds composite indexix_targets_workspace_kind_created.
See LLM red team feature page for the full walkthrough, and the Plugin SDK guide for custom strategies / judges / providers.
v0.4.1 — Mobile static analysis, search + pagination across the SaaS UI, Engagements removed (2026-04-28)
A targeted release. Pencheff gains an OWASP-Mobile-Top-10-aware static analyzer for APK/IPA files; the SaaS UI gets paginated, searchable target and assessment lists everywhere; and the Engagements feature (experimental in v0.4.0) is fully removed in favor of the simpler target → assessment workflow.
Mobile static analysis (Phase 1)
- New MCP tool
scan_mobile_static(session_id, apk_path?, ipa_path?, types?, use_mobsf?)— analyzes an Android APK or iOS IPA without an emulator or rooted device. Decompiles viaapktool+jadx(Android) or unzips and parsesInfo.plist(iOS), then sweeps for OWASP Mobile Top 10 issues:- AndroidManifest —
debuggable=true,allowBackup=true,usesCleartextTraffic=true, exported activities/services/receivers/ providers withoutpermission, missingnetworkSecurityConfig, dangerously lowminSdkVersion. - Hardcoded secrets in jadx-decompiled Java — AWS / Google / Firebase / Slack / GitHub / Stripe / Twilio / SendGrid / Mailgun keys, JWTs, PEM private keys, password assignments.
- Insecure crypto — DES, 3DES, RC4, ECB mode, MD5, SHA-1,
hardcoded
SecretKeySpec/IvParameterSpec,java.util.Random. - Cleartext URLs in compiled code.
- iOS Info.plist —
NSAllowsArbitraryLoadsand ATS exceptions for media / WebView, custom URL schemes (deeplink hijacking risk), embedded provisioning profiles. - iOS binary hardening — missing PIE flag (via
otool -hv, macOS only).
- AndroidManifest —
- New scan profile
mobile-static. Passpentest_init(profile= "mobile-static")thenscan_mobile_static(apk_path=...). - Compliance maps for
mobile_misconfig,mobile_secrets,mobile_crypto,mobile_storage,mobile_communication, andmobile_binarycategories added to PCI-DSS, NIST 800-53, SOC 2, ISO 27001:2022, and HIPAA. NewOWASP_MOBILE_TOP_10(M1–M10) name resolution on every finding. - Hardening:
defusedxmlfor the manifest parser (no XXE / billion- laughs), zip-slip guard on IPA extraction, 5 MB cap on per-file scans with possessive-quantifier JWT regex (no ReDoS). - Tools:
apktool,jadx,mobsfscan,qark,aapt/aapt2,androguard,otool,class-dump, andplistutilare allow-listed forrun_security_tool. SetMOBSF_API_KEYto opt into MobSF enrichment viause_mobsf=true.
Dynamic instrumentation (Frida / objection / drozer) is Phase 2 and
remains out of scope for scan_mobile_static.
SaaS UI: search + pagination on every list
/dashboard,/targets,/scans,/targets/{id}, and/repos/{id}now ship a search input (filtering name / URL / kind for targets, and report № / status / grade / target name for assessments) and a paginator on the same row, opposite the search.- Targets paginate at 6 per page, assessments at 20.
- The paginator is always visible alongside the search — even
single-page result sets render
Page 1 of 1with disabled Prev / Next, so users see the same control whether the workspace has 4 assessments or 400.
Engagements — removed
- The entire
/engagementsroute, the Workbench dropdown entry, and the engagement selector inside the Commission Scan modal are gone. Scans now POST without anengagement_id. Findings collected against an Engagement in v0.4.0 are still queryable through the Scans / Targets surface. - The Workbench dropdown’s
Assetslink is also removed; the/assetspage itself remains for direct linking and ASM API consumers.
Tool count: 49 → 50 MCP tools, attack modules 53 → 57.
v0.4.0 — Engage swarm, lifecycle integrations, repos as targets, PAT private repos (2026-04-28)
Major release. Pencheff gains a 9-phase autonomous engagement, a unified target/repo model, and integrations that fire on the full finding lifecycle — not just scan completion.
pencheff engage — 9-phase autonomous swarm
- 30 specialist playbooks registered in
pencheff.playbooks.REGISTRY. 28 are adapted from 0xSteph/pentest-ai-agents; two new ones —crawl_firstandapi_authenticator— own the HTTP-first reconnaissance + login-discovery flow. - 9 phases (was 7): scope → crawl → auth → recon → vuln →
exploit → postex → detect → report. The two new phases populate
session.discovered.endpointswith the real surface before auth runs, so the auth phase picks a discovered login URL instead of guessing from a static 14-path list, and every downstream module tests the actual endpoints rather than just the base URL. - Subdomain fan-out:
pencheff engage --max-subdomains 100runs crawl + auth + vuln + exploit on each discovered subdomain, with findings merged back into the master session. - Tier 1 / Tier 2 model + OPSEC noise tagging (quiet / moderate / loud) + MITRE ATT&CK mapping on every finding.
- Engagement DB at
~/.pencheff/engagements.dbfor cross-session state (engagements, hosts, services, vulns, credentials, chains, session_log).
SaaS UI: Engage profile
- “Engage (full swarm)” added to the commission-scan modal — drives the same 9-phase pipeline from the dashboard.
- Live progress streaming: each phase + each playbook + each subdomain emits a scan-log line and an SSE event as it runs. The progress bar moves visibly across the 9 phases instead of frozen at 5% for ~10 minutes.
API-first authentication
- Default credential-based login replaced Playwright with HTTP API
probing across 14 common login endpoints. ~2-second login vs
15–30s, no Chromium dep, no SPA hydration races, no Cloudflare
Turnstile triggers. Playwright stays as the escape hatch for SSO /
SAML / MFA / CAPTCHA flows when explicit
login_stepsare supplied.
Integrations: lifecycle events + per-target scope
- Two new destinations: Google Chat
(webhook) and Jira (creates one issue per
finding_new, comments on the existing issue forfinding_changedwhen the issue key is on the finding’sexternal_refs). - Per-target scope — every integration carries a
target_idsarray. NULL = all targets; populated = only fire for scans against those targets. Targets here include both DAST URL targets and repo-mirror targets. - Per-event filter —
events: ["scan_started", "scan_done", "scan_failed", "finding_new", "finding_changed"]. Wire e.g. a PagerDuty integration scoped toscan_failed+finding_newfor one production target, while a Slack channel takes the full firehose for everything. - Five lifecycle hooks instead of one. The Celery
notify_event(scan_id, event_type, finding_id?, change_summary?, error?)task is the single dispatch surface; hooks at scan start / done / failed and at every finding-mutation endpoint (verify, suppress, unsuppress, recheck) enqueue it.
Repos as first-class Targets
- New column
targets.repository_id UUID NULL FK → repositories(id) ON DELETE CASCADE. Every Repository auto-mirrors as a Target row on registration; deleting the Repository cascades to the mirror. - Repo-mirror targets show up everywhere URL targets do — the
Targets dashboard, the integrations target multi-select,
GET /targets. They carrykind: "repo"so the UI can render a badge and route the commission-scan modal to/repos/{id}/scaninstead of/scans. - DAST scan against a repo-mirror target → 400 with a clear pointer to the repo-scan endpoint.
PAT-authenticated private repos
- New column
repositories.token_encryptedfor Fernet-encrypted Personal Access Tokens. POST /repos/githubaccepts an optionaltokenfield. With a token → validates it against the GitHub REST API, persists it encrypted, setsprivate=True. Without a token → existing public-clone behaviour.- Repo-scan worker decrypts the PAT and uses it as the
x-access-tokenpassword forgit clone. Re-registering the same repo URL with a new token rotates the stored credential without disturbing scan history or the mirror Target.
/targets/new and /repos redesign
- “Local folder” registration removed entirely from both pages. The worker can’t honestly know which paths it’ll see at scan time, so every repo path is now GitHub-based.
- 3-source picker on
/targets/new(Repository): Public GitHub URL · Private GitHub (PAT) · Pencheff GitHub App. - Same model on
/reposwith a 2-tab toggle (Public / Private PAT) plus the always-on GitHub App card at the top. - Detailed inline collapsible instructions for both flows: how to create a fine-grained or classic PAT (with exact scope/permission recommendations) and how to install the Pencheff GitHub App (step-by-step + permissions table + adding more repos later + removing access).
Migrations
0018— addintegrations.target_ids(UUID[]) +integrations.events(varchar[]) + GIN indexes.0019— addtargets.repository_id(UUID FK CASCADE) + idempotent backfill of one mirror Target per existing Repository.0020— addrepositories.token_encrypted(bytea NULL).
v1.0 — Expanded security workflows (2026-04-21)
Major release. Pencheff now covers the full enterprise DAST + AppSec surface in one tool.
SCA + SBOM + IaC + container
scan_dependencies— parse manifests for npm, PyPI, Go, crates.io, RubyGems, Packagist, Maven → OSV.dev CVE query → EPSS + CISA KEV enrichment.generate_sbom— produce SPDX 2.3 + CycloneDX 1.5 natively; preferssyftwhen installed.check_licenses— policy-driven license compliance (allows, denies, unknown behaviour).reachability.annotate— mark unimported deps as low-reachability to suppress noise.scan_dockerfile,scan_kubernetes,scan_terraform,scan_helm,scan_container_image.
Network VA
scan_host_vulns— Pencheff service detection → CVE lookup.scan_network_misconfig— Redis, Mongo, Elastic, Memcached, Docker, MySQL, PG, SNMP.scan_authenticated_host— SSH / WinRM / SMB package audit.scan_industrial_protocols— Modbus, BACnet, S7, EtherNet/IP, DNP3.- Local SQLite CVE cache with EPSS + CISA KEV refresh.
Intercepting proxy + fuzzer + YAML automation
start_proxy/stop_proxy— mitmproxy + pure-Python fallback.fuzz_parameter— request-template differential fuzzer with bundled XSS / SQLi / dir / param wordlists and 7 encoders.run_policy— full YAML ScanPolicy schema v1, assertions, thresholds, reports, schedule.- New passive scanner with 25+ regex rules across flows + active traffic.
Attack Surface Management + scheduling + collaboration
asm_discover— subfinder + crt.sh + optional Shodan.asm_diff/asm_cert_watch— change detection + CT log watch.- Cron-driven scheduled scans (Celery Beat).
- Finding SLA tracking (severity → due date → hourly breach monitor).
- Comments, assignment, tags, first-class collab endpoints.
- 7 integrations: Slack, Teams, Discord, PagerDuty, Opsgenie, Splunk HEC, signed generic webhook.
Risk scoring
- EPSS + CISA KEV enrichment on every finding.
risk_score = cvss × (1 + epss) × (2 if kev else 1)sorts reports by actual exploit likelihood.
Plugin SDK
BaseTestModuleformalised with lifecycle hooks.- Auto-discovery from
~/.pencheff/custom_modules/behindPENCHEFF_ENABLE_CUSTOM_MODULES=1. pencheff init-modulescaffold generator.
API + dashboard
- 9 new DB tables: schedules, assets, integrations, sboms, dependencies, proxy_sessions, finding_comments, finding_assignments, finding_tags.
- 7 new routers, 4 new Celery tasks (scheduled dispatcher, asset discovery, SLA monitor, integration fan-out).
- 5 new dashboard pages: /schedules, /assets, /integrations, /sbom/[scanId], /dependencies/[scanId].
- Nav bar updated with all new links.
Total
- MCP tools: 49 → 81
- Scan profiles: 6 → 13
- External tool allowlist: +14
- DB tables: +9
- Next.js pages: +6
- Compliance frameworks: 6 (OWASP, PCI-DSS, NIST, SOC 2, ISO 27001, HIPAA)
v0.2.1 — (2026-02-15)
Baseline release — DAST + exploit-first pentest agent.