Tutorial: Monorepo repo scan

Six scanners fan out in parallel against a clone of the connected repo. This tutorial covers the bits that matter when the repo is a big polyglot monorepo: language detection, exclude paths, default branch pinning, and reading the unified findings.

Scenario

Repo. github.com/acme-co/platform — ~1.5M LOC across Python (50%), TypeScript (35%), Go (10%), Terraform / Helm (5%).
Constraint. The vendored copy of node_modules/ and a giant Python pip cache should never be scanned.
Goal. A scan that finishes inside the 30-min CI budget and routes findings to the right team.

1. Connect via the GitHub App

The GitHub App is the only path that surfaces Dependabot alerts and fix-PRs. PAT and public-URL paths work too but ship without the webhook + fix-PR features.

Sign in at app.pencheff.com, open Repos, click Install Pencheff on GitHub.
Pick the acme-co organisation, choose Only select repositories, select platform, then approve.
The new install card shows up under Connected GitHub accounts. Click Sync repos.

The repo auto-mirrors as a Target row with kind: "repo", so it appears in the dashboard, the integrations target multi-select, and the unified findings stream.

2. Pin the default branch

curl -X PATCH -H "Authorization: Bearer $PENCHEFF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"default_branch": "main"}' \
  "$PENCHEFF_API_BASE/repos/$REPO_ID"

Vendor pruning is automatic — every scanner honours .gitignore. For files that aren’t gitignored but should still be ignored (vendored code, generated dirs, large fixtures), add them to .gitignore or use a per-scanner suppression on the findings they produce.
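As a minimal sketch of the .gitignore route, the helper below appends ignore rules for the two directories named in the scenario. The paths vendor/node_modules/ and .pip-cache/ are hypothetical placeholders, since this tutorial does not say where those directories live in the repo. The function name add_ignore_rules is likewise made up for illustration.

```shell
# Append ignore rules to <repo_dir>/.gitignore, skipping any already present.
# NOTE: both paths are hypothetical; replace them with the real locations of
# the vendored node_modules/ copy and the pip cache in your checkout.
add_ignore_rules() {
  repo_dir=$1
  for pattern in 'vendor/node_modules/' '.pip-cache/'; do
    # -x matches the whole line, -F treats the pattern as a literal string.
    # A missing .gitignore makes grep fail, so the rule is appended.
    grep -qxF -- "$pattern" "$repo_dir/.gitignore" 2>/dev/null ||
      printf '%s\n' "$pattern" >> "$repo_dir/.gitignore"
  done
}
```

After committing the change, git check-ignore -v <path> reports which rule, if any, matches a given path, which is a quick way to confirm the patterns before the next scan.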

3. Trigger the scan

curl -X POST -H "Authorization: Bearer $PENCHEFF_API_KEY" \
  "$PENCHEFF_API_BASE/repos/$REPO_ID/scan"

After this point every push to the default branch fires a webhook that auto-triggers a re-scan with the same settings.
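For a CI job, the trigger call above can be wrapped so that a rejected request fails the build instead of passing silently. This is a sketch only: it reuses the endpoint and variables from the curl command in this step, and the function names scan_url and trigger_scan are invented for illustration. The response body is not documented here, so the sketch does not parse it.

```shell
# Build the scan endpoint from the same variables the curl call uses.
scan_url() {
  # Strip one trailing slash from the base so the path never contains "//".
  printf '%s/repos/%s/scan\n' "${PENCHEFF_API_BASE%/}" "$1"
}

# Trigger a scan for the repo id passed as $1.
trigger_scan() {
  # Stop with a message if a required variable is unset or empty.
  : "${PENCHEFF_API_KEY:?PENCHEFF_API_KEY is not set}"
  : "${PENCHEFF_API_BASE:?PENCHEFF_API_BASE is not set}"
  # --fail turns HTTP 4xx/5xx into a non-zero exit status.
  # --max-time bounds this request only, not the scan it starts.
  curl --fail --silent --show-error --max-time 60 -X POST \
    -H "Authorization: Bearer $PENCHEFF_API_KEY" \
    "$(scan_url "$1")"
}
```

Calling trigger_scan "$REPO_ID" in the pipeline then behaves like the original command, with a clear exit status for the CI runner.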

4. Read the unified findings

The scan opens under /repos/scans/{id}. The page lists the six scanners that ran and the count each produced; the unified findings table de-duplicates rows that two scanners flagged for the same root cause.

Scanner	Typical signal in this monorepo
Semgrep OSS	Insecure JWT verification in the TypeScript edge service
Bandit	A `subprocess` call with `shell=True` in a Python admin script
gosec	`math/rand` seeded with `time.Now()` in a Go session id generator
Trivy (SCA)	A handful of HIGH CVEs from `urllib3` < 1.26.18
Trivy IaC + Checkov	EKS pod-security-policy violations under `infra/k8s/`
gitleaks	One private key checked in to a test fixture
YARA	Known JS loader signature in a vendored chunk; exclude it via .gitignore or a per-scanner suppression
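To make the de-duplication idea concrete, here is a rough sketch of collapsing rows that share a root cause. It does not describe how Pencheff groups findings internally, which this tutorial does not specify. The assumptions are a tab-separated input of scanner, path, line, and CWE id (a hypothetical export format) and a grouping key of path + line + CWE.

```shell
# Collapse findings that share path, line and CWE into one row and list
# every scanner that reported them. Input and output are tab-separated.
# ASSUMPTION: the four-column layout and the grouping key are illustrative.
dedupe_findings() {
  awk -F '\t' -v OFS='\t' '
    {
      key = $2 FS $3 FS $4              # path, line, CWE identify the root cause
      if (!(key in scanners)) {
        order[++n] = key                # remember first-seen order
        scanners[key] = $1
      } else {
        scanners[key] = scanners[key] "+" $1
      }
    }
    END {
      for (i = 1; i <= n; i++) print order[i], scanners[order[i]]
    }
  '
}
```

Fed two rows for the same file, line, and CWE from different scanners, the function emits a single row whose last column joins both scanner names with "+".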

5. Open the compliance + SBOM

/repos/scans/{id}/compliance — the per-scan rollup, same six frameworks as URL DAST scans. RepoFinding rows infer their category from the scanner that produced them, so the rollup speaks the OWASP-Top-10 / PCI-DSS dialect even though the source data is SAST + SCA + IaC.
Generate SBOM on the repo page — CycloneDX 1.5 + SPDX 2.3 from the same manifest parsers that drove SCA.

6. Route by team

Configure two Slack integrations — one per team — under Settings → Integrations, each scoped to the repo’s target id and the relevant event filter (finding_new, finding_changed) plus a severity gate. Per-target scoping is already supported by the integrations layer; the routing rule is “target id + event + severity.”

The integration matcher currently operates at target granularity. For finer-grained routing (for example, services/api/ to #platform-eng and services/web/ to #frontend), split the monorepo across two Pencheff repo targets that share the same GitHub repo. Do this only if the finer routing matters more than the unified view.
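The path-to-channel mapping described above can be written down as a small lookup, which is handy when deciding how to split the targets or when prototyping a relay outside Pencheff. This is a sketch, not a Pencheff feature: the function name route_channel and the fallback channel #security-triage are hypothetical, while the two service prefixes and their channels come from the example above.

```shell
# Map a finding's file path to the Slack channel of the owning team.
route_channel() {
  case $1 in
    services/api/*) printf '%s\n' '#platform-eng' ;;
    services/web/*) printf '%s\n' '#frontend' ;;
    # Hypothetical fallback for paths no team has claimed.
    *)              printf '%s\n' '#security-triage' ;;
  esac
}
```

For example, route_channel services/api/auth/jwt.ts prints #platform-eng, and any path outside the two service directories falls through to the fallback channel.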

Deliverable

A repo scan that runs in under 30 minutes on every push.
A compliance rollup the security review checklist can consume directly.
A live SBOM matching every SCA finding.
Per-team Slack routing.
