Local repository scanning
The web app scans GitHub repositories by cloning them into Pencheff’s
cloud workers. Studio runs the same scanners — semgrep, gitleaks,
trivy_fs, osv-scanner — on the laptop where the code already lives,
then streams only the findings (not the source) back to your workspace.
Use cases:
- Private monorepos that don’t have a GitHub App installation or that policy forbids cloning off-network.
- Pre-commit local checks before you’ve pushed anything.
- Air-gapped or offline workstations — start a scan, finish later, ingest results when you reconnect.
Workflow
1. Register the folder as a local-provider repo. In Studio → Repos → Add repository → Local folder, point at the path on disk. Studio mirrors it as a `Target` so it shows up alongside cloud repos in scans and findings.
2. Open a desktop-local scan row. Click Run scan on the local repo. Studio calls:

   ```
   POST /repos/{repo_id}/scan/local
   Content-Type: application/json

   {
     "commit_sha": "abc1234…",  // optional, defaults to current HEAD
     "scanners": ["semgrep", "gitleaks", "trivy_fs", "osv-scanner"]
   }
   ```

   The API returns a `RepoScan` row in `running` state with `trigger="desktop_local"` so it appears in the Assessments list immediately.
3. Studio runs the scanners locally. Each tool is invoked as a separate `Process` (no shell), with stdout parsed into the canonical finding schema. Progress streams into the bottom-of-window status tray.
4. Studio ingests the findings. On completion (or failure), Studio calls:

   ```
   POST /repos/scans/{scan_id}/ingest
   Content-Type: application/json

   {
     "findings": [ /* canonical RepoFinding[] */ ],
     "stats": { "files_scanned": 1234, "duration_ms": 47210 },
     "sbom": { /* optional CycloneDX */ },
     "error": null
   }
   ```

   The scan row transitions to `done` (or `failed`). Findings now show up everywhere: the Studio Findings view, the web Assessments page, the unified-findings stream, and any wired integrations (Slack, Jira, GitHub check runs).
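The request shapes and the no-shell process invocation in the workflow can be sketched in Python. This is a rough illustration, not Studio's implementation: Studio itself uses `Process` on macOS, and the helper names `build_scan_request`, `run_tool`, and `build_ingest_body` are hypothetical. The sketch assumes the tool prints JSON on stdout. The flags each real scanner needs for that, and the mapping from each tool's native output into the canonical `RepoFinding` schema, are not shown because they are not specified here.

```python
import json
import subprocess
import sys


def build_scan_request(repo_id, scanners, commit_sha=None):
    """Return (method, path, body) for opening a desktop-local scan row."""
    body = {"scanners": list(scanners)}
    if commit_sha is not None:  # left out, the API defaults to the current HEAD
        body["commit_sha"] = commit_sha
    return "POST", f"/repos/{repo_id}/scan/local", body


def run_tool(argv, cwd=None, timeout=600):
    """Run one tool as its own process (argv list, no shell); parse stdout as JSON."""
    # A list argv with shell=False means no shell ever interprets the arguments.
    proc = subprocess.run(
        argv, cwd=cwd, capture_output=True, text=True, timeout=timeout, shell=False
    )
    try:
        parsed = json.loads(proc.stdout) if proc.stdout.strip() else None
    except json.JSONDecodeError:
        parsed = None
    # Exit-code conventions differ per tool, so the code is returned, not judged.
    return proc.returncode, parsed, proc.stderr


def build_ingest_body(findings, files_scanned, duration_ms, sbom=None, error=None):
    """Assemble the JSON body for POST /repos/scans/{scan_id}/ingest."""
    body = {
        "findings": findings,
        "stats": {"files_scanned": files_scanned, "duration_ms": duration_ms},
        "error": error,
    }
    if sbom is not None:  # the SBOM is optional, so it is omitted when absent
        body["sbom"] = sbom
    return body


if __name__ == "__main__":
    # Stand-in for a real scanner: a child Python process that prints JSON.
    fake_scanner = [sys.executable, "-c", "print('[{\"rule_id\": \"demo\"}]')"]
    code, findings, _ = run_tool(fake_scanner)
    print(build_ingest_body(findings or [], files_scanned=1, duration_ms=5))
```

Keeping stderr out of the ingest body matches the data-flow rule that scanner stdout and stderr stay on the local machine.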
The ingest endpoint refuses to accept findings for any scan whose
trigger is not desktop_local — cloud-worker scans are owned by Celery
and cannot be hijacked.
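The ownership rule can be illustrated with a small in-memory model. This is a sketch under assumptions, not the server's code: the `RepoScan` fields beyond `trigger` and the state names, the `IngestRefused` exception, and the `"cloud"` trigger value used in the usage note are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


class IngestRefused(Exception):
    """Raised when findings are posted to a scan that Studio does not own."""


@dataclass
class RepoScan:
    id: str
    trigger: str
    status: str = "running"
    findings: list = field(default_factory=list)
    error: Optional[str] = None


def ingest(scan, payload):
    """Accept an ingest payload only for desktop-local scans."""
    # Guard first: nothing on the row changes if the scan is not desktop-local.
    if scan.trigger != "desktop_local":
        raise IngestRefused(f"scan {scan.id} has trigger {scan.trigger!r}")
    scan.findings = list(payload.get("findings", []))
    scan.error = payload.get("error")
    # A non-empty error marks the scan failed; otherwise it is done.
    scan.status = "failed" if scan.error else "done"
    return scan
```

With this model, `ingest(RepoScan(id="s1", trigger="cloud"), {...})` raises `IngestRefused` and leaves the row in `running`, while the same call on a `desktop_local` row moves it to `done` or `failed`.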
Required scanners on your Mac
Studio doesn’t bundle the scanner binaries; it expects them on PATH.
```bash
brew install semgrep gitleaks trivy osv-scanner
```

The Studio Settings → Local Data section shows which scanners resolved and which are missing. A missing scanner is non-fatal: it is skipped for the affected scan, and the missing tool is recorded on the scan row.
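A PATH lookup of this kind might look like the sketch below. It is an approximation, not Studio's code: `SCANNER_BINARIES` and `resolve_scanners` are hypothetical names, and mapping the `trivy_fs` scanner to the `trivy` executable is an assumption drawn from the Homebrew formula names.

```python
import shutil

# Assumption: each scanner name maps to the binary that Homebrew installs;
# "trivy_fs" is taken to mean the `trivy` executable.
SCANNER_BINARIES = {
    "semgrep": "semgrep",
    "gitleaks": "gitleaks",
    "trivy_fs": "trivy",
    "osv-scanner": "osv-scanner",
}


def resolve_scanners(requested, which=shutil.which):
    """Split requested scanners into {name: path} and a list of missing names."""
    resolved, missing = {}, []
    for name in requested:
        binary = SCANNER_BINARIES.get(name)
        # shutil.which returns the absolute path, or None when not on PATH.
        path = which(binary) if binary else None
        if path:
            resolved[name] = path
        else:
            missing.append(name)  # non-fatal: skipped, but recorded
    return resolved, missing
```

Calling `resolve_scanners(["semgrep", "gitleaks", "trivy_fs", "osv-scanner"])` returns the scanners that can run plus the names to record as missing on the scan row. The `which` parameter exists only so the lookup can be stubbed in tests.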
What gets sent to Pencheff’s cloud, and what doesn’t
| Data | Sent to cloud? |
|---|---|
| Finding metadata (rule ID, severity, title, file path, line range, snippet) | ✔ Yes — that’s the whole point |
| Source code at large — full files, untouched lines | ✘ No |
| Repository git history, blobs, .git directory | ✘ No |
| SBOM (CycloneDX / SPDX), if you opt in | ✔ Optional, controlled per-scan |
| Scanner stdout / stderr (for debugging failures) | ✘ No, kept local |
Code snippets attached to findings are bounded to the lines the finding references (typically 5-20 lines of context), never whole files.
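One way to enforce such a bound is sketched below. The function name `bounded_snippet` and the `context` and `max_lines` defaults are assumptions for illustration; the actual limits Studio applies are not specified beyond the range stated above.

```python
def bounded_snippet(file_lines, start_line, end_line, context=2, max_lines=20):
    """Return only the lines around a finding (1-indexed, inclusive range)."""
    if start_line < 1 or end_line < start_line or start_line > len(file_lines):
        raise ValueError("invalid line range")
    first = max(1, start_line - context)
    last = min(len(file_lines), end_line + context)
    # Hard cap: even a finding that spans a huge range never carries a whole file.
    last = min(last, first + max_lines - 1)
    return {
        "start_line": first,
        "end_line": last,
        "text": "\n".join(file_lines[first - 1:last]),
    }
```

For a finding on lines 10 to 11 of a 100-line file, the defaults return lines 8 to 13. A finding reported against lines 1 to 100 is truncated to the first 20 lines.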
Sandbox
Pencheff Studio is not sandboxed. The
com.apple.security.app-sandbox entitlement is intentionally absent
because sandboxed apps cannot:
- Read or execute Homebrew-installed binaries outside their container — every scanner would resolve as “not installed”.
- Read arbitrary user paths — the local repository scanning use-case fundamentally requires reading the folder you point at.
To compensate, Studio ships with:
- Hardened Runtime enabled (`-o runtime` codesign flag).
- Developer ID Application signing + Apple notarisation + a stapled ticket, so Gatekeeper accepts the binary without `xattr -dr com.apple.quarantine`.
- Only two entitlements granted: `network.client` and `network.server` (the second is for the loopback OAuth callback listener).
- No `~/Library` writes outside `~/Library/Application Support/com.pencheff.studio/`.
- A local SwiftData mirror that you can wipe from Settings → Local Data → Clear local data.
We treat this as the appropriate trade-off for a developer-installed security tool distributed outside the App Store. If your organisation requires sandboxed binaries for installed software, the web app + MCP server combination covers every other workflow.
Limitations
- Single repo per scan: chain runs with `Schedules` if you want a batch.
- No live progress percentage: scanners stream findings as they arrive, but a sub-1s ETA isn’t computable for `semgrep` rule packs.
- Local scans don’t trigger GitHub check runs: the repo isn’t cloud-attached, so there’s no commit to annotate. Use a cloud-trigger scan if you need check-run integration.