Auditing a governed agent: what YCAudit found in Yopa
We pointed YCAudit at Yopa - a Foundry agent we built specifically to be governed and human-approved - and it still surfaced 48 findings, including a committed .env and zero automated tests on the drawer that is the agent's only write path. Here is the full run.
Yopa is a Foundry agent we built to be governed by default: it reconciles conflicting order systems, proposes a cited next action, and refuses to write anything back until a human approves it in a drawer. It is open-sourced as Yopa_Palantir. Governance is the whole point of it - which makes it the most interesting possible target for YCAudit. If our auditor finds governance holes in the agent we designed to be governed, that tells you something about shipping AI-generated code in general.
So we ran the full audit - no cherry-picking, the same pipeline we run on a customer repo. The target was 40 source files, 30,495 lines across a six-package TypeScript monorepo. The run ranked 48 findings by severity, each with a confirmed root cause and a fix spec.
40 source files
30,495 lines of code
11 dimensions
48 findings
How a run is structured
The pipeline is four linear stages with a fan-out in the middle. It builds context first, raises findings across many dimensions, then forces every finding through a root-cause gate before anything is written down.
1. Understand
Reads the repo cold and writes a codebase brief - what the system is, its entry points, data shapes, and the seams worth probing - before a single finding is raised.
2. Review + specialty audits
A core review pass runs alongside parallel lanes for security, performance, QA, architecture, maintainability, AI-codegen, docs, API contracts, data migration, frontend, and runtime verification; each lane surfaces findings in its own dimension.
3. Investigate
Every candidate finding is run down to a confirmed root cause. The Iron Law of the pipeline: no fix without an explanation of exactly why the bug happens.
4. Spec
Each confirmed issue becomes a precise fix spec plus a reproduction script - a handoff a coding agent can apply - never an in-place edit made by the auditor.
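The four stages can be sketched as a single function, with the root-cause gate as the step that discards anything unexplained. This is a minimal illustration under stated assumptions, not YCAudit's implementation: every type and function name below (`runAudit`, `Lane`, `Investigator`, `FixSpec`, and so on) is hypothetical, and lanes run sequentially here for readability where a real pipeline would likely run them concurrently.

```typescript
// Hypothetical sketch of the four-stage shape: understand -> fan-out lanes ->
// root-cause gate -> spec. None of these names are YCAudit's real API.

type Severity = "Critical" | "High" | "Medium" | "Low";

interface Brief {
  summary: string;
  entryPoints: string[];
}

interface Candidate {
  dimension: string;
  title: string;
  severity: Severity;
}

interface Confirmed extends Candidate {
  rootCause: string;
}

interface FixSpec {
  finding: Confirmed;
  change: string;
  reproScript: string;
}

// A lane reads the shared brief and returns candidate findings in its dimension.
type Lane = (brief: Brief) => Candidate[];
// An investigator returns a confirmed root cause, or null if none was found.
type Investigator = (candidate: Candidate, brief: Brief) => string | null;

function runAudit(
  understand: () => Brief,
  lanes: Lane[],
  investigate: Investigator,
  writeSpec: (finding: Confirmed) => FixSpec,
): FixSpec[] {
  // Stage 1: build context before any finding is raised.
  const brief = understand();

  // Stage 2: fan out. Every lane sees the same brief (sequential here for
  // simplicity; lanes are independent, so they could run in parallel).
  const candidates: Candidate[] = [];
  for (const lane of lanes) candidates.push(...lane(brief));

  // Stage 3: the root-cause gate. A candidate without a confirmed
  // explanation is dropped rather than turned into a fix.
  const confirmed: Confirmed[] = [];
  for (const c of candidates) {
    const rootCause = investigate(c, brief);
    if (rootCause !== null) confirmed.push({ ...c, rootCause });
  }

  // Stage 4: emit specs only. Nothing here edits the audited repo.
  return confirmed.map(writeSpec);
}
```

The design choice worth noting is that the gate sits between raising and writing: a lane can be noisy, but nothing reaches the spec stage without a root cause attached.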
The severity spread
Most findings cluster in High and Medium - the band where a system works in the demo and breaks under an edge case, a race, or an adversarial input. Two rose to Critical: a committed .env and zero automated tests on the approval drawer that is the agent's only write path. We are keeping the specific findings to ourselves while they are addressed; what is worth sharing is the shape of the run, not a checklist of what to poke at in a public repo.
Critical: 2
High: 12
Medium: 26
Low: 8
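The four bands sum to the reported total (2 + 12 + 26 + 8 = 48), and High plus Medium account for 38 of those 48 findings. A minimal sketch of ranking and tallying by severity follows, assuming a hypothetical `Finding` shape that carries only an id and one of the four severity labels; the rank order (Critical first) is the only part taken from the report.

```typescript
// Hypothetical finding shape; only the four severity labels come from the report.
type Severity = "Critical" | "High" | "Medium" | "Low";

interface Finding {
  id: string;
  severity: Severity;
}

// Lower rank sorts first, so Critical findings lead the ranked list.
const RANK: Record<Severity, number> = { Critical: 0, High: 1, Medium: 2, Low: 3 };

function rankBySeverity(findings: Finding[]): Finding[] {
  // slice() copies first so the caller's array is not reordered in place.
  return findings.slice().sort((a, b) => RANK[a.severity] - RANK[b.severity]);
}

function tally(findings: Finding[]): Record<Severity, number> {
  // Start every band at zero so an empty band still appears in the result.
  const counts: Record<Severity, number> = { Critical: 0, High: 0, Medium: 0, Low: 0 };
  for (const f of findings) counts[f.severity] += 1;
  return counts;
}
```

Initializing all four bands to zero, rather than counting only what appears, means a band with no findings is reported as 0 instead of being absent.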
The findings spread across ten of the eleven dimensions, with core review, QA, and documentation leading the count. Only the dedicated security lane came back empty, and not because the code is spotless: the credential and injection issues surfaced under the core-review and AI-codegen lanes instead. The report names the empty dimension explicitly rather than going quiet. That is a useful property in itself, because a silent dimension and a clean one look identical unless the tool says which is which.
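The silent-versus-clean distinction is easy to lose in code. A sketch, with hypothetical names throughout, that reports every configured dimension with one of three statuses: it produced findings, it ran and found nothing (`empty`), or it never reported back (`missing`). Treating a missing lane as zero findings would collapse exactly the two cases the report keeps apart.

```typescript
// Hypothetical per-dimension report. "empty" means the lane completed with
// zero findings; "missing" means the lane produced no result at all.
interface DimensionReport {
  dimension: string;
  status: "findings" | "empty" | "missing";
  count: number;
}

function reportByDimension(
  dimensions: string[],
  // Finding counts keyed by dimension, present only for lanes that completed.
  results: Map<string, number>,
): DimensionReport[] {
  return dimensions.map((dimension): DimensionReport => {
    const count = results.get(dimension);
    // No entry: the lane went quiet, which is not the same as a clean result.
    if (count === undefined) return { dimension, status: "missing", count: 0 };
    return { dimension, status: count > 0 ? "findings" : "empty", count };
  });
}
```

Iterating over the configured dimension list, rather than over whatever results came back, is what guarantees every dimension is named in the output.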
Why this is the good news
Running our own auditor over our own governed agent and surfacing 48 findings is not an embarrassment - it is the point. Every finding ships as a spec: the root cause, the precise change, and a reproduction script, never an in-place edit by the auditor. The generator already moved fast; the audit is the slow, skeptical second pass that explains why something is wrong before anyone touches it. A governed agent is exactly the kind of system that earns that second pass.