What an AI code auditor actually finds: patterns from 164 findings
Across four AI-generated codebases - 181 files and ~45k lines - YCAudit has raised 164 findings. The same failure modes keep coming back: secrets in .env, unvalidated LLM output reaching real actions, and missing tests for the one invariant that mattered. Here are the numbers.
Across four AI-generated codebases - 181 source files, roughly 45,000 lines - YCAudit has raised 164 findings. The same handful of mistakes shows up almost every time. This post is the aggregate: the numbers, and the bugs that repeat.
The mix skews toward High and Medium - the band where code passes a demo and fails on an edge case, a race, or an adversarial input. YCAudit reads each repository the way a skeptical senior engineer would, then hands a coding agent a stack of root-caused fix specs.
4
Codebases audited181
Source files~45k
Lines of code164
Total findingsThe four runs
They span different domains and very different sizes - the smallest is barely a thousand lines, the largest thirty thousand. We keep all four anonymous here; the point is the pattern that holds across them, not any one project.
| Codebase | Files | Lines | Findings | C / H / M / L |
|---|---|---|---|---|
| Codebase A | 101 | 10,481 | 62 | 3 / 24 / 27 / 8 |
| Codebase B | 40 | 30,495 | 48 | 2 / 12 / 26 / 8 |
| Codebase C | 21 | 2,485 | 47 | 1 / 16 / 20 / 10 |
| Codebase D | 19 | 1,150 | 7 | 0 / 2 / 3 / 2 |
Severity, in aggregate
Criticals are rare - six across all four runs - but they showed up in three of the four codebases, almost always as a secret on disk or a broken identity boundary. The volume lives in High and Medium.
6
Critical
54
High
76
Medium
28
Low
The findings that repeat
This is the part worth your attention. These are not exotic bugs - they are the same six failure modes, surfacing across unrelated codebases, because they come from how LLMs generate code rather than from any one project.
Secrets committed to .env
LLMs fill credential fields with concrete values during the first generation pass and defer env-var extraction to a step treated as external to the code task - so it never happens. We find live keys and tokens on disk, sometimes in git history.
Unvalidated LLM output reaching real actions
Model-generated JSON is cast to a TypeScript type instead of parsed with a schema, then its fields are passed straight into an action dispatch. Hallucinated or injected parameter values sail through every "typed action" claim.
Approval gates enforced at the UI, not the dispatch
The confirmation guard wraps the call site while the underlying dispatch function stays exported and callable without it. The invariant looks enforced in a click-through demo and is bypassable from any other caller.
Prompt injection at the data-to-prompt boundary
External, attacker-influenced data is interpolated into prompts verbatim. The model generates the happy path - clean data, good answer - and skips the adversarial path entirely.
Missing tests for the one invariant that mattered
Coverage looks healthy because tests exercise the happy path through the UI. The guarantees that actually matter - "rejection must not write back", "a double-click must be dropped" - go untested.
Fixture / production bleed and non-atomic writes
Demo fixtures and production code share a boundary nobody documented, and the audit-log write is not atomic with the action it records - so a mid-sequence failure leaves a writeback with no log, or a log with no writeback.
What a Critical looks like
The six Criticals are not exotic - they are a few patterns that recur. The examples below are representative and illustrative - renamed and simplified, not tied to any one codebase - but the shape of each is exact.
A live secret committed to version control
A working provider API key sitting in a committed .env. A .gitignore rule was added later, but the key stays in git history - recoverable from every clone long after it looks deleted.
An approval gate that only guards the UI
The human-in-the-loop check is enforced in the component that renders the confirm button. The function that actually performs the action stays exported and callable - with no guard - from anywhere else in the codebase.
function ApprovalDrawer() {
if (!confirmed) return null;
dispatchAction(payload); // guarded only here, at the call site
}
// ...but the same function is exported and reachable unguarded
export function dispatchAction(p) { /* writes to the system of record */ }Illustrative — renamed and simplified, not from any specific codebase.Model output reaching a privileged action unvalidated
LLM-generated JSON is cast straight to a typed object instead of being parsed and validated, then its fields are handed to a state-changing call. A hallucinated or injected value flows through untouched.
// model output trusted as a typed object — no schema check
const action = JSON.parse(llmResponse) as ActionRequest;
applyAction(action.target, action.args); // runs whatever the model producedIllustrative — renamed and simplified, not from any specific codebase.Which dimensions light up
The findings spread across twelve audit dimensions, but the weight is consistent: Core review, Documentation, QA & coverage, Security, and AI-codegen surface the most on nearly every run. The lighter dimensions - data migration, runtime verification, frontend - often come back empty, and YCAudit says so explicitly instead of going silent, which is its own kind of signal.
What the numbers say
AI writes the happy path well and skips the adversarial one almost completely. It fills secrets inline and defers the cleanup that never comes. It builds the guardrail at the call site and leaves the underlying function exposed. None of these are visible in a demo - they are visible in an audit. That gap, between "it works" and "it holds", is the whole reason to run a slow second pass before AI-generated code reaches anything real.