Blog
YCAudit
AI Codegen
Security
Code Quality

What an AI code auditor actually finds: patterns from 164 findings

Across four AI-generated codebases - 181 files and ~45k lines - YCAudit has raised 164 findings. The same failure modes keep coming back: secrets in .env, unvalidated LLM output reaching real actions, and missing tests for the one invariant that mattered. Here are the numbers.

June 11, 2026
·6 min read·Yeda AI Team

Across four AI-generated codebases - 181 source files, roughly 45,000 lines - YCAudit has raised 164 findings. The same handful of mistakes shows up almost every time. This post is the aggregate: the numbers, and the bugs that repeat.

The mix skews toward High and Medium - the band where code passes a demo and fails on an edge case, a race, or an adversarial input. YCAudit reads each repository the way a skeptical senior engineer would, then hands a coding agent a stack of root-caused fix specs.

4

Codebases audited

181

Source files

~45k

Lines of code

164

Total findings

The four runs

They span different domains and very different sizes - the smallest is barely a thousand lines, the largest thirty thousand. We keep all four anonymous here; the point is the pattern that holds across them, not any one project.

CodebaseFilesLinesFindingsC / H / M / L
Codebase A10110,481623 / 24 / 27 / 8
Codebase B4030,495482 / 12 / 26 / 8
Codebase C212,485471 / 16 / 20 / 10
Codebase D191,15070 / 2 / 3 / 2

Severity, in aggregate

Criticals are rare - six across all four runs - but they showed up in three of the four codebases, almost always as a secret on disk or a broken identity boundary. The volume lives in High and Medium.

6

Critical

54

High

76

Medium

28

Low

The findings that repeat

This is the part worth your attention. These are not exotic bugs - they are the same six failure modes, surfacing across unrelated codebases, because they come from how LLMs generate code rather than from any one project.

Secrets committed to .env

LLMs fill credential fields with concrete values during the first generation pass and defer env-var extraction to a step treated as external to the code task - so it never happens. We find live keys and tokens on disk, sometimes in git history.

Unvalidated LLM output reaching real actions

Model-generated JSON is cast to a TypeScript type instead of parsed with a schema, then its fields are passed straight into an action dispatch. Hallucinated or injected parameter values sail through every "typed action" claim.

Approval gates enforced at the UI, not the dispatch

The confirmation guard wraps the call site while the underlying dispatch function stays exported and callable without it. The invariant looks enforced in a click-through demo and is bypassable from any other caller.

Prompt injection at the data-to-prompt boundary

External, attacker-influenced data is interpolated into prompts verbatim. The model generates the happy path - clean data, good answer - and skips the adversarial path entirely.

Missing tests for the one invariant that mattered

Coverage looks healthy because tests exercise the happy path through the UI. The guarantees that actually matter - "rejection must not write back", "a double-click must be dropped" - go untested.

Fixture / production bleed and non-atomic writes

Demo fixtures and production code share a boundary nobody documented, and the audit-log write is not atomic with the action it records - so a mid-sequence failure leaves a writeback with no log, or a log with no writeback.

What a Critical looks like

The six Criticals are not exotic - they are a few patterns that recur. The examples below are representative and illustrative - renamed and simplified, not tied to any one codebase - but the shape of each is exact.

Critical
A live secret committed to version control

A working provider API key sitting in a committed .env. A .gitignore rule was added later, but the key stays in git history - recoverable from every clone long after it looks deleted.

Critical
An approval gate that only guards the UI

The human-in-the-loop check is enforced in the component that renders the confirm button. The function that actually performs the action stays exported and callable - with no guard - from anywhere else in the codebase.

function ApprovalDrawer() {
  if (!confirmed) return null;
  dispatchAction(payload);   // guarded only here, at the call site
}

// ...but the same function is exported and reachable unguarded
export function dispatchAction(p) { /* writes to the system of record */ }
Illustrative — renamed and simplified, not from any specific codebase.
Critical
Model output reaching a privileged action unvalidated

LLM-generated JSON is cast straight to a typed object instead of being parsed and validated, then its fields are handed to a state-changing call. A hallucinated or injected value flows through untouched.

// model output trusted as a typed object — no schema check
const action = JSON.parse(llmResponse) as ActionRequest;
applyAction(action.target, action.args);  // runs whatever the model produced
Illustrative — renamed and simplified, not from any specific codebase.

Which dimensions light up

The findings spread across twelve audit dimensions, but the weight is consistent: Core review, Documentation, QA & coverage, Security, and AI-codegen surface the most on nearly every run. The lighter dimensions - data migration, runtime verification, frontend - often come back empty, and YCAudit says so explicitly instead of going silent, which is its own kind of signal.

Core review · logic, error paths, dead code
Security · secrets, trust boundaries, injection
Performance · hot paths, redundant work
QA & coverage · gaps, brittle tests
Architecture · layering, coupling
Maintainability · complexity, duplication
AI-codegen · fixture/prod bleed, hallucinated APIs
Documentation · drift, missing contracts
API contract · schema + interface mismatches
Data migration · schema changes, backfills, FK integrity
Frontend · UI state, rendering, a11y
Runtime verification · live probes, repro scripts

What the numbers say

AI writes the happy path well and skips the adversarial one almost completely. It fills secrets inline and defers the cleanup that never comes. It builds the guardrail at the call site and leaves the underlying function exposed. None of these are visible in a demo - they are visible in an audit. That gap, between "it works" and "it holds", is the whole reason to run a slow second pass before AI-generated code reaches anything real.