Claude Code's Source Got Leaked Through a Build Pipeline. The Agent Was Governed. The Pipeline Wasn't.

Anthropic built Undercover Mode to stop their AI from leaking internal information. A source map in an npm package leaked everything. The action that published it was never evaluated.

What Happened

On March 31, 2026, the entire source code of Anthropic's Claude Code CLI — their flagship AI coding agent — was found sitting in plain sight on the npm registry. A source map file bundled into the published package exposed 512,000 lines of TypeScript, complete with system prompts, feature flags, internal model codenames, and unreleased capabilities. Within hours, the code was mirrored across GitHub and analyzed by thousands.

The irony: Anthropic had built an entire subsystem called "Undercover Mode" specifically designed to prevent their AI from leaking internal information. They governed the agent's words. Nobody governed the action that leaked everything.

Claude Code is Anthropic's official CLI tool for AI-assisted development — editing files, running commands, managing git workflows from the terminal. When you build a JavaScript/TypeScript package, the build toolchain can generate source map files that map minified production code back to the original source. These exist for debugging. They should never ship to production.

Version 2.1.88 of @anthropic-ai/claude-code was published to npm with the source map included. The file was 59.8 MB.
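
For contrast, a purely mechanical guard at the packaging step is easy to sketch. The script below is hypothetical, not part of Anthropic's pipeline; it assumes npm 7+, where `npm pack --dry-run --json` reports what would land in the published tarball without writing anything:

```ts
// check-tarball.ts — hypothetical prepublish guard (not Anthropic's pipeline).
// Assumes npm 7+, where `npm pack --dry-run --json` lists tarball contents.
import { execFileSync } from "node:child_process";

// Ask npm what it would publish, without producing a tarball.
const raw = execFileSync("npm", ["pack", "--dry-run", "--json"], {
  encoding: "utf8",
});

// Output is an array with one report per package; each report lists
// every file that would ship.
const [report] = JSON.parse(raw) as Array<{
  files: Array<{ path: string; size: number }>;
}>;

const sourceMaps = report.files.filter((f) => f.path.endsWith(".map"));

if (sourceMaps.length > 0) {
  console.error("Refusing to publish: source maps in tarball:");
  for (const f of sourceMaps) {
    console.error(`  ${f.path} (${(f.size / 1e6).toFixed(1)} MB)`);
  }
  process.exit(1);
}
```

Wired into a `prepublishOnly` script, this fails the publish before npm ever sees the package. It's a narrow, file-level check, though: it knows nothing about what the files contain or why the publish is happening, which is the gap the rest of this post is about.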

What was exposed:

- Complete agent logic, tool-calling system, and permission guardrails
- Full system prompts governing Claude's behavior
- 44 feature flags for unreleased capabilities
- Internal model codenames ("Capybara," "Tengu")
- Telemetry that tracks user frustration signals
- The "Undercover Mode" system preventing AI attribution in public commits
- Permission bypass and approval flows
- An unreleased autonomous agent system called "KAIROS"
- A planning mode called "ULTRAPLAN" that offloads to remote containers

What leaked is not model weights or training data. It's the orchestration layer: how Claude Code decides what to do, what it's allowed to do, and how it executes. For competitors, it's a blueprint. For attackers, it's a map.

The Pattern

This is the second supply chain incident we've covered. The first was the LiteLLM PyPI compromise on March 24 — a malicious package version that stole every credential on the machine. Different attack, same structural failure.

LiteLLM: An agent's environment was compromised through a dependency. Credentials were stolen because they lived in the same environment as untrusted code.

Claude Code: A build pipeline published sensitive internal code to a public registry. The action executed without business context evaluation.

In both cases, the action that caused the damage was technically valid. The build passed. The package was well-formed. npm accepted it. Every automated check said yes.

Nobody asked: "Should this specific package, with this specific content, be published to a public registry right now?"

That's a business decision. Not a CI check.

What a Control Layer Would Have Caught

If the npm publish action had flowed through a control layer like Surfit, the sequence would have looked different:

1. Action classification. Publishing to a public package registry is an external-facing, irreversible action. A publish to npm carrying a 59.8 MB source map full of internal code would classify as Wave 4 or 5 and be held for review (see the sketch after this list).

2. Content evaluation. Surfit evaluates context. A package containing system prompts, internal codenames, feature flags, and debug artifacts would trigger content sensitivity modifiers. The wave score goes up before anyone looks at it.

3. Credential separation. If the npm publish token lived in Surfit instead of the build pipeline, the pipeline couldn't publish without Surfit in the path. The pipeline proposes. Surfit decides. The token never leaves the control layer.

4. Cross-system correlation. A code build in one system followed by an external publish in another is a detectable pattern. Surfit's correlation engine watches for exactly these sequences: internal changes followed by external-facing actions.
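
To make the classification idea concrete, here is a minimal TypeScript sketch of wave scoring. Every name in it (Action, waveFor, decide, the thresholds and patterns) is invented for illustration; this is the shape of the idea, not Surfit's actual API:

```ts
// Invented sketch of action classification; not Surfit's actual API.
type Action = {
  kind: "npm_publish" | "deploy" | "git_push";
  externalFacing: boolean;
  irreversible: boolean;
  artifacts: { path: string; bytes: number }[];
};

// Patterns that would trigger content sensitivity modifiers.
const SENSITIVE_PATTERNS = [/\.map$/, /system[-_ ]?prompt/i, /feature[-_ ]?flag/i];

function waveFor(action: Action): number {
  let wave = 1;

  // External-facing, irreversible actions start high.
  if (action.externalFacing) wave += 2;
  if (action.irreversible) wave += 1;

  // Content sensitivity: debug artifacts and internal strings raise
  // the score before any human looks at the action.
  const sensitive = action.artifacts.some((a) =>
    SENSITIVE_PATTERNS.some((re) => re.test(a.path)),
  );
  if (sensitive) wave += 1;

  return Math.min(wave, 5);
}

// Waves 4 and 5 are held for review. The pipeline proposes; the
// control layer decides. Because the publish credential lives in the
// control layer rather than in CI, a "hold" is enforceable, not advisory.
function decide(action: Action): "auto_execute" | "hold_for_review" {
  return waveFor(action) >= 4 ? "hold_for_review" : "auto_execute";
}
```

Under this toy scoring, the 2.1.88 publish (external-facing, irreversible, carrying a 59.8 MB .map file) scores 5 and is held rather than auto-executed.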

None of this requires AI. It requires architecture. The publish action needed to flow through a layer that evaluates business context before execution. That layer didn't exist.

The Undercover Mode Paradox

This is the part worth sitting with.

Anthropic built "Undercover Mode" — a system that injects instructions into Claude's prompt to prevent the AI from revealing internal information in public commits. No model codenames. No internal project names. No AI attribution. The system works. Claude follows the rules.

But the system prompt itself — including the Undercover Mode instructions — shipped in the package that was published without governance. The agent was governed. The pipeline that published the agent wasn't.

They built controls for what the AI says. They didn't build controls for what the pipeline does.

This is the gap. Every team building AI agents will hit it. You can govern model outputs. You can sandbox execution environments. You can separate credentials. But if the actions that move code, publish packages, deploy infrastructure, and touch production systems don't flow through a decision layer — the governance is incomplete.

Output validation answers: "Is this text safe?"

Sandboxes answer: "Can this process run here?"

CI/CD answers: "Does this build pass?"

Nobody answers: "Should this action happen right now — given what's at stake for the business?"

What This Means

The Claude Code leak didn't compromise user data. No model weights were exposed. Anthropic pulled the package quickly. But the source is permanently available — mirrored across GitHub, analyzed by competitors, studied by security researchers.

The damage isn't the code. The damage is the precedent. If a company with Anthropic's resources, security focus, and safety reputation can't prevent a build pipeline from publishing internal source code to a public registry, the problem isn't process discipline. The problem is architecture.

Every team deploying AI agents has build pipelines, publish workflows, and deployment automation that touch production systems. Every one of those actions is a business decision executing without business context.

The question isn't whether your build pipeline will make a mistake. It's whether anything in the execution path will catch it before the action is irreversible.

Surfit is the control layer for AI agent actions. Every action is classified, evaluated in business context, and either auto-executed or held. The agent proposes. Surfit decides.

Watch the Demo