Claude Wrote a Browser Exploit - Anthropic Published the Full Transcript


In March 2026, Anthropic’s red team published something the security industry rarely sees: not just the conclusion, but the transcript. Claude Opus 4.6 wrote a working exploit for CVE-2026-2796, a Firefox JIT compiler vulnerability, from scratch, with no hand-holding beyond access to a virtual machine and a task verifier.

They ran the task 350 times to be thorough, and the model succeeded twice. That sounds like a small number, but it is not the number to fixate on.

What “Writing an Exploit” Actually Means Here

Anthropic gave the model a stripped-down Firefox JavaScript shell, a description of the vulnerability, and a verifier that would confirm success only if the exploit could read a secret file and write it to a specified location - proof of arbitrary file access from inside a sandboxed environment.
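To make the pass condition concrete, here is a minimal sketch of how a verifier of this kind could be structured. It is not Anthropic's harness: the file names, the per-run random secret, and the function names are all assumptions made for illustration. The only behavior taken from the description above is that success means the secret's contents show up at a specified location.

```python
import secrets
import tempfile
from pathlib import Path


def make_challenge(workdir: Path) -> tuple[Path, Path, str]:
    """Create a secret file and choose where a successful run must copy it.

    Hypothetical layout: the real harness's paths are not given in the post.
    """
    secret = secrets.token_hex(16)        # fresh random value per run (assumption)
    secret_path = workdir / "secret.txt"  # file the task must read
    target_path = workdir / "exfil.txt"   # location the task must write to
    secret_path.write_text(secret)
    return secret_path, target_path, secret


def verify(target_path: Path, secret: str) -> bool:
    """Pass only if the target file exists and holds exactly the secret."""
    if not target_path.is_file():
        return False
    return target_path.read_text().strip() == secret


def demo() -> bool:
    """Show that verification fails until the secret appears at the target."""
    workdir = Path(tempfile.mkdtemp())
    _, target_path, secret = make_challenge(workdir)
    before = verify(target_path, secret)  # nothing written yet
    target_path.write_text(secret)        # stands in for a successful run
    after = verify(target_path, secret)
    return (not before) and after
```

A check this simple is also the kind that can be satisfied by shortcuts, which is one plausible reason a verifier would need hardening over the course of an evaluation.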

The model had to independently:

Understand a JIT miscompilation in Firefox’s WebAssembly engine
Decompose the goal into classical browser exploit primitives
Build an addrof primitive to leak object addresses as integers
Build a fakeobj primitive to forge JS object references to arbitrary addresses
Solve a chicken-and-egg problem when its own approach blocked the path to arbitrary write
Pivot to WasmGC struct types to get a read primitive without needing write first
Chain all of it into a fake ArrayBuffer for full arbitrary read/write
Use that to achieve code execution and pass the verifier

It did all of this using only standard JavaScript and WebAssembly APIs. No external tools. The plan it articulated at the start held through the entire transcript. The write primitive appeared in the same test run as the read primitive because the agent recognized they followed from the same construction and did not stop to explain itself.

This is not autocomplete for exploit code. This is an agent reasoning through exploit development the way an experienced security researcher would, with the same vocabulary, the same intermediate goals, the same recognition of what a working primitive proves.

The Vulnerability Claude Exploited

CVE-2026-2796 is a JIT miscompilation in Firefox’s WebAssembly component. The short version: Firefox has an optimization that unwraps Function.prototype.call.bind() wrappers at module instantiation time. When it does the unwrap, it does not check whether the inner function’s type signature matches the import’s declared type. A Wasm function from one module gets stored into another module’s import record with the wrong type and no runtime interop layer catching the mismatch. When that reference is later called via call_ref, raw bytes go in typed as one thing and come out typed as another. That is a type confusion, and type confusions in JIT engines are how browser exploits are built.
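For readers new to the term, the snippet below illustrates only the general idea of the same bytes being read under two different types. It is not the Firefox bug and does nothing unsafe: Python's struct module performs the reinterpretation explicitly, whereas a type confusion is the engine doing it by accident. The function names are chosen for this illustration.

```python
import struct


def float_bits_as_int(value: float) -> int:
    """Reinterpret the 8 bytes of an IEEE 754 double as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def int_bits_as_float(bits: int) -> float:
    """Reinterpret the 8 bytes of an unsigned 64-bit integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


# The bytes never change; only the type used to read them does.
print(hex(float_bits_as_int(1.0)))             # 0x3ff0000000000000
print(int_bits_as_float(0x4000000000000000))   # 2.0
```

When an engine makes this kind of switch unintentionally between a number and an object reference, a value the script controls can end up treated as a pointer, which is why the bug class matters.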

The patch has shipped in Firefox. Anthropic coordinated disclosure with Mozilla before publishing. The PoC code in the post works on Firefox 147 and returns a patched result on any later build. They included a runnable test you can paste into a browser console to check.

The Part That Matters More Than the Success Rate

Opus 4.6 succeeded twice out of 350 attempts. Every other model they tested (Opus 4.1, Opus 4.5, Sonnet 4.5, Sonnet 4.6, Haiku 4.5) produced zero working exploits.

The success rate of two in 350 is not the story. The capability threshold being crossed is.

Before this result, no model Anthropic had tested could write a working browser exploit at all. Now one can. The question is not whether 0.57% is dangerous. It is what happens to that number as models improve at long-horizon reasoning tasks, which is the primary direction of current capability development.
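The 0.57% figure is simply 2 divided by 350. With only two successes the estimate is noisy, and the sketch below shows one standard way to express that, a Wilson score interval. It assumes the 350 attempts were independent with the same underlying success probability, which the post does not claim; treat the interval as a rough illustration, not a published result.

```python
import math


def success_rate(successes: int, trials: int) -> float:
    """Point estimate of the success probability."""
    return successes / trials


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval; z = 1.96 corresponds to roughly 95% coverage."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half


rate = success_rate(2, 350)
low, high = wilson_interval(2, 350)
print(f"point estimate: {rate:.2%}")
print(f"approximate 95% interval: {low:.2%} to {high:.2%}")
```

Under those assumptions the interval runs from roughly 0.2% to roughly 2%, so it stays above zero, which matches the article's point that the threshold, not the exact rate, is what changed.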

Anthropic documented their capability trajectory across recent evaluations: Claude’s success rate on Cybench doubled in six months. The rate on Cybergym doubled in four months. The December 2025 smart contract paper showed AI exploit revenue doubling every 1.3 months across frontier models. None of these doubling rates have plateaued.

If the exploit success rate follows the same improvement curve, 0.57% is not the steady state. It is the first data point above zero.
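To see what "following the same improvement curve" would imply, the sketch below computes how long a rate takes to grow from one value to another at a fixed doubling time. This is a hypothetical what-if, not a forecast: the doubling periods are borrowed from the Cybench and Cybergym figures above, nothing in the source says exploit success doubles at either pace, and a probability cannot keep doubling once it approaches 100%.

```python
import math


def months_to_reach(start_rate: float, target_rate: float, doubling_months: float) -> float:
    """Months for a rate to grow from start_rate to target_rate at a fixed doubling time."""
    if not 0 < start_rate <= target_rate <= 1:
        raise ValueError("need 0 < start_rate <= target_rate <= 1")
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return doubling_months * math.log2(target_rate / start_rate)


start = 2 / 350  # the observed rate, about 0.57%
# Doubling periods below are assumptions borrowed from other benchmarks.
for label, period in [("6-month doubling", 6.0), ("4-month doubling", 4.0)]:
    months = months_to_reach(start, 0.5, period)
    print(f"{label}: {months:.1f} months to a 50% success rate")
```

The useful takeaway is the shape of the relationship: time to any threshold scales linearly with the doubling period and only logarithmically with how far away the threshold is.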

Why This Specific Vulnerability Mattered

Anthropic is explicit that this bug may have been easier than average for the model to exploit. It did not require sophisticated heap manipulation or chaining multiple exploits to bypass additional mitigations. The type confusion translated directly into addrof and fakeobj primitives without complex setup. That may be why Opus 4.6 succeeded here and not on the dozens of other bugs it attempted.

They also note that the exploit only works in a stripped environment that intentionally removes some browser security features. Claude is not yet writing full-chain exploits that combine multiple vulnerabilities to escape the sandbox, which is what a real-world browser exploit requires. The gap between what was demonstrated and what would cause widespread harm is real and acknowledged.

But the transcript shows the model knows what a full-chain exploit looks like. It decomposed the goal into the correct primitives from the first message. It recognized what a controlled pointer dereference meant the moment it saw one. It solved the chicken-and-egg problem for getting arbitrary write without being told there was a chicken-and-egg problem. The conceptual scaffold for a full-chain exploit is present. The remaining gap is operational capability on the harder bugs, and that gap is narrowing.

What Anthropic Is Saying - and What They Are Not

The conclusion is unusually direct for a corporate research paper. Anthropic says this result means motivated attackers working with frontier LLMs will be able to write exploits faster than ever before. They call it an early warning sign, not a current threat. They frame it as a window: a period where defenders can move faster than attackers if they use the same tools.

What they are not saying is that this capability is locked away. Opus 4.6 is the same model available through the API. The evaluation was run by giving it a virtual machine and a task. Anyone can build that scaffolding. The Anthropic team hardened their verifier multiple times during the evaluation because Claude found increasingly clever ways to satisfy the task requirements without technically producing an exploit. That problem-solving behavior does not disappear when the researchers close the laptop.

The responsible disclosure framing matters and the patching coordination with Mozilla matters. But the capability they documented is real, reproducible, and available to anyone who builds the right scaffolding around the model. That is the condition security teams are now operating in.

What Changes If You Are Building or Auditing Software

The practical shift is not “AI can now hack anything.” It is more specific and more actionable than that:

The cost structure of vulnerability research is changing in the same direction as the smart contract research from December. Finding bugs and translating them into working exploits are both becoming cheaper and faster with AI assistance. The bottleneck moves from “can anyone write this exploit” to “how long until someone who wants to write this exploit uses the right tools.”

If you are a security team: the case for using AI-assisted vulnerability research offensively (running the same tools against your own codebase before anyone else does) is now backed by a documented capability milestone from Anthropic’s own red team. The offensive and defensive tools are the same tools. The question is who runs them first.

If you are shipping software that runs in a browser: the JIT compilers in every major browser engine are a surface where type safety invariants are maintained at the edge of aggressive optimizations. The Anthropic team notes they plan to expand collaboration with developers to find vulnerabilities in open-source software. That program will run faster as the models improve. Participating in it is a better option than waiting.

The transcript is public. The PoC is runnable. The doubling rates are documented. The window Anthropic is describing is open right now.
