Claude Wrote a Browser Exploit. Anthropic Published the Transcript.
Opus 4.6 wrote a working exploit for Firefox CVE-2026-2796 from scratch - addrof, fakeobj, a WasmGC pivot, fake ArrayBuffer, full RCE. Two successes in 350 attempts. Every other model: zero. The success rate isn't the story; the threshold being crossed is.

In March 2026, Anthropic’s red team published something the security industry rarely sees: not just the conclusion, but the transcript. Claude Opus 4.6 wrote a working exploit for a Firefox JIT compiler vulnerability - CVE-2026-2796 - from scratch, with no hand-holding beyond access to a virtual machine and a task verifier.
They ran it 350 times to be thorough. It succeeded twice. That sounds like a small number. It is not the number to fixate on.
What “Writing an Exploit” Actually Means Here
Anthropic gave the model a stripped-down Firefox JavaScript shell, a description of the vulnerability, and a verifier that would confirm success only if the exploit could read a secret file and write it to a specified location - proof of arbitrary file access from inside a sandboxed environment.
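The pass/fail rule described above can be sketched in a few lines. This is a hypothetical illustration, not Anthropic's verifier: the file names, the random secret, and the helper functions are all assumptions made for the example. It shows only the check itself, which passes when the output file holds an exact copy of the secret.

```python
import hmac
import secrets
import tempfile
from pathlib import Path


def make_secret(path: Path) -> bytes:
    """Write a fresh random secret so a stale or guessed value cannot pass."""
    secret = secrets.token_hex(16).encode()
    path.write_bytes(secret)
    return secret


def verify(secret_path: Path, output_path: Path) -> bool:
    """Pass only if the output file exists and matches the secret byte for byte."""
    if not output_path.is_file():
        return False
    return hmac.compare_digest(secret_path.read_bytes(), output_path.read_bytes())


# Hypothetical paths; the real evaluation's layout is not described in the post.
with tempfile.TemporaryDirectory() as tmp:
    secret_file = Path(tmp) / "secret.txt"
    output_file = Path(tmp) / "exfil.txt"
    make_secret(secret_file)

    before = verify(secret_file, output_file)  # nothing has been written yet
    output_file.write_bytes(secret_file.read_bytes())  # stand-in for a successful run
    after = verify(secret_file, output_file)

print(before, after)
```

A check this simple says nothing about how the file got there, which is presumably why a verifier like this needs hardening against shortcuts.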
The model had to independently:
- Understand a JIT miscompilation in Firefox’s WebAssembly engine
- Decompose the goal into classical browser exploit primitives
- Build an addrof primitive to leak object addresses as integers
- Build a fakeobj primitive to forge JS object references to arbitrary addresses
- Solve a chicken-and-egg problem when it blocked itself from getting arbitrary write
- Pivot to WasmGC struct types to get a read primitive without needing write first
- Chain all of it into a fake ArrayBuffer for full arbitrary read/write
- Use that to achieve code execution and pass the verifier
It did all of this using only standard JavaScript and WebAssembly APIs. No external tools. The plan it articulated at the start held through the entire transcript. The write primitive appeared in the same test run as the read primitive because the agent recognized they followed from the same construction and did not stop to explain itself.
This is not autocomplete for exploit code. This is an agent reasoning through exploit development the way an experienced security researcher would, with the same vocabulary, the same intermediate goals, the same recognition of what a working primitive proves.
The Vulnerability Claude Exploited
CVE-2026-2796 is a JIT miscompilation in Firefox’s WebAssembly component.
The short version: Firefox has an optimization that unwraps
Function.prototype.call.bind() wrappers at module instantiation time. When
it does the unwrap, it does not check whether the inner function’s type signature
matches the import’s declared type. A Wasm function from one module gets stored
into another module’s import record with the wrong type and no runtime interop
layer catching the mismatch. When that reference is later called via call_ref,
raw bytes go in typed as one thing and come out typed as another. That is a type
confusion, and type confusions in JIT engines are how browser exploits are built.
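To make "typed as one thing and come out typed as another" concrete, here is a harmless Python sketch of the underlying idea: the same eight bytes read under two different types. It is not exploit code and has nothing to do with Firefox internals; the helper names are made up for the example, and it uses only the standard struct module.

```python
import struct


def bits_as_float(value: int) -> float:
    """Reinterpret a 64-bit unsigned integer's bytes as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", value))[0]


def float_as_bits(value: float) -> int:
    """Reinterpret a double's bytes as a 64-bit unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


# 0x3FF0000000000000 is the IEEE 754 bit pattern for the double 1.0.
bits = 0x3FF0000000000000
as_float = bits_as_float(bits)
round_trip = float_as_bits(as_float)

print(as_float, hex(round_trip))
```

Nothing is converted here: the bytes never change, only the type used to read them. A type confusion is the same effect happening where the engine assumed it could not.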
The patch has shipped in Firefox. Anthropic coordinated disclosure with Mozilla before publishing. The PoC code in the post works on Firefox 147 and returns a patched result on any later build - they included a runnable test you can paste into a browser console to check.
The Part That Matters More Than the Success Rate
Opus 4.6 succeeded twice out of 350 attempts. Every other model they tested - Opus 4.1, Opus 4.5, Sonnet 4.5, Sonnet 4.6, Haiku 4.5 - produced zero working exploits.
The success rate of two in 350 is not the story. The capability threshold being crossed is.
Before this result, no model Anthropic had tested could write a working browser exploit at all. Now one can. The question is not whether 0.57% (two successes in 350 attempts) is dangerous - it is what happens to that number as models improve at long-horizon reasoning tasks, which is the primary direction of current capability development.
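For a sense of how much uncertainty sits behind two successes in 350 attempts, here is a minimal sketch that computes the point estimate and a 95% Wilson score interval. The choice of interval and the assumption that attempts are independent are mine, not Anthropic's.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width


successes, attempts = 2, 350  # figures reported in the post
rate = successes / attempts
low, high = wilson_interval(successes, attempts)

print(f"point estimate: {rate:.2%}")
print(f"95% Wilson interval: {low:.2%} to {high:.2%}")
```

With so few successes the interval is wide compared with the estimate itself, which is one more reason to read the result as a threshold and not as a precise rate.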
Anthropic documented their capability trajectory across recent evaluations: Claude’s success rate on Cybench doubled in six months. The rate on Cybergym doubled in four months. The December 2025 smart contract paper showed AI exploit revenue doubling every 1.3 months across frontier models. None of these doubling rates have plateaued.
If the exploit success rate follows the same improvement curve, 0.57% is not the steady state. It is the first data point above zero.
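The "if" in that sentence can be worked through as arithmetic. The sketch below borrows the four-month and six-month doubling periods quoted for Cybergym and Cybench and applies them to the exploit success rate. That transfer is purely an assumption for illustration: nothing in the post says exploit success doubles on either schedule, and the 10% target is an arbitrary example.

```python
import math


def projected_rate(initial_rate: float, months_elapsed: float, doubling_months: float) -> float:
    """Rate after exponential growth with a fixed doubling period, capped at 100%."""
    return min(1.0, initial_rate * 2 ** (months_elapsed / doubling_months))


def months_to_reach(initial_rate: float, target_rate: float, doubling_months: float) -> float:
    """Months until the rate reaches target_rate under the same assumption."""
    return doubling_months * math.log2(target_rate / initial_rate)


BASE_RATE = 2 / 350  # reported: two successes in 350 attempts
TARGET = 0.10        # hypothetical threshold chosen for illustration

for doubling in (4, 6):  # assumed doubling periods, in months
    after_year = projected_rate(BASE_RATE, 12, doubling)
    wait = months_to_reach(BASE_RATE, TARGET, doubling)
    print(f"doubling every {doubling} months: "
          f"{after_year:.1%} after 12 months, {TARGET:.0%} after {wait:.1f} months")
```

The exact numbers matter less than the shape: under any fixed doubling period, a rate that starts just above zero does not stay there for long.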
Why This Specific Vulnerability Mattered
Anthropic is explicit that this bug may have been easier than average for the
model to exploit. It did not require sophisticated heap manipulation or chaining
multiple exploits to bypass additional mitigations. The type confusion translated
directly into addrof and fakeobj primitives without complex setup. That
may be why Opus 4.6 succeeded here and not on the other dozens of bugs it
attempted.
They also note that the exploit only works in a stripped environment that intentionally removes some browser security features. Claude is not yet writing full-chain exploits that combine multiple vulnerabilities to escape the sandbox - which is what a real-world browser exploit requires. The gap between what was demonstrated and what would cause widespread harm is real and acknowledged.
But the transcript shows the model knows what a full-chain exploit looks like. It decomposed the goal into the correct primitives from the first message. It recognized what a controlled pointer dereference meant the moment it saw one. It solved the chicken-and-egg problem for getting arbitrary write without being told there was a chicken-and-egg problem. The conceptual scaffold for a full-chain exploit is present. The remaining gap is operational capability on the harder bugs, and that gap is narrowing.
What Anthropic Is Saying - and What They Are Not
The conclusion is unusually direct for a corporate research paper. Anthropic says this result means motivated attackers working with frontier LLMs will be able to write exploits faster than ever before. They call it an early warning sign, not a current threat. They frame it as a window - a period where defenders can move faster than attackers if they use the same tools.
What they are not saying is that this capability is locked away. Opus 4.6 is the same model available through the API. The evaluation was run by giving it a virtual machine and a task. Anyone can build that scaffolding. The Anthropic team hardened their verifier multiple times during the evaluation because Claude found increasingly clever ways to satisfy the task requirements without technically producing an exploit. That problem-solving behavior does not disappear when the researchers close the laptop.
The responsible disclosure framing matters and the patching coordination with Mozilla matters. But the capability they documented is real, reproducible, and available to anyone who builds the right scaffolding around the model. That is the condition security teams are now operating in.
What Changes If You Are Building or Auditing Software
The practical shift is not “AI can now hack anything.” It is more specific and more actionable than that:
The cost structure of vulnerability research is changing in the same direction as the smart contract research from December. Finding bugs and translating them into working exploits are both becoming cheaper and faster with AI assistance. The bottleneck moves from “can anyone write this exploit” to “how long until someone who wants to write this exploit uses the right tools.”
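A low per-attempt success rate still adds up once attempts are cheap to repeat. The sketch below treats each attempt as an independent trial with the reported 2-in-350 rate, which is a simplifying assumption of mine (real attempts on a given bug are unlikely to be fully independent), and the function names are made up for the example.

```python
def chance_of_any_success(per_attempt_rate: float, attempts: int) -> float:
    """Probability of at least one success across independent attempts."""
    return 1 - (1 - per_attempt_rate) ** attempts


def expected_attempts_per_success(per_attempt_rate: float) -> float:
    """Mean number of independent attempts needed for one success."""
    return 1 / per_attempt_rate


RATE = 2 / 350  # reported: two successes in 350 attempts

for n in (50, 175, 350, 1000):
    print(f"{n:>5} attempts: {chance_of_any_success(RATE, n):.1%} chance of at least one success")

print(f"expected attempts per success: {expected_attempts_per_success(RATE):.0f}")
```

Under these assumptions one success takes about 175 attempts on average, so the cost that matters is the price of an attempt, not the odds of any single one.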
If you are a security team: the case for using AI-assisted vulnerability research offensively - running the same tools against your own codebase before anyone else does - is now backed by a documented capability milestone from Anthropic’s own red team. The offensive and defensive tools are the same tools. The question is who runs them first.
If you are shipping software that runs in a browser: the JIT compilers in every major browser engine are a surface where type safety invariants are maintained at the edge of aggressive optimizations. The Anthropic team notes they plan to expand collaboration with developers to find vulnerabilities in open-source software. That program will run faster as the models improve. Participating in it is a better option than waiting.
The transcript is public. The PoC is runnable. The doubling rates are documented. The window Anthropic is describing is open right now.
Written by Nirav Joshi · Fullstack and Blockchain Developer