Posts Tagged "AI Agents"

Microsoft's AI Framework Has Been Broken Three Times in a Row. That's Not Bad Luck.

Two confirmed critical RCEs in Semantic Kernel, then a six-bypass full-chain disclosure weeks after the patch. The same structural mistake keeps shipping in agent frameworks. Here's the chain - and what to actually do about it.

You Gave Your Agent 50 Tools. That's Why It Keeps Failing.

Tool definitions consume 72% of the context window before any work begins. Per-tool accuracy collapses from 96% in isolation to under 15% with a large toolset. Retrieval-scoped tools triple selection accuracy on the same model. The fix isn't a better model - it's a smaller context.

An Open-Source Tool Scanned 14 MCP Servers. 100% Had Critical Findings.

MCPwn hit every server it scanned. OX Security disclosed a systemic STDIO flaw across 200,000 instances. Anthropic declined to patch. Here's what the receipts actually say.

Reverse Engineering Just Got a Natural Language Interface

A 180-tool MCP server bridges Cheat Engine to any AI agent. Process memory, pointer chains, vtable lookups, code injection - all through plain English. The capability gap Mythos implied is already here, open source.

Anthropic Built a Model It Won't Let You Use. Here's What It Can Do.

Claude Mythos can autonomously discover and exploit zero-days. Anthropic restricted access to a handful of defenders. The capability curve is what builders should actually plan around.

MCP Has a Security Problem. Anthropic Called It "Expected Behavior."

OX Security disclosed a systemic STDIO flaw in Anthropic's MCP SDKs. Anthropic says sanitization is on developers. The registry trust numbers show why that's a problem.

Someone Built a Bitbucket CLI. It Changed Their Mind About MCP.

Benchmarks keep landing on the same verdict: for most developer agent tasks, CLIs beat MCP servers on tokens, cost, and reliability. MCP still wins, but in narrower cases than the marketing suggests.

Claude Opus 4.7 Changed How It Thinks. Your Pipeline Probably Didn't Account For That.

Opus 4.7's benchmarks are real, but five behavioral shifts and three hard API breaks will silently degrade pipelines tuned for 4.6. Here's what actually changed and what to fix.
