The Models Too Powerful to Ship
The most interesting thing happening in AI right now isn’t what labs are releasing — it’s what they’re withholding.
On April 7, Anthropic announced Claude Mythos Preview, its most capable model, with no plans for general availability. Instead, it’s being given to 11 vetted organizations via Project Glasswing, a controlled cybersecurity research program. The numbers explain why: on the Firefox 147 benchmark, Mythos developed working exploits 181 times, compared to just 2 for Claude Opus 4.6. That’s roughly a 90x jump (181 / 2 ≈ 90.5) on that measure of offensive cyber capability in a single generation.
OpenAI is following the same playbook. This week, they announced GPT-5.5-Cyber would roll out in limited preview to vetted EU cybersecurity teams only — not the general API. Anthropic is still holding Mythos back from even that level of access.
This is new. For years, labs competed to release more. Now the frontier is bifurcating: a public tier and a restricted tier that only governments, researchers, and select enterprises can touch.
What Did Ship: Opus 4.7 and Gemma 4
Claude Opus 4.7 (April 16) is Anthropic’s publicly available flagship. It’s a genuine step forward on agentic coding: users report handing off their hardest software tasks with less supervision. Vision is also substantially better, handling higher-resolution images. It’s priced the same as Opus 4.6.
On the open-source side, Google’s Gemma 4 (April 2) is the release that matters most. It comes in four model sizes (E2B, E4B, 26B MoE, and 31B Dense), all under Apache 2.0, which permits commercial use subject to the license’s attribution and notice terms. The 31B currently ranks #3 among open models on the Arena AI leaderboard. It supports multimodal inputs (text + image across all sizes, audio on edge variants), context windows of up to 256K tokens, and 140 languages natively. For anyone building applications without paying API fees, Gemma 4 is a serious option.
The Agent Race Heats Up
Every major lab is now building personal agents — software that operates continuously on your behalf rather than answering one question at a time.
- Google is testing a personal agent codenamed “Remy” — described as a 24/7 assistant for work, school, and daily life that acts autonomously. Google also shut down its Mariner browser agent on May 4, folding the technology into Gemini Agent.
- Meta is developing “Hatch” internally and building an agentic shopping tool for Instagram (internal testing expected by end of June). Meta also launched Incognito Chat on WhatsApp — conversations with Meta AI that even Meta cannot read, using private processing.
- Google and Meta are broadly described as chasing Anthropic and OpenAI on agents, not leading.
The pattern: OpenAI and Anthropic set the agentic ceiling, and everyone else is building products on top of or alongside them.
Research Worth Noting
AlphaEvolve, from Google DeepMind, combines Gemini with an evolutionary algorithm to discover new algorithms from scratch. It’s already found more efficient ways to manage power consumption in Google’s data centers and optimize TPU chip operations — real-world impact, not a benchmark.
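To make the "evolutionary algorithm" part concrete, here is a minimal sketch of the general propose-score-select loop. It is not AlphaEvolve’s code: the names `evolve`, `toy_score`, and `toy_mutate` are hypothetical, and the random `toy_mutate` function only stands in for the step where a language model would propose a change to a candidate program.

```python
import random

def evolve(score, mutate, seed_candidate, generations=200, population_size=8, rng=None):
    """Generic evolutionary loop: propose a variant, score it, keep the best few."""
    rng = rng or random.Random(0)
    population = [seed_candidate]
    for _ in range(generations):
        # Tournament selection: pick the best of a small random sample as parent.
        sample = rng.sample(population, min(3, len(population)))
        parent = max(sample, key=score)
        # In an LLM-driven system, this is where the model would propose an edit.
        child = mutate(parent, rng)
        population.append(child)
        # Elitism: keep only the top-scoring candidates.
        population.sort(key=score, reverse=True)
        del population[population_size:]
    return population[0]

# Toy problem (an assumption for illustration): match a hidden target vector.
TARGET = [3, -1, 4]

def toy_score(candidate):
    # Higher is better; 0 means an exact match.
    return -sum((a - b) ** 2 for a, b in zip(candidate, TARGET))

def toy_mutate(candidate, rng):
    # Random stand-in for a model-proposed change: nudge one coordinate by 1.
    child = list(candidate)
    i = rng.randrange(len(child))
    child[i] += rng.choice([-1, 1])
    return child

best = evolve(toy_score, toy_mutate, seed_candidate=[0, 0, 0])
```

Because the loop never discards its top-scoring candidate, the result can only match or improve on the seed. The interesting design question in a real system is the evaluator: the approach only works where a candidate can be scored automatically.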
TurboQuant is a Google research algorithm targeting the KV cache memory bottleneck in LLMs. It uses two-step compression, PolarQuant followed by a quantized Johnson-Lindenstrauss (QJL) transform, to cut memory overhead significantly. That matters for anyone running inference at scale.
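For intuition about the Johnson-Lindenstrauss half of that idea, here is a toy sketch of a 1-bit random-projection estimator for inner products, the kind of quantity attention computes between a query and a cached key. This is not TurboQuant’s implementation and omits the PolarQuant step entirely; the function names and the sizes (64 dimensions, 512 sign bits) are assumptions chosen for illustration.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def make_projection(m, d, rng):
    # m random Gaussian directions in d dimensions (shared by keys and queries).
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]

def compress_key(key, proj):
    """Keep one sign bit per projected coordinate, plus the key's norm."""
    signs = [1 if dot(row, key) >= 0 else -1 for row in proj]
    return signs, math.sqrt(dot(key, key))

def estimate_dot(query, compressed, proj):
    """Estimate <query, key> from the sign bits alone.

    For Gaussian g, E[(g.q) * sign(g.k)] = sqrt(2/pi) * <q, k> / ||k||,
    so rescaling the average by sqrt(pi/2) * ||k|| gives an unbiased estimate.
    """
    signs, key_norm = compressed
    acc = sum(dot(row, query) * s for row, s in zip(proj, signs))
    return math.sqrt(math.pi / 2) * key_norm * acc / len(proj)

rng = random.Random(0)
d, m = 64, 512                      # hypothetical sizes for the toy example
proj = make_projection(m, d, rng)
key = [1.0] + [0.0] * (d - 1)
query = [0.6, 0.8] + [0.0] * (d - 2)

compressed = compress_key(key, proj)
approx = estimate_dot(query, compressed, proj)
exact = dot(query, key)             # 0.6 by construction
```

Here the key is stored as 512 bits plus one norm instead of 64 floating-point values, and `approx` is a noisy estimate of `exact`. The sketch shows the mechanism only; accuracy and real memory savings depend on details this example does not model.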
The Bigger Picture
A new IBM study finds that 76% of organizations employ a Chief AI Officer in 2026, up from 26% last year. That’s not gradual adoption; it’s a structural shift in how companies are organized around AI.
Meanwhile, OpenAI’s B2B Signals research found that frontier-adopting companies now use 3.5x more AI intelligence per employee than typical firms. The gap between companies moving fast and those moving slow is compounding.
The story of May 2026 isn’t one breakthrough — it’s a system maturing: open-source catching up, personal agents becoming real infrastructure, and frontier capability outpacing what anyone is willing to put in public hands.