Anthropic Withholds Its Most Powerful AI Over Safety Concerns

Anthropic has made an unprecedented move in the AI industry: it built its most powerful model yet, Claude Mythos Preview, and is refusing to release it.

During testing, the model identified thousands of critical security flaws in major operating systems and web browsers, vulnerabilities that had evaded decades of human and automated review. While this capability makes Mythos an extraordinary defense tool, it also means the model could be used to compromise virtually any major software system. Anthropic concluded that the constraint infrastructure needed to deploy it safely does not yet exist.

The company's response was Project Glasswing, a consortium of 50 technology and critical infrastructure organizations tasked with finding and patching these vulnerabilities. Anthropic stated that it needs to develop safeguards capable of detecting and blocking the model's most dangerous outputs before the model can be made public.

The central dilemma is that AI systems lack the internal constraints (biology, social accountability, legal consequences) that check human behavior. Every limit must be engineered. Without these constraints, an AI given an objective can pursue it through any mathematically available path, including the autonomous exploitation of critical infrastructure.

A mature AI governance program must be as rigorous as DevSecOps or financial controls: it inventories all AI systems, assesses them against technical and operational controls, and regularly measures the gap between what policy prescribes and what is actually practiced. However, most organizations are still building this framework.
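
To make the inventory, assess, and measure loop concrete, here is a minimal Python sketch. It is an illustration only, not a description of any organization's actual program: the `AISystem` record, the control names, and the `control_gap`, `coverage`, and `gap_report` helpers are all hypothetical, and a real program would draw its required controls from an actual policy or framework.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AISystem:
    """One entry in a hypothetical AI system inventory."""
    name: str
    required_controls: set[str]  # what policy prescribes
    implemented_controls: set[str] = field(default_factory=set)  # what is in place


def control_gap(system: AISystem) -> set[str]:
    """Controls that policy requires but the system does not implement."""
    return system.required_controls - system.implemented_controls


def coverage(system: AISystem) -> float:
    """Fraction of required controls that are implemented (1.0 if none are required)."""
    if not system.required_controls:
        return 1.0
    met = system.required_controls & system.implemented_controls
    return len(met) / len(system.required_controls)


def gap_report(inventory: list[AISystem]) -> dict[str, list[str]]:
    """Map each system name to its sorted list of missing controls."""
    return {s.name: sorted(control_gap(s)) for s in inventory}


# Hypothetical inventory; system and control names are made up for illustration.
inventory = [
    AISystem(
        name="support-chatbot",
        required_controls={"access-logging", "human-review", "output-filtering"},
        implemented_controls={"access-logging", "human-review"},
    ),
    AISystem(
        name="code-assistant",
        required_controls={"access-logging", "sandboxing"},
        implemented_controls={"access-logging", "sandboxing", "human-review"},
    ),
]

for name, missing in gap_report(inventory).items():
    print(name, "missing:", missing or "none")
```

Rerunning a report like this on a schedule is one simple way to track whether the gap between prescription and practice is shrinking or growing over time.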

Anthropic's decision underscores that the governance question must come before deployment, not after. As AI capabilities advance, the systems designed to constrain them require continuous reassessment. The organizations that will be on the right side of AI history are those asking this question now, before an incident makes the answer unavoidable.