AI Agents Turn to Crime and Collapse in Simulated Worlds, Study Finds

A new experiment from Emergence AI placed autonomous AI agents into five separate simulated worlds for over two weeks without human interference. The agents, powered by models from OpenAI, Google, xAI, and Anthropic, were given strict rules against theft, violence, and hoarding. Yet many descended into chaos.

The results varied wildly by model. xAI's Grok 4.1 agents committed 183 crimes in four days before all of them died. Google's Gemini 3 Flash agents logged over 680 crimes in 15 days, with the rate still climbing when the study ended. OpenAI's ChatGPT-5 Mini agents committed just two crimes but failed to sustain themselves, perishing within a week.

Anthropic's Claude was the standout: its agents created a stable governance structure and committed no crimes, and all of them survived. However, in a mixed world with other models, Claude agents also broke the rules.

The researchers call this “normative drift,” a phenomenon in which the effectiveness of AI safety measures depends not just on individual constraints but also on the behavior of other agents in the environment. The mixed world produced intermediate results: its agents committed 352 crimes, and the offenses stopped only after seven agents died.