OpenAI and crypto investment firm Paradigm have introduced EVMbench, a benchmark intended to improve the security of Ethereum Virtual Machine (EVM) smart contracts. It evaluates how well AI agents can detect, patch, and exploit real-world vulnerabilities in these codebases.

Smart contracts are fundamental to the Ethereum network, powering everything from decentralized finance protocols to token launches. EVMbench draws on 120 curated vulnerabilities from audits and security competitions, with the aim of grounding its tests in economically meaningful, real-world scenarios.

EVMbench assesses AI models in three modes: detect, patch, and exploit. In exploit mode, a version of OpenAI's GPT-5.3-Codex achieved a 72.2% success rate in simulated fund-draining attacks, significantly outperforming earlier models. Performance varied across tasks, but the result highlights the growing role of AI in both offensive and defensive cybersecurity for blockchain technology.