OpenAI Tests AI Agents Against Smart Contract Vulnerabilities

OpenAI has developed a new benchmark, EVMbench, to assess how effectively AI models can identify, fix, and exploit security vulnerabilities in cryptocurrency smart contracts.

The benchmark paper, released in collaboration with crypto investment firm Paradigm and security firm OtterSec, evaluated AI agents on 120 smart contract vulnerabilities. Anthropic’s Claude Opus 4.6 achieved the highest average "detect award," followed by OpenAI’s OC-GPT-5.2 and Google’s Gemini 3 Pro.

OpenAI emphasizes the importance of evaluating AI in "economically meaningful environments," noting that smart contracts secure significant assets and AI agents will impact both attackers and defenders. The company anticipates growth in AI-driven stablecoin payments and believes crypto could become the native currency for AI agents.

This initiative comes after crypto attackers stole $3.4 billion in 2025. EVMbench draws on vulnerabilities from numerous smart contract audits and aims to track AI progress in detecting and mitigating these risks at scale.

Separately, Dragonfly’s Haseeb Qureshi suggested that AI-intermediated, self-driving wallets are the future of crypto transactions, which he said would address user fears and handle complex operations. He compared AI’s role in crypto to that of GPS for smartphones or the browser for TCP/IP, suggesting that AI agents are the missing complement needed for crypto’s widespread adoption.