A fundamental challenge in self-improving AI is that the system evaluating progress remains static, creating a bottleneck. Researchers from the University of Cambridge and Nvidia have proposed a solution: a framework where both the AI agent and its evaluator co-evolve.
The framework, called the Red Queen Gödel Machine (RQGM), runs in discrete rounds, and in each round both the working AI and the judging AI are upgraded simultaneously. It builds on earlier concepts that relied on formal proofs, replacing them with a more organic, iterative co-evolution process.
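The round structure can be illustrated with a toy sketch. This is not the paper's algorithm: the scalar agent, the scalar evaluator bias, the hidden target, the probe set, and every function name below are hypothetical stand-ins chosen only to show a worker and a judge being updated together in each round. It also assumes the evaluator is itself checked against a small set of ground-truth scores, which the summary does not spell out.

```python
import random

TARGET = 3.0                         # hypothetical true optimum, hidden from the agent
PROBES = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical labeled set used to check the evaluator


def true_quality(x):
    """Ground-truth score (assumed available only on the probe set)."""
    return -(x - TARGET) ** 2


def evaluator_score(x, bias):
    """Score assigned by an evaluator whose notion of the optimum is off by `bias`."""
    return -(x - (TARGET + bias)) ** 2


def evaluator_error(bias):
    """Squared disagreement between the evaluator and ground truth on the probes."""
    return sum((evaluator_score(p, bias) - true_quality(p)) ** 2 for p in PROBES)


def co_evolve(rounds=200, step=0.3, seed=0):
    rng = random.Random(seed)
    agent, bias = 0.0, 2.0           # weak agent, miscalibrated evaluator
    history = []
    for _ in range(rounds):
        # Propose one variant of each component from the round's starting state.
        cand_agent = agent + rng.gauss(0.0, step)
        cand_bias = bias + rng.gauss(0.0, step)
        # The agent variant is judged by the CURRENT evaluator.
        keep_agent = evaluator_score(cand_agent, bias) > evaluator_score(agent, bias)
        # The evaluator variant is judged by agreement with ground truth.
        keep_bias = evaluator_error(cand_bias) < evaluator_error(bias)
        # Commit both upgrades together at the end of the round.
        agent = cand_agent if keep_agent else agent
        bias = cand_bias if keep_bias else bias
        history.append((agent, bias))
    return history


if __name__ == "__main__":
    final_agent, final_bias = co_evolve()[-1]
    print(f"agent={final_agent:.3f} evaluator_bias={final_bias:.3f}")
```

In this toy, the agent chases whatever the current evaluator rewards, so its true quality only approaches the optimum as the evaluator's bias shrinks. If the probe labels themselves were wrong, the same loop would steer both components toward the wrong target.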
Preliminary results are promising. In tests, co-evolved systems for scientific writing raised acceptance rates by up to 1.86x, and mathematical proof graders improved accuracy by 9%. On coding tasks, token usage fell by up to 1.72x, indicating greater efficiency.
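To put the multipliers in more familiar terms, the short calculation below assumes that "reduced by 1.72x" means the baseline token count is divided by 1.72, and that "1.86x" multiplies a baseline acceptance rate. The symbols $T_{\text{old}}$, $T_{\text{new}}$, $a_{\text{old}}$ and $a_{\text{new}}$ are introduced here for illustration, and the 10% baseline is hypothetical, not a figure from the paper.

$$
T_{\text{new}} = \frac{T_{\text{old}}}{1.72} \approx 0.581\,T_{\text{old}}
\quad\Longrightarrow\quad
\text{about } 42\% \text{ fewer tokens}
$$

$$
a_{\text{new}} = 1.86\,a_{\text{old}}
\quad\text{e.g.}\quad
a_{\text{old}} = 10\% \;\Rightarrow\; a_{\text{new}} = 18.6\%
$$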
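The approach addresses the problem of AI systems gaming static benchmarks: an agent optimized against a fixed evaluator can learn to exploit that evaluator's blind spots rather than improve. By making the evaluation target move with the agent, the framework aims to prevent performance plateaus.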
The researchers note alignment concerns: flawed ground-truth metrics could be amplified during co-evolution. The paper is a preprint and has not yet undergone peer review.