一篇关于自我改进智能体的论文指出,自改进循环往往在评估器固定后停滞——智能体学会迎合固定评估器而非真正进步。剑桥大学提出的“Red Queen Gödel Machine”让智能体与其评估器共同进化,使标准随着智能体提升而持续提高,从结构上避免奖励欺骗(reward hacking)。名称借用了进化军备竞赛的隐喻:双方都必须不断奔跑才能保持原地。论文链接在arxiv。
Fascinating paper on self-improving agents.
(bookmark it)
If you are working on agentic loops, you will quickly realize that they are only as good as the effectiveness of the evaluator.
Self-improvement loops tend to stall the moment the judge stops getting harder. The agent learns to satisfy a fixed evaluator rather than getting genuinely better. The Red Queen Gödel Machine, from Cambridge, co-evolves the agent and its evaluator together, so the bar keeps rising as the agent climbs.
The name borrows the evolutionary arms race. Both sides have to keep running to stay in place.
A frozen evaluator is where reward hacking creeps into self-improvement. Co-evolving the judge is a structural answer to that, and it keeps the loop honest over many rounds.