What are the largest software engineering tasks AI can perform?
To answer this, we built MirrorCode, our long-horizon SWE benchmark that lets AI code autonomously for days at a time.
The best models complete some tasks we estimate would take human engineers several weeks.