After 100 million tokens, performance was still going up. What we're seeing here is not the capability ceiling.
From the report: "Performance on TLO continues to scale with the amount of inference compute spent, and we have not yet observed a plateau with the best models."
[Quoting @AISecurityInst]: OpenAI's GPT-5.5 is the second model to complete our multi-step cyber attack simulation end-to-end 🧵