One of the most important properties of LLMs that we take for granted is that newer, bigger models are just better at everything. The AI labs are pouring effort into economically valuable fields like coding, but bigger models are also better at a broad range of other tasks, such as negotiation, alignment, and poetry. For example, across thousands of simulated negotiations in the PACT benchmark, GPT-5.5 achieved the best results in a multi-round bargaining game between buyers and sellers, which corroborates the positive relationship between model scale and general capability.