The paper below tested a variety of base LLMs (no TTA) on generalization-focused math problems and found that they can't reason and can't do math.
All true… but the fact that base LLMs have zero fluid intelligence, while extremely controversial back in 2024, is now well established. A more interesting experiment would have been to run current LRMs on the same problems and measure the delta. I bet the latest LRMs can solve most of these problems.
https://arxiv.org/abs/2604.01988