# Small Model, Big Intelligence: Stochastic Reasoning Delivers Superior Performance

- Source: Rohan Paul (@rohanpaul_ai)
- Published: 2026-05-21 15:14
- AIHOT score: 67
- AIHOT link: https://aihot.virxact.com/items/cmpf63ael03misljwvbe64cmh
- Original link: https://x.com/rohanpaul_ai/status/2057359289459913196

## AI Summary

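With only 10 million parameters, the GRAM model introduces learnable randomness so that it can explore multiple different paths in parallel at inference time, breaking the limitation of traditional recursive models that lock onto a single line of thought. At test time, the model runs these parallel trajectories simultaneously and uses a reward predictor to select the best result, adding a "width" dimension on top of depth. Experiments show that GRAM reaches 97% accuracy on hard Sudoku tasks, far exceeding the previous best deterministic model; it also maintains high performance on the multi-solution N-Queens problem and can efficiently generate valid Sudoku puzzles. The framework offers a new approach to improving the reasoning ability of small models.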

## Body

A 10-million-parameter model just outperformed deterministic rivals 3 times its size by doing something regular recursive AI models don't do: exploring multiple reasoning paths at the same time.

Most AI reasoning models are trapped on a single train of thought, and GRAM ("Generative Recursive Reasoning") is the first to break that by letting the model think in parallel universes simultaneously.

The problem is that all existing recursive models are fully deterministic: given the same input, they always follow the exact same reasoning path, so they can never escape a wrong trajectory or discover more than one valid answer.

GRAM fixes this by injecting learned randomness at each refinement step, so the model samples a slightly different direction each time rather than snapping to a single fixed next state. The result is a spread of diverse reasoning trajectories.
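A minimal sketch of the idea above, in plain Python: each refinement step applies a deterministic update and then adds sampled noise, so different seeds produce different trajectories. The update rule, `noise_scale` (a stand-in for the learned randomness), and the function names are illustrative assumptions, not GRAM's actual architecture.

```python
import random


def refine_step(state, noise_scale, rng):
    """One toy refinement step: deterministic update plus sampled noise."""
    # Deterministic part (assumed stand-in for the recursive network):
    # move every coordinate halfway toward 1.0.
    mean = [0.5 * s + 0.5 for s in state]
    # Stochastic part: perturb each coordinate with Gaussian noise.
    # noise_scale is a hypothetical stand-in for a learned quantity.
    return [m + noise_scale * rng.gauss(0.0, 1.0) for m in mean]


def rollout(initial_state, steps, noise_scale, seed):
    """Run `steps` refinement steps from `initial_state` with a seeded RNG."""
    rng = random.Random(seed)
    state = list(initial_state)
    for _ in range(steps):
        state = refine_step(state, noise_scale, rng)
    return state
```

With `noise_scale=0.0`, every seed yields the same trajectory, which mirrors the deterministic case described earlier; with a positive `noise_scale`, different seeds diverge.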

At test time, the model runs many of these paths in parallel and selects the best one using a small reward predictor trained alongside the main model, adding a "width" scaling axis on top of the usual "depth" axis of running more recursion steps.
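The selection step can be sketched as a best-of-N loop: draw several candidates, score each one, and keep the highest-scoring one. Everything here is a toy assumption: `sample_candidate` stands in for one stochastic trajectory, and `predicted_reward` stands in for the learned reward predictor by counting distinct digits in a row.

```python
import random


def sample_candidate(rng, length=9):
    # Hypothetical stand-in for the final answer of one stochastic
    # trajectory: a row of digits 1..9 drawn at random.
    return [rng.randint(1, 9) for _ in range(length)]


def predicted_reward(candidate):
    # Hypothetical stand-in for the learned reward predictor: count
    # distinct digits, so a row with no repeats scores highest.
    return len(set(candidate))


def best_of_n(n, seed=0):
    # "Width" scaling: draw n candidates and keep the best-scoring one.
    rng = random.Random(seed)
    candidates = [sample_candidate(rng) for _ in range(n)]
    return max(candidates, key=predicted_reward)
```

Because a fixed seed yields the same candidate sequence, the best score over 20 samples is never lower than the score of the first sample alone, which is the basic reason extra width can only help under a reliable scorer.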

On hard Sudoku puzzles, GRAM with 10M parameters hits 97% accuracy versus 87.4% for the best prior recursive model, and with only 20 parallel samples it outperforms every deterministic baseline even at 320 recursion steps.

On tasks with many valid answers like N-Queens, deterministic recursive models collapse as the number of solutions grows, while GRAM maintains near-perfect accuracy throughout.
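To give a sense of how quickly the number of valid answers grows, here is a short backtracking counter for the classic N-Queens problem (place N non-attacking queens on an N×N board). This is the textbook formulation; the paper's exact task setup may differ, and the function name is illustrative.

```python
def count_n_queens(n):
    """Count placements of n non-attacking queens on an n x n board."""

    def place(row, cols, diag1, diag2):
        # All rows filled: one complete valid placement found.
        if row == n:
            return 1
        total = 0
        for c in range(n):
            # Skip squares attacked along a column or either diagonal.
            if c in cols or (row - c) in diag1 or (row + c) in diag2:
                continue
            total += place(
                row + 1,
                cols | {c},
                diag1 | {row - c},
                diag2 | {row + c},
            )
        return total

    return place(0, frozenset(), frozenset(), frozenset())
```

A model that always maps the same input to the same output can return at most one of these placements per input, whereas a sampler can in principle reach many of them.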

The same stochastic framework also acts as a generator: given a blank board, GRAM produces valid Sudoku puzzles 99% of the time using 16 steps, versus 1,000 steps and 55M parameters for the best diffusion baseline at just 91%.
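The post does not say how validity is measured. One plausible reading, assumed here, is that a generated board counts as valid when it is a completed 9×9 grid in which every row, column, and 3×3 box contains the digits 1 to 9 exactly once. A sketch of such a check, with illustrative function names:

```python
def is_valid_sudoku_grid(grid):
    """Return True if grid is a completed 9x9 Sudoku obeying all constraints."""
    # Shape check first, so the indexing below is safe.
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        return False
    digits = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    # Every row, column, and box must contain exactly the digits 1..9.
    return all(unit == digits for unit in rows + cols + boxes)


def pattern_grid():
    """Build one known-valid grid from a standard shifting pattern."""
    return [
        [(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)]
        for r in range(9)
    ]
```

A validity rate like the one quoted above could then be estimated as the fraction of generated boards for which such a check returns `True`, under whatever definition the paper actually uses.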

---

Paper link: arxiv.org/abs/2605.19376v1
