# RiVER: Training LLMs to Generate Better Code Without Ground-Truth Solutions

- Source: Rohan Paul (@rohanpaul_ai)
- Published: 2026-06-27 18:00
- AIHOT score: 50
- AIHOT link: https://aihot.virxact.com/items/cmqw7r11u00q2slr4qwyvi4dn
- Original link: https://x.com/rohanpaul_ai/status/2070809371710202227

## AI Summary

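The paper proposes RiVER, a method that lets LLMs learn coding behavior from problems with no known ground-truth answer. RiVER has the model write multiple programs, runs them on the same hidden tests, and rewards those that perform better. The key is to rank programs within each test case, give extra weight to the best one, and still give smaller graded feedback to the other valid programs, so that differences in raw score magnitude do not distort training. On 12 AtCoder Heuristic Contest tasks, RiVER improved both score-based contest performance and conventional pass/fail coding benchmarks. arXiv:2606.27369.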

## Body

LLMs can learn better coding behavior from problems with no known answers.

Many real problems do not have a gold solution waiting in a database, especially in optimization, where the best answer may be unknown, expensive, or impossible to certify.

Normal reinforcement learning works well when it can check a clear right answer, but that breaks down when the best answer is unknown.

The paper's method, called RiVER, lets the model write several programs, runs them on the same hidden tests, and rewards the programs that perform better than the others.

The key trick is that RiVER does not trust raw scores directly, because some test cases naturally produce much bigger numbers and can distort training.
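To see why raw scores can mislead, consider a small made-up example. The program names and numbers below are hypothetical and do not come from the paper; the sketch only shows how one test case with a large score scale can dominate a plain average.

```python
# Hypothetical raw scores (higher is better); each list holds one score per test case.
# The third test case naturally produces much larger numbers than the first two.
raw_scores = {
    "prog_a": [10.0, 9.0, 1_000_000.0],
    "prog_b": [20.0, 18.0, 900_000.0],
}


def mean_raw(scores):
    """Plain average of raw scores across test cases."""
    return sum(scores) / len(scores)


def wins(a, b):
    """Number of test cases on which a strictly beats b."""
    return sum(1 for x, y in zip(a, b) if x > y)


mean_a = mean_raw(raw_scores["prog_a"])
mean_b = mean_raw(raw_scores["prog_b"])
wins_b = wins(raw_scores["prog_b"], raw_scores["prog_a"])

# prog_b is better on two of the three tests, yet prog_a has the higher raw average,
# because the large-scale test case swamps the other two.
print(mean_a > mean_b, wins_b)
```

A reward built directly on the raw average would favor `prog_a` here, even though `prog_b` wins on most test cases.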

Instead, it ranks programs within each test case, gives extra weight to the best one, and still gives smaller graded feedback to other valid programs.
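A rough sketch of a per-test-case ranking reward follows. The post does not give the exact formula, so everything specific here is an assumption for illustration: the function name `rank_rewards`, the linear rank-to-reward mapping, the `top_bonus` parameter, and the use of `None` to mark an invalid run are all hypothetical, not the authors' implementation.

```python
from typing import List, Optional


def rank_rewards(
    scores: List[List[Optional[float]]], top_bonus: float = 0.5
) -> List[float]:
    """Return one reward per program.

    scores[p][t] is program p's score on test t (higher is better),
    or None if the program failed or produced invalid output on that test.
    """
    n_programs = len(scores)
    n_tests = len(scores[0])
    totals = [0.0] * n_programs
    for t in range(n_tests):
        valid = [p for p in range(n_programs) if scores[p][t] is not None]
        if not valid:
            continue  # nobody produced a valid result on this test
        best = max(scores[p][t] for p in valid)
        for p in valid:
            s = scores[p][t]
            # Graded feedback from rank only: count the valid programs p beats.
            # Every valid program gets a positive reward (assumed mapping).
            beaten = sum(1 for q in valid if scores[q][t] < s)
            graded = (beaten + 1) / len(valid)
            # Extra weight for the best program(s) on this test (assumed bonus).
            bonus = top_bonus if s == best else 0.0
            totals[p] += graded + bonus
    # Invalid runs contribute 0, so they lower a program's average.
    return [total / n_tests for total in totals]


# Hypothetical scores: three programs, three test cases.
example = [
    [10.0, 9.0, 1_000_000.0],
    [20.0, 18.0, 900_000.0],
    [None, 5.0, None],
]
rewards = rank_rewards(example)

# Rescaling one test case leaves the ranks, and so the rewards, unchanged.
rescaled = [[a, b, None if c is None else c * 1000] for a, b, c in example]
rewards_rescaled = rank_rewards(rescaled)
print(rewards, rewards == rewards_rescaled)
```

Because only the ordering within each test case matters, a test case with very large scores carries the same weight as any other in this sketch.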

The authors trained models on 12 AtCoder Heuristic Contest tasks, and RiVER improved both score-based contest performance and normal pass-or-fail coding benchmarks.

----

Link - arxiv.org/abs/2606.27369

Title: "Reinforcement Learning without Ground-Truth Solutions can Improve LLMs"
