# KnowRL：基于最小充分知识引导的强化学习提升大语言模型推理

- 来源：HuggingFace Daily Papers（社区热门论文）
- 发布时间：2026-04-14 08:00
- AIHOT 链接：https://aihot.virxact.com/items/cmnzj0j7t03olsl0ffrtocp5j
- 原文链接：https://arxiv.org/abs/2604.12627

## AI 摘要

针对RLVR在难题上面临的奖励稀疏问题，KnowRL框架将知识提示解构为原子知识点（KPs），运用约束子集搜索（CSS）构建紧凑训练子集，并显式优化剪枝交互悖论下的鲁棒子集选择。基于OpenMath-Nemotron-1.5B训练的模型在8项推理基准测试中创下1.5B规模新SOTA：无提示推理准确率达70.08%，较基线提升9.63个百分点；结合选定KPs后升至74.16%。模型与代码已开源。

## 正文

RLVR improves reasoning in large language models, but its effectiveness is often limited by severe reward sparsity on hard problems. Recent hint-based RL methods mitigate sparsity by injecting partial solutions or abstract templates, yet they typically scale guidance by adding more tokens, which introduce redundancy, inconsistency, and extra training overhead. We propose KnowRL (Knowledge-Guided Reinforcement Learning), an RL training framework that treats hint design as a minimal-sufficient guidance problem. During RL training, KnowRL decomposes guidance into atomic knowledge points (KPs) and uses Constrained Subset Search (CSS) to construct compact, interaction-aware subsets for training. We further identify a pruning interaction paradox -- removing one KP may help while removing multiple such KPs can hurt -- and explicitly optimize for robust subset curation under this dependency structure. We train KnowRL-Nemotron-1.5B from OpenMath-Nemotron-1.5B. Across eight reasoning benchmarks at the 1.5B scale, KnowRL-Nemotron-1.5B consistently outperforms strong RL and hinting baselines. Without KP hints at inference, KnowRL-Nemotron-1.5B reaches 70.08 average accuracy, already surpassing Nemotron-1.5B by +9.63 points; with selected KPs, performance improves to 74.16, establishing a new state of the art at this scale. The model, curated training data, and code are publicly available at https://github.com/Hasuer/KnowRL.