probably the best reward function for reasoning efficiency i've seen
length penalty is very elegant and simple tbh