Reinforcement Learning-Driven In-Context Learning for Unseen Language Translation
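When large language models (LLMs) translate extremely low-resource languages, existing methods (continued training or encoding a grammar book in context) tend to overfit specific languages and offer limited zero-shot transfer. This paper proposes a reinforcement learning (RL) approach that uses the character-level translation metric chrF as the reward, training the model to extract and apply linguistic knowledge from rich linguistic context so that it can translate completely unseen languages. Experiments show that, even with this lightweight reward, RL-trained models outperform in-context learning and supervised fine-tuning on unseen languages. The analysis suggests that outcome-based RL can extend beyond conventional reasoning tasks such as math and coding to serve as a general recipe for learning languages from context.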
Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by undergoing continued training or even by encoding a grammar book in their context. However, both methods typically overfit specific languages, with limited zero-shot transfer at test time. To translate extremely low-resource languages at scale, we argue that LLMs must acquire the meta-skill of utilizing in-context linguistic knowledge rather than memorizing specific languages. In this paper, we propose a reinforcement learning (RL) approach to unseen language translation given rich linguistic context, using a surface-level translation metric (chrF) as the reward. Empirically, despite the lightweight reward, our RL-trained models effectively extract and apply relevant linguistic information from the provided context, leading to better translations on completely unseen languages than in-context learning or supervised fine-tuning. Our analyses suggest that outcome-based RL can extend beyond conventional reasoning tasks like math and coding to serve as a recipe for language learning from context.
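To make the reward concrete, the following is a minimal sketch of a chrF-style score that could act as a scalar reward for a sampled translation. It is not the authors' implementation: the function names `char_ngrams` and `chrf_reward`, the maximum n-gram order of 6, the recall weight beta = 2, the removal of whitespace, and the scaling to [0, 1] are all assumptions that follow common chrF defaults, and the abstract does not say how the paper computes or scales the metric.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing whitespace (an assumed convention)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf_reward(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF in [0, 1]: F-beta over averaged n-gram precision and recall."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        # Skip orders for which either string is too short to have n-grams.
        if not hyp or not ref:
            continue
        # Counter intersection keeps the minimum count, i.e. clipped matches.
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    # beta > 1 weights recall more heavily than precision.
    return (1 + b2) * p * r / (b2 * p + r)
```

In an outcome-based RL loop, a score like this would be computed once per sampled translation against the reference and passed to the policy update as the reward. Because it only measures character overlap, it requires no learned reward model, which is the sense in which the abstract calls the reward lightweight.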