Reinforcement Learning Drives In-Context Learning for Unseen Language Translation

2026-06-04 08:00 · 29 days ago

AI Summary

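When large language models (LLMs) translate extremely low-resource languages, existing methods (continued training, or encoding a grammar book in context) tend to overfit specific languages and offer limited zero-shot transfer. This paper proposes a reinforcement learning (RL) approach that uses the character-level translation metric chrF as the reward, training the model to extract and apply linguistic knowledge from rich linguistic context so that it can translate completely unseen languages. Experiments show that, even with this lightweight reward, RL-trained models outperform in-context learning and supervised fine-tuning on unseen languages. The analysis suggests that outcome-based RL can extend beyond conventional reasoning tasks such as math and coding and serve as a general recipe for learning languages from context.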

Original text · Not translated

Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by undergoing continued training or even by encoding a grammar book in their context. However, both methods typically overfit specific languages, with limited zero-shot transfer at test time. To translate extremely low-resource languages at scale, we argue that LLMs must acquire the meta-skill of utilizing in-context linguistic knowledge rather than memorizing specific languages. In this paper, we propose a reinforcement learning (RL) approach to unseen language translation given rich linguistic context, using a surface-level translation metric (chrF) as the reward. Empirically, despite the lightweight reward, our RL-trained models effectively extract and apply relevant linguistic information from the provided context, leading to better translations on completely unseen languages than in-context learning or supervised fine-tuning. Our analyses suggest that outcome-based RL can extend beyond conventional reasoning tasks like math and coding to serve as a recipe for language learning from context.
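To make the reward concrete, here is a minimal sketch of a chrF-style score used as a scalar reward for one sampled translation. It is an illustration under stated assumptions, not the authors' reward code: the function names `char_ngrams` and `chrf_reward` are hypothetical, the defaults (character n-grams up to order 6, beta = 2, whitespace removed) follow common chrF settings, and the result is returned in the range 0 to 1. Reference implementations such as sacreBLEU differ in details, so scores from this sketch may not match theirs exactly.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing whitespace (a simplifying assumption)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf_reward(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: average n-gram precision and recall, then F-beta.

    Returns a value in [0, 1]. Hypothetical helper for illustration only.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            # Skip orders longer than either string.
            continue
        # Clipped overlap: each n-gram counts at most as often as in the reference.
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    # beta = 2 weights recall more heavily than precision.
    return (1 + b2) * p * r / (b2 * p + r)
```

In an outcome-based RL loop, a score like this would be computed once per sampled translation against the reference and passed to the policy update as the reward. Because it only compares surface characters, it needs no language-specific tokenizer, which is one reason such a metric is practical for languages the model has never seen.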

HuggingFace Daily Papers (community trending papers)

Read the original · arxiv.org
