# Turing-RL: Learning User Simulators with Turing Rewards

- Source: HuggingFace Daily Papers (trending community papers)
- Published: 2026-06-17 08:00
- AIHOT score: 47
- AIHOT link: https://aihot.virxact.com/items/cmqiwbe8g04vrsl5w4tjs1ngk
- Original link: https://arxiv.org/abs/2606.19336

## AI Summary

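The paper proposes Turing-RL, a Turing-Test-based reinforcement learning method for training user simulators. An LLM judge supplies a discriminative Turing reward: given the user's history, it scores whether a generated response is indistinguishable from the real user's, and the user simulator LLM learns under this reward to produce responses that resemble those of a real user. In two domains, conversational chat and Reddit forum discussion, Turing-RL consistently outperforms baseline methods on both LLM and human evaluation metrics. The study indicates that optimizing for indistinguishability, rather than matching a single ground-truth response, is an effective strategy for learning user simulators.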

## Body

Learning to simulate human users in interactive settings could advance the training of agent assistants, the evaluation of personalization systems, research in the social sciences, and more. Existing approaches generally train a large language model (LLM) to match a single ground-truth response, either by maximizing its log probability or by using a similarity reward. We instead propose Turing-RL, a Turing-Test-based reinforcement learning approach for training user simulator models. Turing-RL uses a discriminative Turing reward: an LLM judge scores how indistinguishable a generated response is from the real user's response, given the user's history. The user simulator LLM is trained with this reward to produce responses indistinguishable from what the user could have said. Across two domains, conversational chat and Reddit forum discussion, we find that Turing-RL consistently outperforms baseline methods on both LLM and human evaluation metrics. Our study suggests that optimizing for indistinguishability, rather than response matching, is effective for learning user simulators.
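To make the idea of a discriminative reward concrete, here is a minimal sketch of how such a signal could be computed. It is not the paper's implementation: the abstract does not specify the judge prompt, the reward formula, or the RL algorithm. The names `Judge`, `toy_judge`, and `turing_reward` are hypothetical, the pairwise comparison with randomized order is an assumption, and a word-overlap heuristic stands in for the LLM judge so that the example runs without a model.

```python
import random
from typing import Callable, List

# Hypothetical judge interface (an assumption, not from the paper):
# (history, candidate_a, candidate_b) -> estimated P(candidate_a is the real user).
Judge = Callable[[List[str], str, str], float]


def toy_judge(history: List[str], cand_a: str, cand_b: str) -> float:
    """Stand-in for an LLM judge: favors the candidate that shares more
    words with the user's history."""
    vocab = {w for turn in history for w in turn.lower().split()}

    def overlap(text: str) -> int:
        return sum(1 for w in text.lower().split() if w in vocab)

    a, b = overlap(cand_a), overlap(cand_b)
    # Add-one smoothing keeps the score strictly inside (0, 1); ties give 0.5.
    return (a + 1) / (a + b + 2)


def turing_reward(
    history: List[str],
    real: str,
    generated: str,
    judge: Judge,
    rng: random.Random,
) -> float:
    """Return the judge's estimated probability that the generated response
    is the real one. Higher means harder to tell apart from the real user."""
    # Randomize presentation order so the judge cannot exploit position.
    if rng.random() < 0.5:
        return judge(history, generated, real)
    return 1.0 - judge(history, real, generated)


if __name__ == "__main__":
    history = ["i love hiking in the alps", "the alps trails were great"]
    real = "planning more hiking in the alps"
    rng = random.Random(0)
    for candidate in ["the alps hiking trails were great", "buy cheap watches now"]:
        print(candidate, "->", round(turing_reward(history, real, candidate, toy_judge, rng), 3))
```

In a training loop, a scalar like this would serve as the per-response reward for a policy-gradient update of the simulator. The real response is used only as a reference for the judge, so the simulator is not pushed toward reproducing it verbatim.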
