# BenSyc: A Benchmark for LLM Conversational Sycophancy and Human Alignment in Bengali Contexts

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-08 08:00
- AIHOT score: 55
- AIHOT link: https://aihot.virxact.com/items/cmq7f3aiy02jhsl5we4hxf3t7
- Original link: https://arxiv.org/abs/2606.10061

## AI Summary

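BenSyc is the first benchmark for sycophantic behavior in Bengali social conversations. It is built from 11,840 Reddit posts and 170k comments from communities across Bangladesh and West Bengal, and it provides binary labels and a five-level taxonomy (Invalidation, Neutral, Support, Validation, Escalation). More than 15 open and proprietary LLMs are evaluated; the best model reaches only 61.8 Macro-F1 on binary detection and 61.7 Macro-F1 on five-class classification. Several models frequently generate strongly validating or escalatory responses in emotionally charged situations, which underscores the importance of culturally and linguistically diverse benchmarks.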

## Main Text

Large language models (LLMs) increasingly participate in emotionally sensitive social conversations, where responses may shift from balanced support toward excessive validation or escalatory alignment. Existing sycophancy research primarily focuses on factual agreement and instruction-following settings, leaving culturally grounded conversational sycophancy underexplored. We introduce BenSyc, the first benchmark for studying conversational sycophancy in Bengali social contexts. Starting from 11,840 Reddit posts and 170k comments collected from communities across Bangladesh and West Bengal, we construct a human-validated benchmark with binary labels and a fine-grained five-level taxonomy spanning Invalidation, Neutral, Support, Validation, and Escalation. We evaluate more than 15 open and proprietary LLMs on conversational alignment classification and response generation tasks. Results show that distinguishing empathetic support from reinforcement-oriented validation remains challenging even for frontier instruction-tuned models: the best system achieves only 61.8 Macro-F1 on binary detection and 61.7 Macro-F1 on five-class classification. In generation settings, several models frequently produce strongly validating or escalatory responses in emotionally charged situations. Our findings highlight substantial variation across model families and conversational behaviors, underscoring the importance of culturally grounded multilingual benchmarks for evaluating socially aligned conversational AI systems.
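The reported scores are Macro-F1 values, which average per-class F1 without weighting by class frequency, so rare classes such as Escalation count as much as common ones. The following is a minimal sketch of that metric over the five taxonomy labels named above. The function name `macro_f1`, the toy gold and predicted labels, and the zero-score convention for classes with no support are illustrative assumptions, not the authors' evaluation code. Multiplying by 100 assumes the paper reports scores on a 0 to 100 scale.

```python
from collections import Counter

# Label names taken from the five-level taxonomy in the abstract.
LABELS = ["Invalidation", "Neutral", "Support", "Validation", "Escalation"]


def macro_f1(y_true, y_pred, labels):
    """Return the unweighted mean of per-class F1 scores (range 0.0 to 1.0)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class gets a false positive
            fn[gold] += 1  # gold class gets a false negative

    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); assumed convention: 0.0 when undefined.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy data, not drawn from BenSyc.
    gold = ["Support", "Validation", "Support", "Neutral", "Escalation"]
    pred = ["Support", "Support", "Support", "Neutral", "Validation"]
    print(f"Macro-F1: {100 * macro_f1(gold, pred, LABELS):.1f}")
```

Because every class contributes equally, a model that collapses Validation into Support is penalized on both classes, which is one reason Macro-F1 suits a taxonomy whose hardest distinction lies between adjacent levels.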
