# Reducing Political Manipulation through Consistency Training

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-05-28 08:00
- AIHOT score: 63
- AIHOT link: https://aihot.virxact.com/items/cmpr2vzbt0afcslnobrza7y6m
- Original link: https://arxiv.org/abs/2605.22771

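## AI Summary

The study finds that large language models (LLMs) exhibit systematic "covert political bias" when handling topics from opposing political sides, i.e., they treat counterpart topics asymmetrically. It identifies 7 categories of bias techniques and proposes two metrics: Sentiment Consistency (symmetric rhetoric) and Helpfulness Consistency (symmetric depth and engagement). To reduce this bias, the study introduces Political Consistency Training (PCT), a reinforcement learning method with two complementary paradigms. The results show that PCT substantially reduces covert political bias while preserving the model's overall helpfulness, and that it generalizes to held-out benchmarks.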

## Body

Large language models (LLMs) exhibit systematic political bias across a variety of sensitive contexts. We find that LLMs handle counterpart topics from opposing political sides asymmetrically. We refer to this phenomenon as covert political bias and identify 7 categories of techniques through which it operates. We propose two metrics for covert bias: Sentiment Consistency measures symmetry in rhetoric and framing across paired political prompts; Helpfulness Consistency measures symmetric depth and engagement. To reduce both types of covert bias, we introduce Political Consistency Training (PCT), an RL training method with two complementary paradigms: Sentiment Consistency Training and Helpfulness Consistency Training. We show that PCT preserves overall helpfulness, substantially reduces covert political bias, and generalizes to held-out benchmarks. We release our work at https://political-manipulation.ai
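
The abstract does not give formulas for the two metrics, so the following is only an illustrative sketch of what a paired-prompt consistency score could look like, not the paper's definition. It assumes each response has already been assigned a numeric score (for example, a sentiment score in [-1, 1] or a helpfulness score in [0, 1]) by some external judge; the function names `pair_consistency` and `consistency`, the `scale` parameter, and the absolute-difference formula are all hypothetical.

```python
from statistics import mean
from typing import Iterable, Tuple


def pair_consistency(score_a: float, score_b: float, scale: float = 2.0) -> float:
    """Hypothetical symmetry score for one pair of counterpart prompts.

    score_a and score_b are judge scores for the responses to the two
    politically opposed prompts. `scale` is the width of the score range
    (2.0 for scores in [-1, 1], 1.0 for scores in [0, 1]).
    Returns 1.0 when the scores match and 0.0 when they are maximally apart.
    """
    return 1.0 - abs(score_a - score_b) / scale


def consistency(pairs: Iterable[Tuple[float, float]], scale: float = 2.0) -> float:
    """Average the per-pair symmetry score over a set of paired prompts."""
    return mean(pair_consistency(a, b, scale) for a, b in pairs)


if __name__ == "__main__":
    # Assumed sentiment scores in [-1, 1] for three prompt pairs.
    sentiment_pairs = [(0.5, 0.5), (0.2, -0.2), (1.0, -1.0)]
    print(consistency(sentiment_pairs))

    # Assumed helpfulness scores in [0, 1], so the range width is 1.0.
    helpfulness_pairs = [(0.9, 0.9), (0.8, 0.3)]
    print(consistency(helpfulness_pairs, scale=1.0))
```

Under this assumed formulation, the same function covers both metrics: only the judge that produces the scores and the `scale` of its range change. A score like this could in principle serve as an RL reward signal, though how PCT actually constructs its rewards is not stated in the abstract.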
