# How Does Linear Ensembling Weaken LLM Watermarks?

- Source: HuggingFace Daily Papers (trending community papers)
- Published: 2026-05-28 08:00
- AIHOT score: 55
- AIHOT link: https://aihot.virxact.com/items/cmpwqvwfq05cuslsned2b46dm
- Original paper: https://arxiv.org/abs/2605.30501

## AI Summary

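This study reveals a fundamental vulnerability: when users combine several large language models, existing text watermarks fail. Each provider's watermark perturbs its model's output distribution independently, so the perturbations cancel when the distributions are averaged. The authors propose WASH, which uses linear ensembling to average the output probability distributions of several models and thereby recover the unwatermarked distribution. Experiments with six watermarking schemes and three LLMs show that averaging just three models suppresses detection z-scores from 5–300 to below 2, well under the detection threshold of 4, and reduces the TPR at 5% FPR to below 50%, while also improving generation quality. The authors conclude that robust watermark-based detection would require unprecedented coordination among model providers.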
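
The cancellation mechanism can be sketched with a first-order argument. This is an illustrative derivation under stated assumptions, not the paper's proof. The notation (`p_i`, `q_i`, `Delta_i`, `G_j`, `gamma`, key `K_i`), the shared vocabulary, and the assumption that each provider's perturbation averages to zero over its secret key are introduced here for illustration and are modelled on a KGW-style green-list watermark.

```latex
% Illustrative assumptions (not the paper's notation): k providers share an aligned
% vocabulary V; p_i is provider i's unwatermarked next-token distribution and
% q_i = p_i + \Delta_i its watermarked one, with \sum_{v \in V} \Delta_i(v) = 0.
% For a token set S, write p(S) = \sum_{v \in S} p(v); likewise for q and \Delta.
\bar{q} = \frac{1}{k}\sum_{i=1}^{k} q_i
        = \bar{p} + \frac{1}{k}\sum_{i=1}^{k} \Delta_i,
\qquad \bar{p} = \frac{1}{k}\sum_{i=1}^{k} p_i .

% Provider j's detector checks its green list G_j (a gamma-fraction of V):
\bar{q}(G_j) - \bar{p}(G_j)
  = \frac{1}{k}\,\Delta_j(G_j) + \frac{1}{k}\sum_{i \neq j} \Delta_i(G_j).

% If the other keys K_i are independent of G_j and E_{K_i}[\Delta_i(v)] = 0 for all v,
% the cross terms vanish in expectation:
\mathbb{E}_{K_{-j}}\!\left[\bar{q}(G_j) - \bar{p}(G_j)\right] = \frac{1}{k}\,\Delta_j(G_j).

% KGW-style z-score over T tokens, |s|_{G_j} = number of green tokens; using
% \bar{p}(G_j) \approx \gamma:
z_j = \frac{|s|_{G_j} - \gamma T}{\sqrt{T\gamma(1-\gamma)}},
\qquad
\mathbb{E}[z_j] \approx \frac{\sqrt{T}\,\bigl(\bar{q}(G_j) - \gamma\bigr)}{\sqrt{\gamma(1-\gamma)}}
\approx \frac{\sqrt{T}\,\Delta_j(G_j)}{k\,\sqrt{\gamma(1-\gamma)}} .
```

In this simplified setting, provider j's per-token signal is diluted by a factor of k. The residual cross terms shrink as more independent models are averaged, which captures the intuition that averaging recovers an unwatermarked distribution. The paper's own result (recovery up to a second-order error term) is not reproduced here.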

## Main Text

Watermarking embeds statistical signatures in AI-generated text for detection and attribution. We reveal a fundamental vulnerability: when users access multiple models (today's reality), watermarks trivially fail. Watermarks perturb output distributions away from the original, and in competitive markets, these perturbations are typically independent across providers. We theoretically prove that averaging output probability distributions recovers the unwatermarked distribution up to a second-order error term. Empirically, simply averaging 3–5 models cancels out these perturbations. We introduce WASH (Watermark Attenuation via Statistical Hybridisation), which solves the practical challenges of ensemble generation: vocabulary misalignment and tokenisation differences across heterogeneous models. Experiments across six watermarking schemes and three LLMs show that averaging across 3 models suppresses detection z-scores from 5–300 to below 2 (below the detection threshold of 4) and reduces TPR at 5% FPR to below 50%, while improving quality by 27.5% and running six times faster than the best baseline on long-sequence generation. Our results suggest that robust AI-text detection via watermarking requires either accepting this fundamental vulnerability or unprecedented coordination among model providers.
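
The dilution effect can be illustrated with a toy simulation. It is not WASH. It assumes a shared, already-aligned vocabulary (sidestepping the misalignment problem WASH addresses), a fixed context-independent next-token distribution per provider, and a fixed KGW-style green list per provider. All function names and parameters (`simulate`, `delta`, `gamma`, the vocabulary size) are hypothetical.

```python
import numpy as np

# Toy assumptions (hypothetical, not from the paper): shared aligned vocabulary,
# context-free next-token distributions, one fixed green list per provider.


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax."""
    e = np.exp(logits - logits.max())
    return e / e.sum()


def random_green_list(vocab: int, gamma: float, rng: np.random.Generator) -> np.ndarray:
    """Boolean mask marking a random gamma-fraction of the vocabulary as green."""
    green = np.zeros(vocab, dtype=bool)
    green[rng.permutation(vocab)[: int(gamma * vocab)]] = True
    return green


def watermark(p: np.ndarray, green: np.ndarray, delta: float) -> np.ndarray:
    """KGW-style soft watermark: adding delta to green logits multiplies
    green probabilities by e^delta before renormalising."""
    boosted = p * np.exp(delta * green)
    return boosted / boosted.sum()


def z_score(tokens: np.ndarray, green: np.ndarray, gamma: float) -> float:
    """KGW detector: one-proportion z-test on the number of green tokens."""
    t = len(tokens)
    hits = green[tokens].sum()
    return float((hits - gamma * t) / np.sqrt(t * gamma * (1 - gamma)))


def simulate(k: int, vocab: int = 1000, steps: int = 2000,
             gamma: float = 0.5, delta: float = 2.0, seed: int = 0) -> dict:
    """Run provider 0's detector on its own output and on a k-model average."""
    rng = np.random.default_rng(seed)
    # Hypothetical providers: independent base distributions and independent keys.
    bases = [softmax(rng.normal(size=vocab)) for _ in range(k)]
    greens = [random_green_list(vocab, gamma, rng) for _ in range(k)]
    marked = [watermark(p, g, delta) for p, g in zip(bases, greens)]

    # Linear ensemble: average the output probability distributions.
    ensemble = np.mean(marked, axis=0)
    ensemble /= ensemble.sum()  # guard against floating-point drift
    unmarked_ensemble = np.mean(bases, axis=0)

    single_tokens = rng.choice(vocab, size=steps, p=marked[0])
    ensemble_tokens = rng.choice(vocab, size=steps, p=ensemble)
    return {
        "z_single": z_score(single_tokens, greens[0], gamma),
        "z_ensemble": z_score(ensemble_tokens, greens[0], gamma),
        # Total variation distance between watermarked and unwatermarked ensembles.
        "tv": float(0.5 * np.abs(ensemble - unmarked_ensemble).sum()),
    }


if __name__ == "__main__":
    for k in (1, 3, 5):
        r = simulate(k)
        print(f"k={k}: z_single={r['z_single']:.1f} "
              f"z_ensemble={r['z_ensemble']:.1f} tv={r['tv']:.3f}")
```

With independent keys, provider 0's green-token excess is roughly divided by k under the average, so its z-score drops. The distance to the unwatermarked ensemble also shrinks as k grows. The toy model shows the mechanism only and is not expected to reproduce the paper's reported numbers.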