# Attacks on Machine-Text Detectors Preserve Stylistic Fingerprints

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-08 08:00
- AIHOT score: 42
- AIHOT link: https://aihot.virxact.com/items/cmqha2n7f02fysle1axd1lnl1
- Original link: https://arxiv.org/abs/2505.14608

## AI Summary

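Existing attacks, such as prompt engineering and detector-guided optimization, can degrade the performance of standard detectors, but they cannot erase the underlying stylistic fingerprints of machine text; few-shot detectors that operate in the stylistic feature space withstand these attacks. However, a paraphrasing method that jointly optimizes for undetectability and adherence to a specific human style evades all detectors considered, including those based on writing style. This evasion is not absolute: as the number of documents analyzed grows, the human and machine text distributions become distinguishable again. Reliable detection therefore requires moving from single-document analysis to multi-document analysis.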

## Main Text

Despite considerable progress in the development of machine-text detectors, the ease with which machine text can be manipulated to evade detection has led to suggestions that the problem is inherently intractable. In this work, we investigate the limits of such evasion strategies. We demonstrate that while current attacks, ranging from prompt engineering to detector-guided optimization, can effectively degrade the performance of standard detectors, they fail to erase the underlying stylistic "fingerprints" of machine text. We show that few-shot detectors that utilize the stylistic feature space are robust to these evasion attempts, reliably detecting samples even from models explicitly tuned to prevent detection. This raises the question: does style represent a universal defense against attacks on machine-text detectors? We demonstrate that the answer is "no" by introducing a novel paraphrasing approach that simultaneously optimizes for undetectability and adherence to specific human styles. We show that, unlike prior methods, this attack effectively evades all considered detectors, including those that utilize writing style. However, we find that this evasion is not absolute: as the number of documents available for analysis grows, the human and machine distributions become distinguishable again. Overall, our findings suggest that reliable machine-text detection requires moving beyond single-document analysis to multi-document analysis.
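
The multi-document claim can be illustrated with a toy simulation. The sketch below is not the paper's detector or data: it assumes each document yields a single scalar score, that human and machine scores are Gaussian with unit variance, and that their means differ by a small hypothetical `shift` of 0.2. The helper names (`auroc`, `group_means`, `separability`) are likewise illustrative. Under these assumptions, averaging scores over more documents per author should make the two groups easier to separate, even though a single document is close to chance.

```python
import random
from statistics import fmean


def auroc(neg, pos):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def group_means(rng, mean, n_docs, n_groups):
    """Average per-document score for each of n_groups collections of n_docs documents."""
    return [
        fmean([rng.gauss(mean, 1.0) for _ in range(n_docs)])
        for _ in range(n_groups)
    ]


def separability(n_docs, shift=0.2, n_groups=300, seed=0):
    """AUROC between human and machine collections of n_docs documents each.

    Assumption: scores are Gaussian; `shift` is a made-up gap between the means.
    """
    rng = random.Random(seed)
    human = group_means(rng, 0.0, n_docs, n_groups)
    machine = group_means(rng, shift, n_docs, n_groups)
    return auroc(human, machine)


if __name__ == "__main__":
    # Print separability as the number of documents per collection grows.
    for n in (1, 5, 25, 100):
        print(n, round(separability(n), 3))
```

The averaging step is the simplest possible aggregation; it is used here only because the variance of a mean shrinks with the number of documents, which is one plausible reason a small per-document gap becomes visible at the collection level.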
