Base Models Look More "Human" to AI Detectors

2026-05-19 08:00 · 45 days ago

AI Summary

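This study reports an interesting finding: when evaluated by commercial AI-text detectors such as GPTZero, text generated by base models is often judged human-written, whereas text from instruction-tuned models is not. Building on this, the authors propose Humanization by Iterative Paraphrasing (HIP), which fine-tunes a base model into a paraphraser and applies it iteratively to balance semantic preservation against detector evasion. Experiments show that the method improves how human-like text appears to detectors across Llama-3 and Qwen-3 models of different sizes. The study suggests that current detectors may be tracking artifacts of instruction tuning more than any intrinsic property of machine-generated text, which points to new directions for detector design.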

Original abstract · untranslated

As AI-generated text enters the real world at scale, institutions increasingly use commercial AI-text detectors, especially in education and academic-integrity workflows. We report a surprising empirical finding about such systems: when evaluated by GPTZero and Pangram, generated text from base models is often judged overwhelmingly human, whereas text generated by their instruction-tuned counterparts is not. Building on this observation, we propose Humanization by Iterative Paraphrasing (HIP), a detector-agnostic pipeline that minimally fine-tunes a base model into a paraphraser and applies it iteratively. Compared with the baselines we test, HIP yields a stronger trade-off between semantic preservation and detector evasion on commercial detectors. Across Llama-3 and Qwen-3 families, spanning model sizes from 0.6B to 70B, HIP consistently improves detector human-likeness. Our findings suggest that current detectors are tracking artifacts of instruction tuning and local context more than any invariant notion of machine-generated text. This, in turn, calls for detector designs that model these factors more explicitly.
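The abstract describes HIP only at a high level: a paraphraser is applied repeatedly, with semantic preservation traded off against detector evasion. The sketch below illustrates that control flow and is not the authors' implementation. The function name `iterative_paraphrase`, the fixed round count, and the similarity floor are all assumptions made for illustration, and the toy paraphraser and Jaccard similarity are stand-ins for a fine-tuned base model and a real semantic-similarity measure. In keeping with the detector-agnostic description, no detector is queried inside the loop.

```python
from typing import Callable, List


def iterative_paraphrase(
    text: str,
    paraphrase: Callable[[str], str],
    similarity: Callable[[str, str], float],
    rounds: int = 3,
    min_similarity: float = 0.5,
) -> List[str]:
    """Apply `paraphrase` up to `rounds` times.

    Returns [text, round-1 output, round-2 output, ...]. The loop stops
    early if a candidate drifts too far from the ORIGINAL text. Both the
    round count and the similarity floor are assumed, not from the paper.
    """
    outputs = [text]
    for _ in range(rounds):
        candidate = paraphrase(outputs[-1])
        # Guard semantic preservation against the original input.
        if similarity(text, candidate) < min_similarity:
            break
        outputs.append(candidate)
    return outputs


# --- Toy stand-ins (hypothetical; a real pipeline would call a model) ---
SWAPS = [("utilize", "use"), ("furthermore", "also"), ("demonstrate", "show")]


def toy_paraphrase(s: str) -> str:
    """Rewrite one stiff word per call, in the order listed in SWAPS."""
    for old, new in SWAPS:
        if old in s:
            return s.replace(old, new, 1)
    return s


def jaccard(a: str, b: str) -> float:
    """Word-set overlap as a crude proxy for semantic similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0


draft = "furthermore we utilize probes to demonstrate the effect"
versions = iterative_paraphrase(draft, toy_paraphrase, jaccard,
                                rounds=3, min_similarity=0.4)
strict = iterative_paraphrase(draft, toy_paraphrase, jaccard,
                              rounds=3, min_similarity=0.7)
```

With the looser floor all three rounds are kept, while the stricter floor stops after the first round. A detector score would then be computed on each stored version after the fact, which is how the trade-off between evasion and preservation could be examined per round.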

HuggingFace Daily Papers (community trending papers)

Read the original · arxiv.org
