# LLMs Can Leak Training Data, But Do They Want To? A Propensity-Aware Framework for Memorization Evaluation

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-04 08:00
- AIHOT score: 67
- AIHOT link: https://aihot.virxact.com/items/cmq0mfh7j07kysltrg4y6rsfs
- Original link: https://arxiv.org/abs/2606.06286

## AI Summary

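The paper proposes PropMe, a framework that contrasts prefix attacks with non-adversarial evaluation to measure how prone large language models are to leaking training data under ordinary use. The accompanying SimpleTrace pipeline, built on infini-gram, deterministically traces generated content back to its source and computes verbatim, near-verbatim, and propensity-transformed memorization metrics. Evaluation on two fully open models (Comma and DFM Decoder) and two datasets (Common Pile and Dynaword) shows that prefix attacks substantially increase memorization extraction, while propensity scores under non-adversarial prompts stay consistently low: the models can leak data but usually do not do so spontaneously. DFM Decoder, which is continually pre-trained from Comma, shows lower memorization capability and lower propensity on Common Pile, indicating that later training focused on different data can reduce memorization. The authors recommend that memorization audits report both worst-case extractability and everyday leakage propensity.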

## Main Text

Large language models can reproduce training data, but existing memorization evaluations mostly measure whether models can be forced to do so, rather than whether they do so under ordinary use. We introduce PropMe, a propensity-aware framework for memorization evaluation that contrasts prefix-based capability attacks with non-adversarial evaluations. We propose a metric transformation that, applied to existing memorization metric functions, turns them into propensity metrics. We further introduce SimpleTrace, a lightweight tracing pipeline built on infini-gram that deterministically attributes model generations to large-scale training corpora and computes verbatim, near-verbatim, and propensity-transformed memorization metrics. Evaluating two fully open models (Comma and DFM Decoder) on two datasets (Common Pile and Dynaword) in two languages, we find a consistent gap between capability and propensity: prefix attacks elicit substantially stronger memorization signals than generic or dataset-specific prompts, while propensity scores remain low overall. Thus, the models can reveal training data when directly elicited, but rarely do so in more common non-adversarial settings. We also find that DFM Decoder, which is continually pre-trained from Comma, exhibits reduced memorization and memorization propensity for Common Pile, confirming that memorization capability can decrease when later training emphasizes partially different data. Based on these results, we encourage memorization audits to report both worst-case extractability and ordinary leakage propensity, which together give a more comprehensive view of this phenomenon.
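The contrast between capability and propensity can be illustrated with a toy sketch. This is not PropMe or SimpleTrace: the abstract does not spell out the metric transformation, and the real pipeline traces generations with infini-gram over large corpora. Everything here is an assumption made for illustration, including the names `longest_overlap`, `memorization_rate`, and `toy_model`, the whitespace tokenization, the five-token threshold, and the one-document corpus. It applies the same verbatim-overlap rate once to prefix prompts and once to ordinary prompts.

```python
from difflib import SequenceMatcher

def longest_overlap(generation, corpus):
    """Longest contiguous run of whitespace tokens shared with any corpus document."""
    gen = generation.split()
    best = 0
    for doc in corpus:
        ref = doc.split()
        matcher = SequenceMatcher(None, gen, ref, autojunk=False)
        match = matcher.find_longest_match(0, len(gen), 0, len(ref))
        best = max(best, match.size)
    return best

def memorization_rate(generations, corpus, min_tokens=5):
    """Fraction of generations with a verbatim overlap of at least min_tokens tokens."""
    if not generations:
        return 0.0
    hits = sum(longest_overlap(g, corpus) >= min_tokens for g in generations)
    return hits / len(generations)

# Hypothetical one-document "training corpus".
corpus = ["the quick brown fox jumps over the lazy dog near the river bank"]

def toy_model(prompt):
    """Hypothetical stand-in for an LLM: completes known prefixes, otherwise improvises."""
    for doc in corpus:
        if doc.startswith(prompt):
            return doc[len(prompt):].strip()
    return "here is a short original answer"

# Capability setting: prompts are prefixes of training documents.
prefix_prompts = ["the quick brown fox", "the quick brown"]
# Propensity setting: ordinary, non-adversarial prompts.
ordinary_prompts = ["tell me about foxes", "write a sentence about a river"]

capability = memorization_rate([toy_model(p) for p in prefix_prompts], corpus)
propensity = memorization_rate([toy_model(p) for p in ordinary_prompts], corpus)
print(f"capability={capability:.2f} propensity={propensity:.2f}")
```

In this toy setup the same metric gives a high score under prefix prompting and a low score under ordinary prompting, which is the kind of gap the abstract describes. Reporting both numbers side by side mirrors the recommendation that audits cover worst-case extractability and ordinary leakage propensity.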
