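LLMs can deanonymize people at scale by analyzing their public writing. The study has a model perform three tasks: extracting identity clues, searching a pool of possible matches, and comparing and verifying candidates. In tests covering Hacker News to LinkedIn, Reddit across communities, and Reddit across time periods, it reaches 90% precision with 68% recall, far ahead of older methods. The key advance is that the reasoning step can handle very large candidate pools, showing that scattered public text is already enough to link accounts and identify individuals, so traditional anonymity protections no longer hold.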
Anonymous usernames are no longer much protection when LLMs can piece together a person's public trail.
LLMs can identify supposedly anonymous people online by turning messy posts into personal clues.
The best setup recovers 68% of true matches (recall) at 90% precision, meaning 9 out of 10 of the matches it reports are right, while older methods recover close to 0%.
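To make the two percentages concrete, here is a small worked example. The counts (900 true matches, 680 guesses, 612 correct) are invented for illustration and are not taken from the paper; they are chosen only so the arithmetic lands on 90% precision and 68% recall.

```python
def precision_recall(correct_guesses, total_guesses, total_true_matches):
    """Return (precision, recall) for a matching system.

    precision: share of the system's guesses that are correct.
    recall: share of all true matches that the system found.
    """
    precision = correct_guesses / total_guesses
    recall = correct_guesses / total_true_matches
    return precision, recall


# Hypothetical counts, not from the paper:
# 900 accounts have a true match, the system commits to 680 guesses,
# and 612 of those guesses are correct.
precision, recall = precision_recall(
    correct_guesses=612, total_guesses=680, total_true_matches=900
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up scenario the system stays silent on some accounts rather than guess, which is how it can be right 9 times out of 10 while still missing about a third of the true matches.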
The problem is that pseudonyms often seemed safe only because linking a person across sites used to take lots of careful manual work.
This paper cuts that work by making an LLM do 3 jobs: pull identity hints from raw text, search a huge pool of possible matches, and compare the best candidates to reject weak fits.
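The three jobs can be sketched as a toy pipeline. This is only an illustration of the control flow, not the authors' implementation: each function below uses simple word overlap as a stand-in for what an LLM does in the paper, and every name (`Profile`, `extract_hints`, `search_pool`, `verify`), the thresholds, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    name: str
    text: str


def extract_hints(raw_text):
    # Job 1 stand-in: an LLM would pull out identity hints here.
    # This toy version just keeps lowercase words longer than 3 characters.
    words = (w.strip(".,!?").lower() for w in raw_text.split())
    return {w for w in words if len(w) > 3}


def search_pool(hints, pool, top_k=2):
    # Job 2 stand-in: rank every profile in the pool by shared hints
    # and keep the top_k candidates as (score, profile) pairs.
    scored = [(len(hints & extract_hints(p.text)), p) for p in pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def verify(candidates, min_overlap=2):
    # Job 3 stand-in: accept the best candidate only if it clears a
    # threshold and beats the runner-up; otherwise abstain (return None).
    if not candidates:
        return None
    best_score, best = candidates[0]
    runner_up = candidates[1][0] if len(candidates) > 1 else 0
    if best_score >= min_overlap and best_score > runner_up:
        return best
    return None


# Invented sample data for illustration only.
posts = "I maintain a Rust compiler plugin and live in Lisbon."
pool = [
    Profile("A", "Engineer in Lisbon working on Rust compiler tooling."),
    Profile("B", "Marketing manager in Toronto who loves hiking."),
    Profile("C", "Teacher in Lisbon."),
]

match = verify(search_pool(extract_hints(posts), pool))
print(match.name if match else "no confident match")
```

The abstain branch in `verify` is the part that mirrors "reject weak fits": declining to guess on ambiguous cases trades some recall for higher precision.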
The authors tested this on 3 cases: matching Hacker News users to LinkedIn profiles, matching Reddit movie users across communities, and matching the same Reddit users across different time periods.