We replicated Anthropic's Mythos findings using public models

2026-04-18 15:59 · 75 days ago · __natty__

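AI summary

A security research team used publicly available large language models to replicate representative vulnerability findings from Anthropic's Mythos release. The replication was mixed: the FreeBSD, Botan, and OpenBSD cases were reproduced with at least one public model, while FFmpeg and wolfSSL reached only partial results. The authors conclude that the capabilities Anthropic describes are already available in public models and that defenders should prepare accordingly.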

TL;DR

Anthropic presents Mythos and Project Glasswing as evidence that advanced AI vulnerability research should be restricted. But our replication suggests a different conclusion: the capabilities Anthropic points to are already available in public models, so defenders should prepare for that reality instead.

Anthropic's Mythos release is useful because it makes something concrete: frontier models are getting much better at finding serious vulnerabilities in real software.[1]

The more important question for defenders is what that means outside Anthropic's own stack.

If public models can reproduce or at least get meaningful traction on representative Mythos findings across categories like FreeBSD, OpenBSD, FFmpeg, Botan, and wolfSSL, then the shift Anthropic is pointing at is already spreading beyond a single lab's private workflow.

That is what we tested. We used GPT-5.4 and Claude Opus 4.6 in opencode, together with a standardized chunked security-review workflow, and tried to reproduce Anthropic's patched public examples outside Anthropic's internal stack.[2]
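To make "chunked" concrete, here is a minimal sketch of one way a chunked review pass could be organized. The post does not describe the workflow's internals at this point, so everything below is an assumption for illustration: the `Chunk` type, the `chunk_source` and `build_review_prompt` helpers, the 200-line window, the 20-line overlap, and the prompt wording are all hypothetical and are not taken from the authors' setup or from opencode.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A contiguous slice of one source file (hypothetical helper type)."""
    path: str
    start_line: int  # 1-indexed, inclusive
    end_line: int    # 1-indexed, inclusive
    text: str


def chunk_source(path: str, source: str, max_lines: int = 200,
                 overlap: int = 20) -> list[Chunk]:
    """Split a file into overlapping line windows.

    The overlap is an assumption: it keeps code that straddles a
    boundary visible in at least one chunk.
    """
    if max_lines <= 0 or not 0 <= overlap < max_lines:
        raise ValueError("need max_lines > 0 and 0 <= overlap < max_lines")
    lines = source.splitlines()
    chunks: list[Chunk] = []
    step = max_lines - overlap
    start = 0
    while start < len(lines):
        end = min(start + max_lines, len(lines))
        chunks.append(Chunk(path, start + 1, end, "\n".join(lines[start:end])))
        if end == len(lines):
            break  # last window reached the end of the file
        start += step
    return chunks


def build_review_prompt(chunk: Chunk) -> str:
    """Wrap one chunk in an illustrative review prompt (wording is made up)."""
    return (
        f"Review lines {chunk.start_line}-{chunk.end_line} of {chunk.path} "
        "for memory-safety and logic vulnerabilities. "
        "Report only issues you can justify from the code shown.\n\n"
        f"{chunk.text}"
    )
```

Each prompt would then be sent to the model under test, with the same windowing and wording for every model so that run-to-run results stay comparable.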

Our result is more mixed, and more useful because of it. We cleanly reproduced the FreeBSD, Botan, and OpenBSD cases with at least one widely available model, while both GPT-5.4 and Claude Opus 4.6 reached only partial results on FFmpeg and wolfSSL rather than full replications. Where per-model results are available so far, both GPT-5.4 and Claude Opus 4.6 reproduced Botan and FreeBSD in 3/3 runs; only Claude Opus 4.6 reproduced OpenBSD, succeeding in 3/3 runs where GPT-5.4 went 0/3.

Read the original: blog.vidocsecurity.com
