Rohan Paul @rohanpaul_ai

2026-06-12 09:12 · 21 days ago

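AI Summary

atomic[.]chat compared DiffusionGemma 26B A4B with Gemma4 26B A4B on factual writing tasks, running both on a single H100 (FP8). DiffusionGemma reached 763 tok/s (3.7 seconds), about 4x the speed of Gemma4 (218 tok/s, 15.1 seconds), but its error rate was markedly higher. Across three tasks (a Steve Jobs biography, the history of Tetris, and the story of BeOS), Gemma4 got 45 facts right and 5 wrong; DiffusionGemma got only 33 right and 28 wrong. The more obscure the topic, the more errors: 4 for Jobs, 12 for Tetris, and 12 for BeOS. Examples include writing Jobs's mother's name as Clara Clley, inventing a colleague named Geri Gulovik for the creator of Tetris, and misreporting the BeBox price as $9,999 (actual price $1,600). The explanation offered is that DiffusionGemma generates 256 tokens at a time and polishes them over several passes, optimizing only for fluent text rather than factual accuracy. Google also officially recommends the regular Gemma4 when facts matter.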
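As a quick sanity check on the headline ratios, the figures in the summary can be plugged into a few lines of Python. This is only arithmetic on the reported numbers; the variable and function names are illustrative, and nothing here reproduces the benchmark itself.

```python
# Figures as reported in the summary (single H100, FP8).
reported = {
    "Gemma4": {"tok_per_s": 218, "seconds": 15.1, "right": 45, "wrong": 5},
    "DiffusionGemma": {"tok_per_s": 763, "seconds": 3.7, "right": 33, "wrong": 28},
}
# Wrong facts per topic for DiffusionGemma, as reported.
diffusion_wrong_by_topic = {"Jobs": 4, "Tetris": 12, "BeOS": 12}


def error_rate(right: int, wrong: int) -> float:
    """Share of checked facts that were wrong."""
    return wrong / (right + wrong)


gemma = reported["Gemma4"]
diffusion = reported["DiffusionGemma"]

throughput_ratio = diffusion["tok_per_s"] / gemma["tok_per_s"]  # tokens per second
wall_clock_ratio = gemma["seconds"] / diffusion["seconds"]      # time to finish
mistake_ratio = diffusion["wrong"] / gemma["wrong"]             # raw wrong-fact count

print(f"throughput ratio: {throughput_ratio:.2f}x")
print(f"wall-clock ratio: {wall_clock_ratio:.2f}x")
print(f"mistake ratio:    {mistake_ratio:.1f}x")
print(f"Gemma4 error rate:         {error_rate(gemma['right'], gemma['wrong']):.1%}")
print(f"DiffusionGemma error rate: {error_rate(diffusion['right'], diffusion['wrong']):.1%}")
```

On these numbers the throughput ratio works out to 3.5x and the wall-clock ratio to roughly 4.1x, so the "4x" figure appears to track elapsed time rather than tok/s. The "6x more mistakes" figure matches the raw count of wrong facts (28 vs. 5, or 5.6x), while the share of checked facts that were wrong rises from 10% to about 46%.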

atomic[.]chat shared a revealing comparison of local open-weight LLMs running on their own hardware.

They benchmarked the new DiffusionGemma (diffusion text model) vs. Gemma4 26B A4B (autoregressive model) on a single H100 (FP8).

The 4X speed of DiffusionGemma changes the shape of error.

Autoregressive models move left to right, one token at a time, which is slower, but each new word is conditioned on the exact text already written.

Diffusion models write many tokens at once, then revise the block over several passes, so they can feel fast because the model is not waiting to finish token 1 before starting token 2.
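The difference in decoding shape can be sketched by counting sequential model calls. This is a toy model, not DiffusionGemma's actual sampler: the 256-token block size comes from the summary, while `refine_passes=8` and both function names are hypothetical values chosen only for illustration.

```python
import math


def autoregressive_passes(n_tokens: int) -> int:
    """One sequential forward pass per generated token."""
    return n_tokens


def block_diffusion_passes(n_tokens: int, block_size: int = 256,
                           refine_passes: int = 8) -> int:
    """Each block is drafted and revised `refine_passes` times as a whole.

    `refine_passes` is an assumed value; the post does not state it.
    """
    n_blocks = math.ceil(n_tokens / block_size)
    return n_blocks * refine_passes


n = 1024
print("autoregressive passes: ", autoregressive_passes(n))
print("block-diffusion passes:", block_diffusion_passes(n))
```

Pass counts are not wall-clock time, since each diffusion pass processes a whole block and costs more than a single-token step, which is one reason the measured speedup is far smaller than the gap in pass counts. The sketch also shows where the risk sits: within a block, tokens are revised jointly rather than each being conditioned on a fixed, already-written prefix.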

atomic[.]chat, a desktop app for running LLMs locally

atomic.chat: Diffusion Gemma is 4x faster, but makes 6x more mistakes! We benchmarked the new diffusion LLM against its autoregressive twin on a single H100 (FP8). We gave e...

