Rohan Paul @rohanpaul_ai

2026-06-24 10:41 · 8 days ago

AI summary

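VibeThinker is a reasoning model with only 3B parameters, trained with SFT+GRPO, that is nearly on par with Opus 4.5 on reasoning benchmarks. It scores 94.3 on AIME26 and 80.2 Pass@1 on LiveCodeBench v6, and reaches a 96.1% acceptance rate on recent unseen LeetCode contests, matching or exceeding flagship systems that are orders of magnitude larger, such as DeepSeek V3.2. The model is based on Qwen2.5-Coder 3B and is built with hard-example filtering, multi-solution supervised training, reinforcement learning with verifiable rewards on math/code/STEM, self-distillation, instruction-focused RL, and a test-time answer-checking method called CLR.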

VibeThinker is a 3B-parameter model whose reasoning benchmark results are almost head-to-head with Opus 4.5, trained with a novel SFT+GRPO approach.

Unusually strong for its size: with only 3B parameters, it scores 94.3 on AIME26, 80.2 Pass@1 on LiveCodeBench v6, and 96.1% acceptance on recent unseen LeetCode contests.

"places it in the performance band of first-tier reasoning systems, matching or exceeding flagship models that are orders of magnitude larger, such as DeepSeek V3.2"

They start from a 3B Qwen2.5-Coder base model, then train it with carefully filtered hard examples, multi-solution supervised training, reinforcement learning on math/code/STEM tasks with verifiable rewards, self-distillation, instruction-focused RL, and a test-time answer-checking method called CLR.
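
For readers unfamiliar with the "GRPO with verifiable rewards" step, here is a minimal Python sketch of the general idea, not VibeThinker's actual code. It assumes a simple exact-match checker and the standard group-relative advantage (reward minus the group mean, divided by the group standard deviation). The function names `verifiable_reward` and `group_relative_advantages`, the sample answers, and the `eps` value are all hypothetical.

```python
import statistics


def verifiable_reward(predicted: str, reference: str) -> float:
    """Hypothetical checker: 1.0 if the final answer matches exactly, else 0.0."""
    return 1.0 if predicted.strip() == reference.strip() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other samples drawn for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled answers to one prompt whose reference answer is "42" (made-up data).
samples = ["42", "41", " 42 ", "7"]
rewards = [verifiable_reward(s, "42") for s in samples]
advantages = group_relative_advantages(rewards)
```

Correct samples get a positive advantage and incorrect ones a negative advantage. If every sample in a group gets the same reward, all advantages are zero and that prompt contributes no learning signal, which may be one reason a pipeline like this filters for hard examples.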

Reasoning · Data/Training · Model release
