# VibeThinker: 3B-Parameter Reasoning Model with Performance Close to Opus 4.5

- Source: Rohan Paul (@rohanpaul_ai)
- Published: 2026-06-24 10:41
- AIHOT score: 52
- AIHOT link: https://aihot.virxact.com/items/cmqrh3ric0k1dslp5h91mgvcr
- Original link: https://x.com/rohanpaul_ai/status/2069611815647257024

## AI Summary

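VibeThinker is a reasoning model with only 3B parameters, trained with SFT+GRPO, that nearly matches Opus 4.5 on reasoning benchmarks. It scores 94.3 on AIME26 and 80.2 Pass@1 on LiveCodeBench v6, and reaches a 96.1% acceptance rate on recent unseen LeetCode contests, matching or exceeding flagship systems that are orders of magnitude larger, such as DeepSeek V3.2. The model is built on Qwen2.5-Coder 3B and trained with hard-example filtering, multi-solution supervised training, reinforcement learning with verifiable rewards on math/code/STEM tasks, self-distillation, and instruction-focused RL, plus a test-time answer-checking method called CLR.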

## Body

VibeThinker is a 3B-parameter model whose reasoning benchmark results are almost head-to-head with Opus 4.5, trained with a novel SFT+GRPO recipe.

Unusually strong for its size: with only 3B parameters, it scores 94.3 on AIME26, 80.2 Pass@1 on LiveCodeBench v6, and 96.1% acceptance on recent unseen LeetCode contests.

"places it in the performance band of first-tier reasoning systems, matching or exceeding flagship models that are orders of magnitude larger, such as DeepSeek V3.2"
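
For readers unfamiliar with the Pass@1 metric quoted above, here is a minimal sketch of the commonly used unbiased pass@k estimator. The post does not say how many samples per problem were drawn or how the score was aggregated, so the function name `pass_at_k` and the sample counts below are illustrative assumptions, not the authors' evaluation code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of sampled solutions (hypothetical value, not from the post)
    c: number of those samples that passed all tests
    k: sampling budget being scored
    """
    if n - c < k:
        # Fewer than k failing samples: any draw of k contains a passing one.
        return 1.0
    # 1 minus the probability that all k drawn samples fail.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative numbers only: 10 samples for one problem, 8 of them correct.
score = pass_at_k(n=10, c=8, k=1)
```

With k = 1 the estimator reduces to c / n, the fraction of samples that pass. A benchmark-level number is then typically the mean of this value over all problems.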

They start from a 3B Qwen2.5-Coder base model, then train it with carefully filtered hard examples, multi-solution supervised training, reinforcement learning on math/code/STEM tasks with verifiable rewards, self-distillation, instruction-focused RL, and a test-time answer-checking method called CLR.
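
To make the "verifiable rewards" and GRPO steps above more concrete, here is a minimal sketch of how group-relative advantages are generally computed in GRPO: several answers are sampled for one prompt, each is scored by an automatic checker, and rewards are normalized within the group. The post does not describe VibeThinker's actual reward functions or hyperparameters, so the exact-match checker `verifiable_reward`, the helper `group_relative_advantages`, the use of population standard deviation, and the sample answers are all assumptions for illustration.

```python
from statistics import mean, pstdev


def verifiable_reward(answer: str, reference: str) -> float:
    """Hypothetical checker: 1.0 for an exact match after trimming, else 0.0."""
    return 1.0 if answer.strip() == reference.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of samples for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so samples
    that beat the group average get a positive learning signal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative group: four sampled answers to one math prompt whose answer is "42".
samples = ["42", "41", "42 ", "7"]
rewards = [verifiable_reward(s, "42") for s in samples]
advantages = group_relative_advantages(rewards)
```

Here the two correct samples receive advantages close to +1 and the two incorrect ones close to -1. If every sample in a group gets the same reward, all advantages are zero and the group contributes no gradient, which is one reason filtering for hard but solvable examples matters in this kind of pipeline.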
