# Tencent Releases Open-Weights Reasoning Model Hy3-preview, Scoring 42 Overall and Trailing Recent Peers

- Source: Artificial Analysis (@ArtificialAnlys)
- Published: 2026-04-30 22:04
- AIHOT score: 56
- AIHOT link: https://aihot.virxact.com/items/cmolkbk3t00c4slqtt4d86s9r
- Original post: https://x.com/ArtificialAnlys/status/2049852417316143393

## AI Summary

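Tencent has released Hy3-preview, an open-weights Mixture-of-Experts model with 295B total parameters and 21B active parameters. It scores 42 on the Artificial Analysis Intelligence Index, behind recently released open-weights reasoning models such as GLM-5.1, DeepSeek V4 Flash and Qwen3.6 27B. Its results across individual evaluations are uneven: it trails its main competitors on GDPval-AA, a real-world task benchmark, but ties the higher-scoring GLM-5.1 on CritPt, a research-level physics evaluation. Its relative weakness is the AA-Omniscience Index, where it shows a high hallucination rate. The model is released under the Tencent HY Community License Agreement, which restricts commercial use, and is available on Hugging Face and SiliconFlowAI.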

## Full Text

Tencent has released Hy3-preview, an open-weights reasoning model that scores 42 on the Artificial Analysis Intelligence Index, trailing recent open-weights peers.

Hy3-preview is the latest model from @TencentHunyuan. It is a Mixture-of-Experts (MoE) model with 295B total and 21B active parameters, smaller than its December 2025 predecessor, Tencent HY 2.0 (406B total / 32B active). Recent leading open-weights reasoning models include Qwen3.6 27B (Reasoning, Intelligence Index 46), DeepSeek V4 Flash (Reasoning, Max Effort, Intelligence Index 47, 284B total / 13B active) and GLM-5.1 (Reasoning, Intelligence Index 51, 744B total / 40B active). The Intelligence Index is Artificial Analysis's composite metric, combining 10 evaluations that cover agentic tasks, coding and scientific reasoning.
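The total/active split is easiest to compare as a ratio, since in an MoE model only the active parameters are used for each token. The short Python sketch below simply divides the parameter counts quoted in this post; it is illustrative arithmetic rather than an Artificial Analysis metric, and `active_fraction` is a hypothetical helper name.

```python
# Parameter counts in billions, (total, active), as quoted in the post.
# Qwen3.6 27B is omitted because the post gives no total/active split for it.
MODELS = {
    "Hy3-preview": (295, 21),
    "Tencent HY 2.0": (406, 32),
    "DeepSeek V4 Flash": (284, 13),
    "GLM-5.1": (744, 40),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of an MoE model's parameters that are active per token (active / total)."""
    return active_b / total_b


for name, (total, active) in MODELS.items():
    share = active_fraction(total, active)
    print(f"{name:>18}: {active}B of {total}B active = {share:.1%}")
```

On these figures Hy3-preview activates about 7.1% of its weights per token, close to Tencent HY 2.0 (about 7.9%) and a larger share than DeepSeek V4 Flash (about 4.6%) or GLM-5.1 (about 5.4%).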

Key takeaways:
➤ Hy3-preview trails recent open-weights peers on GDPval-AA. It scores an Elo of 1235 on GDPval-AA, our benchmark of agentic real-world work tasks, behind Qwen3.6 27B (Reasoning, 1414), DeepSeek V4 Flash (Reasoning, Max Effort, 1388) and GLM-5.1 (Reasoning, 1535). GDPval-AA tests models on real-world tasks spanning 44 occupations across 9 major industries.
➤ Hy3-preview ties GLM-5.1 (Reasoning) on CritPt despite scoring 9 points lower on the Intelligence Index. Hy3-preview scores 4.6% on CritPt (research-level physics), matching GLM-5.1 (Reasoning, 51 on the Intelligence Index), ahead of Qwen3.6 27B (Reasoning, 1.1%) but behind DeepSeek V4 Flash (Reasoning, Max Effort, 7.1%). It also trails the open-weights leaders on CritPt, including DeepSeek V4 Pro (Reasoning, Max Effort, 12.9%) and Kimi K2.6 (8.0%).
➤ Hy3-preview used ~125M output tokens to run the Intelligence Index, ~12% more than GLM-5.1 (Reasoning, 112M) but fewer than Qwen3.6 27B (Reasoning, 144M) and DeepSeek V4 Flash (Reasoning, Max Effort, 241M).
➤ AA-Omniscience is a relative weakness compared to peers. Hy3-preview scores -35 on the Artificial Analysis Omniscience Index, with 28% accuracy and an 87% hallucination rate. This trails DeepSeek V4 Flash (Reasoning, Max Effort, -23), Qwen3.6 27B (Reasoning, -20) and GLM-5.1 (Reasoning, 2).
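The GDPval-AA takeaway reports Elo ratings. If those ratings follow the conventional 400-point logistic Elo scale (an assumption: the post does not describe how GDPval-AA fits its ratings), a rating gap can be converted into an expected head-to-head score. The sketch below does this for the gaps quoted in the post; `expected_score` is a hypothetical helper name.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard logistic Elo model.

    ASSUMPTION: GDPval-AA ratings use the conventional 400-point scale;
    the post does not say how the ratings are fitted.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


HY3 = 1235  # Hy3-preview's GDPval-AA Elo from the post
PEERS = {  # peer Elo ratings from the post
    "Qwen3.6 27B": 1414,
    "DeepSeek V4 Flash": 1388,
    "GLM-5.1": 1535,
}

for name, rating in PEERS.items():
    p = expected_score(HY3, rating)
    print(f"Hy3-preview vs {name:<17} gap {rating - HY3:>3}  expected score {p:.2f}")
```

Under that assumption, the 300-point gap to GLM-5.1 corresponds to an expected score of roughly 0.15 for Hy3-preview, and the gaps to Qwen3.6 27B and DeepSeek V4 Flash to roughly 0.26 and 0.29.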
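The ~12% figure in the token-usage takeaway is a relative difference measured against GLM-5.1's usage. A minimal Python check using the rounded token counts from the post (so results are approximate); `relative_change` is a hypothetical helper name.

```python
# Output tokens (millions) used to run the Intelligence Index, as reported (rounded) in the post.
TOKENS_M = {
    "Hy3-preview": 125,
    "GLM-5.1": 112,
    "Qwen3.6 27B": 144,
    "DeepSeek V4 Flash": 241,
}


def relative_change(value: float, baseline: float) -> float:
    """Fractional difference of `value` relative to `baseline` (positive means more)."""
    return (value - baseline) / baseline


hy3 = TOKENS_M["Hy3-preview"]
for name, tokens in TOKENS_M.items():
    if name != "Hy3-preview":
        # Hy3-preview's usage compared with each peer's usage as the baseline.
        print(f"Hy3-preview vs {name}: {relative_change(hy3, tokens):+.1%}")
```

This gives about +11.6% versus GLM-5.1, consistent with the ~12% quoted, and roughly -13% and -48% versus Qwen3.6 27B and DeepSeek V4 Flash.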
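The three AA-Omniscience numbers in the last takeaway are consistent with each other under one plausible reading, stated here as an assumption rather than as Artificial Analysis's published formula: the index is the percentage of correct answers minus the percentage of incorrect answers (abstentions score zero), and the hallucination rate is the share of non-correct responses that are wrong answers rather than abstentions. The sketch below reconstructs the index from accuracy and hallucination rate under that assumption; `implied_omniscience_index` is a hypothetical helper name.

```python
def implied_omniscience_index(accuracy: float, hallucination_rate: float) -> float:
    """Reconstruct an Omniscience-style score from accuracy and hallucination rate.

    ASSUMPTIONS (not stated in the post):
      * score = 100 * (share correct - share incorrect); abstentions score 0
      * hallucination_rate = incorrect / (all non-correct responses)
    """
    non_correct = 1.0 - accuracy                  # incorrect answers plus abstentions
    incorrect = hallucination_rate * non_correct  # wrong answers rather than abstentions
    return 100.0 * (accuracy - incorrect)


score = implied_omniscience_index(accuracy=0.28, hallucination_rate=0.87)
print(f"implied index: {score:.1f}")  # prints: implied index: -34.6
```

With 28% accuracy and an 87% hallucination rate this gives about -34.6, which rounds to the reported -35; under the same assumption roughly 9% of questions would have been abstentions.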

Other information:
➤ Size: 295B total parameters, 21B active parameters
➤ Context window: 256K tokens
➤ License: Tencent HY Community License Agreement, with restricted commercial use
➤ Availability: Weights are available on Hugging Face (@huggingface), and the model is also served by @SiliconFlowAI at $0/$0 per 1M input/output tokens
