Ethan Mollick (@emollick)

2026-05-01 08:02

AI summary

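xAI has released Grok 4.3, which scores 53 on the Artificial Analysis Intelligence Index, outperforming models such as Grok 4.20 and Muse Spark. The core improvement is cost-effectiveness: input and output prices are about 40% and 60% lower, respectively, than the previous generation, and the cost of running the benchmark suite has also fallen. The release shows marked gains on real-world agentic tasks such as GDPval-AA, with strong instruction-following and customer-service performance. The post notes, however, that it still trails the latest Chinese open-weights models, and it criticizes the GDPval-AA test itself as having limited value.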

The new Grok comes in below the latest Chinese open weights models, Grok 4 was at the frontier when released.

(& Artificial Analysis: please stop using GDPval-AA which is not a useful test of anything except a model's ability to impress Gemini as a judge)

Artificial Analysis: xAI has launched Grok 4.3, achieving 53 on the Artificial Analysis Intelligence Index with improved agentic performance, ~40% lower input price, and ~60% lower ...

Tags: Expert opinion, Industry news, Evaluation/benchmarks
