Artificial Analysis @ArtificialAnlys

2026-05-01 06:59 · 63 days ago

AI Summary

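xAI has launched the Grok 4.3 model, which scores 53 on the Artificial Analysis Intelligence Index, surpassing models such as Muse Spark and improving 4 points over its predecessor. The model keeps its intelligence level while significantly lowering cost, with input and output prices reduced by about 40% and 60% respectively. It stands out on real-world agentic tasks: its GDPval-AA score rises sharply to 1500 Elo, surpassing several models including Gemini 3.1 Pro Preview, though it still trails GPT-5.5 (xhigh). It performs strongly on instruction following and customer support tasks, but its AA-Omniscience Non-Hallucination Rate declines.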

xAI has launched Grok 4.3, achieving 53 on the Artificial Analysis Intelligence Index with improved agentic performance, a ~40% lower input price, and a ~60% lower output price than Grok 4.20.

The release of Grok 4.3 places @xAI just above Muse Spark and Claude Sonnet 4.6 on the Intelligence Index, 4 points ahead of the latest version of Grok 4.20. Grok 4.3 raises its Artificial Analysis Intelligence Index score while lowering the cost to run the benchmark suite.

Key Takeaways:

➤ Grok 4.3 improves on cost-per-intelligence relative to Grok 4.20 0309 v2: it scores higher on the Intelligence Index while costing less to run the full benchmark suite. Grok 4.3 costs $395 to run the Artificial Analysis Intelligence Index, around 20% lower than Grok 4.20 0309 v2, despite using more output tokens. This makes it one of the lower-cost models at its intelligence level.

➤ Large increase in real-world agentic task performance: The largest single benchmark improvement is on GDPval-AA, where Grok 4.3 scores an Elo of 1500, up 321 points from Grok 4.20 0309 v2's score of 1179. This puts Grok 4.3 ahead of Gemini 3.1 Pro Preview, Muse Spark, GPT-5.4 mini (xhigh), and Kimi K2.5. Grok 4.3 narrows the gap to the leading model on GDPval-AA, but still trails GPT-5.5 (xhigh) by 276 Elo points, which corresponds to an expected win rate of ~17% against GPT-5.5 (xhigh) under the standard Elo formula.

➤ Grok 4.3 performs strongly on instruction following and agentic customer support tasks. It gains 5 points on τ2-Bench Telecom to reach 98%, in line with GLM-5.1, and maintains the 81% IFBench score of Grok 4.20 0309 v2.

➤ Grok 4.3 gains 8 points on AA-Omniscience Accuracy, but its AA-Omniscience Non-Hallucination Rate falls by 8 points. Grok 4.20 0309 v2 therefore still leads on AA-Omniscience Non-Hallucination Rate, followed by MiMo-V2.5-Pro, which is in line with Grok 4.3.
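The ~17% figure in the GDPval-AA takeaway follows from the standard Elo expected-score formula, E = 1 / (1 + 10^((R_b - R_a) / 400)). The minimal Python sketch below reproduces it, and also backs out the Grok 4.20 0309 v2 benchmark cost implied by "$395, around 20% lower". It assumes GPT-5.5 (xhigh) sits exactly 276 points above Grok 4.3 (1500 + 276 = 1776) and treats "around 20%" as exactly 20%, so the second number is only a rough estimate. The function names are illustrative and not part of any Artificial Analysis tooling.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B.

    With no draws this is A's expected win rate.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def implied_baseline_cost(new_cost: float, fractional_reduction: float) -> float:
    """Recover the earlier cost from a new cost and its fractional reduction."""
    return new_cost / (1.0 - fractional_reduction)


grok_43_elo = 1500
gap_to_leader = 276  # GPT-5.5 (xhigh) leads Grok 4.3 by 276 Elo points
leader_elo = grok_43_elo + gap_to_leader  # assumed: 1776

win_rate = elo_expected_score(grok_43_elo, leader_elo)
print(f"Expected win rate vs GPT-5.5 (xhigh): {win_rate:.1%}")

# Assumption: "around 20% lower" treated as exactly 20%, so this is approximate.
prev_cost = implied_baseline_cost(395.0, 0.20)
print(f"Implied Grok 4.20 0309 v2 cost: ~${prev_cost:.0f}")
```

A 276-point gap gives an expected score of about 0.17. Equal ratings give exactly 0.5, and every 400-point gap multiplies the odds by 10.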

Congratulations to @xAI and @elonmusk on the impressive release!
