Artificial Analysis (@ArtificialAnlys)

2026-04-03 11:57

AI Summary

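Sarvam AI has released Sarvam 105B and Sarvam 30B, among India's first open-weights models pre-trained from scratch. Both use a Mixture-of-Experts (MoE) architecture and were trained in India. The two models score 18 and 12 respectively on the Intelligence Index and support both reasoning and non-reasoning modes. Sarvam 105B outperforms some comparable models on agentic tasks, but trails on the TerminalBench Hard coding benchmark and has a relatively high hallucination rate. The models are open-sourced under the Apache 2.0 license, with context windows of 128K and 65K tokens respectively, and are currently offered free of charge via API.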

India enters the open-weights AI race with its largest models pre-trained from scratch: Sarvam 105B and Sarvam 30B

@SarvamAI's Sarvam 105B and Sarvam 30B score 18 and 12 on the Artificial Analysis Intelligence Index, respectively. Announced at the India AI Impact Summit 2026 and open-sourced under Apache 2.0, both are Mixture-of-Experts models trained entirely in India using compute provided under the IndiaAI Mission (@OfficialINDIAai). Both support reasoning and non-reasoning modes.

These improve on Sarvam's previous model, Sarvam M (8 on the Intelligence Index, 23.6B parameters), which was based on Mistral Small rather than pre-trained from scratch. Sarvam 105B has 106B total parameters with ~10B active per token and a 128K-token context window. Sarvam 30B has 32B total parameters with ~2.4B active per token and a 65K-token context window. Alongside the text models, Sarvam also announced Saaras v3 (speech-to-text) and Bulbul v3 (text-to-speech), both focused on Indic languages.
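As a rough illustration of what "active per token" means for these Mixture-of-Experts models, the short Python sketch below divides each model's stated active parameter count by its total. The names `MODELS` and `active_fraction` are hypothetical, chosen for illustration. The inputs are the approximate figures quoted above (~10B of 106B, ~2.4B of 32B), so the resulting ratios (about 9.4% and 7.5%) are approximate too.

```python
# Approximate parameter counts (billions) as stated in the post.
# MODELS and active_fraction are illustrative names, not a Sarvam API.
MODELS = {
    "Sarvam 105B": {"total_b": 106.0, "active_b": 10.0},  # "~10B active per token"
    "Sarvam 30B": {"total_b": 32.0, "active_b": 2.4},     # "~2.4B active per token"
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of an MoE model's parameters used for each token."""
    if total_b <= 0:
        raise ValueError("total_b must be positive")
    return active_b / total_b


if __name__ == "__main__":
    for name, p in MODELS.items():
        frac = active_fraction(p["total_b"], p["active_b"])
        print(f"{name}: {frac:.1%} of parameters active per token")
```

Because only a small slice of the weights runs per token, per-token compute tracks the active count rather than the total, which is why the post compares models on both figures.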

Key takeaways in reasoning mode:

➤ Sarvam 105B scores 18 on the Intelligence Index. Among ~100B-class open-weights reasoning models, it trails GLM-4.5-Air (23), INTELLECT-3 (22), Mistral Small 4 (27), and gpt-oss-120B (High, 33). All four peers also activate more parameters per token.

➤ Sarvam 30B scores 12 on the Intelligence Index. Among ~30B-class open-weights reasoning models, it trails GLM-4.7-Flash (30), Nemotron Cascade 2 30B A3B (28), Qwen3 30B A3B 2507 (22), and Qwen3 32B (17). Sarvam 30B activates fewer parameters per token than these peers.

➤ Sarvam 105B's relative strength is in select agentic tasks. Its Agentic Index of 25 places it ahead of INTELLECT-3 (20) and GLM-4.5-Air (21), despite trailing both on overall intelligence. Its GDPval index of 773 is also ahead of GLM-4.5-Air (665). Both new models are a large step up from Sarvam M (Reasoning), which scored 8 on the Intelligence Index.

➤ Compared to peers, both models score lower on TerminalBench Hard (Agentic Coding & Terminal Use) and AA-Omniscience. Sarvam 105B scored 1.5% and Sarvam 30B scored 2.3% on TerminalBench Hard, compared with GLM-4.5-Air (20.5%) and INTELLECT-3 (9.1%). The AA-Omniscience Index is -60 for Sarvam 105B and -72 for Sarvam 30B. Both models have high hallucination rates relative to their accuracy, and both attempt answers far more often than they abstain, which drives the negative scores.
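To see why answering instead of abstaining can push a knowledge score below zero, the Python sketch below assumes a simple scoring rule: +1 for a correct answer, -1 for an incorrect one, and 0 for an abstention, scaled to a -100 to 100 range. It pairs this with a hallucination rate defined as the share of not-correctly-answered questions where the model answered wrongly rather than abstaining. This rule is an assumption for illustration and may not match the exact AA-Omniscience methodology. The names `Outcomes`, `knowledge_index`, and `hallucination_rate` are hypothetical, and the counts are made-up numbers, not Sarvam results.

```python
from dataclasses import dataclass


@dataclass
class Outcomes:
    """Hypothetical per-model tallies on a knowledge quiz."""
    correct: int
    incorrect: int
    abstained: int

    @property
    def total(self) -> int:
        return self.correct + self.incorrect + self.abstained


def knowledge_index(o: Outcomes) -> float:
    """Assumed rule: +1 per correct, -1 per incorrect, 0 per abstention,
    expressed as a percentage of all questions (range -100 to 100)."""
    return 100.0 * (o.correct - o.incorrect) / o.total


def hallucination_rate(o: Outcomes) -> float:
    """Assumed definition: of the questions not answered correctly,
    the fraction answered wrongly instead of abstaining."""
    not_correct = o.incorrect + o.abstained
    return o.incorrect / not_correct if not_correct else 0.0


# Illustrative counts: same accuracy (20 of 100), different abstention habits.
guesser = Outcomes(correct=20, incorrect=75, abstained=5)
cautious = Outcomes(correct=20, incorrect=15, abstained=65)

if __name__ == "__main__":
    for name, o in [("guesser", guesser), ("cautious", cautious)]:
        print(f"{name}: index={knowledge_index(o):.0f}, "
              f"hallucination rate={hallucination_rate(o):.0%}")
```

With identical accuracy, the guessing model scores -55 while the abstaining one scores 5, showing how frequent wrong answers, rather than low accuracy alone, can drive an index of this kind negative.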
