JetBrains 发布 Mellum2：用于多模型 AI 流水线快速专用任务的 12B MoE 模型

2026-06-02 16:00·30天前·Asif Razzaq

AI 摘要

JetBrains 发布开源模型 Mellum2。该模型为 12B 参数的 MoE 架构，在 10.6 万亿个 token 上训练，采用 Apache 2.0 许可，专为多模型 AI 流水线中的快速、专用任务设计。

原文 · 未翻译

JetBrains released Mellum2, open-sourcing the weights under the Apache 2.0 license. The first version of Mellum was a completion-focused 4B dense model. Mellum2 is its successor: a general-purpose model specialized in software engineering. It covers code generation and editing, debugging, multi-step reasoning, tool use and function calling, agentic coding, and conversational programming assistance.

JetBrains team positions Mellum2 as a “focal model” — a fast, specialized component inside larger AI systems, not a standalone replacement for frontier models.

Architecture

Mellum2 uses a Mixture-of-Experts (MoE) architecture with 12B total parameters and 2.5B active parameters per token. In MoE models, only a subset of parameters runs on each token. Here, the model has 64 experts and activates 8 per token. This keeps per-token compute equivalent to a 2.5B dense model, while the total parameter count provides higher capacity for specialization.

Key architectural details:

Layers: 28

Hidden size: 2304

MoE experts: 64 total, 8 activated per token

Attention: Grouped-Query Attention (GQA) with 32 query heads and 4 KV heads

Sliding Window Attention (SWA): Applied to three of every four layers, with a window size of 1,024. Full attention runs on the remaining layer.

Context length: 131,072 tokens

Multi-Token Prediction (MTP) head: Serves as an auxiliary pre-training objective and as a built-in draft model for speculative decoding

Precision: bfloat16

Vocabulary size: 98,304

The model handles natural language and code. It is not multimodal — there is no image or video input.

Pre-Training

Pre-training spans approximately 10.6 trillion tokens through a three-phase curriculum. The data mixture progressively shifts from diverse web content toward curated code and mathematical content across the three phases.

MarkTechPost（RSS）

65导出 Markdown

JetBrains 发布 Mellum2：用于多模型 AI 流水线快速专用任务的 12B MoE 模型

2026-06-02 16:00·30天前·Asif Razzaq

阅读原文· marktechpost.com

AI 摘要

原文 · 保持原样，未翻译

JetBrains team positions Mellum2 as a “focal model” — a fast, specialized component inside larger AI systems, not a standalone replacement for frontier models.

Architecture

JetBrains 发布 Mellum2：用于多模型 AI 流水线快速专用任务的 12B MoE 模型

JetBrains 发布 Mellum2：用于多模型 AI 流水线快速专用任务的 12B MoE 模型

Basic serve vllm serve JetBrains/Mellum2-12B-A2.5B-Instruct \ --max-model-len 131072

With tool calling vllm serve JetBrains/Mellum2-12B-A2.5B-Instruct \ --max-model-len 131072 \ --enable-auto-tool-choice \ --tool-call-parser hermes

Basic serve vllm serve JetBrains/Mellum2-12B-A2.5B-Instruct \ --max-model-len 131072

With tool calling vllm serve JetBrains/Mellum2-12B-A2.5B-Instruct \ --max-model-len 131072 \ --enable-auto-tool-choice \ --tool-call-parser hermes