Mistral AI Releases Mixtral 8x7B, an Open-Source Sparse Mixture-of-Experts Model

2023-12-11 00:00

AI Summary

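Mistral AI has released the open-source model Mixtral 8x7B under the Apache 2.0 license. It is a sparse mixture-of-experts (SMoE) model with 46.7B total parameters, of which only 12.9B are active for each token. Its inference is 6x faster than that of Llama 2 70B, and it matches or outperforms GPT-3.5 on most benchmarks. The model supports a 32k-token context window, handles English, French, Italian, German, and Spanish, and has strong code-generation capabilities. Mistral AI also released Mixtral 8x7B Instruct, a version trained with supervised fine-tuning and Direct Preference Optimization (DPO), which scores 8.3 on MT-Bench.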

Original text (untranslated)

Mistral AI continues its mission to deliver the best open models to the developer community. Moving forward in AI requires taking new technological turns beyond reusing well-known architectures and training paradigms. Most importantly, it requires letting the community benefit from original models to foster new inventions and uses.

Today, the team is proud to release Mixtral 8x7B, a high-quality sparse mixture-of-experts (SMoE) model with open weights, licensed under Apache 2.0. Mixtral outperforms Llama 2 70B on most benchmarks with 6x faster inference. It is the strongest open-weight model with a permissive license and the best model overall in terms of cost/performance trade-offs. In particular, it matches or outperforms GPT-3.5 on most standard benchmarks.

Mixtral has the following capabilities.

It gracefully handles a context of 32k tokens.

It handles English, French, Italian, German and Spanish.

It shows strong performance in code generation.

It can be fine-tuned into an instruction-following model that achieves a score of 8.3 on MT-Bench.

Mixtral is a sparse mixture-of-experts network. It is a decoder-only model in which the feedforward block picks from a set of 8 distinct groups of parameters. At every layer, for every token, a router network chooses two of these groups (the "experts") to process the token, and their outputs are combined additively.
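The per-token routing step described above can be sketched in plain Python. This is a minimal, hypothetical illustration, not Mistral AI's implementation: the function names, the toy "experts" (which only scale their input), and the tiny dimensions are invented for the example. Weighting the two selected experts by a softmax over their two router logits is also an assumption, since the text only says their outputs are combined additively.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def route_top2(logits: Vector) -> Tuple[List[int], List[float]]:
    """Pick the two highest-scoring experts and softmax-normalize their logits (assumed gating)."""
    top2 = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:2]
    m = max(logits[i] for i in top2)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top2]
    total = sum(exps)
    return top2, [e / total for e in exps]


def moe_layer(x: Vector, router: List[Vector], experts: List[Expert]) -> Vector:
    """Sparse MoE feedforward for one token: only the 2 routed experts are evaluated."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]  # one score per expert
    chosen, gates = route_top2(logits)
    out = [0.0] * len(x)
    for idx, gate in zip(chosen, gates):
        y = experts[idx](x)  # the remaining experts never run for this token
        out = [o + gate * yi for o, yi in zip(out, y)]  # additive combination
    return out


def make_expert(scale: float) -> Expert:
    """Hypothetical toy expert that scales its input; a real expert is a feedforward block."""
    return lambda v: [scale * vi for vi in v]


if __name__ == "__main__":
    dim, n_experts = 4, 8
    rng = random.Random(0)
    router = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_experts)]
    experts = [make_expert(float(k + 1)) for k in range(n_experts)]
    token = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    print(moe_layer(token, router, experts))
```

Because only the two chosen experts are ever called, the feedforward compute per token scales with the number of active experts (2) rather than with the total number of experts (8).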

This technique increases the number of parameters of a model while controlling cost and latency, as the model only uses a fraction of the total set of parameters per token. Concretely, Mixtral has 46.7B total parameters but only uses 12.9B parameters per token. It, therefore, processes input and generates output at the same speed and for the same cost as a 12.9B model.
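The 46.7B/12.9B split can be checked with a back-of-the-envelope parameter count. The sketch below relies on architecture hyperparameters that this announcement does not state and that should be treated as assumptions: hidden size 4096, expert feedforward size 14336, 32 layers, grouped-query attention with 32 query heads and 8 key/value heads, a 32,000-token vocabulary, three-matrix gated feedforward experts, and untied input/output embeddings, as in the publicly released model configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEConfig:
    # Assumed hyperparameters (from the public Mixtral 8x7B config, not from this text).
    hidden: int = 4096
    ffn_hidden: int = 14336
    layers: int = 32
    n_heads: int = 32
    n_kv_heads: int = 8
    vocab: int = 32000
    n_experts: int = 8
    experts_per_token: int = 2


def count_params(cfg: MoEConfig, experts_used: int) -> int:
    """Count weights touched per token when `experts_used` experts run in every layer."""
    head_dim = cfg.hidden // cfg.n_heads
    kv_dim = cfg.n_kv_heads * head_dim
    attn = 2 * cfg.hidden * cfg.hidden + 2 * cfg.hidden * kv_dim  # q/o plus k/v projections
    expert = 3 * cfg.hidden * cfg.ffn_hidden  # gated feedforward: three weight matrices
    router = cfg.hidden * cfg.n_experts  # one router score per expert
    norms = 2 * cfg.hidden  # two RMSNorm weight vectors per layer
    per_layer = attn + experts_used * expert + router + norms
    embeddings = 2 * cfg.vocab * cfg.hidden  # input embedding plus untied output head
    return cfg.layers * per_layer + embeddings + cfg.hidden  # plus the final norm


cfg = MoEConfig()
total = count_params(cfg, cfg.n_experts)
active = count_params(cfg, cfg.experts_per_token)
print(f"total  = {total / 1e9:.1f}B")   # total  = 46.7B
print(f"active = {active / 1e9:.1f}B")  # active = 12.9B
```

Under these assumptions the count also explains why 12.9B is well above a quarter of 46.7B: the attention, embedding, and normalization weights are shared and used for every token, so only the feedforward experts are sparse.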

Source: Mistral AI, News (web page)

Read the original at mistral.ai
