Codestral Mamba 发布

2024-07-16 00:00·717天前

AI 摘要

Mistral AI 团队发布了 Codestral Mamba 模型。该模型由 Albert Gu 和 Tri Dao 协助设计，采用 Mamba 架构而非 Transformer，具备线性时间推理优势，并在代码与推理能力上进行了训练，以达到与 SOTA Transformer 模型相当的性能。模型在高达 256k tokens 的上下文检索能力上进行了测试。它是一个指令微调版本，参数规模为 7,285,403,648，以 Apache 2.0 许可证开源。用户可通过 mistral-inference SDK 或 TensorRT-LLM 进行部署，权重可从 HuggingFace 下载，也已在 la Plateforme 上提供。

原文 · 未翻译

Following the publishing of the Mixtral family, Codestral Mamba is another step in our effort to study and provide new architectures. It is available for free use, modification, and distribution, and we hope it will open new perspectives in architecture research. Codestral Mamba was designed with help from Albert Gu and Tri Dao.

Unlike Transformer models, Mamba models offer the advantage of linear time inference and the theoretical ability to model sequences of infinite length. It allows users to engage with the model extensively with quick responses, irrespective of the input length. This efficiency is especially relevant for code productivity use cases—this is why we trained this model with advanced code and reasoning capabilities, enabling it to perform on par with SOTA transformer-based models.

We have tested Codestral Mamba on in-context retrieval capabilities up to 256k tokens. We expect it to be a great local code assistant!

You can deploy Codestral Mamba using the mistral-inference SDK, which relies on the reference implementations from Mamba's GitHub repository. The model can also be deployed through TensorRT-LLM . For local inference, keep an eye out for support in llama.cpp. You may download the raw weights from HuggingFace . This is an instructed model, with 7,285,403,648 parameters.

For easy testing, we made Codestral Mamba available on la Plateforme (codestral-mamba-2407), alongside its big sister, Codestral 22B. While Codestral Mamba is available under the Apache 2.0 license, Codestral 22B is available under a commercial license for self-deployment or a community license for testing purposes.

codestral-mamba-2407

Mistral AI：News（网页）

49导出 Markdown

Codestral Mamba 发布

2024-07-16 00:00·717天前

阅读原文· mistral.ai

AI 摘要

原文 · 保持原样，未翻译