# Mistral AI与NVIDIA联合推出开源模型Mistral NeMo

- 来源：Mistral AI：News（网页）
- 发布时间：2024-07-18 00:00
- AIHOT 分数：62
- AIHOT 链接：https://aihot.virxact.com/items/cmppdcr7e0e4eslv44d6vzqhz
- 原文链接：https://mistral.ai/news/mistral-nemo

## AI 摘要

Mistral AI团队与NVIDIA合作发布了Mistral NeMo，这是一个12B参数的大语言模型。它提供高达128k tokens的上下文窗口，并在推理、世界知识和编码能力上达到了其规模的前沿水平。该模型基于标准架构，是Mistral 7B的即插即用替代品，并支持FP8推理。Mistral NeMo以Apache 2.0许可开源，包含预训练和指令微调版本，权重已发布在HuggingFace并可通过其API平台调用。新引入的Tekken分词器在超过100种语言上训练，在压缩多种语言文本时效率显著高于前代。

## 正文

Today, we are excited to release Mistral NeMo, a 12B model built in collaboration with NVIDIA. Mistral NeMo offers a large context window of up to 128k tokens. Its reasoning, world knowledge, and coding accuracy are state-of-the-art in its size category. As it relies on standard architecture, Mistral NeMo is easy to use and a drop-in replacement in any system using Mistral 7B.

We have released pre-trained base and instruction-tuned checkpoints checkpoints under the Apache 2.0 license to promote adoption for researchers and enterprises. Mistral NeMo was trained with quantisation awareness, enabling FP8 inference without any performance loss.

The following table compares the accuracy of the Mistral NeMo base model with two recent open-source pre-trained models, Gemma 2 9B, and Llama 3 8B.

Table 1: Mistral NeMo base model performance compared to Gemma 2 9B and Llama 3 8B.

Multilingual Model for the Masses

The model is designed for global, multilingual applications. It is trained on function calling, has a large context window, and is particularly strong in English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. This is a new step toward bringing frontier AI models to everyone’s hands in all languages that form human culture.

Figure 1: Mistral NeMo performance on multilingual benchmarks.

Tekken, a more efficient tokenizer

Mistral NeMo uses a new tokenizer, Tekken, based on Tiktoken, that was trained on over more than 100 languages, and compresses natural language text and source code more efficiently than the SentencePiece tokenizer used in previous Mistral models. In particular, it is ~30% more efficient at compressing source code, Chinese, Italian, French, German, Spanish, and Russian. It is also 2x and 3x more efficient at compressing Korean and Arabic, respectively. Compared to the Llama 3 tokenizer, Tekken proved to be more proficient in compressing text for approximately 85% of all languages.

Figure 2: Tekken compression rate.

Instruction fine-tuning

Mistral NeMO underwent an advanced fine-tuning and alignment phase. Compared to Mistral 7B, it is much better at following precise instructions, reasoning, handling multi-turn conversations, and generating code.

Table 2: Mistral NeMo instruction-tuned model accuracy. Evals done with GPT4o as judge on official references.

Links

Weights are hosted on HuggingFace both for the base and for the instruct models. You can try Mistral NeMo now with mistral-inference and adapt it with mistral-finetune. Mistral NeMo is exposed on la Plateforme under the name open-mistral-nemo-2407. This model is also packaged in a container as NVIDIA NIM inference microservice and available from ai.nvidia.com .

open-mistral-nemo-2407

0%
