Mistral AI and NVIDIA Jointly Release the Open-Source Model Mistral NeMo
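Source: mistral.ai
The Mistral AI team, in collaboration with NVIDIA, has released Mistral NeMo, a 12B-parameter large language model. It offers a context window of up to 128k tokens and reaches state-of-the-art reasoning, world knowledge, and coding ability for its size. The model is based on a standard architecture, is a drop-in replacement for Mistral 7B, and supports FP8 inference. Mistral NeMo is open-sourced under the Apache 2.0 license in pre-trained and instruction-tuned versions; the weights are published on HuggingFace, and the model can be called through Mistral's API platform. The newly introduced Tekken tokenizer, trained on more than 100 languages, compresses text in many languages significantly more efficiently than its predecessor.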
Today, we are excited to release Mistral NeMo, a 12B model built in collaboration with NVIDIA. Mistral NeMo offers a large context window of up to 128k tokens. Its reasoning, world knowledge, and coding accuracy are state-of-the-art in its size category. As it relies on a standard architecture, Mistral NeMo is easy to use and a drop-in replacement in any system using Mistral 7B.
We have released pre-trained base and instruction-tuned checkpoints under the Apache 2.0 license to promote adoption for researchers and enterprises. Mistral NeMo was trained with quantisation awareness, enabling FP8 inference without any performance loss.
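To give a rough sense of what FP8 inference means for deployment, the sketch below estimates the memory needed to hold the weights alone at different precisions. It is a back-of-the-envelope illustration, not a measurement: it assumes the nominal "12B" figure as the parameter count, counts one byte per FP8 parameter and two per BF16/FP16 parameter, and ignores the KV cache, activations, and runtime overhead. The helper name `weight_memory_gb` is hypothetical.

```python
# Approximate storage cost per parameter for common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    Weights only: KV cache, activations, and framework overhead are ignored.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # Assumption: the headline "12B" figure, not an exact parameter count.
    nominal_params = 12e9
    for dtype in ("bf16", "fp8"):
        print(f"{dtype}: ~{weight_memory_gb(nominal_params, dtype):.0f} GB")
```

Under these assumptions, FP8 halves the weight footprint relative to 16-bit formats. Actual memory use at a 128k-token context would be higher, since the KV cache grows with sequence length.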
The following table compares the accuracy of the Mistral NeMo base model with two recent open-source pre-trained models, Gemma 2 9B and Llama 3 8B.