# Mistral AI 开源多模态模型 Pixtral Large 发布

- 来源：Mistral AI：News（网页）
- 发布时间：2024-11-18 00:00
- AIHOT 分数：60
- AIHOT 链接：https://aihot.virxact.com/items/cmppdcr7e0e46slv4gh9m0q82
- 原文链接：https://mistral.ai/news/pixtral-large

## AI 摘要

Mistral AI 基于 Mistral Large 2 发布了开源多模态模型 Pixtral Large。该模型包含 123B 多模态解码器和 1B 视觉编码器，支持 128K 上下文窗口。性能方面，它在 MathVista、DocVQA、ChartQA 和 MM-MT-Bench 等基准测试中超越 GPT-4o 与 Gemini-1.5 Pro，并在 LMSYS Vision Leaderboard 上成为得分最高的开源模型。需要注意的是，该模型已停止维护，并被更新的视觉模型所取代。

## 正文

Pixtral Large is no longer maintained and has been replaced by our latest, more powerful vision and multimodal models.

Explore our current vision capabilities

Pixtral Large in short:

Frontier-class multimodal performance

Frontier-class multimodal performance

State-of-the-art on MathVista, DocVQA, VQAv2

State-of-the-art on MathVista, DocVQA, VQAv2

Extends Mistral Large 2 without compromising text performance

Extends Mistral Large 2 without compromising text performance

123B multimodal decoder, 1B parameter vision encoder

123B multimodal decoder, 1B parameter vision encoder

128K context window: fits minimum of 30 high-resolution images

128K context window: fits minimum of 30 high-resolution images

Use:

Use:

Try it on le Chat In our API as pixtral-large-latest Download it here

Try it on le Chat

Try it on le Chat

In our API as pixtral-large-latest

In our API as pixtral-large-latest

pixtral-large-latest

Download it here

Download it here

Today we announce Pixtral Large, a 124B open-weights multimodal model built on top of Mistral Large 2. Pixtral Large is the second model in our multimodal family and demonstrates frontier-level image understanding. Particularly, the model is able to understand documents, charts and natural images, while maintaining the leading text-only understanding of Mistral Large 2.

The model is available under the Mistral Research License (MRL) for research and educational use; and the Mistral Commercial License for experimentation, testing, and production for commercial purposes.

Performance

We evaluate Pixtral Large against frontier models on a set of standard multimodal benchmarks, through a common testing harness. On MathVista, which evaluates complex mathematical reasoning over visual data, our model achieves 69.4%, outperforming all other models. To assess reasoning capabilities over complex charts and documents, we evaluate performance using ChartQA and DocVQA, where Pixtral Large surpasses GPT-4o and Gemini-1.5 Pro.

Pixtral Large also demonstrates competitive capabilities on MM-MT-Bench, outperforming all of Claude-3.5 Sonnet (new), Gemini-1.5 Pro and GPT-4o (latest). MM-MT-Bench is an open-source , judge-based evaluation intended to reflect real-world use cases of multimodal LLMs (see the Pixtral 12B technical report for details).

Finally, on the LMSys Vision Leaderboard , Pixtral Large is the best open-weights model by a substantial margin, surpassing the nearest competitor by nearly 50 ELO points. Pixtral Large also outperforms a number of proprietary models, such as GPT-4o (August '24).

Qualitative Samples

Prompt:

I bought the coffee and the sausage. How much do I owe? Add a 18% tip.

Response:

To calculate the total amount you owe, including the 18% tip, follow these steps:

Calculate the cost of the coffee and the sausage:

Calculate the cost of the coffee and the sausage:

2x Latte Macchiato: 2 * 4.50 CHF = 9.00 CHF 1x Schweinschnitzel: 22.00 CHF Total cost for coffee and sausage: 9.00 CHF + 22.00 CHF = 31.00 CHF

2x Latte Macchiato: 2 * 4.50 CHF = 9.00 CHF

2x Latte Macchiato: 2 * 4.50 CHF = 9.00 CHF

1x Schweinschnitzel: 22.00 CHF

1x Schweinschnitzel: 22.00 CHF

Total cost for coffee and sausage: 9.00 CHF + 22.00 CHF = 31.00 CHF

Total cost for coffee and sausage: 9.00 CHF + 22.00 CHF = 31.00 CHF

Calculate the 18% tip:

Calculate the 18% tip:

Tip: 31.00 CHF * 0.18 = 5.58 CHF

Tip: 31.00 CHF * 0.18 = 5.58 CHF

Tip: 31.00 CHF * 0.18 = 5.58 CHF

Add the tip to the total cost:

Add the tip to the total cost:

Total amount owed: 31.00 CHF + 5.58 CHF = 36.58 CHF

Total amount owed: 31.00 CHF + 5.58 CHF = 36.58 CHF

Total amount owed: 31.00 CHF + 5.58 CHF = 36.58 CHF

So, you owe a total of 36.58 CHF.

Prompt:

When did things start to go wrong for dark dragon?

Response:

The training loss for the "dark-dragon-50" model starts to exhibit significant instability around the 10,000 steps mark. There is a noticeable spike in the loss value at this point, and the instability continues with another major spike around the 20,000 steps mark. These spikes indicate that the training process for the "dark-dragon-50" model began to encounter issues starting around the 10,000 steps mark.

Prompt:

Which companies use models made by Mistral AI?

Response:

The companies that use models made by Mistral AI, as shown in the image, are:

BNP Paribas

BNP Paribas

Brave

Brave

Cloudflare

Cloudflare

CMA CGM

CMA CGM

Front

Front

One more thing...

Along with Pixtral Large, Mistral Large, our state-of-the-art text model, also gets an update. The model is available as pixtral-large-latest on our API , as well as for self-deployment as Mistral Large 24.11 on HuggingFace under the Mistral Research License (MRL) for research, or with a commercial license from Mistral AI for commercial use.

pixtral-large-latest

This newest model provides a significant upgrade on the previous Mistral Large 24.07, with notable improvements in long context understanding, a new system prompt, and more accurate function calling. The model is highly capable for RAG and agentic workflows, making it a suitable choice for enterprise use cases such as knowledge exploration and sharing, semantic understanding of documents, task automation, and improved customer experiences.

Mistral Large 24.11 will be available from our cloud provider partners soon, starting with Google Cloud and Microsoft Azure within a week.

0%
