# Mistral AI Releases the Codestral 25.01 Code Generation Model

- Source: Mistral AI: News (web page)
- Published: 2025-01-13 00:00
- AIHOT score: 52
- AIHOT link: https://aihot.virxact.com/items/cmppdcr7e0e44slv48mwzd90m
- Original article: https://mistral.ai/news/codestral-2501

## AI Summary

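Mistral AI has released Codestral 25.01, a code generation model. Its predecessor was Codestral-2405. The new model uses a more efficient architecture and an improved tokenizer, which makes code generation and completion about 2x faster. Codestral 25.01 has a context length of 256k.

On the benchmarks, it scores 71.4% on HumanEval (averaged across languages) and 85.9% on HumanEvalFIM (fill-in-the-middle, averaged across languages). According to Mistral, this makes it the leader for code generation in its weight class and state of the art (SOTA) for fill-in-the-middle (FIM) tasks across the board.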

## Full Text

Of all the AI innovations of the past year, code generation has arguably been the most significant. The assembly line streamlined manufacturing and the calculator transformed mathematics; in the same way, coding models represent a step change in software development.

Mistral AI has been at the forefront of this change with Codestral, a state-of-the-art (SOTA) coding model released last year. Codestral is lightweight, fast, and proficient in over 80 programming languages. It is optimized for low-latency, high-frequency use cases and supports tasks such as fill-in-the-middle (FIM), code correction, and test generation. Thousands of developers have used Codestral as a highly capable coding companion, regularly boosting their productivity several times over. Today, Codestral is getting a big upgrade.
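In fill-in-the-middle, the model sees the code both before and after a gap and generates what belongs in between. The sketch below shows how such an example can be built by hiding one line of a file. `split_for_fim` and `FimExample` are hypothetical names used only for this illustration. They are not Mistral's data pipeline or API.

```python
from dataclasses import dataclass


@dataclass
class FimExample:
    prefix: str  # code before the gap
    suffix: str  # code after the gap (the extra context FIM provides)
    middle: str  # the hidden span the model is asked to generate


def split_for_fim(source: str, line_index: int) -> FimExample:
    """Hide one line of `source` (hypothetical helper for illustration)."""
    lines = source.splitlines(keepends=True)
    if not 0 <= line_index < len(lines):
        raise IndexError("line_index out of range")
    return FimExample(
        prefix="".join(lines[:line_index]),
        suffix="".join(lines[line_index + 1:]),
        middle=lines[line_index],
    )


code = (
    "def area(w, h):\n"
    "    result = w * h\n"
    "    return result\n"
)
ex = split_for_fim(code, 1)
print(repr(ex.prefix))  # 'def area(w, h):\n'
print(repr(ex.middle))  # '    result = w * h\n'
print(repr(ex.suffix))  # '    return result\n'
# Reassembling prefix + middle + suffix always recovers the original file.
assert ex.prefix + ex.middle + ex.suffix == code
```

At inference time, the model receives the prefix and suffix and must produce the middle. Mistral's API offers a separate FIM completion endpoint that takes a prompt (prefix) and a suffix. Check its API reference for the exact request shape.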

Compared with the original, Codestral 25.01 features a more efficient architecture and an improved tokenizer, and it generates and completes code about 2x faster. The model is now the clear leader for coding in its weight class and SOTA for FIM use cases across the board.

### Benchmarks

We benchmarked the new Codestral against the leading sub-100B-parameter coding models, which are widely considered best-in-class for FIM tasks.

#### Overview

In the source, the columns are grouped under three headings: Python benchmarks, SQL (Spider), and averages over several languages.

| Model | Context length | HumanEval | MBPP | CruxEval | LiveCodeBench | RepoBench | Spider | CanItEdit | HumanEval (average) | HumanEvalFIM (average) |
|---|---|---|---|---|---|---|---|---|---|---|
| Codestral-2501 | 256k | 86.6% | 80.2% | 55.5% | 37.9% | 38.0% | 66.5% | 50.5% | 71.4% | 85.9% |
| Codestral-2405 22B | 32k | 81.1% | 78.2% | 51.3% | 31.5% | 34.0% | 63.5% | 50.5% | 65.6% | 82.1% |
| Codellama 70B instruct | 4k | 67.1% | 70.8% | 47.3% | 20.0% | 11.4% | 37.0% | 29.5% | 55.3% | - |
| DeepSeek Coder 33B instruct | 16k | 77.4% | 80.2% | 49.5% | 27.0% | 28.4% | 60.0% | 47.6% | 65.1% | 85.3% |
| DeepSeek Coder V2 lite | 128k | 83.5% | 83.2% | 49.7% | 28.1% | 20.0% | 72.0% | 41.0% | 65.9% | 84.1% |

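To read the overview table as gains over the previous release, subtract Codestral-2405's scores from Codestral-2501's. The sketch below does this with values copied from the table. `point_deltas` is a hypothetical helper. The results are absolute percentage-point differences, not relative improvements.

```python
# Scores copied from the overview table (percent), in column order.
BENCHMARKS = [
    "HumanEval", "MBPP", "CruxEval", "LiveCodeBench", "RepoBench",
    "Spider", "CanItEdit", "HumanEval (average)", "HumanEvalFIM (average)",
]
CODESTRAL_2501 = [86.6, 80.2, 55.5, 37.9, 38.0, 66.5, 50.5, 71.4, 85.9]
CODESTRAL_2405 = [81.1, 78.2, 51.3, 31.5, 34.0, 63.5, 50.5, 65.6, 82.1]


def point_deltas(new: list[float], old: list[float]) -> list[float]:
    """Absolute difference in percentage points, rounded to one decimal."""
    if len(new) != len(old):
        raise ValueError("score lists must have the same length")
    return [round(n - o, 1) for n, o in zip(new, old)]


for name, d in zip(BENCHMARKS, point_deltas(CODESTRAL_2501, CODESTRAL_2405)):
    print(f"{name:<24} {d:+.1f} pts")
```

On these numbers, the largest gain is on LiveCodeBench (+6.4 points), and CanItEdit is unchanged at 50.5%.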

#### Per-language

| Model | HumanEval Python | HumanEval C++ | HumanEval Java | HumanEval JavaScript | HumanEval Bash | HumanEval TypeScript | HumanEval C# | HumanEval (average) |
|---|---|---|---|---|---|---|---|---|
| Codestral-2501 | 86.6% | 78.9% | 72.8% | 82.6% | 43.0% | 82.4% | 53.2% | 71.4% |
| Codestral-2405 22B | 81.1% | 68.9% | 78.5% | 71.4% | 40.5% | 74.8% | 43.7% | 65.6% |
| Codellama 70B instruct | 67.1% | 56.5% | 60.8% | 62.7% | 32.3% | 61.0% | 46.8% | 55.3% |
| DeepSeek Coder 33B instruct | 77.4% | 65.8% | 73.4% | 73.3% | 39.2% | 77.4% | 49.4% | 65.1% |
| DeepSeek Coder V2 lite | 83.5% | 68.3% | 65.2% | 80.8% | 34.2% | 82.4% | 46.8% | 65.9% |

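The source does not say how the HumanEval (average) column is computed. It does match the unweighted mean of the seven per-language scores, rounded to one decimal. The sketch below recomputes the mean from the per-language table and reports any model where the two disagree. `check_averages` is a hypothetical helper.

```python
from statistics import mean

# Per-language HumanEval scores (percent), in column order:
# Python, C++, Java, JavaScript, Bash, TypeScript, C#.
PER_LANGUAGE = {
    "Codestral-2501": [86.6, 78.9, 72.8, 82.6, 43.0, 82.4, 53.2],
    "Codestral-2405 22B": [81.1, 68.9, 78.5, 71.4, 40.5, 74.8, 43.7],
    "Codellama 70B instruct": [67.1, 56.5, 60.8, 62.7, 32.3, 61.0, 46.8],
    "DeepSeek Coder 33B instruct": [77.4, 65.8, 73.4, 73.3, 39.2, 77.4, 49.4],
    "DeepSeek Coder V2 lite": [83.5, 68.3, 65.2, 80.8, 34.2, 82.4, 46.8],
}
# The "HumanEval (average)" column as reported.
REPORTED_AVERAGE = {
    "Codestral-2501": 71.4,
    "Codestral-2405 22B": 65.6,
    "Codellama 70B instruct": 55.3,
    "DeepSeek Coder 33B instruct": 65.1,
    "DeepSeek Coder V2 lite": 65.9,
}


def check_averages(scores: dict, reported: dict) -> dict:
    """Return models whose unweighted mean (rounded to 0.1) differs from the report."""
    mismatches = {}
    for model, values in scores.items():
        recomputed = round(mean(values), 1)
        if abs(recomputed - reported[model]) > 1e-9:
            mismatches[model] = recomputed
    return mismatches


print(check_averages(PER_LANGUAGE, REPORTED_AVERAGE))  # {}
```

Because the mean is unweighted, a weak language counts as much as a strong one. For example, Codestral-2501's 43.0% on Bash pulls its average well below its 86.6% on Python.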

#### FIM (single-line exact match)

| Model | HumanEvalFIM Python | HumanEvalFIM Java | HumanEvalFIM JS | HumanEvalFIM (average) |
|---|---|---|---|---|
| Codestral-2501 | 80.2% | 89.6% | 87.96% | 85.89% |
| Codestral-2405 22B | 77.0% | 83.2% | 86.08% | 82.07% |
| OpenAI FIM API* | 80.0% | 84.8% | 86.5% | 83.7% |
| DeepSeek Chat API | 78.8% | 89.2% | 85.78% | 84.63% |
| DeepSeek Coder V2 lite | 78.7% | 87.8% | 85.90% | 84.13% |
| DeepSeek Coder 33B instruct | 80.1% | 89.0% | 86.80% | 85.3% |

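Single-line exact match counts a FIM completion as correct only if the generated line is textually identical to the hidden reference line. The source does not describe how lines are normalized before comparison. The sketch below assumes surrounding whitespace is ignored. `normalize` and `exact_match_rate` are hypothetical helpers.

```python
def normalize(line: str) -> str:
    # Assumption: only leading/trailing whitespace is ignored; the source does
    # not specify the normalization used for this benchmark.
    return line.strip()


def exact_match_rate(predictions: list[str], references: list[str]) -> float:
    """Fraction of FIM examples whose predicted line equals the reference line."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one example")
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


refs = ["    result = w * h", "return x + 1", "i += 1"]
preds = ["    result = w * h  ", "return x+1", "i += 1"]
print(exact_match_rate(preds, refs))  # 0.6666666666666666
```

The second prediction is functionally equivalent to its reference but fails exact match because of spacing. This strictness is one plausible reason the exact-match scores above sit below the test-based pass@1 scores.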

#### FIM pass@1

| Model | HumanEvalFIM Python | HumanEvalFIM Java | HumanEvalFIM JS | HumanEvalFIM (average) |
|---|---|---|---|---|
| Codestral-2501 | 92.5% | 97.1% | 96.1% | 95.3% |
| Codestral-2405 22B | 90.2% | 90.1% | 95.0% | 91.8% |
| OpenAI FIM API* | 91.1% | 91.8% | 95.2% | 92.7% |
| DeepSeek Chat API | 91.7% | 96.1% | 95.3% | 94.4% |

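Unlike exact match, pass@1 on HumanEval-style benchmarks counts a completion as correct when the completed program passes the problem's unit tests. The source does not say how many samples were drawn per problem. The sketch below assumes the unbiased pass@k estimator introduced with HumanEval (Chen et al., 2021). For k = 1 this estimator reduces to the fraction of correct samples per problem, averaged over problems. `pass_at_k` and `benchmark_pass_at_k` are illustrative names.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c passed the tests."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    # Probability that a random size-k subset contains no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-problem pass@k over (n, c) pairs."""
    if not results:
        raise ValueError("need at least one problem")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# One greedy sample per problem: 3 of 4 problems pass their tests.
print(benchmark_pass_at_k([(1, 1), (1, 0), (1, 1), (1, 1)], k=1))  # 0.75
```

Dividing by the binomial coefficients, instead of drawing k samples at random, gives an unbiased estimate from all n samples. With a single sample per problem, the estimate is simply the pass rate.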