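Mistral AI has released Mistral OCR, an optical character recognition API focused on document understanding. The model accepts images and PDFs as input, extracts and understands text, tables, equations, and inline images with high accuracy, and outputs ordered text and image content. On internal benchmarks it scores 94.89 overall, ahead of GPT-4o-2024-11-20 (89.77) and Gemini-2.0-Flash-001 (88.69). The API is named mistral-ocr-latest and is priced at 1000 pages per dollar, with approximately double the pages per dollar under batch inference. The API is available on la Plateforme, and self-hosting is offered to select organizations. The model is natively multilingual and processes up to 2000 pages per minute on a single node.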
Throughout history, advancements in information abstraction and retrieval have driven human progress. From hieroglyphs to papyri, the printing press to digitization, each leap has made human knowledge more accessible and actionable, fueling further innovation.
Today, we’re at the precipice of the next big leap: unlocking the collective intelligence of all digitized information. Approximately 90% of the world’s organizational data is stored as documents, and to harness this potential, we are introducing Mistral OCR.
Mistral OCR is an Optical Character Recognition API that sets a new standard in document understanding. Unlike other models, Mistral OCR comprehends each element of documents (media, text, tables, equations) with unprecedented accuracy and cognition. It takes images and PDFs as input and extracts their content as ordered, interleaved text and images.
As a result, Mistral OCR is an ideal model to use in combination with a RAG system taking multimodal documents (such as slides or complex PDFs) as input.
We have made Mistral OCR the default model for document understanding across millions of users on Le Chat, and are releasing the API mistral-ocr-latest at 1000 pages / $ (and approximately double the pages per dollar with batch inference). The API is available today on our developer suite la Plateforme, and is coming soon to our cloud and inference partners, as well as on-premises.
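The pricing above lends itself to a quick back-of-the-envelope estimate. The sketch below assumes a flat rate of exactly 1000 pages per dollar for standard calls and rounds "approximately double" to 2000 pages per dollar for batch inference; the function name and the flat-rate model are illustrative assumptions, not part of the API.

```python
def estimate_cost_usd(pages: int, batch: bool = False) -> float:
    """Estimate the OCR cost in US dollars for a given number of pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    # 1000 pages per dollar is the quoted rate. The batch figure is an
    # assumption that treats "approximately double" as exactly 2x.
    pages_per_dollar = 2000 if batch else 1000
    return pages / pages_per_dollar


if __name__ == "__main__":
    archive_pages = 250_000  # hypothetical document archive
    print(f"standard: ${estimate_cost_usd(archive_pages):.2f}")
    print(f"batch:    ${estimate_cost_usd(archive_pages, batch=True):.2f}")
```

Under these assumptions, batch inference halves the bill for the same archive; actual invoices may differ because the batch rate is only approximate.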
Highlights
State of the art understanding of complex documents
Natively multilingual and multimodal
Top-tier benchmarks
Fastest in its category
Doc-as-prompt, structured output
Selectively available to self-host for organizations dealing with highly sensitive or classified information
Let’s dive into each.
State of the art understanding of complex documents
Mistral OCR excels in understanding complex document elements, including interleaved imagery, mathematical expressions, tables, and advanced layouts such as LaTeX formatting. The model enables deeper understanding of rich documents such as scientific papers with charts, graphs, equations and figures.
Below is an example of the model extracting text as well as imagery from a given PDF into a markdown file. You can access the notebook here.
Figure: side-by-side comparisons of PDFs and their respective OCR outputs for five samples (Tables + Figures, Math, Hindi, Document, Arabic).
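The extraction example above ends with a markdown file and its images on disk. A minimal sketch of that final assembly step follows, assuming (hypothetically) that the OCR result has already been loaded as a list of page dictionaries, each with a `markdown` string and an `images` list of `{"id", "base64"}` entries. This shape and the `write_markdown` helper are illustrations only, not the documented response schema.

```python
import base64
from pathlib import Path


def write_markdown(pages: list[dict], out_dir: str) -> Path:
    """Save page images and write all page markdown into one file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    parts = []
    for page in pages:
        for image in page.get("images", []):
            # Keep only the file name so an image id cannot escape out_dir.
            image_path = out / Path(image["id"]).name
            image_path.write_bytes(base64.b64decode(image["base64"]))
        parts.append(page["markdown"])
    md_path = out / "document.md"
    # Pages are joined in order, separated by a blank line.
    md_path.write_text("\n\n".join(parts), encoding="utf-8")
    return md_path
```

Because the images are written next to `document.md`, relative image links in the page markdown (for example `![img-0.png](img-0.png)`) resolve without rewriting.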
Top-tier benchmarks
Mistral OCR has consistently outperformed other leading OCR models in rigorous benchmark tests. Its superior accuracy across multiple aspects of document analysis is illustrated below. Mistral OCR extracts embedded images from documents along with text; the other models compared below do not have that capability. For a fair comparison, we evaluate all of them on our internal “text-only” test set, which contains various published papers and PDFs from the web.
Natively multilingual and multimodal
Since Mistral’s founding, we have aspired to serve the world with our models, and have consequently strived for multilingual capabilities across our offerings. Mistral OCR takes this to a new level, being able to parse, understand, and transcribe thousands of scripts, fonts, and languages across all continents. This versatility is crucial both for global organizations that handle documents from diverse linguistic backgrounds and for hyperlocal businesses serving niche markets.
| Model | Fuzzy Match in Generation |
| --- | --- |
| Google-Document-AI | 95.88 |
| Gemini-2.0-Flash-001 | 96.53 |
| Azure OCR | 97.31 |
| Mistral OCR 2503 | 99.02 |
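The post does not define how "Fuzzy Match in Generation" is computed. To give a feel for what a fuzzy-match score measures, here is a stand-in sketch that compares an OCR transcription against a reference using `difflib.SequenceMatcher` from the Python standard library and scales the similarity ratio to 0-100. This is an assumption chosen for illustration and should not be read as the metric behind the table above.

```python
from difflib import SequenceMatcher


def fuzzy_match_score(reference: str, hypothesis: str) -> float:
    """Return a 0-100 similarity between a reference text and OCR output."""
    # Collapse runs of whitespace so layout differences are not penalized.
    ref = " ".join(reference.split())
    hyp = " ".join(hypothesis.split())
    # autojunk=False disables the popularity heuristic for long inputs.
    matcher = SequenceMatcher(None, ref, hyp, autojunk=False)
    return 100.0 * matcher.ratio()


if __name__ == "__main__":
    # A single misread character ("O" read as "0") lowers the score slightly.
    print(round(fuzzy_match_score("Mistral OCR", "Mistral 0CR"), 2))
```

A score of 100 means the normalized texts are identical; lower values indicate more character-level disagreement.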
Benchmarks by language:
| Language | Azure OCR | Google Doc AI | Gemini-2.0-Flash-001 | Mistral OCR 2503 |
| --- | --- | --- | --- | --- |
| ru | 97.35 | 95.56 | 96.58 | 99.09 |
| fr | 97.50 | 96.36 | 97.06 | 99.20 |
| hi | 96.45 | 95.65 | 94.99 | 97.55 |
| zh | 91.40 | 90.89 | 91.85 | 97.11 |
| pt | 97.96 | 96.24 | 97.25 | 99.42 |
| de | 98.39 | 97.09 | 97.19 | 99.51 |
| es | 98.54 | 97.52 | 97.75 | 99.54 |
| tr | 95.91 | 93.85 | 94.66 | 97.00 |
| uk | 97.81 | 96.24 | 96.70 | 99.29 |
| it | 98.31 | 97.69 | 97.68 | 99.42 |
| ro | 96.45 | 95.14 | 95.88 | 98.79 |
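One way to read the per-language table above is to look at the margin between Mistral OCR 2503 and the best of the other three systems for each language. The sketch below copies the scores from the table and computes that margin; the variable and function names are illustrative.

```python
# Scores copied from the "Benchmarks by language" table, in column order:
# Azure OCR, Google Doc AI, Gemini-2.0-Flash-001, Mistral OCR 2503.
SCORES = {
    "ru": (97.35, 95.56, 96.58, 99.09),
    "fr": (97.50, 96.36, 97.06, 99.20),
    "hi": (96.45, 95.65, 94.99, 97.55),
    "zh": (91.40, 90.89, 91.85, 97.11),
    "pt": (97.96, 96.24, 97.25, 99.42),
    "de": (98.39, 97.09, 97.19, 99.51),
    "es": (98.54, 97.52, 97.75, 99.54),
    "tr": (95.91, 93.85, 94.66, 97.00),
    "uk": (97.81, 96.24, 96.70, 99.29),
    "it": (98.31, 97.69, 97.68, 99.42),
    "ro": (96.45, 95.14, 95.88, 98.79),
}


def margin_over_runner_up(lang: str) -> float:
    """Mistral OCR score minus the best competing score, in points."""
    *others, mistral = SCORES[lang]
    return round(mistral - max(others), 2)


margins = {lang: margin_over_runner_up(lang) for lang in SCORES}

if __name__ == "__main__":
    for lang, margin in sorted(margins.items(), key=lambda kv: -kv[1]):
        print(f"{lang}: +{margin:.2f}")
```

On these figures the margin is positive for every language listed, widest for zh (5.26 points) and narrowest for es (1.00 point).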
Fastest in its category
Being lighter weight than most models in the category, Mistral OCR performs significantly faster than its peers, processing up to 2000 pages per minute on a single node. The ability to rapidly process documents ensures continuous learning and improvement even for high-throughput environments.
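The throughput figure above gives a best-case bound on how long a backlog takes to process. The sketch below assumes the peak rate of 2000 pages per minute is sustained and, as a further assumption not stated in the post, that throughput scales linearly with the number of nodes.

```python
import math

PEAK_PAGES_PER_MINUTE = 2000  # "up to" figure quoted for a single node


def min_minutes(pages: int, nodes: int = 1) -> int:
    """Best-case whole minutes needed to process `pages` on `nodes` nodes."""
    if pages < 0 or nodes < 1:
        raise ValueError("pages must be >= 0 and nodes must be >= 1")
    # Linear scaling across nodes is an assumption made for illustration.
    return math.ceil(pages / (PEAK_PAGES_PER_MINUTE * nodes))


if __name__ == "__main__":
    print(min_minutes(1_000_000))           # single node
    print(min_minutes(1_000_000, nodes=4))  # four nodes
```

Since 2000 pages per minute is quoted as an upper limit, real runs should be expected to take at least this long.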
Doc-as-prompt, structured output
Mistral OCR also introduces the use of documents as prompts, enabling more powerful and precise instructions. This capability allows users to extract specific information from documents and format it in structured outputs, such as JSON. Users can chain extracted outputs into downstream function calls and build agents. See this example notebook.
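Chaining a structured output into a downstream function call could look roughly like the sketch below. It assumes the model has been asked to return an invoice as a JSON object with `vendor`, `total`, and `currency` fields; that schema, the `Invoice` type, and the `route_for_approval` rule are all hypothetical and exist only to illustrate the pattern.

```python
import json
from dataclasses import dataclass


@dataclass
class Invoice:
    vendor: str
    total: float
    currency: str


def parse_invoice(raw_json: str) -> Invoice:
    """Validate the model's JSON output and convert it to a typed record."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"vendor", "total", "currency"} - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return Invoice(str(data["vendor"]), float(data["total"]), str(data["currency"]))


def route_for_approval(invoice: Invoice, limit: float = 1000.0) -> str:
    """Downstream step: pick a workflow based on the extracted total."""
    return "manual_review" if invoice.total > limit else "auto_approve"


if __name__ == "__main__":
    # Hypothetical structured output extracted from a scanned invoice.
    raw = '{"vendor": "Acme", "total": "1250.50", "currency": "EUR"}'
    print(route_for_approval(parse_invoice(raw)))
```

Validating the extracted JSON before acting on it keeps a malformed or incomplete extraction from silently reaching the downstream step.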
Available to self-host on a selective basis
For organizations with stringent data privacy requirements, Mistral OCR offers a self-hosting option. This ensures that sensitive or classified information remains secure within your own infrastructure, providing compliance with regulatory and security standards. If you would like to explore self-deployment with us, please let us know.
Use cases
We are empowering our beta customers to elevate their organizational knowledge by transforming their extensive document repositories into actions and solutions. Some of the key use cases where our technology is making a significant impact include: