# Gemini API File Search now supports multimodal data

- Source: Hacker News top stories (buzzing.cc Chinese translation)
- Author: gmays
- Published: 2026-05-10 19:05
- AIHOT score: 66
- AIHOT link: https://aihot.virxact.com/items/cmozonrbv0jg1sllhhadj9iv1
- Original link: https://blog.google/innovation-and-ai/technology/developers-tools/expanded-gemini-api-file-search-multimodal-rag

## AI Summary

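The Gemini API's File Search feature has been upgraded to a multimodal version. The update lets developers upload and search files in multiple formats, including images, PDF, PPT, and Word, lifting the earlier limitation of supporting only text files. The feature is based on retrieval-augmented generation (RAG) and can process text and visual information together, extracting key content from uploaded documents to generate more accurate answers. The change aims to help developers build applications that understand and analyze complex multimodal data more efficiently.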

## Full Text

Gemini API File Search is now multimodal: build efficient, verifiable RAG

May 05, 2026

We’re introducing three major updates to the Gemini API File Search tool: multimodal support, custom metadata and page-level citations. These features help developers bring structure to unstructured data for efficient, verifiable RAG.

Today, we are expanding the Gemini API’s File Search tool. You can now build retrieval-augmented generation (RAG) systems with multimodal data and custom metadata. We’re also introducing page citations to improve grounding and transparency.

Whether you are prototyping a weekend project or scaling a production application for thousands of users, your RAG systems can now natively process and better organize your text and visual data.

### Give your apps a photographic memory

File Search now processes images and text together. Powered by the Gemini Embedding 2 model, the tool understands native image data, providing your agents contextual awareness.

Think of a creative agency trying to dig up a specific visual asset. Instead of relying on keywords or filenames, your app can search an entire archive for an image matching a specific emotional tone or visual style described in a natural language brief.
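To make the idea of searching by description concrete, here is a minimal sketch of similarity search over a shared embedding space. It does not call the Gemini API: the three-dimensional vectors, the file names, and the `search` helper are all hypothetical stand-ins, and a real embedding model would return much larger vectors for both the images and the text brief.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical image embeddings (assumption: images and text share one vector space).
ARCHIVE = {
    "beach_sunset.jpg": [0.9, 0.1, 0.2],
    "office_meeting.png": [0.1, 0.9, 0.3],
    "forest_fog.jpg": [0.6, 0.2, 0.8],
}


def search(query_vec, archive, top_k=2):
    """Return the top_k asset names ranked by similarity to the query vector."""
    ranked = sorted(
        archive,
        key=lambda name: cosine(query_vec, archive[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Stand-in for the embedding of a brief such as "warm, calm, golden light".
brief_vec = [0.8, 0.0, 0.3]
results = search(brief_vec, ARCHIVE)
print(results)
```

The ranking depends only on vector proximity, which is why no keyword or filename match is needed.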

### Filter the noise with custom metadata

Dumping files into a database is easy. Finding the right one at scale is the real challenge. Custom metadata allows you to attach key-value labels to your unstructured data, such as `department: Legal` or `status: Final`.

By applying metadata filters at query time, your application can scope requests to the data slice required. This significantly reduces noise from irrelevant documents, increasing both the speed and accuracy of your RAG workflows.
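The effect of scoping a query with metadata can be sketched locally. The example below is not the File Search implementation: the `Doc` class, the `scoped_search` helper, and the keyword-count scoring are hypothetical and stand in for whatever retrieval the service performs. It only shows the order of operations, filtering first and ranking second.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    name: str
    text: str
    metadata: dict = field(default_factory=dict)


def matches(doc, filters):
    """True when every filter key is present on the doc with the same value."""
    return all(doc.metadata.get(k) == v for k, v in filters.items())


def scoped_search(docs, query, filters):
    """Filter by metadata first, then rank the survivors by naive term counts."""
    candidates = [d for d in docs if matches(d, filters)]
    terms = query.lower().split()
    scored = [(sum(d.text.lower().count(t) for t in terms), d) for d in candidates]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [d.name for score, d in ranked if score > 0]


DOCS = [
    Doc("nda_v3.pdf", "Final non-disclosure agreement terms",
        {"department": "Legal", "status": "Final"}),
    Doc("nda_draft.pdf", "Draft non-disclosure agreement terms",
        {"department": "Legal", "status": "Draft"}),
    Doc("brand_guide.pdf", "Brand agreement on logo usage",
        {"department": "Marketing", "status": "Final"}),
]

hits = scoped_search(DOCS, "agreement terms", {"department": "Legal", "status": "Final"})
print(hits)
```

Without the filter, all three documents match the query; with it, only the final Legal document is ever scored.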

### Show your work with page citations

When your application pulls an answer from a massive PDF, users need to verify exactly where that answer came from.

File Search now ties the model’s response directly to the original source. It captures the page number for every piece of indexed information. This level of granularity allows you to point users directly to the right spot, which helps build trust and makes your tool immediately useful for rigorous fact-checking.
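One way to picture page-level citations is to keep the page number next to every indexed chunk. The sketch below assumes a simplified model with one chunk per page; the `Chunk` class, the `index_pages` and `cite` helpers, and the sample contract text are hypothetical and are not the File Search response format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str
    page: int  # 1-based page number recorded at indexing time
    text: str


def index_pages(source, pages):
    """Create one chunk per page, recording where each piece of text came from."""
    return [Chunk(source, n, text) for n, text in enumerate(pages, start=1)]


def cite(chunks, phrase):
    """Return human-readable citations for every chunk containing the phrase."""
    needle = phrase.lower()
    return [f"{c.source}, p. {c.page}" for c in chunks if needle in c.text.lower()]


pages = [
    "Introduction and scope.",
    "Termination requires 30 days notice.",
    "Governing law: Delaware.",
]
chunks = index_pages("contract.pdf", pages)
citations = cite(chunks, "30 days notice")
print(citations)
```

Because the page number travels with the chunk, an application can link the user straight to the cited page instead of to the whole document.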

### Get started with File Search

We want to make it as easy as possible to store and retrieve the data that makes your ideas work. The File Search tool handles the heavy infrastructure so you can focus on building the product.

Uploading files and searching across them is simple:
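As a rough illustration of the search half of that flow, the sketch below only builds a JSON request body and makes no network call. The field names (`file_search`, `file_search_store_names`, `metadata_filter`), the filter syntax, and the store name are assumptions made for illustration, not confirmed by this article; check them against the Gemini API documentation before use. Uploading is omitted because it requires the SDK and an API key.

```python
import json


def build_file_search_request(question, store_names, metadata_filter=None):
    """Build a request body that asks the model to answer using File Search.

    Assumption: the tool is configured under a "file_search" key that lists
    store names and accepts an optional metadata filter string.
    """
    file_search = {"file_search_store_names": list(store_names)}
    if metadata_filter:
        file_search["metadata_filter"] = metadata_filter  # assumed field name
    return {
        "contents": [{"parts": [{"text": question}]}],
        "tools": [{"file_search": file_search}],
    }


body = build_file_search_request(
    "What is the notice period for termination?",
    ["fileSearchStores/my-store-123"],  # placeholder store name
    metadata_filter='department="Legal"',  # assumed filter syntax
)
payload = json.dumps(body, indent=2)
print(payload)
```

Keeping the body construction in a plain function makes it easy to inspect or unit-test the request before wiring it to a real client.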

Explore more code snippets in our developer guide and Gemini API documentation to learn how to build with File Search.
