Compact Language Models Enable On-Device Inference in RAG Systems Without a GPU

2026-06-29 08:00

AI Summary

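A study evaluated how small language models perform at the generation stage of retrieval-augmented generation (RAG) systems. The experiments used both open-source and proprietary datasets covering diverse subject areas and question types. The results indicate that a RAG system built on small language models can run directly on-device, without any GPU hardware, and complete inference within a reasonable time. The experimental code and supplementary materials are publicly available through a GitHub repository.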

Original abstract

While large language models have been dominating the research landscape recently, small language models remain highly relevant across various domains; yet, they receive far less attention. In this study, we investigate how smaller language models perform during the generation stage within a Retrieval-Augmented Generation (RAG) system. To benchmark these models effectively, we utilised both open-source and proprietary datasets covering diverse subject areas and question types. Our findings demonstrate that a RAG system with small language models can be executed directly on-device without requiring any GPU hardware within a reasonable time. The experimental code and links to the supplementary materials can be accessed through the GitHub repository: https://github.com/SibNN/SLM-RAG-EVAL.

HuggingFace Daily Papers (community trending papers)

Read the original · arxiv.org

Retrieval augmentation · On-device · Paper/Research