# Compact Language Models Enable On-Device Inference in RAG Systems Without a GPU

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-29 08:00
- AIHOT score: 51
- AIHOT link: https://aihot.virxact.com/items/cmr1p21wt00qasl8zdg1m88u2
- Original link: https://arxiv.org/abs/2606.30062

## AI Summary

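A study evaluates how small language models perform at the generation stage of Retrieval-Augmented Generation (RAG) systems. The experiments use both open-source and proprietary datasets covering a range of subject areas and question types. The results indicate that a RAG system equipped with a small language model can run directly on-device, without any GPU hardware, and complete inference within a reasonable time. The experimental code and supplementary materials are available through a GitHub repository.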

## Main Text

While large language models have dominated the recent research landscape, small language models remain highly relevant across many domains, yet they receive far less attention. In this study, we investigate how smaller language models perform during the generation stage of a Retrieval-Augmented Generation (RAG) system. To benchmark these models, we utilised both open-source and proprietary datasets covering diverse subject areas and question types. Our findings demonstrate that a RAG system built on small language models can run directly on-device, without any GPU hardware, and complete inference within a reasonable time. The experimental code and links to the supplementary materials can be accessed through the GitHub repository: https://github.com/SibNN/SLM-RAG-EVAL.
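To make the "generation stage" concrete, the following is a minimal, dependency-free sketch of a RAG loop that runs on a CPU. It is not the authors' code and does not reflect the contents of their repository. Every name in it (`retrieve`, `build_prompt`, `answer`, `echo_generator`) is hypothetical, the bag-of-words retriever is a stand-in for whatever retriever a real system would use, and `echo_generator` is a placeholder where a small language model would be called.

```python
import math
import re
import time
from collections import Counter
from typing import Callable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, passages: List[str], k: int = 2) -> List[str]:
    """Return the k passages most similar to the question (toy retriever)."""
    q = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, context: List[str]) -> str:
    """Assemble the retrieved passages and the question into one prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context.\n"
        f"Context:\n{joined}\n"
        f"Question: {question}\n"
        "Answer:"
    )


def answer(
    question: str,
    passages: List[str],
    generate: Callable[[str], str],
    k: int = 2,
) -> Tuple[str, float]:
    """Run retrieval, then time only the generation call (in seconds)."""
    prompt = build_prompt(question, retrieve(question, passages, k))
    start = time.perf_counter()
    text = generate(prompt)
    return text, time.perf_counter() - start


def echo_generator(prompt: str) -> str:
    """Placeholder for a small language model: returns the top context line."""
    context_lines = [ln for ln in prompt.splitlines() if ln.startswith("- ")]
    return context_lines[0][2:] if context_lines else ""


if __name__ == "__main__":
    docs = [
        "Paris is the capital of France.",
        "The mitochondrion is the powerhouse of the cell.",
        "Rust is a systems programming language.",
    ]
    reply, seconds = answer("What is the capital of France?", docs, echo_generator, k=1)
    print(reply, f"({seconds:.6f}s)")
```

Passing the generator in as a plain callable keeps the retrieval and timing logic independent of any particular model runtime, so a CPU-only small language model could be swapped in for `echo_generator` and its latency measured the same way.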