# TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation

- Source: HuggingFace Daily Papers (trending community papers)
- Published: 2026-05-21 08:00
- AIHOT score: 62
- AIHOT link: https://aihot.virxact.com/items/cmpgn9hza0gy2sljwd0ceo8x2
- Original paper: https://arxiv.org/abs/2605.22355

## AI Summary

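TransitLM is the first public transit route planning dataset that supports bypassing the dependency on maps. It contains more than 13 million records from four Chinese cities and serves both as a large-scale corpus for continual pre-training and as the basis for three complementary evaluation tasks. Experiments show that a large language model trained on this dataset generates structurally valid routes with high accuracy and implicitly matches GPS coordinates to stations without an explicit map. This demonstrates that transit route planning can be learned entirely from data, enabling an end-to-end, map-free mode that generates routes directly from origin and destination information. The dataset and the benchmark code are open source.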
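
A conventional map-based pipeline would typically resolve a raw GPS origin or destination to candidate stations with an explicit nearest-neighbour lookup, which is the step the summary says the trained model performs implicitly. The sketch below shows such an explicit lookup for contrast only. It is not part of TransitLM, and the `Station` type, the haversine distance, the 1 km search radius, and `k=3` are all illustrative assumptions.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (spherical model)


@dataclass(frozen=True)
class Station:
    """Hypothetical station record: a name plus WGS-84 coordinates in degrees."""
    name: str
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points, treating Earth as a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_stations(lat, lon, stations, k=3, max_dist_m=1000.0):
    """Return up to k (distance_m, station) pairs within max_dist_m, closest first.

    Brute force over every station; the radius and k are illustrative choices.
    """
    scored = [(haversine_m(lat, lon, s.lat, s.lon), s) for s in stations]
    in_range = [pair for pair in scored if pair[0] <= max_dist_m]
    in_range.sort(key=lambda pair: pair[0])  # sort by distance only
    return in_range[:k]
```

The linear scan keeps the example short; with the roughly 120,000 stations mentioned in the abstract, an explicit system would more likely use a spatial index such as a k-d tree or geohash buckets. The point of the contrast is that a map-free model needs no such lookup table at inference time.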

## Main Text

Public transit route planning traditionally depends on structured map infrastructure and complex routing engines, and no existing dataset supports training models to bypass this dependency. We present TransitLM, a large-scale dataset of over 13 million transit route planning records from four Chinese cities covering 120,845 stations and 13,666 lines, released as a continual pre-training corpus and benchmark data for three evaluation tasks with complementary metrics. Experiments show that an LLM trained on TransitLM produces structurally valid routes at high accuracy and implicitly grounds arbitrary GPS coordinates to appropriate stations without any explicit mapping. These results demonstrate that transit route planning can be learned entirely from data, enabling end-to-end, map-free route generation directly from origin-destination information. The dataset and benchmark are available at https://huggingface.co/datasets/GD-ML/TransitLM, with evaluation code at https://github.com/HotTricker/TransitLM.
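
The abstract reports that generated routes are "structurally valid" but does not define that check here. One plausible reading, sketched below purely as an assumption, is that every leg rides an existing line between two stops that line serves, in travel order, and that consecutive legs share a transfer station. The `Leg` record, the `lines` mapping (one entry per travel direction), and `validity_rate` are hypothetical illustrations, not TransitLM's schema or its evaluation code, which is in the linked GitHub repository.

```python
from typing import NamedTuple


class Leg(NamedTuple):
    """Hypothetical route leg: ride `line` from `board` to `alight`."""
    line: str    # line identifier, assumed to denote a single travel direction
    board: str   # station where the rider boards
    alight: str  # station where the rider gets off


def route_errors(legs: list[Leg], lines: dict[str, list[str]]) -> list[str]:
    """Return human-readable structural problems; an empty list means the route passes.

    `lines` maps each line ID to its ordered stop sequence. Walking transfers
    between different stations are not modelled in this sketch.
    """
    if not legs:
        return ["route has no legs"]
    errors = []
    for i, leg in enumerate(legs):
        stops = lines.get(leg.line)
        if stops is None:
            errors.append(f"leg {i}: unknown line {leg.line!r}")
            continue
        if leg.board not in stops or leg.alight not in stops:
            errors.append(f"leg {i}: station not served by line {leg.line!r}")
            continue
        # list.index finds the first occurrence, so loop lines are simplified here.
        if stops.index(leg.board) >= stops.index(leg.alight):
            errors.append(f"leg {i}: alight stop is not downstream of board stop")
    # Consecutive legs must meet at the same transfer station.
    for i in range(len(legs) - 1):
        if legs[i].alight != legs[i + 1].board:
            errors.append(f"legs {i}->{i + 1}: transfer stations do not match")
    return errors


def validity_rate(routes: list[list[Leg]], lines: dict[str, list[str]]) -> float:
    """Hypothetical metric: fraction of generated routes with no structural errors."""
    if not routes:
        return 0.0
    return sum(1 for r in routes if not route_errors(r, lines)) / len(routes)
```

For example, with `lines = {"L1": ["A", "B", "C", "D"], "L2": ["C", "E", "F"]}`, the route `[Leg("L1", "A", "C"), Leg("L2", "C", "F")]` returns `[]`, while `[Leg("L1", "C", "A")]` is rejected because it rides against the line's stop order. A check like this only tests feasibility; whether a route is also a good answer for the origin-destination pair is a separate question that the benchmark's other metrics would need to address.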
