# Qwen-RobotNav Technical Report: A Scalable Navigation Model Designed for Agentic Navigation Systems

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-18 08:00
- AIHOT score: 43
- AIHOT link: https://aihot.virxact.com/items/cmqzlw9sf006tsl4ergf7sap2
- Original link: https://arxiv.org/abs/2606.18112

## AI Summary

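Qwen-RobotNav exposes a parameterised interface that supports multiple task modes (such as instruction following and object search) and controllable observation parameters (token budget, per-camera weights). All parameters are randomised during training, so no change to the backbone architecture is needed at inference time. The model is trained on 15.6M samples, and co-training with vision-language data avoids the collapse seen with trajectory-only training. An upper-level planner can switch the model's task mode and context strategy dynamically within an episode, composing complex behaviours from repeated calls to the same model. Qwen-RobotNav sets new state-of-the-art results on multiple navigation benchmarks, scales favourably from 2B to 8B parameters, develops through joint multi-task training a spatial-planning substrate shared across task families, and shows strong zero-shot generalisation on real robots.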

## Main Text

Agentic navigation systems require a base navigation model whose observation strategy can be externally reconfigured at inference time, because instruction following, object search, target tracking, and autonomous driving share the same perception-planning backbone yet demand fundamentally different strategies for consuming the visual stream. We present Qwen-RobotNav, a scalable navigation model that meets this requirement through a parameterised interface with two complementary dimensions: multiple task modes that select the navigation behaviour, and controllable observation parameters (e.g., token budget, per-camera weights) that govern how visual history is encoded. Because all parameters are randomised during training, Qwen-RobotNav is robust to any inference-time configuration and requires no architectural modification to its backbone. We train Qwen-RobotNav on 15.6M samples; co-training with vision-language data prevents the collapse into reactive action-sequence mappers observed in trajectory-only training. The parameterised interface also makes Qwen-RobotNav a natural building block for agentic systems: for long-horizon scenarios, an upper-level planner decomposes goals into sub-tasks and dynamically switches Qwen-RobotNav's task mode and context strategy mid-episode, composing complex behaviours from repeated calls to the same model. Extensive experiments show that Qwen-RobotNav sets new state-of-the-art results across major navigation benchmarks. The model exhibits favourable scaling from 2B to 8B parameters, with joint multi-task training developing a shared spatial-planning substrate that transfers across task families, and demonstrates strong zero-shot generalisation to real-world robots across diverse environments.
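
To make the parameterised interface more concrete, the following is a minimal Python sketch of how such a configuration might look. It is not the authors' implementation. The names `TaskMode`, `NavConfig`, `sample_config`, `tokens_per_camera`, and `run_episode`, the candidate token budgets, and the proportional split of the budget across cameras are all assumptions made for illustration; the abstract does not specify how the parameters are represented or consumed.

```python
import random
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Sequence, Tuple


class TaskMode(Enum):
    # The four task families named in the abstract; the enum itself is hypothetical.
    INSTRUCTION_FOLLOWING = "instruction_following"
    OBJECT_SEARCH = "object_search"
    TARGET_TRACKING = "target_tracking"
    AUTONOMOUS_DRIVING = "autonomous_driving"


@dataclass(frozen=True)
class NavConfig:
    # Hypothetical container for the two interface dimensions.
    task_mode: TaskMode
    token_budget: int                   # total visual tokens allowed for history
    camera_weights: Tuple[float, ...]   # one weight per camera, summing to 1


def sample_config(rng: random.Random, num_cameras: int = 4,
                  budgets: Sequence[int] = (256, 512, 1024, 2048)) -> NavConfig:
    """Draw a random configuration, as training-time randomisation might do.

    The budget values are illustrative placeholders, not figures from the paper.
    """
    raw = [rng.random() + 1e-6 for _ in range(num_cameras)]
    total = sum(raw)
    return NavConfig(
        task_mode=rng.choice(list(TaskMode)),
        token_budget=rng.choice(list(budgets)),
        camera_weights=tuple(w / total for w in raw),
    )


def tokens_per_camera(config: NavConfig) -> List[int]:
    """Split the token budget across cameras in proportion to their weights.

    This proportional rule is an assumption; leftover tokens from rounding
    down go to the highest-weighted camera so the total matches the budget.
    """
    alloc = [int(config.token_budget * w) for w in config.camera_weights]
    best = max(range(len(alloc)), key=lambda i: config.camera_weights[i])
    alloc[best] += config.token_budget - sum(alloc)
    return alloc


def run_episode(model: Callable[[str, NavConfig], str],
                plan: Sequence[Tuple[str, NavConfig]]) -> List[str]:
    """Call the same model once per sub-task, switching config between calls."""
    return [model(subtask, cfg) for subtask, cfg in plan]
```

Under these assumptions, an upper-level planner would only need to emit a sequence of `(subtask, NavConfig)` pairs; the model callable stays the same across the episode, which mirrors the abstract's point that complex behaviours are composed from repeated calls to a single model.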
