Qwen-RobotNav Technical Report: A Scalable Navigation Model Designed for Agentic Navigation Systems
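Qwen-RobotNav exposes a parameterised interface that supports multiple task modes (e.g., instruction following, object search) and controllable observation parameters (token budget, per-camera weights). All parameters are randomised during training, so no change to the backbone architecture is needed at inference. The model is trained on 15.6M samples, and co-training with vision-language data avoids the collapse that trajectory-only training causes. An upper-level planner can dynamically switch the model's task mode and context strategy mid-episode, composing complex behaviours from repeated calls to the same model. Qwen-RobotNav achieves new state-of-the-art results on multiple navigation benchmarks and scales favourably from 2B to 8B parameters. Joint multi-task training produces a shared spatial-planning substrate across task families, and the model shows strong zero-shot generalisation on real robots.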
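To make the parameterised interface concrete, the Python sketch below models one call's configuration as a task mode plus observation parameters. It splits a token budget across cameras by weight and draws a random configuration per training sample. Everything here is hypothetical: `TaskMode`, `NavConfig`, `allocate_tokens`, `sample_training_config`, the proportional largest-remainder allocation rule, and the sampling ranges are illustrative assumptions, not the report's implementation.

```python
import random
from dataclasses import dataclass
from enum import Enum


class TaskMode(Enum):
    # Task families named in the abstract; this enum itself is hypothetical.
    INSTRUCTION_FOLLOWING = "instruction_following"
    OBJECT_SEARCH = "object_search"
    TARGET_TRACKING = "target_tracking"
    AUTONOMOUS_DRIVING = "autonomous_driving"


@dataclass
class NavConfig:
    """Hypothetical per-call configuration: a task mode plus observation parameters."""
    task_mode: TaskMode
    token_budget: int                 # total visual tokens for encoding history
    camera_weights: dict[str, float]  # non-negative relative share per camera


def allocate_tokens(cfg: NavConfig) -> dict[str, int]:
    """Split cfg.token_budget across cameras in proportion to their weights.

    Largest-remainder rounding keeps the counts summing exactly to the budget
    (an assumed allocation rule, not taken from the report).
    """
    total = sum(cfg.camera_weights.values())
    if total <= 0:
        raise ValueError("camera weights must sum to a positive value")
    exact = {cam: cfg.token_budget * w / total for cam, w in cfg.camera_weights.items()}
    alloc = {cam: int(x) for cam, x in exact.items()}  # floor of each share
    leftover = cfg.token_budget - sum(alloc.values())
    # Give the remaining tokens to the cameras with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda cam: exact[cam] - alloc[cam], reverse=True)
    for cam in by_fraction[:leftover]:
        alloc[cam] += 1
    return alloc


def sample_training_config(rng: random.Random, cameras: list[str]) -> NavConfig:
    """Draw one random configuration per training sample.

    The ranges are illustrative placeholders, not values from the report.
    """
    return NavConfig(
        task_mode=rng.choice(list(TaskMode)),
        token_budget=rng.randint(256, 2048),
        camera_weights={cam: rng.uniform(0.1, 1.0) for cam in cameras},
    )


cfg = NavConfig(TaskMode.OBJECT_SEARCH, token_budget=1000,
                camera_weights={"front": 2.0, "left": 1.0, "right": 1.0})
print(allocate_tokens(cfg))  # {'front': 500, 'left': 250, 'right': 250}
```

The idea the summary describes is that exposure to randomised configurations during training lets one set of weights accept whatever budget or camera weighting a caller picks at inference. The largest-remainder rule is only one simple way to keep per-camera counts summing exactly to the budget.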
Agentic navigation systems require a base navigation model whose observation strategy can be reconfigured externally at inference time: instruction following, object search, target tracking, and autonomous driving share the same perception-planning backbone, yet demand fundamentally different strategies for consuming the visual stream. We present Qwen-RobotNav, a scalable navigation model that meets this requirement through a parameterised interface with two complementary dimensions: multiple task modes that select the navigation behaviour, and controllable observation parameters (e.g., token budget, per-camera weights) that govern how visual history is encoded. Because all parameters are randomised during training, Qwen-RobotNav is robust to any inference-time configuration, with zero architectural modification to its backbone. We train Qwen-RobotNav on 15.6M samples; co-training with vision-language data prevents the collapse into reactive action-sequence mappers observed under trajectory-only training. The parameterised interface also makes Qwen-RobotNav a natural building block for agentic systems: in long-horizon scenarios, an upper-level planner decomposes goals into sub-tasks and dynamically switches Qwen-RobotNav's task mode and context strategy mid-episode, composing complex behaviours from repeated calls to the same model. Extensive experiments show that Qwen-RobotNav sets new state-of-the-art results across major navigation benchmarks. The model scales favourably from 2B to 8B parameters, joint multi-task training develops a shared spatial-planning substrate that transfers across task families, and the model shows strong zero-shot generalisation to real-world robots across diverse environments.
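The planner-driven composition described above can be sketched as a loop that walks through sub-tasks and re-invokes the same model with a different configuration for each one. This is a minimal illustration under stated assumptions. `CallConfig`, `SubTask`, `run_episode`, and `toy_model` are hypothetical names. The goal decomposition is hard-coded rather than produced by an upper-level planner, the context strategy is reduced to a single token budget, and a stub stands in for the navigation model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CallConfig:
    """Hypothetical settings the planner may change between sub-tasks."""
    task_mode: str     # e.g. "object_search" or "instruction_following"
    token_budget: int  # stands in for the context strategy (an assumption)


@dataclass(frozen=True)
class SubTask:
    description: str
    config: CallConfig
    max_steps: int


# The navigation model as a callable: (sub-task text, config, step) -> (action, done).
NavModel = Callable[[str, CallConfig, int], tuple[str, bool]]


def run_episode(subtasks: list[SubTask], model: NavModel) -> list[tuple[str, str]]:
    """Run sub-tasks in order, re-configuring the same model for each one.

    Returns a log of (task_mode, action) pairs.
    """
    log: list[tuple[str, str]] = []
    for sub in subtasks:
        for step in range(sub.max_steps):
            action, done = model(sub.description, sub.config, step)
            log.append((sub.config.task_mode, action))
            if done:
                break  # control returns to the planner for the next sub-task
    return log


def toy_model(text: str, cfg: CallConfig, step: int) -> tuple[str, bool]:
    """Stub for the real model: one step forward, then report the sub-task done."""
    return ("stop", True) if step >= 1 else ("forward", False)


# Hard-coded decomposition; in the described system an upper-level planner produces it.
plan = [
    SubTask("find the kitchen door", CallConfig("object_search", 1024), max_steps=5),
    SubTask("go through the door and turn left",
            CallConfig("instruction_following", 512), max_steps=5),
]
log = run_episode(plan, toy_model)
print(log)
# [('object_search', 'forward'), ('object_search', 'stop'),
#  ('instruction_following', 'forward'), ('instruction_following', 'stop')]
```

Keeping the model behind a single callable signature is the point of the sketch: between calls the planner changes only the arguments, never the model itself.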