# Decoupling Communication from Policy: Robust Multi-Agent Reinforcement Learning under Bandwidth Constraints

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-05-20 08:00
- AIHOT score: 50
- AIHOT link: https://aihot.virxact.com/items/cmpmjlrbh0p99sl015d54dvvv
- Original paper: https://arxiv.org/abs/2605.21085

## AI Summary

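In multi-agent reinforcement learning (MARL), communication is essential for coordination but is often limited by bandwidth. Existing architectures frequently share one latent representation between communication and policy, so shrinking the message size directly limits policy capacity and degrades performance. To address this, we make two contributions. First, we introduce a normalised bandwidth budget β that unifies sparsity, rounds, and message dimension into a single comparable constraint. Second, we provide SLIM, a minimal architecture that decouples the communication pathway from the policy's latent representation, which isolates the effect of bandwidth from that of policy capacity and supports in-step communication. On several partially observable benchmarks that require communication, the method achieves state-of-the-art performance and shows scalability and robustness when bandwidth is limited, with only marginal performance degradation.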

## Main Text

Communication enables coordination in multi-agent reinforcement learning (MARL), but many real-world applications, e.g., search-and-rescue with drone swarms, operate under severe bandwidth constraints. Many communication architectures still expose a coupled bottleneck in which a shared latent representation is used for both policy execution and inter-agent communication. Consequently, reducing message size directly limits the policy's latent space, often leading to significant performance degradation. We address this with two contributions. First, we introduce β, a normalised per-agent bandwidth budget that unifies sparsity, rounds, and message dimension into a single comparable constraint. Second, we provide SLIM, a minimal architecture that decouples the communication pathway from the policy's latent representation, allowing us to isolate the effect of bandwidth from the effect of policy capacity while benefiting from in-step communication. We evaluate our method on several partially observable MARL benchmarks where communication is essential. Our approach achieves state-of-the-art performance and exhibits scalability and robustness under limited communication, with only marginal degradation as bandwidth is reduced.
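
To make the idea of a single comparable budget concrete, here is a minimal Python sketch. The abstract does not give the formula for β, so the product form below (sent fraction × rounds × message dimension, divided by the same quantity for a full-bandwidth reference) is an assumption for illustration only. The names `CommConfig`, `scalars_per_step`, and `normalised_budget` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommConfig:
    """Hypothetical per-agent communication settings (illustrative names)."""
    send_fraction: float  # sparsity: fraction of possible messages actually sent, in [0, 1]
    rounds: int           # communication rounds per environment step
    msg_dim: int          # scalars per message

    def __post_init__(self):
        if not 0.0 <= self.send_fraction <= 1.0:
            raise ValueError("send_fraction must lie in [0, 1]")
        if self.rounds < 0 or self.msg_dim < 0:
            raise ValueError("rounds and msg_dim must be non-negative")


def scalars_per_step(cfg: CommConfig) -> float:
    """Expected number of scalars one agent transmits per environment step."""
    return cfg.send_fraction * cfg.rounds * cfg.msg_dim


def normalised_budget(cfg: CommConfig, reference: CommConfig) -> float:
    """ASSUMED form of beta: traffic relative to a full-bandwidth reference."""
    ref = scalars_per_step(reference)
    if ref <= 0:
        raise ValueError("reference configuration must transmit something")
    return scalars_per_step(cfg) / ref


# Full-bandwidth reference: every message sent, 2 rounds, 32 scalars each.
reference = CommConfig(send_fraction=1.0, rounds=2, msg_dim=32)

# Two different ways of cutting traffic: sparser sending vs. narrower messages.
sparse = CommConfig(send_fraction=0.5, rounds=1, msg_dim=32)
narrow = CommConfig(send_fraction=1.0, rounds=1, msg_dim=16)

beta_sparse = normalised_budget(sparse, reference)
beta_narrow = normalised_budget(narrow, reference)
print(beta_sparse, beta_narrow)
```

Under this assumed definition, the sparse and narrow configurations land on the same budget even though they restrict different knobs, which is the kind of like-for-like comparison a unified constraint is meant to allow. In a coupled architecture, the `narrow` setting would also shrink the policy's latent space; in a decoupled design, `msg_dim` can change without touching policy capacity.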