# Dense Latent Communication Between Heterogeneous Agents: See What I See, Know What I Think

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-11 08:00
- AIHOT score: 36
- AIHOT link: https://aihot.virxact.com/items/cmqbalwd5012mslruftwbzdyt
- Original link: https://arxiv.org/abs/2606.13594

## AI Summary

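Multi-agent systems usually rely on text communication, whose decode and re-encode step is costly and lossy. KV-cache communication is a low-overhead alternative, but existing methods are mostly limited to homogeneous models. This paper proposes a dense alignment method that enables direct KV-cache transfer between heterogeneous agents through a lightweight cross-model cache transformation and two-phase training (reconstruction, then generation). Across the six transfer directions among the three models Qwen3-4B, 8B, and 14B, and on six benchmarks, the method matches or exceeds text communication in the context-aware setting at 2 to 3 times lower compute. It also remains effective in context-unaware transfer, where prior methods fail completely.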

## Main Text

Multi-agent systems communicate mostly through text, paying a lossy and expensive decode and re-encode cost. KV-cache communication is a promising alternative, yet most prior work is homogeneous, using duplicate copies of the same model, and avoids the central challenge of cross-model latent alignment; existing heterogeneous methods are also restrictive, typically assuming shared input and using transferred caches mainly for steering. We study a more fundamental question: can heterogeneous agents be aligned well enough to perform real "mind reading" and transfer both what one agent sees and how it thinks? Our information-structure analysis reveals a duality: context-aware transfer is driven by sparse reasoning signals, while context-unaware transfer, where the receiver sees no input, requires dense contextual knowledge preservation. Motivated by this, we propose dense alignment for heterogeneous KV-cache communication via a lightweight cross-model cache transformation and two-phase training: reconstruction followed by generation. Across all six directions of {Qwen3-4B, 8B, 14B} and six in-domain and out-of-domain benchmarks, our method outperforms prior heterogeneous baselines, matches or exceeds text communication in context-aware settings at roughly 2 to 3 times lower compute, and remains effective in context-unaware transfer where prior methods collapse.
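
To make the reconstruction phase concrete, here is a minimal sketch under strong assumptions. The abstract does not specify the form of the cache transformation, so this example assumes a single linear map `W` applied to each per-token cache vector, trained with a mean-squared-error loss to reproduce the receiver's own cache from the sender's. All function names, dimensions, and the toy data are hypothetical and are not taken from the paper. The second phase (generation) would further train the map against the receiver's generation loss and is not shown.

```python
import random

def project(W, cache):
    """Apply a (d_r x d_s) linear map to every per-token vector in a cache."""
    return [[sum(w * x for w, x in zip(row, vec)) for row in W] for vec in cache]

def mse(a, b):
    """Mean squared error between two caches of identical shape."""
    n = sum(len(v) for v in a)
    return sum((x - y) ** 2 for u, v in zip(a, b) for x, y in zip(u, v)) / n

def reconstruction_step(W, sender_cache, receiver_cache, lr):
    """One gradient-descent step on the MSE reconstruction loss."""
    pred = project(W, sender_cache)
    n = sum(len(v) for v in pred)
    new_W = [row[:] for row in W]
    for vec, p, target in zip(sender_cache, pred, receiver_cache):
        for i, (pi, ti) in enumerate(zip(p, target)):
            g = 2.0 * (pi - ti) / n  # d(loss) / d(pred[i]) for this token
            for j, xj in enumerate(vec):
                new_W[i][j] -= lr * g * xj
    return new_W

def fit(sender_cache, receiver_cache, d_s, d_r, steps=200, lr=1.0, seed=0):
    """Fit W by gradient descent; returns W and the loss before each step plus the final loss."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d_s)] for _ in range(d_r)]
    losses = [mse(project(W, sender_cache), receiver_cache)]
    for _ in range(steps):
        W = reconstruction_step(W, sender_cache, receiver_cache, lr)
        losses.append(mse(project(W, sender_cache), receiver_cache))
    return W, losses

def make_toy_caches(tokens=16, d_s=4, d_r=6, seed=1):
    """Hypothetical data: receiver vectors are an exact linear image of sender vectors."""
    rng = random.Random(seed)
    A = [[rng.uniform(-1, 1) for _ in range(d_s)] for _ in range(d_r)]
    sender = [[rng.uniform(-1, 1) for _ in range(d_s)] for _ in range(tokens)]
    receiver = project(A, sender)
    return sender, receiver

if __name__ == "__main__":
    sender, receiver = make_toy_caches()
    W, losses = fit(sender, receiver, d_s=4, d_r=6)
    print("initial loss:", losses[0])
    print("final loss:  ", losses[-1])
```

A real KV cache has separate key and value tensors per layer and per head, and heterogeneous models differ in layer count as well as width, so a practical transformation would need to handle that mapping too. The sketch only illustrates the idea of aligning one model's cache space to another's through a learned, lightweight map.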
