# CogniRoute: A Schema-Guided MoE Framework for Omni-Modal Social Reasoning

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-18 08:00
- AIHOT score: 50
- AIHOT link: https://aihot.virxact.com/items/cmqzq994b005cslki4spxekrj
- Original paper: https://arxiv.org/abs/2606.20970

## AI Summary

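CogniRoute is a schema-guided Mixture-of-Experts (MoE) framework built for omni-modal social reasoning. During training it uses a cognitive schema to factorize each example by cross-modal relation, reasoning demand, and temporal scope, and aligns global routing signatures with that structure during supervised fine-tuning. It also introduces route-aware reinforcement learning, which jointly optimizes token generation and expert allocation. On OmniSocialBench (a diagnostic social video QA dataset with 118K structured training examples), CogniRoute reaches 59.38% average accuracy, 15.33 percentage points above the strongest proprietary baseline and 26.77 percentage points above the strongest open-source omni-modal baseline, with the largest gains on audio-visual coordination, conflict resolution, and temporally grounded social reasoning.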
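
The reported gaps imply the two baselines' average accuracies by simple subtraction. The snippet below is only that arithmetic: the variable names are illustrative, and the baseline figures it produces are derived from the numbers above rather than quoted from the paper, so they may differ slightly from the paper's own tables because of rounding.

```python
# Figures stated in the summary (percent and percentage points).
cogniroute_acc = 59.38
gap_proprietary = 15.33
gap_open_source = 26.77

# Implied baseline accuracies: derived by subtraction, not quoted from the paper.
implied_proprietary = round(cogniroute_acc - gap_proprietary, 2)
implied_open_source = round(cogniroute_acc - gap_open_source, 2)

print(f"implied strongest proprietary baseline: {implied_proprietary:.2f}%")
print(f"implied strongest open-source omni baseline: {implied_open_source:.2f}%")
```

Under these figures, the strongest proprietary baseline sits at roughly 44.05% and the strongest open-source omni-modal baseline at roughly 32.61%.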

## Main Text

Omni-modal models can ingest video, audio, and text, but unified access to multiple modalities does not guarantee that a model uses the right evidence. This gap is especially pronounced in social video question answering, where the answer may hinge on a gesture, vocal tone, temporal cue, or mismatch between what is said and what is visually expressed. We introduce CogniRoute, a schema-guided Mixture-of-Experts framework for social omni reasoning. CogniRoute uses a training-only cognitive schema that factorizes each example by cross-modal relation, reasoning demand, and temporal scope, and aligns global routing signatures with this structure during supervised fine-tuning. We further introduce route-aware reinforcement learning, which jointly optimizes token generation and expert allocation using rewards for answer correctness, modality-consistent reasoning, and cognitive temporal grounding. To support training and evaluation, we construct OmniSocialBench, a diagnostic social video QA resource with 118K structured training examples, grounded reasoning traces, schema labels, temporal evidence spans, and a manually verified evaluation split. CogniRoute achieves 59.38% average accuracy on OmniSocialBench, improving over the strongest proprietary baseline by 15.33 percentage points and the strongest open-source omni baseline by 26.77 percentage points, with the largest gains on questions requiring audio-visual coordination, conflict resolution, and temporally grounded social inference.
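
As a rough illustration of how the three reward signals named in the abstract (answer correctness, modality-consistent reasoning, temporal grounding) could be folded into one scalar, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `temporal_iou` overlap measure, the `composite_reward` function, its boolean inputs, and the default weights are hypothetical and are not taken from the paper, which may define and combine its rewards quite differently.

```python
from typing import Tuple

Span = Tuple[float, float]  # (start_sec, end_sec), assumed to satisfy start <= end


def temporal_iou(pred: Span, gold: Span) -> float:
    """Intersection-over-union of two time spans; 0.0 when they do not overlap."""
    inter = max(0.0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    union = (pred[1] - pred[0]) + (gold[1] - gold[0]) - inter
    return inter / union if union > 0 else 0.0


def composite_reward(
    answer_correct: bool,
    modality_consistent: bool,
    pred_span: Span,
    gold_span: Span,
    weights: Tuple[float, float, float] = (1.0, 0.5, 0.5),  # hypothetical weights
) -> float:
    """Weighted sum of three illustrative reward terms (an assumption, not the paper's formula)."""
    w_answer, w_modality, w_temporal = weights
    return (
        w_answer * float(answer_correct)
        + w_modality * float(modality_consistent)
        + w_temporal * temporal_iou(pred_span, gold_span)
    )


# Example: correct answer, reasoning cites the wrong modality,
# and the predicted evidence span only partly overlaps the labelled one.
reward = composite_reward(True, False, pred_span=(2.0, 6.0), gold_span=(4.0, 8.0))
print(f"reward = {reward:.3f}")
```

A weighted sum is the simplest possible choice here. In this sketch, span overlap stands in for temporal grounding because the dataset is described as carrying temporal evidence spans; how the paper actually scores grounding is not stated in the abstract.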
