# A Many-Tier Instruction Hierarchy for LLM Agents

- Source: HuggingFace Daily Papers (trending community papers)
- Published: 2026-04-10 08:00
- AIHOT link: https://aihot.virxact.com/items/cmnzl60r2003bslwzwmqt284j
- Original link: https://arxiv.org/abs/2604.09443

## AI Summary

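LLM agents often receive conflicting instructions from multiple sources. To address this, the researchers propose the Many-Tier Instruction Hierarchy (ManyIH) paradigm. It moves beyond the traditional fixed, small number of tiers and supports resolving instruction conflicts across arbitrarily many privilege levels. The accompanying benchmark, ManyIH-Bench, contains 853 tasks that require models to handle conflicting instructions spanning up to 12 privilege tiers across 46 real-world agent scenarios. Experiments show that current frontier models reach only about 40% accuracy under complex conflicts, which points to an urgent need for fine-grained, scalable conflict-resolution methods.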

## Main Text

Large language model agents receive instructions from many sources (system messages, user prompts, tool outputs, and more), each carrying a different level of trust and authority. When these instructions conflict, models must reliably follow the highest-privilege instruction to remain safe and effective. The dominant paradigm, instruction hierarchy (IH), assumes a fixed, small set of privilege levels (typically fewer than five) defined by rigid role labels (e.g., system > user). This is inadequate for real-world agentic settings, where conflicts can arise across far more sources and contexts. In this work, we propose Many-Tier Instruction Hierarchy (ManyIH), a paradigm for resolving conflicts among instructions with arbitrarily many privilege levels. We introduce ManyIH-Bench, the first benchmark for ManyIH. ManyIH-Bench requires models to navigate up to 12 levels of conflicting instructions with varying privileges and comprises 853 agentic tasks (427 coding and 426 instruction-following). ManyIH-Bench composes constraints developed by LLMs and verified by humans to create realistic and difficult test cases spanning 46 real-world agents. Our experiments show that even current frontier models perform poorly (~40% accuracy) as instruction conflicts scale. This work underscores the urgent need for methods that explicitly target fine-grained, scalable instruction conflict resolution in agentic settings.
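To make the core idea concrete, here is a minimal Python sketch of "follow the highest-privilege instruction" when privilege is an arbitrary integer rather than a fixed role label. It is an illustrative toy and not the paper's method, since the abstract does not describe how conflicts are resolved or how ManyIH-Bench is scored. Everything in it is a hypothetical assumption: the `Instruction` type, the `resolve` function, the idea that two instructions conflict when they set the same `key` to different values, and the tie-break rule that, at equal privilege, the later instruction wins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    source: str     # hypothetical origin label, e.g. "system", "user", "tool:web_page"
    privilege: int  # higher value = more authority; any number of tiers is allowed
    key: str        # hypothetical: the behavior this instruction constrains
    value: str      # the setting it demands for that behavior


def resolve(instructions: list[Instruction]) -> dict[str, Instruction]:
    """Return the winning instruction for each key.

    Assumed conflict model: instructions with the same key compete, and the
    one with the highest privilege wins. Because ">=" is used while scanning
    in order, a later instruction beats an earlier one of equal privilege.
    """
    winners: dict[str, Instruction] = {}
    for inst in instructions:
        current = winners.get(inst.key)
        if current is None or inst.privilege >= current.privilege:
            winners[inst.key] = inst
    return winners


if __name__ == "__main__":
    # Five instructions spread over several integer tiers (not just system > user).
    stack = [
        Instruction("system", 11, "language", "English"),
        Instruction("user", 7, "language", "French"),
        Instruction("tool:web_page", 2, "language", "German"),
        Instruction("user", 7, "format", "markdown"),
        Instruction("skill:formatter", 9, "format", "plain text"),
    ]
    for key, inst in sorted(resolve(stack).items()):
        print(f"{key}: {inst.value} (from {inst.source}, tier {inst.privilege})")
```

Running the script prints `format: plain text (from skill:formatter, tier 9)` followed by `language: English (from system, tier 11)`. Using an integer instead of an enum of roles is what lets the sketch handle 12 tiers as easily as 2. Real agent instructions rarely reduce to clean key/value pairs, though, and deciding *whether* two natural-language instructions conflict is the hard part the benchmark probes.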
