# Send a SCOUT First: Ex-Ante Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-05-29 08:00
- AIHOT score: 45
- AIHOT link: https://aihot.virxact.com/items/cmq76imgj008xsl5w75vurmtu
- Original link: https://arxiv.org/abs/2605.30837

## AI Summary

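The SCOUT framework predicts each detector's per-sample reliability and latency, and uses those predictions to decide dynamically, for each request, which detectors to run and whether to escalate to a GPT-4o judge, avoiding the blind spots of a fixed single-detector pipeline. On the SCOUT-450 benchmark, a safety-oriented operating point reduces attack-success rate by 46% and total time by 40% relative to an always-on GPT-4o judge, while benign utility drops by only 5.1 points. The framework also transfers to three external benchmarks (BIPIA, IPI, and IHEval), improving the safety-utility frontier.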

## Main Text

Prompt-injection detectors are heterogeneous: each is strong on a different slice of attacks, and none is always reliable. Yet existing systems still treat detection as a fixed single-detector pipeline, committing every request to one detector's blind spots. We reframe defense as detector allocation: given a heterogeneous pool, decide per request which detectors to run and whether to escalate to an LLM judge. Our framework SCOUT (Scalable and Controllable Outcome-prediction for Uncertainty-aware Triage) makes this decision dynamic by predicting each detector's per-sample reliability and latency from how it behaved on similar past inputs, and exposes a single safety-utility threshold to the operator (where utility bundles benign-pass rate and wall-clock). To evaluate this setting, we build SCOUT-450, a benchmark that captures the structurally complex, agent-facing injections that older prompt-injection sets under-represent. On SCOUT-450, a safety-oriented operating point reduces attack-success rate by 46% and total wall-clock by 40% relative to an always-on GPT-4o judge, at a 5.1-point benign-utility drop. SCOUT also transfers to three external benchmarks (BIPIA, IPI, and IHEval), improving the safety-utility frontier.
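The allocation idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify how reliability and latency are predicted, so a plain k-nearest-neighbour average over past runs is swapped in here as an assumption. All names (`PastRun`, `predict`, `allocate`, `HISTORIES`), the two-dimensional feature vectors, the detector names, and the reading of the threshold as a minimum predicted reliability are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class PastRun:
    features: Sequence[float]  # hypothetical feature vector of a past input
    correct: bool              # whether the detector's verdict was right
    latency_s: float           # observed wall-clock time in seconds


def predict(history: List[PastRun], x: Sequence[float], k: int = 3) -> Tuple[float, float]:
    """Estimate (reliability, latency) on x from the k most similar past inputs.

    Assumes history is non-empty.
    """
    nearest = sorted(history, key=lambda r: math.dist(r.features, x))[:k]
    reliability = sum(r.correct for r in nearest) / len(nearest)
    latency = sum(r.latency_s for r in nearest) / len(nearest)
    return reliability, latency


def allocate(histories: Dict[str, List[PastRun]], x: Sequence[float],
             threshold: float, k: int = 3) -> Tuple[List[str], bool]:
    """Return (detectors to run, whether to escalate to the LLM judge)."""
    preds = {name: predict(h, x, k) for name, h in histories.items()}
    # Among detectors predicted reliable enough, pick the fastest one.
    trusted = sorted((lat, name) for name, (rel, lat) in preds.items()
                     if rel >= threshold)
    if trusted:
        return [trusted[0][1]], False
    # No detector is trusted on this input: run the best one and escalate.
    best = max(preds, key=lambda name: preds[name][0])
    return [best], True


def _cluster(center: float, outcomes: List[bool], latency_s: float) -> List[PastRun]:
    """Build three synthetic past runs around (center, center)."""
    offsets = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
    return [PastRun((center + dx, center + dy), ok, latency_s)
            for (dx, dy), ok in zip(offsets, outcomes)]


# Synthetic history: "regex" is fast but only reliable near (0, 0);
# "classifier" is slower, reliable near (0, 0) and (1, 1), weak near (2, 2).
HISTORIES = {
    "regex": _cluster(0.0, [True] * 3, 0.01)
             + _cluster(1.0, [False] * 3, 0.01)
             + _cluster(2.0, [False] * 3, 0.01),
    "classifier": _cluster(0.0, [True] * 3, 0.2)
                  + _cluster(1.0, [True] * 3, 0.2)
                  + _cluster(2.0, [True, False, False], 0.2),
}

for query in [(0.05, 0.05), (1.05, 1.05), (2.05, 2.05)]:
    print(query, allocate(HISTORIES, query, threshold=0.8))
```

In this sketch, raising `threshold` sends more requests to the judge (safer, slower), and lowering it lets cheap detectors handle more traffic, which mirrors the single operator-facing knob the abstract describes.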
