# 从提示词注入到持久控制：防御智能体框架中的木马后门

- 来源：HuggingFace Daily Papers（社区热门论文）
- 发布时间：2026-05-29 08:00
- AIHOT 分数：49
- AIHOT 链接：https://aihot.virxact.com/items/cmpulkg6204vbslagdwa2mn3h
- 原文链接：https://arxiv.org/abs/2605.31042

## AI 摘要

在本地智能体框架中，LLM智能体通过读写文件与复用状态增强了能力，但也面临多步木马攻击风险。攻击者可在文件或工具输出中嵌入提示词注入，智能体可能读取并执行这些隐藏指令。现有防御因检查步骤孤立，难以检测早期植入的后门。ClawTrojan基准测试在GPT-5.4模拟环境中实现了95.5%的攻击成功率。为此提出的DASGuard方案，通过扫描敏感文件中的控制文本、追溯其来源并移除非可信内容，实现了动态防御。

## 正文

LLM agents are evolving from conversational chatbots to operational tools in real-world workspaces. In local agentic harnesses, an LLM can read and write files, call tools, and reuse workspace state across sessions. While such capabilities enhance utility, they also expose a new attack surface for attackers. Attackers can embed a prompt injection within a file or tool output. Agents may read this hidden instruction, store it, and execute it later. In this multi-step trojan attack paradigm, no individual step appears malicious on its own, but these steps can collectively turn untrusted text into persistent control content. However, existing defenses often inspect each step in isolation. As a result, they can block a clear harmful action, but fail to detect the earlier write operation that plants the backdoor. To reveal this threat, we introduce ClawTrojan, a benchmark designed to identify multi-step trojan attacks in local agentic harnesses. In an OpenClaw-style simulated workspace with GPT-5.4, ClawTrojan reaches a 95.5% attack success rate (ASR), while existing single-turn prompt-injection attacks produce near-zero ASR on the same model. To address this threat, we propose DASGuard, which scans control-like text in sensitive local files, traces its origin, and removes control content that does not originate from a trusted source. Our results show that DASGuard achieves strong dynamic defense by combining runtime attack blocking with sanitized commits to the workspace.
