# SkillHarm：通过自动化构造实现生命周期感知的技能投毒攻击基准

- 来源：HuggingFace Daily Papers（社区热门论文）
- 发布时间：2026-06-01 08:00
- AIHOT 分数：52
- AIHOT 链接：https://aihot.virxact.com/items/cmq8b9pvl00u7slld84m27xwj
- 原文链接：https://arxiv.org/abs/2606.02540

## AI 摘要

SkillHarm是一个覆盖AI智能体技能使用生命周期的攻击基准，配以系统化风险分类。它定义两种攻击场景：固定载荷投毒（FPP）和自我变异投毒（SMP），并基于受害工作流组件（数据管道、系统环境、自主性）划分12种风险类型。AutoSkillHarm管道由自然语言驱动编码智能体，生成71个技能、879个攻击样本。实验显示FPP成功率最高86.3%，SMP最高69.3%，许多表面失败实因智能体未触及恶意文件而非真正抵抗。

## 正文

Agent skills occupy a privileged position in the agent workflow, as agents are expected to implicitly follow and execute them, rendering third-party skills a vulnerable attack surface. Existing studies have revealed unsafe agent behaviors induced by skill-based attacks, but they primarily evaluate poisoned skills within a single task execution and enumerate harms through ad-hoc risk lists. To bridge these gaps, we introduce SkillHarm, a benchmark of skill-based attacks across the skill-use lifecycle, paired with a systematic taxonomy of skill-relevant risks. SkillHarm evaluates two attack scenarios: Fixed-Payload Poisoning (FPP), where a fixed poisoned skill package directly compromises any task session that invokes it, and Self-Mutating Poisoning (SMP), where an initially benign execution silently mutates persistent skill content, deferring harm until a subsequent reuse. It further defines 12 risk types based on the agent workflow component targeted by the harm: data pipelines, system environments, and agent autonomy. To instantiate these attacks at scale, we build AutoSkillHarm, an automated construction pipeline with coding agents driven by natural-language harnesses. The resulting benchmark contains 879 attack samples across 71 skills. Experiments show that current agents remain vulnerable with attack success rates up to 86.3% in FPP and 69.3% in SMP. Our analysis further reveals a latent risk: many apparent attack failures stem from the agent failing to engage with the poisoned file rather than genuine resistance, and current defenses still fail to reliably mitigate the threat.
