# ClawHub Security Signals: Disagreement Among VirusTotal, Static Analysis, and SkillSpector

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-01 07:20
- AIHOT score: 54
- AIHOT link: https://aihot.virxact.com/items/cmpxn3gho05icslckqlkyjeo4
- Original link: https://arxiv.org/abs/2606.01494

## AI Summary

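The ClawHub Security Signals dataset contains 67,453 public OpenClaw agent skill versions and is used to study detection disagreement among three security scanners: VirusTotal, static heuristic analysis, and NVIDIA SkillSpector. The study finds that the three rarely flag the same skills: any pair overlaps on at most 10.4% of their combined positives, only 0.69% of skills are flagged by all three, and 81.9% of flagged skills are identified by a single scanner. NVIDIA SkillSpector is positive for 75.3% of the 25,504 suspicious rows, while VirusTotal is positive for 72.8% of the 206 malicious rows. The results indicate that agent-skill security requires layered governance rather than single-scanner allow/block decisions. The dataset is released as a silver-standard version whose labels are automated verdicts.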

## Full Text

Agent skills extend AI agents with reusable instructions, tools, scripts, references, and workflows, establishing a security boundary distinct from both model safety and traditional package-malware detection. ClawHub Security Signals is a sanitized dataset of 67,453 latest public OpenClaw skill versions. Each row pairs redacted SKILL.md content and sanitized bundled files where present with a final ClawScan registry verdict and evidence from three scanner families: VirusTotal, static heuristic analysis, and NVIDIA SkillSpector. Rather than estimating malicious-skill prevalence, we study scanner disagreement. The three scanners rarely flag the same skills: any pair overlaps on at most 10.4% of their combined positives, only 0.69% of skills are flagged by all three, and 81.9% of flagged skills are identified by a single scanner. The disagreement is structured by attack surface. SkillSpector, which raises semantic agentic-risk advisories rather than malware-reputation signals, is positive for 19,209 of 25,504 suspicious rows (75.3%) but only 14 of 206 malicious rows (6.8%). The malicious-verdict region shows the inverse profile: 150 of 206 malicious rows (72.8%) are VirusTotal-positive, consistent with bundled-code malware evidence. These results show that agent-skill security requires layered governance, not single-scanner allow/block decisions. The corpus is released as a sanitized silver-standard dataset: labels are the registry's automated verdicts, not human-annotated ground truth, and the release represents an early, versioned snapshot intended to support the community while a human-annotated subset is developed. Further research is encouraged, including models tailored for skill-security triage.
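
The disagreement statistics quoted above (pairwise overlap over combined positives, the share of skills flagged by all three scanners, and the share of flagged skills seen by only one scanner) can be illustrated with a minimal sketch. It assumes that "overlap on combined positives" means intersection over union of two scanners' positive sets, which is one reading of the abstract rather than the paper's confirmed definition. The toy rows and the scanner keys `virustotal`, `static`, and `skillspector` are hypothetical and do not reflect the dataset's actual schema or values.

```python
from itertools import combinations

# Hypothetical toy rows: skill id -> set of scanners that flagged it.
# These ids and scanner keys are illustrative, not the dataset's real columns.
flags = {
    "skill-a": {"virustotal"},
    "skill-b": {"skillspector"},
    "skill-c": {"skillspector", "static"},
    "skill-d": {"virustotal", "static", "skillspector"},
    "skill-e": set(),
    "skill-f": {"static"},
    "skill-g": {"skillspector"},
    "skill-h": set(),
}
SCANNERS = ("virustotal", "static", "skillspector")


def positives(scanner):
    """Return the set of skill ids that the given scanner flagged."""
    return {skill for skill, hits in flags.items() if scanner in hits}


def pairwise_overlap(a, b):
    """Assumed definition: |A and B| / |A or B| over the two positive sets."""
    pa, pb = positives(a), positives(b)
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0


def agreement_summary():
    """Share of all skills flagged by every scanner, and share of flagged
    skills that exactly one scanner identified."""
    flagged = [hits for hits in flags.values() if hits]
    all_three = sum(1 for hits in flagged if len(hits) == len(SCANNERS))
    single = sum(1 for hits in flagged if len(hits) == 1)
    return {
        "all_three_share_of_skills": all_three / len(flags),
        "single_scanner_share_of_flagged": single / len(flagged),
    }


if __name__ == "__main__":
    for a, b in combinations(SCANNERS, 2):
        print(f"{a} vs {b}: {pairwise_overlap(a, b):.3f}")
    print(agreement_summary())
```

Note the different denominators: the all-three share is taken over every skill, while the single-scanner share is taken over flagged skills only, mirroring how the two figures are worded in the abstract.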
