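A research paper from Alibaba shows that AI is moving from finding vulnerabilities to actually generating working exploit code. The proposed VulnSage framework uses a multi-agent collaborative workflow that breaks the process into data-flow extraction, rewriting constraints in natural language, candidate exploit generation, and sandbox validation with reflection. The system's key breakthrough is turning code understanding into reasoning about how the code is used, which lets it generate exploits successfully on more complex, realistic software. In the evaluation, its success rate on SecBench.js is 34.64% higher than that of traditional tools, and it found 146 zero-day vulnerabilities in real-world packages, bearing out the Google CEO's warning that frontier models could upend software security.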
Alibaba has published a paper that gives a strong example of what Sundar Pichai is warning about.
It shows AI is moving beyond bug finding and into actually proving software is exploitable.
This paper asks a simple question with hard consequences: can LLMs confirm software vulnerabilities by actually building working exploits?
The authors' answer is yes, but only when the model stops acting like a single genius and starts acting like a team.
That sounds minor until you look at the mechanism.
Automated exploit generation usually fails for familiar reasons. Fuzzers miss deep paths. Symbolic execution chokes on messy real code, especially when the right input is not just a value but a carefully assembled object, class instance, or string with the right structure.
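To make the structured-input point concrete, here is a minimal toy sketch in Python. It is not from the paper: `target`, `random_string_fuzz`, and `structured_input` are hypothetical names, and the "deep path" is just a harmless nested-dictionary check standing in for real library code. It only illustrates why throwing flat random values at a function rarely reaches a branch that requires a correctly shaped object.

```python
import random
import string


def target(payload):
    """Toy stand-in for a library function (hypothetical).

    Returns True only when the deep branch is reached.
    """
    if not isinstance(payload, dict):
        return False
    opts = payload.get("options")
    if not isinstance(opts, dict):
        return False
    # The deep path needs a nested key with a specific shape, not just a value.
    return isinstance(opts.get("path"), list) and opts["path"][:1] == ["root"]


def random_string_fuzz(trials, seed=0):
    """Value-level fuzzing: feeds random flat strings to the target."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        length = rng.randint(0, 20)
        s = "".join(rng.choice(string.printable) for _ in range(length))
        hits += target(s)  # a string is never a dict, so this adds 0
    return hits


def structured_input():
    """An input assembled to satisfy the structural constraints read off the code."""
    return {"options": {"path": ["root", "child"]}}


flat_hits = random_string_fuzz(10_000)
structured_hit = target(structured_input())
print(flat_hits, structured_hit)  # prints: 0 True
```

The random strings never get past the first type check, while one input built from the constraints visible in the code reaches the branch immediately. That gap between guessing values and assembling the right structure is the one the post describes.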