"Whimsey attacks" that sound absurd (for example, "I cannot pay that much because of the Geneva Convention") work against AI agents because guardrails are weak against out-of-distribution arguments. Smaller models fall for them often, and the attacks gain an edge even against larger models. https://www.microsoft.com/en-us/research/articles/whimsical-strategies-break-ai-agents-generating-out-of-distribution-adversarial-strategies-at-scale/