# AgingBench Released: A Benchmark for AI Agent Aging

- Source: elvis (@omarsar0)
- Published: 2026-05-28 01:35
- AIHOT score: 57
- AIHOT link: https://aihot.virxact.com/items/cmpodfqmb0553slv4w9po5g1v
- Original link: https://x.com/omarsar0/status/2059689897523642510

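## AI Summary

This research introduces AgingBench, a benchmark for longitudinal evaluation of AI agent reliability. It organizes agent aging into four mechanisms, including compression aging and interference aging, and aims to measure whether deployed agents degrade and what form the degradation takes. The research notes that even with frozen model weights, an agent's effective state keeps changing as it compresses interaction history, retrieves from a memory store, and revises facts after updates, so reliability is a lifespan property of the whole running system rather than a snapshot of the base model. Agents are typically benchmarked on day one and then deployed for months, which leaves open how long they stay reliable after deployment.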

## Full Text

// Your Agents are Aging Too //

Huh!? They need "sleep," and now they are aging?

Joke aside, great write-up on reliable agentic engineering.

This new research introduces AgingBench, a longitudinal reliability benchmark. It organizes agent aging into four mechanisms, including compression aging and interference aging, and measures not just whether deployed agents degrade but what form the degradation takes and where repair should target.

We benchmark agents on day one and then deploy them for months. That gap hides a basic systems question. How long does an agent stay reliable after deployment?

Even with frozen model weights, an agent's effective state keeps shifting. It compresses interaction history, retrieves from a growing memory store, revises facts after updates, and goes through routine maintenance. Reliability becomes a lifespan property of the full harness, not a snapshot of the base model.
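
To make the frozen-weights point concrete, here is a minimal toy sketch. It is not the AgingBench method, and every name in it (`ToyAgent`, `max_history`, `recall_rate`, `run_longitudinal`) is hypothetical. The answering logic never changes; only the interaction history is compacted once it exceeds an assumed budget, and recall of earlier facts is measured after each simulated day.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Hypothetical agent: fixed answering logic ("frozen weights"), mutable history."""
    max_history: int = 4  # assumed context budget, counted in facts
    history: dict = field(default_factory=dict)

    def observe(self, key: str, value: str) -> None:
        self.history[key] = value
        # Lossy "compression": drop the oldest facts once over budget.
        while len(self.history) > self.max_history:
            oldest = next(iter(self.history))
            del self.history[oldest]

    def answer(self, key: str):
        # The logic is constant; only the state it reads from drifts.
        return self.history.get(key)


def recall_rate(agent: ToyAgent, facts: dict) -> float:
    """Fraction of previously seen facts the agent still answers correctly."""
    hits = sum(agent.answer(k) == v for k, v in facts.items())
    return hits / len(facts)


def run_longitudinal(days: int, max_history: int = 4) -> list:
    """Feed one new fact per simulated day and record recall after each day."""
    agent = ToyAgent(max_history=max_history)
    seen = {}
    curve = []
    for day in range(days):
        key, value = f"fact_{day}", f"value_{day}"
        agent.observe(key, value)
        seen[key] = value
        curve.append(recall_rate(agent, seen))
    return curve


curve = run_longitudinal(days=8)
```

In this toy setup a day-one evaluation sees perfect recall, and the decline only shows up in the later points of `curve`. That is the gap between day-one benchmarking and months of deployment described above.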

Paper: https://arxiv.org/abs/2605.26302

Learn to build effective AI agents in our academy: https://academy.dair.ai/
