# ReasoningBank: Enabling agents to learn from experience

- Source: Google Research Blog (web page)
- Published: 2026-04-21 00:00
- AIHOT link: https://aihot.virxact.com/items/cmo8v08it0765slml989qxjw1
- Original link: https://research.google/blog/reasoningbank-enabling-agents-to-learn-from-experience

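## AI summary

Google Cloud proposes ReasoningBank, an agent memory framework that moves beyond prior methods, which record only action trajectories or successful experiences, by distilling generalizable, high-level reasoning strategies from both successes and failures. The framework uses LLM-as-a-judge self-assessment to build a closed loop of retrieval, extraction, and consolidation, draws on failure cases in particular to generate preventative strategies, and introduces Memory-aware Test-Time Scaling (MaTTS) to turn test-time compute into high-quality memories. On web browsing and software engineering benchmarks, the system significantly raises task success rates and reduces the number of execution steps.

## Body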

April 21, 2026

Jun Yan and Chen-Yu Lee, Research Scientists, Google Cloud

ReasoningBank is a novel agent memory framework that uses successful and failed experiences to distill generalizable reasoning strategies, enabling an agent to continuously learn from experience after deployment.

Agents are becoming increasingly crucial in tackling complex real-world tasks, ranging from general web navigation to assisting with extensive software engineering codebases. However, as these agents transition into persistent, long-running roles in the real world, they face a critical limitation: they struggle to analyze and learn from successful and failed experiences after deployment.

Agents approaching each new task without a memory mechanism will repeatedly make the same strategic errors and discard valuable insights. To address this, various forms of agent memory have been introduced to store information about past interactions for reuse. However, existing methods generally either save exhaustive records of every action taken (such as the trajectory memory used in Synapse) or document only workflows summarized from successful attempts (as in Agent Workflow Memory). These approaches have two fundamental drawbacks: first, by recording detailed actions instead of tactical foresight, they fail to distill higher-level, transferable reasoning patterns; second, by over-emphasizing successful experiences, they miss a primary source of learning: the agent's own failures.

To bridge this gap, in our ICLR paper, "ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory", we introduce a novel agent memory framework (code available on GitHub) that distills useful insights from both successful and failed experiences for test-time self-evolution. When evaluated on web browsing and software engineering benchmarks, ReasoningBank improves both agent effectiveness (higher success rates) and efficiency (fewer task steps) compared to baseline approaches.

_Memory content comparison: existing strategies and ReasoningBank._

### Distilling insights with ReasoningBank

ReasoningBank distills global reasoning patterns into high-level, structured memories. Each structured memory item contains the following:

- _Title_: A concise identifier summarizing the core strategy.
- _Description_: A brief summary of the memory item.
- _Content_: The distilled reasoning steps, decision rationales, or operational insights extracted from past experiences.
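The three fields map naturally onto a small record type. The following is a minimal sketch, not the released implementation: the class name `MemoryItem`, the `render` helper, and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryItem:
    """Hypothetical record holding the three fields of a memory item."""
    title: str        # concise identifier summarizing the core strategy
    description: str  # brief summary of the memory item
    content: str      # distilled reasoning steps, rationales, or insights

    def render(self) -> str:
        """Format the item as plain text that could be placed in an agent's context."""
        return f"{self.title}\n{self.description}\n{self.content}"


# Illustrative values only; they are not taken from the paper's memory bank.
item = MemoryItem(
    title="Verify page identifier before loading more",
    description="Guard against infinite-scroll traps.",
    content="Check the current page identifier first, then load more results.",
)
print(item.render())
```

Keeping the item immutable (`frozen=True`) is a design choice of this sketch: it fits an append-only bank in which items are added but never edited in place.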

The memory workflow operates in a continuous, closed loop of retrieval, extraction, and consolidation. Before taking action, the agent draws upon the ReasoningBank to gather relevant memories into its context. It then interacts with the environment and uses an LLM-as-a-judge to self-assess the resulting trajectory, extracting success insights or failure reflections. Notably, this self-judgement does not need to be perfectly accurate, as we find ReasoningBank to be quite robust against judgment noise. During extraction, the agent distills workflows and generalizable insights from the trajectory into new memories. For simplicity, we directly append these to the ReasoningBank, leaving more sophisticated consolidation strategies for future work.
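The loop described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' code: `retrieve`, `run_task`, and the `fake_*` callables are hypothetical names, retrieval is a toy word-overlap ranking, and the agent, judge, and extractor are deterministic stubs standing in for LLM calls.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemoryItem:
    title: str
    description: str
    content: str


def retrieve(bank: List[MemoryItem], query: str, k: int = 1) -> List[MemoryItem]:
    """Toy retrieval (assumption): rank items by word overlap with the query."""
    words = set(query.lower().split())

    def overlap(item: MemoryItem) -> int:
        text = f"{item.title} {item.description}".lower()
        return len(words & set(text.split()))

    ranked = sorted(bank, key=overlap, reverse=True)
    return [item for item in ranked[:k] if overlap(item) > 0]


def run_task(bank, query, act, judge, extract) -> bool:
    """One pass of the closed loop: retrieve, act, judge, extract, append."""
    memories = retrieve(bank, query)                   # retrieval
    trajectory = act(query, memories)                  # interact with the environment
    success = judge(query, trajectory)                 # LLM-as-a-judge self-assessment (stubbed)
    bank.extend(extract(query, trajectory, success))   # extraction + append-only consolidation
    return success


# Deterministic stand-ins for the LLM-backed components (assumptions).
def fake_act(query, memories):
    return ["used_memory"] if memories else ["guessed"]


def fake_judge(query, trajectory):
    return "used_memory" in trajectory


def fake_extract(query, trajectory, success):
    kind = "Success insight" if success else "Failure reflection"
    return [MemoryItem(f"{kind}: {query}", query, " -> ".join(trajectory))]


bank: List[MemoryItem] = []
first = run_task(bank, "find order history page", fake_act, fake_judge, fake_extract)
second = run_task(bank, "find order history page", fake_act, fake_judge, fake_extract)
print(first, second, [m.title for m in bank])
```

In this toy run the first attempt fails with an empty bank and leaves a failure reflection behind; the second attempt retrieves that reflection and succeeds, which mirrors how a failed experience can still feed the next task.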

Crucially, unlike existing workflow memory strategies that only focus on successful runs, ReasoningBank actively analyzes failed experiences to source counterfactual signals and pitfalls. By distilling these mistakes into preventative lessons, ReasoningBank builds powerful strategic guardrails. For example, instead of merely learning a procedural rule like "click the 'Load More' button", the agent might learn from a past failure to "always verify the current page identifier first to avoid infinite scroll traps before attempting to load more results".
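One way to picture the success/failure split is that the judge's verdict selects which extraction instruction the model receives. The sketch below is hypothetical: the function name `build_extraction_prompt` and both instruction strings are invented for illustration and are not the prompts used in the paper.

```python
def build_extraction_prompt(query: str, trajectory: str, success: bool) -> str:
    """Build a hypothetical extraction prompt whose instruction depends on the verdict."""
    if success:
        # Successful run: ask for the strategy that worked, phrased as a reusable rule.
        instruction = "Summarize the strategy that made this attempt succeed as a reusable rule."
    else:
        # Failed run: ask for the pitfall and a preventative check.
        instruction = (
            "Identify what went wrong and state a preventative check "
            "to run before repeating the failing action."
        )
    return (
        f"Task: {query}\n"
        f"Trajectory:\n{trajectory}\n"
        f"Instruction: {instruction}\n"
        "Answer with three fields: Title, Description, Content."
    )


print(build_extraction_prompt("list all reviews", "click 'Load More' x20", success=False))
```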

_Workflow of ReasoningBank integrated with an agent during test time._

### Memory-aware test-time scaling (MaTTS)

Test-time scaling (TTS) — scaling compute at inference time — has shown immense effectiveness in reasoning domains like math and competitive programming. However, in agentic environments, existing TTS methods often discard the exploration trajectory and treat the final answer as the only useful outcome. This overlooked exploration is actually a rich data source that could accelerate an agent's ability to learn from experience over time.

We bridge this gap by explicitly linking memory with scaling through memory-aware test-time scaling (MaTTS). By using ReasoningBank as a powerful experience learner, MaTTS distills extensive exploration into high-quality memories via contrastive and refinement signals. We demonstrate MaTTS through two distinct forms of scaling:

- _Parallel scaling_: The agent generates multiple distinct trajectories for the same query under the guidance of memory. Through self-contrast, ReasoningBank compares successful and spuriously reasoned trajectories to distill more robust strategies and synthesize higher-quality memories.
- _Sequential scaling_: The agent iteratively refines its reasoning within a single trajectory to produce strong intermediate rationales. ReasoningBank captures the intermediate insights from this trial and error and progressive improvement as high-quality memory items.
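Parallel scaling with self-contrast can be sketched roughly as below. Everything here is an assumption made for illustration: `parallel_matts` and the `toy_*` helpers are hypothetical names, trajectories are plain lists of step labels, and the contrast step is a set difference standing in for an LLM comparison of the rollouts.

```python
from typing import List

Trajectory = List[str]


def parallel_matts(query, memories, rollout, judge, contrast, k=5):
    """Sample k trajectories for one query, split them by verdict, then contrast them."""
    trajectories = [rollout(query, memories, i) for i in range(k)]
    successes = [t for t in trajectories if judge(query, t)]
    failures = [t for t in trajectories if not judge(query, t)]
    return contrast(successes, failures)


def toy_rollout(query: str, memories: list, i: int) -> Trajectory:
    """Stub agent: even-numbered samples verify the page before loading more."""
    steps = ["open_results"]
    if i % 2 == 0:
        steps.append("verify_page_id")
    steps.append("click_load_more")
    return steps


def toy_judge(query: str, trajectory: Trajectory) -> bool:
    """Stub judge: a rollout counts as successful only if it verified the page."""
    return "verify_page_id" in trajectory


def toy_contrast(successes: List[Trajectory], failures: List[Trajectory]) -> List[str]:
    """Keep steps shared by every success and absent from every failure."""
    if not successes:
        return []
    common = set.intersection(*(set(t) for t in successes))
    seen_in_failures = set().union(*failures)
    return sorted(common - seen_in_failures)


lessons = parallel_matts(
    "load all search results", [], toy_rollout, toy_judge, toy_contrast, k=5
)
print(lessons)
```

The contrast isolates the step that separates successful rollouts from failed ones, which is the kind of signal a single trajectory cannot provide on its own.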

MaTTS establishes a strong synergy: high-quality memory from ReasoningBank steers the scaled exploration towards more promising strategies, and in return, the scaled interactions generate significantly richer learning signals that feed back into an even smarter ReasoningBank to help the agent.

_Comparison of memory-aware test-time scaling (MaTTS) with ReasoningBank._

### Performance & emergent capabilities

We evaluated ReasoningBank across challenging benchmarks covering dynamic environments. Using the ReAct prompting strategy as the foundation for all agents, we compared ReasoningBank against three baselines: a memory-free agent (Vanilla ReAct), Synapse (trajectory memory), and AWM (workflow memory). From our main evaluation results with Gemini-2.5-Flash on WebArena and SWE-Bench-Verified, we have the following key observations:

- _Superior success rates_: ReasoningBank without scaling outperformed memory-free agents by 8.3% on WebArena and 4.6% on SWE-Bench-Verified.
- _Efficiency gains_: Because the agent actively accesses past decision rationales, it spends far fewer steps on aimless exploration. On SWE-Bench-Verified, ReasoningBank saved almost 3 execution steps per task compared with the memory-free baseline.
- _MaTTS synergy_: Adding MaTTS (parallel scaling with a scaling factor k=5) boosts success rates further. On WebArena, ReasoningBank with MaTTS improves over ReasoningBank alone by 3% in success rate and takes 0.4 fewer steps.

_Performance comparison (task success rates and average steps per task) of different agent memory strategies on WebArena and SWE-Bench-Verified._

Importantly, during evaluation, we observed the emergence of strategic maturity. In a web-browsing example, the agent's initial curated rules resembled simple procedural checklists (e.g., "Look for page links"). As the agent persisted through more problem sets, these memories were incorporated during execution. Building upon existing knowledge, the agent distilled new trajectories into more advanced memories. Over time, simple checklists evolved into memories with compositional, preventative logic structures (e.g., "Cross-reference tasks continuously with active page filters to ensure retrieved datasets aren't paginated prematurely"). See the paper for more details.

### Conclusion

ReasoningBank provides a powerful framework for enabling LLMs to learn from experiences and evolve into continuous learners during test-time. We believe memory-driven experience scaling…
