# Nvidia research: AI coding agents let robots train themselves

- Source: The Decoder: AI News (RSS)
- Author: Maximilian Schreiner
- Published: 2026-06-17 22:55
- AIHOT score: 50
- AIHOT link: https://aihot.virxact.com/items/cmqi7s27z06exslf0srf3xn55
- Original link: https://the-decoder.com/nvidia-research-shows-robots-that-train-themselves-through-ai-coding-agents

## AI summary

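ENPIRE, a project developed jointly by Nvidia, Carnegie Mellon University, and UC Berkeley, uses AI coding agents to let robots train dexterous grasping autonomously in the real world. Eight dual-arm YAM robots share trial results through Git, while the agents write reward functions, read papers, and edit training code on their own. Success rates reach up to 99% on tasks such as the Push-T test, pin insertion, and cable-tie cutting. Scaling from one agent to eight cut Push-T completion time from about 5 hours to 2, and pin insertion from 90 minutes to about 40. Codex (GPT-5.5), Claude Code (Opus 4.7), and Kimi Code (Kimi K2.6) were tested, with Codex performing best. The real world remains harder than simulation, but the approach offers a viable path toward robots that improve on their own.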

## Full text

Nvidia research shows robots that train themselves through AI coding agents

Maximilian Schreiner

Jun 17, 2026

Researchers from Nvidia, Carnegie Mellon University, and UC Berkeley are using AI coding agents to teach robots dexterous grasping in the real world. A fleet of eight robots hits up to 99 percent success on tricky tasks.

Dexterous grasping and manipulation are still hard for robots to learn. Humans have to stay involved at every step: collecting training data, resetting the scene after each attempt, and tweaking algorithms. That manual overhead slows everything down. ENPIRE, a research project from Nvidia, Carnegie Mellon University, and UC Berkeley, aims to break through that bottleneck by handing the work to AI coding agents.

The core idea is a feedback loop running on real hardware: reset the workspace, run a strategy, check the result, and improve the next attempt.
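
A minimal Python sketch of that loop, under stated assumptions: the callables `reset`, `run`, `check`, and `improve` are hypothetical stand-ins for the workspace reset, the policy rollout, the automated success check, and the update step. The one-number toy task is purely illustrative and is not ENPIRE's code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trial:
    attempt: int
    param: float
    success: bool


def run_feedback_loop(
    reset: Callable[[], None],
    run: Callable[[float], float],
    check: Callable[[float], bool],
    improve: Callable[[float, float], float],
    param: float,
    max_attempts: int = 20,
) -> List[Trial]:
    """Reset, run, check, improve; stop at the first success."""
    history: List[Trial] = []
    for attempt in range(max_attempts):
        reset()                       # put the workspace back in a known state
        outcome = run(param)          # execute the current strategy
        success = check(outcome)      # automated success check, no human judge
        history.append(Trial(attempt, param, success))
        if success:
            break
        param = improve(param, outcome)  # adjust before the next attempt
    return history


# Hypothetical toy task: the "strategy" is one number that must land
# within 0.05 of a target value of 1.0.
TARGET = 1.0

history = run_feedback_loop(
    reset=lambda: None,
    run=lambda p: p,
    check=lambda out: abs(out - TARGET) < 0.05,
    improve=lambda p, out: p + 0.5 * (TARGET - out),
    param=0.0,
)
print(len(history), history[-1].success)
```

Passing the four steps in as callables keeps the loop itself independent of any particular robot, which is one plausible way to reuse the same loop across tasks.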

### The agent builds its own evaluation tools

ENPIRE runs in two phases. In the first, the agent sets up a working environment with some human feedback. That includes safety boundaries, an automatic reset, and automated success checking. Instead of having a human evaluate every attempt, the agent writes its own reward function to tell success from failure. It only needs a few minutes of example video showing successful and failed attempts.

For pin insertion, for example, the agent developed a check combining visual alignment, gripper height, and estimated force. For closing a cable tie, it combined two camera angles to avoid false positives and pushed reaction time below 150 milliseconds. These tools get built once and reused without changes.
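
To make the pin-insertion check concrete, here is a hedged sketch of a success check that combines the three signals named above. The field names, units, and every threshold are assumptions chosen for illustration; the article does not report the actual values or how the agent fused the signals.

```python
from dataclasses import dataclass


@dataclass
class PinObservation:
    alignment_error_px: float  # visual offset between pin and hole (assumed unit)
    gripper_height_mm: float   # gripper height above the fixture (assumed unit)
    estimated_force_n: float   # estimated contact force (assumed unit)


# All thresholds are illustrative assumptions, not values from the study.
MAX_ALIGNMENT_ERROR_PX = 5.0
MAX_SEATED_HEIGHT_MM = 12.0
MAX_CONTACT_FORCE_N = 8.0


def pin_insertion_success(obs: PinObservation) -> bool:
    """Return True only when all three signals agree the pin is seated."""
    aligned = obs.alignment_error_px <= MAX_ALIGNMENT_ERROR_PX
    seated = obs.gripper_height_mm <= MAX_SEATED_HEIGHT_MM
    not_jammed = obs.estimated_force_n <= MAX_CONTACT_FORCE_N
    return aligned and seated and not_jammed


def reward(obs: PinObservation) -> float:
    """Sparse reward: 1.0 on success, 0.0 otherwise."""
    return 1.0 if pin_insertion_success(obs) else 0.0
```

Requiring every signal to pass trades some missed successes for fewer false positives, the same concern the article mentions for the cable-tie check.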

In the second phase, the agent works entirely on its own. It reads research papers, forms hypotheses, and edits the training code directly. It uses methods like behavior cloning, where the strategy mimics human demonstrations, or reinforcement learning, where the strategy improves through trial and error. The agent picks the method itself based on real-world success signals.

### A robot fleet that coordinates through Git

ENPIRE scales to a full fleet: eight dual-arm YAM robot stations, each with its own hardware, computer, and coding agent. The agents test different hypotheses at the same time and share results only through Git, the standard version control tool for software. They adopt successful training recipes from each other and discard bad ideas on their own. A breakthrough discovered at one station spreads across the entire fleet.
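
The adopt-or-discard step might look like the sketch below. It assumes, hypothetically, that each station commits one JSON result file to a shared repository and that an agent scans its working tree after pulling. The file layout, the field names, and the `min_trials` cutoff are all invented for illustration, and the surrounding `git pull`, `git commit`, and `git push` calls are left out.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def best_recipe(results_dir: Path, min_trials: int = 10) -> Optional[dict]:
    """Pick the recipe with the highest success rate among well-tested ones."""
    best: Optional[dict] = None
    best_rate = -1.0
    for path in sorted(results_dir.glob("*.json")):
        record = json.loads(path.read_text())
        if record["trials"] < min_trials:
            continue  # too few trials to trust this result yet
        rate = record["successes"] / record["trials"]
        if rate > best_rate:
            best, best_rate = record, rate
    return best


def demo() -> Optional[dict]:
    # Hypothetical result records, one per station.
    records = [
        {"station": "station-1", "recipe": "bc-baseline", "successes": 12, "trials": 20},
        {"station": "station-2", "recipe": "rl-finetune", "successes": 18, "trials": 20},
        {"station": "station-3", "recipe": "untested-idea", "successes": 3, "trials": 3},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for record in records:
            (root / f"{record['station']}.json").write_text(json.dumps(record))
        return best_recipe(root)


chosen = demo()
print(chosen["recipe"])
```

In this toy data the third recipe has a perfect record but only three trials, so it is skipped until more evidence arrives.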

According to the study, the agents hit up to 99 percent success on demanding tasks: the Push-T test, where the robot has to slide a T-shaped block into a target position and orientation; sorting pins into a box; and cutting a cable tie with a cutter. For pin insertion, the strategy reached 100 percent success faster than a comparable human-in-the-loop method.

Scaling pays off in time, too. On the Push-T test, going from one to eight agents cut the time to full success from about five hours to two. For pin insertion, it dropped from over 90 minutes to roughly 40. The researchers tested three current coding agents: Codex with GPT-5.5, Claude Code with Opus 4.7, and Kimi Code with Kimi K2.6. Codex performed best in most cases.
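
As a rough worked example, the timing figures above can be turned into a speedup and a per-agent efficiency. These are the standard parallel-computing definitions, not metrics from the study, and the inputs are the article's rounded numbers ("about five hours", "over 90 minutes", "roughly 40"), so the results are approximate.

```python
def speedup(time_one_agent: float, time_n_agents: float) -> float:
    """How many times faster the fleet finishes than a single agent."""
    return time_one_agent / time_n_agents


def parallel_efficiency(time_one_agent: float, time_n_agents: float, n_agents: int) -> float:
    """Speedup divided by fleet size; 1.0 would mean perfect scaling."""
    return speedup(time_one_agent, time_n_agents) / n_agents


# Rounded figures from the article; units cancel out in the ratios.
push_t_speedup = speedup(5.0, 2.0)          # hours
pin_speedup = speedup(90.0, 40.0)           # minutes
push_t_eff = parallel_efficiency(5.0, 2.0, 8)
pin_eff = parallel_efficiency(90.0, 40.0, 8)

print(f"Push-T: {push_t_speedup:.2f}x, efficiency {push_t_eff:.0%}")
print(f"Pin insertion: {pin_speedup:.2f}x, efficiency {pin_eff:.0%}")
```

On these rounded inputs, eight agents finish roughly 2.5 times and 2.25 times faster, which is well short of an eightfold gain.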

### The real world is still the hardest test

The results also show that the real world is still far harder than simulation. On the Push-T test, all three agents solved the task in simulation, but two out of three failed in the real environment. The researchers blame unpredictable and variable conditions like robot dynamics, friction, and object movement. In the RoboCasa simulation, ENPIRE beat both an end-to-end vision-language-action model (GR00T) and a tool-based approach without autoresearch (CaP-X).

To measure efficiency, the researchers propose two metrics: Mean Robot Utilization (MRU) tracks how much research time the robot actually spends working, while Mean Token Utilization (MTU) counts language model usage per minute. Learned skills also transfer: experience from pin insertion helped the agents slot GPUs into a motherboard using the robot arms.
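
The article gives only one-line descriptions of the two metrics, so the formulas below are an assumed reading rather than the paper's definitions: MRU as the average fraction of wall-clock time each robot spends executing, and MTU as tokens consumed per minute. The function names and sample numbers are hypothetical.

```python
from typing import List


def mean_robot_utilization(robot_busy_minutes: List[float], wall_clock_minutes: float) -> float:
    """Average fraction of the session each robot spent executing (assumed definition)."""
    if wall_clock_minutes <= 0 or not robot_busy_minutes:
        raise ValueError("need a positive session length and at least one robot")
    fractions = [busy / wall_clock_minutes for busy in robot_busy_minutes]
    return sum(fractions) / len(fractions)


def mean_token_utilization(total_tokens: int, wall_clock_minutes: float) -> float:
    """Language-model tokens consumed per minute of session time (assumed definition)."""
    if wall_clock_minutes <= 0:
        raise ValueError("need a positive session length")
    return total_tokens / wall_clock_minutes


# Hypothetical one-hour session with two robots.
mru = mean_robot_utilization([30.0, 45.0], wall_clock_minutes=60.0)
mtu = mean_token_utilization(120_000, wall_clock_minutes=60.0)
print(mru, mtu)
```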

The study is clear about its limits, though. Robots and compute don't get fully used because agents spend a lot of time reading logs, writing code, and waiting. The more robots in the fleet, the lower the per-robot utilization as agents spend more time summarizing each other's results. Token costs also grow faster than performance gains: larger fleets reach the goal sooner but burn through far more compute budget to get there. Still, the researchers see ENPIRE as a practical path toward robots that can improve on their own in the real world.
