# BEHAVIOR Challenge Launches: Robotics Gets Its ImageNet Moment

- Source: Jim Fan (@DrJimFan)
- Published: 2025-09-13 22:51
- AIHOT link: https://aihot.virxact.com/items/cmo22oj0z010mslbaj85gv3hi
- Original link: https://x.com/DrJimFan/status/1966877464598094334

## AI Summary

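The tweet points out that computer vision (ImageNet) and natural language processing (MMLU, HLE, SWEBench) have established standardized benchmarks, while robotics still lacks a unified evaluation standard and remains inconsistent in hardware, task definitions, and scoring. BEHAVIOR, a project from the creator of ImageNet, is built on the Isaac Sim physics engine and aims to establish a reproducible, unified benchmark for robotics. The project has launched its first NeurIPS 2025 challenge, in the hope that it becomes a landmark signal that drives progress in the field.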

## Full Text

There was something deeply satisfying about ImageNet. It had a well-curated training set. A clearly defined testing protocol. A competition that rallied the best researchers. And a leaderboard that spawned ResNets and ViTs, and ultimately changed the field for good.

Then NLP followed. No matter how much OpenAI, Anthropic, and xAI disagree, they at least agree on one thing: benchmarking. MMLU, HLE, SWEBench - you can't make progress until you are able to measure it.

Robotics still doesn't have such a rallying call. No one agrees on anything: hardware, task, scoring, simulation engine, or real world environment. Everyone is SOTA, by definition, on the benchmark they define on the fly for each paper.

From the maker of ImageNet - BEHAVIOR takes a stab at the daunting challenge of unifying robotics benchmarking on a reproducible physics engine (Isaac Sim). The project started before I graduated from Stanford Vision Lab, and took so many years of dedication and PhD careers to build. I hope BEHAVIOR is either the hill-climbing signal we need, or the spark that finally gets us talking about how to measure real progress as a field.

### Quoted Tweet

> Fei-Fei Li: (1/N) How close are we to enabling robots to solve the long-horizon, complex tasks that matter in everyday life? 🚨 We are thrilled to invite you to join the 1s...