# Nathan Lambert says RL speedruns will eventually become standard, with cost as the bottleneck; @jeankaddour launches the Sokoban Speedrun project

- Source: Nathan Lambert (@natolambert)
- Published: 2026-06-19 22:25
- AIHOT score: 49
- AIHOT link: https://aihot.virxact.com/items/cmql1tmmh00eosllukiw8ljp2
- Original post: https://x.com/natolambert/status/2067977078201618660

## AI Summary

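Nathan Lambert commented that an RL speedrun will eventually become standard, and that the biggest bottleneck today is price. Because RL training is unstable, a single run is too noisy to serve as an entry on its own, and running multiple seeds puts the cost at around $100. His post quotes @jeankaddour's announcement of the Sokoban Speedrun project, which modifies Karpathy's nanochat pipeline to train Qwen3-4B-Instruct with RL to solve Sokoban puzzles. The GRPO baseline takes only 87 minutes on 8×H100. The project illustrates the potential for validating RL methods quickly and cheaply.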
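To make the cost argument concrete, here is a back-of-the-envelope sketch. The 8×H100 and 87-minute figures come from the summary; the per-GPU-hour price is purely an assumption (cloud H100 prices vary widely), and the noise estimate assumes seeds are independent. The function names are hypothetical helpers for illustration only.

```python
import math

# Assumption (not from the source): H100 rental price in USD per GPU-hour.
# Real prices vary widely by provider; substitute your own figure.
ASSUMED_H100_USD_PER_GPU_HOUR = 2.50


def run_cost_usd(num_gpus: int, minutes: float, usd_per_gpu_hour: float) -> float:
    """Cost of one training run: GPU count x wall-clock hours x hourly price."""
    return num_gpus * (minutes / 60.0) * usd_per_gpu_hour


def multi_seed_cost_usd(num_seeds: int, num_gpus: int, minutes: float,
                        usd_per_gpu_hour: float) -> float:
    """Repeating the same run with different seeds scales cost linearly."""
    return num_seeds * run_cost_usd(num_gpus, minutes, usd_per_gpu_hour)


def standard_error_shrink(num_seeds: int) -> float:
    """Averaging n independent seeds divides the standard error of the mean by sqrt(n)."""
    return 1.0 / math.sqrt(num_seeds)


if __name__ == "__main__":
    # Figures from the summary: GRPO baseline on 8x H100 in 87 minutes.
    for seeds in (1, 3, 5):
        cost = multi_seed_cost_usd(seeds, 8, 87, ASSUMED_H100_USD_PER_GPU_HOUR)
        print(f"{seeds} seed(s): ~${cost:.0f}, "
              f"std. error x{standard_error_shrink(seeds):.2f}")
```

Under the assumed $2.50 price, one run is 11.6 GPU-hours, or about $29, and three to five seeds come to roughly $87 to $145, consistent with the O($100) estimate. The square-root term shows why seeds get expensive: halving the noise requires four times as many runs, and therefore four times the cost.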

## Main Text

It's obvious that eventually a speedrun for RL will stick.

I currently think the biggest bottleneck is price, as an individual entry currently has too much noise from the instability of RL, so running multiple seeds makes it cost O($100).

Glad to see attempts!

### Quoted Post

> Jean Kaddour：With RSI around the corner, it's time for an RL speedrun. Introducing Sokoban Speedrun: training Qwen3-4B-Instruct with RL to solve Sokoban puzzles. We start by...
