Robotic data is insanely expensive and brutal to collect. It’s the only thing holding back general-purpose robots right now. Figure CEO @adcock_brett: "If we could get a pile of data in the helix stack, we would solve general robotics right now."

Translation: Robot data is extremely expensive and difficult to collect, and it is the only bottleneck currently holding back general-purpose robots. Figure CEO Brett Adcock says that with enough data in the helix stack, general robotics could be solved right now.

Rohan Paul @rohanpaul_ai · Apr 13

India is quietly becoming a training floor for humanoid robots, with workers filming thousands of first-person hand tasks so AI systems can learn grasping, folding, sorting, and tool use. This story is really about how the humanoid robot boom still depends on cheap, repetitive human labor to teach machines basic physical skill. The problem is that robots do not fail on big plans first; they fail on tiny physical details like grip angle, finger timing, slip correction, and object contact. That kind of knowledge is hard to code and expensive to collect. These labs capture that missing layer by putting cameras or sensors on people and recording ordinary actions as machine-readable motion examples. The useful part is not the towel or box itself but the sequence: where the hand starts, how force changes, when fingers adjust, and how the body recovers from small mistakes. That gives robotics teams supervised data for models that map visual input to physical actions, which is much easier than hand-coding every movement rule. This is a story about how physical intelligence gets extracted before it gets automated. --- quasa.io/media/the-hidden-hand-farms-of-india-fueling-the-ai-robot-revolution-with-human-motion

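Translation: Workers in India wear head-mounted cameras to capture first-person hand-motion data used to train humanoid robots in physical skills such as grasping and folding. This shows that the current robot boom still depends on cheap human labor to obtain embodied data. Unlike language models, robots must learn fine physical details such as grip angle and force adjustment from human motion. The model commodifies human labor twice: it is productive work and also the data infrastructure for training AI. Until embodied data becomes cheaper to collect, the robotics industry will keep relying on workers' labor as a cheap source of "physical intelligence".

Rohan Paul @rohanpaul_ai · Apr 12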

Reasoning tokens in LLMs are not equal. Models seem to know which parts of their own reasoning matter most. What survives pruning is usually the part doing actual computational work, not the fluent narration wrapped around it. The method is clever in a plain way. Start with a full chain of thought, delete one token at a time, and keep deleting whichever removal hurts the model’s likelihood least. The resulting order becomes a functional ranking, not of what sounds important to us, but of what the model itself seems to need. Here’s the interesting part. If a model’s reasoning were just verbose decoration, pruning should look mostly random once you preserve the answer. Instead, the paper finds structure. Symbolic math tokens survive pruning far more than grammar, narration, and referential bookkeeping, which means the model is not treating all tokens as equally useful. That matters because the test is behavioral, not rhetorical. Students trained on these greedily pruned chains do better than students trained on several other pruning baselines, including a method supervised by a frontier model, at the same reasoning length. So the pruning signal is not merely interpretable. It is useful. The deeper point is that importance is dynamic. A token that looks expendable early can become important later as surrounding context disappears, which argues against the comforting idea that reasoning has a fixed salience map you can read off once and reuse forever. And yet the signal is not inaccessible. The paper shows attention patterns alone can predict pruning scores surprisingly well, suggesting that functional importance is partly visible in the model’s internals before you do the expensive deletion game. So this is less about making chain-of-thought shorter than about making it legible. The claim is not that pruned tokens are causally irrelevant in any philosophical sense. 
The cleaner claim is better: LLMs appear to encode a workable internal ranking of which reasoning tokens are carrying the load.

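Translation: The study assesses the functional importance of LLM reasoning tokens by greedy pruning: repeatedly deleting the token whose removal least affects the model's likelihood. Symbolic math tokens survive pruning better than grammar and narration, indicating an internal importance ranking in the model. Importance is dynamic: a token that is expendable early may become critical once context shrinks. Attention patterns can predict pruning scores, so functional importance is visible in the model's internals. The finding helps make chain-of-thought more interpretable rather than merely shorter.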
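
The greedy deletion loop described in the tweet can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: `score` is a hypothetical callable standing in for the model's likelihood of the answer given the tokens that remain, and `toy_score` is an invented additive stand-in so the example runs without a model.

```python
from typing import Callable, List, Sequence, Tuple


def greedy_prune_order(
    tokens: Sequence[str],
    score: Callable[[List[str]], float],
) -> List[Tuple[int, str]]:
    """Delete one token per round, always the one whose removal leaves the
    highest score. Returns (original_index, token) pairs in deletion order,
    so tokens deleted last are the ones the scorer "needs" most."""
    remaining = list(enumerate(tokens))
    order: List[Tuple[int, str]] = []
    while remaining:
        best_pos, best_score = 0, float("-inf")
        # Scores are recomputed every round because a token's importance
        # can change as the surrounding context disappears.
        for pos in range(len(remaining)):
            kept = [tok for j, (_, tok) in enumerate(remaining) if j != pos]
            s = score(kept)
            if s > best_score:
                best_pos, best_score = pos, s
        order.append(remaining.pop(best_pos))
    return order


def toy_score(kept: List[str]) -> int:
    """Hypothetical scorer (assumption): symbolic tokens are worth more than
    narration. A real scorer would query a language model instead."""
    operators = {"+", "-", "*", "/", "="}

    def is_symbolic(tok: str) -> bool:
        return tok in operators or any(ch.isdigit() for ch in tok)

    return sum(3 if is_symbolic(tok) else 1 for tok in kept)


chain = ["so", "2", "+", "3", "=", "5", "therefore"]
deletion_order = [tok for _, tok in greedy_prune_order(chain, toy_score)]
```

With this toy scorer the narration tokens are deleted first and the symbolic ones last. The loop makes on the order of n² scorer calls for n tokens, which is why a cheaper predictor of the ranking (the tweet mentions attention patterns) is attractive.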

Nathan Lambert @natolambert · Apr 12

Great stuff happening as we start to build out the codebases for my RLHF book (sorry, I haven't had much time until now!). Very accessible to issues, emails, comments etc to make it better. I'm also going to need another dgx spark.

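Translation: Work has started on the codebases for the RLHF book; feedback via issues, emails, and comments is welcome. The author adds that he will need another DGX Spark.

Rohan Paul @rohanpaul_ai · Apr 11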

WSJ: OpenAI is forecasting $121B in AI research hardware costs for 2028 by itself. Even though sales are expected to double that year, losses would still hit $85B, and the company does not expect overall profits until the 2030s. Anthropic’s outlook shows the same steep upward path in training costs, just at a lower scale, and both companies seem to reach early profitability only if compute spending is ignored. --- wsj.com/tech/ai/openai-anthropic-ipo-finances-04b3cfb9

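Translation: According to the Wall Street Journal, OpenAI forecasts $121B in AI research hardware costs for 2028 alone. Even with sales expected to double that year, the company would still lose $85B and does not expect overall profits until the 2030s. Anthropic shows a similar steep rise in training costs at a smaller scale. Both companies reach early profitability only if compute spending is ignored, which reflects how heavy infrastructure investment has become an industry-wide challenge.

宝玉 @dotey · Apr 11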

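Recommended reading 👍 This counts as "distilling" yourself 😂

Translation: Recommends @Khazix0918's technical article on model self-distillation; the author jokes that this training approach amounts to "distilling yourself".

AK @_akhaliq · Apr 11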

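Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability paper: https://huggingface.co/papers/2604.06628

Translation: Analyzes the generalization of reasoning SFT along three conditional dimensions (optimization, data composition, and model capability), re-examining how supervised fine-tuning generalizes on reasoning tasks and which factors matter most.

Hao AI Lab @haoailab · Apr 10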

(1/5) FP4 hardware is here, but 4-bit attention still kills model quality, blocking true end-to-end FP4 serving. To fix that, we propose Attn-QAT, the first systematic study of quantization-aware training for attention. The result: FP4 attention quality is comparable to BF16 attention with 1.1x–1.5x higher throughput than SageAttention3 on an RTX 5090 and 1.39x speedup over FlashAttention-4 on a B200. Blog: https://haoailab.com/blogs/attn-qat/ Code: https://github.com/hao-ai-lab/FastVideo/pull/1225 Checkpoints: https://huggingface.co/FastVideo/14B_qat_400

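Translation: FP4 hardware has arrived, but 4-bit attention has had a persistent quality problem that blocks end-to-end FP4 serving. The team proposes Attn-QAT, the first systematic study of quantization-aware training for attention. It brings FP4 attention quality to BF16 level, with 1.1x–1.5x higher throughput than SageAttention3 on an RTX 5090 and a 1.39x speedup over FlashAttention-4 on a B200.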
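
For readers new to the term, quantization-aware training usually means simulating the low-precision format in the forward pass ("fake quantization") while keeping full-precision weights for the update. The sketch below shows only that forward-pass rounding for the FP4 E2M1 value grid, with one scale per block. It is a generic illustration, not Attn-QAT's method: the per-block max scaling, the function name, and the tie-breaking rule are all assumptions made for the example.

```python
import math
from typing import List, Sequence

# Magnitudes representable in FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
FP4_E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def fake_quant_fp4(block: Sequence[float]) -> List[float]:
    """Round each value to the nearest FP4 E2M1 magnitude after scaling the
    block so its largest absolute value maps to 6.0, then scale back.
    Ties go to the smaller magnitude here; real hardware may round
    differently (assumption)."""
    amax = max((abs(x) for x in block), default=0.0)
    if amax == 0.0:
        return [0.0 for _ in block]
    scale = amax / FP4_E2M1_GRID[-1]
    out = []
    for x in block:
        target = abs(x) / scale
        mag = min(FP4_E2M1_GRID, key=lambda g: abs(target - g))
        out.append(math.copysign(mag * scale, x))
    return out


# In training, the backward pass would typically treat this rounding as the
# identity (a straight-through estimator); that part needs an autograd
# framework and is not shown.
print(fake_quant_fp4([6.0, -3.0, 0.4, 1.2]))  # [6.0, -3.0, 0.5, 1.0]
```

The example makes the quality problem concrete: with only eight magnitudes per sign, 0.4 and 1.2 land on 0.5 and 1.0, and training with that rounding in the loop is what lets a model adapt to it.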

Nathan Lambert @natolambert · Apr 10

My book, Reinforcement Learning from Human Feedback, is wrapping up and going into final production (copyediting, making pretty, formatting, etc.). Shipping to you in 1-2 months! It's a wonderful project to create a foundation of knowledge for the research communities that I love and operate in. It’s the book I wish I had when starting on my LLM journey about 3 years ago. The book’s deepest cut is on core reinforcement learning methods, intuitions, and implementations for LLMs. These don’t live in isolation, and it’s presented in the broader context of post-training methods and unsolved problems in RLHF. A nice balance of depth and breadth. I’m always asked about the title, and I am staying firm that this is THE book documenting the organization of the field of RLHF. Any other topic is too dynamic, where writing a book today would be immediately outdated. RLHF is largely being overshadowed by lots of other developments in AI, but will always be around and at the forefront of human-AI interactions. The topic deserves coverage in depth and this platform. Thank you for all your support. More projects related to the book being announced soon 🎥 I'm excited to reconnect with the community through in-person book events this summer and fall.

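Translation: The author announces that Reinforcement Learning from Human Feedback is finished and in final production, shipping in 1-2 months. The book focuses on core reinforcement learning methods, intuitions, and implementations for LLMs, and also covers post-training methods and unsolved problems in RLHF. He stresses that it is the definitive book documenting the organization of the RLHF field: although the topic is often overshadowed by other AI developments, its central place in human-AI interaction makes it worth covering in depth rather than chasing dynamic topics that date quickly.

AK @_akhaliq · Apr 10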

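FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling paper: https://huggingface.co/papers/2604.06916

Translation: A new paper proposes a diffusion reinforcement learning method that samples in low-precision FP4 during rollout exploration and trains in BF16, using this mixed-precision strategy to balance compute efficiency with training stability and scale efficiently.

AK @_akhaliq · Apr 10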

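MARS: Enabling Autoregressive Models Multi-Token Generation paper: https://huggingface.co/papers/2604.07023

Translation: MARS, a new method, lets autoregressive models generate multiple tokens per step, breaking the efficiency limit of token-by-token decoding. The paper is public.

AK @_akhaliq · Apr 10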

Embarrassingly Simple Self-Distillation Improves Code Generation paper: https://huggingface.co/papers/2604.01193

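Translation: An "embarrassingly simple" self-distillation method improves LLM code generation without complex architectures or extra data, outperforming existing, more complex approaches. The paper is on Hugging Face Papers.

Peter Steinberger 🦞 @steipete · Apr 9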

I'm working on character evals and noticed that Claude would constantly pick itself as #1, so I removed the model names from the judge and changed things.

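Translation: While working on character evals, he found Claude always ranked itself first, so he removed model names from the judge and adjusted the setup to keep model self-preference from skewing results.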
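
One common way to hide model names from an LLM judge is to relabel and shuffle the candidates before building the judge prompt, then map the ranking back afterwards. The sketch below is an assumption about how that could look, not the author's actual setup (the tweet does not say what else was changed); the function names and model names are hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple


def blind_candidates(
    responses: Dict[str, str], seed: Optional[int] = None
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Replace model names with neutral labels in a shuffled order.

    Returns (labeled, key): `labeled` maps "Response A" etc. to the text the
    judge sees, and `key` maps each label back to the real model name.
    Supports up to 26 candidates."""
    if len(responses) > 26:
        raise ValueError("this sketch only labels up to 26 candidates")
    rng = random.Random(seed)
    names = list(responses)
    rng.shuffle(names)  # shuffle so label order does not leak identity
    labels = [f"Response {chr(ord('A') + i)}" for i in range(len(names))]
    key = dict(zip(labels, names))
    labeled = {label: responses[name] for label, name in key.items()}
    return labeled, key


def unblind(ranking: List[str], key: Dict[str, str]) -> List[str]:
    """Translate the judge's ranking of labels back into model names."""
    return [key[label] for label in ranking]


# Hypothetical model names and outputs, for illustration only.
responses = {"model-x": "answer one", "model-y": "answer two", "model-z": "answer three"}
labeled, key = blind_candidates(responses, seed=0)
```

This only hides names in the prompt scaffolding. If a response identifies its own model in the text, the judge can still see it, so a stricter setup would also scrub the response bodies.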

Epoch AI @EpochAIResearch · Apr 9

New essay by @ansonwhho: Chinese and open model AI labs have ≈10× less compute than the frontier. But they can distill frontier models, replicate innovations fast, and have enormous talent. Is that enough to compete at the frontier? 🧵

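Translation: Chinese and open-model AI labs have roughly one tenth of frontier compute, but they can distill frontier models, replicate innovations quickly, and draw on a large talent pool. @ansonwhho examines whether that is enough to offset the compute gap and stay competitive at the frontier.

Yuchen Jin @Yuchenj_UW · Apr 9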

Meta released Avocado, they call it Muse Spark. It's not open source (a bit sad). Meta TBD lab rebuilt the entire pretraining stack in 9 months and reached similar capability with >10x less compute than Llama 4 Maverick. I still think infra is the real moat in AI labs. You can train models much faster with a good infra, and it allows researchers to experiment with many more ideas much more quickly.

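Translation: Meta's TBD lab released Avocado under the name Muse Spark; it is not open source. The team rebuilt the pretraining stack in just 9 months and reached similar capability with less than one tenth of Llama 4 Maverick's compute. The author argues that infrastructure is the real moat for AI labs, since it determines training speed and how fast experiments iterate.

Nathan Lambert @natolambert · Apr 8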

Nothing will beat REINFORCE: REward Increment = Nonnegative Factor × Offset Reinforcement × Characteristic Eligibility. Great RL trivia I found when writing my book

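Translation: The name REINFORCE is a backronym for "REward Increment = Nonnegative Factor × Offset Reinforcement × Characteristic Eligibility". It is a piece of reinforcement learning trivia the author found while writing his book, and he takes the chance to poke fun at BIRD, another very strained backronym in AI.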
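
The three factors in the acronym correspond term by term to the weight update in Williams's 1992 paper that introduced REINFORCE. The notation below follows that paper rather than the tweet: $\alpha_{ij}$ is the nonnegative learning-rate factor, $r - b_{ij}$ is the reinforcement $r$ offset by a baseline $b_{ij}$, and $e_{ij}$ is the characteristic eligibility of weight $w_{ij}$, with $g_i$ the probability the unit assigns to its own output.

```latex
\Delta w_{ij} = \alpha_{ij}\,\bigl(r - b_{ij}\bigr)\,e_{ij},
\qquad
e_{ij} = \frac{\partial \ln g_i}{\partial w_{ij}}
```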

Haider. @haider1 · Apr 8

some important takeaways from anthropic's "mythos" model: 1) the model is extremely strong across benchmarks, so scaling has not hit a wall 2) but better scaling also brings much higher training and inference costs, and its setup was strong partly because it's expensive

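Translation: Anthropic's "Mythos" model is extremely strong on benchmarks, showing that scaling has not hit a ceiling. The stronger performance comes with very high training and inference costs, and its strong results are largely owed to an expensive setup.

François Chollet @fchollet · Apr 8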

Join the ARC Prize team -- help us build ARC-AGI-4 and ARC-AGI-5

Translation: Join the ARC Prize team and help us build ARC-AGI-4 and ARC-AGI-5.

AK @_akhaliq · Apr 7

MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale paper: https://huggingface.co/papers/2604.04771

Translation: MinerU2.5-Pro has been released, aimed at pushing the limits of data-centric document parsing at scale. The paper is on Hugging Face.

François Chollet @fchollet · Apr 7

One thing about DL researchers that has always been surprising to me, is that a lot of them have never been exposed to forms of learning other than fitting the parameters of a curve via gradient descent, and are even unable to conceive that there might exist other options

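Translation: One thing about deep learning researchers has always surprised me: many of them have never been exposed to any form of learning other than fitting the parameters of a curve via gradient descent, and cannot even imagine that other options might exist.

AK @_akhaliq · Apr 7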

Test-Time Scaling Makes Overtraining Compute-Optimal paper: https://huggingface.co/papers/2604.01411

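Translation: A new paper argues that test-time scaling makes overtraining compute-optimal. The classic Chinchilla optimum assumes fixed training and inference compute; the study shows that if extra compute is allowed at inference, overtrained models perform better for the same total cost.

Deedy @deedydas · Apr 7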

Meta Harnesses is Autoresearch on steroids. Something I've been exploring recently is to get long running agents to hill climb on a verifiable task to continuously improve without my intervention. Karpathy's Autoresearch did this pretty well on specific tasks, but this weekend I tried Meta Harnesses which moves one level of abstraction up. What does Meta Harness do? Autoresearch can be used in a harness like Claude Code / Codex to generate experiments to try, evaluate results, and continue looping. Meta Harness generates a harness itself that optimizes on a task or a set of tasks. Here, we define a harness as "a single-file Python program that modifies task-specific prompting, retrieval, memory, and orchestration logic". The idea is that LLMs are very powerful today, but to harness [pun intended] their power, you need to give them the right prompts and context. Meta Harnesses automates coming up with the right prompts and the right way to retrieve context to solve a problem. Where did this idea come from? This is from a paper from Stanford and the author of DSPy written last week. The paper shows fantastic performance on 3 tasks: text classification, math reasoning (IMO level problems) and coding (Terminal Bench 2.0), far outperforming traditional harnesses. The discovered harnesses are interesting: math for example, splits up the logic into different categories (Combinatorics, Geometry, Number Theory, Algebra) and prompts and looks at the context differently. The coding harness, amongst other things, pre-processes the tools available in the environment to save exploratory turns. When should you use and not use it? Meta Harnesses seem pretty useful for tackling a specific but wide set of problems where the result is verifiable. In contrast, when I tried it on a specific task like Chess, it arbitrarily divides the problem into separate tasks - opening, mid game, end game, and creates different approaches for each.
This "works" but isn't really clean because we believe there should be one approach that does all three. It does far better on things like examinations (JEE, Gaokao) where it splits problems into categories and tackles each category with different strategies. This paper covers a pretty light version of what a harness means. In the future, we can split up tasks into harnesses that have access to specific kinds of data, specific toolchains and various models to get even better results. Overall, pretty cool applied AI approach to hillclimb a verifiable task in a specific domain with variety within the problem space.

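Translation: Meta Harnesses is an automated harness-generation technique from Stanford and the author of DSPy. It automatically generates a single-file Python program (a harness) that optimizes task-specific prompting, retrieval, and orchestration logic, iterating continuously without human intervention. It sits one level of abstraction above Autoresearch and suits domain-specific tasks with verifiable results (such as math reasoning and coding), where it can split problems into categories and apply a different strategy to each, but it is limited on tasks that call for a single unified approach.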
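
The "hill climb on a verifiable task" idea reduces to a propose, evaluate, keep-if-better loop. The sketch below shows only that outer loop and is not the paper's algorithm: in the real system the candidate would be a single-file Python harness proposed by an LLM and scored on a benchmark, whereas here `propose`, `evaluate`, and the one-integer "harness" (a hypothetical count of retrieved examples) are toy stand-ins.

```python
import random
from typing import Callable, Tuple, TypeVar

H = TypeVar("H")


def hill_climb(
    initial: H,
    propose: Callable[[H, random.Random], H],
    evaluate: Callable[[H], float],
    steps: int,
    seed: int = 0,
) -> Tuple[H, float]:
    """Keep the best candidate seen so far; accept a proposal only when the
    verifiable score strictly improves."""
    rng = random.Random(seed)
    best, best_score = initial, evaluate(initial)
    for _ in range(steps):
        candidate = propose(best, rng)
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


def toy_propose(k: int, rng: random.Random) -> int:
    """Hypothetical mutation: nudge the retrieval count up or down by one."""
    return max(0, k + rng.choice([-1, 1]))


def toy_evaluate(k: int) -> float:
    """Hypothetical verifiable score that peaks at k == 7."""
    return -abs(k - 7)


best_k, best_score = hill_climb(0, toy_propose, toy_evaluate, steps=200)
```

The loop never needs a human in it as long as `evaluate` is trustworthy, which is why the tweet stresses verifiable tasks. It also inherits the usual hill-climbing weakness of settling on whatever decomposition first improves the score, in line with the chess observation above.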

François Chollet @fchollet · Apr 6

Science went from the initial observation of radioactivity to a working atom bomb over 47 years via only about 9 distinct key experiments -- extremely few data points -- and symbolic models concise enough they would fit on a single page. This is what extreme generalization looks like, and it is powered entirely by symbolic compression. Turn a handful of data points (deliberately collected) into a tractable plan to completely reshape reality, by reverse-engineering the causal symbolic rules behind the data.

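Translation: The tweet uses the atom bomb to illustrate extreme generalization: in only 47 years and about 9 key experiments, science went from the observation of radioactivity to a nuclear weapon. That progress relied not on big data but on symbolic compression, distilling a few deliberately collected data points into causal symbolic rules that fit on a single page. The core point is that by reverse-engineering the causal logic behind the data, humans can turn minimal information into a complete plan for reshaping reality, which shows the decisive role of symbolic reasoning in pushing past cognitive limits.

François Chollet @fchollet · Apr 6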

Tutorial on fine tuning Gemma on TPU v5 using Kinetic + Keras + JAX. Easiest stack to fully leverage TPUs at scale.

Translation: A tutorial on fine-tuning Gemma on TPU v5 using Kinetic + Keras + JAX.

François Chollet @fchollet · Apr 4

Good tutorial on using Keras Kinetic to fine-tune LLMs on the Keras + JAX + TPU stack!

Translation: A good tutorial on using Keras Kinetic to fine-tune LLMs on the Keras + JAX + TPU stack!

Nathan Lambert @natolambert · Apr 4

People are too obsessed with benchmarks for open models. The core determining factor of success often is: 1. Immediate & long term tooling support. 2. Finetunability. Tbh Gemma has struggled here in the past. Qwen has excelled at it. It's where the winners are crowned.

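Translation: What decides an open model's success is not benchmark scores but immediate and long-term tooling support and finetunability. Gemma has struggled on these in the past while Qwen has excelled, and that is what decides the winners.

François Chollet @fchollet · Apr 4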

Perhaps the craziest thing that was introduced on the Keras community call today: Keras Kinetic, a new library that lets you run jobs on cloud TPU/GPU via a simple decorator -- like Modal but with TPU support. When you call a decorated function, Kinetic handles the entire remote execution pipeline: - Packages your function, local code, and data dependencies - Builds a container with your dependencies via Cloud Build (cached after first build) - Runs the job on a GKE cluster with the requested accelerator (TPU or GPU) - Returns the result to your local machine (logs are streamed in real time, and the function's return value is delivered back as if it ran locally)

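Translation: The Keras community has released the Kinetic library, which lets developers run functions on cloud TPU/GPU through a decorator, similar to Modal but with TPU support. It handles code packaging, container builds via Cloud Build (with caching), scheduling on a GKE cluster, and returning the result, and it streams logs in real time so remote execution feels like running locally.

karminski-牙医 @karminski3 · Mar 30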

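No, that's not it. The point is not to have the LLM simulate a database, but to have it write code from scratch to implement a high-performance vector database. It mainly tests the LLM's programming ability across computer architecture, databases, index performance tuning, agents, and so on. I'm still editing the video and will post the detailed evaluation shortly. The benchmark framework repo is open source: https://github.com/KCORES/vector-db-bench

Translation: The developer clarifies that the test does not ask the LLM to simulate a database; it asks it to write a high-performance vector database from scratch, testing programming ability in architecture, databases, index performance tuning, and agents. The benchmark framework vector-db-bench is open source, and a detailed evaluation video is coming soon.

Deedy @deedydas · Mar 30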

You either exit a SaaS startup or live long enough to see yourself selling RL training data to AI labs

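Translation: A SaaS startup faces one of two fates: exit successfully through acquisition or IPO, or pivot under survival pressure to selling RL training data to AI labs. The joke captures the awkward position of traditional software companies in the AI era.

Anthropic @AnthropicAI · Mar 25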

New from the Anthropic Economic Index: how people’s use of Claude changes with experience. Longer-term users are more likely to iterate carefully with Claude, and less likely to hand it full autonomy. They attempt higher-value tasks, and receive more successful responses.

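Translation: The Anthropic Economic Index shows that as users gain experience, they are more likely to iterate carefully with Claude than to hand it full autonomy, and they attempt higher-value tasks and receive more successful responses.

Jim Fan @DrJimFan · Mar 24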

Teleop is so 2025. Ever since we unveiled EgoScale and the dexterity scaling law, it's been clear to us and the ecosystem that behavior cloning directly from humans is the way to break the curse of teleop. 2026 is all about scaling robot learning without robots.

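Translation: The EgoVerse ecosystem is officially released: built jointly by 4 research labs and 3 industry partners, it rests on 1,300+ hours of first-person human video covering 240 scenes and 2,000+ tasks. Training scales without real robots by behavior cloning directly from human data, and in 2026 robot learning will leave teleoperation behind.

Epoch AI @EpochAIResearch · Mar 24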

How do AI companies allocate their R&D compute? @datagenproc and @cherylwoooo estimate that across OpenAI, MiniMax, and Z.ai, less than 30% of R&D compute spending goes to final training runs.

Translation: @datagenproc and @cherylwoooo estimate that less than 30% of R&D compute spending at OpenAI, MiniMax, and Z.ai goes to final training runs; most of the rest goes to experiments, iteration, and architecture search.

Deedy @deedydas · Mar 23

Be careful what you post anonymously. New research shows AI can find who you are solely from your posts. It's rare to see ~500x research improvements, but they went from mapping <0.1% to 54% of HackerNews profiles to their LinkedIn. It's so over, u/throwaway4927.

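Translation: New research improves AI de-anonymization by about 500x: the rate at which HackerNews users are matched to LinkedIn identities from their text alone rose from under 0.1% to 54%. Anonymous throwaway accounts (such as u/throwaway4927) are at risk of exposure.

Lilian Weng @lilianweng · Mar 11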

Building technologies for better human-AI collaboration on next gen hardware at scale. Exciting.

Translation: Building technologies for better human-AI collaboration on next-generation hardware at scale. Exciting.

Noam Brown @polynoamial · Mar 11

The recipe behind today’s frontier reasoning models is surprisingly similar to AlphaGo: 1) Imitate large amounts of human data 2) Scale inference compute to reason better (back then it was Monte Carlo Tree Search, today it's Chain of Thought) 3) Use RL to go beyond imitation

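Translation: The recipe behind today's frontier reasoning models closely matches AlphaGo: first imitate large amounts of human data, then scale inference compute (from Monte Carlo Tree Search to Chain of Thought), and finally use reinforcement learning to go beyond imitation. Demis Hassabis says that AlphaGo's "Move 37" ten years ago signaled that AI could tackle real scientific problems, and that these ideas remain essential to building AGI.

Saining Xie @sainingxie · Mar 5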

another scientific exploration from @TongPetersb, @DavidJFan, and @__JohnNguyen__ that might teach you something new, even if you’re in a frontier lab. lots of interesting observations here, but I’ll highlight just one: - it’s kind of an open industry secret that trying to scale DiTs with MoE has mostly been fruitless. - the unexpected, yet intuitive, synergy between RAE and MoE might actually change that.

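Translation: Another scientific exploration from @TongPetersb, @DavidJFan, and @__JohnNguyen__ that may teach you something new, even if you work in a frontier lab. There are many interesting observations here, but I will highlight just one: - It is something of an open industry secret that trying to scale DiTs with MoE has mostly been fruitless. - The unexpected yet intuitive synergy between RAE and MoE might actually change that. [Quoting @TongPetersb]: Training beyond language. We are betting on the visual world as the key next step, alongside and beyond language modeling. So we studied building foundation models from scratch with vision. We share our exploration: visual representations, data, world modeling, architecture, and scaling behavior! [1/9]

Jim Fan @DrJimFan · Feb 26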

We trained a humanoid with 22-DoF dexterous hands to assemble model cars, operate syringes, sort poker cards, fold/roll shirts, all learned primarily from 20,000+ hours of egocentric human video with no robot in the loop. Humans are the most scalable embodiment on the planet. We discovered a near-perfect log-linear scaling law (R² = 0.998) between human video volume and action prediction loss, and this loss directly predicts real-robot success rate. Humanoid robots will be the end game, because they are the practical form factor with minimal embodiment gap from humans. Call it the Bitter Lesson of robot hardware: the kinematic similarity lets us simply retarget human finger motion onto dexterous robot hand joints. No learned embeddings, no fancy transfer algorithms needed. Relative wrist motion + retargeted 22-DoF finger actions serve as a unified action space that carries through from pre-training to robot execution. Our recipe is called "EgoScale": - Pre-train GR00T N1.5 on 20K hours of human video, mid-train with only 4 hours (!) of robot play data with Sharpa hands. 54% gains over training from scratch across 5 highly dexterous tasks. - Most surprising result: a *single* teleop demo is sufficient to learn a never-before-seen task. Our recipe enables extreme data efficiency. - Although we pre-train in 22-DoF hand joint space, the policy transfers to a Unitree G1 with 7-DoF tri-finger hands. 30%+ gains over training on G1 data alone. The scalable path to robot dexterity was never more robots. It was always us. Deep dives in thread:

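Translation: The team proposes EgoScale: GR00T N1.5 is pre-trained on 20,000 hours of first-person human video, after which only 4 hours of robot data are needed to handle highly dexterous tasks such as assembling model cars and operating syringes, a 54% gain over training from scratch. They find a log-linear scaling relationship (R² = 0.998) between human video volume and action prediction loss. The method exploits the kinematic similarity between 22-DoF hands and human hands, so motion can be retargeted without complex transfer algorithms. The policy transfers across hardware to a Unitree G1 (7-DoF) with gains of over 30%, and a single demonstration is enough to learn a new task.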
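
A log-linear scaling law of the kind quoted here has the form loss = a + b·ln(hours), and R² measures how much of the variance in loss that line explains. The sketch below fits that form by ordinary least squares. The data points are synthetic and generated from an exact line purely so the example runs; they are not EgoScale's measurements, and the coefficients are invented.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(
    hours: Sequence[float], losses: Sequence[float]
) -> Tuple[float, float, float]:
    """Least-squares fit of loss = a + b * ln(hours). Returns (a, b, r2)."""
    xs = [math.log(h) for h in hours]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(losses) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, losses))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, losses))
    ss_tot = sum((y - mean_y) ** 2 for y in losses)
    r2 = 1.0 - ss_res / ss_tot
    return a, b, r2


# Synthetic illustration only (assumption): points on the line
# loss = 1.0 - 0.05 * ln(hours), so the fit should recover b close to -0.05.
hours = [1000.0, 2000.0, 5000.0, 10000.0, 20000.0]
losses = [1.0 - 0.05 * math.log(h) for h in hours]
a, b, r2 = fit_log_linear(hours, losses)
```

A negative `b` means each doubling of video hours lowers the loss by the same fixed amount, `-b * ln(2)`. That constant return per doubling is what makes such a fit useful for deciding how much more data is worth collecting.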

Saining Xie @sainingxie · Jan 24

love this teaser lol (and it is real). academia boxed us in sooo tightly that we nearly broke, but we clawed our way out and found a whole new universe on the other side😅 thank you to Google for supporting the gpu-poor rebels and pulling us into this ride, helping us build what I believe is one of the best tpu/gcp infrastructure teams outside of google

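Translation: Love this teaser, haha (and it is real). Academia boxed us in so tightly that we nearly broke, but we clawed our way out and found a whole new universe on the other side 😅 Thanks to Google for supporting us GPU-poor rebels and pulling us into this ride, helping us build what I believe is one of the best TPU/GCP infrastructure teams outside of Google. [Quoting @TongPetersb]: We have been training on TPUs in academia for two years (many thanks to Google TRC!). Work like Cambrian-1, Cambrian-S, RAE, and Scale-RAE would not have been possible without TPUs. We wrote a blog post sharing our experience, optimizations, and lessons: https://cambrian-mllm.github.io/blog/tpu-training-experiments.html We hope it helps more people use TPUs more smoothly. They are very powerful!

Saining Xie @sainingxie · Dec 16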

new paper: iREPA. diffusion models are a renderer of their underlying representations. with this new setup, we can gain much clearer insight into what those representations are really about. Jas took on a spontaneous quest, and over the past three months we have learned so much. ps. this is also our little experiment in a new kind of online water cooler effect that I loved seeing. let’s argue, discuss, and then turn it into proper science with real effort

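Translation: New paper: iREPA. Diffusion models are a renderer of their underlying representations. With this new setup, we can gain much clearer insight into what those representations are really about. Jas took on a spontaneous quest, and over the past three months we have learned a great deal. PS: this is also our small experiment in a new kind of online "water cooler effect", which I loved seeing. Let's argue, discuss, and then turn it into proper science with real effort. [Quoting @1jaskiratsingh]: ‼️ Representation matters for generation! But it turns out our understanding of how representation helps generation has been wrong all along ‼️ What we used to think (we were wrong): ❌ Larger vision encoder → better representation → better generation ❌ Better global semantics → better representation → better generation. What we found: 🤯 For representation alignment, vision encoders more than 20x smaller can match or beat larger models 🤯 A vision encoder with about 20% linear probing accuracy (a measure of global semantics) can beat encoders with >80% accuracy 🤯 Even classical features such as SIFT and HOG give gains comparable to much larger modern vision encoders ‼️ 🚨 Introducing: What matters for representation alignment, global information or spatial structure? 🚨 TL;DR: ✅ Better global semantic information ≠ better generation ✅ Spatial structure (not global semantics) drives the generation performance of a representation ✅ We propose iREPA: just 3 lines of code that emphasize spatial structure transfer and consistently improve convergence speed across REPA, REPA-E, Meanflow, JiT, and other methods. An exciting project at @AdobeResearch, with @xingjian_leng, @zongze_wu, @LiangZheng_06, @rzhang88, @elishechtman, and @sainingxie 🙏 It was also an especially fun and unusual experience for me: at every step of the project we kept proving our own biases wrong 😆 Big thanks as well to @YouJiacheng, @ShumingHu, and @gallabytes, whose comments on X started this line of exploration 🫡 Paper: https://arxiv.org/abs/2512.10794 Code: https://github.com/End2End-Diffusion/iREPA Project page: https://end2end-diffusion.github.io/irepa More details in the thread: [1/n] 🧵

Ilya Sutskever @ilyasut · Nov 28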

One point I made that didn’t come across: - Scaling the current thing will keep leading to improvements. In particular, it won’t stall. - But something important will continue to be missing.

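Translation: One point I made that did not come across: - Scaling the current thing will keep leading to improvements. In particular, it won't stall. - But something important will continue to be missing. [Quoting @haider1]: Key takeaways from today's ilya sutskever podcast: - Superintelligence within 5-20 years - Current scaling will stall badly; we are back to real research - Superintelligence = a super-fast continual learner, not a finished oracle - Models generalize 100x worse than humans, the biggest blocker to AGI - A completely new ML paradigm is needed (I have ideas, can't share them now) - AI's impact will be dramatic, but only after economic diffusion - Historically, breakthroughs needed almost no compute - SSI has enough focused research compute to win - Current RL already consumes more compute than pre-training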