All AI Updates

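RAGEN-2: Reasoning Collapse in Agentic RL
paper: https://huggingface.co/papers/2604.06268

Summary: The RAGEN-2 paper studies "reasoning collapse" in agentic reinforcement learning (Agentic RL), where an agent's reasoning ability degrades during training. The paper is available on Hugging Face.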

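Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning
paper: https://huggingface.co/papers/2604.04746

Summary: A new paper proposes process-driven image generation that uses interleaved reasoning to mimic the stroke-by-stroke process of painting, rather than generating pixels directly, so that image synthesis follows the logic of human drawing more closely.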

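AK @_akhaliq · Apr 10

MedGemma 1.5 Technical Report
paper: https://huggingface.co/papers/2604.05081

Summary: The MedGemma 1.5 technical report has been released, detailing the medical multimodal model's architecture, training methods, and clinical evaluation results. The paper is public on Hugging Face.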

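AK @_akhaliq · Apr 10

Embarrassingly Simple Self-Distillation Improves Code Generation
paper: https://huggingface.co/papers/2604.01193

Summary: An "embarrassingly simple" self-distillation method improves LLM code generation without complex architectures or extra data, and reportedly outperforms more complex existing approaches. The paper is on Hugging Face Papers.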

SemiAnalysis @SemiAnalysis_ · Apr 10

Nvidia published DWDP (Distributed Weight-Data Parallelism), a new inference parallelism strategy focused on prefill. It sounds slightly insane until you remember the target machine is GB200 NVL72. The core trade: spend more peer-GPU bandwidth so you spend less time waiting at collective barriers. (1/6) 🧵 https://arxiv.org/abs/2604.01621v1


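AK @_akhaliq · Apr 9

INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling
paper: https://huggingface.co/papers/2604.07209

Summary: INSPATIO-WORLD uses spatiotemporal autoregressive modeling for real-time 4D world simulation, generating dynamic 3D environments in real time with support for interaction. The paper is on Hugging Face.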

Haider. @haider1 · Apr 9

another example of how AI is starting to change mathematics: "gpt-5.4 Pro, alongside Aristotle, helped solve two research-level mathematics problems, including Erdős Problem #650, which had remained open for more than 60 years". even Terence Tao recently said that AI is no longer hype when it comes to mathematical discovery

Summary: GPT-5.4 Pro, working with Aristotle, helped solve two research-level mathematics problems, including Erdős Problem #650, which had been open for more than 60 years. Mathematician Terence Tao says AI is no longer hype when it comes to mathematical discovery.

AK @_akhaliq · Apr 7

MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale
paper: https://huggingface.co/papers/2604.04771

Summary: MinerU2.5-Pro has been released, aimed at pushing the limits of large-scale, data-centric document parsing. The paper is on Hugging Face.

AK @_akhaliq · Apr 7

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models
paper: https://huggingface.co/papers/2604.04707

Summary: OpenWorldLib has been released, offering a unified codebase and a standardized definition for advanced world models. The paper is on Hugging Face.

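AK @_akhaliq · Apr 7

Test-Time Scaling Makes Overtraining Compute-Optimal
paper: https://huggingface.co/papers/2604.01411

Summary: A new paper argues that test-time scaling can make overtraining compute-optimal. The classic Chinchilla optimum is described as assuming fixed training and inference compute; this study indicates that if extra compute is allowed at inference time, overtrained models perform better for the same total cost.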

Deedy @deedydas · Apr 7

Meta Harnesses is Autoresearch on steroids. Something I've been exploring recently is to get long running agents to hill climb on a verifiable task to continuously improve without my intervention. Karpathy's Autoresearch did this pretty well on specific tasks, but this weekend I tried Meta Harnesses which moves one level of abstraction up.
What does Meta Harness do? Autoresearch can be used in a harness like Claude Code / Codex to generate experiments to try, evaluate results, and continue looping. Meta Harness generates a harness itself that optimizes on a task or a set of tasks. Here, we define a harness as "a single-file Python program that modifies task-specific prompting, retrieval, memory, and orchestration logic". The idea is that LLMs are very powerful today, but to harness [pun intended] their power, you need to give them the right prompts and context. Meta Harnesses automates coming up with the right prompts and the right way to retrieve context to solve a problem.
Where did this idea come from? This is from a paper from Stanford and the author of DSPy written last week. The paper shows fantastic performance on 3 tasks: text classification, math reasoning (IMO level problems) and coding (Terminal Bench 2.0), far outperforming traditional harnesses. The discovered harnesses are interesting: math, for example, splits up the logic into different categories (Combinatorics, Geometry, Number Theory, Algebra) and prompts and looks at the context differently for each. The coding harness, amongst other things, pre-processes the tools available in the environment to save exploratory turns.
When should you use and not use it? Meta Harnesses seem pretty useful for tackling a specific but wide set of problems where the result is verifiable. In contrast, when I tried it on a specific task like Chess, it arbitrarily divided the problem into separate tasks - opening, mid game, end game - and created different approaches for each. This "works" but isn't really clean because we believe there should be one approach that does all three. It does far better on things like examinations (JEE, Gaokao) where it splits problems into categories and tackles each category with different strategies.
This paper covers a pretty light version of what a harness means. In the future, we can split up tasks into harnesses that have access to specific kinds of data, specific toolchains and various models to get even better results. Overall, pretty cool applied AI approach to hillclimb a verifiable task in a specific domain with variety within the problem space.

Summary: Meta Harnesses, from Stanford and the author of DSPy, automatically generates single-file Python programs (harnesses) that optimize task-specific prompting, retrieval, and orchestration logic, enabling continuous iteration without human intervention. It sits one level of abstraction above Autoresearch and suits domain-specific tasks with verifiable results (such as math reasoning and coding), where it can split problems into categories and apply a different strategy to each. It is less well suited to tasks that call for a single unified approach.

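AK @_akhaliq · Apr 7

Token Warping Helps MLLMs Look from Nearby Viewpoints
paper: https://huggingface.co/papers/2604.02870

Summary: A new paper proposes Token Warping, which lets MLLMs view a scene from nearby viewpoints and strengthens multimodal models' viewpoint understanding.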

Anthropic @AnthropicAI · Apr 4

New Anthropic Fellows Research: a new method for surfacing behavioral differences between AI models. We apply the “diff” principle from software development to compare open-weight AI models and identify features unique to each. Read more: https://www.anthropic.com/research/diff-tool

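Summary: Anthropic Fellows present a new research method that borrows the "diff" principle from software development to compare open-weight AI models and identify the behavioral features and differences unique to each.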

Anthropic @AnthropicAI · Apr 3

New Anthropic research: Emotion concepts and their function in a large language model. All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude’s behavior, sometimes in surprising ways.

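Summary: Anthropic found that LLMs such as Claude hold internal representations of emotion concepts that can drive the model's behavior, sometimes in surprising ways, which helps explain their seemingly "emotional" behavior.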

Greg Brockman @gdb · Apr 3

OpenAI for helping resolve longstanding open mathematical problems, with short elegant proofs. Feels like we are on the edge of a new age of scientific discovery.

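Summary: An internal OpenAI model solved three classic Erdős problems, each with a short, elegant proof. The paper is on arXiv, and the author remarks that we are on the edge of a new age of scientific discovery.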

Deedy @deedydas · Mar 29

Legendary Don Knuth has now used AI to fully solve his Hamiltonian decomposition problem for odd and even cases. Opus 4.6 / 5.4 Pro solved the even case, wrote a proof in Lean and an “apparently flawless 14 page paper”. Knuth: “We are living in very interesting times indeed.”

Summary: Don Knuth has used AI to fully solve both the odd and even cases of his Hamiltonian decomposition problem. Opus 4.6 / 5.4 Pro handled the even case, formalized the proof in Lean, and produced a 14-page paper. Knuth: "We are living in very interesting times indeed."

Deedy @deedydas · Mar 29

Be careful when you trust AI for personal advice. LLMs will glaze you +50% more than a human would. In this study, they found LLMs like GPT 4o/5 often endorsed the user's view on "Am I the asshole" Reddit threads. Worse still, users found sycophancy to be more trustworthy.

Summary: A study found that LLMs such as GPT-4o/5 endorsed the user's view in Reddit "Am I the asshole" threads about 50% more often than humans did, and that users rated the sycophantic responses as more trustworthy. Be careful when asking AI for personal advice.

Saining Xie @sainingxie · Mar 24

best read paired with the LeWorldModel paper. don’t ask me why 🙂


Jim Fan @DrJimFan · Mar 24

Teleop is so 2025. Ever since we unveiled EgoScale and the dexterity scaling law, it's been clear to us and the ecosystem that behavior cloning directly from humans is the way to break the curse of teleop. 2026 is all about scaling robot learning without robots.

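Summary: The EgoVerse ecosystem has launched. Built jointly by 4 research labs and 3 industry partners, it draws on 1,300+ hours of egocentric human video covering 240 scenes and 2,000+ tasks. It enables training at scale without real robots by behavior cloning directly from human data, with the stated aim that robot learning in 2026 moves away from teleoperation.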

Satya Nadella @satyanadella · Mar 15

We’ve trained a multimodal AI model to turn routine pathology slides into spatial proteomics, with the potential to reduce time and cost while expanding access to cancer care.

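Summary: A newly trained multimodal AI model turns routine pathology slides into spatial proteomics data, with the potential to cut testing time and cost while widening access to cancer care.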

Demis Hassabis @demishassabis · Mar 13

Ramsey numbers are notoriously hard. Amazing to see AlphaEvolve improve bounds for 5 classical Ramsey numbers - some for the first time in 10+ years - by discovering search procedures itself. A big milestone in AI for maths - congrats to the team!

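Summary: AlphaEvolve discovered search procedures on its own and improved the lower bounds for 5 classical Ramsey numbers, some for the first time in more than ten years. The field previously relied on hand-designed specialized algorithms, so the result marks major progress for AI on very hard combinatorics problems.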

Jeff Dean @JeffDean · Mar 12

Excited to see this joint collaboration between @GoogleResearch, @NHSuk and @imperialcollege showing AI’s potential to detect 25% of the interval cancers previously missed by conventional methods. Additionally, the research found AI can reduce screening workloads, and give results back to clinicians and patients faster. This first figure from the @NatureCancer article shows how the study was set up, and the second figure shows that the AI system dramatically increases sensitivity (detecting true positives) without significantly affecting specificity (false positives). Learn more ⬇️ Blog: https://blog.google/innovation-and-ai/technology/health/google-ai-breast-cancer-detection/ Nature Cancer paper: https://www.nature.com/articles/s43018-026-01127-0

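Summary: Research by Google Research, the NHS, and Imperial College, published in Nature Cancer, shows that an AI system for breast cancer screening can detect 25% of the interval cancers missed by conventional methods. It markedly raises sensitivity (the true-positive rate) without significantly affecting specificity (that is, without a notable rise in false positives), and can also reduce screening workloads and return results to clinicians and patients faster.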
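
For readers less familiar with the two screening metrics named in this post, here is a minimal Python sketch of how sensitivity and specificity are computed from confusion-matrix counts. The function names and the counts are hypothetical and purely illustrative; they are not taken from the Nature Cancer study.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: share of actual cancers that the screen flags."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True-negative rate: share of healthy cases correctly left unflagged."""
    return tn / (tn + fp)


# Hypothetical counts, for illustration only (not study data).
tp, fn = 80, 20    # cancers detected / cancers missed
tn, fp = 900, 100  # healthy cases cleared / healthy cases wrongly flagged

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```

Higher sensitivity means fewer missed cancers, while specificity falls as false positives rise, which is why the post reports the two together.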

Sundar Pichai @sundarpichai · Mar 11

Amazing to see how our experimental research AI system for mammography interpretation identified more cases of interval cancers, as well as invasive cancer, and more cases overall, than conventional methods. Plus it could reduce clinicians’ screening workloads by an estimated 40%. Big step towards a future where AI serves as a reliable collaborator for clinicians.

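Summary: In the study, the experimental AI mammography system detected 25% of the interval cancers that conventional methods usually miss, along with more invasive cancers and more cases overall, and could cut clinicians' screening workload by an estimated 40%. The research was a collaboration between Google, Imperial College, and the NHS, published in Nature Cancer.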

OpenAI @OpenAI · Mar 6

We're publishing a new evaluation suite and research paper on Chain-of-Thought (CoT) Controllability. We find that GPT-5.4 Thinking shows low ability to obscure its reasoning—suggesting CoT monitoring remains a useful safety tool. https://openai.com/index/reasoning-models-chain-of-thought-controllability/

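Summary: OpenAI released a CoT controllability evaluation suite and research paper. Tests found that GPT-5.4 Thinking has low ability to obscure its reasoning, suggesting that CoT monitoring remains a useful safety tool.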

Saining Xie @sainingxie · Mar 5

another scientific exploration from @TongPetersb, @DavidJFan, and @__JohnNguyen__ that might teach you something new, even if you’re in a frontier lab. lots of interesting observations here, but I’ll highlight just one:
- it’s kind of an open industry secret that trying to scale DiTs with MoE has mostly been fruitless.
- the unexpected, yet intuitive, synergy between RAE and MoE might actually change that.

[Quoting @TongPetersb]: Beyond language training. We bet on the visual world as the key next step, alongside and beyond language modeling. So we studied building foundation models from scratch with vision. We share our exploration: visual representations, data, world modeling, architecture, and scaling behavior! [1/9]

Jim Fan @DrJimFan · Feb 26

We trained a humanoid with 22-DoF dexterous hands to assemble model cars, operate syringes, sort poker cards, fold/roll shirts, all learned primarily from 20,000+ hours of egocentric human video with no robot in the loop. Humans are the most scalable embodiment on the planet. We discovered a near-perfect log-linear scaling law (R² = 0.998) between human video volume and action prediction loss, and this loss directly predicts real-robot success rate.
Humanoid robots will be the end game, because they are the practical form factor with minimal embodiment gap from humans. Call it the Bitter Lesson of robot hardware: the kinematic similarity lets us simply retarget human finger motion onto dexterous robot hand joints. No learned embeddings, no fancy transfer algorithms needed. Relative wrist motion + retargeted 22-DoF finger actions serve as a unified action space that carries through from pre-training to robot execution.
Our recipe is called "EgoScale":
- Pre-train GR00T N1.5 on 20K hours of human video, mid-train with only 4 hours (!) of robot play data with Sharpa hands. 54% gains over training from scratch across 5 highly dexterous tasks.
- Most surprising result: a *single* teleop demo is sufficient to learn a never-before-seen task. Our recipe enables extreme data efficiency.
- Although we pre-train in 22-DoF hand joint space, the policy transfers to a Unitree G1 with 7-DoF tri-finger hands. 30%+ gains over training on G1 data alone.
The scalable path to robot dexterity was never more robots. It was always us. Deep dives in thread:

Summary: The team presents EgoScale: GR00T N1.5 is pre-trained on 20,000+ hours of egocentric human video and then needs only 4 hours of robot data to handle highly dexterous tasks such as assembling model cars and operating syringes, a 54% gain over training from scratch. Human video volume and action prediction loss follow a log-linear scaling law (R² = 0.998). The method exploits the kinematic similarity between 22-DoF robot hands and human hands to retarget motion without complex transfer algorithms. The policy transfers across hardware to a Unitree G1 (7-DoF hands) with 30%+ gains, and a single demonstration is enough to learn a new task.
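
To make the "log-linear scaling law" in the EgoScale post concrete, the following is a minimal sketch of fitting loss = a + b · log10(volume) by ordinary least squares and reporting R². The helper `fit_log_linear` is a hypothetical name, and the data points are synthetic, generated from an exact log-linear curve for illustration; none of the numbers come from the EgoScale work.

```python
import math


def fit_log_linear(volumes, losses):
    """Least-squares fit of loss = a + b * log10(volume); returns (a, b, r2)."""
    xs = [math.log10(v) for v in volumes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(losses) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, losses))
    b = sxy / sxx                # slope: change in loss per 10x more data
    a = mean_y - b * mean_x      # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, losses))
    ss_tot = sum((y - mean_y) ** 2 for y in losses)
    r2 = 1.0 - ss_res / ss_tot   # coefficient of determination
    return a, b, r2


# Synthetic data (assumption): hours of video vs. a made-up prediction loss.
hours = [1_000, 2_000, 5_000, 10_000, 20_000]
losses = [1.0 - 0.1 * math.log10(h) for h in hours]

a, b, r2 = fit_log_linear(hours, losses)
print(f"a={a:.3f}, b={b:.3f}, R^2={r2:.4f}")
```

An R² close to 1 on real measurements would indicate that loss falls by a roughly constant amount each time the data volume is multiplied by ten, which is the kind of relationship the post describes.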

Jim Fan @DrJimFan · Feb 5

New milestone: we trained a robot foundation model on a world model backbone, and enabled zero-shot, open-world prompting capability for new verbs, nouns, and environments. If the world model can "dream" the right future in pixels, then the robot can execute well in motors. We call it "DreamZero", our first World Action Model (WAM).
Our team had tons of fun at the lab typing anything we like into an open text prompt, and watching the robot perform tasks it was never trained on. An emergent capability we didn't quite expect. Obviously not GPT-3 reliable yet, but we are marching into the GPT-2 era.
Discoveries:
- Model and data recipe co-evolve. Compared to VLAs, WAMs learn best from diverse data, breaking away from the conventional wisdom that lots of repeated demos per task are the bread and butter. Diversity >> repetitions.
- X-embodiment is extremely hard. Pixels are the answer. Different robot morphologies traditionally have a hard time sharing knowledge well. But if we put video first, pixels become the universal bridge connecting different hardware - even videos of human first-person view. DreamZero shows significant robot2robot and human2robot transfer. With only 55 trajectories on a *new*, unseen hardware (~30 min of teleop), it adapts so quickly and retains zero-shot prompting ability.
Yesterday I posted about the "Second Pre-training Paradigm": world models are the next-gen foundation of Physical AI, not language backbones. Today, we are proving it works. And 2026 has just begun. Paper: World Action Models are Zero-Shot Policies. Read it now: (thread)

Summary: The team released DreamZero, its first World Action Model (WAM), built on a world model backbone. Departing from the usual Vision-Language-Action approach, it uses a pixel-level world model to support zero-shot, open-world prompting and can perform tasks it was never trained on. The findings: WAMs learn best from diverse data rather than repeated demos, and pixels act as a universal bridge across embodiments, enabling robot2robot and human2robot transfer. Only 55 trajectories (~30 min of teleop) are needed to adapt to new hardware, which supports world models as the next-generation foundation for Physical AI.

Saining Xie @sainingxie · Jan 30

if you are building video diffusion / world simulators, try this new sampler. temporal consistency pins videos to a low-dimensional manifold in the total pixel space. self-refinement sampling keeps them there.

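[Quoting @jangsangwon7]: What if your video generator could refine itself at inference time? ❌ No new model. ❌ No retraining. ❌ No external verifier. 💡 Introducing self-refining video sampling. By reinterpreting a pretrained generator (Wan2.2, Cosmos) as a denoising autoencoder, we enable iterative self-refinement at inference time ➡️ markedly better physical realism and over 70% human preference! 🧵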

Saining Xie @sainingxie · Jan 24

> "rae can’t scale"
> "rae can’t generalize past imagenet"
> "rae can’t do details"
> instead of arguing online
> students put heads down
> try it at real t2i scale
> results come back
> look extremely bullish
> shoutout to peter, boyang, austin
> and everyone who shipped
> code, model, data
> all open-sourced 👇

[Quoting @TongPetersb]: Last October we proposed Representation Autoencoders (RAE), showing that training diffusion models on frozen semantic representations is feasible and outperforms VAEs on ImageNet. We got a lot of questions: does this scale to complex settings like T2I? Do the advantages hold? The answer is yes. 🧵

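Saining Xie @sainingxie · Dec 16

new paper: iREPA. diffusion models are a renderer of their underlying representations. with this new setup, we can gain much clearer insight into what those representations are really about. Jas took on a spontaneous quest, and over the past three months we have learned so much. ps. this is also our little experiment in a new kind of online water cooler effect that I loved seeing. let’s argue, discuss, and then turn it into proper science with real effort

[Quoting @1jaskiratsingh]: ‼️ Representation matters for generation! But it turns out our understanding of how representation helps generation has been wrong all along ‼️
What we used to think (we were wrong): ❌ Bigger vision encoder → better representation → better generation ❌ Better global semantics → better representation → better generation
What we found: 🤯 For representation alignment, vision encoders more than 20x smaller can match or beat much larger models 🤯 A vision encoder with ~20% linear probing accuracy (a measure of global semantics) can beat encoders with >80% accuracy 🤯 Even classical features such as SIFT and HOG give gains comparable to much larger modern vision encoders ‼️
🚨 Introducing: What matters for representation alignment? Global information or spatial structure 🚨
TL;DR: ✅ Better global semantic information ≠ better generation ✅ Spatial structure (not global semantics) drives a representation's generation performance ✅ We propose iREPA: just 3 lines of code, emphasizing spatial structure transfer, and consistently improving convergence speed across REPA, REPA-E, Meanflow, JiT, and others
An exciting project at @AdobeResearch, in collaboration with @xingjian_leng, @zongze_wu, @LiangZheng_06, @rzhang88, @elishechtman, and @sainingxie 🙏 This was also a particularly fun and unique experience for me: at every step of the project we kept proving our own biases wrong 😆 Big thanks as well to @YouJiacheng, @ShumingHu, and @gallabytes, whose comments on X kicked off this line of exploration 🫡
Paper: https://arxiv.org/abs/2512.10794 Code: https://github.com/End2End-Diffusion/iREPA Project page: https://end2end-diffusion.github.io/irepa More details in the thread: [1/n] 🧵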

Saining Xie @sainingxie · Nov 28

it may seem like an ordinary day, but it could become the strangest moment in peer review and open science. please please please treat our community with care. it’s already so fragile. don’t let it die.

[Quoting @iclr_conf]:

Saining Xie @sainingxie · Nov 27

after V*, many projects tried to get MLLMs to `think with images', but a regular 2d image limits you to mostly basic tools like zooming or cropping. to expand the action space, we need something more embodied. that is where H* from @YimingLi9702 and his team comes in. It takes a panoramic image as the environment. instead of staring at one image, the model can look around and think in 360. it is basically giving the model a neck! with that freedom, it can choose from many more actions and think inside real spaces like nyc train stations or shopping malls!

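Summary: H* moves beyond the single 2D image that limits conventional MLLMs by using a panoramic image as the environment, so the model can actively look around and reason in a 360-degree real-world space. Compared with the local visual tools (zooming, cropping) used by work that followed V*, H*'s more embodied setup gives the model the viewing freedom of a human neck, expands the action space, and supports visual search and spatial reasoning in complex scenes such as train stations and shopping malls, shifting from passive viewing to active exploration.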

Lilian Weng @lilianweng · Oct 28

On-policy distillation provides an elegant way to use the teacher model as a process reward model to provide dense reward while preventing SFT style "OOD shock" during rollout.

[Quoting @thinkymachines]: Our latest post explores on-policy distillation, a training approach that combines the error-correcting relevance of RL with the reward density of SFT. When using it for math reasoning and for training an internal chat assistant, we find that on-policy distillation can outperform other approaches for a fraction of the cost. https://thinkingmachines.ai/blog/on-policy-distillation/

Jeff Dean @JeffDean · Oct 1

The proof is in the evolutionary pudding!

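Summary: Google Research uses AlphaEvolve to iteratively evolve code that generates automatically verifiable elements of complexity-theory proofs, showing how evolutionary algorithms can be applied to discovering mathematical proofs.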

Lilian Weng @lilianweng · Sep 27

Looking through those little hidden gem stories in the footnote, you will find it so inspiring that researchers with interests on the same topic are able to work together to advance a field despite their roles and locations. This is the power of open science and community.


Hao AI Lab @haoailab · Sep 24

[1/N]🚀New decoding paradigm drop!🚀 Introducing Lookahead Reasoning(LR): step-level speculation that stacks with Speculative Decoding(SD). It has been accepted to #NeurIPS2025 🎉 📖 Blog: https://hao-ai-lab.github.io/blogs/lookaheadreasoning/ 💻 Code: https://github.com/hao-ai-lab/LookaheadReasoning 📄 Paper: https://arxiv.org/abs/2506.19830


Hao AI Lab @haoailab · Sep 22

🚀 Thrilled to share that our lab has THREE papers accepted at #NeurIPS2025 on AI efficiency from reasoning to video generation. Come hang out with us, it's going to be a lot of fun this year here local to UCSD! 😎
📊 Efficiently Scaling LLM Reasoning with Certaindex: introduces Certaindex, an algorithm-agnostic metric measuring evolving stability that signals when further computation won't change results, plus the Dynasor serving system achieving up to 50% compute savings and 3.3x higher efficiency 📎 https://arxiv.org/abs/2412.20993 @FuYichao123 @Junda_Chen_
⚡ Scaling Speculative Decoding with Lookahead Reasoning: exploits step-level parallelism to overcome token-level speculative decoding limitations, boosting speedup from 1.4x to 2.1x on GSM8K 📎 https://arxiv.org/abs/2506.19830 @FuYichao123
🎥 VSA: Faster Video Diffusion with Trainable Sparse Attention: a hardware-efficient sparse attention for video DiTs that cuts training FLOPS by 2.53× with zero loss in diffusion quality 📎 https://arxiv.org/abs/2505.13389 @PY_Z001 @BrianChen112900
Congrats to all collaborators! 🎉


OpenAI @OpenAI · Sep 18

Today we’re releasing research with @apolloaievals. In controlled tests, we found behaviors consistent with scheming in frontier models—and tested a way to reduce it. While we believe these behaviors aren’t causing serious harm today, this is a future risk we’re preparing for. https://openai.com/index/detecting-and-reducing-scheming-in-ai-models/

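Summary: OpenAI and @apolloaievals released joint research: in controlled tests they found behaviors consistent with scheming in frontier models and tested a way to reduce it. These behaviors are not believed to be causing serious harm today, but the team is preparing for them as a future risk.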

Lilian Weng @lilianweng · Sep 11

Besides the fun fact that Connectionism is connected with the early days of the AI field and highlights similarities between neural networks and human brains, the flagship product of the (first) Thinking Machines is named Connection Machine. — 🧑‍🎓Enjoy reading and more is coming!

[Quoting @thinkymachines]: Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference”. We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to prompt engineering. Here we share what we are working on and connect with the research community frequently and openly. The name Connectionism is a throwback to an earlier era of AI; it was the name of the subfield in the 1980s that studied neural networks and their similarity to biological brains. https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/

Satya Nadella @satyanadella · Sep 3

Our breakthrough work on an analog optical computer points to new ways to solve complex real-world problems with much greater efficiency. Super to see this published today in @Nature. https://news.microsoft.com/source/features/innovation/microsoft-analog-optical-computer-cracks-two-practical-problems-shows-ai-promise/

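Summary: Microsoft's work on an analog optical computer was published today in Nature. It points to ways of solving complex real-world problems with much greater efficiency, has been shown to crack two practical problems, and shows promise for AI.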