AIHOT
All updates · X · 605 posts
Tag: Papers/Research
Jim Fan@DrJimFan · Aug 5

World modeling for robotics is incredibly hard because (1) control of humanoid robots & 5-finger hands is wayyy harder than ⬆️⬅️⬇️➡️ in games (Genie 3); and (2) object interaction is much more diverse than FSD, which needs to *avoid* coming into contact. Our GR00T Dreams work was a first attempt at building high-fidelity world simulator for humanoid robots. It's not only for evaluation but also for large-scale synthetic data generation. Time to move away from the "fossil fuel" of robotics (human teleoperation) and embrace clean energy (nuclear "diffusion")! GR00T Dreams kind of flew under the radar, so bringing it back to life on a cheerful day ;)

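Translated summary: Jim Fan revisits NVIDIA's DreamGen engine (GR00T Dreams), which uses video generation models such as Sora and Veo as neural physics engines and produces large-scale synthetic robot training data in four steps: fine-tune a video model, simulate parallel worlds, recover pseudo-actions, and train robot foundation models. Starting from a single pick-and-place task, a humanoid robot learns 22 new behaviors such as pouring and folding, and generalizes zero-shot to novel verbs and unseen environments (success rates of over 43% and 28%, respectively, up from 0%). Unlike a traditional graphics engine, the method handles complex interactions such as deformable objects and fluids at constant compute cost. In May the team said it planned to fully open-source the pipeline within a few weeks.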

Saining Xie@sainingxie · Jul 11

The three biggest hps for stable training in everything are lr, bs, and beta2. We’ve built up good intuitions on how to tune them over time, but this lays it all out analytically and convincingly. this is definitely my new handbook for training big models on small gpus.

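Translated summary: The three most important hyperparameters for stable training in every task are lr, bs, and beta2. Over time we have built good intuitions for tuning them, but this work lays it all out analytically and convincingly.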
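The post names beta2 alongside lr and bs but does not say how they interact. One way to reason about it, offered here as an assumption rather than as the linked work's prescription: beta2 sets the decay of Adam's second-moment EMA, whose half-life is ln(0.5)/ln(beta2) optimizer steps. If that half-life should cover the same number of training examples after a batch-size change, beta2 must move with the batch size, giving beta2_new = beta2 ** (new_bs / old_bs). The function names below are hypothetical.

```python
import math


def ema_half_life_steps(beta2: float) -> float:
    """Steps until a squared gradient's weight in Adam's second-moment EMA
    (v_t = beta2 * v_{t-1} + (1 - beta2) * g_t**2) falls to one half."""
    return math.log(0.5) / math.log(beta2)


def rescale_beta2(beta2: float, old_bs: int, new_bs: int) -> float:
    """Hypothetical heuristic: keep the EMA half-life fixed in *examples*.

    Half-life in examples = ema_half_life_steps(beta2) * batch_size.
    Setting it equal before and after the change gives
    log(beta2_new) = log(beta2) * new_bs / old_bs.
    """
    return beta2 ** (new_bs / old_bs)


base_beta2, base_bs = 0.95, 512
for bs in (512, 64, 8, 1):
    b2 = rescale_beta2(base_beta2, base_bs, bs)
    examples = ema_half_life_steps(b2) * bs
    print(f"bs={bs:4d}  beta2={b2:.6f}  half-life ~ {examples:.0f} examples")
```

Shrinking the batch pushes beta2 toward 1, so a small-batch run still averages squared gradients over roughly as many examples as the large-batch run did.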

Saining Xie@sainingxie · Jul 1

awesome work by @jiacheng_chen_ and @sanghyunwoo1219 on 3D-grounded visual compositing (and nice demos!)

Translated summary: Great work by @jiacheng_chen_ and @sanghyunwoo1219 on 3D-grounded visual compositing (and great demos too!)

Jim Fan@DrJimFan · May 20

What if robots could dream inside a video generative model? Introducing DreamGen, a new engine that scales up robot learning not with fleets of human operators, but with digital dreams in pixels. DreamGen produces massive volumes of neural trajectories - photorealistic robot videos paired with motor action labels - and unlocks strong generalization to new nouns, verbs, and environments. Whether you’re a humanoid (GR1), an industrial arm (Franka), or a cute little robot (HuggingFace SO-100), DreamGen enables you to dream.
Video generation models like Sora & Veo are neural physics engines. By compressing billions of internet videos, they learn a multiverse of plausible futures, i.e. superpositions of how the world could unfold from any initial image frame. DreamGen taps into this power with a simple 4-step recipe:
1. Fine-tune a SOTA video model on your target robot;
2. Prompt the model with diverse language prompts to simulate parallel worlds: how your robot would have acted in new scenarios. Filter out the bad dreams (ha!) that don’t follow instructions;
3. Recover pseudo-actions using inverse dynamics or latent action models;
4. Train robot foundation models on the massively augmented dataset of neural trajectories.
That’s it. Just more data, and plain old supervised learning. Simple, right?
What’s remarkable is how far this goes. Starting with just a single-task dataset of pick-and-place, our humanoid robot learns 22 new behaviors, such as pouring, folding, scooping, ironing, and hammering, despite never seeing those verbs before. Better yet, we can take the robot out of the lab and drop it into the NVIDIA HQ Cafe, and let DreamGen work its magic. We show true zero-to-one generalization: from 0% success to over 43% for novel verbs, and 0 -> 28% in unseen environments.
Compared to a traditional graphics engine, DreamGen doesn’t care if the scene involves deformable objects, fluids, translucent materials, contact-rich interactions, or crazy lighting. Good luck engineering those by hand. For DreamGen, every world is just a forward pass through a diffusion neural net. No matter how complex the dream is, it takes constant compute time to roll out. Read our blog and paper today! We plan to fully open-source the entire pipeline in the next few weeks. Links in thread:

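Translated summary: DreamGen lets robots "dream" synthetic training data inside video generation models. Fine-tuning a state-of-the-art video model yields massive numbers of neural trajectories (photorealistic robot videos paired with action labels), and with them a robot generalizes from a single pick-and-place task to 22 new behaviors such as pouring and folding. Taken out of the lab (for example, into the NVIDIA HQ cafe), the humanoid's zero-shot success rate rose from 0% to over 43% on novel verbs and from 0% to 28% in unseen environments. Unlike a traditional graphics engine, DreamGen handles complex scenes with fluids or deformable objects without hand-built modeling, and the entire pipeline is slated to be fully open-sourced soon.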
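Step 3 of the recipe, recovering pseudo-actions with an inverse dynamics model (IDM), is what turns action-free "dreamed" videos into supervised training pairs. The sketch below is a deliberately small stand-in, not NVIDIA's pipeline: it assumes low-dimensional states instead of video frames, hypothetical linear dynamics s_{t+1} = A s_t + B a_t used only to fabricate toy data, and a linear least-squares IDM fitted on a small action-labeled set. It then labels a held-out trajectory whose actions the IDM never sees. All names (`rollout`, `fit_inverse_dynamics`, `label_dream`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 6, 3

# Hypothetical toy dynamics s_{t+1} = A s_t + B a_t, used only to generate data.
# A real "dream" is a generated video; here a state vector stands in for a frame.
A = rng.normal(scale=0.2, size=(STATE_DIM, STATE_DIM))
B = rng.normal(size=(STATE_DIM, ACTION_DIM))


def rollout(s0: np.ndarray, actions: np.ndarray) -> np.ndarray:
    """Return the (T+1, STATE_DIM) state sequence produced by `actions`."""
    states = [s0]
    for a in actions:
        states.append(A @ states[-1] + B @ a)
    return np.stack(states)


def transition_features(states: np.ndarray) -> np.ndarray:
    """Stack consecutive states into rows [s_t, s_{t+1}]."""
    return np.concatenate([states[:-1], states[1:]], axis=1)


def fit_inverse_dynamics(trajectories) -> np.ndarray:
    """Least-squares linear IDM: a_t ~= [s_t, s_{t+1}] @ W."""
    X = np.concatenate([transition_features(s) for s, _ in trajectories])
    Y = np.concatenate([a for _, a in trajectories])
    # rcond drops directions that are collinear because s_{t+1} is itself a
    # linear function of (s_t, a_t) in this toy setup.
    W, *_ = np.linalg.lstsq(X, Y, rcond=1e-8)
    return W


def label_dream(W: np.ndarray, dream_states: np.ndarray) -> np.ndarray:
    """Attach pseudo-actions to an action-free ("dreamed") trajectory."""
    return transition_features(dream_states) @ W


# Small action-labeled dataset (stand-in for real teleoperation data).
labeled = []
for _ in range(5):
    acts = rng.normal(size=(40, ACTION_DIM))
    labeled.append((rollout(rng.normal(size=STATE_DIM), acts), acts))
W = fit_inverse_dynamics(labeled)

# A "dream": only states are observed; the true actions stay hidden.
hidden_actions = rng.normal(size=(30, ACTION_DIM))
dream_states = rollout(rng.normal(size=STATE_DIM), hidden_actions)
pseudo_actions = label_dream(W, dream_states)
print("max abs error:", np.abs(pseudo_actions - hidden_actions).max())
```

Because the toy dynamics are linear and B has full column rank, the recovered actions match the hidden ones up to floating-point error. A learned IDM on real video would only approximate them, and the resulting (state, pseudo-action) pairs would then feed the plain supervised learning of step 4.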

DeepSeek@deepseek_ai · Feb 18

🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference!
Core components of NSA:
• Dynamic hierarchical sparse strategy
• Coarse-grained token compression
• Fine-grained token selection
💡 With optimized design for modern hardware, NSA speeds up inference while reducing pre-training costs—without compromising performance. It matches or outperforms Full Attention models on general benchmarks, long-context tasks, and instruction-based reasoning.
📖 For more details, check out our paper here: https://arxiv.org/abs/2502.11089

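Translated summary: NSA is a hardware-aligned, natively trainable sparse attention mechanism designed for ultra-fast long-context training and inference. At its core is a dynamic hierarchical sparse strategy that combines coarse-grained token compression with fine-grained token selection. Optimized for modern hardware, NSA speeds up inference and lowers pre-training cost without sacrificing performance, matching or outperforming Full Attention models on general benchmarks, long-context tasks, and instruction-based reasoning.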
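To make "coarse-grained compression plus fine-grained selection" concrete, here is a single-query NumPy sketch. It is a simplification under stated assumptions, not DeepSeek's kernel or exact formulation: keys and values are compressed by mean-pooling fixed-size blocks, the compressed-key scores pick the `top_n` blocks, attention then runs only over the original tokens in those blocks, and the two branch outputs are mixed with fixed gates where a trained model would presumably learn them. The hardware alignment that motivates NSA is out of scope here, and the function names are hypothetical.

```python
import numpy as np


def attend(q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention for one query vector q of shape (d,)."""
    scores = K @ q / np.sqrt(q.shape[0])
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ V


def nsa_like_attention(q, K, V, block=4, top_n=2, gate=(0.5, 0.5)):
    """Toy compression + selection attention (hypothetical simplification)."""
    T, d = K.shape
    assert T % block == 0, "sketch assumes the sequence splits into whole blocks"
    n_blocks = T // block

    # Coarse-grained compression: one mean-pooled key/value per block.
    K_cmp = K.reshape(n_blocks, block, d).mean(axis=1)
    V_cmp = V.reshape(n_blocks, block, V.shape[1]).mean(axis=1)
    coarse = attend(q, K_cmp, V_cmp)

    # Fine-grained selection: rank blocks by their compressed-key score,
    # keep the top_n, and attend over the original tokens inside them.
    block_scores = K_cmp @ q
    chosen = np.sort(np.argsort(block_scores)[-top_n:])
    idx = (chosen[:, None] * block + np.arange(block)).ravel()
    fine = attend(q, K[idx], V[idx])

    return gate[0] * coarse + gate[1] * fine


rng = np.random.default_rng(0)
T, d = 16, 8
q = rng.normal(size=d)
K = rng.normal(size=(T, d))
V = rng.normal(size=(T, d))

out = nsa_like_attention(q, K, V, block=4, top_n=2)
print(out.shape)
```

With `top_n` equal to the number of blocks and all gate weight on the fine branch, the selection branch reduces to ordinary full attention, which makes a handy sanity check. With a smaller `top_n`, the fine branch reads only `top_n * block` keys per query.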

All AI updates
Full feed of AI-related news
Aug 5
23:57
Jim Fan@DrJimFan
Featured
NVIDIA's DreamGen engine: letting robots "dream" and learn inside video generative models

Jim Fan: What if robots could dream inside a video generative model? Introducing DreamGen, a new engine that scales up robot lear...

Embodied AI · Video · Papers/Research

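Why recommended: NVIDIA proposes using video generative models to "dream up" synthetic training data for robots, achieving zero-shot skill generalization
Jul 11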
07:33
Saining Xie@sainingxie
Micah Goldblum: 🚨 Did you know that small-batch vanilla SGD without momentum (i.e. the first optimizer you learn about in intro ML) is ...

Data/Training · Papers/Research
Jul 1
01:06
Saining Xie@sainingxie
Sanghyun Woo: Introducing BlenderFusion: Reassemble your visual elements-objects, camera, and background-to compose a new visual narra...

Image Generation · Papers/Research
May 20
21:29
Jim Fan@DrJimFan
Featured
DreamGen: letting robots "dream" synthetic training data inside video generative models

Embodied AI · Video · Papers/Research

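Why recommended: NVIDIA introduces DreamGen, which lets robots "dream" synthetic training data inside video generative models, achieves strong zero-shot generalization, and will be open-sourced
Feb 18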
15:04
DeepSeek@deepseek_ai
Featured
NSA: a new hardware-aligned sparse attention mechanism

DeepSeek · Inference · Papers/Research · Deployment/Engineering

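Why recommended: DeepSeek introduces NSA, a hardware-aligned sparse attention mechanism that speeds up both long-context training and inference while reducing pre-training cost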