UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards. Paper: https://huggingface.co/papers/2604.14967

AK @_akhaliq · Apr 18

RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework. Paper: https://huggingface.co/papers/2604.15308

AK @_akhaliq · Apr 18

DR3-Eval: Towards Realistic and Reproducible Deep Research Evaluation. Paper: https://huggingface.co/papers/2604.14683

AK @_akhaliq · Apr 17

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds. Paper: https://huggingface.co/papers/2604.14268

AK @_akhaliq · Apr 17

Seedance 2.0: Advancing Video Generation for World Complexity. Paper: https://huggingface.co/papers/2604.14148

AK @_akhaliq · Apr 17

Parcae: Scaling Laws For Stable Looped Language Models. Paper: https://huggingface.co/papers/2604.12946

AK @_akhaliq · Apr 17

Geometric Context Transformer for Streaming 3D Reconstruction. Paper: https://huggingface.co/papers/2604.14141

AK @_akhaliq · Apr 17

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents. Paper: https://huggingface.co/papers/2604.07429

AK @_akhaliq · Apr 16

GlotOCR Bench: OCR Models Still Struggle Beyond a Handful of Unicode Scripts. Paper: https://huggingface.co/papers/2604.12978

AK @_akhaliq · Apr 16

Continuous Adversarial Flow Models. Paper: https://huggingface.co/papers/2604.11521

AK @_akhaliq · Apr 16

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents. Paper: https://huggingface.co/papers/2604.11784

AK @_akhaliq · Apr 16

KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance. Paper: https://huggingface.co/papers/2604.12627

AK @_akhaliq · Apr 16

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe. Paper: https://huggingface.co/papers/2604.13016

AK @_akhaliq · Apr 16

Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting. Paper: https://huggingface.co/papers/2604.12626

AK @_akhaliq · Apr 15

QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation. Paper: https://huggingface.co/papers/2604.08570

AK @_akhaliq · Apr 15

The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping. Paper: https://huggingface.co/papers/2604.11297

AK @_akhaliq · Apr 15

Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation. Paper: https://huggingface.co/papers/2604.10098

AK @_akhaliq · Apr 15

OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation. Paper: https://huggingface.co/papers/2604.11804

AK @_akhaliq · Apr 14

Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory. Paper: https://huggingface.co/papers/2604.08995

AK @_akhaliq · Apr 14

WildDet3D: Scaling Promptable 3D Detection in the Wild. Paper: https://huggingface.co/papers/2604.08626

AK @_akhaliq · Apr 14

FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios. Paper: https://huggingface.co/papers/2604.07413

AK @_akhaliq · Apr 14

Process Reward Agents for Steering Knowledge-Intensive Reasoning. Paper: https://huggingface.co/papers/2604.09482

Hao AI Lab @haoailab · Aug 28

[1/5] [Lmgame Bench] 🎮 Question: Can RL-based LLM post-training on games generalize to other tasks? We shared a preliminary study to explore this question:
- Same-family (in-domain): Training on 6×6 Sokoban transfers to 8×8 Sokoban, and training on Tetris with 1 block type transfers to Tetris with 2 block types, yielding up to 56% improvement across same-family variants.
- Other tasks (out-of-domain): Blocksworld +3–7% and WebShop ~+6% (unstable); GSM8K: no improvement.
We introduce GRL, an agent-centric multi-turn RL framework that makes LLM–environment interaction highly customizable for systematic generalization studies.
Repo: https://github.com/lmgame-org/GRL
Blog: https://lmgame.org/#/blog/grl (check it for details!)

Hao AI Lab @haoailab · Aug 22

[Lmgame Bench] 🤔 Ever wondered how to evaluate different games in Lmgame-Bench, or even add your own, but didn't know where to start? We've made it super easy to run evaluations and integrate new games. Our latest blog walks you through a few key features of Lmgame Bench, including:
- Agent & environment setup.
- One-command single- & multi-agent evals.
- Model & gaming harness support.
You can find out more from our blog 👉 https://lmgame.org/#/blog/lmgame_use