another scientific exploration from @TongPetersb, @DavidJFan, and @__JohnNguyen__ that might teach you something new, even if you're in a frontier lab

lots of interesting observations here, but I'll highlight just one:
- it's kind of an open industry secret that trying to scale DiTs with MoE has mostly been fruitless.
- the unexpected, yet intuitive, synergy between RAE and MoE might actually change that.

[Quoting @TongPetersb]: Training beyond language. We are betting on the visual world as the key next step, in parallel with and beyond language modeling. So we studied building foundation models from scratch with vision. We share our exploration: visual representations, data, world modeling, architecture, and scaling behavior! [1/9]