Rohan Paul@rohanpaul_ai

2026-06-24 11:26 · 8 days ago

AI Summary

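Microsoft's new paper, Next-Latent Prediction (NextLat), proposes a self-supervised learning method that adds a next-hidden-state prediction task on top of standard token prediction, pushing the Transformer to learn a compact internal world model. The method performs better on map-like world modeling, math reasoning, graph planning, and story prediction. Generation speeds up by as much as 3.3x through self-speculative decoding, with no change to the Transformer architecture and no slowdown of normal inference.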

New Microsoft paper argues that transformers generalize better when they learn compact internal states, not just next tokens.

The problem is that normal transformers can look back at every earlier token, so they do not have to squeeze the past into a clean summary. Token prediction alone can reward shortcuts that do not become coherent world models.

That can work beautifully on familiar data and still fail when the model has to plan, detour, reason, or carry a hidden structure forward.

NextLat fixes this by adding a training task where the model must predict its next hidden state, not just the next word.

A hidden state is the model's private summary of what it has seen, so predicting the next one pushes the model to learn how situations change over time.
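To make the extra training task concrete, here is a rough sketch of a combined loss: the usual next-token cross-entropy plus a term that scores a prediction of the next hidden state. It is not the paper's implementation. The function name `nextlat_style_loss`, the linear predictor `W_lat`, the squared-error distance, the weight `lam`, and the choice to feed the next token's embedding into the predictor are all assumptions made for illustration; the post only says the model predicts its next hidden state.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-likelihood of token id `target` under softmax(logits)."""
    return -math.log(softmax(logits)[target])


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def nextlat_style_loss(hidden, tokens, W_out, W_lat, E, lam=1.0):
    """Toy combined objective (hypothetical names and shapes).

    hidden[t] : hidden state after reading tokens[0..t]
    W_out     : maps a hidden state to vocabulary logits
    W_lat     : assumed linear latent predictor over [hidden[t], E[next token]]
    E         : token embedding table
    Returns (token_loss, latent_loss, total), each averaged over steps.
    """
    steps = len(tokens) - 1
    tok_loss, lat_loss = 0.0, 0.0
    for t in range(steps):
        # Standard term: predict the next token from the current state.
        tok_loss += cross_entropy(matvec(W_out, hidden[t]), tokens[t + 1])
        # Extra term: predict the next hidden state from the current one.
        # In an autograd framework the target hidden[t + 1] would typically
        # be treated as a constant (stop-gradient); that is an assumption.
        pred = matvec(W_lat, hidden[t] + E[tokens[t + 1]])
        lat_loss += mse(pred, hidden[t + 1])
    tok_loss /= steps
    lat_loss /= steps
    return tok_loss, lat_loss, tok_loss + lam * lat_loss


# Tiny random example: 6 tokens, vocabulary of 5, hidden size 4.
random.seed(0)
d, vocab, T = 4, 5, 6
rand_vec = lambda n: [random.uniform(-1, 1) for _ in range(n)]
hidden = [rand_vec(d) for _ in range(T)]
tokens = [random.randrange(vocab) for _ in range(T)]
W_out = [rand_vec(d) for _ in range(vocab)]
W_lat = [rand_vec(2 * d) for _ in range(d)]
E = [rand_vec(d) for _ in range(vocab)]
print(nextlat_style_loss(hidden, tokens, W_out, W_lat, E, lam=0.5))
```

The latent term only drops to zero when the current state (plus the next token) is enough to determine the next state, which is the "compact summary" pressure the post describes.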

The authors tested this on map-like world modeling, math reasoning, graph planning, story prediction, and regular language modeling.

The main result is that NextLat often learned more compact and useful internal states, solved planning tasks better, and sped up generation by up to 3.3x.
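The AI summary attributes the speedup to self-speculative decoding, but the post does not describe the mechanism. As a generic illustration of the draft-and-verify idea only, here is a greedy speculative decoding sketch with toy stand-in models. `draft_next` and `target_next` are hypothetical functions, not NextLat components, and how NextLat actually produces its drafts is not stated in the post.

```python
def speculative_step(prefix, draft_next, target_next, k=4):
    """Draft k tokens cheaply, keep the longest run the target agrees with,
    then append one token chosen by the target itself."""
    ctx = list(prefix)
    drafted = []
    for _ in range(k):
        tok = draft_next(ctx)
        drafted.append(tok)
        ctx.append(tok)

    # Verification. A real system would score all drafted positions in one
    # parallel forward pass; the loop here is sequential for clarity.
    ctx = list(prefix)
    accepted = []
    for tok in drafted:
        if target_next(ctx) != tok:
            break
        accepted.append(tok)
        ctx.append(tok)
    # The target's own token: a correction after a rejection, or a bonus
    # token when every draft was accepted.
    accepted.append(target_next(ctx))
    return accepted


def generate(prefix, draft_next, target_next, n_tokens, k=4):
    """Generate n_tokens after prefix; returns (sequence, verification rounds)."""
    out = list(prefix)
    rounds = 0
    while len(out) - len(prefix) < n_tokens:
        out.extend(speculative_step(out, draft_next, target_next, k))
        rounds += 1
    return out[: len(prefix) + n_tokens], rounds


# Toy models: the target counts upward mod 10; the draft agrees with it
# except when the context length is a multiple of 3.
def target_next(ctx):
    return (ctx[-1] + 1) % 10


def draft_next(ctx):
    if len(ctx) % 3 == 0:
        return (ctx[-1] + 2) % 10  # deliberate mistake
    return target_next(ctx)


sequence, rounds = generate([0], draft_next, target_next, n_tokens=12)
print(sequence, rounds)
```

With greedy verification the output matches what the target model would have produced on its own; the saving comes from needing fewer verification rounds than generated tokens whenever drafts are accepted.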

Overall, it gives transformers some of the useful memory behavior of recurrent models without changing the transformer architecture or slowing normal inference.

----

Link - arxiv.org/abs/2511.05963

Title: "Next-Latent Prediction Transformers Learn Compact World Models"

Jayden Teoh: Next-token prediction is myopic. What if transformers learn to predict their own next latent state? 🌠 We present Next-Latent Prediction (NextLat): a self-super...

