# Debate in a Real Boxing Ring: The Future of the Transformer Architecture and Its Successors

- Source: Rohan Paul (@rohanpaul_ai)
- Published: 2026-05-29 22:02
- AIHOT score: 52
- AIHOT link: https://aihot.virxact.com/items/cmpr07s6n09s2slnoqs7ey044
- Original link: https://x.com/rohanpaul_ai/status/2060361249477795942

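## AI Summary

This is a debate about AI architecture. The Transformer camp argues that Transformers dominate today because they are simple, hardware-friendly, and scalable, that their core is key-value memory retrieved through attention, and that any replacement architecture must match them in scalability and be roughly 10x better to displace the existing technology stack. The Post-Transformer camp argues that reasoning in current large language models is more like a text step added afterward, that the real breakthrough lies in "latent reasoning" inside the model and in continual learning, that long context is not the same as real memory, and that the future may be a hybrid architecture. The debate also notes that current public benchmarks are easy to optimize for, while perplexity remains an effective metric for evaluating frontier models. It concludes that although Transformers still dominate, the frontier is widening, citing Pathway's BDH, Sakana AI's CTMs, and Liquid AI's LFMs as examples of emerging architectures.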

## Body

This is probably the most entertaining way to understand one of AI's hardest debates.

Transformer vs Post-Transformer, argued by leading researchers, inside a real physical boxing ring.

Both technically deep and genuinely entertaining.

I was glued for the entire 1 hour 20 minutes. So many super cool points to learn.

🥊 Transformers

- Transformers still own the present because they work at scale. They are simple, trainable, hardware-friendly, and already power the strongest AI systems we use today.

- The Transformer is basically a memory machine. It stores information as keys and values, then uses attention to pull back the most useful parts when answering.

- The real Transformer advantage is not just "attention." The bigger advantage is that it fits modern hardware extremely well, so it can process huge batches of tokens fast.

- Scaling is still the brutal rule. If you give Transformers more compute, more data, and more parameters, they usually keep getting better. Any Post-Transformer architecture has to scale just as well, or better.

- It is not enough to look clever on small tests, because the real question is whether it improves faster than Transformers when scaled up.

- A replacement cannot be slightly better. Because the whole AI stack is already built around Transformers, the next architecture may need to be around 10x better to force everyone to switch.

- Transformers are powerful, but they may be brute force. A human does not need to read the entire internet many times to become smart, but current LLMs need enormous data and compute.
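
As a rough illustration of the key-value picture above, here is a minimal sketch of scaled dot-product attention for a single query, in plain Python. The `attend` helper and the toy keys, values, and query are assumptions made up for illustration; they don't come from the debate, and real models use learned projections, many heads, and batched matrix operations on accelerators.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Single-query scaled dot-product attention (illustrative helper).

    Each key is compared with the query; the resulting weights decide
    how much of each stored value is pulled back into the output.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    dim = len(values[0])
    output = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(dim)
    ]
    return weights, output

# Toy "memory": three stored key/value pairs (hypothetical numbers).
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

# A query that points in the same direction as the first key.
query = [4.0, 0.0]
weights, output = attend(query, keys, values)
print("weights:", weights)
print("output:", output)
```

The first key matches the query best, so most of the weight, and therefore most of the output, comes from the first value. Every score here is an independent dot product, which is part of why this kind of lookup maps well onto parallel hardware.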

🥊 Post-Transformer

- Post-Transformer people are not saying Transformers are bad. They are saying Transformers may be the best current tool, not the final form of machine intelligence.

- The biggest Post-Transformer target is native reasoning and continual learning. Today's LLM reasoning often feels like text-based step-by-step work added on top, instead of thinking happening naturally inside the model.

- Latent reasoning is one possible next step. That means the model reasons inside its own hidden internal space, instead of writing every thought out as words.

- Continual learning is still a major weakness. Humans keep learning from experience, but most Transformer-based models are trained, frozen, and then only adapt inside the prompt.

- Long context is not the same as real memory. A model can read a huge prompt, but that is different from building a life history, learning from mistakes, and updating beliefs over time.

- The future may be hybrid, not a clean replacement. Transformers may stay as one building block while newer systems add better memory, better reasoning, and better learning loops.

- The most interesting possibility is that Transformers may help discover their own successor. AI agents are already getting better at research and coding, so the next architecture may come from AI-assisted architecture search.

-------

- Benchmarks are a problem. Many public benchmarks are easy to game, so they may show leaderboard strength without proving deeper intelligence.

- Perplexity is probably still a great metric for evaluating frontier models, because it tests prediction quality.
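
For reference, perplexity is the exponential of the average negative log-likelihood a model assigns to the tokens that actually occur, so lower is better. A minimal sketch, assuming we already have the probability the model gave to each correct token (the `perplexity` helper and the sample probabilities are illustrative, not from the debate):

```python
import math

def perplexity(token_probs):
    """Perplexity from the probabilities assigned to the true tokens.

    token_probs: one probability in (0, 1] per token in the text.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# Hypothetical models scoring the same four-token text.
coin_flip = perplexity([0.5, 0.5, 0.5, 0.5])      # as unsure as a fair coin
four_way = perplexity([0.25, 0.25, 0.25, 0.25])   # as unsure as 4 equal choices
print(coin_flip, four_way)
```

A model that always gives the right token probability 0.5 has a perplexity of about 2, and one that gives it 0.25 has a perplexity of about 4: the number reads as the effective count of equally likely options the model is choosing between at each step.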

---

Overall, Transformers continue to dominate, but the frontier is clearly widening.

Pathway's BDH (Dragon Hatchling - brain-inspired reasoning architecture), Sakana AI's CTMs (Continuous Thought Machines - models that think over time), and Liquid AI's LFMs (Liquid Foundation Models - efficient multimodal foundation models) - all of these show how the frontier is expanding.

---
From "Pathway (pathway[.]com)" YouTube channel (link in comment)

@zuzanna_pathway