Rohan Paul@rohanpaul_ai

2026-05-29 22:02 · 34 days ago

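AI Summary

This is a debate about AI architecture. The Transformer camp points out that Transformers dominate today because they are simple, hardware-friendly, and scalable, that their core is key-value memory plus the attention mechanism, and that any alternative architecture must match them on scaling and needs roughly a 10x advantage to displace the existing technology stack. The Post-Transformer camp argues that reasoning in current large language models is closer to a text step bolted on afterward, that the real breakthrough lies in "latent reasoning" inside the model and in continual learning, that long context is not the same as true memory, and that the future may be hybrid architectures. The debate also notes that current public benchmarks are easy to optimize for, while perplexity remains an effective metric for evaluating frontier models. Finally, it notes that although Transformers still dominate, the frontier is widening, citing Pathway's BDH, Sakana AI's CTMs, and Liquid AI's LFMs as examples of emerging architectures.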

This is probably the most entertaining way to understand one of AI's hardest debates.

Transformer vs Post-Transformer, argued by leading researchers, inside a real physical boxing ring.

Both technically deep and genuinely entertaining.

I was glued for the entire 1 hour 20 minutes. So many super cool points to learn.

🥊 Transformers

Transformers still own the present because they work at scale. They are simple, trainable, hardware-friendly, and already power the strongest AI systems we use today.

The Transformer is basically a memory machine. It stores information as keys and values, then uses attention to pull back the most useful parts when answering.
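
To make the "memory machine" picture concrete, here is a minimal sketch of single-query scaled dot-product attention in plain Python. The toy keys, values, and query are made-up numbers for illustration, not anything shown in the debate, and real models run this over large batches of learned vectors rather than hand-written lists.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Single-query scaled dot-product attention.

    Scores the query against every stored key, turns the scores into
    weights, and returns the weighted average of the stored values.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    dim_v = len(values[0])
    output = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(dim_v)
    ]
    return output, weights

# Toy "memory" of three items (hypothetical numbers).
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

# A query that points in the same direction as the first key.
query = [4.0, 0.0]

output, weights = attend(query, keys, values)
print("weights:", weights)
print("output:", output)
```

The query lines up with the first key, so most of the weight goes to the first value. That is the "pull back the most useful parts" step: retrieval is a soft, weighted lookup rather than an exact match.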

The real Transformer advantage is not just "attention." The bigger advantage is that it fits modern hardware extremely well, so it can process huge batches of tokens fast.

Scaling is still the brutal rule. If you give Transformers more compute, more data, and more parameters, they usually keep getting better. Any Post-Transformer architecture has to scale just as well, or better.

It is not enough to look clever on small tests, because the real question is whether it improves faster than Transformers when scaled up.
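
A toy calculation shows why small-scale wins can mislead. The sketch below assumes loss follows a simple power law in compute, which is a common empirical fit but not something the debate specifies. Every constant is invented for illustration, and `baseline` and `challenger` are hypothetical stand-ins, not measurements of any real architecture.

```python
def loss(compute, a, b):
    """Hypothetical power-law loss curve: loss = a * compute ** (-b)."""
    return a * compute ** (-b)

def crossover_compute(p, q):
    """Compute budget at which two power-law curves meet.

    Solves a_p * C**(-b_p) == a_q * C**(-b_q) for C.
    """
    return (p["a"] / q["a"]) ** (1.0 / (p["b"] - q["b"]))

# Made-up constants, chosen only to show the shape of the argument.
baseline = {"a": 10.0, "b": 0.05}    # stands in for a Transformer
challenger = {"a": 8.0, "b": 0.03}   # starts lower, but improves more slowly

small, large = 1.0, 1e6
c_star = crossover_compute(baseline, challenger)

print("small scale:", loss(small, **challenger), "vs", loss(small, **baseline))
print("large scale:", loss(large, **challenger), "vs", loss(large, **baseline))
print("curves cross near compute =", c_star)
```

With these invented numbers the challenger has lower loss at small scale and higher loss at large scale, because its curve is shallower. What decides the contest is the slope of the curve, not where it starts.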

A replacement cannot be slightly better. Because the whole AI stack is already built around Transformers, the next architecture may need to be around 10x better to force everyone to switch.

Transformers are powerful, but they may be a brute-force approach. A human does not need to read the entire internet many times to become smart, but current LLMs need enormous data and compute.

🥊 Post-Transformer

Post-Transformer people are not saying Transformers are bad. They are saying Transformers may be the best current tool, not the final form of machine intelligence.