Transformers Are Inherently Succinct

2026-06-06 12:11 · 27 days ago · brandonb

AI summary

A paper titled “Transformers are inherently succinct”, posted on openreview.net, argues on theoretical grounds that the Transformer architecture is inherently succinct.

Original text · not translated

Published as a conference paper at ICLR 2026

TRANSFORMERS ARE INHERENTLY SUCCINCT

Pascal Bergsträßer

RPTU Kaiserslautern-Landau, Kaiserslautern, Germany

bergstraesser@cs.uni-kl.de

Ryan Cotterell

ETH Zürich, Zurich, Switzerland

ryan.cotterell@inf.ethz.ch

Anthony W. Lin

RPTU Kaiserslautern-Landau and MPI-SWS, Kaiserslautern, Germany

lin@cs.uni-kl.de

ABSTRACT

We study succinctness as a measure of the expressive power of transformers. Succinctness, i.e., how compactly a formalism can describe a language relative to other formalisms, is a classical notion in logic and automata theory. We prove that fixed-precision transformers are remarkably succinct: they can be exponentially more succinct than both linear temporal logic (LTL) and recurrent neural networks, and, by extension, state-space models, and doubly exponentially more succinct than finite automata. In other words, there exist families of languages describable by polynomial-size transformers whose smallest equivalent LTL formula or recurrent neural network is exponentially large, and whose smallest equivalent automaton is doubly exponentially large. We also establish matching upper bounds, showing that any fixed-precision transformer can be converted to an LTL formula with at most an exponential blow-up, improving a prior doubly exponential translation. As a consequence of this succinctness, we show that basic verification problems for transformers, such as emptiness and equivalence, are provably intractable: specifically, EXPSPACE-complete.
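As a classical illustration of what a succinctness gap between two formalisms looks like (this is a textbook example, not the construction used in the paper), consider the language of words over {a, b} whose k-th symbol from the end is a. A nondeterministic automaton with k + 1 states recognizes it, while the subset construction reaches 2^k deterministic states. The sketch below counts those states; `nfa_step` and `count_dfa_states` are hypothetical helper names introduced only for this example.

```python
def nfa_step(states, symbol, k):
    """One step of the (k+1)-state NFA for 'the k-th symbol from the end is a'.

    State 0 loops on every symbol and guesses, on reading 'a', that this 'a'
    is the k-th symbol from the end. States 1..k-1 count the remaining
    symbols. State k is accepting and has no outgoing transitions.
    """
    nxt = set()
    for q in states:
        if q == 0:
            nxt.add(0)
            if symbol == "a":
                nxt.add(1)
        elif q < k:
            nxt.add(q + 1)
    return frozenset(nxt)


def count_dfa_states(k):
    """Count the subsets reachable in the subset construction."""
    start = frozenset({0})
    seen = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for symbol in "ab":
            nxt = nfa_step(current, symbol, k)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen)


if __name__ == "__main__":
    for k in range(1, 9):
        print(k + 1, "NFA states ->", count_dfa_states(k), "reachable DFA states")
```

Each reachable subset records which of the last k symbols were a, so the count doubles with every increment of k. The gaps claimed in the abstract are of the same flavour, but between transformers and LTL formulas, recurrent neural networks, or finite automata.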

1 INTRODUCTION

Transformers (Vaswani et al., 2017) are the dominant architecture underlying most modern large language models. A substantial body of recent theoretical work has investigated their expressive power (Strobl et al., 2024; Barceló et al., 2024; Yang et al., 2024; Hahn, 2020; Pérez et al., 2021; Chiang and Cholak, 2022; Jerad et al., 2025), their trainability and ability to generalize to unseen strings of longer lengths (Zhou et al., 2024; Huang et al., 2025; Chiang and Cholak, 2022), and the extent to which their behavior can be formally verified (Sälzer et al., 2025). A key finding of this line of work is that transformers with finite precision (the setting most faithful to real-world hardware) recognize various classes of subregular languages depending on the exact assumptions made (Yang et al., 2024; Barceló et al., 2024; Jerad et al., 2025; Li and Cotterell, 2025). Subregular languages constitute strict subclasses of the regular languages. For instance, the star-free languages, a subregular class, are precisely those definable by regular expressions that replace the Kleene star with intersection and complementation. The language $a^*b^*$ is star-free because it can be written as $\overline{\overline{\emptyset} \cdot b \cdot a \cdot \overline{\emptyset}}$, whereas $(aa)^*$ is not star-free (Straubing, 1994). By contrast, recurrent neural networks (RNNs) can recognize all regular languages under a fixed precision assumption (Minsky, 1967; Siegelmann and Sontag, 1995; Merrill et al., 2020; Svete and Cotterell, 2023), making them strictly more expressive than transformers as language recognizers. However, the strong empirical performance of transformers invites the question as to whether expressive capacity is the most revealing lens through which to compare architectures. In this paper, we propose succinctness as an alternative lens for understanding the expressivity of transformers.
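The star-free description of a*b* given earlier in this paragraph can be checked by brute force. Over the alphabet {a, b}, a word belongs to a*b* exactly when it does not contain the factor ba, that is, when it lies in the complement of Σ*·b·a·Σ*, where Σ* is itself the complement of the empty language. The following is a minimal sketch assuming the two-letter alphabet {a, b}; the helper names are hypothetical and chosen only for this example.

```python
import re
from itertools import product


def in_a_star_b_star(word):
    """Membership in a*b* via an ordinary regular expression (uses the star)."""
    return re.fullmatch(r"a*b*", word) is not None


def avoids_ba(word):
    """Membership in the complement of Sigma* b a Sigma* (no star needed)."""
    return "ba" not in word


def check_up_to(max_len):
    """Compare both descriptions on every word over {a, b} up to max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            if in_a_star_b_star(word) != avoids_ba(word):
                return False
    return True


if __name__ == "__main__":
    print(check_up_to(10))
```

A finite check like this does not prove the equivalence; it only makes the complement-based reading concrete. No analogous star-free description exists for (aa)*, since that language has to track the parity of the word length.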
The succinctness of a language L with respect to a class C of language recognizers (e.g., transformers, finite automata, and formulas in FO [

0.3 A symbol embedding naturally extends to a homomorphism Σ∗ → (QD)∗, where
