Exact Expressive Power of Transformers with Padding
William Merrill, Ashish Sabharwal
arXiv:2505.18948, 25 May 2025
Papers citing "Exact Expressive Power of Transformers with Padding" (18 papers)
William Merrill, Ashish Sabharwal. A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers. 05 Mar 2025.

Jacob Pfau, William Merrill, Samuel R. Bowman. Let's Think Dot by Dot: Hidden Computation in Transformer Language Models. 24 Apr 2024.

Andy Yang, David Chiang. Counting Like Transformers: Compiling Temporal Counting Logic Into Softmax Transformers. 05 Apr 2024.

Zhiyuan Li, Hong Liu, Denny Zhou, Tengyu Ma. Chain of Thought Empowers Transformers to Solve Inherently Serial Problems. 20 Feb 2024.

Lena Strobl, William Merrill, Gail Weiss, David Chiang, Dana Angluin. What Formal Languages Can Transformers Express? A Survey. 01 Nov 2023.

Andy Yang, David Chiang, Dana Angluin. Masked Hard-Attention Transformers Recognize Exactly the Star-Free Languages. 21 Oct 2023.

William Merrill, Ashish Sabharwal. The Expressive Power of Transformers with Chain of Thought. 11 Oct 2023.

Sachin Goyal, Ziwei Ji, A. S. Rawat, A. Menon, Sanjiv Kumar, Vaishnavh Nagarajan. Think before you speak: Training Language Models With Pause Tokens. 03 Oct 2023.

Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason D. Lee, Dimitris Papailiopoulos. Looped Transformers as Programmable Computers. 30 Jan 2023.

David Chiang, Peter A. Cholak, A. Pillay. Tighter Bounds on the Expressivity of Transformer Encoders. 25 Jan 2023.

William Merrill, Ashish Sabharwal. A Logic for Expressing Log-Precision Transformers. 06 Oct 2022.

William Merrill, Ashish Sabharwal. The Parallelism Tradeoff: Limitations of Log-Precision Transformers. 02 Jul 2022.

Sophie Hao, Dana Angluin, Robert Frank. Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity. 13 Apr 2022.

Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. 28 Jan 2022.

Maxwell Nye, Anders Andreassen, Guy Gur-Ari, Henryk Michalewski, Jacob Austin, ..., Aitor Lewkowycz, Maarten Bosma, D. Luan, Charles Sutton, Augustus Odena. Show Your Work: Scratchpads for Intermediate Computation with Language Models. 30 Nov 2021.

William Merrill, Ashish Sabharwal, Noah A. Smith. Saturated Transformers are Constant-Depth Threshold Circuits. 30 Jun 2021.

Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu. On Layer Normalization in the Transformer Architecture. 12 Feb 2020.

Biao Zhang, Rico Sennrich. Root Mean Square Layer Normalization. 16 Oct 2019.