A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers
William Merrill, Ashish Sabharwal
arXiv: 2503.03961. 5 March 2025.
Author contacts: willm@nyu.edu, ashishs@allenai.org

Papers citing "A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers" (23 papers):

1. "To CoT or To Loop? A Formal Comparison Between Chain-of-Thought and Looped Transformers". Kevin Xu, Issei Sato. 25 May 2025. Tags: LRM.
2. "Exact Expressive Power of Transformers with Padding". William Merrill, Ashish Sabharwal. 25 May 2025.
3. "Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought". Hanlin Zhu, Shibo Hao, Zhiting Hu, Jiantao Jiao, Stuart Russell, Yuandong Tian. 18 May 2025. Tags: OffRL, LRM.
4. "(How) Do Language Models Track State?" Belinda Z. Li, Zifan Carl Guo, Jacob Andreas. 04 Mar 2025. Tags: LRM.
5. "Compositional Reasoning with Transformers, RNNs, and Chain of Thought". Gilad Yehudai, Noah Amsel, Joan Bruna. 03 Mar 2025. Tags: LRM.
6. "Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers". Alireza Amiri, Xinting Huang, Mark Rofin, Michael Hahn. 04 Feb 2025. Tags: LRM.
7. "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning". DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, ..., Shiyu Wang, S. Yu, Shunfeng Zhou, Shuting Pan, S.S. Li. 22 Jan 2025. Tags: ReLM, VLM, OffRL, AI4TS, LRM.
8. "Let's Think Dot by Dot: Hidden Computation in Transformer Language Models". Jacob Pfau, William Merrill, Samuel R. Bowman. 24 Apr 2024. Tags: LRM.
9. "The Illusion of State in State-Space Models". William Merrill, Jackson Petty, Ashish Sabharwal. 12 Apr 2024.
10. "Transformers as Transducers". Lena Strobl, Dana Angluin, David Chiang, Jonathan Rawski, Ashish Sabharwal. 02 Apr 2024.
11. "Chain of Thought Empowers Transformers to Solve Inherently Serial Problems". Zhiyuan Li, Hong Liu, Denny Zhou, Tengyu Ma. 20 Feb 2024. Tags: LRM, AI4CE.
12. "Looped Transformers are Better at Learning Learning Algorithms". Liu Yang, Kangwook Lee, Robert D. Nowak, Dimitris Papailiopoulos. 21 Nov 2023.
13. "The Impact of Depth on Compositional Generalization in Transformer Language Models". Jackson Petty, Sjoerd van Steenkiste, Ishita Dasgupta, Fei Sha, Daniel H. Garrette, Tal Linzen. 30 Oct 2023. Tags: AI4CE, VLM.
14. "The Expressive Power of Transformers with Chain of Thought". William Merrill, Ashish Sabharwal. 11 Oct 2023. Tags: LRM, AI4CE, ReLM.
15. "Tighter Bounds on the Expressivity of Transformer Encoders". David Chiang, Peter A. Cholak, A. Pillay. 25 Jan 2023.
16. "Transformers Learn Shortcuts to Automata". Bingbin Liu, Jordan T. Ash, Surbhi Goel, A. Krishnamurthy, Cyril Zhang. 19 Oct 2022. Tags: OffRL, LRM.
17. "A Logic for Expressing Log-Precision Transformers". William Merrill, Ashish Sabharwal. 06 Oct 2022. Tags: ReLM, NAI, LRM.
18. "The Parallelism Tradeoff: Limitations of Log-Precision Transformers". William Merrill, Ashish Sabharwal. 02 Jul 2022.
19. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models". Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou. 28 Jan 2022. Tags: LM&Ro, LRM, AI4CE, ReLM.
20. "Saturated Transformers are Constant-Depth Threshold Circuits". William Merrill, Ashish Sabharwal, Noah A. Smith. 30 Jun 2021.
21. "On Layer Normalization in the Transformer Architecture". Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu. 12 Feb 2020. Tags: AI4CE.
22. "Universal Transformers". Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Lukasz Kaiser. 10 Jul 2018.
23. "Layer Normalization". Jimmy Lei Ba, J. Kiros, Geoffrey E. Hinton. 21 Jul 2016.