A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers

5 March 2025
William Merrill
Ashish Sabharwal
Author Contacts: willm@nyu.edu, ashishs@allenai.org
arXiv: 2503.03961

Papers citing "A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers"

23 papers shown:
To CoT or To Loop? A Formal Comparison Between Chain-of-Thought and Looped Transformers
  Kevin Xu, Issei Sato · LRM · 25 May 2025

Exact Expressive Power of Transformers with Padding
  William Merrill, Ashish Sabharwal · 25 May 2025

Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought
  Hanlin Zhu, Shibo Hao, Zhiting Hu, Jiantao Jiao, Stuart Russell, Yuandong Tian · OffRL, LRM · 18 May 2025

(How) Do Language Models Track State?
  Belinda Z. Li, Zifan Carl Guo, Jacob Andreas · LRM · 04 Mar 2025

Compositional Reasoning with Transformers, RNNs, and Chain of Thought
  Gilad Yehudai, Noah Amsel, Joan Bruna · LRM · 03 Mar 2025

Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
  Alireza Amiri, Xinting Huang, Mark Rofin, Michael Hahn · LRM · 04 Feb 2025

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
  DeepSeek-AI: Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, ..., Shiyu Wang, S. Yu, Shunfeng Zhou, Shuting Pan, S.S. Li · ReLM, VLM, OffRL, AI4TS, LRM · 22 Jan 2025

Let's Think Dot by Dot: Hidden Computation in Transformer Language Models
  Jacob Pfau, William Merrill, Samuel R. Bowman · LRM · 24 Apr 2024

The Illusion of State in State-Space Models
  William Merrill, Jackson Petty, Ashish Sabharwal · 12 Apr 2024

Transformers as Transducers
  Lena Strobl, Dana Angluin, David Chiang, Jonathan Rawski, Ashish Sabharwal · 02 Apr 2024

Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
  Zhiyuan Li, Hong Liu, Denny Zhou, Tengyu Ma · LRM, AI4CE · 20 Feb 2024

Looped Transformers are Better at Learning Learning Algorithms
  Liu Yang, Kangwook Lee, Robert D. Nowak, Dimitris Papailiopoulos · 21 Nov 2023

The Impact of Depth on Compositional Generalization in Transformer Language Models
  Jackson Petty, Sjoerd van Steenkiste, Ishita Dasgupta, Fei Sha, Daniel H. Garrette, Tal Linzen · AI4CE, VLM · 30 Oct 2023

The Expressive Power of Transformers with Chain of Thought
  William Merrill, Ashish Sabharwal · LRM, AI4CE, ReLM · 11 Oct 2023

Tighter Bounds on the Expressivity of Transformer Encoders
  David Chiang, Peter A. Cholak, A. Pillay · 25 Jan 2023

Transformers Learn Shortcuts to Automata
  Bingbin Liu, Jordan T. Ash, Surbhi Goel, A. Krishnamurthy, Cyril Zhang · OffRL, LRM · 19 Oct 2022

A Logic for Expressing Log-Precision Transformers
  William Merrill, Ashish Sabharwal · ReLM, NAI, LRM · 06 Oct 2022

The Parallelism Tradeoff: Limitations of Log-Precision Transformers
  William Merrill, Ashish Sabharwal · 02 Jul 2022

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou · LM&Ro, LRM, AI4CE, ReLM · 28 Jan 2022

Saturated Transformers are Constant-Depth Threshold Circuits
  William Merrill, Ashish Sabharwal, Noah A. Smith · 30 Jun 2021

On Layer Normalization in the Transformer Architecture
  Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu · AI4CE · 12 Feb 2020

Universal Transformers
  Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Lukasz Kaiser · 10 Jul 2018

Layer Normalization
  Jimmy Lei Ba, J. Kiros, Geoffrey E. Hinton · 21 Jul 2016