A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers

5 March 2025
William Merrill
Ashish Sabharwal
Author Contacts: willm@nyu.edu, ashishs@allenai.org
arXiv: 2503.03961

Papers citing "A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers"

23 papers shown:
To CoT or To Loop? A Formal Comparison Between Chain-of-Thought and Looped Transformers
  Kevin Xu, Issei Sato · LRM · 25 May 2025

Exact Expressive Power of Transformers with Padding
  William Merrill, Ashish Sabharwal · 25 May 2025

Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought
  Hanlin Zhu, Shibo Hao, Zhiting Hu, Jiantao Jiao, Stuart Russell, Yuandong Tian · OffRL, LRM · 18 May 2025

(How) Do Language Models Track State?
  Belinda Z. Li, Zifan Carl Guo, Jacob Andreas · LRM · 04 Mar 2025

Compositional Reasoning with Transformers, RNNs, and Chain of Thought
  Gilad Yehudai, Noah Amsel, Joan Bruna · LRM · 03 Mar 2025

Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
  Alireza Amiri, Xinting Huang, Mark Rofin, Michael Hahn · LRM · 04 Feb 2025

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
  DeepSeek-AI: Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, ..., Shiyu Wang, S. Yu, Shunfeng Zhou, Shuting Pan, S.S. Li · ReLM, VLM, OffRL, AI4TS, LRM · 22 Jan 2025

Let's Think Dot by Dot: Hidden Computation in Transformer Language Models
  Jacob Pfau, William Merrill, Samuel R. Bowman · LRM · 24 Apr 2024

The Illusion of State in State-Space Models
  William Merrill, Jackson Petty, Ashish Sabharwal · 12 Apr 2024

Transformers as Transducers
  Lena Strobl, Dana Angluin, David Chiang, Jonathan Rawski, Ashish Sabharwal · 02 Apr 2024

Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
  Zhiyuan Li, Hong Liu, Denny Zhou, Tengyu Ma · LRM, AI4CE · 20 Feb 2024

Looped Transformers are Better at Learning Learning Algorithms
  Liu Yang, Kangwook Lee, Robert D. Nowak, Dimitris Papailiopoulos · 21 Nov 2023

The Impact of Depth on Compositional Generalization in Transformer Language Models
  Jackson Petty, Sjoerd van Steenkiste, Ishita Dasgupta, Fei Sha, Daniel H. Garrette, Tal Linzen · AI4CE, VLM · 30 Oct 2023

The Expressive Power of Transformers with Chain of Thought
  William Merrill, Ashish Sabharwal · LRM, AI4CE, ReLM · 11 Oct 2023

Tighter Bounds on the Expressivity of Transformer Encoders
  David Chiang, Peter A. Cholak, A. Pillay · 25 Jan 2023

Transformers Learn Shortcuts to Automata
  Bingbin Liu, Jordan T. Ash, Surbhi Goel, A. Krishnamurthy, Cyril Zhang · OffRL, LRM · 19 Oct 2022

A Logic for Expressing Log-Precision Transformers
  William Merrill, Ashish Sabharwal · ReLM, NAI, LRM · 06 Oct 2022

The Parallelism Tradeoff: Limitations of Log-Precision Transformers
  William Merrill, Ashish Sabharwal · 02 Jul 2022

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou · LM&Ro, LRM, AI4CE, ReLM · 28 Jan 2022

Saturated Transformers are Constant-Depth Threshold Circuits
  William Merrill, Ashish Sabharwal, Noah A. Smith · 30 Jun 2021

On Layer Normalization in the Transformer Architecture
  Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu · AI4CE · 12 Feb 2020

Universal Transformers
  Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Lukasz Kaiser · 10 Jul 2018

Layer Normalization
  Jimmy Lei Ba, J. Kiros, Geoffrey E. Hinton · 21 Jul 2016