To CoT or To Loop? A Formal Comparison Between Chain-of-Thought and Looped Transformers

25 May 2025
Kevin Xu, Issei Sato
Author contacts: kevinxu@g.ecc.u-tokyo.ac.jp, sato@g.ecc.u-tokyo.ac.jp
Tags: LRM
arXiv: 2505.19245 (abs · PDF · HTML)

Papers citing "To CoT or To Loop? A Formal Comparison Between Chain-of-Thought and Looped Transformers" (18 papers)

A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers
William Merrill, Ashish Sabharwal
05 Mar 2025 · 9 citations

Reasoning with Latent Thoughts: On the Power of Looped Transformers
Nikunj Saunshi, Nishanth Dikkala, Zhiyuan Li, Sanjiv Kumar, Sashank J. Reddi
Tags: OffRL, LRM, AI4CE
24 Feb 2025 · 21 citations

On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
Kevin Xu, Issei Sato
02 Oct 2024 · 4 citations

From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf, Alex Xie, Graham Neubig, Ilia Kulikov, Zaid Harchaoui
24 Jun 2024 · 69 citations

On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning
Franz Nowak, Anej Svete, Alexandra Butoi, Ryan Cotterell
Tags: ReLM, LRM
20 Jun 2024 · 15 citations

MoEUT: Mixture-of-Experts Universal Transformers
Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber, Christopher Potts, Christopher D. Manning
Tags: MoE
25 May 2024 · 10 citations

Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
Zhiyuan Li, Hong Liu, Denny Zhou, Tengyu Ma
Tags: LRM, AI4CE
20 Feb 2024 · 124 citations

Transformers, parallel computation, and logarithmic depth
Clayton Sanford, Daniel J. Hsu, Matus Telgarsky
14 Feb 2024 · 38 citations

The Expressive Power of Transformers with Chain of Thought
William Merrill, Ashish Sabharwal
Tags: LRM, AI4CE, ReLM
11 Oct 2023 · 41 citations

Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective
Guhao Feng, Bohang Zhang, Yuntian Gu, Haotian Ye, Di He, Liwei Wang
Tags: LRM
24 May 2023 · 248 citations

Why think step by step? Reasoning emerges from the locality of experience
Ben Prystawski, Michael Y. Li, Noah D. Goodman
Tags: LRM, ReLM
07 Apr 2023 · 102 citations

Looped Transformers as Programmable Computers
Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason D. Lee, Dimitris Papailiopoulos
30 Jan 2023 · 106 citations

The Parallelism Tradeoff: Limitations of Log-Precision Transformers
William Merrill, Ashish Sabharwal
02 Jul 2022 · 112 citations

Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang, Jason W. Wei, Dale Schuurmans, Quoc Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou
Tags: ReLM, BDL, LRM, AI4CE
21 Mar 2022 · 3,646 citations

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
Tags: LM&Ro, LRM, AI4CE, ReLM
28 Jan 2022 · 9,576 citations

Universal Transformers
Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Lukasz Kaiser
10 Jul 2018 · 753 citations

Attention Is All You Need
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
Tags: 3DV
12 Jun 2017 · 131,652 citations

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Noam M. Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc V. Le, Geoffrey E. Hinton, J. Dean
Tags: MoE
23 Jan 2017 · 2,653 citations