Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent
Bo Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
15 October 2024 · arXiv:2410.11268
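For readers unfamiliar with the headline claim, the title's equivalence is usually illustrated on in-context linear regression: one pass of a weight-tied (looped) attention block realizes one gradient-descent step, so T loop iterations realize T steps. The sketch below is a minimal NumPy illustration of that reduction, not the paper's construction; the token layout, the choice of queries = keys = the input channel, and all names (looped_block, eta, T) are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eta, T = 32, 4, 0.1, 20

X = rng.normal(size=(n, d))        # in-context inputs x_1..x_n
w_star = rng.normal(size=d)
y = X @ w_star                     # in-context labels

def looped_block(E):
    """One pass of the weight-tied block, written as the linear-attention
    update it reduces to (illustrative, not the paper's construction).
    E is (n, d+1): columns 0..d-1 carry x_i (kept fixed), column d carries
    the current residual r_i = y_i - x_i^T w. With queries = keys = the
    x-channel and values = the residual channel, linear attention computes
    (X X^T r)_j; subtracting eta/n of it from the residual channel is
    exactly one GD step on w, viewed through the residuals."""
    X_tok, r = E[:, :d], E[:, d]
    attn = X_tok @ (X_tok.T @ r)   # (Q K^T) v with Q = K = X_tok
    out = E.copy()
    out[:, d] = r - (eta / n) * attn
    return out

# Looped-transformer view: apply the same block T times.
E = np.hstack([X, y[:, None]])     # w_0 = 0, so the initial residual is y
for _ in range(T):
    E = looped_block(E)

# Reference: T explicit GD steps on L(w) = ||Xw - y||^2 / (2n).
w = np.zeros(d)
for _ in range(T):
    w += (eta / n) * X.T @ (y - X @ w)

assert np.allclose(E[:, d], y - X @ w)   # trajectories coincide exactly
print("residual norm after", T, "loops:", np.linalg.norm(E[:, d]))
```

The assert checks that the residual channel after T loop iterations coincides exactly with the residuals y - Xw after T explicit gradient-descent steps, which is the loop-as-multi-step-GD correspondence in its simplest form.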
Papers citing "Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent" (14 of 14 papers shown)
Provable Failure of Language Models in Learning Majority Boolean Logic via Gradient Descent
Bo Chen, Zhenmei Shi, Zhao Song, Jiahao Zhang
07 Apr 2025 · Tags: NAI, LRM, AI4CE
Theoretical Guarantees for High Order Trajectory Refinement in Generative Flows
Chengyue Gong, Xiaoyu Li, Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Yu Tian
12 Mar 2025
Scaling Law Phenomena Across Regression Paradigms: Multiple and Kernel Approaches
Yifang Chen, Xuyang Guo, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
03 Mar 2025
When Can We Solve the Weighted Low Rank Approximation Problem in Truly Subquadratic Time?
Chenyang Li, Yingyu Liang, Zhenmei Shi, Zhao Song
24 Feb 2025
Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou
21 Feb 2025
Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning
Qifan Yu, Zhenyu He, Sijie Li, Xun Zhou, Jun Zhang, Jingjing Xu, Di He
12 Feb 2025 · Tags: OffRL, LRM
Fast Gradient Computation for RoPE Attention in Almost Linear Time
Yifang Chen, Jiayan Huo, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
03 Jan 2025
Theoretical Constraints on the Expressive Power of RoPE-based Tensor Attention Transformers
Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Mingda Wan
23 Dec 2024
Advancing the Understanding of Fixed Point Iterations in Deep Neural Networks: A Detailed Analytical Study
Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
15 Oct 2024
Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix
Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Yufa Zhou
15 Oct 2024
HSR-Enhanced Sparse Attention Acceleration
Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song
14 Oct 2024
Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi, Stefanie Jegelka, Sanjiv Kumar
10 Oct 2024
A Tighter Complexity Analysis of SparseGPT
Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
22 Aug 2024
The Closeness of In-Context Learning and Weight Shifting for Softmax Regression
Shuai Li, Zhao Song, Yu Xia, Tong Yu, Tianyi Zhou
26 Apr 2023