Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent
Bo Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
15 October 2024 · arXiv:2410.11268
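For readers unfamiliar with the headline claim, the title's equivalence is usually illustrated on in-context linear regression: one pass of a weight-tied (looped) attention block realizes one gradient-descent step, so T loop iterations realize T steps. The sketch below is a minimal NumPy illustration of that reduction, not the paper's construction; the token layout, the choice of queries = keys = the input channel, and all names (looped_block, eta, T) are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eta, T = 32, 4, 0.1, 20

X = rng.normal(size=(n, d))        # in-context inputs x_1..x_n
w_star = rng.normal(size=d)
y = X @ w_star                     # in-context labels

def looped_block(E):
    """One pass of the weight-tied block, written as the linear-attention
    update it reduces to (illustrative, not the paper's construction).
    E is (n, d+1): columns 0..d-1 carry x_i (kept fixed), column d carries
    the current residual r_i = y_i - x_i^T w. With queries = keys = the
    x-channel and values = the residual channel, linear attention computes
    (X X^T r)_j; subtracting eta/n of it from the residual channel is
    exactly one GD step on w, viewed through the residuals."""
    X_tok, r = E[:, :d], E[:, d]
    attn = X_tok @ (X_tok.T @ r)   # (Q K^T) v with Q = K = X_tok
    out = E.copy()
    out[:, d] = r - (eta / n) * attn
    return out

# Looped-transformer view: apply the same block T times.
E = np.hstack([X, y[:, None]])     # w_0 = 0, so the initial residual is y
for _ in range(T):
    E = looped_block(E)

# Reference: T explicit GD steps on L(w) = ||Xw - y||^2 / (2n).
w = np.zeros(d)
for _ in range(T):
    w += (eta / n) * X.T @ (y - X @ w)

assert np.allclose(E[:, d], y - X @ w)   # trajectories coincide exactly
print("residual norm after", T, "loops:", np.linalg.norm(E[:, d]))
```

The assert checks that the residual channel after T loop iterations coincides exactly with the residuals y - Xw after T explicit gradient-descent steps, which is the loop-as-multi-step-GD correspondence in its simplest form.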
Papers citing "Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent" (14 of 14 papers shown)
Provable Failure of Language Models in Learning Majority Boolean Logic via Gradient Descent
Bo Chen, Zhenmei Shi, Zhao Song, Jiahao Zhang
07 Apr 2025 · Tags: NAI, LRM, AI4CE
Theoretical Guarantees for High Order Trajectory Refinement in Generative Flows
Chengyue Gong, Xiaoyu Li, Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Yu Tian
12 Mar 2025
Scaling Law Phenomena Across Regression Paradigms: Multiple and Kernel Approaches
Yifang Chen, Xuyang Guo, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
03 Mar 2025
When Can We Solve the Weighted Low Rank Approximation Problem in Truly Subquadratic Time?
Chenyang Li, Yingyu Liang, Zhenmei Shi, Zhao Song
24 Feb 2025
Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou
21 Feb 2025
Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning
Qifan Yu, Zhenyu He, Sijie Li, Xun Zhou, Jun Zhang, Jingjing Xu, Di He
12 Feb 2025 · Tags: OffRL, LRM
Fast Gradient Computation for RoPE Attention in Almost Linear Time
Yifang Chen, Jiayan Huo, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
03 Jan 2025
Theoretical Constraints on the Expressive Power of RoPE-based Tensor Attention Transformers
Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Mingda Wan
23 Dec 2024
Advancing the Understanding of Fixed Point Iterations in Deep Neural Networks: A Detailed Analytical Study
Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
15 Oct 2024
Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix
Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Yufa Zhou
15 Oct 2024
HSR-Enhanced Sparse Attention Acceleration
Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song
14 Oct 2024
Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi, Stefanie Jegelka, Sanjiv Kumar
10 Oct 2024
A Tighter Complexity Analysis of SparseGPT
Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
22 Aug 2024
The Closeness of In-Context Learning and Weight Shifting for Softmax Regression
Shuai Li, Zhao Song, Yu Xia, Tong Yu, Tianyi Zhou
26 Apr 2023