Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?

10 October 2024
Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi, Stefanie Jegelka, Sanjiv Kumar
ArXiv (abs) · PDF · HTML

Papers citing "Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?"

7 / 7 papers shown

Adversarially Pretrained Transformers may be Universally Robust In-Context Learners
Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki
AAML · 20 May 2025

Reasoning with Latent Thoughts: On the Power of Looped Transformers
Nikunj Saunshi, Nishanth Dikkala, Zhiyuan Li, Sanjiv Kumar, Sashank J. Reddi
OffRL · LRM · AI4CE · 24 Feb 2025

Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou
21 Feb 2025

Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning
Qifan Yu, Zhenyu He, Sijie Li, Xun Zhou, Jun Zhang, Jingjing Xu, Di He
OffRL · LRM · 12 Feb 2025

Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Sangmin Bae, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Seungyeon Kim, Tal Schuster
KELM · 28 Oct 2024

Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent
Bo Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
15 Oct 2024

On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
Kevin Xu, Issei Sato
02 Oct 2024