
One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention
Papers citing "One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention"