Transformers from an Optimization Perspective

27 May 2022

Papers citing "Transformers from an Optimization Perspective"


ICLR: In-Context Learning of Representations
Core Francisco Park, Andrew Lee, Ekdeep Singh Lubana, Yongyi Yang, Maya Okawa, Kento Nishi, Martin Wattenberg, Hidenori Tanaka
AIFin | 118 | 3 | 0 | 29 Dec 2024

iMixer: hierarchical Hopfield network implies an invertible, implicit and iterative MLP-Mixer
Toshihiro Ota, Masato Taki
29 | 2 | 0 | 25 Apr 2023

Optimization-Derived Learning with Essential Convergence Analysis of Training and Hyper-training
Risheng Liu, Xuan Liu, Shangzhi Zeng, Jin Zhang, Yixuan Zhang
40 | 6 | 0 | 16 Jun 2022

IGLU: Efficient GCN Training via Lazy Updates
S. Narayanan, Aditya Sinha, Prateek Jain, Purushottam Kar, Sundararajan Sellamanickam
BDL | 52 | 9 | 0 | 28 Sep 2021

Is Attention Better Than Matrix Decomposition?
Zhengyang Geng, Meng-Hao Guo, Hongxu Chen, Xia Li, Ke Wei, Zhouchen Lin
56 | 137 | 0 | 09 Sep 2021

Elastic Graph Neural Networks
Xiaorui Liu, W. Jin, Yao Ma, Yaxin Li, Hua Liu, Yiqi Wang, Ming Yan, Jiliang Tang
92 | 107 | 0 | 05 Jul 2021

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
M. Bronstein, Joan Bruna, Taco S. Cohen, Petar Veličković
GNN | 174 | 1,104 | 0 | 27 Apr 2021

Input Convex Neural Networks
Brandon Amos, Lei Xu, J. Zico Kolter
178 | 598 | 0 | 22 Sep 2016

A Proximal Stochastic Gradient Method with Progressive Variance Reduction
Lin Xiao, Tong Zhang
ODL | 84 | 736 | 0 | 19 Mar 2014