Linear Transformers Are Secretly Fast Weight Programmers
Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber
arXiv: 2102.11174 · 22 February 2021

Papers citing "Linear Transformers Are Secretly Fast Weight Programmers" (16 of 166 shown)

Simple Local Attentions Remain Competitive for Long-Context Tasks
Wenhan Xiong, Barlas Ouguz, Anchit Gupta, Xilun Chen, Diana Liskovich, Omer Levy, Wen-tau Yih, Yashar Mehdad
14 Dec 2021

Attention Approximates Sparse Distributed Memory
Trenton Bricken, Cengiz Pehlevan
10 Nov 2021

Improving Transformers with Probabilistic Attention Keys
Tam Nguyen, T. Nguyen, Dung D. Le, Duy Khuong Nguyen, Viet-Anh Tran, Richard G. Baraniuk, Nhat Ho, Stanley J. Osher
16 Oct 2021

On Learning the Transformer Kernel
Sankalan Pal Chowdhury, Adamos Solomou, Kumar Avinava Dubey, Mrinmaya Sachan
15 Oct 2021

Hybrid Random Features
K. Choromanski, Haoxian Chen, Han Lin, Yuanzhe Ma, Arijit Sehanobish, ..., Andy Zeng, Valerii Likhosherstov, Dmitry Kalashnikov, Vikas Sindhwani, Adrian Weller
08 Oct 2021

ABC: Attention with Bounded-memory Control
Hao Peng, Jungo Kasai, Nikolaos Pappas, Dani Yogatama, Zhaofeng Wu, Lingpeng Kong, Roy Schwartz, Noah A. Smith
06 Oct 2021

Ripple Attention for Visual Perception with Sub-quadratic Complexity
Lin Zheng, Huijie Pan, Lingpeng Kong
06 Oct 2021

Learning with Holographic Reduced Representations
Ashwinkumar Ganesan, Hang Gao, S. Gandhi, Edward Raff, Tim Oates, James Holt, Mark McLean
05 Sep 2021

The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers
Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
26 Aug 2021

Going Beyond Linear Transformers with Recurrent Fast Weight Programmers
Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber
11 Jun 2021

Staircase Attention for Recurrent Processing of Sequences
Da Ju, Stephen Roller, Sainbayar Sukhbaatar, Jason Weston
08 Jun 2021

Choose a Transformer: Fourier or Galerkin
Shuhao Cao
31 May 2021

LambdaNetworks: Modeling Long-Range Interactions Without Attention
Irwan Bello
17 Feb 2021

Meta Learning Backpropagation And Improving It
Louis Kirsch, Jürgen Schmidhuber
29 Dec 2020

On the Binding Problem in Artificial Neural Networks
Klaus Greff, Sjoerd van Steenkiste, Jürgen Schmidhuber
09 Dec 2020

A Decomposable Attention Model for Natural Language Inference
Ankur P. Parikh, Oscar Täckström, Dipanjan Das, Jakob Uszkoreit
06 Jun 2016