Linear Transformers Are Secretly Fast Weight Programmers
Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber
arXiv 2102.11174, 22 February 2021
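
For context on the title's claim: causal attention with a linear (kernelized) feature map can be computed either as a weighted sum over all past key-value pairs or, equivalently, as a recurrent outer-product "fast weight" update, which is the correspondence the paper develops. The NumPy sketch below is a minimal illustration of that equivalence; the feature map phi (ELU + 1) and the unnormalized form are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def phi(x):
    # ELU(x) + 1: a simple positive feature map, one common choice for linear attention.
    return np.where(x > 0, x + 1.0, np.exp(x))

rng = np.random.default_rng(0)
T, d = 6, 4
Q, K, V = rng.normal(size=(3, T, d))

# (1) Causal linear attention computed directly (unnormalized, for clarity):
#     y_t = sum_{s <= t} (phi(q_t) . phi(k_s)) v_s
out_attn = np.zeros((T, d))
for t in range(T):
    scores = phi(Q[t]) @ phi(K[: t + 1]).T
    out_attn[t] = scores @ V[: t + 1]

# (2) The same outputs via a fast weight matrix "programmed" step by step:
#     W_t = W_{t-1} + v_t phi(k_t)^T,   y_t = W_t phi(q_t)
W = np.zeros((d, d))
out_fwp = np.zeros((T, d))
for t in range(T):
    W += np.outer(V[t], phi(K[t]))
    out_fwp[t] = W @ phi(Q[t])

assert np.allclose(out_attn, out_fwp)
```

Both loops produce identical outputs; this is the sense in which a linear Transformer "programs" a fast weight matrix with its keys and values, while the paper itself goes further with improved feature maps and update rules.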

Papers citing "Linear Transformers Are Secretly Fast Weight Programmers"

16 of 166 citing papers shown:

Simple Local Attentions Remain Competitive for Long-Context Tasks
  Wenhan Xiong, Barlas Ouguz, Anchit Gupta, Xilun Chen, Diana Liskovich, Omer Levy, Wen-tau Yih, Yashar Mehdad
  14 Dec 2021

Attention Approximates Sparse Distributed Memory
  Trenton Bricken, Cengiz Pehlevan
  10 Nov 2021

Improving Transformers with Probabilistic Attention Keys
  Tam Nguyen, T. Nguyen, Dung D. Le, Duy Khuong Nguyen, Viet-Anh Tran, Richard G. Baraniuk, Nhat Ho, Stanley J. Osher
  16 Oct 2021

On Learning the Transformer Kernel
  Sankalan Pal Chowdhury, Adamos Solomou, Kumar Avinava Dubey, Mrinmaya Sachan
  15 Oct 2021

Hybrid Random Features
  K. Choromanski, Haoxian Chen, Han Lin, Yuanzhe Ma, Arijit Sehanobish, ..., Andy Zeng, Valerii Likhosherstov, Dmitry Kalashnikov, Vikas Sindhwani, Adrian Weller
  08 Oct 2021

ABC: Attention with Bounded-memory Control
  Hao Peng, Jungo Kasai, Nikolaos Pappas, Dani Yogatama, Zhaofeng Wu, Lingpeng Kong, Roy Schwartz, Noah A. Smith
  06 Oct 2021

Ripple Attention for Visual Perception with Sub-quadratic Complexity
  Lin Zheng, Huijie Pan, Lingpeng Kong
  06 Oct 2021

Learning with Holographic Reduced Representations
  Ashwinkumar Ganesan, Hang Gao, S. Gandhi, Edward Raff, Tim Oates, James Holt, Mark McLean
  05 Sep 2021

The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers
  Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
  26 Aug 2021

Going Beyond Linear Transformers with Recurrent Fast Weight Programmers
  Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber
  11 Jun 2021

Staircase Attention for Recurrent Processing of Sequences
  Da Ju, Stephen Roller, Sainbayar Sukhbaatar, Jason Weston
  08 Jun 2021

Choose a Transformer: Fourier or Galerkin
  Shuhao Cao
  31 May 2021

LambdaNetworks: Modeling Long-Range Interactions Without Attention
  Irwan Bello
  17 Feb 2021

Meta Learning Backpropagation And Improving It
  Louis Kirsch, Jürgen Schmidhuber
  29 Dec 2020

On the Binding Problem in Artificial Neural Networks
  Klaus Greff, Sjoerd van Steenkiste, Jürgen Schmidhuber
  09 Dec 2020

A Decomposable Attention Model for Natural Language Inference
  Ankur P. Parikh, Oscar Täckström, Dipanjan Das, Jakob Uszkoreit
  06 Jun 2016