Linear Transformers Are Secretly Fast Weight Programmers

Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber
22 February 2021. arXiv:2102.11174.

Papers citing "Linear Transformers Are Secretly Fast Weight Programmers"

50 of 166 citing papers are shown below, most recent first.
Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models
Ali Behrouz, Michele Santacatterina, Ramin Zabih. 06 Jun 2024. [Mamba, AI4TS]

Attention-based Iterative Decomposition for Tensor Product Representation
Taewon Park, Inchul Choi, Minho Lee. 03 Jun 2024.

Why Larger Language Models Do In-context Learning Differently?
Zhenmei Shi, Junyi Wei, Zhuoyan Xu, Yingyu Liang. 30 May 2024.

Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective
Zhen Qin, Xuyang Shen, Weigao Sun, Dong Li, Stanley T. Birchfield, Richard I. Hartley, Yiran Zhong. 27 May 2024.

Rethinking Transformers in Solving POMDPs
Chenhao Lu, Ruizhe Shi, Yuyao Liu, Kaizhe Hu, Simon S. Du, Huazhe Xu. 27 May 2024. [AI4CE]

On Understanding Attention-Based In-Context Learning for Categorical Data
Aaron T. Wang, William Convertino, Xiang Cheng, Ricardo Henao, Lawrence Carin. 27 May 2024.

Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks
Jerome Sieber, Carmen Amo Alonso, A. Didier, M. Zeilinger, Antonio Orvieto. 24 May 2024. [AAML]

HGRN2: Gated Linear RNNs with State Expansion
Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong. 11 Apr 2024. [LRM]

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Tsendsuren Munkhdalai, Manaal Faruqui, Siddharth Gopal. 10 Apr 2024. [LRM, LLMAG, CLL]

Structurally Flexible Neural Networks: Evolving the Building Blocks for General Agents
J. Pedersen, Erwan Plantec, Eleni Nisioti, Milton L. Montero, Sebastian Risi. 06 Apr 2024.

Faster Diffusion via Temporal Attention Decomposition
Haozhe Liu, Wentian Zhang, Jinheng Xie, Francesco Faccio, Mengmeng Xu, Tao Xiang, Mike Zheng Shou, Juan-Manuel Perez-Rua, Jürgen Schmidhuber. 03 Apr 2024. [DiffM]

Linear Attention Sequence Parallelism
Weigao Sun, Zhen Qin, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong. 03 Apr 2024.

Mechanistic Design and Scaling of Hybrid Architectures
Michael Poli, Armin W. Thomas, Eric N. D. Nguyen, Pragaash Ponnusamy, Bjorn Deiseroth, ..., Brian Hie, Stefano Ermon, Christopher Ré, Ce Zhang, Stefano Massaroli. 26 Mar 2024. [MoE]

Learning Useful Representations of Recurrent Neural Network Weight Matrices
Vincent Herrmann, Francesco Faccio, Jürgen Schmidhuber. 18 Mar 2024.

Transfer Learning Beyond Bounded Density Ratios
Alkis Kalavasis, Ilias Zadik, Manolis Zampetakis. 18 Mar 2024.

Learning Associative Memories with Gradient Descent
Vivien A. Cabannes, Berfin Simsek, A. Bietti. 28 Feb 2024.

PIDformer: Transformer Meets Control Theory
Tam Nguyen, César A. Uribe, Tan-Minh Nguyen, Richard G. Baraniuk. 25 Feb 2024.

Linear Transformers are Versatile In-Context Learners
Max Vladymyrov, J. Oswald, Mark Sandler, Rong Ge. 21 Feb 2024.

Linear Transformers with Learnable Kernel Functions are Better In-Context Models
Yaroslav Aksenov, Nikita Balagansky, Sofia Maria Lo Cicero Vaina, Boris Shaposhnikov, Alexey Gorbatovski, Daniil Gavrilov. 16 Feb 2024. [KELM]

On the Resurgence of Recurrent Models for Long Sequences -- Survey and Research Opportunities in the Transformer Era
Matteo Tiezzi, Michele Casoni, Alessandro Betti, Tommaso Guidi, Marco Gori, S. Melacci. 12 Feb 2024.

The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
Michael Zhang, Kush S. Bhatia, Hermann Kumbong, Christopher Ré. 06 Feb 2024.

HyperZ·Z·W Operator Connects Slow-Fast Networks for Full Context Interaction
Harvie Zhang. 31 Jan 2024.

Superiority of Multi-Head Attention in In-Context Linear Regression
Yingqian Cui, Jie Ren, Pengfei He, Jiliang Tang, Yue Xing. 30 Jan 2024.

Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo, Valentin Vielzeuf. 18 Dec 2023. [SSL]

Delving Deeper Into Astromorphic Transformers
Md. Zesun Ahmed Mia, Malyaban Bal, Abhronil Sengupta. 18 Dec 2023.

SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Róbert Csordás, Piotr Piekos, Kazuki Irie, Jürgen Schmidhuber. 13 Dec 2023. [MoE]

Gated Linear Attention Transformers with Hardware-Efficient Training
Songlin Yang, Bailin Wang, Yikang Shen, Rameswar Panda, Yoon Kim. 11 Dec 2023.

Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context
Xiang Cheng, Yuxin Chen, S. Sra. 11 Dec 2023.

MIMONets: Multiple-Input-Multiple-Output Neural Networks Exploiting Computation in Superposition
Nicolas Menet, Michael Hersche, G. Karunaratne, Luca Benini, Abu Sebastian, Abbas Rahimi. 05 Dec 2023.

SARA-RT: Scaling up Robotics Transformers with Self-Adaptive Robust Attention
Isabel Leal, Krzysztof Choromanski, Deepali Jain, Kumar Avinava Dubey, Jake Varley, ..., Q. Vuong, Tamás Sarlós, Kenneth Oslund, Karol Hausman, Kanishka Rao. 04 Dec 2023.

Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals
Tam Nguyen, Tan-Minh Nguyen, Richard G. Baraniuk. 01 Dec 2023.

Efficient Rotation Invariance in Deep Neural Networks through Artificial Mental Rotation
Lukas Tuggener, Thilo Stadelmann, Jürgen Schmidhuber. 14 Nov 2023. [OOD]

Hierarchically Gated Recurrent Neural Network for Sequence Modeling
Zhen Qin, Songlin Yang, Yiran Zhong. 08 Nov 2023.

p-Laplacian Transformer
Tuan Nguyen, Tam Nguyen, Vinh-Tiep Nguyen, Tan-Minh Nguyen. 06 Nov 2023.

Simplifying Transformer Blocks
Bobby He, Thomas Hofmann. 03 Nov 2023.

Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions
Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber. 24 Oct 2023.

Learning to (Learn at Test Time)
Yu Sun, Xinhao Li, Karan Dalal, Chloe Hsu, Oluwasanmi Koyejo, Carlos Guestrin, Xiaolong Wang, Tatsunori Hashimoto, Xinlei Chen. 20 Oct 2023. [SSL]

Approximating Two-Layer Feedforward Networks for Efficient Transformers
Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber. 16 Oct 2023. [MoE]

Do pretrained Transformers Learn In-Context by Gradient Descent?
Lingfeng Shen, Aayush Mishra, Daniel Khashabi. 12 Oct 2023.

Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention
Huiyin Xue, Nikolaos Aletras. 11 Oct 2023.

Reinforcement Learning with Fast and Forgetful Memory
Steven D. Morad, Ryan Kortvelesy, Stephan Liwicki, Amanda Prorok. 06 Oct 2023. [OffRL]

Scaling Laws for Associative Memories
Vivien A. Cabannes, Elvis Dohmatob, A. Bietti. 04 Oct 2023.

The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
Aleksandar Stanić, Dylan R. Ashley, Oleg Serikov, Louis Kirsch, Francesco Faccio, Jürgen Schmidhuber, Thomas Hofmann, Imanol Schlag. 20 Sep 2023. [MoE]

Uncovering mesa-optimization algorithms in Transformers
J. Oswald, Eyvind Niklasson, Maximilian Schlegel, Seijin Kobayashi, Nicolas Zucchet, ..., Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento. 11 Sep 2023.

Gated recurrent neural networks discover attention
Nicolas Zucchet, Seijin Kobayashi, Yassir Akram, J. Oswald, Maxime Larcher, Angelika Steger, João Sacramento. 04 Sep 2023.

Recurrent Attention Networks for Long-text Modeling
Xianming Li, Zongxi Li, Xiaotian Luo, Haoran Xie, Xing Lee, Yingbin Zhao, Fu Lee Wang, Qing Li. 12 Jun 2023. [RALM]

Birth of a Transformer: A Memory Viewpoint
A. Bietti, Vivien A. Cabannes, Diane Bouchacourt, Hervé Jégou, Léon Bottou. 01 Jun 2023.

Transformers learn to implement preconditioned gradient descent for in-context learning
Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, S. Sra. 01 Jun 2023. [ODL]

Exploring the Promise and Limits of Real-Time Recurrent Learning
Kazuki Irie, Anand Gopalakrishnan, Jürgen Schmidhuber. 30 May 2023.

Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Sotiris Anagnostidis, Dario Pavllo, Luca Biggio, Lorenzo Noci, Aurelien Lucchi, Thomas Hofmann. 25 May 2023.