Linear Transformers Are Secretly Fast Weight Programmers
Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber
arXiv:2102.11174 · 22 February 2021
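For context on the claim in the title: the paper shows that linear (kernelized) attention with a causal mask is equivalent to a fast weight programmer that writes outer products into a weight matrix and reads it out with the query. Below is a minimal NumPy sketch of that duality; the ReLU feature map, the toy dimensions, and the function name are illustrative assumptions, and the paper's attention normalization and delta-rule update are omitted.

    import numpy as np

    # phi is a placeholder feature map (an assumption here; the paper
    # discusses specific choices such as DPFP).
    def linear_attention_as_fast_weights(Q, K, V, phi=lambda x: np.maximum(x, 0.0)):
        # At each step t: write W <- W + outer(v_t, phi(k_t)), then read y_t = W @ phi(q_t).
        # Unrolling the loop gives y_t = sum_{j<=t} v_j * (phi(k_j) . phi(q_t)),
        # i.e. causal linear attention without the normalizer.
        W = np.zeros((V.shape[1], phi(K[0]).shape[0]))  # fast weight matrix, initially empty
        outputs = []
        for q, k, v in zip(Q, K, V):
            W += np.outer(v, phi(k))    # "write": outer-product (Hebbian-style) update
            outputs.append(W @ phi(q))  # "read": apply the fast weights to the query
        return np.stack(outputs)

    # Tiny usage check: 4 steps, key/query dim 3, value dim 2 -> output shape (4, 2).
    rng = np.random.default_rng(0)
    Q, K, V = rng.normal(size=(4, 3)), rng.normal(size=(4, 3)), rng.normal(size=(4, 2))
    print(linear_attention_as_fast_weights(Q, K, V).shape)

Viewed this way, the fixed-size matrix W is the only state carried across steps, which is why such models run in time and memory linear in sequence length.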
Papers citing "Linear Transformers Are Secretly Fast Weight Programmers" (50 of 166 shown)
Contrastive Training of Complex-Valued Autoencoders for Object Discovery
Aleksandar Stanić, Anand Gopalakrishnan, Kazuki Irie, Jürgen Schmidhuber · OCL · 24 May 2023

Brain-inspired learning in artificial neural networks: a review
Samuel Schmidgall, Jascha Achterberg, Thomas Miconi, Louis Kirsch, Rojin Ziaei, S. P. Hajiseyedrazi, Jason Eshraghian · 18 May 2023

MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
L. Yu, Daniel Simig, Colin Flaherty, Armen Aghajanyan, Luke Zettlemoyer, M. Lewis · 12 May 2023

ChatGPT-Like Large-Scale Foundation Models for Prognostics and Health Management: A Survey and Roadmaps
Yanfang Li, Huan Wang, Muxia Sun · LM&MA, AI4TS, AI4CE · 10 May 2023

Accelerating Neural Self-Improvement via Bootstrapping
Kazuki Irie, Jürgen Schmidhuber · 02 May 2023

Meta-Learned Models of Cognition
Marcel Binz, Ishita Dasgupta, Akshay K. Jagadish, M. Botvinick, Jane X. Wang, Eric Schulz · 12 Apr 2023

POPGym: Benchmarking Partially Observable Reinforcement Learning
Steven D. Morad, Ryan Kortvelesy, Matteo Bettini, Stephan Liwicki, Amanda Prorok · OffRL · 03 Mar 2023

Permutation-Invariant Set Autoencoders with Fixed-Size Embeddings for Multi-Agent Learning
Ryan Kortvelesy, Steven D. Morad, Amanda Prorok · AI4CE · 24 Feb 2023

Hyena Hierarchy: Towards Larger Convolutional Language Models
Michael Poli, Stefano Massaroli, Eric Q. Nguyen, Daniel Y. Fu, Tri Dao, S. Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré · VLM · 21 Feb 2023

Theory of coupled neuronal-synaptic dynamics
David G. Clark, L. F. Abbott · 17 Feb 2023

Self-Organising Neural Discrete Representation Learning à la Kohonen
Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber · SSL · 15 Feb 2023

Efficient Attention via Control Variates
Lin Zheng, Jianbo Yuan, Chong-Jun Wang, Lingpeng Kong · 09 Feb 2023

Hebbian and Gradient-based Plasticity Enables Robust Memory and Rapid Learning in RNNs
Y. Duan, Zhongfan Jia, Qian Li, Yi Zhong, Kaisheng Ma · AAML · 07 Feb 2023

Mnemosyne: Learning to Train Transformers with Transformers
Deepali Jain, K. Choromanski, Kumar Avinava Dubey, Sumeet Singh, Vikas Sindhwani, Tingnan Zhang, Jie Tan · OffRL · 02 Feb 2023

Simplex Random Features
Isaac Reid, K. Choromanski, Valerii Likhosherstov, Adrian Weller · 31 Jan 2023

Learning One Abstract Bit at a Time Through Self-Invented Experiments Encoded as Neural Networks
Vincent Herrmann, Louis Kirsch, Jürgen Schmidhuber · AI4CE · 29 Dec 2022

On Transforming Reinforcement Learning by Transformer: The Development Trajectory
Shengchao Hu, Li Shen, Ya Zhang, Yixin Chen, Dacheng Tao · OffRL · 29 Dec 2022

Annotated History of Modern AI and Deep Learning
Juergen Schmidhuber · MLAU, AI4TS, AI4CE · 21 Dec 2022

Transformers learn in-context by gradient descent
J. Oswald, Eyvind Niklasson, E. Randazzo, João Sacramento, A. Mordvintsev, A. Zhmoginov, Max Vladymyrov · MLT · 15 Dec 2022

Meta-Learning Fast Weight Language Models
Kevin Clark, Kelvin Guu, Ming-Wei Chang, Panupong Pasupat, Geoffrey E. Hinton, Mohammad Norouzi · KELM · 05 Dec 2022

What learning algorithm is in-context learning? Investigations with linear models
Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou · 28 Nov 2022

Learning to Control Rapidly Changing Synaptic Connections: An Alternative Type of Memory in Sequence Processing Artificial Neural Networks
Kazuki Irie, Jürgen Schmidhuber · KELM · 17 Nov 2022

Characterizing Verbatim Short-Term Memory in Neural Language Models
K. Armeni, C. Honey, Tal Linzen · KELM, RALM · 24 Oct 2022

Modeling Context With Linear Attention for Scalable Document-Level Translation
Zhaofeng Wu, Hao Peng, Nikolaos Pappas, Noah A. Smith · 16 Oct 2022

CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling
Jinchao Zhang, Shuyang Jiang, Jiangtao Feng, Lin Zheng, Lingpeng Kong · 3DV · 14 Oct 2022

Designing Robust Transformers using Robust Kernel Density Estimation
Xing Han, Tongzheng Ren, T. Nguyen, Khai Nguyen, Joydeep Ghosh, Nhat Ho · 11 Oct 2022

LARF: Two-level Attention-based Random Forests with a Mixture of Contamination Models
A. Konstantinov, Lev V. Utkin · 11 Oct 2022

Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
H. H. Mao · 09 Oct 2022

Images as Weight Matrices: Sequential Image Generation Through Synaptic Learning Rules
Kazuki Irie, Jürgen Schmidhuber · 07 Oct 2022

Deep is a Luxury We Don't Have
Ahmed Taha, Yen Nhi Truong Vu, Brent Mombourquette, Thomas P. Matthews, Jason Su, Sadanand Singh · ViT, MedIm · 11 Aug 2022

Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter
Aleksandar Stanić, Yujin Tang, David R Ha, Jürgen Schmidhuber · ELM · 05 Aug 2022

AGBoost: Attention-based Modification of Gradient Boosting Machine
A. Konstantinov, Lev V. Utkin, Stanislav R. Kirpichenko · ODL · 12 Jul 2022

Attention and Self-Attention in Random Forests
Lev V. Utkin, A. Konstantinov · 09 Jul 2022

Goal-Conditioned Generators of Deep Policies
Francesco Faccio, Vincent Herrmann, Aditya A. Ramesh, Louis Kirsch, Jürgen Schmidhuber · OffRL · 04 Jul 2022

Rethinking Query-Key Pairwise Interactions in Vision Transformers
Cheng-rong Li, Yangxin Liu · 01 Jul 2022

Short-Term Plasticity Neurons Learning to Learn and Forget
Hector Garcia Rodriguez, Qinghai Guo, Timoleon Moraitis · 28 Jun 2022

Neural Differential Equations for Learning to Program Neural Nets Through Continuous Learning Rules
Kazuki Irie, Francesco Faccio, Jürgen Schmidhuber · AI4TS · 03 Jun 2022

Transformer with Fourier Integral Attentions
T. Nguyen, Minh Pham, Tam Nguyen, Khai Nguyen, Stanley J. Osher, Nhat Ho · 01 Jun 2022

BayesPCN: A Continually Learnable Predictive Coding Associative Memory
Jason Yoo, F. Wood · KELM · 20 May 2022

Minimal Neural Network Models for Permutation Invariant Agents
J. Pedersen, S. Risi · 12 May 2022

A Call for Clarity in Beam Search: How It Works and When It Stops
Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir R. Radev, Yejin Choi, Noah A. Smith · 11 Apr 2022

Linear Complexity Randomized Self-attention Mechanism
Lin Zheng, Chong-Jun Wang, Lingpeng Kong · 10 Apr 2022

On the link between conscious function and general intelligence in humans and machines
Arthur Juliani, Kai Arulkumaran, Shuntaro Sasai, Ryota Kanai · 24 Mar 2022

Linearizing Transformer with Key-Value Memory
Yizhe Zhang, Deng Cai · 23 Mar 2022

FAR: Fourier Aerial Video Recognition
D. Kothandaraman, Tianrui Guan, Xijun Wang, Sean Hu, Ming-Shun Lin, Tianyi Zhou · 21 Mar 2022

Block-Recurrent Transformers
DeLesley S. Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur · 11 Mar 2022

The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention
Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber · 11 Feb 2022

A Modern Self-Referential Weight Matrix That Learns to Modify Itself
Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber · 11 Feb 2022

Latency Adjustable Transformer Encoder for Language Understanding
Sajjad Kachuee, M. Sharifkhani · 10 Jan 2022

Attention-based Random Forest and Contamination Model
Lev V. Utkin, A. Konstantinov · 08 Jan 2022