Linear Transformers Are Secretly Fast Weight Programmers
Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber
arXiv:2102.11174 · 22 February 2021
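For context on the claim in the title: the paper shows that linear (kernelized) attention with a causal mask is equivalent to a fast weight programmer that writes outer products into a weight matrix and reads it out with the query. Below is a minimal NumPy sketch of that duality; the ReLU feature map, the toy dimensions, and the function name are illustrative assumptions, and the paper's attention normalization and delta-rule update are omitted.

    import numpy as np

    # phi is a placeholder feature map (an assumption here; the paper
    # discusses specific choices such as DPFP).
    def linear_attention_as_fast_weights(Q, K, V, phi=lambda x: np.maximum(x, 0.0)):
        # At each step t: write W <- W + outer(v_t, phi(k_t)), then read y_t = W @ phi(q_t).
        # Unrolling the loop gives y_t = sum_{j<=t} v_j * (phi(k_j) . phi(q_t)),
        # i.e. causal linear attention without the normalizer.
        W = np.zeros((V.shape[1], phi(K[0]).shape[0]))  # fast weight matrix, initially empty
        outputs = []
        for q, k, v in zip(Q, K, V):
            W += np.outer(v, phi(k))    # "write": outer-product (Hebbian-style) update
            outputs.append(W @ phi(q))  # "read": apply the fast weights to the query
        return np.stack(outputs)

    # Tiny usage check: 4 steps, key/query dim 3, value dim 2 -> output shape (4, 2).
    rng = np.random.default_rng(0)
    Q, K, V = rng.normal(size=(4, 3)), rng.normal(size=(4, 3)), rng.normal(size=(4, 2))
    print(linear_attention_as_fast_weights(Q, K, V).shape)

Viewed this way, the fixed-size matrix W is the only state carried across steps, which is why such models run in time and memory linear in sequence length.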
Papers citing "Linear Transformers Are Secretly Fast Weight Programmers" (50 of 166 shown)
Contrastive Training of Complex-Valued Autoencoders for Object Discovery
Aleksandar Stanić, Anand Gopalakrishnan, Kazuki Irie, Jürgen Schmidhuber · OCL · 24 May 2023

Brain-inspired learning in artificial neural networks: a review
Samuel Schmidgall, Jascha Achterberg, Thomas Miconi, Louis Kirsch, Rojin Ziaei, S. P. Hajiseyedrazi, Jason Eshraghian · 18 May 2023

MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
L. Yu, Daniel Simig, Colin Flaherty, Armen Aghajanyan, Luke Zettlemoyer, M. Lewis · 12 May 2023

ChatGPT-Like Large-Scale Foundation Models for Prognostics and Health Management: A Survey and Roadmaps
Yanfang Li, Huan Wang, Muxia Sun · LM&MA, AI4TS, AI4CE · 10 May 2023

Accelerating Neural Self-Improvement via Bootstrapping
Kazuki Irie, Jürgen Schmidhuber · 02 May 2023

Meta-Learned Models of Cognition
Marcel Binz, Ishita Dasgupta, Akshay K. Jagadish, M. Botvinick, Jane X. Wang, Eric Schulz · 12 Apr 2023

POPGym: Benchmarking Partially Observable Reinforcement Learning
Steven D. Morad, Ryan Kortvelesy, Matteo Bettini, Stephan Liwicki, Amanda Prorok · OffRL · 03 Mar 2023

Permutation-Invariant Set Autoencoders with Fixed-Size Embeddings for Multi-Agent Learning
Ryan Kortvelesy, Steven D. Morad, Amanda Prorok · AI4CE · 24 Feb 2023

Hyena Hierarchy: Towards Larger Convolutional Language Models
Michael Poli, Stefano Massaroli, Eric Q. Nguyen, Daniel Y. Fu, Tri Dao, S. Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré · VLM · 21 Feb 2023

Theory of coupled neuronal-synaptic dynamics
David G. Clark, L. F. Abbott · 17 Feb 2023

Self-Organising Neural Discrete Representation Learning à la Kohonen
Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber · SSL · 15 Feb 2023

Efficient Attention via Control Variates
Lin Zheng, Jianbo Yuan, Chong-Jun Wang, Lingpeng Kong · 09 Feb 2023

Hebbian and Gradient-based Plasticity Enables Robust Memory and Rapid Learning in RNNs
Y. Duan, Zhongfan Jia, Qian Li, Yi Zhong, Kaisheng Ma · AAML · 07 Feb 2023

Mnemosyne: Learning to Train Transformers with Transformers
Deepali Jain, K. Choromanski, Kumar Avinava Dubey, Sumeet Singh, Vikas Sindhwani, Tingnan Zhang, Jie Tan · OffRL · 02 Feb 2023

Simplex Random Features
Isaac Reid, K. Choromanski, Valerii Likhosherstov, Adrian Weller · 31 Jan 2023

Learning One Abstract Bit at a Time Through Self-Invented Experiments Encoded as Neural Networks
Vincent Herrmann, Louis Kirsch, Jürgen Schmidhuber · AI4CE · 29 Dec 2022

On Transforming Reinforcement Learning by Transformer: The Development Trajectory
Shengchao Hu, Li Shen, Ya Zhang, Yixin Chen, Dacheng Tao · OffRL · 29 Dec 2022

Annotated History of Modern AI and Deep Learning
Juergen Schmidhuber · MLAU, AI4TS, AI4CE · 21 Dec 2022

Transformers learn in-context by gradient descent
J. Oswald, Eyvind Niklasson, E. Randazzo, João Sacramento, A. Mordvintsev, A. Zhmoginov, Max Vladymyrov · MLT · 15 Dec 2022

Meta-Learning Fast Weight Language Models
Kevin Clark, Kelvin Guu, Ming-Wei Chang, Panupong Pasupat, Geoffrey E. Hinton, Mohammad Norouzi · KELM · 05 Dec 2022

What learning algorithm is in-context learning? Investigations with linear models
Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou · 28 Nov 2022

Learning to Control Rapidly Changing Synaptic Connections: An Alternative Type of Memory in Sequence Processing Artificial Neural Networks
Kazuki Irie, Jürgen Schmidhuber · KELM · 17 Nov 2022

Characterizing Verbatim Short-Term Memory in Neural Language Models
K. Armeni, C. Honey, Tal Linzen · KELM, RALM · 24 Oct 2022

Modeling Context With Linear Attention for Scalable Document-Level Translation
Zhaofeng Wu, Hao Peng, Nikolaos Pappas, Noah A. Smith · 16 Oct 2022

CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling
Jinchao Zhang, Shuyang Jiang, Jiangtao Feng, Lin Zheng, Lingpeng Kong · 3DV · 14 Oct 2022

Designing Robust Transformers using Robust Kernel Density Estimation
Xing Han, Tongzheng Ren, T. Nguyen, Khai Nguyen, Joydeep Ghosh, Nhat Ho · 11 Oct 2022

LARF: Two-level Attention-based Random Forests with a Mixture of Contamination Models
A. Konstantinov, Lev V. Utkin · 11 Oct 2022

Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
H. H. Mao · 09 Oct 2022

Images as Weight Matrices: Sequential Image Generation Through Synaptic Learning Rules
Kazuki Irie, Jürgen Schmidhuber · 07 Oct 2022

Deep is a Luxury We Don't Have
Ahmed Taha, Yen Nhi Truong Vu, Brent Mombourquette, Thomas P. Matthews, Jason Su, Sadanand Singh · ViT, MedIm · 11 Aug 2022

Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter
Aleksandar Stanić, Yujin Tang, David R Ha, Jürgen Schmidhuber · ELM · 05 Aug 2022

AGBoost: Attention-based Modification of Gradient Boosting Machine
A. Konstantinov, Lev V. Utkin, Stanislav R. Kirpichenko · ODL · 12 Jul 2022

Attention and Self-Attention in Random Forests
Lev V. Utkin, A. Konstantinov · 09 Jul 2022

Goal-Conditioned Generators of Deep Policies
Francesco Faccio, Vincent Herrmann, Aditya A. Ramesh, Louis Kirsch, Jürgen Schmidhuber · OffRL · 04 Jul 2022

Rethinking Query-Key Pairwise Interactions in Vision Transformers
Cheng-rong Li, Yangxin Liu · 01 Jul 2022

Short-Term Plasticity Neurons Learning to Learn and Forget
Hector Garcia Rodriguez, Qinghai Guo, Timoleon Moraitis · 28 Jun 2022

Neural Differential Equations for Learning to Program Neural Nets Through Continuous Learning Rules
Kazuki Irie, Francesco Faccio, Jürgen Schmidhuber · AI4TS · 03 Jun 2022

Transformer with Fourier Integral Attentions
T. Nguyen, Minh Pham, Tam Nguyen, Khai Nguyen, Stanley J. Osher, Nhat Ho · 01 Jun 2022

BayesPCN: A Continually Learnable Predictive Coding Associative Memory
Jason Yoo, F. Wood · KELM · 20 May 2022

Minimal Neural Network Models for Permutation Invariant Agents
J. Pedersen, S. Risi · 12 May 2022

A Call for Clarity in Beam Search: How It Works and When It Stops
Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir R. Radev, Yejin Choi, Noah A. Smith · 11 Apr 2022

Linear Complexity Randomized Self-attention Mechanism
Lin Zheng, Chong-Jun Wang, Lingpeng Kong · 10 Apr 2022

On the link between conscious function and general intelligence in humans and machines
Arthur Juliani, Kai Arulkumaran, Shuntaro Sasai, Ryota Kanai · 24 Mar 2022

Linearizing Transformer with Key-Value Memory
Yizhe Zhang, Deng Cai · 23 Mar 2022

FAR: Fourier Aerial Video Recognition
D. Kothandaraman, Tianrui Guan, Xijun Wang, Sean Hu, Ming-Shun Lin, Tianyi Zhou · 21 Mar 2022

Block-Recurrent Transformers
DeLesley S. Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur · 11 Mar 2022

The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention
Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber · 11 Feb 2022

A Modern Self-Referential Weight Matrix That Learns to Modify Itself
Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber · 11 Feb 2022

Latency Adjustable Transformer Encoder for Language Understanding
Sajjad Kachuee, M. Sharifkhani · 10 Jan 2022

Attention-based Random Forest and Contamination Model
Lev V. Utkin, A. Konstantinov · 08 Jan 2022