Gated recurrent neural networks discover attention
arXiv:2309.01775 (4 September 2023)
Nicolas Zucchet, Seijin Kobayashi, Yassir Akram, J. Oswald, Maxime Larcher, Angelika Steger, João Sacramento

Papers citing "Gated recurrent neural networks discover attention" (33 papers)

Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Albert Gu, Tri Dao (01 Dec 2023)

Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues
Antonio Orvieto, Soham De, Çağlar Gülçehre, Razvan Pascanu, Samuel L. Smith (21 Jul 2023)

Retentive Network: A Successor to Transformer for Large Language Models
Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei (17 Jul 2023)

One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention
Arvind V. Mahankali, Tatsunori B. Hashimoto, Tengyu Ma (07 Jul 2023)

Trained Transformers Learn Linear Models In-Context
Ruiqi Zhang, Spencer Frei, Peter L. Bartlett (16 Jun 2023)

Transformers learn to implement preconditioned gradient descent for in-context learning
Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, S. Sra (01 Jun 2023)

Online learning of long-range dependencies
Nicolas Zucchet, Robert Meier, Simon Schug, Asier Mujika, João Sacramento (25 May 2023)

RWKV: Reinventing RNNs for the Transformer Era
Bo Peng, Eric Alcaide, Quentin G. Anthony, Alon Albalak, Samuel Arcadinho, ..., Qihang Zhao, P. Zhou, Qinghua Zhou, Jian Zhu, Rui-Jie Zhu (22 May 2023)

Resurrecting Recurrent Neural Networks for Long Sequences
Antonio Orvieto, Samuel L. Smith, Albert Gu, Anushan Fernando, Çağlar Gülçehre, Razvan Pascanu, Soham De (11 Mar 2023)

Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Daniel Y. Fu, Tri Dao, Khaled Kamal Saab, A. Thomas, Atri Rudra, Christopher Ré (28 Dec 2022)

Transformers learn in-context by gradient descent
J. Oswald, Eyvind Niklasson, E. Randazzo, João Sacramento, A. Mordvintsev, A. Zhmoginov, Max Vladymyrov (15 Dec 2022)

In-context Learning and Induction Heads
Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah (24 Sep 2022)

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant (01 Aug 2022)

Data Distributional Properties Drive Emergent In-Context Learning in Transformers
Stephanie C. Y. Chan, Adam Santoro, Andrew Kyle Lampinen, Jane X. Wang, Aaditya K. Singh, Pierre Harvey Richemond, J. Mcclelland, Felix Hill (22 Apr 2022)

Diagonal State Spaces are as Effective as Structured State Spaces
Ankit Gupta, Albert Gu, Jonathan Berant (27 Mar 2022)

Efficiently Modeling Long Sequences with Structured State Spaces
Albert Gu, Karan Goel, Christopher Ré (31 Oct 2021)

A Practical Survey on Faster and Lighter Transformers
Quentin Fournier, G. Caron, Daniel Aloise (26 Mar 2021)

Random Feature Attention
Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, Lingpeng Kong (03 Mar 2021)

Linear Transformers Are Secretly Fast Weight Programmers
Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber (22 Feb 2021)

Long Range Arena: A Benchmark for Efficient Transformers
Yi Tay, Mostafa Dehghani, Samira Abnar, Songlin Yang, Dara Bahri, Philip Pham, J. Rao, Liu Yang, Sebastian Ruder, Donald Metzler (08 Nov 2020)

Rethinking Attention with Performers
K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, ..., Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian Weller (30 Sep 2020)

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret (29 Jun 2020)

Array Programming with NumPy
Charles R. Harris, K. Millman, S. Walt, R. Gommers, Pauli Virtanen, ..., Tyler Reddy, Warren Weckesser, Hameer Abbasi, C. Gohlke, T. Oliphant (18 Jun 2020)

Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei (28 May 2020)

Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel
Yao-Hung Hubert Tsai, Shaojie Bai, M. Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov (30 Aug 2019)

Efficient Attention: Attention with Linear Complexities
Zhuoran Shen, Mingyuan Zhang, Haiyu Zhao, Shuai Yi, Hongsheng Li (04 Dec 2018)

Differentiable plasticity: training plastic neural networks with backpropagation
Thomas Miconi, Jeff Clune, Kenneth O. Stanley (06 Apr 2018)

Universal discrete-time reservoir computers with stochastic inputs and linear readouts using non-homogeneous state-affine systems
Lyudmila Grigoryeva, Juan-Pablo Ortega (03 Dec 2017)

Attention Is All You Need
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin (12 Jun 2017)

Language Modeling with Gated Convolutional Networks
Yann N. Dauphin, Angela Fan, Michael Auli, David Grangier (23 Dec 2016)

Using Fast Weights to Attend to the Recent Past
Jimmy Ba, Geoffrey E. Hinton, Volodymyr Mnih, Joel Z Leibo, Catalin Ionescu (20 Oct 2016)

On the Properties of Neural Machine Translation: Encoder-Decoder Approaches
Kyunghyun Cho, B. V. Merrienboer, Dzmitry Bahdanau, Yoshua Bengio (03 Sep 2014)

Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio (01 Sep 2014)