Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2204.07341
Cited By
LaMemo: Language Modeling with Look-Ahead Memory
15 April 2022
Haozhe Ji
Rongsheng Zhang
Zhenyu Yang
Zhipeng Hu
Minlie Huang
KELM
RALM
CLL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"LaMemo: Language Modeling with Look-Ahead Memory"
17 / 17 papers shown
Title
Do Long-Range Language Models Actually Use Long-Range Context?
Simeng Sun
Kalpesh Krishna
Andrew Mattarella-Micke
Mohit Iyyer
RALM
59
84
0
19 Sep 2021
∞
\infty
∞
-former: Infinite Memory Transformer
Pedro Henrique Martins
Zita Marinho
André F. T. Martins
79
11
0
01 Sep 2021
Long Range Arena: A Benchmark for Efficient Transformers
Yi Tay
Mostafa Dehghani
Samira Abnar
Songlin Yang
Dara Bahri
Philip Pham
J. Rao
Liu Yang
Sebastian Ruder
Donald Metzler
147
720
0
08 Nov 2020
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos
Apoorv Vyas
Nikolaos Pappas
Franccois Fleuret
201
1,771
0
29 Jun 2020
Linformer: Self-Attention with Linear Complexity
Sinong Wang
Belinda Z. Li
Madian Khabsa
Han Fang
Hao Ma
216
1,706
0
08 Jun 2020
Longformer: The Long-Document Transformer
Iz Beltagy
Matthew E. Peters
Arman Cohan
RALM
VLM
171
4,071
0
10 Apr 2020
Adaptively Sparse Transformers
Gonçalo M. Correia
Vlad Niculae
André F. T. Martins
84
256
0
30 Aug 2019
Adaptive Attention Span in Transformers
Sainbayar Sukhbaatar
Edouard Grave
Piotr Bojanowski
Armand Joulin
76
285
0
19 May 2019
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
VLM
241
3,730
0
09 Jan 2019
Long-Term Feature Banks for Detailed Video Understanding
Chao-Yuan Wu
Christoph Feichtenhofer
Haoqi Fan
Kaiming He
Philipp Krahenbuhl
Ross B. Girshick
171
480
0
12 Dec 2018
Adaptive Input Representations for Neural Language Modeling
Alexei Baevski
Michael Auli
101
390
0
28 Sep 2018
Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context
Urvashi Khandelwal
He He
Peng Qi
Dan Jurafsky
RALM
52
295
0
12 May 2018
Pointer Sentinel Mixture Models
Stephen Merity
Caiming Xiong
James Bradbury
R. Socher
RALM
319
2,876
0
26 Sep 2016
Layer Normalization
Jimmy Lei Ba
J. Kiros
Geoffrey E. Hinton
413
10,494
0
21 Jul 2016
Neural Turing Machines
Alex Graves
Greg Wayne
Ivo Danihelka
97
2,328
0
20 Oct 2014
Memory Networks
Jason Weston
S. Chopra
Antoine Bordes
GNN
KELM
147
1,706
0
15 Oct 2014
Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau
Kyunghyun Cho
Yoshua Bengio
AIMat
558
27,311
0
01 Sep 2014
1