Memorizing Transformers
arXiv:2203.08913 · 16 March 2022
Yuhuai Wu, M. Rabe, DeLesley S. Hutchins, Christian Szegedy
Tags: RALM
Papers citing "Memorizing Transformers" (40 of 140 shown):

Alternating Updates for Efficient Transformers
Cenk Baykal, D. Cutler, Nishanth Dikkala, Nikhil Ghosh, Rina Panigrahy, Xin Wang
Tags: MoE
48 / 5 / 0 · 30 Jan 2023

Pre-computed memory or on-the-fly encoding? A hybrid approach to retrieval augmentation makes the most of your compute
Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Joshua Ainslie, Sumit Sanghai, Fei Sha, William W. Cohen
Tags: RALM
27 / 16 / 0 · 25 Jan 2023

Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition
David M. Chan, Shalini Ghosh, Ariya Rastrow, Björn Hoffmeister
Tags: OffRL
18 / 6 / 0 · 06 Jan 2023

Natural Language to Code Generation in Interactive Data Science Notebooks
Pengcheng Yin, Wen-Ding Li, Kefan Xiao, Abhishek Rao, Yeming Wen, ..., Paige Bailey, Michele Catasta, Henryk Michalewski, Oleksandr Polozov, Charles Sutton
33 / 57 / 0 · 19 Dec 2022

FiDO: Fusion-in-Decoder optimized for stronger performance and faster inference
Michiel de Jong, Yury Zemlyanskiy, Joshua Ainslie, Nicholas FitzGerald, Sumit Sanghai, Fei Sha, William W. Cohen
Tags: VLM
21 / 32 / 0 · 15 Dec 2022

G-MAP: General Memory-Augmented Pre-trained Language Model for Domain Tasks
Zhongwei Wan, Yichun Yin, Wei Zhang, Jiaxin Shi, Lifeng Shang, Guangyong Chen, Xin Jiang, Qun Liu
Tags: VLM, CLL
36 / 16 / 0 · 07 Dec 2022

Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer
Zhengbao Jiang, Luyu Gao, Jun Araki, Haibo Ding, Zhiruo Wang, Jamie Callan, Graham Neubig
Tags: RALM
27 / 40 / 0 · 05 Dec 2022

Learning Label Modular Prompts for Text Classification in the Wild
Hailin Chen, Amrita Saha, Shafiq R. Joty, Steven C. H. Hoi
Tags: OOD, VLM
18 / 5 / 0 · 30 Nov 2022

Efficient Transformers with Dynamic Token Pooling
Piotr Nawrot, J. Chorowski, Adrian Łańcucki, E. Ponti
20 / 42 / 0 · 17 Nov 2022

Token Turing Machines
Michael S. Ryoo, K. Gopalakrishnan, Kumara Kahatapitiya, Ted Xiao, Kanishka Rao, Austin Stone, Yao Lu, Julian Ibarz, Anurag Arnab
27 / 21 / 0 · 16 Nov 2022

You can't pick your neighbors, or can you? When and how to rely on retrieval in the kNN-LM
Andrew Drozdov, Shufan Wang, Razieh Rahimi, Andrew McCallum, Hamed Zamani, Mohit Iyyer
Tags: RALM
119 / 17 / 0 · 28 Oct 2022

Nearest Neighbor Language Models for Stylistic Controllable Generation
Severino Trotta, Lucie Flek, Charles F. Welch
16 / 4 / 0 · 27 Oct 2022

Tiny-Attention Adapter: Contexts Are More Important Than the Number of Parameters
Hongyu Zhao, Hao Tan, Hongyuan Mei
Tags: MoE
33 / 16 / 0 · 18 Oct 2022

OpenCQA: Open-ended Question Answering with Charts
Shankar Kantharaj, Do Xuan Long, Rixie Tiffany Ko Leong, J. Tan, Enamul Hoque, Shafiq R. Joty
29 / 47 / 0 · 12 Oct 2022

A Memory Transformer Network for Incremental Learning
Ahmet Iscen, Thomas Bird, Mathilde Caron, Alireza Fathi, Cordelia Schmid
Tags: CLL
121 / 14 / 0 · 10 Oct 2022

Memory in humans and deep language models: Linking hypotheses for model augmentation
Omri Raccah, Phoebe Chen, Ted Willke, David Poeppel, Vy A. Vo
Tags: RALM
23 / 1 / 0 · 04 Oct 2022

Stateful Memory-Augmented Transformers for Efficient Dialogue Modeling
Qingyang Wu, Zhou Yu
Tags: RALM
19 / 0 / 0 · 15 Sep 2022

Retrieval-based Controllable Molecule Generation
Zichao Wang, Weili Nie, Zhuoran Qiao, Chaowei Xiao, Richard Baraniuk, Anima Anandkumar
24 / 36 / 0 · 23 Aug 2022

Retrieval-Augmented Transformer for Image Captioning
Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
24 / 57 / 0 · 26 Jul 2022

Confident Adaptive Language Modeling
Tal Schuster, Adam Fisch, Jai Gupta, Mostafa Dehghani, Dara Bahri, Vinh Q. Tran, Yi Tay, Donald Metzler
43 / 160 / 0 · 14 Jul 2022

DocPrompting: Generating Code by Retrieving the Docs
Shuyan Zhou, Uri Alon, Frank F. Xu, Zhiruo Wang, Zhengbao Jiang, Graham Neubig
Tags: LLMAG
24 / 129 / 0 · 13 Jul 2022

Embedding Recycling for Language Models
Jon Saad-Falcon, Amanpreet Singh, Luca Soldaini, Mike D'Arcy, Arman Cohan, Doug Downey
Tags: KELM
18 / 4 / 0 · 11 Jul 2022

Long Range Language Modeling via Gated State Spaces
Harsh Mehta, Ankit Gupta, Ashok Cutkosky, Behnam Neyshabur
Tags: Mamba
37 / 231 / 0 · 27 Jun 2022

Repository-Level Prompt Generation for Large Language Models of Code
Disha Shrivastava, Hugo Larochelle, Daniel Tarlow
28 / 137 / 0 · 26 Jun 2022

Emergent Abilities of Large Language Models
Jason W. Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, ..., Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, J. Dean, W. Fedus
Tags: ELM, ReLM, LRM
60 / 2,344 / 0 · 15 Jun 2022

An Empirical Study of Retrieval-enhanced Graph Neural Networks
Dingmin Wang, Shengchao Liu, Hanchen Wang, Bernardo Cuenca Grau, Linfeng Song, Jian Tang, Song Le, Qi Liu
21 / 0 / 0 · 01 Jun 2022

NaturalProver: Grounded Mathematical Proof Generation with Language Models
Sean Welleck, Jiacheng Liu, Ximing Lu, Hannaneh Hajishirzi, Yejin Choi
Tags: AIMat, LRM
27 / 65 / 0 · 25 May 2022

Training Language Models with Memory Augmentation
Zexuan Zhong, Tao Lei, Danqi Chen
Tags: RALM
239 / 128 / 0 · 25 May 2022

Autoformalization with Large Language Models
Yuhuai Wu, Albert Q. Jiang, Wenda Li, M. Rabe, Charles Staats, M. Jamnik, Christian Szegedy
Tags: AI4CE
110 / 157 / 0 · 25 May 2022

Can deep learning match the efficiency of human visual long-term memory in storing object details?
Emin Orhan
Tags: VLM, OCL
25 / 0 / 0 · 27 Apr 2022

Semi-Parametric Neural Image Synthesis
A. Blattmann, Robin Rombach, Kaan Oktay, Jonas Müller, Björn Ommer
Tags: DiffM
33 / 28 / 0 · 25 Apr 2022

ChapterBreak: A Challenge Dataset for Long-Range Language Models
Simeng Sun, Katherine Thai, Mohit Iyyer
12 / 19 / 0 · 22 Apr 2022

KNN-Diffusion: Image Generation via Large-Scale Retrieval
Shelly Sheynin, Oron Ashual, Adam Polyak, Uriel Singer, Oran Gafni, Eliya Nachmani, Yaniv Taigman
Tags: VLM, SyDa, DiffM
21 / 113 / 0 · 06 Apr 2022

Fine-tuning Image Transformers using Learnable Memory
Mark Sandler, A. Zhmoginov, Max Vladymyrov, Andrew Jackson
Tags: ViT
29 / 47 / 0 · 29 Mar 2022

Block-Recurrent Transformers
DeLesley S. Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur
23 / 94 / 0 · 11 Mar 2022

General-purpose, long-context autoregressive modeling with Perceiver AR
Curtis Hawthorne, Andrew Jaegle, Cătălina Cangea, Sebastian Borgeaud, C. Nash, ..., Hannah R. Sheahan, Neil Zeghidour, Jean-Baptiste Alayrac, João Carreira, Jesse Engel
43 / 65 / 0 · 15 Feb 2022

H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences
Zhenhai Zhu, Radu Soricut
112 / 41 / 0 · 25 Jul 2021

Combiner: Full Attention Transformer with Sparse Computation Cost
Hongyu Ren, H. Dai, Zihang Dai, Mengjiao Yang, J. Leskovec, Dale Schuurmans, Bo Dai
78 / 77 / 0 · 12 Jul 2021

Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed
Tags: VLM
285 / 2,015 / 0 · 28 Jul 2020

Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier
Tags: MoE
246 / 580 / 0 · 12 Mar 2020