Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2312.12742
Cited By
Cached Transformers: Improving Transformers with Differentiable Memory Cache
20 December 2023
Zhaoyang Zhang
Wenqi Shao
Yixiao Ge
Xiaogang Wang
Liang Feng
Ping Luo
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Cached Transformers: Improving Transformers with Differentiable Memory Cache"
26 / 26 papers shown
Title
PVT v2: Improved Baselines with Pyramid Vision Transformer
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
AI4TS
122
1,676
0
25 Jun 2021
CvT: Introducing Convolutions to Vision Transformers
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
ViT
154
1,917
0
29 Mar 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
467
21,603
0
25 Mar 2021
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
Stéphane dÁscoli
Hugo Touvron
Matthew L. Leavitt
Ari S. Morcos
Giulio Biroli
Levent Sagun
ViT
136
834
0
19 Mar 2021
OmniNet: Omnidirectional Representations from Transformers
Yi Tay
Mostafa Dehghani
V. Aribandi
Jai Gupta
Philip Pham
Zhen Qin
Dara Bahri
Da-Cheng Juan
Donald Metzler
87
29
0
01 Mar 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
535
3,740
0
24 Feb 2021
Long Range Arena: A Benchmark for Efficient Transformers
Yi Tay
Mostafa Dehghani
Samira Abnar
Songlin Yang
Dara Bahri
Philip Pham
J. Rao
Liu Yang
Sebastian Ruder
Donald Metzler
157
730
0
08 Nov 2020
Rethinking Attention with Performers
K. Choromanski
Valerii Likhosherstov
David Dohan
Xingyou Song
Andreea Gane
...
Afroz Mohiuddin
Lukasz Kaiser
David Belanger
Lucy J. Colwell
Adrian Weller
186
1,602
0
30 Sep 2020
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
556
2,103
0
28 Jul 2020
Memory Transformer
Andrey Kravchenko
Yuri Kuratov
Anton Peganov
Grigory V. Sapunov
RALM
70
72
0
20 Jun 2020
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
Mathilde Caron
Ishan Misra
Julien Mairal
Priya Goyal
Piotr Bojanowski
Armand Joulin
OCL
SSL
264
4,101
0
17 Jun 2020
Linformer: Self-Attention with Linear Complexity
Sinong Wang
Belinda Z. Li
Madian Khabsa
Han Fang
Hao Ma
216
1,716
0
08 Jun 2020
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT
3DV
PINN
440
13,130
0
26 May 2020
Longformer: The Long-Document Transformer
Iz Beltagy
Matthew E. Peters
Arman Cohan
RALM
VLM
179
4,100
0
10 Apr 2020
Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy
M. Saffar
Ashish Vaswani
David Grangier
MoE
337
604
0
12 Mar 2020
Generalization through Memorization: Nearest Neighbor Language Models
Urvashi Khandelwal
Omer Levy
Dan Jurafsky
Luke Zettlemoyer
M. Lewis
RALM
174
842
0
01 Nov 2019
Learning Deep Transformer Models for Machine Translation
Qiang Wang
Bei Li
Tong Xiao
Jingbo Zhu
Changliang Li
Derek F. Wong
Lidia S. Chao
80
672
0
05 Jun 2019
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
Myle Ott
Sergey Edunov
Alexei Baevski
Angela Fan
Sam Gross
Nathan Ng
David Grangier
Michael Auli
VLM
FaML
117
3,156
0
01 Apr 2019
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
VLM
260
3,747
0
09 Jan 2019
ListOps: A Diagnostic Dataset for Latent Tree Learning
Nikita Nangia
Samuel R. Bowman
66
138
0
17 Apr 2018
Learning to Remember Translation History with a Continuous Cache
Zhaopeng Tu
Yang Liu
Shuming Shi
Tong Zhang
CLL
70
181
0
26 Nov 2017
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
Chen Sun
Abhinav Shrivastava
Saurabh Singh
Abhinav Gupta
VLM
207
2,407
0
10 Jul 2017
Mask R-CNN
Kaiming He
Georgia Gkioxari
Piotr Dollár
Ross B. Girshick
ObjD
369
27,253
0
20 Mar 2017
Improving Neural Language Models with a Continuous Cache
Edouard Grave
Armand Joulin
Nicolas Usunier
KELM
66
302
0
13 Dec 2016
On the Properties of Neural Machine Translation: Encoder-Decoder Approaches
Kyunghyun Cho
B. V. Merrienboer
Dzmitry Bahdanau
Yoshua Bengio
AI4CE
AIMat
259
6,791
0
03 Sep 2014
Microsoft COCO: Common Objects in Context
Nayeon Lee
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
434
43,832
0
01 May 2014
1