arXiv: 1907.05242
Large Memory Layers with Product Keys
10 July 2019
Guillaume Lample, Alexandre Sablayrolles, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou [MoE]
Papers citing "Large Memory Layers with Product Keys" (38 / 38 papers shown)
Large Memory Network for Recommendation (08 Feb 2025)
Hui Lu, Zheng Chai, Y. Zheng, Zhe Chen, Deping Xie, Peng Xu, Xun Zhou

Scaling Embedding Layers in Language Models (03 Feb 2025)
Da Yu, Edith Cohen, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Daogao Liu, Chiyuan Zhang

An Evolved Universal Transformer Memory (17 Oct 2024)
Edoardo Cetin, Qi Sun, Tianyu Zhao, Yujin Tang

Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer [MoE] (15 Oct 2023)
Boan Liu, Liang Ding, Li Shen, Keqin Peng, Yu Cao, Dazhao Cheng, Dacheng Tao

Transformer-VQ: Linear-Time Transformers via Vector Quantization (28 Sep 2023)
Albert Mohwald

Factorizers for Distributed Sparse Block Codes (24 Mar 2023)
Michael Hersche, Aleksandar Terzić, G. Karunaratne, Jovin Langenegger, Angeline Pouget, G. Cherubini, Luca Benini, Abu Sebastian, Abbas Rahimi

A Study on ReLU and Softmax in Transformer (13 Feb 2023)
Kai Shen, Junliang Guo, Xuejiao Tan, Siliang Tang, Rui Wang, Jiang Bian

SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient [MoE] (27 Jan 2023)
Max Ryabinin, Tim Dettmers, Michael Diskin, Alexander Borzunov

Recurrent Memory Transformer [CLL] (14 Jul 2022)
Aydar Bulatov, Yuri Kuratov, Andrey Kravchenko

Sparse Mixers: Combining MoE and Mixing to build a more efficient BERT [MoE] (24 May 2022)
James Lee-Thorp, Joshua Ainslie

Finding patterns in Knowledge Attribution for Transformers [KELM] (03 May 2022)
Jeevesh Juneja, Ritu Agarwal

NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video: Dataset, Methods and Results [SupR] (20 Apr 2022)
Ren Yang, Radu Timofte, Mei Zheng, Qunliang Xing, Minglang Qiao, ..., Yulin Huang, Junying Chen, I. Lee, Sunder Ali Khowaja, Jiseok Yoon

Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading (04 Apr 2022)
Minsu Kim, Jeong Hun Yeo, Yong Man Ro

Linearizing Transformer with Key-Value Memory (23 Mar 2022)
Yizhe Zhang, Deng Cai

Memorizing Transformers [RALM] (16 Mar 2022)
Yuhuai Wu, M. Rabe, DeLesley S. Hutchins, Christian Szegedy

Pruning Self-attentions into Convolutional Layers in Single Path [ViT] (23 Nov 2021)
Haoyu He, Jianfei Cai, Jing Liu, Zizheng Pan, Jing Zhang, Dacheng Tao, Bohan Zhuang

Class Token and Knowledge Distillation for Multi-head Self-Attention Speaker Verification Systems (06 Nov 2021)
Victoria Mingote, A. Miguel, A. O. Giménez, Eduardo Lleida Solano

The Efficiency Misnomer (25 Oct 2021)
Daoyuan Chen, Liuyi Yao, Dawei Gao, Ashish Vaswani, Yaliang Li

ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information [SSeg] (30 Jun 2021)
Zijun Sun, Xiaoya Li, Xiaofei Sun, Yuxian Meng, Xiang Ao, Qing He, Fei Wu, Jiwei Li

Pre-Trained Models: Past, Present and Future [AIFin, MQ, AI4MH] (14 Jun 2021)
Xu Han, Zhengyan Zhang, Ning Ding, Yuxian Gu, Xiao Liu, ..., Jie Tang, Ji-Rong Wen, Jinhui Yuan, Wayne Xin Zhao, Jun Zhu

A Survey of Transformers [ViT] (08 Jun 2021)
Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu

NWT: Towards natural audio-to-video generation with representation learning [VGen] (08 Jun 2021)
Rayhane Mama, Marc S. Tyndel, Hashiam Kadhim, Cole Clifford, Ragavan Thurairatnam

Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation (04 Apr 2021)
Emilio Parisotto, Ruslan Salakhutdinov

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity [MoE] (11 Jan 2021)
W. Fedus, Barret Zoph, Noam M. Shazeer

Generating Radiology Reports via Memory-driven Transformer [MedIm] (30 Oct 2020)
Zhihong Chen, Yan Song, Tsung-Hui Chang, Xiang Wan

Zero-shot Entity Linking with Efficient Long Range Sequence Modeling [VLM] (12 Oct 2020)
Zonghai Yao, Liangliang Cao, Huapu Pan

SMYRF: Efficient Attention using Asymmetric Clustering (11 Oct 2020)
Giannis Daras, Nikita Kitaev, Augustus Odena, A. Dimakis

Learning Knowledge Bases with Parameters for Task-Oriented Dialogue Systems (28 Sep 2020)
Andrea Madotto, Samuel Cahyawijaya, Genta Indra Winata, Yan Xu, Zihan Liu, Zhaojiang Lin, Pascale Fung

Efficient Transformers: A Survey [VLM] (14 Sep 2020)
Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler

SpotFast Networks with Memory Augmented Lateral Transformers for Lipreading (21 May 2020)
Peratham Wiriyathammabhum

Vector Quantized Contrastive Predictive Coding for Template-based Music Generation (21 Apr 2020)
Gaëtan Hadjeres, Léopold Crestel

PIC: Permutation Invariant Convolution for Recognizing Long-range Activities [VLM] (18 Mar 2020)
Noureldien Hussein, E. Gavves, A. Smeulders

Memory-Based Graph Networks [GNN] (21 Feb 2020)
Amir Hosein Khas Ahmadi, Kaveh Hassani, Parsa Moradi, Leo Lee, Q. Morris

REALM: Retrieval-Augmented Language Model Pre-Training [RALM] (10 Feb 2020)
Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Ming-Wei Chang

Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts [FedML] (10 Feb 2020)
Max Ryabinin, Anton I. Gusev

Compressive Transformers for Long-Range Sequence Modelling [RALM, VLM, KELM] (13 Nov 2019)
Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy Lillicrap

CTRL: A Conditional Transformer Language Model for Controllable Generation [AI4CE] (11 Sep 2019)
N. Keskar, Bryan McCann, L. Varshney, Caiming Xiong, R. Socher

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding [ELM] (20 Apr 2018)
Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman