Fast Transformer Decoding: One Write-Head is All You Need
Noam M. Shazeer. 6 November 2019. arXiv:1911.02150.
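
The paper introduces multi-query attention: the model keeps multiple query heads but shares a single key head and a single value head across all of them, which shrinks the key/value tensors an incremental decoder must reload at every step. Below is a minimal NumPy sketch of that idea; the function name, weight names, and shapes are assumptions for illustration, not the paper's reference implementation.

    import numpy as np

    def softmax(x, axis=-1):
        # Numerically stable softmax.
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def multi_query_attention(x, Wq, Wk, Wv, Wo, n_heads):
        # Assumed shapes for this sketch:
        #   x:      (seq, d_model)
        #   Wq:     (d_model, n_heads * d_head)   one projection per query head
        #   Wk, Wv: (d_model, d_head)             a single shared key/value head
        #   Wo:     (n_heads * d_head, d_model)
        seq, _ = x.shape
        d_head = Wk.shape[1]
        q = (x @ Wq).reshape(seq, n_heads, d_head)  # per-head queries
        k = x @ Wk                                  # (seq, d_head), shared by all heads
        v = x @ Wv                                  # (seq, d_head), shared by all heads
        # Every head attends over the same keys: scores are (n_heads, seq, seq).
        scores = np.einsum('qhd,kd->hqk', q, k) / np.sqrt(d_head)
        attn = softmax(scores, axis=-1)
        out = np.einsum('hqk,kd->qhd', attn, v).reshape(seq, n_heads * d_head)
        return out @ Wo

For brevity the sketch omits the causal mask and the incremental key/value cache that the paper's decoding setting uses; the only change from standard multi-head attention is that Wk and Wv project to one head instead of n_heads.
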
Papers citing "Fast Transformer Decoding: One Write-Head is All You Need" (9 of 109 papers shown):
1. Improved Transformer for High-Resolution GANs. Long Zhao, Zizhao Zhang, Ting Chen, Dimitris N. Metaxas, Han Zhang. [ViT] 14 Jun 2021.
2. A Survey of Transformers. Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu. [ViT] 08 Jun 2021.
3. FNet: Mixing Tokens with Fourier Transforms. James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon. 09 May 2021.
4. Generating Images with Sparse Representations. C. Nash, Jacob Menick, Sander Dieleman, Peter W. Battaglia. 05 Mar 2021.
5. LambdaNetworks: Modeling Long-Range Interactions Without Attention. Irwan Bello. 17 Feb 2021.
6. Time-based Sequence Model for Personalization and Recommendation Systems. T. Ishkhanov, Maxim Naumov, Xianjie Chen, Yan Zhu, Yuan Zhong, A. Azzolini, Chonglin Sun, Frank Jiang, Andrey Malevich, Liang Xiong. 27 Aug 2020.
7. Data Movement Is All You Need: A Case Study on Optimizing Transformers. A. Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, Torsten Hoefler. 30 Jun 2020.
8. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding. Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, M. Krikun, Noam M. Shazeer, Zhehuai Chen. [MoE] 30 Jun 2020.
9. Single Headed Attention RNN: Stop Thinking With Your Head. Stephen Merity. 26 Nov 2019.