arXiv: 2001.04589
Faster Transformer Decoding: N-gram Masked Self-Attention
14 January 2020
Ciprian Chelba, Mengzhao Chen, Ankur Bapna, Noam M. Shazeer
Links: ArXiv (abs) · PDF · HTML
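The title refers to restricting decoder self-attention so that each position attends only to itself and a fixed window of the n-1 most recent tokens, rather than the full causal prefix. A minimal single-head NumPy sketch of such a banded causal mask is below; the function names, the unbatched `[seq_len, d]` shapes, and the mask construction are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def ngram_causal_mask(seq_len: int, n: int) -> np.ndarray:
    """Boolean mask: position i may attend only to positions
    max(0, i - n + 1) .. i (itself plus the n-1 previous tokens)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - n)

def ngram_masked_self_attention(q, k, v, n):
    """Scaled dot-product self-attention under an n-gram causal mask.
    q, k, v: [seq_len, d] arrays (single head, batch dimension omitted)."""
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    mask = ngram_causal_mask(seq_len, n)
    scores = np.where(mask, scores, -1e9)  # block positions outside the window
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because the window is fixed, the per-step key/value state during incremental decoding stays bounded at n-1 entries instead of growing with the sequence, which is the source of the decoding speedup the title alludes to.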
Papers citing "Faster Transformer Decoding: N-gram Masked Self-Attention" (4 of 4 papers shown)
Fast Transformer Decoding: One Write-Head is All You Need
Noam M. Shazeer · 06 Nov 2019 · 161 / 477 / 0

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling
Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhiwen Chen, Mengzhao Chen, ..., William Chan, Shubham Toshniwal, Baohua Liao, M. Nirschl, Pat Rondon
Tag: VLM · 21 Feb 2019 · 102 / 211 / 0

Accelerating Neural Transformer via an Average Attention Network
Biao Zhang, Deyi Xiong, Jinsong Su · 02 May 2018 · 106 / 120 / 0

Image Transformer
Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Noam M. Shazeer, Alexander Ku, Dustin Tran
Tag: ViT · 15 Feb 2018 · 154 / 1,689 / 0