Faster Transformer Decoding: N-gram Masked Self-Attention
14 January 2020
Ciprian Chelba, Mengzhao Chen, Ankur Bapna, Noam M. Shazeer
arXiv: 2001.04589 (abs · PDF · HTML)
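
The page itself does not summarize the paper, but as a rough, hedged sketch of the idea the title names: decoder self-attention is masked so that each position attends only to the last N tokens (a banded causal mask), which keeps the per-step attention work and the key/value state bounded during incremental decoding. The sketch below is illustrative only; the function names, shapes, and NumPy setup are assumptions, not taken from the paper.

```python
import numpy as np

def ngram_causal_mask(seq_len: int, n: int) -> np.ndarray:
    """True where position i may attend to position j: the causal
    window j in [i - n + 1, i], i.e. an n-gram of preceding tokens."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - n)

def ngram_masked_self_attention(x, wq, wk, wv, n):
    """Single-head scaled dot-product self-attention with the banded
    causal (n-gram) mask. x: (seq_len, d_model); w*: (d_model, d_head)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])                   # (seq_len, seq_len)
    scores = np.where(ngram_causal_mask(x.shape[0], n), scores, -1e9)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v

# Toy usage: 8 tokens, d_model=16, d_head=8, 4-token attention window.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
wq, wk, wv = (rng.normal(size=(16, 8)) for _ in range(3))
print(ngram_masked_self_attention(x, wq, wk, wv, n=4).shape)  # (8, 8)
```

With this kind of mask, an incremental decoder only needs to retain the keys and values of the most recent N positions rather than the full prefix.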

Papers citing "Faster Transformer Decoding: N-gram Masked Self-Attention"

4 of 4 citing papers shown:

Fast Transformer Decoding: One Write-Head is All You Need
  Noam M. Shazeer
  161 · 477 · 0 · 06 Nov 2019

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling
  Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhiwen Chen, Mengzhao Chen, ..., William Chan, Shubham Toshniwal, Baohua Liao, M. Nirschl, Pat Rondon
  VLM · 102 · 211 · 0 · 21 Feb 2019

Accelerating Neural Transformer via an Average Attention Network
  Biao Zhang, Deyi Xiong, Jinsong Su
  106 · 120 · 0 · 02 May 2018

Image Transformer
  Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Noam M. Shazeer, Alexander Ku, Dustin Tran
  ViT · 154 · 1,689 · 0 · 15 Feb 2018