Hard-Coded Gaussian Attention for Neural Machine Translation
Weiqiu You, Simeng Sun, Mohit Iyyer
arXiv 2005.00742 · 2 May 2020
Links: ArXiv · PDF · HTML

Papers citing "Hard-Coded Gaussian Attention for Neural Machine Translation" (19 of 19 papers shown)

How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers
Michael Hassid, Hao Peng, Daniel Rotem, Jungo Kasai, Ivan Montero, Noah A. Smith, Roy Schwartz
07 Nov 2022 · Citations: 24

EtriCA: Event-Triggered Context-Aware Story Generation Augmented by Cross Attention
Chen Tang, Chenghua Lin, Hen-Hsen Huang, Frank Guerin, Zhihao Zhang
22 Oct 2022 · Citations: 20

WavSpA: Wavelet Space Attention for Boosting Transformers' Long Sequence Learning Ability
Yufan Zhuang, Zihan Wang, Fangbo Tao, Jingbo Shang
05 Oct 2022 · Topics: ViT, AI4TS · Citations: 3

Fast-FNet: Accelerating Transformer Encoder Models via Efficient Fourier Layers
Nurullah Sevim, Ege Ozan Özyedek, Furkan Şahinuç, Aykut Koç
26 Sep 2022 · Citations: 11

Unveiling Transformers with LEGO: a synthetic reasoning task
Yi Zhang, A. Backurs, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Tal Wagner
09 Jun 2022 · Topics: LRM · Citations: 85

Sparse Mixers: Combining MoE and Mixing to build a more efficient BERT
James Lee-Thorp, Joshua Ainslie
24 May 2022 · Topics: MoE · Citations: 11

A Call for Clarity in Beam Search: How It Works and When It Stops
Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir R. Radev, Yejin Choi, Noah A. Smith
11 Apr 2022 · Citations: 6

DCT-Former: Efficient Self-Attention with Discrete Cosine Transform
Carmelo Scribano, Giorgia Franchini, M. Prato, Marko Bertogna
02 Mar 2022 · Citations: 21

ETSformer: Exponential Smoothing Transformers for Time-series Forecasting
Gerald Woo, Chenghao Liu, Doyen Sahoo, Akshat Kumar, Guosheng Lin
03 Feb 2022 · Topics: AI4TS · Citations: 162

Sparse Distillation: Speeding Up Text Classification by Using Bigger Student Models
Qinyuan Ye, Madian Khabsa, M. Lewis, Sinong Wang, Xiang Ren, Aaron Jaech
16 Oct 2021 · Citations: 5

A Survey of Transformers
Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu
08 Jun 2021 · Topics: ViT · Citations: 1,088

FNet: Mixing Tokens with Fourier Transforms
James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon
09 May 2021 · Citations: 518

Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation
Mozhdeh Gheini, Xiang Ren, Jonathan May
18 Apr 2021 · Topics: LRM · Citations: 105

Finetuning Pretrained Transformers into RNNs
Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith
24 Mar 2021 · Citations: 63

Snowflake: Scaling GNNs to High-Dimensional Continuous Control via Parameter Freezing
Charlie Blake, Vitaly Kurin, Maximilian Igl, Shimon Whiteson
01 Mar 2021 · Topics: AI4CE · Citations: 13

Position Information in Transformers: An Overview
Philipp Dufter, Martin Schmitt, Hinrich Schütze
22 Feb 2021 · Citations: 139

Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation
Alessandro Raganato, Yves Scherrer, Jörg Tiedemann
24 Feb 2020 · Citations: 92

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu, M. Schuster, Z. Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean
26 Sep 2016 · Topics: AIMat · Citations: 6,748

Effective Approaches to Attention-based Neural Machine Translation
Thang Luong, Hieu H. Pham, Christopher D. Manning
17 Aug 2015 · Citations: 7,929