1910.05895
Transformers without Tears: Improving the Normalization of Self-Attention
14 October 2019
Toan Q. Nguyen
Julian Salazar
Papers citing
"Transformers without Tears: Improving the Normalization of Self-Attention"
49 / 149 papers shown
A Survey of Transformers
Tianyang Lin
Yuxin Wang
Xiangyang Liu
Xipeng Qiu
ViT
53
1,089
0
08 Jun 2021
Choose a Transformer: Fourier or Galerkin
Shuhao Cao
42
227
0
31 May 2021
Fast Nearest Neighbor Machine Translation
Yuxian Meng
Xiaoya Li
Xiayu Zheng
Fei Wu
Xiaofei Sun
Tianwei Zhang
Jiwei Li
LRM
19
49
0
30 May 2021
Rethinking Skip Connection with Layer Normalization in Transformers and ResNets
Fenglin Liu
Xuancheng Ren
Zhiyuan Zhang
Xu Sun
Yuexian Zou
AI4CE
34
67
0
15 May 2021
BERT Busters: Outlier Dimensions that Disrupt Transformers
Olga Kovaleva
Saurabh Kulshreshtha
Anna Rogers
Anna Rumshisky
24
85
0
14 May 2021
Global Structure-Aware Drum Transcription Based on Self-Attention Mechanisms
Ryoto Ishizuka
Ryo Nishikimi
Kazuyoshi Yoshii
32
6
0
12 May 2021
Data Augmentation by Concatenation for Low-Resource Translation: A Mystery and a Solution
Toan Q. Nguyen
Kenton W. Murray
David Chiang
16
15
0
04 May 2021
Lessons on Parameter Sharing across Layers in Transformers
Sho Takase
Shun Kiyono
25
85
0
13 Apr 2021
Joint Universal Syntactic and Semantic Parsing
Elias Stengel-Eskin
Kenton W. Murray
Sheng Zhang
Aaron Steven White
Benjamin Van Durme
38
9
0
12 Apr 2021
Non-autoregressive Transformer-based End-to-end ASR using BERT
Fu-Hao Yu
Kuan-Yu Chen
27
23
0
10 Apr 2021
Keyword Transformer: A Self-Attention Model for Keyword Spotting
Axel Berg
Mark O'Connor
M. T. Cruz
27
133
0
01 Apr 2021
Pretraining the Noisy Channel Model for Task-Oriented Dialogue
Qi Liu
Lei Yu
Laura Rimell
Phil Blunsom
47
26
0
18 Mar 2021
Visual Cues and Error Correction for Translation Robustness
Zhenhao Li
Marek Rei
Lucia Specia
20
3
0
12 Mar 2021
Remote Sensing Image Change Detection with Transformers
Hao Chen
Zipeng Qi
Zhenwei Shi
ViT
50
946
0
27 Feb 2021
TransMask: A Compact and Fast Speech Separation Model Based on Transformer
Zining Zhang
Bingsheng He
Zhenjie Zhang
36
21
0
19 Feb 2021
Fast End-to-End Speech Recognition via Non-Autoregressive Models and Cross-Modal Knowledge Transferring from BERT
Ye Bai
Jiangyan Yi
J. Tao
Zhengkun Tian
Zhengqi Wen
Shuai Zhang
RALM
33
51
0
15 Feb 2021
Optimizing Deeper Transformers on Small Datasets
Peng Xu
Dhruv Kumar
Wei Yang
Wenjie Zi
Keyi Tang
Chenyang Huang
Jackie C.K. Cheung
S. Prince
Yanshuai Cao
AI4CE
24
69
0
30 Dec 2020
Spatial Temporal Transformer Network for Skeleton-based Action Recognition
Chiara Plizzari
Marco Cannici
Matteo Matteucci
ViT
22
195
0
11 Dec 2020
ÚFAL at MRP 2020: Permutation-invariant Semantic Parsing in PERIN
David Samuel
Milan Straka
LRM
27
31
0
02 Nov 2020
Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation
Hang Le
J. Pino
Changhan Wang
Jiatao Gu
D. Schwab
Laurent Besacier
39
82
0
02 Nov 2020
Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
Minjia Zhang
Yuxiong He
AI4CE
13
100
0
26 Oct 2020
Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment
Ethan A. Chi
Julian Salazar
Katrin Kirchhoff
AI4TS
25
51
0
24 Oct 2020
Beyond English-Centric Multilingual Machine Translation
Angela Fan
Shruti Bhosale
Holger Schwenk
Zhiyi Ma
Ahmed El-Kishky
...
Vitaliy Liptchinsky
Sergey Edunov
Edouard Grave
Michael Auli
Armand Joulin
LRM
41
832
0
21 Oct 2020
Unsupervised Bitext Mining and Translation via Self-trained Contextual Embeddings
Phillip Keung
Julian Salazar
Y. Lu
Noah A. Smith
SSL
27
25
0
15 Oct 2020
Query-Key Normalization for Transformers
Alex Henry
Prudhvi Raj Dachapally
S. Pawar
Yuxuan Chen
17
77
0
08 Oct 2020
Normalization Techniques in Training DNNs: Methodology, Analysis and Application
Lei Huang
Jie Qin
Yi Zhou
Fan Zhu
Li Liu
Ling Shao
AI4CE
12
255
0
27 Sep 2020
Review: Deep Learning in Electron Microscopy
Jeffrey M. Ede
38
79
0
17 Sep 2020
Very Deep Transformers for Neural Machine Translation
Xiaodong Liu
Kevin Duh
Liyuan Liu
Jianfeng Gao
19
102
0
18 Aug 2020
Skeleton-based Action Recognition via Spatial and Temporal Transformer Networks
Chiara Plizzari
Marco Cannici
Matteo Matteucci
ViT
MedIm
25
300
0
17 Aug 2020
Towards Understanding Label Smoothing
Yi Tian Xu
Yuanhong Xu
Qi Qian
Hao Li
Rong Jin
UQCV
21
40
0
20 Jun 2020
Normalized Attention Without Probability Cage
Oliver Richter
Roger Wattenhofer
14
21
0
19 May 2020
Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
...
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
104
3,044
0
16 May 2020
Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition
Ye Bai
Jiangyan Yi
J. Tao
Zhengkun Tian
Zhengqi Wen
Shuai Zhang
RALM
28
41
0
11 May 2020
Language Model Prior for Low-Resource Neural Machine Translation
Christos Baziotis
Barry Haddow
Alexandra Birch
18
53
0
30 Apr 2020
Understanding the Difficulty of Training Transformers
Liyuan Liu
Xiaodong Liu
Jianfeng Gao
Weizhu Chen
Jiawei Han
AI4CE
14
247
0
17 Apr 2020
Balancing Training for Multilingual Neural Machine Translation
Xinyi Wang
Yulia Tsvetkov
Graham Neubig
16
99
0
14 Apr 2020
On Optimal Transformer Depth for Low-Resource Language Translation
Elan Van Biljon
Arnu Pretorius
Julia Kreutzer
MoE
24
27
0
09 Apr 2020
PowerNorm: Rethinking Batch Normalization in Transformers
Sheng Shen
Z. Yao
A. Gholami
Michael W. Mahoney
Kurt Keutzer
BDL
24
16
0
17 Mar 2020
ReZero is All You Need: Fast Convergence at Large Depth
Thomas C. Bachlechner
Bodhisattwa Prasad Majumder
H. H. Mao
G. Cottrell
Julian McAuley
AI4CE
30
276
0
10 Mar 2020
On Layer Normalization in the Transformer Architecture
Ruibin Xiong
Yunchang Yang
Di He
Kai Zheng
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan
Liwei Wang
Tie-Yan Liu
AI4CE
35
949
0
12 Feb 2020
Normalization of Input-output Shared Embeddings in Text Generation Models
Jinyang Liu
Yujia Zhai
Zizhong Chen
28
0
0
22 Jan 2020
FlauBERT: Unsupervised Language Model Pre-training for French
Hang Le
Loïc Vial
Jibril Frej
Vincent Segonne
Maximin Coavoux
Benjamin Lecouteux
A. Allauzen
Benoît Crabbé
Laurent Besacier
D. Schwab
AI4CE
49
395
0
11 Dec 2019
A Resource for Computational Experiments on Mapudungun
M. Duan
Carlos Fasola
Sai Krishna Rallabandi
R. Vega
Antonios Anastasopoulos
Lori S. Levin
A. Black
12
8
0
04 Dec 2019
Improving Transformer Models by Reordering their Sublayers
Ofir Press
Noah A. Smith
Omer Levy
22
87
0
10 Nov 2019
Masked Language Model Scoring
Julian Salazar
Davis Liang
Toan Q. Nguyen
Katrin Kirchhoff
19
13
0
31 Oct 2019
Stabilizing Transformers for Reinforcement Learning
Emilio Parisotto
H. F. Song
Jack W. Rae
Razvan Pascanu
Çağlar Gülçehre
...
Aidan Clark
Seb Noury
M. Botvinick
N. Heess
R. Hadsell
OffRL
22
360
0
13 Oct 2019
On the adequacy of untuned warmup for adaptive optimization
Jerry Ma
Denis Yarats
59
70
0
09 Oct 2019
Set Functions for Time Series
Max Horn
Michael Moor
Christian Bock
Bastian Alexander Rieck
Karsten M. Borgwardt
AI4TS
38
146
0
26 Sep 2019
Effective Approaches to Attention-based Neural Machine Translation
Thang Luong
Hieu H. Pham
Christopher D. Manning
218
7,929
0
17 Aug 2015