arXiv:1804.00247
Training Tips for the Transformer Model
1 April 2018
Martin Popel
Ondrej Bojar
Papers citing "Training Tips for the Transformer Model"
39 / 139 papers shown
Title
ESPnet-ST: All-in-One Speech Translation Toolkit
Hirofumi Inaguma
Shun Kiyono
Kevin Duh
Shigeki Karita
Nelson Yalta
Tomoki Hayashi
Shinji Watanabe
118
166
0
21 Apr 2020
Understanding the Difficulty of Training Transformers
Liyuan Liu
Xiaodong Liu
Jianfeng Gao
Weizhu Chen
Jiawei Han
AI4CE
85
258
0
17 Apr 2020
Attend and Decode: 4D fMRI Task State Decoding Using Attention Models
Sam Nguyen
Brenda Ng
Alan Kaplan
Priyadip Ray
63
25
0
10 Apr 2020
On Optimal Transformer Depth for Low-Resource Language Translation
Elan Van Biljon
Arnu Pretorius
Julia Kreutzer
MoE
62
27
0
09 Apr 2020
Better Sign Language Translation with STMC-Transformer
Kayo Yin
Jesse Read
SLR
67
24
0
01 Apr 2020
Disentangling Adaptive Gradient Methods from Learning Rates
Naman Agarwal
Rohan Anil
Elad Hazan
Tomer Koren
Cyril Zhang
109
38
0
26 Feb 2020
On Layer Normalization in the Transformer Architecture
Ruibin Xiong
Yunchang Yang
Di He
Kai Zheng
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan
Liwei Wang
Tie-Yan Liu
AI4CE
160
1,002
0
12 Feb 2020
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu
Wangchunshu Zhou
Tao Ge
Furu Wei
Ming Zhou
348
201
0
07 Feb 2020
Exploring Benefits of Transfer Learning in Neural Machine Translation
Tom Kocmi
60
17
0
06 Jan 2020
Learning Accurate Integer Transformer Machine-Translation Models
Ephrem Wu
53
4
0
03 Jan 2020
Neural Machine Translation: A Review and Survey
Felix Stahlberg
3DV
AI4TS
MedIm
131
332
0
04 Dec 2019
Long-span language modeling for speech recognition
S. Parthasarathy
W. Gale
Xie Chen
George Polovets
Shuangyu Chang
RALM
34
10
0
11 Nov 2019
A Bilingual Generative Transformer for Semantic Sentence Embedding
John Wieting
Graham Neubig
Taylor Berg-Kirkpatrick
78
29
0
10 Nov 2019
Data Diversification: A Simple Strategy For Neural Machine Translation
Xuan-Phi Nguyen
Shafiq Joty
Wu Kui
Ai Ti Aw
117
15
0
05 Nov 2019
Promoting the Knowledge of Source Syntax in Transformer NMT Is Not Needed
Thuong-Hai Pham
Dominik Machácek
Ondrej Bojar
53
11
0
24 Oct 2019
Transformers without Tears: Improving the Normalization of Self-Attention
Toan Q. Nguyen
Julian Salazar
96
231
0
14 Oct 2019
Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation
Kenton W. Murray
Jeffery Kinnison
Toan Q. Nguyen
Walter J. Scheirer
David Chiang
63
21
0
01 Oct 2019
Hotel2vec: Learning Attribute-Aware Hotel Embeddings with Self-Supervision
A. Sadeghian
Shervin Minaee
Ioannis Partalas
Xinxin Li
D. Wang
Brooke Cowan
DML
SSL
3DV
49
8
0
30 Sep 2019
In-training Matrix Factorization for Parameter-frugal Neural Machine Translation
Zachary Kaden
Teven Le Scao
R. Olivier
43
1
0
27 Sep 2019
Efficiency Metrics for Data-Driven Models: A Text Summarization Case Study
Erion Çano
Ondrej Bojar
41
11
0
14 Sep 2019
A Comparative Study on Transformer vs RNN in Speech Applications
Shigeki Karita
Nanxin Chen
Tomoki Hayashi
Takaaki Hori
Hirofumi Inaguma
...
Ryuichi Yamamoto
Xiao-fei Wang
Shinji Watanabe
Takenori Yoshimura
Wangyou Zhang
94
722
0
13 Sep 2019
CUNI System for the Building Educational Applications 2019 Shared Task: Grammatical Error Correction
Jakub Náplava
Milan Straka
43
7
0
12 Sep 2019
Context-Aware Monolingual Repair for Neural Machine Translation
Elena Voita
Rico Sennrich
Ivan Titov
79
98
0
03 Sep 2019
On the Variance of the Adaptive Learning Rate and Beyond
Liyuan Liu
Haoming Jiang
Pengcheng He
Weizhu Chen
Xiaodong Liu
Jianfeng Gao
Jiawei Han
ODL
352
1,916
0
08 Aug 2019
Predicting Actions to Help Predict Translations
Zixiu "Alex" Wu
Julia Ive
Josiah Wang
Pranava Madhyastha
Lucia Specia
67
7
0
05 Aug 2019
Retrosynthesis with Attention-Based NMT Model and Chemical Analysis of the "Wrong" Predictions
H. Duan
Ling Wang
Chengyun Zhang
Jianjun Li
59
29
0
02 Aug 2019
CUNI System for the WMT19 Robustness Task
Jindřich Helcl
Jindrich Libovický
Martin Popel
56
10
0
21 Jun 2019
One Epoch Is All You Need
Aran Komatsuzaki
78
51
0
16 Jun 2019
Making Asynchronous Stochastic Gradient Descent Work for Transformers
Alham Fikri Aji
Kenneth Heafield
68
13
0
08 Jun 2019
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Elena Voita
David Talbot
F. Moiseev
Rico Sennrich
Ivan Titov
157
1,152
0
23 May 2019
Densifying Assumed-sparse Tensors: Improving Memory Efficiency and MPI Collective Performance during Tensor Accumulation for Parallelized Training of Neural Machine Translation Models
D. Çavdar
V. Codreanu
C. Karakuş
John A. Lockman
Damian Podareanu
...
Quy Ta
S. Varadharajan
Lucas A. Wilson
Rengan Xu
Pei Yang
28
3
0
10 May 2019
Competence-based Curriculum Learning for Neural Machine Translation
Emmanouil Antonios Platanios
Otilia Stretcu
Graham Neubig
Barnabás Póczós
Tom Michael Mitchell
93
344
0
23 Mar 2019
CVIT-MT Systems for WAT-2018
Jerin Philip
Vinay P. Namboodiri
C. V. Jawahar
27
10
0
19 Mar 2019
DiscoFuse: A Large-Scale Dataset for Discourse-Based Sentence Fusion
Mor Geva
Eric Malmi
Idan Szpektor
Jonathan Berant
110
52
0
27 Feb 2019
Learning cross-lingual phonological and orthagraphic adaptations: a case study in improving neural machine translation between low-resource languages
Saurav Jha
A. Sudhakar
Anil Kumar Singh
43
4
0
21 Nov 2018
Machine Translation between Vietnamese and English: an Empirical Study
Hong-Hai Phan-Vu
Viet-Trung Tran
V. Nguyen
Hoang-Vu Dang
Phan-Thuan Do
45
17
0
30 Oct 2018
Trivial Transfer Learning for Low-Resource Neural Machine Translation
Tom Kocmi
Ondrej Bojar
101
173
0
02 Sep 2018
An Operation Sequence Model for Explainable Neural Machine Translation
Felix Stahlberg
Danielle Saunders
Bill Byrne
LRM
MILM
77
29
0
29 Aug 2018
The University of Cambridge's Machine Translation Systems for WMT18
Felix Stahlberg
Adria de Gispert
Bill Byrne
56
20
0
28 Aug 2018