arXiv:1804.00247
Training Tips for the Transformer Model

1 April 2018
Martin Popel
Ondrej Bojar

Papers citing "Training Tips for the Transformer Model"

39 / 139 papers shown
ESPnet-ST: All-in-One Speech Translation Toolkit
Hirofumi Inaguma
Shun Kiyono
Kevin Duh
Shigeki Karita
Nelson Yalta
Tomoki Hayashi
Shinji Watanabe
118
166
0
21 Apr 2020
Understanding the Difficulty of Training Transformers
Liyuan Liu
Xiaodong Liu
Jianfeng Gao
Weizhu Chen
Jiawei Han
AI4CE
85
258
0
17 Apr 2020
Attend and Decode: 4D fMRI Task State Decoding Using Attention Models
Sam Nguyen
Brenda Ng
Alan Kaplan
Priyadip Ray
63
25
0
10 Apr 2020
On Optimal Transformer Depth for Low-Resource Language Translation
Elan Van Biljon
Arnu Pretorius
Julia Kreutzer
MoE
62
27
0
09 Apr 2020
Better Sign Language Translation with STMC-Transformer
Kayo Yin
Jesse Read
SLR
67
24
0
01 Apr 2020
Disentangling Adaptive Gradient Methods from Learning Rates
Naman Agarwal
Rohan Anil
Elad Hazan
Tomer Koren
Cyril Zhang
109
38
0
26 Feb 2020
On Layer Normalization in the Transformer Architecture
Ruibin Xiong
Yunchang Yang
Di He
Kai Zheng
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan
Liwei Wang
Tie-Yan Liu
AI4CE
160
1,002
0
12 Feb 2020
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu
Wangchunshu Zhou
Tao Ge
Furu Wei
Ming Zhou
348
201
0
07 Feb 2020
Exploring Benefits of Transfer Learning in Neural Machine Translation
Tom Kocmi
60
17
0
06 Jan 2020
Learning Accurate Integer Transformer Machine-Translation Models
Ephrem Wu
53
4
0
03 Jan 2020
Neural Machine Translation: A Review and Survey
Felix Stahlberg
3DV, AI4TS, MedIm
131
332
0
04 Dec 2019
Long-span language modeling for speech recognition
S. Parthasarathy
W. Gale
Xie Chen
George Polovets
Shuangyu Chang
RALM
34
10
0
11 Nov 2019
A Bilingual Generative Transformer for Semantic Sentence Embedding
John Wieting
Graham Neubig
Taylor Berg-Kirkpatrick
78
29
0
10 Nov 2019
Data Diversification: A Simple Strategy For Neural Machine Translation
Xuan-Phi Nguyen
Shafiq Joty
Wu Kui
Ai Ti Aw
117
15
0
05 Nov 2019
Promoting the Knowledge of Source Syntax in Transformer NMT Is Not Needed
Thuong-Hai Pham
Dominik Machácek
Ondrej Bojar
53
11
0
24 Oct 2019
Transformers without Tears: Improving the Normalization of Self-Attention
Toan Q. Nguyen
Julian Salazar
96
231
0
14 Oct 2019
Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation
Kenton W. Murray
Jeffery Kinnison
Toan Q. Nguyen
Walter J. Scheirer
David Chiang
63
21
0
01 Oct 2019
Hotel2vec: Learning Attribute-Aware Hotel Embeddings with Self-Supervision
A. Sadeghian
Shervin Minaee
Ioannis Partalas
Xinxin Li
D. Wang
Brooke Cowan
DML, SSL, 3DV
49
8
0
30 Sep 2019
In-training Matrix Factorization for Parameter-frugal Neural Machine Translation
Zachary Kaden
Teven Le Scao
R. Olivier
43
1
0
27 Sep 2019
Efficiency Metrics for Data-Driven Models: A Text Summarization Case Study
Erion Çano
Ondrej Bojar
41
11
0
14 Sep 2019
A Comparative Study on Transformer vs RNN in Speech Applications
Shigeki Karita
Nanxin Chen
Tomoki Hayashi
Takaaki Hori
Hirofumi Inaguma
...
Ryuichi Yamamoto
Xiao-fei Wang
Shinji Watanabe
Takenori Yoshimura
Wangyou Zhang
94
722
0
13 Sep 2019
CUNI System for the Building Educational Applications 2019 Shared Task: Grammatical Error Correction
Jakub Náplava
Milan Straka
43
7
0
12 Sep 2019
Context-Aware Monolingual Repair for Neural Machine Translation
Elena Voita
Rico Sennrich
Ivan Titov
79
98
0
03 Sep 2019
On the Variance of the Adaptive Learning Rate and Beyond
Liyuan Liu
Haoming Jiang
Pengcheng He
Weizhu Chen
Xiaodong Liu
Jianfeng Gao
Jiawei Han
ODL
352
1,916
0
08 Aug 2019
Predicting Actions to Help Predict Translations
Zixiu "Alex" Wu
Julia Ive
Josiah Wang
Pranava Madhyastha
Lucia Specia
67
7
0
05 Aug 2019
Retrosynthesis with Attention-Based NMT Model and Chemical Analysis of the "Wrong" Predictions
H. Duan
Ling Wang
Chengyun Zhang
Jianjun Li
59
29
0
02 Aug 2019
CUNI System for the WMT19 Robustness Task
Jindřich Helcl
Jindrich Libovický
Martin Popel
56
10
0
21 Jun 2019
One Epoch Is All You Need
Aran Komatsuzaki
78
51
0
16 Jun 2019
Making Asynchronous Stochastic Gradient Descent Work for Transformers
Alham Fikri Aji
Kenneth Heafield
68
13
0
08 Jun 2019
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Elena Voita
David Talbot
F. Moiseev
Rico Sennrich
Ivan Titov
157
1,152
0
23 May 2019
Densifying Assumed-sparse Tensors: Improving Memory Efficiency and MPI Collective Performance during Tensor Accumulation for Parallelized Training of Neural Machine Translation Models
D. Çavdar
V. Codreanu
C. Karakuş
John A. Lockman
Damian Podareanu
...
Quy Ta
S. Varadharajan
Lucas A. Wilson
Rengan Xu
Pei Yang
28
3
0
10 May 2019
Competence-based Curriculum Learning for Neural Machine Translation
Emmanouil Antonios Platanios
Otilia Stretcu
Graham Neubig
Barnabás Póczós
Tom Michael Mitchell
93
344
0
23 Mar 2019
CVIT-MT Systems for WAT-2018
Jerin Philip
Vinay P. Namboodiri
C. V. Jawahar
27
10
0
19 Mar 2019
DiscoFuse: A Large-Scale Dataset for Discourse-Based Sentence Fusion
Mor Geva
Eric Malmi
Idan Szpektor
Jonathan Berant
110
52
0
27 Feb 2019
Learning cross-lingual phonological and orthagraphic adaptations: a case study in improving neural machine translation between low-resource languages
Saurav Jha
A. Sudhakar
Anil Kumar Singh
43
4
0
21 Nov 2018
Machine Translation between Vietnamese and English: an Empirical Study
Hong-Hai Phan-Vu
Viet-Trung Tran
V. Nguyen
Hoang-Vu Dang
Phan-Thuan Do
45
17
0
30 Oct 2018
Trivial Transfer Learning for Low-Resource Neural Machine Translation
Tom Kocmi
Ondrej Bojar
101
173
0
02 Sep 2018
An Operation Sequence Model for Explainable Neural Machine Translation
Felix Stahlberg
Danielle Saunders
Bill Byrne
LRM, MILM
77
29
0
29 Aug 2018
The University of Cambridge's Machine Translation Systems for WMT18
Felix Stahlberg
Adria de Gispert
Bill Byrne
56
20
0
28 Aug 2018