Training Tips for the Transformer Model

1 April 2018
Martin Popel
Ondrej Bojar

Papers citing "Training Tips for the Transformer Model"

50 / 139 papers shown
Training Transformers Together
Alexander Borzunov
Max Ryabinin
Tim Dettmers
Quentin Lhoest
Lucile Saulnier
Michael Diskin
Yacine Jernite
Thomas Wolf
ViT
63
10
0
07 Jul 2022
AdaVAE: Exploring Adaptive GPT-2s in Variational Auto-Encoders for Language Modeling
Haoqin Tu
Zhongliang Yang
Jinshuai Yang
Yong Huang
42
12
0
12 May 2022
Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation
Idris Abdulmumin
S. Dash
Musa Abdullahi Dawud
Shantipriya Parida
Shamsuddeen Hassan Muhammad
Ibrahim Said Ahmad
Subhadarshi Panda
Ondrej Bojar
B. Galadanci
Bello Shehu Bello
59
18
0
02 May 2022
Gradient Descent, Stochastic Optimization, and Other Tales
Jun Lu
58
8
0
02 May 2022
Transformers in Time-series Analysis: A Tutorial
Sabeen Ahmed
Ian E. Nielsen
Aakash Tripathi
Shamoon Siddiqui
Ghulam Rasool
R. Ramachandran
AI4TS
88
163
0
28 Apr 2022
Distributionally Robust Models with Parametric Likelihood Ratios
Paul Michel
Tatsunori Hashimoto
Graham Neubig
OOD
87
18
0
13 Apr 2022
Small Batch Sizes Improve Training of Low-Resource Neural MT
Àlex R. Atrio
Andrei Popescu-Belis
64
6
0
20 Mar 2022
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Greg Yang
J. E. Hu
Igor Babuschkin
Szymon Sidor
Xiaodong Liu
David Farhi
Nick Ryder
J. Pachocki
Weizhu Chen
Jianfeng Gao
139
168
0
07 Mar 2022
Speech Emotion Recognition using Self-Supervised Features
E. Morais
R. Hoory
Weizhong Zhu
Itai Gat
Matheus Damasceno
Hagai Aronowitz
SSL MDE
61
118
0
07 Feb 2022
Compositionality as Lexical Symmetry
Ekin Akyürek
Jacob Andreas
CoGe
122
8
0
30 Jan 2022
Persformer: A Transformer Architecture for Topological Machine Learning
Raphael Reinauer
Matteo Caorsi
Nicolas Berkouk
85
15
0
30 Dec 2021
Contextual Semantic Parsing for Multilingual Task-Oriented Dialogues
M. Moradshahi
Victoria Tsai
Giovanni Campagna
M. Lam
81
16
0
04 Nov 2021
Why don't people use character-level machine translation?
Jindrich Libovický
Helmut Schmid
Alexander Fraser
135
29
0
15 Oct 2021
Text Simplification for Comprehension-based Question-Answering
Tanvi Dadu
Kartikey Pant
Seema Nagar
F. Barbhuiya
Kuntal Dey
55
4
0
28 Sep 2021
Can the Transformer Be Used as a Drop-in Replacement for RNNs in Text-Generating GANs?
Kevin Blin
Andrei Kucharavy
122
2
0
26 Aug 2021
Improving Distinction between ASR Errors and Speech Disfluencies with Feature Space Interpolation
Seongmin Park
D. Shin
Sangyoun Paik
Subong Choi
Alena Kazakova
Jihwa Lee
58
1
0
04 Aug 2021
GTNet: Guided Transformer Network for Detecting Human-Object Interactions
A S M Iftekhar
Satish Kumar
R. McEver
Suya You
B. S. Manjunath
ViT
165
13
0
02 Aug 2021
Rethinking Adam: A Twofold Exponential Moving Average Approach
Yizhou Wang
Yue Kang
Can Qin
Huan Wang
Yi Xu
Yulun Zhang
Y. Fu
ODL
70
7
0
22 Jun 2021
Distributed Deep Learning in Open Collaborations
Michael Diskin
Alexey Bukhtiyarov
Max Ryabinin
Lucile Saulnier
Quentin Lhoest
...
Denis Mazur
Ilia Kobelev
Yacine Jernite
Thomas Wolf
Gennady Pekhimenko
FedML
129
59
0
18 Jun 2021
CoMAE: A Multi-factor Hierarchical Framework for Empathetic Response Generation
Chujie Zheng
Yong Liu
Wei Chen
Yongcai Leng
Minlie Huang
98
76
0
18 May 2021
On the Distributional Properties of Adaptive Gradients
Z. Zhiyi
Liu Ziyin
48
4
0
15 May 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
363
2,546
0
20 Apr 2021
Three-level Hierarchical Transformer Networks for Long-sequence and Multiple Clinical Documents Classification
Yuqi Si
Kirk Roberts
84
9
0
17 Apr 2021
Domain Adaptation and Multi-Domain Adaptation for Neural Machine Translation: A Survey
Danielle Saunders
AI4CE
130
91
0
14 Apr 2021
Transformers: "The End of History" for NLP?
Anton Chernyavskiy
Dmitry Ilvovsky
Preslav Nakov
117
30
0
09 Apr 2021
Extended Parallel Corpus for Amharic-English Machine Translation
A. Gezmu
A. Nürnberger
T. Bati
77
17
0
08 Apr 2021
Towards Automated Psychotherapy via Language Modeling
Houjun Liu
AI4MH
94
3
0
05 Apr 2021
Divide and Rule: Effective Pre-Training for Context-Aware Multi-Encoder Translation Models
Lorenzo Lupo
Marco Dinarelli
Laurent Besacier
59
16
0
31 Mar 2021
Prototypical Representation Learning for Relation Extraction
Ning Ding
Xiaobin Wang
Yao Fu
Guangwei Xu
Rui Wang
Pengjun Xie
Ying Shen
Fei Huang
Haitao Zheng
Rui Zhang
58
60
0
22 Mar 2021
Enhancing the Transformer Decoder with Transition-based Syntax
Leshem Choshen
Omri Abend
82
1
0
29 Jan 2021
Optimizing Deeper Transformers on Small Datasets
Peng Xu
Dhruv Kumar
Wei Yang
Wenjie Zi
Keyi Tang
Chenyang Huang
Jackie C.K. Cheung
S. Prince
Yanshuai Cao
AI4CE
113
69
0
30 Dec 2020
Dynamic Curriculum Learning for Low-Resource Neural Machine Translation
Chen Xu
Bojie Hu
Yufan Jiang
Kai Feng
Zeyang Wang
Shen Huang
Qi Ju
Tong Xiao
Jingbo Zhu
101
22
0
30 Nov 2020
Siamese Tracking with Lingual Object Constraints
Maximilian Filtenborg
E. Gavves
D. K. Gupta
41
3
0
23 Nov 2020
Character-level Representations Improve DRS-based Semantic Parsing Even in the Age of BERT
Rik van Noord
Antonio Toral
Johan Bos
64
4
0
09 Nov 2020
Exploiting Neural Query Translation into Cross Lingual Information Retrieval
Liang Yao
Baosong Yang
Haibo Zhang
Weihua Luo
Boxing Chen
33
12
0
26 Oct 2020
Constraint Translation Candidates: A Bridge between Neural Query Translation and Cross-lingual Information Retrieval
Tianchi Bi
Liang Yao
Baosong Yang
Haibo Zhang
Weihua Luo
Boxing Chen
399
15
0
26 Oct 2020
Addressing Exposure Bias With Document Minimum Risk Training: Cambridge at the WMT20 Biomedical Translation Task
Danielle Saunders
Bill Byrne
44
10
0
11 Oct 2020
Localizing Open-Ontology QA Semantic Parsers in a Day Using Machine Translation
M. Moradshahi
Giovanni Campagna
Sina J. Semnani
Silei Xu
M. Lam
26
1
0
10 Oct 2020
Self-Paced Learning for Neural Machine Translation
Boyi Deng
Baosong Yang
Derek F. Wong
Yikai Zhou
Lidia S. Chao
Haibo Zhang
Boxing Chen
137
49
0
09 Oct 2020
Guiding Attention for Self-Supervised Learning with Transformers
Ameet Deshpande
Karthik Narasimhan
69
21
0
06 Oct 2020
Recent Trends in the Use of Deep Learning Models for Grammar Error Handling
Mina Naghshnejad
Tarun Joshi
V. Nair
VLM
43
6
0
04 Sep 2020
On the Importance of Local Information in Transformer Based Models
Madhura Pande
Aakriti Budhraja
Preksha Nema
Pratyush Kumar
Mitesh M. Khapra
42
2
0
13 Aug 2020
Revisiting Low Resource Status of Indian Languages in Machine Translation
Jerin Philip
Shashank Siripragada
Vinay P. Namboodiri
C. V. Jawahar
87
28
0
11 Aug 2020
Announcing CzEng 2.0 Parallel Corpus with over 2 Gigawords
Tom Kocmi
Martin Popel
Ondrej Bojar
47
38
0
06 Jul 2020
Learn Faster and Forget Slower via Fast and Stable Task Adaptation
Farshid Varno
Lucas May Petry
Lisa Di-Jorio
Stan Matwin
CLL
60
2
0
02 Jul 2020
ELITR Non-Native Speech Translation at IWSLT 2020
Dominik Macháček
Jonáš Kratochvíl
Sangeet Sagar
Matúš Žilinec
Ondrej Bojar
T. Nguyen
Felix Schneider
P. Williams
Yuekun Yao
55
11
0
05 Jun 2020
Character-level Transformer-based Neural Machine Translation
Nikolay Banar
Walter Daelemans
M. Kestemont
47
21
0
22 May 2020
Applying the Transformer to Character-level Transduction
Shijie Wu
Ryan Cotterell
Mans Hulden
AI4CE
68
107
0
20 May 2020
Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change
Hongfei Xu
Josef van Genabith
Deyi Xiong
Qiuhui Liu
40
11
0
05 May 2020
A Comprehensive Survey of Grammar Error Correction
Yu Wang
Yuelin Wang
Jie Liu
Zhuowei Liu
118
34
0
02 May 2020