1804.00247
Cited By
v1
v2 (latest)
Training Tips for the Transformer Model
1 April 2018
Martin Popel
Ondrej Bojar
Papers citing "Training Tips for the Transformer Model"
50 / 139 papers shown
Challenging Gradient Boosted Decision Trees with Tabular Transformers for Fraud Detection at Booking.com
Sergei Krutikov
Bulat Khaertdinov
Rodion Kiriukhin
Shubham Agrawal
Mozhdeh Ariannezhad
Kees Jan de Vries
LMTD
94
0
0
01 Jul 2025
The Warmup Dilemma: How Learning Rate Strategies Impact Speech-to-Text Model Convergence
Marco Gaido
Sara Papi
L. Bentivogli
Alessio Brutti
Mauro Cettolo
R. Gretter
M. Matassoni
Mohamed Nabih
Matteo Negri
47
0
0
29 May 2025
A Physics-Inspired Optimizer: Velocity Regularized Adam
Pranav Vaidhyanathan
Lucas Schorling
Natalia Ares
Michael A. Osborne
ODL
64
0
0
19 May 2025
Text Compression for Efficient Language Generation
David Gu
Peter Belcak
Roger Wattenhofer
106
0
0
14 Mar 2025
A Comprehensive LLM-powered Framework for Driving Intelligence Evaluation
Shanhe You
Xuewen Luo
Xinhe Liang
Jiashu Yu
Chen Zheng
Jiangtao Gong
124
0
0
07 Mar 2025
The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training
Jinbo Wang
Mingze Wang
Zhanpeng Zhou
Junchi Yan
Weinan E
Lei Wu
152
2
0
26 Feb 2025
A Unified Hyperparameter Optimization Pipeline for Transformer-Based Time Series Forecasting Models
Jingjing Xu
Caesar Wu
Yuan-Fang Li
Grégoire Danoy
Pascal Bouvry
TPM
AI4TS
128
0
0
03 Jan 2025
DBF-Net: A Dual-Branch Network with Feature Fusion for Ultrasound Image Segmentation
Guoping Xu
Ximing Wu
Wentao Liao
Xinglong Wu
Qing Huang
Chang Li
81
0
0
17 Nov 2024
SambaMixer: State of Health Prediction of Li-ion Batteries using Mamba State Space Models
José Ignacio Olalde-Verano
Sascha Kirch
Clara Pérez-Molina
Sergio Martin
Mamba
60
0
0
31 Oct 2024
Frequency matters: Modeling irregular morphological patterns in Spanish with Transformers
Akhilesh Kakolu Ramarao
Kevin Tang
Dinah Baer-Henney
111
0
0
28 Oct 2024
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec
Felix Dangel
Sidak Pal Singh
141
7
0
14 Oct 2024
EPi-cKANs: Elasto-Plasticity Informed Kolmogorov-Arnold Networks Using Chebyshev Polynomials
Farinaz Mostajeran
Salah A Faroughi
106
7
0
12 Oct 2024
Computational design of target-specific linear peptide binders with TransformerBeta
Haowen Zhao
Francesco A. Aprile
Barbara Bravi
77
0
0
07 Oct 2024
The Impact of LoRA Adapters for LLMs on Clinical NLP Classification Under Data Limitations
Thanh-Dung Le
T. Nguyen
Vu Nguyen Ha
Symeon Chatzinotas
P. Jouvet
R. Noumeir
96
0
0
27 Jul 2024
Describe Where You Are: Improving Noise-Robustness for Speech Emotion Recognition with Text Description of the Environment
Seong-Gyun Leem
Daniel Fulford
J. Onnela
David Gard
Carlos Busso
71
1
0
25 Jul 2024
A Case Study on Context-Aware Neural Machine Translation with Multi-Task Learning
Ramakrishna Appicharla
Baban Gain
Santanu Pal
Asif Ekbal
Pushpak Bhattacharyya
56
2
0
03 Jul 2024
When Will Gradient Regularization Be Harmful?
Yang Zhao
Hao Zhang
Xiuyuan Hu
AI4CE
65
1
0
14 Jun 2024
Transformer models classify random numbers
Rishabh Goel
YiZi Xiao
Ramin Ramezani
123
0
0
06 May 2024
Simple and Scalable Strategies to Continually Pre-train Large Language Models
Adam Ibrahim
Benjamin Thérien
Kshitij Gupta
Mats L. Richter
Quentin Anthony
Timothée Lesort
Eugene Belilovsky
Irina Rish
KELM
CLL
109
63
0
13 Mar 2024
Search Intenion Network for Personalized Query Auto-Completion in E-Commerce
Wei Bao
Mi Zhang
Tao Zhang
Chengfu Huo
57
1
0
05 Mar 2024
Heterogeneous Encoders Scaling In The Transformer For Neural Machine Translation
J. Hu
Roberto Cavicchioli
Giulia Berardinelli
Alessandro Capotondi
75
2
0
26 Dec 2023
Efficient Pre-training for Localized Instruction Generation of Videos
Anil Batra
Davide Moltisanti
Laura Sevilla-Lara
Marcus Rohrbach
Frank Keller
90
0
0
27 Nov 2023
Scaling Studies for Efficient Parameter Search and Parallelism for Large Language Model Pre-training
Michael Benington
Leo Phan
Chris Pierre Paul
Evan Shoemaker
Priyanka Ranade
Torstein Collett
Grant Hodgson Perez
Christopher Krieger
28
1
0
09 Oct 2023
Investigating Efficient Deep Learning Architectures For Side-Channel Attacks on AES
Yohai-Eliel Berreby
L. Sauvage
AAML
40
2
0
22 Sep 2023
SkinDistilViT: Lightweight Vision Transformer for Skin Lesion Classification
Vlad-Constantin Lungu-Stan
Dumitru-Clementin Cercel
Florin-Catalin Pop
MedIm
33
12
0
16 Aug 2023
A Case Study on Context Encoding in Multi-Encoder based Document-Level Neural Machine Translation
Ramakrishna Appicharla
Baban Gain
Santanu Pal
Asif Ekbal
64
1
0
11 Aug 2023
ApproBiVT: Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff Guided Early Stopping and Checkpoint Averaging
Fangyuan Wang
Ming Hao
Yuhai Shi
Bo Xu
MoMe
59
0
0
05 Aug 2023
Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers
Yineng Chen
Z. Li
Lefei Zhang
Bo Du
Hai Zhao
70
4
0
02 Jul 2023
Efficient and Interpretable Compressive Text Summarisation with Unsupervised Dual-Agent Reinforcement Learning
Peggy Tang
Junbin Gao
Lei Zhang
Zhiyong Wang
55
2
0
06 Jun 2023
HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language
Shantipriya Parida
Idris Abdulmumin
Shamsuddeen Hassan Muhammad
Aneesh Bose
Guneet Singh Kohli
Ibrahim Said Ahmad
Ketan Kotwal
S. Sarkar
Ondrej Bojar
Habeebah Adamu Kakudi
92
7
0
28 May 2023
DPFormer: Learning Differentially Private Transformer on Long-Tailed Data
Youlong Ding
Xueyang Wu
Hongya Wang
Weike Pan
100
1
0
28 May 2023
Free Lunch: Robust Cross-Lingual Transfer via Model Checkpoint Averaging
Fabian David Schmidt
Ivan Vulić
Goran Glavaš
75
9
0
26 May 2023
Spatiotemporal Transformer for Stock Movement Prediction
Daniel Boyle
Jugal Kalita
AI4TS
42
2
0
05 May 2023
eWaSR -- an embedded-compute-ready maritime obstacle detection network
Matija Tersek
Lojze Žust
Matej Kristan
71
10
0
21 Apr 2023
Stochastic Parrots Looking for Stochastic Parrots: LLMs are Easy to Fine-Tune and Hard to Detect with other LLMs
Da Silva Gameiro Henrique
Andrei Kucharavy
R. Guerraoui
DeLMO
76
8
0
18 Apr 2023
A Neural Network Transformer Model for Composite Microstructure Homogenization
Emil Pitz
K. Pochiraju
AI4CE
70
10
0
16 Apr 2023
Training Strategies for Vision Transformers for Object Detection
Apoorv Singh
73
4
0
05 Apr 2023
Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset
Thanh-Dung Le
P. Jouvet
R. Noumeir
MoE
MedIm
128
5
0
22 Mar 2023
NL2CMD: An Updated Workflow for Natural Language to Bash Commands Translation
Quchen Fu
Zhongwei Teng
Marco Georgaklis
Jules White
C. Schmidt
40
7
0
15 Feb 2023
Encoding Sentence Position in Context-Aware Neural Machine Translation with Concatenation
Lorenzo Lupo
Marco Dinarelli
Laurent Besacier
147
9
0
13 Feb 2023
Curriculum-Guided Abstractive Summarization
Sajad Sotudeh
Hanieh Deilamsalehy
Franck Dernoncourt
Nazli Goharian
84
2
0
02 Feb 2023
Curriculum-guided Abstractive Summarization for Mental Health Online Posts
Sajad Sotudeh
Nazli Goharian
Hanieh Deilamsalehy
Franck Dernoncourt
AI4MH
15
5
0
02 Feb 2023
Tackling Low-Resourced Sign Language Translation: UPC at WMT-SLT 22
Laia Tarrés
Gerard I. Gállego
Xavier Giró-i-Nieto
Jordi Torres
SLR
81
5
0
02 Dec 2022
FusionFormer: Fusing Operations in Transformer for Efficient Streaming Speech Recognition
Xingcheng Song
Di Wu
Binbin Zhang
Zhiyong Wu
Wenpeng Li
...
Peng Zhang
Zhendong Peng
Fuping Pan
Changbao Zhu
Zhongqin Wu
51
2
0
31 Oct 2022
Focused Concatenation for Context-Aware Neural Machine Translation
Lorenzo Lupo
Marco Dinarelli
Laurent Besacier
57
8
0
24 Oct 2022
Revisiting Checkpoint Averaging for Neural Machine Translation
Yingbo Gao
Christian Herold
Zijian Yang
Hermann Ney
MoMe
139
12
0
21 Oct 2022
A Deep Investigation of RNN and Self-attention for the Cyrillic-Traditional Mongolian Bidirectional Conversion
Muhan Na
Rui Liu
Feilong
Guanglai Gao
40
0
0
24 Sep 2022
Relaxed Attention for Transformer Models
Timo Lohrenz
Björn Möller
Zhengyang Li
Tim Fingscheidt
KELM
53
12
0
20 Sep 2022
Multilingual Transformer Language Model for Speech Recognition in Low-resource Languages
Li Miao
Jian Wu
Piyush Behre
Shuangyu Chang
S. Parthasarathy
38
2
0
08 Sep 2022
Adaptive Gradient Methods at the Edge of Stability
Jeremy M. Cohen
Behrooz Ghorbani
Shankar Krishnan
Naman Agarwal
Sourabh Medapati
...
Daniel Suo
David E. Cardoze
Zachary Nado
George E. Dahl
Justin Gilmer
ODL
103
54
0
29 Jul 2022