1908.11365
Cited By
Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention
29 August 2019
Biao Zhang
Ivan Titov
Rico Sennrich
Papers citing
"Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention"
27 / 27 papers shown
Title
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
Zhijian Zhuo
Yutao Zeng
Ya Wang
Sijun Zhang
Jian Yang
Xiaoqing Li
Xun Zhou
Jinwen Ma
51
0
0
06 Mar 2025
Merino: Entropy-driven Design for Generative Language Models on IoT Devices
Youpeng Zhao
Ming Lin
Huadong Tang
Qiang Wu
Jun Wang
83
0
0
28 Jan 2025
Advancing Neural Network Performance through Emergence-Promoting Initialization Scheme
Johnny Jingze Li
V. George
Gabriel A. Silva
ODL
44
0
0
26 Jul 2024
Resolving Discrepancies in Compute-Optimal Scaling of Language Models
Tomer Porian
Mitchell Wortsman
J. Jitsev
Ludwig Schmidt
Y. Carmon
60
21
0
27 Jun 2024
Delving into Differentially Private Transformer
Youlong Ding
Xueyang Wu
Yining Meng
Yonggang Luo
Hao Wang
Weike Pan
44
5
0
28 May 2024
Efficient CTC Regularization via Coarse Labels for End-to-End Speech Translation
Biao Zhang
Barry Haddow
Rico Sennrich
17
3
0
21 Feb 2023
Optimizing Deep Transformers for Chinese-Thai Low-Resource Translation
Wenjie Hao
Hongfei Xu
Lingling Mu
Hongying Zan
MoE
38
4
0
24 Dec 2022
CUNI Submission in WMT22 General Task
Josef Jon
Martin Popel
Ondrej Bojar
4
6
0
29 Nov 2022
Insights into Pre-training via Simpler Synthetic Tasks
Yuhuai Wu
Felix Li
Percy Liang
AIMat
28
20
0
21 Jun 2022
Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice
Peihao Wang
Wenqing Zheng
Tianlong Chen
Zhangyang Wang
ViT
33
127
0
09 Mar 2022
DeepNet: Scaling Transformers to 1,000 Layers
Hongyu Wang
Shuming Ma
Li Dong
Shaohan Huang
Dongdong Zhang
Furu Wei
MoE
AI4CE
28
157
0
01 Mar 2022
CUNI systems for WMT21: Multilingual Low-Resource Translation for Indo-European Languages Shared Task
Josef Jon
Michal Novák
João Paulo Aires
Dušan Variš
Ondrej Bojar
33
3
0
20 Sep 2021
The NiuTrans System for WNGT 2020 Efficiency Task
Chi Hu
Bei Li
Ye Lin
Yinqiao Li
Yanyang Li
Chenglong Wang
Tong Xiao
Jingbo Zhu
25
7
0
16 Sep 2021
The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers
Róbert Csordás
Kazuki Irie
Jürgen Schmidhuber
ViT
30
129
0
26 Aug 2021
Tiny Neural Models for Seq2Seq
A. Kandoor
34
0
0
07 Aug 2021
An Efficient Transformer Decoder with Compressed Sub-layers
Yanyang Li
Ye Lin
Tong Xiao
Jingbo Zhu
33
29
0
03 Jan 2021
Optimizing Deeper Transformers on Small Datasets
Peng Xu
Dhruv Kumar
Wei Yang
Wenjie Zi
Keyi Tang
Chenyang Huang
Jackie C.K. Cheung
S. Prince
Yanshuai Cao
AI4CE
24
69
0
30 Dec 2020
Learning Light-Weight Translation Models from Deep Transformer
Bei Li
Ziyang Wang
Hui Liu
Quan Du
Tong Xiao
Chunliang Zhang
Jingbo Zhu
VLM
120
40
0
27 Dec 2020
On the Sub-Layer Functionalities of Transformer Decoder
Yilin Yang
Longyue Wang
Shuming Shi
Prasad Tadepalli
Stefan Lee
Zhaopeng Tu
26
27
0
06 Oct 2020
AutoTrans: Automating Transformer Design via Reinforced Architecture Search
Wei-wei Zhu
Xiaoling Wang
Xipeng Qiu
Yuan Ni
Guotong Xie
30
18
0
04 Sep 2020
Rewiring the Transformer with Depth-Wise LSTMs
Hongfei Xu
Yang Song
Qiuhui Liu
Josef van Genabith
Deyi Xiong
47
6
0
13 Jul 2020
Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change
Hongfei Xu
Josef van Genabith
Deyi Xiong
Qiuhui Liu
14
10
0
05 May 2020
Multiscale Collaborative Deep Models for Neural Machine Translation
Xiangpeng Wei
Heng Yu
Yue Hu
Yue Zhang
Rongxiang Weng
Weihua Luo
27
28
0
29 Apr 2020
Improving Transformer Models by Reordering their Sublayers
Ofir Press
Noah A. Smith
Omer Levy
22
87
0
10 Nov 2019
Syntax-Infused Transformer and BERT models for Machine Translation and Natural Language Understanding
Dhanasekar Sundararaman
Vivek Subramanian
Guoyin Wang
Shijing Si
Dinghan Shen
Dong Wang
Lawrence Carin
19
40
0
10 Nov 2019
Lipschitz Constrained Parameter Initialization for Deep Transformers
Hongfei Xu
Qiuhui Liu
Josef van Genabith
Deyi Xiong
Jingyi Zhang
ODL
12
26
0
08 Nov 2019
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
718
6,748
0
26 Sep 2016