1906.01787
Cited By
Learning Deep Transformer Models for Machine Translation
5 June 2019
Qiang Wang
Bei Li
Tong Xiao
Jingbo Zhu
Changliang Li
Derek F. Wong
Lidia S. Chao
Papers citing
"Learning Deep Transformer Models for Machine Translation"
44 / 344 papers shown
Title
Deep Transformers with Latent Depth
Xian Li
Asa Cooper Stickland
Yuqing Tang
X. Kong
71
23
0
28 Sep 2020
Weight Distillation: Transferring the Knowledge in Neural Network Parameters
Ye Lin
Yanyang Li
Ziyang Wang
Bei Li
Quan Du
Tong Xiao
Jingbo Zhu
62
24
0
19 Sep 2020
Towards Fully 8-bit Integer Inference for the Transformer Model
Ye Lin
Yanyang Li
Tengbo Liu
Tong Xiao
Tongran Liu
Jingbo Zhu
MQ
78
63
0
17 Sep 2020
AutoTrans: Automating Transformer Design via Reinforced Architecture Search
Wei-wei Zhu
Xiaoling Wang
Xipeng Qiu
Yuan Ni
Guotong Xie
81
18
0
04 Sep 2020
A Survey of Deep Active Learning
Pengzhen Ren
Yun Xiao
Xiaojun Chang
Po-Yao (Bernie) Huang
Zhihui Li
Brij B. Gupta
Xiaojiang Chen
Xin Wang
138
1,160
0
30 Aug 2020
Very Deep Transformers for Neural Machine Translation
Xiaodong Liu
Kevin Duh
Liyuan Liu
Jianfeng Gao
87
104
0
18 Aug 2020
A Survey of Orthographic Information in Machine Translation
Bharathi Raja Chakravarthi
P. Rani
Mihael Arcan
John P. McCrae
55
34
0
04 Aug 2020
On Learning Universal Representations Across Languages
Xiangpeng Wei
Rongxiang Weng
Yue Hu
Luxi Xing
Heng Yu
Weihua Luo
SSL
VLM
99
87
0
31 Jul 2020
Neural Machine Translation model for University Email Application
Sandhya Aneja
Siti Nur Afikah Bte Abdul Mazid
Nagender Aneja
16
3
0
20 Jul 2020
Translate Reverberated Speech to Anechoic Ones: Speech Dereverberation with BERT
Yang Jiao
38
1
0
16 Jul 2020
Rewiring the Transformer with Depth-Wise LSTMs
Hongfei Xu
Yang Song
Qiuhui Liu
Josef van Genabith
Deyi Xiong
74
6
0
13 Jul 2020
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin
HyoukJoong Lee
Yuanzhong Xu
Dehao Chen
Orhan Firat
Yanping Huang
M. Krikun
Noam M. Shazeer
Zhiwen Chen
MoE
180
1,198
0
30 Jun 2020
Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation
Jungo Kasai
Nikolaos Pappas
Hao Peng
James Cross
Noah A. Smith
137
140
0
18 Jun 2020
The Lipschitz Constant of Self-Attention
Hyunjik Kim
George Papamakarios
A. Mnih
92
146
0
08 Jun 2020
Norm-Based Curriculum Learning for Neural Machine Translation
Xuebo Liu
Houtim Lai
Derek F. Wong
Lidia S. Chao
66
120
0
03 Jun 2020
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
Z. Yao
A. Gholami
Sheng Shen
Mustafa Mustafa
Kurt Keutzer
Michael W. Mahoney
ODL
152
287
0
01 Jun 2020
HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
Hanrui Wang
Zhanghao Wu
Zhijian Liu
Han Cai
Ligeng Zhu
Chuang Gan
Song Han
107
262
0
28 May 2020
Exploring Transformers for Large-Scale Speech Recognition
Liang Lu
Changliang Liu
Jinyu Li
Jiawei Liu
63
41
0
19 May 2020
Many-to-Many Voice Transformer Network
Hirokazu Kameoka
Wen-Chin Huang
Kou Tanaka
Takuhiro Kaneko
Nobukatsu Hojo
Tomoki Toda
ViT
83
30
0
18 May 2020
Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
...
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
231
3,177
0
16 May 2020
Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding
Fenglin Liu
Xuancheng Ren
Guangxiang Zhao
Chenyu You
Xuewei Ma
Xian Wu
Xu Sun
77
2
0
16 May 2020
Deep Ensembles on a Fixed Memory Budget: One Wide Network or Several Thinner Ones?
Nadezhda Chirkova
E. Lobacheva
Dmitry Vetrov
OOD
MoE
51
9
0
14 May 2020
Learning Architectures from an Extended Search Space for Language Modeling
Yinqiao Li
Chi Hu
Yuhao Zhang
Nuo Xu
Yufan Jiang
Tong Xiao
Jingbo Zhu
Tongran Liu
Changliang Li
71
10
0
06 May 2020
On the Inference Calibration of Neural Machine Translation
Shuo Wang
Zhaopeng Tu
Shuming Shi
Yang Liu
113
83
0
03 May 2020
A Transformer-based Approach for Source Code Summarization
Wasi Uddin Ahmad
Saikat Chakraborty
Baishakhi Ray
Kai-Wei Chang
ViT
110
392
0
01 May 2020
Multiscale Collaborative Deep Models for Neural Machine Translation
Xiangpeng Wei
Heng Yu
Yue Hu
Yue Zhang
Rongxiang Weng
Weihua Luo
121
29
0
29 Apr 2020
Understanding the Difficulty of Training Transformers
Liyuan Liu
Xiaodong Liu
Jianfeng Gao
Weizhu Chen
Jiawei Han
AI4CE
85
258
0
17 Apr 2020
Neural Machine Translation: Challenges, Progress and Future
Jiajun Zhang
Chengqing Zong
56
55
0
13 Apr 2020
Detecting and Understanding Generalization Barriers for Neural Machine Translation
Guanlin Li
Lemao Liu
Conghui Zhu
Tiejun Zhao
Shuming Shi
28
0
0
05 Apr 2020
PowerNorm: Rethinking Batch Normalization in Transformers
Sheng Shen
Z. Yao
A. Gholami
Michael W. Mahoney
Kurt Keutzer
BDL
114
16
0
17 Mar 2020
Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
Zhuohan Li
Eric Wallace
Sheng Shen
Kevin Lin
Kurt Keutzer
Dan Klein
Joseph E. Gonzalez
138
151
0
26 Feb 2020
A Survey of Deep Learning Techniques for Neural Machine Translation
Shu Yang
Yuxin Wang
Xiaowen Chu
VLM
AI4TS
AI4CE
106
140
0
18 Feb 2020
Neural Machine Translation with Joint Representation
Yanyang Li
Qiang Wang
Tong Xiao
Tongran Liu
Jingbo Zhu
28
9
0
16 Feb 2020
Transformers as Soft Reasoners over Language
Peter Clark
Oyvind Tafjord
Kyle Richardson
ReLM
OffRL
LRM
133
362
0
14 Feb 2020
On Layer Normalization in the Transformer Architecture
Ruibin Xiong
Yunchang Yang
Di He
Kai Zheng
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan
Liwei Wang
Tie-Yan Liu
AI4CE
160
1,002
0
12 Feb 2020
Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss
Qian Zhang
Han Lu
Hasim Sak
Anshuman Tripathi
Erik McDermott
Stephen Koo
Shankar Kumar
108
482
0
07 Feb 2020
Learning Accurate Integer Transformer Machine-Translation Models
Ephrem Wu
53
4
0
03 Jan 2020
Multi-Graph Transformer for Free-Hand Sketch Recognition
Peng Xu
Chaitanya K. Joshi
Xavier Bresson
ViT
115
87
0
24 Dec 2019
FlauBERT: Unsupervised Language Model Pre-training for French
Hang Le
Loïc Vial
Jibril Frej
Vincent Segonne
Maximin Coavoux
Benjamin Lecouteux
A. Allauzen
Benoît Crabbé
Laurent Besacier
D. Schwab
AI4CE
111
401
0
11 Dec 2019
Character-based NMT with Transformer
Rohit Gupta
Laurent Besacier
Marc Dymetman
Matthias Gallé
84
24
0
12 Nov 2019
Lipschitz Constrained Parameter Initialization for Deep Transformers
Hongfei Xu
Qiuhui Liu
Josef van Genabith
Deyi Xiong
Jingyi Zhang
ODL
98
26
0
08 Nov 2019
Transformers without Tears: Improving the Normalization of Self-Attention
Toan Q. Nguyen
Julian Salazar
96
231
0
14 Oct 2019
Reducing Transformer Depth on Demand with Structured Dropout
Angela Fan
Edouard Grave
Armand Joulin
130
597
0
25 Sep 2019
Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention
Biao Zhang
Ivan Titov
Rico Sennrich
62
103
0
29 Aug 2019