1906.01787
Cited By
Learning Deep Transformer Models for Machine Translation
5 June 2019
Qiang Wang
Bei Li
Tong Xiao
Jingbo Zhu
Changliang Li
Derek F. Wong
Lidia S. Chao
Papers citing
"Learning Deep Transformer Models for Machine Translation"
50 / 344 papers shown
Title
Vision Backbone Enhancement via Multi-Stage Cross-Scale Attention
Liang Shang
Yanli Liu
Zhengyang Lou
Shuxue Quan
N. Adluru
Bochen Guan
W. Sethares
99
2
0
10 Aug 2023
The Prospect of Enhancing Large-Scale Heterogeneous Federated Learning with Transformers
Yulan Gao
Zhaoxiang Hou
Che-Sheng Yang
Zengxiang Li
Han Yu
FedML
76
3
0
07 Aug 2023
RecycleGPT: An Autoregressive Language Model with Recyclable Module
Yu Jiang
Qiaozhi He
Xiaomin Zhuang
Zhihua Wu
Kunpeng Wang
Wenlai Zhao
Guangwen Yang
KELM
76
3
0
07 Aug 2023
EEG-based Cognitive Load Classification using Feature Masked Autoencoding and Emotion Transfer Learning
Dustin Pulver
Prithila Angkan
Paul Hungler
Ali Etemad
84
5
0
01 Aug 2023
Layer-wise Representation Fusion for Compositional Generalization
Yafang Zheng
Lei Lin
Shantao Liu
Binling Wang
Zhaohong Lai
Wenhao Rao
Biao Fu
Yidong Chen
Xiaodong Shi
AI4CE
111
2
0
20 Jul 2023
3D Medical Image Segmentation based on multi-scale MPU-Net
Zeqiu Yu
Shuo Han
Ziheng Song
3DV
43
3
0
11 Jul 2023
NAR-Former V2: Rethinking Transformer for Universal Neural Network Representation Learning
Yun Yi
Haokui Zhang
Rong Xiao
Nan Wang
Xiaoyu Wang
GNN
69
3
0
19 Jun 2023
Understanding Parameter Sharing in Transformers
Ye Lin
Mingxuan Wang
Zhexi Zhang
Xiaohui Wang
Tong Xiao
Jingbo Zhu
MoE
77
2
0
15 Jun 2023
Warpformer: A Multi-scale Modeling Approach for Irregular Clinical Time Series
Jiawen Zhang
Shun Zheng
Wei Cao
Jiang Bian
Jia Li
AI4TS
62
30
0
14 Jun 2023
Neural Machine Translation for the Indigenous Languages of the Americas: An Introduction
Manuel Mager
Rajat Bhatnagar
Graham Neubig
Ngoc Thang Vu
Katharina Kann
93
10
0
11 Jun 2023
Policy-Based Self-Competition for Planning Problems
Jonathan Pirnay
Q. Göttl
Jakob Burger
D. G. Grimm
89
3
0
07 Jun 2023
MobileNMT: Enabling Translation in 15MB and 30ms
Ye Lin
Xiaohui Wang
Zhexi Zhang
Mingxuan Wang
Tong Xiao
Jingbo Zhu
MQ
63
2
0
07 Jun 2023
Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks
Xian Li
Nian Shao
Xiaofei Li
ViT
CLIP
103
28
0
07 Jun 2023
ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
Xiao Xu
Bei Li
Chenfei Wu
Shao-Yen Tseng
Anahita Bhiwandiwalla
Shachar Rosenman
Vasudev Lal
Wanxiang Che
Nan Duan
AIFin
VLM
70
4
0
31 May 2023
Bridging the Granularity Gap for Acoustic Modeling
Chen Xu
Yuhao Zhang
Chengbo Jiao
Xiaoqian Liu
Chi Hu
Xin Zeng
Tong Xiao
Anxiang Ma
Huizhen Wang
Jingbo Zhu
61
6
0
27 May 2023
Revisiting Non-Autoregressive Translation at Scale
Zhihao Wang
Longyue Wang
Jinsong Su
Junfeng Yao
Zhaopeng Tu
74
3
0
25 May 2023
Multi-scale Efficient Graph-Transformer for Whole Slide Image Classification
Saisai Ding
Juncheng Li
Jun Wang
Shihui Ying
Jun Shi
ViT
MedIm
65
10
0
25 May 2023
TriMLP: Revenge of a MLP-like Architecture in Sequential Recommendation
Yiheng Jiang
Yuanbo Xu
Yongjian Yang
Funing Yang
Pengyang Wang
Hui Xiong
92
2
0
24 May 2023
Learning to Compose Representations of Different Encoder Layers towards Improving Compositional Generalization
Lei Lin
Shuangtao Li
Yafang Zheng
Biao Fu
Shantao Liu
Yidong Chen
Xiaodong Shi
CoGe
86
3
0
20 May 2023
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks
Yifan Peng
Kwangyoun Kim
Felix Wu
Brian Yan
Siddhant Arora
William Chen
Jiyang Tang
Suwon Shon
Prashant Sridhar
Shinji Watanabe
97
18
0
18 May 2023
EENED: End-to-End Neural Epilepsy Detection based on Convolutional Transformer
Chenyu Liu
Xin-qiu Zhou
Yang Liu
ViT
MedIm
94
1
0
17 May 2023
Do LLMs Understand User Preferences? Evaluating LLMs On User Rating Prediction
Wang-Cheng Kang
Jianmo Ni
Nikhil Mehta
M. Sathiamoorthy
Lichan Hong
Ed H. Chi
D. Cheng
66
123
0
10 May 2023
Multi-Path Transformer is Better: A Case Study on Neural Machine Translation
Ye Lin
Shuhan Zhou
Yanyang Li
Anxiang Ma
Tong Xiao
Jingbo Zhu
69
0
0
10 May 2023
BranchNorm: Robustly Scaling Extremely Deep Transformers
Yanjun Liu
Xianfeng Zeng
Fandong Meng
Jie Zhou
77
3
0
04 May 2023
Quantifying the Dissimilarity of Texts
Benjamin Shade
E. Altmann
73
1
0
03 May 2023
Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity
Da Xu
Maha Elbayad
Kenton W. Murray
Jean Maillard
Vedanuj Goswami
MoE
62
3
0
03 May 2023
ResiDual: Transformer with Dual Residual Connections
Shufang Xie
Huishuai Zhang
Junliang Guo
Xu Tan
Jiang Bian
Hany Awadalla
Arul Menezes
Tao Qin
Rui Yan
99
19
0
28 Apr 2023
Just Tell Me: Prompt Engineering in Business Process Management
Kiran Busch
Alexander Rochlitzer
Diana Sola
Henrik Leopold
86
29
0
14 Apr 2023
Should ChatGPT be Biased? Challenges and Risks of Bias in Large Language Models
Emilio Ferrara
SILM
121
264
0
07 Apr 2023
About optimal loss function for training physics-informed neural networks under respecting causality
V. A. Es'kin
Danil V. Davydov
Ekaterina D. Egorova
Alexey O. Malkhanov
Mikhail A. Akhukov
Mikhail E. Smorkalov
PINN
93
7
0
05 Apr 2023
TabRet: Pre-training Transformer-based Tabular Models for Unseen Columns
Soma Onishi
Kenta Oono
Kohei Hayashi
LMTD
70
16
0
28 Mar 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
167
48
0
21 Mar 2023
Towards Reliable Neural Machine Translation with Consistency-Aware Meta-Learning
Rongxiang Weng
Qiang Wang
Wensen Cheng
Changfeng Zhu
Min Zhang
72
2
0
20 Mar 2023
Block-wise Bit-Compression of Transformer-based Models
Gaochen Dong
W. Chen
132
0
0
16 Mar 2023
An Overview on Language Models: Recent Developments and Outlook
Chengwei Wei
Yun Cheng Wang
Bin Wang
C.-C. Jay Kuo
93
47
0
10 Mar 2023
ST-KeyS: Self-Supervised Transformer for Keyword Spotting in Historical Handwritten Documents
Sana Khamekhem Jemni
Sourour Ammar
Mohamed Ali Souibgui
Yousri Kessentini
A. Cheddad
86
3
0
06 Mar 2023
Are More Layers Beneficial to Graph Transformers?
Haiteng Zhao
Shuming Ma
Dongdong Zhang
Zhi-Hong Deng
Furu Wei
67
14
0
01 Mar 2023
Policy Dispersion in Non-Markovian Environment
B. Qu
Xiaofeng Cao
Jielong Yang
Hechang Chen
Chang Yi
Ivor W. Tsang
Yew-Soon Ong
56
0
0
28 Feb 2023
Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation
Bobby He
James Martens
Guodong Zhang
Aleksandar Botev
Andy Brock
Samuel L. Smith
Yee Whye Teh
85
30
0
20 Feb 2023
Spatial Functa: Scaling Functa to ImageNet Classification and Generation
Matthias Bauer
Emilien Dupont
Andy Brock
Dan Rosenbaum
Jonathan Richard Schwarz
Hyunjik Kim
DiffM
128
41
0
06 Feb 2023
Attention Link: An Efficient Attention-Based Low Resource Machine Translation Architecture
Zeping Min
29
0
0
01 Feb 2023
Program Generation from Diverse Video Demonstrations
Anthony Manchin
Jamie Sherrah
Qi Wu
Anton Van Den Hengel
VGen
27
0
0
01 Feb 2023
ChatGPT or Human? Detect and Explain. Explaining Decisions of Machine Learning Model for Detecting Short ChatGPT-generated Text
Sandra Mitrović
Davide Andreoletti
Omran Ayoub
DeLMO
91
154
0
30 Jan 2023
Tighter Bounds on the Expressivity of Transformer Encoders
David Chiang
Peter A. Cholak
A. Pillay
122
58
0
25 Jan 2023
TransfQMix: Transformers for Leveraging the Graph Structure of Multi-Agent Reinforcement Learning Problems
Matteo Gallici
Mario Martin
Ivan Masmitja
OffRL
26
10
0
13 Jan 2023
Optimizing Deep Transformers for Chinese-Thai Low-Resource Translation
Wenjie Hao
Hongfei Xu
Lingling Mu
Hongying Zan
MoE
97
4
0
24 Dec 2022
EIT: Enhanced Interactive Transformer
Tong Zheng
Bei Li
Huiwen Bao
Tong Xiao
Jingbo Zhu
119
2
0
20 Dec 2022
DC-MBR: Distributional Cooling for Minimum Bayesian Risk Decoding
Jianhao Yan
Jin Xu
Fandong Meng
Jie Zhou
Yue Zhang
107
4
0
08 Dec 2022
The RoyalFlush System for the WMT 2022 Efficiency Task
Bo Qin
Aixin Jia
Qiang Wang
Jian Lu
Shuqin Pan
Haibo Wang
Ming-Tso Chen
68
1
0
03 Dec 2022
Masked Reconstruction Contrastive Learning with Information Bottleneck Principle
Ziwen Liu
Bonan Li
Congying Han
Tiande Guo
Xuecheng Nie
SSL
64
2
0
15 Nov 2022