1906.01787
Cited By
Learning Deep Transformer Models for Machine Translation
5 June 2019
Qiang Wang
Bei Li
Tong Xiao
Jingbo Zhu
Changliang Li
Derek F. Wong
Lidia S. Chao
Papers citing
"Learning Deep Transformer Models for Machine Translation"
50 / 344 papers shown
Title
Examining Scaling and Transfer of Language Model Architectures for Machine Translation
Biao Zhang
Behrooz Ghorbani
Ankur Bapna
Yong Cheng
Xavier Garcia
Jonathan Shen
Orhan Firat
84
23
0
01 Feb 2022
Supervised Visual Attention for Simultaneous Multimodal Machine Translation
Veneta Haralampieva
Ozan Caglayan
Lucia Specia
LRM
75
4
0
23 Jan 2022
Domain Adaptation via Bidirectional Cross-Attention Transformer
Xiyu Wang
Pengxin Guo
Yu Zhang
ViT
79
20
0
15 Jan 2022
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning
Kunchang Li
Yali Wang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
137
254
0
12 Jan 2022
Joint-training on Symbiosis Networks for Deep Nueral Machine Translation models
Zhengzhe Yu
Jiaxin Guo
Minghan Wang
Daimeng Wei
Hengchao Shang
...
Chang Su
Hao Fei
Lizhi Lei
Shimin Tao
Hao Yang
34
3
0
22 Dec 2021
Faster Nearest Neighbor Machine Translation
Shuhe Wang
Jiwei Li
Yuxian Meng
Rongbin Ouyang
Guoyin Wang
Xiaoya Li
Tianwei Zhang
Shi Zong
45
12
0
15 Dec 2021
Towards More Efficient Insertion Transformer with Fractional Positional Encoding
Zhisong Zhang
Yizhe Zhang
W. Dolan
95
0
0
12 Dec 2021
Short and Long Range Relation Based Spatio-Temporal Transformer for Micro-Expression Recognition
Liangfei Zhang
Xiaopeng Hong
Ognjen Arandjelovic
Guoying Zhao
ViT
84
58
0
10 Dec 2021
Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks
Linghui Meng
Muning Wen
Yaodong Yang
Chenyang Le
Xiyun Li
Weinan Zhang
Ying Wen
Haifeng Zhang
Jun Wang
Bo Xu
OffRL
98
43
0
06 Dec 2021
Visual-Semantic Transformer for Scene Text Recognition
Xin Tang
Yongquan Lai
Ying Liu
Yuanyuan Fu
Rui Fang
ViT
66
9
0
02 Dec 2021
Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity
Byungseok Roh
Jaewoong Shin
Wuhyun Shin
Saehoon Kim
ViT
52
149
0
29 Nov 2021
RedCaps: web-curated image-text data created by the people, for the people
Karan Desai
Gaurav Kaul
Zubin Aysola
Justin Johnson
135
169
0
22 Nov 2021
Taming Sparsely Activated Transformer with Stochastic Experts
Simiao Zuo
Xiaodong Liu
Jian Jiao
Young Jin Kim
Hany Hassan
Ruofei Zhang
T. Zhao
Jianfeng Gao
MoE
123
115
0
08 Oct 2021
Speeding up Deep Model Training by Sharing Weights and Then Unsharing
Shuo Yang
Le Hou
Xiaodan Song
Qiang Liu
Denny Zhou
150
9
0
08 Oct 2021
A Case Study to Reveal if an Area of Interest has a Trend in Ongoing Tweets Using Word and Sentence Embeddings
Ismail Aslan
Y. Topcu
39
0
0
02 Oct 2021
RuleBert: Teaching Soft Rules to Pre-trained Language Models
Mohammed Saeed
N. Ahmadi
Preslav Nakov
Paolo Papotti
LRM
347
33
0
24 Sep 2021
The Volctrans GLAT System: Non-autoregressive Translation Meets WMT21
Lihua Qian
Yi Zhou
Zaixiang Zheng
Yaoming Zhu
Zehui Lin
Jiangtao Feng
Shanbo Cheng
Lei Li
Mingxuan Wang
Hao Zhou
89
18
0
23 Sep 2021
The NiuTrans Machine Translation Systems for WMT21
Yuhao Zhang
Tao Zhou
Bin Wei
Runzhe Cao
Yongyu Mu
...
Weiqiao Shan
Yinqiao Li
Bei Li
Tong Xiao
Jingbo Zhu
70
17
0
22 Sep 2021
The NiuTrans System for WNGT 2020 Efficiency Task
Chi Hu
Bei Li
Ye Lin
Yinqiao Li
Yanyang Li
Chenglong Wang
Tong Xiao
Jingbo Zhu
33
7
0
16 Sep 2021
The NiuTrans System for the WMT21 Efficiency Task
Chenglong Wang
Chi Hu
Yongyu Mu
Zhongxiang Yan
Siming Wu
...
Hang Cao
Bei Li
Ye Lin
Tong Xiao
Jingbo Zhu
74
2
0
16 Sep 2021
Few-Shot Object Detection by Attending to Per-Sample-Prototype
Hojun Lee
Myunggi Lee
Nojun Kwak
ObjD
98
32
0
16 Sep 2021
RankNAS: Efficient Neural Architecture Search by Pairwise Ranking
Chi Hu
Chenglong Wang
Xiangnan Ma
Xia Meng
Yinqiao Li
Tong Xiao
Jingbo Zhu
Changliang Li
77
11
0
15 Sep 2021
Empirical Analysis of Training Strategies of Transformer-based Japanese Chit-chat Systems
Hiroaki Sugiyama
M. Mizukami
Tsunehiro Arimoto
Hiromi Narimatsu
Yuya Chiba
Hideharu Nakajima
Toyomi Meguro
178
53
0
11 Sep 2021
Bag of Tricks for Optimizing Transformer Efficiency
Ye Lin
Yanyang Li
Tong Xiao
Jingbo Zhu
57
6
0
09 Sep 2021
Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization
Tiezheng Yu
Wenliang Dai
Zihan Liu
Pascale Fung
105
74
0
06 Sep 2021
Cross-category Video Highlight Detection via Set-based Learning
Minghao Xu
Hang Wang
Bingbing Ni
Riheng Zhu
Zhenbang Sun
Changhu Wang
71
47
0
26 Aug 2021
Recurrent multiple shared layers in Depth for Neural Machine Translation
Guoliang Li
Yiyang Li
MoE
48
1
0
23 Aug 2021
GTNet: Guided Transformer Network for Detecting Human-Object Interactions
A S M Iftekhar
Satish Kumar
R. McEver
Suya You
B. S. Manjunath
ViT
165
13
0
02 Aug 2021
LocalGLMnet: interpretable deep learning for tabular data
Ronald Richman
M. Wüthrich
LMTD
FAtt
72
32
0
23 Jul 2021
Confidence-Aware Scheduled Sampling for Neural Machine Translation
Yijin Liu
Fandong Meng
Jinan Xu
Jie Zhou
83
14
0
22 Jul 2021
TAPEX: Table Pre-training via Learning a Neural SQL Executor
Qian Liu
Bei Chen
Jiaqi Guo
Morteza Ziyadi
Zeqi Lin
Weizhu Chen
Jian-Guang Lou
LMTD
116
269
0
16 Jul 2021
Transformer Network for Significant Stenosis Detection in CCTA of Coronary Arteries
Xin Ma
Gongning Luo
Wei Wang
Kuanquan Wang
ViT
MedIm
53
26
0
07 Jul 2021
The NiuTrans End-to-End Speech Translation System for IWSLT 2021 Offline Task
Chen Xu
Xiaoqian Liu
Xiaowen Liu
Laohu Wang
Canan Huang
Tong Xiao
Jingbo Zhu
72
5
0
06 Jul 2021
UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation
Yunhe Gao
Mu Zhou
Dimitris N. Metaxas
MedIm
ViT
81
433
0
02 Jul 2021
AutoFormer: Searching Transformers for Visual Recognition
Minghao Chen
Houwen Peng
Jianlong Fu
Haibin Ling
ViT
104
268
0
01 Jul 2021
Digging Errors in NMT: Evaluating and Understanding Model Errors from Partial Hypothesis Space
Jianhao Yan
Chenming Wu
Fandong Meng
Jie Zhou
ELM
LRM
56
2
0
29 Jun 2021
Early Convolutions Help Transformers See Better
Tete Xiao
Mannat Singh
Eric Mintun
Trevor Darrell
Piotr Dollár
Ross B. Girshick
82
778
0
28 Jun 2021
High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails
Ashok Cutkosky
Harsh Mehta
83
62
0
28 Jun 2021
Time-Series Representation Learning via Temporal and Contextual Contrasting
Emadeldeen Eldele
Mohamed Ragab
Zhenghua Chen
Min-man Wu
C. Kwoh
Xiaoli Li
Cuntai Guan
AI4TS
102
517
0
26 Jun 2021
Language Models are Good Translators
Shuo Wang
Zhaopeng Tu
Zhixing Tan
Wenxuan Wang
Maosong Sun
Yang Liu
72
22
0
25 Jun 2021
Revisiting Deep Learning Models for Tabular Data
Yu. V. Gorishniy
Ivan Rubachev
Valentin Khrulkov
Artem Babenko
LMTD
138
782
0
22 Jun 2021
On Adversarial Robustness of Synthetic Code Generation
Mrinal Anand
Pratik Kayal
M. Singh
130
5
0
22 Jun 2021
Multi-head or Single-head? An Empirical Comparison for Transformer Training
Liyuan Liu
Jialu Liu
Jiawei Han
71
33
0
17 Jun 2021
GroupBERT: Enhanced Transformer Architecture with Efficient Grouped Structures
Ivan Chelombiev
Daniel Justus
Douglas Orr
A. Dietrich
Frithjof Gressmann
A. Koliousis
Carlo Luschi
60
5
0
10 Jun 2021
Salient Object Ranking with Position-Preserved Attention
Haoyang Fang
Daoxin Zhang
Yi Zhang
Minghao Chen
Jiawei Li
Yao Hu
Deng Cai
Xiaofei He
71
21
0
09 Jun 2021
A Survey of Transformers
Tianyang Lin
Yuxin Wang
Xiangyang Liu
Xipeng Qiu
ViT
199
1,147
0
08 Jun 2021
Anticipative Video Transformer
Rohit Girdhar
Kristen Grauman
ViT
78
212
0
03 Jun 2021
Luna: Linear Unified Nested Attention
Xuezhe Ma
Xiang Kong
Sinong Wang
Chunting Zhou
Jonathan May
Hao Ma
Luke Zettlemoyer
89
113
0
03 Jun 2021
Transformers are Deep Infinite-Dimensional Non-Mercer Binary Kernel Machines
Matthew A. Wright
Joseph E. Gonzalez
84
23
0
02 Jun 2021
You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection
Yuxin Fang
Bencheng Liao
Xinggang Wang
Jiemin Fang
Jiyang Qi
Rui Wu
Jianwei Niu
Wenyu Liu
ViT
80
326
0
01 Jun 2021