ResearchTrend.AI
Learning Deep Transformer Models for Machine Translation

5 June 2019
Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, Lidia S. Chao
arXiv: 1906.01787

Papers citing "Learning Deep Transformer Models for Machine Translation"

50 / 344 papers shown
  • Examining Scaling and Transfer of Language Model Architectures for Machine Translation. Biao Zhang, Behrooz Ghorbani, Ankur Bapna, Yong Cheng, Xavier Garcia, Jonathan Shen, Orhan Firat. 01 Feb 2022.
  • Supervised Visual Attention for Simultaneous Multimodal Machine Translation. Veneta Haralampieva, Ozan Caglayan, Lucia Specia. 23 Jan 2022.
  • Domain Adaptation via Bidirectional Cross-Attention Transformer. Xiyu Wang, Pengxin Guo, Yu Zhang. 15 Jan 2022.
  • UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning. Kunchang Li, Yali Wang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao. 12 Jan 2022.
  • Joint-training on Symbiosis Networks for Deep Nueral Machine Translation models. Zhengzhe Yu, Jiaxin Guo, Minghan Wang, Daimeng Wei, Hengchao Shang, ..., Chang Su, Hao Fei, Lizhi Lei, Shimin Tao, Hao Yang. 22 Dec 2021.
  • Faster Nearest Neighbor Machine Translation. Shuhe Wang, Jiwei Li, Yuxian Meng, Rongbin Ouyang, Guoyin Wang, Xiaoya Li, Tianwei Zhang, Shi Zong. 15 Dec 2021.
  • Towards More Efficient Insertion Transformer with Fractional Positional Encoding. Zhisong Zhang, Yizhe Zhang, W. Dolan. 12 Dec 2021.
  • Short and Long Range Relation Based Spatio-Temporal Transformer for Micro-Expression Recognition. Liangfei Zhang, Xiaopeng Hong, Ognjen Arandjelovic, Guoying Zhao. 10 Dec 2021.
  • Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks. Linghui Meng, Muning Wen, Yaodong Yang, Chenyang Le, Xiyun Li, Weinan Zhang, Ying Wen, Haifeng Zhang, Jun Wang, Bo Xu. 06 Dec 2021.
  • Visual-Semantic Transformer for Scene Text Recognition. Xin Tang, Yongquan Lai, Ying Liu, Yuanyuan Fu, Rui Fang. 02 Dec 2021.
  • Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity. Byungseok Roh, Jaewoong Shin, Wuhyun Shin, Saehoon Kim. 29 Nov 2021.
  • RedCaps: web-curated image-text data created by the people, for the people. Karan Desai, Gaurav Kaul, Zubin Aysola, Justin Johnson. 22 Nov 2021.
  • Taming Sparsely Activated Transformer with Stochastic Experts. Simiao Zuo, Xiaodong Liu, Jian Jiao, Young Jin Kim, Hany Hassan, Ruofei Zhang, T. Zhao, Jianfeng Gao. 08 Oct 2021.
  • Speeding up Deep Model Training by Sharing Weights and Then Unsharing. Shuo Yang, Le Hou, Xiaodan Song, Qiang Liu, Denny Zhou. 08 Oct 2021.
  • A Case Study to Reveal if an Area of Interest has a Trend in Ongoing Tweets Using Word and Sentence Embeddings. Ismail Aslan, Y. Topcu. 02 Oct 2021.
  • RuleBert: Teaching Soft Rules to Pre-trained Language Models. Mohammed Saeed, N. Ahmadi, Preslav Nakov, Paolo Papotti. 24 Sep 2021.
  • The Volctrans GLAT System: Non-autoregressive Translation Meets WMT21. Lihua Qian, Yi Zhou, Zaixiang Zheng, Yaoming Zhu, Zehui Lin, Jiangtao Feng, Shanbo Cheng, Lei Li, Mingxuan Wang, Hao Zhou. 23 Sep 2021.
  • The NiuTrans Machine Translation Systems for WMT21. Yuhao Zhang, Tao Zhou, Bin Wei, Runzhe Cao, Yongyu Mu, ..., Weiqiao Shan, Yinqiao Li, Bei Li, Tong Xiao, Jingbo Zhu. 22 Sep 2021.
  • The NiuTrans System for WNGT 2020 Efficiency Task. Chi Hu, Bei Li, Ye Lin, Yinqiao Li, Yanyang Li, Chenglong Wang, Tong Xiao, Jingbo Zhu. 16 Sep 2021.
  • The NiuTrans System for the WMT21 Efficiency Task. Chenglong Wang, Chi Hu, Yongyu Mu, Zhongxiang Yan, Siming Wu, ..., Hang Cao, Bei Li, Ye Lin, Tong Xiao, Jingbo Zhu. 16 Sep 2021.
  • Few-Shot Object Detection by Attending to Per-Sample-Prototype. Hojun Lee, Myunggi Lee, Nojun Kwak. 16 Sep 2021.
  • RankNAS: Efficient Neural Architecture Search by Pairwise Ranking. Chi Hu, Chenglong Wang, Xiangnan Ma, Xia Meng, Yinqiao Li, Tong Xiao, Jingbo Zhu, Changliang Li. 15 Sep 2021.
  • Empirical Analysis of Training Strategies of Transformer-based Japanese Chit-chat Systems. Hiroaki Sugiyama, M. Mizukami, Tsunehiro Arimoto, Hiromi Narimatsu, Yuya Chiba, Hideharu Nakajima, Toyomi Meguro. 11 Sep 2021.
  • Bag of Tricks for Optimizing Transformer Efficiency. Ye Lin, Yanyang Li, Tong Xiao, Jingbo Zhu. 09 Sep 2021.
  • Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization. Tiezheng Yu, Wenliang Dai, Zihan Liu, Pascale Fung. 06 Sep 2021.
  • Cross-category Video Highlight Detection via Set-based Learning. Minghao Xu, Hang Wang, Bingbing Ni, Riheng Zhu, Zhenbang Sun, Changhu Wang. 26 Aug 2021.
  • Recurrent multiple shared layers in Depth for Neural Machine Translation. Guoliang Li, Yiyang Li. 23 Aug 2021.
  • GTNet: Guided Transformer Network for Detecting Human-Object Interactions. A S M Iftekhar, Satish Kumar, R. McEver, Suya You, B. S. Manjunath. 02 Aug 2021.
  • LocalGLMnet: interpretable deep learning for tabular data. Ronald Richman, M. Wüthrich. 23 Jul 2021.
  • Confidence-Aware Scheduled Sampling for Neural Machine Translation. Yijin Liu, Fandong Meng, Jinan Xu, Jie Zhou. 22 Jul 2021.
  • TAPEX: Table Pre-training via Learning a Neural SQL Executor. Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. 16 Jul 2021.
  • Transformer Network for Significant Stenosis Detection in CCTA of Coronary Arteries. Xin Ma, Gongning Luo, Wei Wang, Kuanquan Wang. 07 Jul 2021.
  • The NiuTrans End-to-End Speech Translation System for IWSLT 2021 Offline Task. Chen Xu, Xiaoqian Liu, Xiaowen Liu, Laohu Wang, Canan Huang, Tong Xiao, Jingbo Zhu. 06 Jul 2021.
  • UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation. Yunhe Gao, Mu Zhou, Dimitris N. Metaxas. 02 Jul 2021.
  • AutoFormer: Searching Transformers for Visual Recognition. Minghao Chen, Houwen Peng, Jianlong Fu, Haibin Ling. 01 Jul 2021.
  • Digging Errors in NMT: Evaluating and Understanding Model Errors from Partial Hypothesis Space. Jianhao Yan, Chenming Wu, Fandong Meng, Jie Zhou. 29 Jun 2021.
  • Early Convolutions Help Transformers See Better. Tete Xiao, Mannat Singh, Eric Mintun, Trevor Darrell, Piotr Dollár, Ross B. Girshick. 28 Jun 2021.
  • High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails. Ashok Cutkosky, Harsh Mehta. 28 Jun 2021.
  • Time-Series Representation Learning via Temporal and Contextual Contrasting. Emadeldeen Eldele, Mohamed Ragab, Zhenghua Chen, Min-man Wu, C. Kwoh, Xiaoli Li, Cuntai Guan. 26 Jun 2021.
  • Language Models are Good Translators. Shuo Wang, Zhaopeng Tu, Zhixing Tan, Wenxuan Wang, Maosong Sun, Yang Liu. 25 Jun 2021.
  • Revisiting Deep Learning Models for Tabular Data. Yu. V. Gorishniy, Ivan Rubachev, Valentin Khrulkov, Artem Babenko. 22 Jun 2021.
  • On Adversarial Robustness of Synthetic Code Generation. Mrinal Anand, Pratik Kayal, M. Singh. 22 Jun 2021.
  • Multi-head or Single-head? An Empirical Comparison for Transformer Training. Liyuan Liu, Jialu Liu, Jiawei Han. 17 Jun 2021.
  • GroupBERT: Enhanced Transformer Architecture with Efficient Grouped Structures. Ivan Chelombiev, Daniel Justus, Douglas Orr, A. Dietrich, Frithjof Gressmann, A. Koliousis, Carlo Luschi. 10 Jun 2021.
  • Salient Object Ranking with Position-Preserved Attention. Haoyang Fang, Daoxin Zhang, Yi Zhang, Minghao Chen, Jiawei Li, Yao Hu, Deng Cai, Xiaofei He. 09 Jun 2021.
  • A Survey of Transformers. Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu. 08 Jun 2021.
  • Anticipative Video Transformer. Rohit Girdhar, Kristen Grauman. 03 Jun 2021.
  • Luna: Linear Unified Nested Attention. Xuezhe Ma, Xiang Kong, Sinong Wang, Chunting Zhou, Jonathan May, Hao Ma, Luke Zettlemoyer. 03 Jun 2021.
  • Transformers are Deep Infinite-Dimensional Non-Mercer Binary Kernel Machines. Matthew A. Wright, Joseph E. Gonzalez. 02 Jun 2021.
  • You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection. Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu. 01 Jun 2021.