ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Learning Deep Transformer Models for Machine Translation

arXiv:1906.01787 · 5 June 2019
Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, Lidia S. Chao

Papers citing "Learning Deep Transformer Models for Machine Translation"

50 / 152 papers shown
Optimizing Deep Transformers for Chinese-Thai Low-Resource Translation
  Wenjie Hao, Hongfei Xu, Lingling Mu, Hongying Zan · 24 Dec 2022 · MoE

EIT: Enhanced Interactive Transformer
  Tong Zheng, Bei Li, Huiwen Bao, Tong Xiao, Jingbo Zhu · 20 Dec 2022

DC-MBR: Distributional Cooling for Minimum Bayesian Risk Decoding
  Jianhao Yan, Jin Xu, Fandong Meng, Jie Zhou, Yue Zhang · 08 Dec 2022

The RoyalFlush System for the WMT 2022 Efficiency Task
  Bo Qin, Aixin Jia, Qiang Wang, Jian Lu, Shuqin Pan, Haibo Wang, Ming-Tso Chen · 03 Dec 2022

Masked Reconstruction Contrastive Learning with Information Bottleneck Principle
  Ziwen Liu, Bonan Li, Congying Han, Tiande Guo, Xuecheng Nie · 15 Nov 2022 · SSL
Tencent's Multilingual Machine Translation System for WMT22 Large-Scale African Languages
  Wenxiang Jiao, Zhaopeng Tu, Jiarui Li, Wenxuan Wang, Jen-tse Huang, Shuming Shi · 18 Oct 2022

Depth-Wise Attention (DWAtt): A Layer Fusion Method for Data-Efficient Classification
  Muhammad N. ElNokrashy, Badr AlKhamissi, Mona T. Diab · 30 Sep 2022 · MoMe

DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation
  Seongmin Hong, Seungjae Moon, Junsoo Kim, Sungjae Lee, Minsub Kim, Dongsoo Lee, Joo-Young Kim · 22 Sep 2022

Rethinking Personalized Ranking at Pinterest: An End-to-End Approach
  Jiajing Xu, Andrew Zhai, Charles R. Rosenberg · 18 Sep 2022

Self-Attentive Pooling for Efficient Deep Learning
  Fang Chen, Gourav Datta, Souvik Kundu, P. Beerel · 16 Sep 2022
SATformer: Transformer-Based UNSAT Core Learning
  Zhengyuan Shi, Min Li, Yi Liu, Sadaf Khan, Junhua Huang, Hui-Ling Zhen, M. Yuan, Qiang Xu · 02 Sep 2022 · NAI

User-Controllable Latent Transformer for StyleGAN Image Layout Editing
  Yuki Endo · 26 Aug 2022

Your ViT is Secretly a Hybrid Discriminative-Generative Diffusion Model
  Xiulong Yang, Sheng-Min Shih, Yinlin Fu, Xiaoting Zhao, Shihao Ji · 16 Aug 2022 · DiffM

Scene Text Recognition with Permuted Autoregressive Sequence Models
  Darwin Bautista, Rowel Atienza · 14 Jul 2022

Deep Transformer Model with Pre-Layer Normalization for COVID-19 Growth Prediction
  Rizki Ramadhan Fitra, N. Yudistira, W. Mahmudy · 10 Jul 2022

Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism
  Kun Wei, Pengcheng Guo, Ning Jiang · 02 Jul 2022
Set Norm and Equivariant Skip Connections: Putting the Deep in Deep Sets
  Lily H. Zhang, Veronica Tozzo, J. Higgins, Rajesh Ranganath · 23 Jun 2022 · BDL, MoE

PinnerFormer: Sequence Modeling for User Representation at Pinterest
  Nikil Pancha, Andrew Zhai, J. Leskovec, Charles R. Rosenberg · 09 May 2022 · AI4TS

Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation
  Xiangpeng Wei, Heng Yu, Yue Hu, Rongxiang Weng, Weihua Luo, Jun Xie, Rong Jin · 14 Apr 2022 · CLL

TANet: Thread-Aware Pretraining for Abstractive Conversational Summarization
  Ze Yang, Liran Wang, Zhoujin Tian, Wei Wu, Zhoujun Li · 09 Apr 2022

Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection
  Xin Huang, A. Khetan, Rene Bidart, Zohar Karnin · 27 Mar 2022
Grounding Commands for Autonomous Vehicles via Layer Fusion with Region-specific Dynamic Layer Attention
  Hou Pong Chan, M. Guo, Chengguang Xu · 14 Mar 2022

Towards Personalized Intelligence at Scale
  Yiping Kang, Ashish Mahendra, Christopher Clarke, Lingjia Tang, Jason Mars · 13 Mar 2022

Masked Visual Pre-training for Motor Control
  Tete Xiao, Ilija Radosavovic, Trevor Darrell, Jitendra Malik · 11 Mar 2022 · SSL

A Conformer Based Acoustic Model for Robust Automatic Speech Recognition
  Yufeng Yang, Peidong Wang, DeLiang Wang · 01 Mar 2022

DeepNet: Scaling Transformers to 1,000 Layers
  Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Furu Wei · 01 Mar 2022 · MoE, AI4CE

Overcoming a Theoretical Limitation of Self-Attention
  David Chiang, Peter A. Cholak · 24 Feb 2022
How to Understand Masked Autoencoders
  Shuhao Cao, Peng Xu, David Clifton · 08 Feb 2022

UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning
  Kunchang Li, Yali Wang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao · 12 Jan 2022 · ViT

Towards More Efficient Insertion Transformer with Fractional Positional Encoding
  Zhisong Zhang, Yizhe Zhang, W. Dolan · 12 Dec 2021

Short and Long Range Relation Based Spatio-Temporal Transformer for Micro-Expression Recognition
  Liangfei Zhang, Xiaopeng Hong, Ognjen Arandjelovic, Guoying Zhao · 10 Dec 2021 · ViT

Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks
  Linghui Meng, Muning Wen, Yaodong Yang, Chenyang Le, Xiyun Li, Weinan Zhang, Ying Wen, Haifeng Zhang, Jun Wang, Bo Xu · 06 Dec 2021 · OffRL
Visual-Semantic Transformer for Scene Text Recognition
  Xin Tang, Yongquan Lai, Ying Liu, Yuanyuan Fu, Rui Fang · 02 Dec 2021 · ViT

Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity
  Byungseok Roh, Jaewoong Shin, Wuhyun Shin, Saehoon Kim · 29 Nov 2021 · ViT

RedCaps: web-curated image-text data created by the people, for the people
  Karan Desai, Gaurav Kaul, Zubin Aysola, Justin Johnson · 22 Nov 2021

Taming Sparsely Activated Transformer with Stochastic Experts
  Simiao Zuo, Xiaodong Liu, Jian Jiao, Young Jin Kim, Hany Hassan, Ruofei Zhang, T. Zhao, Jianfeng Gao · 08 Oct 2021 · MoE

Speeding up Deep Model Training by Sharing Weights and Then Unsharing
  Shuo Yang, Le Hou, Xiaodan Song, Qiang Liu, Denny Zhou · 08 Oct 2021
A Case Study to Reveal if an Area of Interest has a Trend in Ongoing Tweets Using Word and Sentence Embeddings
  Ismail Aslan, Y. Topcu · 02 Oct 2021

RuleBert: Teaching Soft Rules to Pre-trained Language Models
  Mohammed Saeed, N. Ahmadi, Preslav Nakov, Paolo Papotti · 24 Sep 2021 · LRM

The Volctrans GLAT System: Non-autoregressive Translation Meets WMT21
  Lihua Qian, Yi Zhou, Zaixiang Zheng, Yaoming Zhu, Zehui Lin, Jiangtao Feng, Shanbo Cheng, Lei Li, Mingxuan Wang, Hao Zhou · 23 Sep 2021

The NiuTrans Machine Translation Systems for WMT21
  Yuhao Zhang, Tao Zhou, Bin Wei, Runzhe Cao, Yongyu Mu, ..., Weiqiao Shan, Yinqiao Li, Bei Li, Tong Xiao, Jingbo Zhu · 22 Sep 2021

The NiuTrans System for WNGT 2020 Efficiency Task
  Chi Hu, Bei Li, Ye Lin, Yinqiao Li, Yanyang Li, Chenglong Wang, Tong Xiao, Jingbo Zhu · 16 Sep 2021
The NiuTrans System for the WMT21 Efficiency Task
  Chenglong Wang, Chi Hu, Yongyu Mu, Zhongxiang Yan, Siming Wu, ..., Hang Cao, Bei Li, Ye Lin, Tong Xiao, Jingbo Zhu · 16 Sep 2021

Few-Shot Object Detection by Attending to Per-Sample-Prototype
  Hojun Lee, Myunggi Lee, Nojun Kwak · 16 Sep 2021 · ObjD

Empirical Analysis of Training Strategies of Transformer-based Japanese Chit-chat Systems
  Hiroaki Sugiyama, M. Mizukami, Tsunehiro Arimoto, Hiromi Narimatsu, Yuya Chiba, Hideharu Nakajima, Toyomi Meguro · 11 Sep 2021

Bag of Tricks for Optimizing Transformer Efficiency
  Ye Lin, Yanyang Li, Tong Xiao, Jingbo Zhu · 09 Sep 2021

Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization
  Tiezheng Yu, Wenliang Dai, Zihan Liu, Pascale Fung · 06 Sep 2021
LocalGLMnet: interpretable deep learning for tabular data
  Ronald Richman, M. Wüthrich · 23 Jul 2021 · LMTD, FAtt

TAPEX: Table Pre-training via Learning a Neural SQL Executor
  Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou · 16 Jul 2021 · LMTD

The NiuTrans End-to-End Speech Translation System for IWSLT 2021 Offline Task
  Chen Xu, Xiaoqian Liu, Xiaowen Liu, Laohu Wang, Canan Huang, Tong Xiao, Jingbo Zhu · 06 Jul 2021