Learning Deep Transformer Models for Machine Translation

5 June 2019
Qiang Wang
Bei Li
Tong Xiao
Jingbo Zhu
Changliang Li
Derek F. Wong
Lidia S. Chao

Papers citing "Learning Deep Transformer Models for Machine Translation"

50 / 152 papers shown
A Generative Re-ranking Model for List-level Multi-objective Optimization at Taobao
Yue Meng
Cheng Guo
Yi Cao
Tong Liu
Bo Zheng
29
0
0
12 May 2025
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
Ruiquan Huang
Yingbin Liang
Jing Yang
60
0
0
02 May 2025
IoT Botnet Detection: Application of Vision Transformer to Classification of Network Flow Traffic
Hassan Wasswa
Timothy Lynar
Aziida Nanyonga
Hussein Abbass
58
1
0
26 Apr 2025
Impact of Latent Space Dimension on IoT Botnet Detection Performance: VAE-Encoder Versus ViT-Encoder
Hassan Wasswa
Aziida Nanyonga
Timothy Lynar
DRL
53
2
0
21 Apr 2025
Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation
Yingfeng Luo
Tong Zheng
Yongyu Mu
Yangqiu Song
Qinghong Zhang
...
Ziqiang Xu
Peinan Feng
Xiaoqian Liu
Tong Xiao
Jingbo Zhu
AI4CE
245
0
0
09 Mar 2025
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
Zhijian Zhuo
Yutao Zeng
Ya Wang
Sijun Zhang
Jian Yang
Xiaoqing Li
Xun Zhou
Jinwen Ma
51
0
0
06 Mar 2025
ChordFormer: A Conformer-Based Architecture for Large-Vocabulary Audio Chord Recognition
Muhammad Waseem Akram
Stefano Dettori
V. Colla
Giorgio Buttazzo
57
0
0
17 Feb 2025
Life-Code: Central Dogma Modeling with Multi-Omics Sequence Unification
Zicheng Liu
Siyuan Li
Zhiyuan Chen
Lei Xin
Fang Wu
Chang Yu
Qirong Yang
Yucheng Guo
Yifan Yang
Stan Z. Li
SyDa
AI4CE
97
0
0
11 Feb 2025
The Curse of Depth in Large Language Models
Wenfang Sun
Xinyuan Song
Pengxiang Li
Lu Yin
Yefeng Zheng
Shiwei Liu
75
5
0
09 Feb 2025
Understanding Why Adam Outperforms SGD: Gradient Heterogeneity in Transformers
Akiyoshi Tomihari
Issei Sato
ODL
61
1
0
31 Jan 2025
Merino: Entropy-driven Design for Generative Language Models on IoT Devices
Youpeng Zhao
Ming Lin
Huadong Tang
Qiang Wu
Jun Wang
86
0
0
28 Jan 2025
Circuit Complexity Bounds for Visual Autoregressive Model
Yekun Ke
Xiaoyu Li
Yingyu Liang
Zhenmei Shi
Zhao Song
45
5
0
08 Jan 2025
Training Neural Networks as Recognizers of Formal Languages
Alexandra Butoi
Ghazal Khalighinejad
Anej Svete
Josef Valvoda
Ryan Cotterell
Brian DuSell
NAI
44
2
0
11 Nov 2024
ResiDual Transformer Alignment with Spectral Decomposition
Lorenzo Basile
Valentino Maiorca
Luca Bortolussi
Emanuele Rodolà
Francesco Locatello
60
1
0
31 Oct 2024
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec
Felix Dangel
Sidak Pal Singh
41
7
0
14 Oct 2024
Trans2Unet: Neural fusion for Nuclei Semantic Segmentation
Dinh-Phu Tran
Quoc-Anh Nguyen
Van-Truong Pham
Thi-Thao Tran
ViT
MedIm
29
5
0
24 Jul 2024
Automata Extraction from Transformers
Yihao Zhang
Zeming Wei
Meng Sun
AI4CE
45
1
0
08 Jun 2024
PILA: A Historical-Linguistic Dataset of Proto-Italic and Latin
Stephen Lawrence Bothwell
Brian DuSell
David Chiang
Brian Krostenko
42
0
0
25 Apr 2024
Transformer-based Stagewise Decomposition for Large-Scale Multistage Stochastic Optimization
Chanyeon Kim
Jongwoon Park
Hyun-sool Bae
Woo Chang Kim
44
3
0
03 Apr 2024
DiJiang: Efficient Large Language Models through Compact Kernelization
Hanting Chen
Zhicheng Liu
Xutao Wang
Yuchuan Tian
Yunhe Wang
VLM
31
5
0
29 Mar 2024
OrderBkd: Textual backdoor attack through repositioning
Irina Alekseevskaia
Konstantin Arkhipenko
30
2
0
12 Feb 2024
Separable Physics-Informed Neural Networks for the solution of elasticity problems
V. A. Es'kin
Danil V. Davydov
Julia V. Guréva
Alexey O. Malkhanov
Mikhail E. Smorkalov
PINN
AI4CE
27
2
0
24 Jan 2024
Setting the Record Straight on Transformer Oversmoothing
G. Dovonon
M. Bronstein
Matt J. Kusner
35
5
0
09 Jan 2024
An Empirical Study of Scaling Law for OCR
Miao Rang
Zhenni Bi
Chuanjian Liu
Yunhe Wang
Kai Han
45
6
0
29 Dec 2023
Early and Accurate Detection of Tomato Leaf Diseases Using TomFormer
Asim Khan
Umair Nawaz
K. Lochan
Lakmal D. Seneviratne
Irfan Hussain
MedIm
30
4
0
26 Dec 2023
Transformer-Based Multi-Object Smoothing with Decoupled Data Association and Smoothing
Juliano Pinto
Georg Hess
Yuxuan Xia
H. Wymeersch
Lennart Svensson
VOT
32
3
0
22 Dec 2023
Cached Transformers: Improving Transformers with Differentiable Memory Cache
Zhaoyang Zhang
Wenqi Shao
Yixiao Ge
Xiaogang Wang
Liang Feng
Ping Luo
19
2
0
20 Dec 2023
Who Are All The Stochastic Parrots Imitating? They Should Tell Us!
Sagi Shaier
Lawrence E Hunter
K. Wense
43
3
0
16 Oct 2023
Large-Scale OD Matrix Estimation with A Deep Learning Method
Zheli Xiong
Defu Lian
Enhong Chen
Gang Chen
Xiaomin Cheng
15
0
0
09 Oct 2023
Vision Backbone Enhancement via Multi-Stage Cross-Scale Attention
Liang Shang
Yanli Liu
Zhengyang Lou
Shuxue Quan
N. Adluru
Bochen Guan
W. Sethares
39
2
0
10 Aug 2023
RecycleGPT: An Autoregressive Language Model with Recyclable Module
Yu Jiang
Qiaozhi He
Xiaomin Zhuang
Zhihua Wu
Kunpeng Wang
Wenlai Zhao
Guangwen Yang
KELM
28
3
0
07 Aug 2023
Layer-wise Representation Fusion for Compositional Generalization
Yafang Zheng
Lei Lin
Shantao Liu
Binling Wang
Zhaohong Lai
Wenhao Rao
Biao Fu
Yidong Chen
Xiaodon Shi
AI4CE
50
2
0
20 Jul 2023
3D Medical Image Segmentation based on multi-scale MPU-Net
Zeqiu Yu
Shuo Han
Ziheng Song
3DV
14
3
0
11 Jul 2023
Policy-Based Self-Competition for Planning Problems
Jonathan Pirnay
Q. Göttl
Jakob Burger
D. G. Grimm
46
3
0
07 Jun 2023
MobileNMT: Enabling Translation in 15MB and 30ms
Ye Lin
Xiaohui Wang
Zhexi Zhang
Mingxuan Wang
Tong Xiao
Jingbo Zhu
MQ
38
1
0
07 Jun 2023
Revisiting Non-Autoregressive Translation at Scale
Zhihao Wang
Longyue Wang
Jinsong Su
Junfeng Yao
Zhaopeng Tu
36
3
0
25 May 2023
Learning to Compose Representations of Different Encoder Layers towards Improving Compositional Generalization
Lei Lin
Shuangtao Li
Yafang Zheng
Biao Fu
Shantao Liu
Yidong Chen
Xiaodon Shi
CoGe
29
3
0
20 May 2023
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks
Yifan Peng
Kwangyoun Kim
Felix Wu
Brian Yan
Siddhant Arora
William Chen
Jiyang Tang
Suwon Shon
Prashant Sridhar
Shinji Watanabe
31
17
0
18 May 2023
EENED: End-to-End Neural Epilepsy Detection based on Convolutional Transformer
Chenyu Liu
Xin-qiu Zhou
Yang Liu
ViT
MedIm
26
1
0
17 May 2023
Multi-Path Transformer is Better: A Case Study on Neural Machine Translation
Ye Lin
Shuhan Zhou
Yanyang Li
Anxiang Ma
Tong Xiao
Jingbo Zhu
38
0
0
10 May 2023
Quantifying the Dissimilarity of Texts
Benjamin Shade
E. Altmann
35
1
0
03 May 2023
Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity
Da Xu
Maha Elbayad
Kenton W. Murray
Jean Maillard
Vedanuj Goswami
MoE
47
3
0
03 May 2023
Just Tell Me: Prompt Engineering in Business Process Management
Kiran Busch
Alexander Rochlitzer
Diana Sola
Henrik Leopold
31
29
0
14 Apr 2023
Should ChatGPT be Biased? Challenges and Risks of Bias in Large Language Models
Emilio Ferrara
SILM
36
248
0
07 Apr 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
46
47
0
21 Mar 2023
Block-wise Bit-Compression of Transformer-based Models
Gaochen Dong
W. Chen
24
0
0
16 Mar 2023
An Overview on Language Models: Recent Developments and Outlook
Chengwei Wei
Yun Cheng Wang
Bin Wang
C.-C. Jay Kuo
35
42
0
10 Mar 2023
ST-KeyS: Self-Supervised Transformer for Keyword Spotting in Historical Handwritten Documents
Sana Khamekhem Jemni
Sourour Ammar
Mohamed Ali Souibgui
Yousri Kessentini
A. Cheddad
23
3
0
06 Mar 2023
Spatial Functa: Scaling Functa to ImageNet Classification and Generation
Matthias Bauer
Emilien Dupont
Andy Brock
Dan Rosenbaum
Jonathan Richard Schwarz
Hyunjik Kim
DiffM
36
35
0
06 Feb 2023
Tighter Bounds on the Expressivity of Transformer Encoders
David Chiang
Peter A. Cholak
A. Pillay
27
53
0
25 Jan 2023