Learning Deep Transformer Models for Machine Translation
5 June 2019
Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, Lidia S. Chao
arXiv:1906.01787 (abs | PDF | HTML)

Papers citing "Learning Deep Transformer Models for Machine Translation"

50 / 344 papers shown
Manager: Aggregating Insights from Unimodal Experts in Two-Tower VLMs and MLLMs
Xiao Xu, L. Qin, Wanxiang Che, Min-Yen Kan
MoE, VLM
13 Jun 2025
SLICK: Selective Localization and Instance Calibration for Knowledge-Enhanced Car Damage Segmentation in Automotive Insurance
Teerapong Panboonyuen
12 Jun 2025
TACTIC: Translation Agents with Cognitive-Theoretic Interactive Collaboration
Weiya Li, Junjie Chen, Bei Li, Boyang Liu, Zichen Wen, ..., Xiaoqian Liu, Anping Liu, Huajie Liu, Hu Song, Linfeng Zhang
LLMAG
10 Jun 2025
Two failure modes of deep transformers and how to avoid them: a unified theory of signal propagation at initialisation
Alessio Giorlandino, Sebastian Goldt
30 May 2025
Transformers Are Universally Consistent
Sagar Ghosh, Kushal Bose, Swagatam Das
30 May 2025
Taming Transformer Without Using Learning Rate Warmup
Xianbiao Qi, Yelin He, Jiaquan Ye, Chun-Guang Li, Bojia Zi, Xili Dai, Qin Zou, Rong Xiao
28 May 2025
IRCopilot: Automated Incident Response with Large Language Models
Xihuan Lin, Jie Zhang, Gelei Deng, Tianzhe Liu, Xiaolong Liu, Changcai Yang, Tianwei Zhang, Qing Guo, Riqing Chen
27 May 2025
Combining the Best of Both Worlds: A Method for Hybrid NMT and LLM Translation
Zhanglin Wu, Daimeng Wei, Xiaoyu Chen, Hengchao Shang, Jiaxin Guo, Zongyao Li, Yuanchang Luo, Jinlong Yang, Zhiqiang Rao, Hao Yang
19 May 2025
A Generative Re-ranking Model for List-level Multi-objective Optimization at Taobao
Yue Meng, Cheng Guo, Yi Cao, Tong Liu, Bo Zheng
12 May 2025
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
Ruiquan Huang, Yingbin Liang, Jing Yang
02 May 2025
IoT Botnet Detection: Application of Vision Transformer to Classification of Network Flow Traffic
Hassan Wasswa, Timothy Lynar, Aziida Nanyonga, Hussein Abbass
26 Apr 2025
Impact of Latent Space Dimension on IoT Botnet Detection Performance: VAE-Encoder Versus ViT-Encoder
Hassan Wasswa, Aziida Nanyonga, Timothy Lynar
DRL
21 Apr 2025
A Decade of Deep Learning for Remote Sensing Spatiotemporal Fusion: Advances, Challenges, and Opportunities
Enzhe Sun, Yongchuan Cui, Peng Liu, Jining Yan
01 Apr 2025
LakotaBERT: A Transformer-based Model for Low Resource Lakota Language
Kanishka Parankusham, Rodrigue Rizk, KC Santosh
23 Mar 2025
"Principal Components" Enable A New Language of Images
Xin Wen
Bingchen Zhao
Ismail Elezi
Jiankang Deng
Xiaojuan Qi
114
1
0
11 Mar 2025
Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation
Yingfeng Luo, Tong Zheng, Yongyu Mu, Yangqiu Song, Qinghong Zhang, ..., Ziqiang Xu, Peinan Feng, Xiaoqian Liu, Tong Xiao, Jingbo Zhu
AI4CE
09 Mar 2025
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
Zhijian Zhuo, Yutao Zeng, Ya Wang, Sijun Zhang, Jian Yang, Xiaoqing Li, Xun Zhou, Jinwen Ma
06 Mar 2025
ChordFormer: A Conformer-Based Architecture for Large-Vocabulary Audio Chord Recognition
Muhammad Waseem Akram, Stefano Dettori, V. Colla, Giorgio Buttazzo
17 Feb 2025
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
Da Xiao, Qingye Meng, Shengping Li, Xingyuan Yuan
MoE, AI4CE
13 Feb 2025
Life-Code: Central Dogma Modeling with Multi-Omics Sequence Unification
Zicheng Liu, Siyuan Li, Zhiyuan Chen, Lei Xin, Fang Wu, Chang Yu, Qirong Yang, Yucheng Guo, Yifan Yang, Stan Z. Li
SyDa, AI4CE
11 Feb 2025
The Curse of Depth in Large Language Models
Wenfang Sun, Xinyuan Song, Pengxiang Li, Lu Yin, Yefeng Zheng, Shiwei Liu
09 Feb 2025
Understanding Why Adam Outperforms SGD: Gradient Heterogeneity in Transformers
Akiyoshi Tomihari, Issei Sato
ODL
31 Jan 2025
Merino: Entropy-driven Design for Generative Language Models on IoT Devices
Youpeng Zhao, Ming Lin, Huadong Tang, Qiang Wu, Jun Wang
28 Jan 2025
Circuit Complexity Bounds for Visual Autoregressive Model
Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
08 Jan 2025
Paraformer: Parameterization of Sub-grid Scale Processes Using Transformers
Shuochen Wang, Nishant Yadav, A. Ganguly
AI4CE
21 Dec 2024
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
Pengxiang Li, Lu Yin, Shiwei Liu
18 Dec 2024
Can Reasons Help Improve Pedestrian Intent Estimation? A Cross-Modal Approach
Vaishnavi Khindkar, V. Balasubramanian, Chetan Arora, A. Subramanian, C. V. Jawahar
20 Nov 2024
Training Neural Networks as Recognizers of Formal Languages
Alexandra Butoi, Ghazal Khalighinejad, Anej Svete, Josef Valvoda, Ryan Cotterell, Brian DuSell
NAI
11 Nov 2024
Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning
Yangqiu Song, Tong Zheng, Ran Wang, Jiahao Liu, Qingyan Guo, ..., Xu Tan, Tong Xiao, Jingbo Zhu, Jiadong Wang, Xunliang Cai
05 Nov 2024
ResiDual Transformer Alignment with Spectral Decomposition
Lorenzo Basile, Valentino Maiorca, Luca Bortolussi, Emanuele Rodolà, Francesco Locatello
31 Oct 2024
Scalable Message Passing Neural Networks: No Need for Attention in Large Graph Representation Learning
Haitz Sáez de Ocáriz Borde, Artem Lukoianov, Anastasis Kratsios, Michael M. Bronstein, Xiaowen Dong
GNN
29 Oct 2024
A Temporal Linear Network for Time Series Forecasting
Remi Genet, Hugo Inzirillo
AI4TS
28 Oct 2024
PESFormer: Boosting Macro- and Micro-expression Spotting with Direct Timestamp Encoding
Wang-Wang Yu, Kai-Fu Yang, Xiangrui Hu, Jingwen Jiang, Hong-Mei Yan, Yong-Jie Li
24 Oct 2024
Is Smoothness the Key to Robustness? A Comparison of Attention and Convolution Models Using a Novel Metric
Baiyuan Chen
MLT
23 Oct 2024
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec, Felix Dangel, Sidak Pal Singh
14 Oct 2024
Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning
Siyuan Li, Juanxi Tian, Zedong Wang, Luyuan Zhang, Zicheng Liu, Weiyang Jin, Yang Liu, Baigui Sun, Stan Z. Li
08 Oct 2024
Multilingual Transfer and Domain Adaptation for Low-Resource Languages of Spain
Yuanchang Luo, Zhanglin Wu, Daimeng Wei, Hengchao Shang, Zongyao Li, ..., Shaojun Li, Jinlong Yang, Yuhao Xie, Jiawei Zheng, Bin Wei, Hao Yang
24 Sep 2024
HW-TSC's Submission to the CCMT 2024 Machine Translation Tasks
Zhanglin Wu, Yuanchang Luo, Daimeng Wei, Jiawei Zheng, Bin Wei, ..., Jiaxin Guo, Shaojun Li, Mengli Zhu, Ning Xie, Hao Yang
23 Sep 2024
Choose the Final Translation from NMT and LLM hypotheses Using MBR Decoding: HW-TSC's Submission to the WMT24 General MT Shared Task
Zhanglin Wu, Daimeng Wei, Zongyao Li, Hengchao Shang, Jiaxin Guo, Shaojun Li, Zhiqiang Rao, Yuanchang Luo, Ning Xie, Hao Yang
23 Sep 2024
Deep Transfer Learning for Breast Cancer Classification
Prudence Djagba, J. K. Buwa Mbouobda
05 Sep 2024
PenHeal: A Two-Stage LLM Framework for Automated Pentesting and Optimal Remediation
Junjie Huang, Quanyan Zhu
25 Jul 2024
Trans2Unet: Neural fusion for Nuclei Semantic Segmentation
Dinh-Phu Tran, Quoc-Anh Nguyen, Van-Truong Pham, Thi-Thao Tran
ViT, MedIm
24 Jul 2024
Progressive Pretext Task Learning for Human Trajectory Prediction
Xiaotong Lin, Tianming Liang, Jian-Huang Lai, Jian-Fang Hu
16 Jul 2024
LayerShuffle: Enhancing Robustness in Vision Transformers by Randomizing Layer Execution Order
Matthias Anton Freiberger, Peter Kun, A. Løvlie, Sebastian Risi
05 Jul 2024
Translatotron-V(ison): An End-to-End Model for In-Image Machine Translation
Zhibin Lan, Liqiang Niu, Fandong Meng, Jie Zhou, Min Zhang, Jinsong Su
VLM
03 Jul 2024
Semantically Guided Representation Learning For Action Anticipation
Anxhelo Diko, D. Avola, Bardh Prenkaj, Federico Fontana, Luigi Cinque
AI4TS
02 Jul 2024
Automata Extraction from Transformers
Yihao Zhang, Zeming Wei, Meng Sun
AI4CE
08 Jun 2024
On Limitation of Transformer for Learning HMMs
Jiachen Hu, Qinghua Liu, Chi Jin
06 Jun 2024
Amalgam: A Framework for Obfuscated Neural Network Training on the Cloud
Sifat Ut Taki, Spyridon Mastorakis
FedML
02 Jun 2024
UnitNorm: Rethinking Normalization for Transformers in Time Series
Nan Huang, C. Kümmerle, Xiang Zhang
AI4TS
24 May 2024