Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation

9 March 2025
Yingfeng Luo
Tong Zheng
Yongyu Mu
Bei Li
Qinghong Zhang
Yongqi Gao
Ziqiang Xu
Peinan Feng
Xiaoqian Liu
Tong Xiao
Jingbo Zhu
Abstract

The field of neural machine translation (NMT) has changed with the advent of large language models (LLMs). Much of the recent emphasis in natural language processing (NLP) has been on modeling machine translation and many other problems with a single pre-trained Transformer decoder, while encoder-decoder architectures, the standard in earlier NMT models, have received relatively little attention. In this paper, we explore translation models that are universal, efficient, and easy to optimize by marrying the world of LLMs with the world of NMT. We apply LLMs to NMT encoding and leave the NMT decoder unchanged. We also develop methods for adapting LLMs to work better with the NMT decoder. Furthermore, we construct a new dataset involving multiple tasks to assess how well the machine translation system generalizes across various tasks. Evaluations on the WMT and our datasets show that our method matches or surpasses a range of baselines in translation quality while achieving 2.4∼6.5× inference speedups and a 75% reduction in the memory footprint of the KV cache. It also demonstrates strong generalization across a variety of translation-related tasks.
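The architecture described in the abstract, a pre-trained LLM reused as the source-side encoder with a conventional NMT decoder left unchanged, can be pictured with a minimal PyTorch sketch. This is not the authors' implementation: the class name, the bridging projection, and the frozen-LLM choice are illustrative assumptions, and the paper's specific methods for adapting the LLM to the NMT decoder are not reproduced here.

    # Minimal sketch (assumptions noted inline): a decoder-only LLM supplies
    # source-side hidden states once, and a small standard Transformer decoder
    # cross-attends to them to generate the translation autoregressively.
    import torch.nn as nn
    from transformers import AutoModel  # assumes a Hugging Face-style LLM checkpoint

    class LLMEncoderNMT(nn.Module):
        def __init__(self, llm_name: str, d_model: int = 512, n_layers: int = 6,
                     n_heads: int = 8, tgt_vocab_size: int = 32000):
            super().__init__()
            # LLM used purely as an encoder: a single forward pass over the source.
            self.llm = AutoModel.from_pretrained(llm_name)
            for p in self.llm.parameters():
                p.requires_grad = False  # freezing is an illustrative choice, not the paper's recipe
            # Bridge the LLM hidden size down to the NMT decoder width.
            self.bridge = nn.Linear(self.llm.config.hidden_size, d_model)
            # Conventional NMT decoder that cross-attends to the encoded source.
            layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
            self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)
            self.tgt_embed = nn.Embedding(tgt_vocab_size, d_model)
            self.out_proj = nn.Linear(d_model, tgt_vocab_size)

        def forward(self, src_ids, src_mask, tgt_ids):
            # Encode the source once with the LLM.
            enc = self.llm(input_ids=src_ids, attention_mask=src_mask).last_hidden_state
            memory = self.bridge(enc)
            # Causal mask over the target prefix.
            causal = nn.Transformer.generate_square_subsequent_mask(
                tgt_ids.size(1)).to(tgt_ids.device)
            dec = self.decoder(self.tgt_embed(tgt_ids), memory, tgt_mask=causal,
                               memory_key_padding_mask=(src_mask == 0))
            return self.out_proj(dec)  # logits over the target vocabulary

Under this reading, only the lightweight decoder runs step by step at inference time, so the per-step compute and the KV cache scale with the small decoder rather than the full LLM, which is consistent with the speedups and memory savings reported in the abstract.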

@article{luo2025_2503.06594,
  title={Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation},
  author={Yingfeng Luo and Tong Zheng and Yongyu Mu and Bei Li and Qinghong Zhang and Yongqi Gao and Ziqiang Xu and Peinan Feng and Xiaoqian Liu and Tong Xiao and Jingbo Zhu},
  journal={arXiv preprint arXiv:2503.06594},
  year={2025}
}