Multilingual Non-Autoregressive Machine Translation without Knowledge Distillation

Abstract
Multilingual neural machine translation (MNMT) aims to handle multiple translation directions with a single model. Recent work applies non-autoregressive Transformers to improve the efficiency of MNMT, but these methods require an expensive knowledge distillation (KD) process. We propose M-DAT, an approach to non-autoregressive multilingual machine translation that builds on the recent directed acyclic Transformer (DAT), which does not require KD. We further propose pivot back-translation (PivotBT) to improve generalization to unseen translation directions. Experiments show that M-DAT achieves state-of-the-art performance among non-autoregressive MNMT systems.
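The abstract does not spell out how PivotBT works; one plausible reading of "pivot back-translation" is that, for an unseen direction, the target sentence is first translated into a pivot language and then back-translated into the source language, yielding a synthetic training pair. The sketch below illustrates only that general pivoting idea, assuming a hypothetical translate(text, src_lang, tgt_lang) wrapper and English as the pivot; none of these names come from the paper.

"""Hedged sketch of pivot back-translation (PivotBT) for unseen directions.

Assumptions (not from the abstract): the multilingual model is exposed as a
translate(text, src_lang, tgt_lang) callable, and English is the pivot
language. The authors' actual procedure may differ.
"""


def translate(text: str, src_lang: str, tgt_lang: str) -> str:
    """Stand-in for a trained multilingual model; replace with a real system."""
    return f"[{src_lang}->{tgt_lang}] {text}"


def pivot_back_translate(
    tgt_sentence: str, src_lang: str, tgt_lang: str, pivot_lang: str = "en"
) -> tuple[str, str]:
    """Build a synthetic pair for the unseen direction src_lang -> tgt_lang.

    The target sentence is translated into the pivot language and then into
    the source language; the synthetic source is paired with the original
    target for training.
    """
    pivot_sentence = translate(tgt_sentence, tgt_lang, pivot_lang)
    synthetic_src = translate(pivot_sentence, pivot_lang, src_lang)
    return synthetic_src, tgt_sentence


if __name__ == "__main__":
    # Example: create data for an unseen French -> German direction.
    src, tgt = pivot_back_translate("Guten Morgen!", src_lang="fr", tgt_lang="de")
    print("synthetic source:", src)
    print("target:", tgt)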
@article{huang2025_2502.04537,
  title   = {Multilingual Non-Autoregressive Machine Translation without Knowledge Distillation},
  author  = {Chenyang Huang and Fei Huang and Zaixiang Zheng and Osmar R. Zaïane and Hao Zhou and Lili Mou},
  journal = {arXiv preprint arXiv:2502.04537},
  year    = {2025}
}