A Call for Prudent Choice of Subword Merge Operations in Neural Machine Translation

24 May 2019

Papers citing "A Call for Prudent Choice of Subword Merge Operations in Neural Machine Translation"

Self-Vocabularizing Training for Neural Machine Translation Pin-Jie Lin Ernie Chang Yangyang Shi Vikas Chandra 71 0 0 18 Mar 2025
Human Evaluation of English–Irish Transformer-Based NMT Séamus Lankford Haithem Afli Andy Way 45 10 0 04 Mar 2024
Stolen Subwords: Importance of Vocabularies for Machine Translation Model Stealing Vilém Zouhar AAML 40 0 0 29 Jan 2024
CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code Nadezhda Chirkova Sergey Troshin 21 8 0 01 Aug 2023
MobileNMT: Enabling Translation in 15MB and 30ms Ye Lin Xiaohui Wang Zhexi Zhang Mingxuan Wang Tong Xiao Jingbo Zhu MQ 38 1 0 07 Jun 2023
Considerations for meaningful sign language machine translation based on glosses Mathias Müller Zifan Jiang Amit Moryossef Annette Rios Gonzales Sarah Ebling SLR 30 38 0 28 Nov 2022
The University of Edinburgh's Submission to the WMT22 Code-Mixing Shared Task (MixMT) Faheem Kirefu Vivek Iyer Pinzhen Chen Laurie Burchell MoE 26 1 0 20 Oct 2022
How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training? Shiyue Zhang Vishrav Chaudhary Naman Goyal James Cross Guillaume Wenzek Joey Tianyi Zhou Francisco Guzman 38 16 0 29 Apr 2022
Impact of Tokenization on Language Models: An Analysis for Turkish Cagri Toraman E. Yilmaz Furkan Şahinuç Oguzhan Ozcelik 38 74 0 19 Apr 2022
CipherDAug: Ciphertext based Data Augmentation for Neural Machine Translation Nishant Kambhatla Logan Born Anoop Sarkar 21 16 0 01 Apr 2022
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech Guangyan Zhang Kaitao Song Xu Tan Daxin Tan Yuzi Yan ... G. Wang Wei Zhou Tao Qin Tan Lee Sheng Zhao SSL 25 21 0 31 Mar 2022
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP Sabrina J. Mielke Zaid Alyafeai Elizabeth Salesky Colin Raffel Manan Dey ... Arun Raja Chenglei Si Wilson Y. Lee Benoît Sagot Samson Tan 32 143 0 20 Dec 2021
PhoMT: A High-Quality and Large-Scale Benchmark Dataset for Vietnamese-English Machine Translation Long Doan L. T. Nguyen Nguyen Luong Tran T. Hoang Dat Quoc Nguyen 33 22 0 23 Oct 2021
Multi-Sentence Resampling: A Simple Approach to Alleviate Dataset Length Bias and Beam-Search Degradation Ivan Provilkov A. Malinin 25 4 0 13 Sep 2021
Survey of Low-Resource Machine Translation Barry Haddow Rachel Bawden Antonio Valerio Miceli Barone Jindřich Helcl Alexandra Birch AIMat 39 150 0 01 Sep 2021
Domain Adaptation and Multi-Domain Adaptation for Neural Machine Translation: A Survey Danielle Saunders AI4CE 27 86 0 14 Apr 2021
Reducing Gender Bias in Neural Machine Translation as a Domain Adaptation Problem Danielle Saunders Bill Byrne AI4CE 27 137 0 09 Apr 2020
BPE-Dropout: Simple and Effective Subword Regularization Ivan Provilkov Dmitrii Emelianenko Elena Voita 38 276 0 29 Oct 2019
On the use of BERT for Neural Machine Translation S. Clinchant K. Jung Vassilina Nikoulina 27 89 0 27 Sep 2019
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Yonghui Wu M. Schuster Zhifeng Chen Quoc V. Le Mohammad Norouzi ... Alex Rudnick Oriol Vinyals G. Corrado Macduff Hughes J. Dean AIMat 718 6,748 0 26 Sep 2016
Effective Approaches to Attention-based Neural Machine Translation Thang Luong Hieu H. Pham Christopher D. Manning 218 7,929 0 17 Aug 2015