Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1905.10453
Cited By
A Call for Prudent Choice of Subword Merge Operations in Neural Machine Translation
24 May 2019
Shuoyang Ding
Adithya Renduchintala
Kevin Duh
Re-assign community
ArXiv
PDF
HTML
Papers citing
"A Call for Prudent Choice of Subword Merge Operations in Neural Machine Translation"
21 / 21 papers shown
Title
Self-Vocabularizing Training for Neural Machine Translation
Pin-Jie Lin
Ernie Chang
Yangyang Shi
Vikas Chandra
71
0
0
18 Mar 2025
Human Evaluation of English--Irish Transformer-Based NMT
Séamus Lankford
Haithem Afli
Andy Way
45
10
0
04 Mar 2024
Stolen Subwords: Importance of Vocabularies for Machine Translation Model Stealing
Vilém Zouhar
AAML
40
0
0
29 Jan 2024
CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code
Nadezhda Chirkova
Sergey Troshin
23
8
0
01 Aug 2023
MobileNMT: Enabling Translation in 15MB and 30ms
Ye Lin
Xiaohui Wang
Zhexi Zhang
Mingxuan Wang
Tong Xiao
Jingbo Zhu
MQ
38
1
0
07 Jun 2023
Considerations for meaningful sign language machine translation based on glosses
Mathias Müller
Zifan Jiang
Amit Moryossef
Annette Rios Gonzales
Sarah Ebling
SLR
38
38
0
28 Nov 2022
The University of Edinburgh's Submission to the WMT22 Code-Mixing Shared Task (MixMT)
Faheem Kirefu
Vivek Iyer
Pinzhen Chen
Laurie Burchell
MoE
28
1
0
20 Oct 2022
How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training?
Shiyue Zhang
Vishrav Chaudhary
Naman Goyal
James Cross
Guillaume Wenzek
Joey Tianyi Zhou
Francisco Guzman
38
16
0
29 Apr 2022
Impact of Tokenization on Language Models: An Analysis for Turkish
Cagri Toraman
E. Yilmaz
Furkan Şahinuç
Oguzhan Ozcelik
38
74
0
19 Apr 2022
CipherDAug: Ciphertext based Data Augmentation for Neural Machine Translation
Nishant Kambhatla
Logan Born
Anoop Sarkar
21
16
0
01 Apr 2022
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech
Guangyan Zhang
Kaitao Song
Xu Tan
Daxin Tan
Yuzi Yan
...
G. Wang
Wei Zhou
Tao Qin
Tan Lee
Sheng Zhao
SSL
25
21
0
31 Mar 2022
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
Sabrina J. Mielke
Zaid Alyafeai
Elizabeth Salesky
Colin Raffel
Manan Dey
...
Arun Raja
Chenglei Si
Wilson Y. Lee
Benoît Sagot
Samson Tan
34
143
0
20 Dec 2021
PhoMT: A High-Quality and Large-Scale Benchmark Dataset for Vietnamese-English Machine Translation
Long Doan
L. T. Nguyen
Nguyen Luong Tran
T. Hoang
Dat Quoc Nguyen
33
22
0
23 Oct 2021
Multi-Sentence Resampling: A Simple Approach to Alleviate Dataset Length Bias and Beam-Search Degradation
Ivan Provilkov
A. Malinin
25
4
0
13 Sep 2021
Survey of Low-Resource Machine Translation
Barry Haddow
Rachel Bawden
Antonio Valerio Miceli Barone
Jindvrich Helcl
Alexandra Birch
AIMat
45
150
0
01 Sep 2021
Domain Adaptation and Multi-Domain Adaptation for Neural Machine Translation: A Survey
Danielle Saunders
AI4CE
29
86
0
14 Apr 2021
Reducing Gender Bias in Neural Machine Translation as a Domain Adaptation Problem
Danielle Saunders
Bill Byrne
AI4CE
27
137
0
09 Apr 2020
BPE-Dropout: Simple and Effective Subword Regularization
Ivan Provilkov
Dmitrii Emelianenko
Elena Voita
38
276
0
29 Oct 2019
On the use of BERT for Neural Machine Translation
S. Clinchant
K. Jung
Vassilina Nikoulina
27
89
0
27 Sep 2019
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Zhehuai Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
718
6,750
0
26 Sep 2016
Effective Approaches to Attention-based Neural Machine Translation
Thang Luong
Hieu H. Pham
Christopher D. Manning
220
7,930
0
17 Aug 2015
1