Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates

29 April 2018
Taku Kudo

Papers citing "Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates"

50 / 617 papers shown
Title
An Analysis of BPE Vocabulary Trimming in Neural Machine Translation
Marco Cognetta
Tatsuya Hiraoka
Naoaki Okazaki
Rico Sennrich
Yuval Pinter
29
2
0
30 Mar 2024
A Systematic Analysis of Subwords and Cross-Lingual Transfer in Multilingual Translation
Francois Meyer
Jan Buys
39
1
0
29 Mar 2024
AlloyBERT: Alloy Property Prediction with Large Language Models
Akshat Chaudhari
Chakradhar Guntuboina
Hongshuo Huang
A. Farimani
37
4
0
28 Mar 2024
Homogeneous Tokenizer Matters: Homogeneous Visual Tokenizer for Remote Sensing Image Understanding
Run Shao
Zhaoyang Zhang
Chao Tao
Yunsheng Zhang
Chengli Peng
Haifeng Li
VLM
43
5
0
27 Mar 2024
Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction
Inhwan Bae
Junoh Lee
Hae-Gon Jeon
36
15
0
27 Mar 2024
Provably Secure Disambiguating Neural Linguistic Steganography
Yuang Qi
Kejiang Chen
Kai Zeng
Weiming Zhang
Neng H. Yu
21
2
0
26 Mar 2024
Cross-lingual Contextualized Phrase Retrieval
Huayang Li
Deng Cai
Zhi Qu
Qu Cui
Hidetaka Kamigaito
Lemao Liu
Taro Watanabe
34
0
0
25 Mar 2024
Synthetic Data Generation and Joint Learning for Robust Code-Mixed Translation
Kamal Kumar
Yinhan Liu
Parth Patwa
Tanmoy
Mihir Adam Roberts
27
1
0
25 Mar 2024
More than Just Statistical Recurrence: Human and Machine Unsupervised Learning of Māori Word Segmentation across Morphological Processes
A. Varatharaj
Simon Todd
14
0
0
21 Mar 2024
Exploring Tokenization Strategies and Vocabulary Sizes for Enhanced Arabic Language Models
M. Alrefaie
Nour Eldin Morsy
Nada Samir
25
6
0
17 Mar 2024
Using Contextual Information for Sentence-level Morpheme Segmentation
Prabin Bhandari
Abhishek Paudel
16
1
0
15 Mar 2024
Token Alignment via Character Matching for Subword Completion
Ben Athiwaratkun
Shiqi Wang
Mingyue Shang
Yuchen Tian
Zijian Wang
Sujan Kumar Gonugondla
Sanjay Krishna Gouda
Rob Kwiatowski
Ramesh Nallapati
Bing Xiang
50
4
0
13 Mar 2024
Triples-to-isiXhosa (T2X): Addressing the Challenges of Low-Resource Agglutinative Data-to-Text Generation
Francois Meyer
Jan Buys
29
2
0
12 Mar 2024
MAMMOTH: Massively Multilingual Modular Open Translation @ Helsinki
Timothee Mickus
Stig-Arne Gronroos
Joseph Attieh
M. Boggia
Ona de Gibert
Shaoxiong Ji
Niki Andreas Lopi
Alessandro Raganato
Raúl Vázquez
Jörg Tiedemann
20
4
0
12 Mar 2024
Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance
Omer Goldman
Avi Caciularu
Matan Eyal
Kris Cao
Idan Szpektor
Reut Tsarfaty
51
22
0
10 Mar 2024
Authorship Attribution in Bangla Literature (AABL) via Transfer Learning using ULMFiT
Aisha Khatun
Anisur Rahman
Md. Saiful Islam
Hemayet Ahmed Chowdhury
A. Tasnim
31
2
0
08 Mar 2024
Did Translation Models Get More Robust Without Anyone Even Noticing?
Ben Peters
André F. T. Martins
39
3
0
06 Mar 2024
A Generative Approach for Wikipedia-Scale Visual Entity Recognition
Mathilde Caron
Ahmet Iscen
Alireza Fathi
Cordelia Schmid
40
5
0
04 Mar 2024
Transformers for Low-Resource Languages: Is Féidir Linn!
Séamus Lankford
H. Alfi
Tamás Sarlós
40
17
0
04 Mar 2024
Language and Speech Technology for Central Kurdish Varieties
Sina Ahmadi
Daban Q. Jaff
Md Mahfuz Ibn Alam
Antonios Anastasopoulos
39
2
0
04 Mar 2024
adaptNMT: an open-source, language-agnostic development environment for Neural Machine Translation
Séamus Lankford
Haithem Afli
Andy Way
34
3
0
04 Mar 2024
Human Evaluation of English–Irish Transformer-Based NMT
Séamus Lankford
Haithem Afli
Andy Way
42
10
0
04 Mar 2024
VBART: The Turkish LLM
Meliksah Turker
Mehmet Erdi Ari
Aydin Han
VLM
36
4
0
02 Mar 2024
Greed is All You Need: An Evaluation of Tokenizer Inference Methods
Omri Uzan
Craig W. Schmidt
Chris Tanner
Yuval Pinter
43
14
0
02 Mar 2024
Rethinking Tokenization: Crafting Better Tokenizers for Large Language Models
Jinbiao Yang
LLMAG
105
11
0
01 Mar 2024
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
Frederik Kunstner
Robin Yadav
Alan Milligan
Mark Schmidt
Alberto Bietti
39
25
0
29 Feb 2024
Beyond Language Models: Byte Models are Digital World Simulators
Shangda Wu
Xu Tan
Zili Wang
Rui Wang
Xiaobing Li
Maosong Sun
35
12
0
29 Feb 2024
CEBin: A Cost-Effective Framework for Large-Scale Binary Code Similarity Detection
Hao Wang
Zeyu Gao
Chao Zhang
Mingyang Sun
Yuchen Zhou
Han Qiu
Xiangwei Xiao
39
9
0
29 Feb 2024
Tokenization Is More Than Compression
Craig W. Schmidt
Varshini Reddy
Haoran Zhang
Alec Alameddine
Omri Uzan
Yuval Pinter
Chris Tanner
61
28
0
28 Feb 2024
Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval: a Survey
Dinh-Viet-Toan Le
Louis Bigo
Mikaela Keller
Dorien Herremans
MedIm
32
9
0
27 Feb 2024
CLAP: Learning Transferable Binary Code Representations with Natural Language Supervision
Hao Wang
Zeyu Gao
Chao Zhang
Zihan Sha
Mingyang Sun
Yuchen Zhou
Wenyu Zhu
Wenju Sun
Han Qiu
Xiangwei Xiao
38
17
0
26 Feb 2024
How Important Is Tokenization in French Medical Masked Language Models?
Yanis Labrak
Adrien Bazoge
B. Daille
Mickael Rouvier
Richard Dufour
41
1
0
22 Feb 2024
Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs
Aaditya K. Singh
DJ Strouse
43
46
0
22 Feb 2024
The Impact of Word Splitting on the Semantic Content of Contextualized Word Representations
Aina Garí Soler
Matthieu Labeau
Chloé Clavel
VLM
42
2
0
22 Feb 2024
Two Counterexamples to Tokenization and the Noiseless Channel
Marco Cognetta
Vilém Zouhar
Sangwhan Moon
Naoaki Okazaki
27
0
0
22 Feb 2024
Subobject-level Image Tokenization
Delong Chen
Samuel Cahyawijaya
Jianfeng Liu
Baoyuan Wang
Pascale Fung
VLM
OCL
54
7
0
22 Feb 2024
Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand for Multilingual Instructions?
Alexander Arno Weber
Klaudia Thellmann
Jan Ebert
Nicolas Flores-Herr
Jens Lehmann
Michael Fromm
Mehdi Ali
38
4
0
21 Feb 2024
An Empirical Study on Cross-lingual Vocabulary Adaptation for Efficient Language Model Inference
Atsuki Yamaguchi
Aline Villavicencio
Nikolaos Aletras
27
7
0
16 Feb 2024
PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control
Ruijie Zheng
Ching-An Cheng
Hal Daumé
Furong Huang
Andrey Kolobov
33
9
0
16 Feb 2024
Getting the most out of your tokenizer for pre-training and domain adaptation
Gautier Dagan
Gabriele Synnaeve
Baptiste Rozière
34
20
0
01 Feb 2024
CroissantLLM: A Truly Bilingual French-English Language Model
Manuel Faysse
Patrick Fernandes
Nuno M. Guerreiro
António Loison
Duarte M. Alves
...
François Yvon
André F.T. Martins
Gautier Viaud
Céline Hudelot
Pierre Colombo
55
32
0
01 Feb 2024
Byte Pair Encoding Is All You Need For Automatic Bengali Speech Recognition
Ahnaf Mozib Samin
20
0
0
28 Jan 2024
Importance-Aware Data Augmentation for Document-Level Neural Machine Translation
Ming-Ru Wu
Yufei Wang
George F. Foster
Lizhen Qu
Gholamreza Haffari
43
6
0
27 Jan 2024
TURNA: A Turkish Encoder-Decoder Language Model for Enhanced Understanding and Generation
Gökçe Uludoğan
Zeynep Yirmibeşoğlu Balal
Furkan Akkurt
Melikşah Turker
Onur Gungor
S. Uskudarli
39
12
0
25 Jan 2024
Revisiting the Optimality of Word Lengths
Tiago Pimentel
Clara Meister
Ethan Gotlieb Wilcox
Kyle Mahowald
Ryan Cotterell
35
7
0
06 Dec 2023
On Significance of Subword tokenization for Low Resource and Efficient Named Entity Recognition: A case study in Marathi
Harsh Chaudhari
A. Patil
Dhanashree Lavekar
Pranav Khairnar
Raviraj Joshi
Sachin Pande
44
0
0
03 Dec 2023
ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model
Fukun Yin
Xin Chen
C. Zhang
Biao Jiang
Zibo Zhao
Jiayuan Fan
Gang Yu
Taihao Li
Tao Chen
32
20
0
29 Nov 2023
Improving Word Sense Disambiguation in Neural Machine Translation with Salient Document Context
Elijah Matthew Rippeth
Marine Carpuat
Kevin Duh
Matt Post
18
0
0
27 Nov 2023
PhayaThaiBERT: Enhancing a Pretrained Thai Language Model with Unassimilated Loanwords
Panyut Sriwirote
Jalinee Thapiang
Vasan Timtong
Attapol T. Rutherford
16
5
0
21 Nov 2023
Multi-teacher Distillation for Multilingual Spelling Correction
Jingfen Zhang
Xuan Guo
S. Bodapati
Christopher Potts
KELM
27
3
0
20 Nov 2023