1804.10959
Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates
29 April 2018
Taku Kudo
Papers citing
"Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates"
50 / 628 papers shown
Construction Grammar and Language Models
Harish Tayyar Madabushi
Laurence Romain
P. Milin
Dagmar Divjak
128
5
0
25 Aug 2023
Utilizing Semantic Textual Similarity for Clinical Survey Data Feature Selection
Benjamin C. Warner
Ziqi Xu
S. Haroutounian
Thomas Kannampallil
Chenyang Lu
87
2
0
19 Aug 2023
Reinforced Self-Training (ReST) for Language Modeling
Çağlar Gülçehre
T. Paine
S. Srinivasan
Ksenia Konyushkova
L. Weerts
...
Chenjie Gu
Wolfgang Macherey
Arnaud Doucet
Orhan Firat
Nando de Freitas
OffRL
129
309
0
17 Aug 2023
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
Jeong Hun Yeo
Minsu Kim
J. Choi
Dae Hoe Kim
Y. Ro
44
19
0
15 Aug 2023
SOTASTREAM: A Streaming Approach to Machine Translation Training
Matt Post
Thamme Gowda
Roman Grundkiewicz
Huda Khayrallah
Rohit Jain
Marcin Junczys-Dowmunt
60
5
0
14 Aug 2023
N-gram Boosting: Improving Contextual Biasing with Normalized N-gram Targets
Wang Yau Li
Shreekantha Nadig
K. Chang
Zafarullah Mahmood
Riqiang Wang
Simon Vandieken
Jonas Robertson
Frederic Mailhot
77
0
0
04 Aug 2023
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation
Minsu Kim
J. Choi
Dahun Kim
Y. Ro
96
10
0
03 Aug 2023
CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code
Nadezhda Chirkova
Sergey Troshin
95
8
0
01 Aug 2023
SelfSeg: A Self-supervised Sub-word Segmentation Method for Neural Machine Translation
Haiyue Song
Raj Dabre
Chenhui Chu
Sadao Kurohashi
Eiichiro Sumita
43
3
0
31 Jul 2023
Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models
Michael Gunther
Louis Milliken
Jonathan Geuter
Georgios Mastrapas
Bo Wang
Han Xiao
RALM
112
32
0
20 Jul 2023
MorphPiece: A Linguistic Tokenizer for Large Language Models
Jeffrey Hsu
61
4
0
14 Jul 2023
A Comprehensive Overview of Large Language Models
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Ajmal Mian
OffRL
261
628
0
12 Jul 2023
Testing the Predictions of Surprisal Theory in 11 Languages
Ethan Gotlieb Wilcox
Tiago Pimentel
Clara Meister
Ryan Cotterell
R. Levy
LRM
165
70
0
07 Jul 2023
Knowledge-Aware Audio-Grounded Generative Slot Filling for Limited Annotated Data
Guangzhi Sun
Chuxu Zhang
Ivan Vulić
Paweł Budzianowski
P. Woodland
80
6
0
04 Jul 2023
Should you marginalize over possible tokenizations?
Nadezhda Chirkova
Germán Kruszewski
Jos Rozen
Marc Dymetman
94
12
0
30 Jun 2023
Tokenization and the Noiseless Channel
Vilém Zouhar
Clara Meister
Juan Luis Gastaldi
Li Du
Mrinmaya Sachan
Ryan Cotterell
86
38
0
29 Jun 2023
Federated Self-Learning with Weak Supervision for Speech Recognition
Milind Rao
Gopinath Chennupati
Gautam Tiwari
Anit Kumar Sahu
A. Raju
Ariya Rastrow
J. Droppo
85
3
0
21 Jun 2023
MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recognition
Yuchen Hu
Chen Chen
Ruizhe Li
Heqing Zou
Chng Eng Siong
GAN
115
9
0
18 Jun 2023
Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition
Yuchen Hu
Ruizhe Li
Chen Chen
Chengwei Qin
Qiu-shi Zhu
Eng Siong Chng
120
5
0
18 Jun 2023
How do different tokenizers perform on downstream tasks in scriptio continua languages?: A case study in Japanese
T. Fujii
Koki Shibata
Atsuki Yamaguchi
Terufumi Morishita
Yasuhiro Sogawa
56
17
0
16 Jun 2023
Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition
Muhammad Umar Farooq
Thomas Hain
39
2
0
14 Jun 2023
Better Generalization with Semantic IDs: A Case Study in Ranking for Recommendations
Anima Singh
Trung Vu
Nikhil Mehta
Raghunandan H. Keshavan
M. Sathiamoorthy
...
Lukasz Heldt
Li Wei
Devansh Tandon
Ed H. Chi
Xinyang Yi
84
24
0
13 Jun 2023
AutoML in the Age of Large Language Models: Current Challenges, Future Opportunities and Risks
Alexander Tornede
Difan Deng
Theresa Eimer
Joseph Giovanelli
Aditya Mohan
...
Sarah Segel
Daphne Theodorakopoulos
Tanja Tornede
Henning Wachsmuth
Marius Lindauer
119
24
0
13 Jun 2023
Improving Long Context Document-Level Machine Translation
Christian Herold
Hermann Ney
53
11
0
08 Jun 2023
On Search Strategies for Document-Level Neural Machine Translation
Christian Herold
Hermann Ney
53
1
0
08 Jun 2023
Improving Language Model Integration for Neural Machine Translation
Christian Herold
Yingbo Gao
Mohammad Zeineldeen
Hermann Ney
69
2
0
08 Jun 2023
Assessing the Importance of Frequency versus Compositionality for Subword-based Tokenization in NMT
Benoist Wolleb
Romain Silvestri
Giorgos Vernikos
Ljiljana Dolamic
Andrei Popescu-Belis
95
4
0
02 Jun 2023
Strategies for improving low resource speech to text translation relying on pre-trained ASR models
Santosh Kesiraju
Marek Sarvaš
T. Pavlíček
Cécile Macaire
Alejandro Ciuba
68
7
0
31 May 2023
Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning
Xuankai Chang
Brian Yan
Yuya Fujita
Takashi Maekaku
Shinji Watanabe
79
40
0
29 May 2023
Byte-Level Grammatical Error Correction Using Synthetic and Curated Corpora
Svanhvít Lilja Ingólfsdóttir
Pétur Orri Ragnarsson
H. Jónsson
Haukur Barri Símonarson
Vilhjálmur Þorsteinsson
Vésteinn Snæbjarnarson
SyDa
80
9
0
29 May 2023
An Open-Source Gloss-Based Baseline for Spoken to Signed Language Translation
Amit Moryossef
Mathias Müller
Anne Gohring
Zifan Jiang
Yoav Goldberg
Sarah Ebling
SLR
69
12
0
28 May 2023
From Characters to Words: Hierarchical Pre-trained Language Model for Open-vocabulary Language Understanding
Li Sun
F. Luisier
Kayhan Batmanghelich
D. Florêncio
Changrong Zhang
VLM
42
6
0
23 May 2023
Multilingual Pixel Representations for Translation and Effective Cross-lingual Transfer
Elizabeth Salesky
Neha Verma
Philipp Koehn
Matt Post
93
16
0
23 May 2023
CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models
Benjamin Minixhofer
Jonas Pfeiffer
Ivan Vulić
77
7
0
23 May 2023
BM25 Query Augmentation Learned End-to-End
Xiaoyin Chen
Sam Wiseman
57
1
0
23 May 2023
Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models
Orevaoghene Ahia
Sachin Kumar
Hila Gonen
Jungo Kasai
David R. Mortensen
Noah A. Smith
Yulia Tsvetkov
122
98
0
23 May 2023
Machine Translation by Projecting Text into the Same Phonetic-Orthographic Space Using a Common Encoding
Amit Kumar
Shantipriya Parida
A. Pratap
Anil Kumar Singh
78
2
0
21 May 2023
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages
Ayyoob Imani
Peiqin Lin
Amir Hossein Kargaran
Silvia Severini
Masoud Jalili Sabet
...
Chunlan Ma
Helmut Schmid
André F. T. Martins
François Yvon
Hinrich Schütze
ALM
LRM
136
107
0
20 May 2023
Pseudo-Label Training and Model Inertia in Neural Machine Translation
B. Hsu
Anna Currey
Xing Niu
Maria Nădejde
Georgiana Dinu
ODL
89
2
0
19 May 2023
Accelerating Transformer Inference for Translation via Parallel Decoding
Andrea Santilli
Silvio Severino
Emilian Postolache
Valentino Maiorca
Michele Mancusi
R. Marin
Emanuele Rodolà
130
90
0
17 May 2023
Language Model Tokenizers Introduce Unfairness Between Languages
Aleksandar Petrov
Emanuele La Malfa
Philip Torr
Adel Bibi
126
113
0
17 May 2023
Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition
Yuchen Hu
Ruizhe Li
Chen Chen
Heqing Zou
Qiu-shi Zhu
Eng Siong Chng
94
8
0
16 May 2023
Subword Segmental Machine Translation: Unifying Segmentation and Target Sentence Generation
Francois Meyer
Jan Buys
89
8
0
11 May 2023
Effects of sub-word segmentation on performance of transformer language models
Jue Hou
Anisia Katinskaia
Anh Vu
R. Yangarber
72
5
0
09 May 2023
Robust Acoustic and Semantic Contextual Biasing in Neural Transducers for Speech Recognition
Xuandi Fu
Kanthashree Mysore Sathyendra
Ankur Gandhe
Jing Liu
Grant P. Strimel
Ross McGowan
Athanasios Mouchtaris
100
16
0
09 May 2023
Target-Side Augmentation for Document-Level Machine Translation
Guangsheng Bao
Zhiyang Teng
Yue Zhang
88
10
0
08 May 2023
What changes when you randomly choose BPE merge operations? Not much
Jonne Saleva
Constantine Lignos
53
7
0
04 May 2023
Training and Evaluation of a Multilingual Tokenizer for GPT-SW3
Felix Stollenwerk
83
8
0
28 Apr 2023
Semantic Tokenizer for Enhanced Natural Language Processing
Sandeep Mehta
Darpan Shah
Ravindra Kulkarni
Cornelia Caragea
VLM
26
3
0
24 Apr 2023
Downstream Task-Oriented Neural Tokenizer Optimization with Vocabulary Restriction as Post Processing
Tatsuya Hiraoka
Tomoya Iwakura
66
0
0
21 Apr 2023