1804.10959
Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates
29 April 2018
Taku Kudo
Papers citing
"Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates"
50 / 619 papers shown
Title
Mukayese: Turkish NLP Strikes Back
Ali Safaya
Emirhan Kurtuluş
Arda Göktoğan
Deniz Yuret
28
22
0
02 Mar 2022
Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale
Laurent Sartran
Samuel Barrett
A. Kuncoro
Miloš Stanojević
Phil Blunsom
Chris Dyer
50
49
0
01 Mar 2022
LCP-dropout: Compression-based Multiple Subword Segmentation for Neural Machine Translation
Keita Nonaka
Kazutaka Yamanouchi
Tomohiro I
Tsuyoshi Okita
Kazutaka Shimada
Hiroshi Sakamoto
24
8
0
28 Feb 2022
Morphology Without Borders: Clause-Level Morphology
Omer Goldman
Reut Tsarfaty
AILaw
49
3
0
25 Feb 2022
Screening Gender Transfer in Neural Machine Translation
Guillaume Wisniewski
Lichao Zhu
Nicolas Ballier
François Yvon
6
4
0
25 Feb 2022
Refining the state-of-the-art in Machine Translation, optimizing NMT for the JA <-> EN language pair by leveraging personal domain expertise
Matthew Bieda
24
1
0
23 Feb 2022
Evaluating Persian Tokenizers
Danial Kamali
Behrooz Janfada
Mohammad Ebrahim Shenasa
B. Minaei-Bidgoli
16
1
0
22 Feb 2022
Korean Tokenization for Beam Search Rescoring in Speech Recognition
Kyuhong Shim
Hyewon Bae
Wonyong Sung
24
0
0
22 Feb 2022
Non-Autoregressive ASR with Self-Conditioned Folded Encoders
Tatsuya Komatsu
28
7
0
17 Feb 2022
USTED: Improving ASR with a Unified Speech and Text Encoder-Decoder
Bolaji Yusuf
Ankur Gandhe
Alex Sokolov
40
8
0
12 Feb 2022
Neural-FST Class Language Model for End-to-End Speech Recognition
A. Bruguier
Duc Le
Rohit Prabhavalkar
Dangna Li
Zhe Liu
Bo Wang
Eun Chang
Fuchun Peng
Ozlem Kalinli
M. Seltzer
20
6
0
28 Jan 2022
Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset
Tiezheng Yu
Rita Frieske
Peng Xu
Samuel Cahyawijaya
Cheuk Tung Shadow Yiu
...
Elham J. Barezi
Qifeng Chen
Xiaojuan Ma
Bertram E. Shi
Pascale Fung
RALM
47
9
0
07 Jan 2022
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction
Bowen Shi
Wei-Ning Hsu
Kushal Lakhotia
Abdel-rahman Mohamed
SSL
60
306
0
05 Jan 2022
Fine-Tuning Transformers: Vocabulary Transfer
Vladislav D. Mosin
Igor Samenko
Alexey Tikhonov
Borislav M. Kozlovskii
Ivan P. Yamshchikov
25
19
0
29 Dec 2021
LaTr: Layout-Aware Transformer for Scene-Text VQA
Ali Furkan Biten
Ron Litman
Yusheng Xie
Srikar Appalaraju
R. Manmatha
ViT
39
100
0
23 Dec 2021
Few-shot Learning with Multilingual Language Models
Xi Lin
Todor Mihaylov
Mikel Artetxe
Tianlu Wang
Shuohui Chen
...
Luke Zettlemoyer
Zornitsa Kozareva
Mona T. Diab
Ves Stoyanov
Xian Li
BDL
ELM
LRM
64
287
0
20 Dec 2021
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
Sabrina J. Mielke
Zaid Alyafeai
Elizabeth Salesky
Colin Raffel
Manan Dey
...
Arun Raja
Chenglei Si
Wilson Y. Lee
Benoît Sagot
Samson Tan
34
143
0
20 Dec 2021
Textless Speech-to-Speech Translation on Real Data
Ann Lee
Hongyu Gong
Paul-Ambroise Duquenne
Holger Schwenk
Peng-Jen Chen
...
Sravya Popuri
Yossi Adi
J. Pino
Jiatao Gu
Wei-Ning Hsu
31
143
0
15 Dec 2021
Improving Both Domain Robustness and Domain Adaptability in Machine Translation
Wen Lai
Jindřich Libovický
Alexander Fraser
AI4CE
37
14
0
15 Dec 2021
Attentive Contextual Carryover for Multi-Turn End-to-End Spoken Language Understanding
Kai Wei
Thanh-Binh Tran
Feng-Ju Chang
Kanthashree Mysore Sathyendra
Thejaswi Muniyappa
...
A. Raju
Ross McGowan
Nathan Susanj
Ariya Rastrow
Grant P. Strimel
12
10
0
13 Dec 2021
AtteSTNet -- An attention and subword tokenization based approach for code-switched text hate speech detection
Geet Shingi
Vedangi Wagh
Kishor Wagh
Sharmila Wagh
19
0
0
10 Dec 2021
DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
Pengcheng He
Jianfeng Gao
Weizhu Chen
74
1,126
0
18 Nov 2021
Character-level HyperNetworks for Hate Speech Detection
Tomer Wullach
A. Adler
Einat Minkov
24
12
0
11 Nov 2021
Context-Aware Transformer Transducer for Speech Recognition
Feng-Ju Chang
Jing Liu
Martin H. Radfar
Athanasios Mouchtaris
M. Omologo
Ariya Rastrow
Siegfried Kunzmann
21
79
0
05 Nov 2021
Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios?
Arij Riabi
Benoît Sagot
Djamé Seddah
31
15
0
26 Oct 2021
Optimizing Alignment of Speech and Language Latent Spaces for End-to-End Speech Recognition and Understanding
Wei Wang
Shuo Ren
Yao Qian
Shujie Liu
Yu Shi
Y. Qian
Michael Zeng
40
17
0
23 Oct 2021
Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural Machine Translation
Guanhua Chen
Shuming Ma
Yun-Nung Chen
Dongdong Zhang
Jia Pan
Wenping Wang
Furu Wei
LRM
31
14
0
16 Oct 2021
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
Junyi Ao
Rui Wang
Long Zhou
Chengyi Wang
Shuo Ren
...
Yu Zhang
Zhihua Wei
Yao Qian
Jinyu Li
Furu Wei
118
194
0
14 Oct 2021
Automated Essay Scoring Using Transformer Models
Sabrina Ludwig
Christian W. F. Mayer
Christopher Hansen
Kerstin Eilers
Steffen Brandt
19
39
0
13 Oct 2021
Decision Attentive Regularization to Improve Simultaneous Speech Translation Systems
Mohd Abbas Zaidi
Beomseok Lee
Sangha Kim
Chanwoo Kim
30
5
0
13 Oct 2021
Balancing Average and Worst-case Accuracy in Multitask Learning
Paul Michel
Sebastian Ruder
Dani Yogatama
21
11
0
12 Oct 2021
A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation
Yosuke Higuchi
Nanxin Chen
Yuya Fujita
Hirofumi Inaguma
Tatsuya Komatsu
Jaesong Lee
Jumon Nozaki
Tianzi Wang
Shinji Watanabe
38
41
0
11 Oct 2021
Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy
Yosuke Higuchi
Niko Moritz
Jonathan Le Roux
Takaaki Hori
19
11
0
11 Oct 2021
Have best of both worlds: two-pass hybrid and E2E cascading framework for speech recognition
Guoli Ye
V. Mazalov
Jinyu Li
Jiawei Liu
25
9
0
10 Oct 2021
Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units
Yosuke Higuchi
Keita Karube
Tetsuji Ogawa
Tetsunori Kobayashi
18
23
0
08 Oct 2021
Low Frequency Names Exhibit Bias and Overfitting in Contextualizing Language Models
Robert Wolfe
Aylin Caliskan
95
51
0
01 Oct 2021
BERTweetFR : Domain Adaptation of Pre-Trained Language Models for French Tweets
Yanzhu Guo
Virgile Rennard
Christos Xypolopoulos
Michalis Vazirgiannis
VLM
AI4CE
42
19
0
21 Sep 2021
Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training
Bo Zheng
Li Dong
Shaohan Huang
Saksham Singhal
Wanxiang Che
Ting Liu
Xia Song
Furu Wei
VLM
21
22
0
15 Sep 2021
Wine is Not v i n. -- On the Compatibility of Tokenizations Across Languages
Antonis Maronikolakis
Philipp Dufter
Hinrich Schütze
26
17
0
13 Sep 2021
Integrating Approaches to Word Representation
Yuval Pinter
NAI
50
5
0
10 Sep 2021
Speechformer: Reducing Information Loss in Direct Speech Translation
Sara Papi
Marco Gaido
Matteo Negri
Marco Turchi
67
23
0
09 Sep 2021
Subword Mapping and Anchoring across Languages
Giorgos Vernikos
Andrei Popescu-Belis
70
12
0
09 Sep 2021
Generalised Unsupervised Domain Adaptation of Neural Machine Translation with Cross-Lingual Data Selection
Thuy-Trang Vu
Xuanli He
D.Q. Phung
Gholamreza Haffari
45
10
0
09 Sep 2021
ARMAN: Pre-training with Semantically Selecting and Reordering of Sentences for Persian Abstractive Summarization
Alireza Salemi
Emad Kebriaei
Ghazal Neisi Minaei
A. Shakery
CVBM
23
5
0
09 Sep 2021
Biomedical and Clinical Language Models for Spanish: On the Benefits of Domain-Specific Pretraining in a Mid-Resource Scenario
C. Carrino
Jordi Armengol-Estapé
Asier Gutiérrez-Fandiño
Joan Llop-Palao
Marc Pàmies
Aitor Gonzalez-Agirre
Marta Villegas
18
44
0
08 Sep 2021
IndicBART: A Pre-trained Model for Indic Natural Language Generation
Raj Dabre
Himani Shrotriya
Anoop Kunchukuttan
Ratish Puduppully
Mitesh M. Khapra
Pratyush Kumar
52
70
0
07 Sep 2021
You should evaluate your language model on marginal likelihood over tokenisations
Kris Cao
Laura Rimell
39
23
0
06 Sep 2021
How Suitable Are Subword Segmentation Strategies for Translating Non-Concatenative Morphology?
Chantal Amrhein
Rico Sennrich
32
13
0
02 Sep 2021
Survey of Low-Resource Machine Translation
Barry Haddow
Rachel Bawden
Antonio Valerio Miceli Barone
Jindřich Helcl
Alexandra Birch
AIMat
49
150
0
01 Sep 2021
AraT5: Text-to-Text Transformers for Arabic Language Generation
El Moatez Billah Nagoudi
AbdelRahim Elmadany
Muhammad Abdul-Mageed
92
118
0
31 Aug 2021