arXiv:1808.06226
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
19 August 2018
Taku Kudo
John Richardson
Github (10925★)
Papers citing
"SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"
50 / 1,950 papers shown
Learning How to Translate North Korean through South Korean
Hwichan Kim
Sangwhan Moon
Naoaki Okazaki
Mamoru Komachi
62
3
0
27 Jan 2022
Tackling data scarcity in speech translation using zero-shot multilingual machine translation techniques
Tu Anh Dinh
Danni Liu
Jan Niehues
66
6
0
26 Jan 2022
A Unified Strategy for Multilingual Grammatical Error Correction with Pre-trained Cross-Lingual Language Model
Xin Sun
Tao Ge
Shuming Ma
Jingjing Li
Furu Wei
Houfeng Wang
SyDa
110
29
0
26 Jan 2022
SPIRAL: Self-supervised Perturbation-Invariant Representation Learning for Speech Pre-Training
Wenyong Huang
Zhenhe Zhang
Y. Yeung
Xin Jiang
Qun Liu
111
23
0
25 Jan 2022
LaMDA: Language Models for Dialog Applications
R. Thoppilan
Daniel De Freitas
Jamie Hall
Noam M. Shazeer
Apoorv Kulshreshtha
...
Blaise Aguera-Arcas
Claire Cui
M. Croak
Ed H. Chi
Quoc Le
ALM
152
1,606
0
20 Jan 2022
Improving Neural Machine Translation by Denoising Training
Liang Ding
Keqin Peng
Dacheng Tao
VLM
AI4CE
86
6
0
19 Jan 2022
Syntax-based data augmentation for Hungarian-English machine translation
Attila Nagy
Patrick Nanys
Balázs Frey Konrád
Bence Bial
Judit Ács
38
2
0
18 Jan 2022
The Dark Side of the Language: Pre-trained Transformers in the DarkNet
Leonardo Ranaldi
Aria Nourbakhsh
Arianna Patrizi
Elena Sofia Ruzzetti
Dario Onorati
Francesca Fallucchi
Fabio Massimo Zanzotto
VLM
63
21
0
14 Jan 2022
Speech Resources in the Tamasheq Language
Marcely Zanon Boito
Fethi Bougares
Florentin Barbier
Souhir Gahbiche
Loïc Barrault
Mickael Rouvier
Yannick Esteve
74
16
0
13 Jan 2022
CVSS Corpus and Massively Multilingual Speech-to-Speech Translation
Yeting Jia
Michelle Tadmor Ramanovich
Quan Wang
Heiga Zen
SLR
94
70
0
11 Jan 2022
Towards the Next 1000 Languages in Multilingual Machine Translation: Exploring the Synergy Between Supervised and Self-Supervised Learning
Aditya Siddhant
Ankur Bapna
Orhan Firat
Yuan Cao
Mengzhao Chen
Isaac Caswell
Xavier Garcia
ELM
LRM
71
29
0
09 Jan 2022
Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset
Tiezheng Yu
Rita Frieske
Peng Xu
Samuel Cahyawijaya
Cheuk Tung Shadow Yiu
...
Elham J. Barezi
Qifeng Chen
Xiaojuan Ma
Bertram E. Shi
Pascale Fung
RALM
80
10
0
07 Jan 2022
Fine-Tuning Transformers: Vocabulary Transfer
Vladislav D. Mosin
Igor Samenko
Alexey Tikhonov
Borislav M. Kozlovskii
Ivan P. Yamshchikov
79
20
0
29 Dec 2021
LaTr: Layout-Aware Transformer for Scene-Text VQA
Ali Furkan Biten
Ron Litman
Yusheng Xie
Srikar Appalaraju
R. Manmatha
ViT
123
102
0
23 Dec 2021
Voice Quality and Pitch Features in Transformer-Based Speech Recognition
Guillermo Cámbara
Jordi Luque
Mireia Farrús
52
0
0
21 Dec 2021
Few-shot Learning with Multilingual Language Models
Xi Lin
Todor Mihaylov
Mikel Artetxe
Tianlu Wang
Shuohui Chen
...
Luke Zettlemoyer
Zornitsa Kozareva
Mona T. Diab
Ves Stoyanov
Xian Li
BDL
ELM
LRM
153
308
0
20 Dec 2021
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
Sabrina J. Mielke
Zaid Alyafeai
Elizabeth Salesky
Colin Raffel
Manan Dey
...
Arun Raja
Chenglei Si
Wilson Y. Lee
Benoît Sagot
Samson Tan
107
151
0
20 Dec 2021
Multi-turn RNN-T for streaming recognition of multi-party speech
Ilya Sklyar
A. Piunova
Xianrui Zheng
Yulan Liu
114
24
0
19 Dec 2021
Continual Learning for Monolingual End-to-End Automatic Speech Recognition
Steven Vander Eeckt
Hugo Van hamme
CLL
109
17
0
17 Dec 2021
Characterizing and addressing the issue of oversmoothing in neural autoregressive sequence modeling
Ilia Kulikov
M. Eremeev
Kyunghyun Cho
66
8
0
16 Dec 2021
Isometric MT: Neural Machine Translation for Automatic Dubbing
Surafel Melaku Lakew
Yogesh Virkar
Prashant Mathur
Marcello Federico
68
24
0
16 Dec 2021
Can Multilinguality benefit Non-autoregressive Machine Translation?
Sweta Agrawal
Julia Kreutzer
Colin Cherry
AI4CE
49
1
0
16 Dec 2021
Improving Both Domain Robustness and Domain Adaptability in Machine Translation
Wen Lai
Jindrich Libovický
Alexander Fraser
AI4CE
92
14
0
15 Dec 2021
Fine-Tuning Large Neural Language Models for Biomedical Natural Language Processing
Robert Tinn
Hao Cheng
Yu Gu
Naoto Usuyama
Xiaodong Liu
Tristan Naumann
Jianfeng Gao
Hoifung Poon
LM&MA
60
117
0
15 Dec 2021
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
Nan Du
Yanping Huang
Andrew M. Dai
Simon Tong
Dmitry Lepikhin
...
Kun Zhang
Quoc V. Le
Yonghui Wu
Zhiwen Chen
Claire Cui
ALM
MoE
269
832
0
13 Dec 2021
Step-unrolled Denoising Autoencoders for Text Generation
Nikolay Savinov
Junyoung Chung
Mikolaj Binkowski
Erich Elsen
Aaron van den Oord
DiffM
134
120
0
13 Dec 2021
PM-MMUT: Boosted Phone-Mask Data Augmentation using Multi-Modeling Unit Training for Phonetic-Reduction-Robust E2E Speech Recognition
Guodong Ma
Pengfei Hu
Nurmemet Yolwas
Shen Huang
Hao-Ming Huang
86
4
0
13 Dec 2021
Improving language models by retrieving from trillions of tokens
Sebastian Borgeaud
A. Mensch
Jordan Hoffmann
Trevor Cai
Eliza Rutherford
...
Simon Osindero
Karen Simonyan
Jack W. Rae
Erich Elsen
Laurent Sifre
KELM
RALM
303
1,107
0
08 Dec 2021
Scaling Up Influence Functions
Andrea Schioppa
Polina Zablotskaia
David Vilar
Artem Sokolov
TDI
117
105
0
06 Dec 2021
Ensembling of Distilled Models from Multi-task Teachers for Constrained Resource Language Pairs
Amr Hendy
Esraa A. Gad
M. Abdelghaffar
Jailan S. ElMosalami
Mohamed Afify
Ahmed Tawfik
Hany Awadalla
MoE
83
3
0
26 Nov 2021
Less is More: Generating Grounded Navigation Instructions from Landmarks
Su Wang
Ceslee Montgomery
Jordi Orbay
Vighnesh Birodkar
Aleksandra Faust
Izzeddin Gur
Natasha Jaques
Austin Waters
Jason Baldridge
Peter Anderson
135
64
0
25 Nov 2021
Out-of-Category Document Identification Using Target-Category Names as Weak Supervision
Dongha Lee
Dongmin Hyun
Jiawei Han
Hwanjo Yu
OOD
66
1
0
24 Nov 2021
RedCaps: web-curated image-text data created by the people, for the people
Karan Desai
Gaurav Kaul
Zubin Aysola
Justin Johnson
135
169
0
22 Nov 2021
Combined Scaling for Zero-shot Transfer Learning
Hieu H. Pham
Zihang Dai
Golnaz Ghiasi
Kenji Kawaguchi
Hanxiao Liu
...
Yi-Ting Chen
Minh-Thang Luong
Yonghui Wu
Mingxing Tan
Quoc V. Le
VLM
116
202
0
19 Nov 2021
Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions
Chunxi Liu
M. Picheny
Leda Sari
Pooja Chitkara
Alex Xiao
Xiaohui Zhang
Mark Chou
Andres Alvarado
C. Hazirbas
Yatharth Saraf
88
44
0
18 Nov 2021
RoBERTuito: a pre-trained language model for social media text in Spanish
Juan Manuel Pérez
D. Furman
Laura Alonso Alemany
Franco Luque
72
100
0
18 Nov 2021
High Quality Rather than High Model Probability: Minimum Bayes Risk Decoding with Neural Metrics
Markus Freitag
David Grangier
Qijun Tan
Bowen Liang
133
98
0
17 Nov 2021
LiT: Zero-Shot Transfer with Locked-image text Tuning
Xiaohua Zhai
Tianlin Li
Basil Mustafa
Andreas Steiner
Daniel Keysers
Alexander Kolesnikov
Lucas Beyer
VLM
159
561
0
15 Nov 2021
Attention based end to end Speech Recognition for Voice Search in Hindi and English
Raviraj Joshi
Venkateshan Kannan
51
7
0
15 Nov 2021
Calculating Question Similarity is Enough: A New Method for KBQA Tasks
Hanyu Zhao
Shaoqing Yuan
Jiahong Leng
X. Pan
Guoqiang Wang
Ledell Wu
Jie Tang
27
0
0
15 Nov 2021
SynthBio: A Case Study in Human-AI Collaborative Curation of Text Datasets
Ann Yuan
Daphne Ippolito
Vitaly Nikolaev
Chris Callison-Burch
Andy Coenen
Sebastian Gehrmann
SyDa
190
23
0
11 Nov 2021
Cross-language Information Retrieval
P. Galuscáková
Douglas W. Oard
Suraj Nair
61
0
0
10 Nov 2021
Improving Structured Text Recognition with Regular Expression Biasing
Baoguang Shi
W. Cheng
Yijuan Lu
Cha Zhang
D. Florêncio
41
2
0
10 Nov 2021
Developing neural machine translation models for Hungarian-English
A. Nagy
87
1
0
07 Nov 2021
How Do Neural Sequence Models Generalize? Local and Global Context Cues for Out-of-Distribution Prediction
Anthony Bau
Jacob Andreas
63
3
0
04 Nov 2021
Contextual Semantic Parsing for Multilingual Task-Oriented Dialogues
M. Moradshahi
Victoria Tsai
Giovanni Campagna
M. Lam
81
16
0
04 Nov 2021
Lingua Custodia's participation at the WMT 2021 Machine Translation using Terminologies shared task
Melissa Ailem
Jinghsu Liu
Raheel Qader
56
6
0
03 Nov 2021
Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task
Jian Yang
Shuming Ma
Haoyang Huang
Dongdong Zhang
Li Dong
...
Alexandre Muzio
Saksham Singhal
Hany Awadalla
Xia Song
Furu Wei
72
46
0
03 Nov 2021
Sequence Transduction with Graph-based Supervision
Niko Moritz
Takaaki Hori
Shinji Watanabe
Jonathan Le Roux
45
6
0
01 Nov 2021
Empirical Analysis of Korean Public AI Hub Parallel Corpora and in-depth Analysis using LIWC
Chanjun Park
Midan Shim
Sugyeong Eo
Seolhwa Lee
Jaehyung Seo
Hyeonseok Moon
Heuiseok Lim
23
8
0
28 Oct 2021