1808.06226
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
19 August 2018
Taku Kudo
John Richardson
Github (10925★)
Papers citing
"SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"
50 / 1,950 papers shown
Title
Cross-modal Contrastive Learning for Speech Translation
Rong Ye
Mingxuan Wang
Lei Li
SSL
94
91
0
05 May 2022
ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource Speech Translation Tasks
Marcely Zanon Boito
John E. Ortega
Hugo Riguidel
Antoine Laurent
Loïc Barrault
...
Firas Chaabani
H. Nguyen
Florentin Barbier
Souhir Gahbiche
Yannick Esteve
57
16
0
04 May 2022
Unifying the Convergences in Multilingual Neural Machine Translation
Yi-Chong Huang
Xiaocheng Feng
Xinwei Geng
Bing Qin
82
6
0
03 May 2022
Quality-Aware Decoding for Neural Machine Translation
Patrick Fernandes
António Farinhas
Ricardo Rei
José G. C. de Souza
Perez Ogayo
Graham Neubig
André F. T. Martins
115
58
0
02 May 2022
Jam or Cream First? Modeling Ambiguity in Neural Machine Translation with SCONES
Felix Stahlberg
Shankar Kumar
UQLM
121
12
0
02 May 2022
The Implicit Length Bias of Label Smoothing on Beam Search Decoding
Bowen Liang
Pidong Wang
Yuan Cao
67
1
0
02 May 2022
How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training?
Shiyue Zhang
Vishrav Chaudhary
Naman Goyal
James Cross
Guillaume Wenzek
Joey Tianyi Zhou
Francisco Guzman
69
16
0
29 Apr 2022
Named Entity Recognition for Audio De-Identification
Guillaume Baril
P. Cardinal
Alessandro Lameiras Koerich
55
4
0
26 Apr 2022
When do Contrastive Word Alignments Improve Many-to-many Neural Machine Translation?
Zhuoyuan Mao
Chenhui Chu
Raj Dabre
Haiyue Song
Zhen Wan
Sadao Kurohashi
64
3
0
26 Apr 2022
How can NLP Help Revitalize Endangered Languages? A Case Study and Roadmap for the Cherokee Language
Shiyue Zhang
B. Frey
Joey Tianyi Zhou
60
40
0
25 Apr 2022
A Vocabulary-Free Multilingual Neural Tokenizer for End-to-End Task Learning
Md. Mofijul Islam
Gustavo Aguilar
Pragaash Ponnusamy
Clint Solomon Mathialagan
Chengyuan Ma
Chenlei Guo
VLM
148
10
0
22 Apr 2022
Layer-wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition
Xun Gong
Y. Qian
Houjun Huang
Yanmin Qian
81
46
0
21 Apr 2022
On the Representation Collapse of Sparse Mixture of Experts
Zewen Chi
Li Dong
Shaohan Huang
Damai Dai
Shuming Ma
...
Payal Bajaj
Xia Song
Xian-Ling Mao
Heyan Huang
Furu Wei
MoMe
MoE
113
106
0
20 Apr 2022
ALBETO and DistilBETO: Lightweight Spanish Language Models
J. Canete
S. Donoso
Felipe Bravo-Marquez
Andrés Carvallo
Vladimir Araujo
74
21
0
19 Apr 2022
Impact of Tokenization on Language Models: An Analysis for Turkish
Cagri Toraman
E. Yilmaz
Furkan Şahinuç
Oguzhan Ozcelik
104
81
0
19 Apr 2022
DecBERT: Enhancing the Language Understanding of BERT with Causal Attention Masks
Ziyang Luo
Yadong Xi
Jing Ma
Zhiwei Yang
Xiaoxi Mao
Changjie Fan
Rongsheng Zhang
42
3
0
19 Apr 2022
StableMoE: Stable Routing Strategy for Mixture of Experts
Damai Dai
Li Dong
Shuming Ma
Bo Zheng
Zhifang Sui
Baobao Chang
Furu Wei
MoE
73
66
0
18 Apr 2022
Language Contamination Helps Explain the Cross-lingual Capabilities of English Pretrained Models
Terra Blevins
Luke Zettlemoyer
151
92
0
17 Apr 2022
Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation and Understanding
Changtong Zan
Liang Ding
Li Shen
Yu Cao
Weifeng Liu
Dacheng Tao
LRM
103
8
0
16 Apr 2022
BLCU-ICALL at SemEval-2022 Task 1: Cross-Attention Multitasking Framework for Definition Modeling
Cunliang Kong
Yujie Wang
Ruining Chong
Liner Yang
Hengyuan Zhang
Erhong Yang
Yaping Huang
48
8
0
16 Apr 2022
SNP2Vec: Scalable Self-Supervised Pre-Training for Genome-Wide Association Study
Samuel Cahyawijaya
Tiezheng Yu
Zihan Liu
Tiffany Mak
Xiaopu Zhou
N. Ip
Pascale Fung
57
8
0
14 Apr 2022
Self-critical Sequence Training for Automatic Speech Recognition
Chen Chen
Yuchen Hu
Nana Hou
Xiaofeng Qi
Heqing Zou
Chng Eng Siong
76
16
0
13 Apr 2022
Breaking Character: Are Subwords Good Enough for MRLs After All?
Omri Keren
Tal Avinari
Reut Tsarfaty
Omer Levy
66
16
0
10 Apr 2022
MMTAfrica: Multilingual Machine Translation for African Languages
Chris C. Emezue
Bonaventure F. P. Dossou
73
25
0
08 Apr 2022
CookieEnforcer: Automated Cookie Notice Analysis and Enforcement
Rishabh Khandelwal
Asmit Nayak
Hamza Harkous
Kassem Fawaz
24
9
0
08 Apr 2022
Improving Tokenisation by Alternative Treatment of Spaces
Edward Gow-Smith
Harish Tayyar Madabushi
Carolina Scarton
Aline Villavicencio
89
21
0
08 Apr 2022
Does Simultaneous Speech Translation need Simultaneous Models?
Sara Papi
Marco Gaido
Matteo Negri
Marco Turchi
97
26
0
08 Apr 2022
Speech Pre-training with Acoustic Piece
Shuo Ren
Shujie Liu
Yu Wu
Long Zhou
Furu Wei
SSL
65
17
0
07 Apr 2022
Quick Starting Dialog Systems with Paraphrase Generation
Louis Marceau
R. Belbahar
Marc Queudot
Nada Naji
Eric Charton
Marie-Jean Meurs
31
3
0
06 Apr 2022
Can language models learn from explanations in context?
Andrew Kyle Lampinen
Ishita Dasgupta
Stephanie C. Y. Chan
Kory Matthewson
Michael Henry Tessler
Antonia Creswell
James L. McClelland
Jane X. Wang
Felix Hill
LRM
ReLM
186
302
0
05 Apr 2022
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM
LRM
567
6,320
0
05 Apr 2022
Deliberation Model for On-Device Spoken Language Understanding
Duc Le
Akshat Shrivastava
Paden Tomasello
Suyoun Kim
Aleksandr Livshits
Ozlem Kalinli
M. Seltzer
AuLLM
70
12
0
04 Apr 2022
Leveraging Phone Mask Training for Phonetic-Reduction-Robust E2E Uyghur Speech Recognition
Guodong Ma
Pengfei Hu
Jian Kang
Shen Huang
Hao-Ming Huang
78
9
0
02 Apr 2022
CipherDAug: Ciphertext based Data Augmentation for Neural Machine Translation
Nishant Kambhatla
Logan Born
Anoop Sarkar
84
16
0
01 Apr 2022
Uncertainty Determines the Adequacy of the Mode and the Tractability of Decoding in Sequence-to-Sequence Models
Felix Stahlberg
Ilia Kulikov
Shankar Kumar
UQLM
136
10
0
01 Apr 2022
Alternate Intermediate Conditioning with Syllable-level and Character-level Targets for Japanese ASR
Yusuke Fujita
Tatsuya Komatsu
Yusuke Kida
62
3
0
01 Apr 2022
InterAug: Augmenting Noisy Intermediate Predictions for CTC-based ASR
Yumi Nakagome
Tatsuya Komatsu
Yusuke Fujita
Shuta Ichimura
Yusuke Kida
91
4
0
01 Apr 2022
FindIt: Generalized Localization with Natural Language Queries
Weicheng Kuo
Fred Bertsch
Wei Li
A. Piergiovanni
M. Saffar
A. Angelova
ObjD
88
17
0
31 Mar 2022
Scaling Language Model Size in Cross-Device Federated Learning
Jae Hun Ro
Theresa Breiner
Lara McConnaughey
Mingqing Chen
A. Suresh
Shankar Kumar
Rajiv Mathews
FedML
61
26
0
31 Mar 2022
Memory-Efficient Training of RNN-Transducer with Sampled Softmax
Jaesong Lee
Lukas Lee
Shinji Watanabe
94
8
0
31 Mar 2022
An Empirical Study of Language Model Integration for Transducer based Speech Recognition
Huahuan Zheng
Keyu An
Zhijian Ou
Chen Huang
Ke Ding
Guanglu Wan
69
5
0
31 Mar 2022
Auto-MLM: Improved Contrastive Learning for Self-supervised Multi-lingual Knowledge Retrieval
Wenshen Xu
M. Maimaiti
Yuanhang Zheng
Xin Tang
Ji Zhang
RALM
SSL
47
2
0
30 Mar 2022
Seq-2-Seq based Refinement of ASR Output for Spoken Name Capture
Karan Singla
S. Jalalvand
Yeon-Jun Kim
Ryan Price
Daniel Pressel
S. Bangalore
36
2
0
29 Mar 2022
Locality Matters: A Locality-Biased Linear Attention for Automatic Speech Recognition
J. Sun
Guiping Zhong
Dinghao Zhou
Baoxiang Li
Yiran Zhong
63
7
0
29 Mar 2022
Training Compute-Optimal Large Language Models
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
...
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
AI4TS
217
1,992
0
29 Mar 2022
Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation
Ryo Fukuda
Katsuhito Sudoh
Satoshi Nakamura
59
7
0
29 Mar 2022
Noise-robust Speech Recognition with 10 Minutes Unparalleled In-domain Data
Chen Chen
Nana Hou
Yuchen Hu
Shashank Shirol
Chng Eng Siong
NoLa
103
43
0
29 Mar 2022
Finnish Parliament ASR corpus - Analysis, benchmarks and statistics
A. Virkkunen
Aku Rouhe
Nhan Phan
M. Kurimo
95
4
0
28 Mar 2022
Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition
Yuchen Hu
Nana Hou
Chen Chen
Chng Eng Siong
99
15
0
28 Mar 2022
Single Model Ensemble for Subword Regularized Models in Low-Resource Machine Translation
Sho Takase
Tatsuya Hiraoka
Naoaki Okazaki
44
5
0
25 Mar 2022