1808.06226
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
19 August 2018
Taku Kudo
John Richardson
Github (10925★)
Papers citing
"SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"
50 / 1,950 papers shown
Title
Meta Back-translation
Hieu H. Pham
Xinyi Wang
Yiming Yang
Graham Neubig
64
26
0
15 Feb 2021
Improving Zero-shot Neural Machine Translation on Language-specific Encoders-Decoders
Junwei Liao
Yu Shi
Ming Gong
Linjun Shou
Hong Qu
Michael Zeng
58
11
0
12 Feb 2021
Fused Acoustic and Text Encoding for Multimodal Bilingual Pretraining and Speech Translation
Renjie Zheng
Junkun Chen
Mingbo Ma
Liang Huang
155
69
0
10 Feb 2021
Intermediate Loss Regularization for CTC-based Speech Recognition
Jaesong Lee
Shinji Watanabe
151
140
0
05 Feb 2021
Mind the Gap: Assessing Temporal Generalization in Neural Language Models
Angeliki Lazaridou
A. Kuncoro
E. Gribovskaya
Devang Agrawal
Adam Liska
...
Sebastian Ruder
Dani Yogatama
Kris Cao
Susannah Young
Phil Blunsom
VLM
140
219
0
03 Feb 2021
The Multilingual TEDx Corpus for Speech Recognition and Translation
Elizabeth Salesky
Sanjeev Khudanpur
Jacob Bremerman
R. Cattoni
Matteo Negri
Marco Turchi
Douglas W. Oard
Matt Post
79
126
0
02 Feb 2021
Inducing Meaningful Units from Character Sequences with Dynamic Capacity Slot Attention
Melika Behjati
James Henderson
OCL
48
1
0
01 Feb 2021
Civil Rephrases Of Toxic Texts With Self-Supervised Transformers
Leo Laugier
John Pavlopoulos
Jeffrey Scott Sorensen
Lucas Dixon
87
48
0
01 Feb 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers
Lisa Anne Hendricks
John F. J. Mellor
R. Schneider
Jean-Baptiste Alayrac
Aida Nematzadeh
148
117
0
31 Jan 2021
Extending Neural Keyword Extraction with TF-IDF tagset matching
Boshko Koloski
Senja Pollak
Blaž Škrlj
Matej Martinc
16
10
0
31 Jan 2021
Synthesizing Monolingual Data for Neural Machine Translation
Benjamin Marie
Atsushi Fujita
SyDa
22
2
0
29 Jan 2021
LOME: Large Ontology Multilingual Extraction
Patrick Xia
Guanghui Qin
Siddharth Vashishtha
Yunmo Chen
Tongfei Chen
Chandler May
Craig Harman
Kyle Rawlins
A. White
Benjamin Van Durme
95
42
0
28 Jan 2021
Explaining Natural Language Processing Classifiers with Occlusion and Language Modeling
David Harbecke
AAML
48
2
0
28 Jan 2021
KoreALBERT: Pretraining a Lite BERT Model for Korean Language Understanding
HyunJae Lee
Jaewoong Yoon
Bonggyu Hwang
Seongho Joe
Seungjai Min
Youngjune Gwon
SSeg
58
16
0
27 Jan 2021
Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yoloxóchitl Mixtec
Jiatong Shi
Jonathan D. Amith
Rey Castillo García
Esteban Guadalupe Sierra
Kevin Duh
Shinji Watanabe
69
47
0
26 Jan 2021
El Volumen Louder Por Favor: Code-switching in Task-oriented Semantic Parsing
Arash Einolghozati
Abhinav Arora
Lorena Sainz-Maza Lecanda
Anuj Kumar
Sonal Gupta
112
9
0
26 Jan 2021
EGFI: Drug-Drug Interaction Extraction and Generation with Fusion of Enriched Entity and Sentence Information
Lei Huang
Jiecong Lin
Xiangtao Li
Linqi Song
Ka-Chun Wong
101
24
0
25 Jan 2021
WangchanBERTa: Pretraining transformer-based Thai Language Models
Lalita Lowphansirikul
Charin Polpanumas
Nawat Jantrakulchai
Sarana Nutanong
53
76
0
24 Jan 2021
Training Multilingual Pre-trained Language Model with Byte-level Subwords
Junqiu Wei
Qun Liu
Yinpeng Guo
Xin Jiang
58
20
0
23 Jan 2021
Streaming Models for Joint Speech Recognition and Translation
Orion Weller
Matthias Sperber
Christian Gollan
Joris Kluivers
146
13
0
22 Jan 2021
Does a Hybrid Neural Network based Feature Selection Model Improve Text Classification?
Suman Dowlagar
R. Mamidi
45
1
0
22 Jan 2021
HASOCOne@FIRE-HASOC2020: Using BERT and Multilingual BERT models for Hate Speech Detection
Suman Dowlagar
R. Mamidi
35
21
0
22 Jan 2021
CMSAOne@Dravidian-CodeMix-FIRE2020: A Meta Embedding and Transformer model for Code-Mixed Sentiment Analysis on Social Media Text
Suman Dowlagar
R. Mamidi
53
13
0
22 Jan 2021
Word Alignment by Fine-tuning Embeddings on Parallel Corpora
Zi-Yi Dou
Graham Neubig
181
271
0
20 Jan 2021
ComQA: Compositional Question Answering via Hierarchical Graph Neural Networks
Bingning Wang
Ting Yao
Weipeng Chen
Jingfang Xu
Xiaochuan Wang
CoGe
70
6
0
16 Jan 2021
Experimental Evaluation of Deep Learning models for Marathi Text Classification
Atharva Kulkarni
Meet Mandhane
Manali Likhitkar
G. Kshirsagar
J. Jagdale
Raviraj Joshi
101
29
0
13 Jan 2021
Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing
Minh Nguyen
Viet Dac Lai
Amir Pouran Ben Veyseh
Thien Huu Nguyen
124
137
0
09 Jan 2021
AutoDropout: Learning Dropout Patterns to Regularize Deep Networks
Hieu H. Pham
Quoc V. Le
128
57
0
05 Jan 2021
Local Translation Services for Neglected Languages
David Noever
Josh Kalin
Matt Ciolino
Dom Hambrick
Gerry V. Dozier
117
4
0
05 Jan 2021
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation
Changhan Wang
M. Rivière
Ann Lee
Anne Wu
Chaitanya Talnikar
Daniel Haziza
Mary Williamson
J. Pino
Emmanuel Dupoux
SSL
128
498
0
02 Jan 2021
Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade
Jiatao Gu
X. Kong
91
137
0
31 Dec 2020
XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders
Shuming Ma
Jian Yang
Haoyang Huang
Zewen Chi
Li Dong
...
Akiko Eriguchi
Saksham Singhal
Xia Song
Arul Menezes
Furu Wei
LRM
80
33
0
31 Dec 2020
FiD-Ex: Improving Sequence-to-Sequence Models for Extractive Rationale Generation
Kushal Lakhotia
Bhargavi Paranjape
Asish Ghoshal
Wen-tau Yih
Yashar Mehdad
Srini Iyer
63
28
0
31 Dec 2020
Fully Synthetic Data Improves Neural Machine Translation with Knowledge Distillation
Alham Fikri Aji
Kenneth Heafield
59
3
0
31 Dec 2020
Improving Zero-Shot Translation by Disentangling Positional Information
Danni Liu
Jan Niehues
James Cross
Francisco Guzmán
Xian Li
90
49
0
30 Dec 2020
Generating Adversarial Examples in Chinese Texts Using Sentence-Pieces
Linyang Li
Yunfan Shao
Demin Song
Xipeng Qiu
Xuanjing Huang
AAML
GAN
40
7
0
29 Dec 2020
Neural Text Generation with Artificial Negative Examples
Keisuke Shirai
Kazuma Hashimoto
Akiko Eriguchi
Takashi Ninomiya
Shinsuke Mori
68
8
0
28 Dec 2020
SubICap: Towards Subword-informed Image Captioning
Naeha Sharif
Bennamoun
Wei Liu
Syed Afaq Ali Shah
39
2
0
24 Dec 2020
Domain Adaptation of NMT models for English-Hindi Machine Translation Task at AdapMT ICON 2020
Ramchandra Joshi
Rushabh Karnavat
Kaustubh Jirapure
Raviraj Joshi
35
0
0
22 Dec 2020
Finding Sparse Structures for Domain Specific Neural Machine Translation
Jianze Liang
Chengqi Zhao
Mingxuan Wang
Xipeng Qiu
Lei Li
CLL
73
4
0
19 Dec 2020
Exploring Fluent Query Reformulations with Text-to-Text Transformers and Reinforcement Learning
Jerry Zikun Chen
S. Yu
Haoran Wang
444
5
0
18 Dec 2020
Primer AI's Systems for Acronym Identification and Disambiguation
Nicholas Egan
John Bohannon
55
8
0
14 Dec 2020
MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish
Begum Citamak
Ozan Caglayan
Menekse Kuyu
Erkut Erdem
Aykut Erdem
Pranava Madhyastha
Lucia Specia
60
9
0
13 Dec 2020
Improved Robustness to Disfluencies in RNN-Transducer Based Speech Recognition
Valentin Mendelev
Tina Raissi
Guglielmo Camporese
Manuel Giollo
48
21
0
11 Dec 2020
Automatic Standardization of Colloquial Persian
Mohammad Sadegh Rasooli
Farzane Bakhtyari
F. Shafiei
Mahsa Ravanbakhsh
Chris Callison-Burch
26
4
0
10 Dec 2020
Exploring Pair-Wise NMT for Indian Languages
Kartheek Akella
Sai Himal Allu
S. Ragupathi
Aman Singhal
Zeeshan Khan
Vinay P. Namboodiri
C. V. Jawahar
61
7
0
10 Dec 2020
Session-Aware Query Auto-completion using Extreme Multi-label Ranking
Nishant Yadav
Rajat Sen
Daniel N. Hill
A. Mazumdar
Inderjit S. Dhillon
64
11
0
09 Dec 2020
Pre-training Protein Language Models with Label-Agnostic Binding Pairs Enhances Performance in Downstream Tasks
Modestas Filipavicius
Matteo Manica
Joris Cadow
María Rodríguez Martínez
153
14
0
05 Dec 2020
GottBERT: a pure German Language Model
Raphael Scheible
Fabian Thomczyk
P. Tippmann
V. Jaravine
M. Boeker
VLM
59
81
0
03 Dec 2020
CPM: A Large-scale Generative Chinese Pre-trained Language Model
Zhengyan Zhang
Xu Han
Hao Zhou
Pei Ke
Yuxian Gu
...
Wentao Han
Jie Tang
Juan-Zi Li
Xiaoyan Zhu
Maosong Sun
68
119
0
01 Dec 2020