arXiv:1808.06226
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
19 August 2018
Taku Kudo
John Richardson
GitHub (10,925★)
Papers citing "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"
50 / 1,950 papers shown
Title
Neural Networks for Entity Matching: A Survey
Nils Barlaug
J. Gulla
132
96
0
21 Oct 2020
Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition
Yangyang Shi
Yongqiang Wang
Chunyang Wu
Ching-Feng Yeh
Julian Chan
Frank Zhang
Duc Le
M. Seltzer
187
172
0
21 Oct 2020
Towards End-to-End In-Image Neural Machine Translation
Elman Mansimov
Mitchell Stern
Mengzhao Chen
Orhan Firat
Jakob Uszkoreit
Puneet Jain
75
26
0
20 Oct 2020
Revisiting Modularized Multilingual NMT to Meet Industrial Demands
Sungwon Lyu
Bokyung Son
Kichang Yang
Jaekyoung Bae
MoE
64
20
0
19 Oct 2020
Mixed-Lingual Pre-training for Cross-lingual Summarization
Ruochen Xu
Chenguang Zhu
Yu Shi
Michael Zeng
Xuedong Huang
57
26
0
18 Oct 2020
HABERTOR: An Efficient and Effective Deep Hatespeech Detector
T. Tran
Yifan Hu
Changwei Hu
Kevin Yen
Fei Tan
Kyumin Lee
Serim Park
VLM
90
32
0
17 Oct 2020
Cross-Lingual Relation Extraction with Transformers
Jian Ni
Taesun Moon
Parul Awasthy
Radu Florian
ViT
35
6
0
16 Oct 2020
Lightweight End-to-End Speech Recognition from Raw Audio Data Using Sinc-Convolutions
Ludwig Kürzinger
Nicolas Lindae
Palle Klewitz
Gerhard Rigoll
54
5
0
15 Oct 2020
Pretrained Transformers for Text Ranking: BERT and Beyond
Jimmy J. Lin
Rodrigo Nogueira
Andrew Yates
VLM
387
628
0
13 Oct 2020
The Tatoeba Translation Challenge -- Realistic Data Sets for Low Resource and Multilingual MT
Jörg Tiedemann
234
171
0
13 Oct 2020
Mathematical Word Problem Generation from Commonsense Knowledge Graph and Equations
Tianqiao Liu
Qian Fang
Wenbiao Ding
Hang Li
Zhongqin Wu
Zitao Liu
120
29
0
13 Oct 2020
Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models
Zirui Wang
Yulia Tsvetkov
Orhan Firat
Yuan Cao
79
202
0
12 Oct 2020
Controlled Hallucinations: Learning to Generate Faithfully from Noisy Data
Katja Filippova
68
113
0
12 Oct 2020
TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation
Dongxu Li
Chenchen Xu
Xin Yu
Kaihao Zhang
Ben Swift
H. Suominen
Hongdong Li
SLR
54
124
0
12 Oct 2020
fairseq S2T: Fast Speech-to-Text Modeling with fairseq
Changhan Wang
Yun Tang
Xutai Ma
Anne Wu
Sravya Popuri
Dmytro Okhonko
J. Pino
VLM
LRM
107
276
0
11 Oct 2020
Multichannel Generative Language Model: Learning All Possible Factorizations Within and Across Channels
Harris Chan
J. Kiros
William Chan
LRM
21
0
0
09 Oct 2020
On the importance of pre-training data volume for compact language models
Vincent Micheli
Martin d'Hoffschmidt
François Fleuret
67
42
0
08 Oct 2020
Super-Human Performance in Online Low-latency Recognition of Conversational Speech
T. Nguyen
S. Stueker
A. Waibel
BDL
74
38
0
07 Oct 2020
Cross-lingual Extended Named Entity Classification of Wikipedia Articles
Viet The Bui
Hong Phuong Le
13
2
0
07 Oct 2020
Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers
Yimeng Wu
Peyman Passban
Mehdi Rezagholizadeh
Qun Liu
MoE
60
35
0
06 Oct 2020
An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks
Kyubyong Park
Joohong Lee
Seongbo Jang
Dawoon Jung
64
65
0
06 Oct 2020
Multi-task Learning for Multilingual Neural Machine Translation
Yiren Wang
Chengxiang Zhai
Hany Awadalla
88
69
0
06 Oct 2020
Efficient Inference For Neural Machine Translation
Y. Hsu
Sarthak Garg
Yi-Hsiu Liao
Ilya Chatsviorkin
AI4CE
50
12
0
06 Oct 2020
An Ensemble Approach for Automatic Structuring of Radiology Reports
Morteza Pourreza Shahri
A. Tahmasebi
Bingyang Ye
Henghui Zhu
J. Aslam
T. Ferris
13
2
0
05 Oct 2020
Gauravarora@HASOC-Dravidian-CodeMix-FIRE2020: Pre-training ULMFiT on Synthetically Generated Code-Mixed Data for Hate Speech Detection
Gaurav Arora
41
27
0
05 Oct 2020
The Grammar of Emergent Languages
Oskar van der Wal
S. D. Boer
Elia Bruni
Dieuwke Hupkes
76
16
0
05 Oct 2020
Improving Target-side Lexical Transfer in Multilingual Neural Machine Translation
Luyu Gao
Xinyi Wang
Graham Neubig
102
7
0
04 Oct 2020
Differentiable Weighted Finite-State Transducers
Awni Y. Hannun
Vineel Pratap
Jacob Kahn
Wei-Ning Hsu
118
29
0
02 Oct 2020
STIL -- Simultaneous Slot Filling, Translation, Intent Classification, and Language Identification: Initial Results using mBART on MultiATIS++
Jack G. M. FitzGerald
69
13
0
02 Oct 2020
Nearest Neighbor Machine Translation
Urvashi Khandelwal
Angela Fan
Dan Jurafsky
Luke Zettlemoyer
M. Lewis
RALM
78
288
0
01 Oct 2020
On Romanization for Model Transfer Between Scripts in Neural Machine Translation
Chantal Amrhein
Rico Sennrich
80
15
0
30 Sep 2020
Rethinking Attention with Performers
K. Choromanski
Valerii Likhosherstov
David Dohan
Xingyou Song
Andreea Gane
...
Afroz Mohiuddin
Lukasz Kaiser
David Belanger
Lucy J. Colwell
Adrian Weller
194
1,605
0
30 Sep 2020
Generative latent neural models for automatic word alignment
Anh Khoa Ngo Ho
François Yvon
DRL
56
2
0
28 Sep 2020
iNLTK: Natural Language Toolkit for Indic Languages
Gaurav Arora
VLM
61
66
0
26 Sep 2020
A little goes a long way: Improving toxic language classification despite data scarcity
Mika Juuti
Tommi Gröndahl
Adrian Flanagan
Nirmal Asokan
94
25
0
25 Sep 2020
The importance of fillers for text representations of speech transcripts
Tanvi Dinkar
Pierre Colombo
Matthieu Labeau
Chloé Clavel
150
24
0
23 Sep 2020
Harnessing Multilinguality in Unsupervised Machine Translation for Rare Languages
Xavier Garcia
Aditya Siddhant
Orhan Firat
Ankur P. Parikh
86
31
0
23 Sep 2020
A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline
Yerbolat Khassanov
Saida Mussakhojayeva
A. Mirzakhmetov
A. Adiyev
Mukhamet Nurpeiissov
H. A. Varol
57
31
0
22 Sep 2020
Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation
Tahmid Hasan
Abhik Bhattacharjee
Kazi Samin Mubasshir
Masum Hasan
Madhusudan Basak
M. Rahman
Rifat Shahriyar
VLM
82
77
0
20 Sep 2020
Will it Unblend?
Yuval Pinter
Cassandra L. Jacobs
Jacob Eisenstein
66
14
0
18 Sep 2020
Automated Source Code Generation and Auto-completion Using Deep Learning: Comparing and Discussing Current Language-Model-Related Approaches
Juan Cruz-Benito
Sanjay Vishwakarma
Francisco Martín-Fernández
Ismael Faro (IBM Quantum)
66
31
0
16 Sep 2020
Extremely Low Bit Transformer Quantization for On-Device Neural Machine Translation
Insoo Chung
Byeongwook Kim
Yoonjung Choi
S. Kwon
Yongkweon Jeon
Baeseong Park
Sangha Kim
Dongsoo Lee
MQ
91
27
0
16 Sep 2020
Multi-span Style Extraction for Generative Reading Comprehension
Junjie Yang
Zhuosheng Zhang
Hai Zhao
SyDa
51
14
0
15 Sep 2020
Iterative Refinement in the Continuous Space for Non-Autoregressive Neural Machine Translation
Jason D. Lee
Raphael Shu
Kyunghyun Cho
61
26
0
15 Sep 2020
KoSpeech: Open-Source Toolkit for End-to-End Korean Speech Recognition
Soohwan Kim
Seyoung Bae
Cheolhwang Won
VLM
24
5
0
07 Sep 2020
UPB at SemEval-2020 Task 8: Joint Textual and Visual Modeling in a Multi-Task Learning Architecture for Memotion Analysis
G. Vlad
George-Eduard Zaharia
Dumitru-Clementin Cercel
Costin-Gabriel Chiru
Stefan Trausan-Matu
69
31
0
06 Sep 2020
GREEK-BERT: The Greeks visiting Sesame Street
John Koutsikakis
Ilias Chalkidis
Prodromos Malakasiotis
Ion Androutsopoulos
70
92
0
27 Aug 2020
Multi-Label Sentiment Analysis on 100 Languages with Dynamic Weighting for Label Imbalance
Selim F. Yilmaz
E. Kaynak
Aykut Koç
H. Dibeklioğlu
Suleyman S. Kozat
64
28
0
26 Aug 2020
JokeMeter at SemEval-2020 Task 7: Convolutional humor
Martin Docekal
Martin Fajcik
Josef Jon
Pavel Smrz
58
2
0
25 Aug 2020
HinglishNLP: Fine-tuned Language Models for Hinglish Sentiment Detection
Meghana Bhange
Nirant Kasliwal
38
7
0
22 Aug 2020