1808.06226
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
19 August 2018
Taku Kudo
John Richardson
Papers citing "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"
50 / 1,923 papers shown
Title
Language Model Prior for Low-Resource Neural Machine Translation
Christos Baziotis
Barry Haddow
Alexandra Birch
18
53
0
30 Apr 2020
Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations
Arturo Oncevay
Barry Haddow
Alexandra Birch
19
34
0
30 Apr 2020
Recipes for Adapting Pre-trained Monolingual and Multilingual Models to Machine Translation
Asa Cooper Stickland
Xian Li
Marjan Ghazvininejad
36
44
0
30 Apr 2020
Mind Your Inflections! Improving NLP for Non-Standard Englishes with Base-Inflection Encoding
Samson Tan
Chenyu You
Lav Varshney
Min-Yen Kan
17
34
0
30 Apr 2020
Enriched Pre-trained Transformers for Joint Slot Filling and Intent Detection
Momchil Hardalov
Ivan Koychev
Preslav Nakov
VLM
28
17
0
30 Apr 2020
Vocabulary Adaptation for Distant Domain Adaptation in Neural Machine Translation
Shoetsu Sato
Jin Sakuma
Naoki Yoshinaga
Masashi Toyoda
M. Kitsuregawa
30
3
0
30 Apr 2020
Self-Supervised and Controlled Multi-Document Opinion Summarization
Hady ElSahar
Maximin Coavoux
Matthias Gallé
Jos Rozen
29
48
0
30 Apr 2020
Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing
Brian Thompson
Matt Post
LRM
19
188
0
30 Apr 2020
WT5?! Training Text-to-Text Models to Explain their Predictions
Sharan Narang
Colin Raffel
Katherine Lee
Adam Roberts
Noah Fiedel
Karishma Malkan
25
197
0
30 Apr 2020
Simulated Multiple Reference Training Improves Low-Resource Machine Translation
Huda Khayrallah
Brian Thompson
Matt Post
Philipp Koehn
20
38
0
30 Apr 2020
An Empirical Study of Pre-trained Transformers for Arabic Information Extraction
Wuwei Lan
Yang Chen
Wei Xu
Alan Ritter
22
4
0
30 Apr 2020
Bilingual Text Extraction as Reading Comprehension
Katsuki Chousa
Masaaki Nagata
Masaaki Nishino
11
0
0
29 Apr 2020
Zero-shot Neural Passage Retrieval via Domain-targeted Synthetic Question Generation
Ji Ma
I. Korotkov
Yinfei Yang
Keith B. Hall
Ryan T. McDonald
RALM
25
32
0
29 Apr 2020
Towards Reasonably-Sized Character-Level Transformer NMT by Finetuning Subword Systems
Jindrich Libovický
Alexander Fraser
16
0
0
29 Apr 2020
Adversarial Subword Regularization for Robust Neural Machine Translation
Jungsoo Park
Mujeen Sung
Jinhyuk Lee
Jaewoo Kang
22
8
0
29 Apr 2020
Multiresolution and Multimodal Speech Recognition with Transformers
Georgios Paraskevopoulos
Srinivas Parthasarathy
Aparna Khare
Shiva Sundaram
25
29
0
29 Apr 2020
Fast and Memory-Efficient Neural Code Completion
Alexey Svyatkovskiy
Sebastian Lee
A. Hadjitofi
M. Riechert
Juliana Franco
Miltiadis Allamanis
6
91
0
28 Apr 2020
Curriculum Pre-training for End-to-End Speech Translation
Chengyi Wang
Yu Wu
Shujie Liu
Ming Zhou
Zhenglu Yang
21
108
0
21 Apr 2020
Adversarial Training for Large Neural Language Models
Xiaodong Liu
Hao Cheng
Pengcheng He
Weizhu Chen
Yu-Chiang Frank Wang
Hoifung Poon
Jianfeng Gao
AAML
34
183
0
20 Apr 2020
Enriching the Transformer with Linguistic Factors for Low-Resource Machine Translation
Jordi Armengol-Estapé
Marta R. Costa-jussà
Carlos Escolano
15
7
0
17 Apr 2020
Cross-lingual Contextualized Topic Models with Zero-shot Learning
Federico Bianchi
Silvia Terragni
Dirk Hovy
Debora Nozza
Elisabetta Fersini
BDL
34
145
0
16 Apr 2020
Non-Autoregressive Machine Translation with Latent Alignments
Chitwan Saharia
William Chan
Saurabh Saxena
Mohammad Norouzi
19
157
0
16 Apr 2020
Analyzing analytical methods: The case of phonology in neural models of spoken language
Grzegorz Chrupała
Bertrand Higy
A. Alishahi
21
20
0
15 Apr 2020
Balancing Training for Multilingual Neural Machine Translation
Xinyi Wang
Yulia Tsvetkov
Graham Neubig
23
99
0
14 Apr 2020
CLUE: A Chinese Language Understanding Evaluation Benchmark
Liang Xu
Hai Hu
Xuanwei Zhang
Lu Li
Chenjie Cao
...
Cong Yue
Xinrui Zhang
Zhen-Yi Yang
Kyle Richardson
Zhenzhong Lan
ELM
45
379
0
13 Apr 2020
On the Language Neutrality of Pre-trained Multilingual Representations
Jindrich Libovický
Rudolf Rosa
Alexander Fraser
25
101
0
09 Apr 2020
Translation Artifacts in Cross-lingual Transfer Learning
Mikel Artetxe
Gorka Labaka
Eneko Agirre
27
115
0
09 Apr 2020
Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation
Stig-Arne Gronroos
Sami Virpioja
M. Kurimo
28
6
0
08 Apr 2020
Learning Discrete Structured Representations by Adversarially Maximizing Mutual Information
K. Stratos
Sam Wiseman
SSL
DRL
11
7
0
08 Apr 2020
Byte Pair Encoding is Suboptimal for Language Model Pretraining
Kaj Bostrom
Greg Durrett
28
200
0
07 Apr 2020
KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding
Jiyeon Ham
Yo Joong Choe
Kyubyong Park
Ilji Choi
Hyungjoon Soh
19
78
0
07 Apr 2020
Improving Fluency of Non-Autoregressive Machine Translation
Zdeněk Kasner
Jindrich Libovický
Jindřich Helcl
AI4CE
14
8
0
07 Apr 2020
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
Zhiqing Sun
Hongkun Yu
Xiaodan Song
Renjie Liu
Yiming Yang
Denny Zhou
MQ
27
797
0
06 Apr 2020
Meta-Learning for Few-Shot NMT Adaptation
Amr Sharaf
Hany Hassan
Hal Daumé
19
35
0
06 Apr 2020
Machine Translation Pre-training for Data-to-Text Generation -- A Case Study in Czech
Mihir Kale
Scott Roy
14
14
0
05 Apr 2020
Testing pre-trained Transformer models for Lithuanian news clustering
Lukas Stankevicius
M. Lukoševičius
VLM
25
8
0
03 Apr 2020
The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment
Wei Zhou
Wilfried Michel
Kazuki Irie
M. Kitza
Ralf Schluter
Hermann Ney
11
42
0
02 Apr 2020
How Furiously Can Colourless Green Ideas Sleep? Sentence Acceptability in Context
Jey Han Lau
C. S. Armendariz
Shalom Lappin
Matthew Purver
Chang Shu
14
40
0
02 Apr 2020
Low Resource Neural Machine Translation: A Benchmark for Five African Languages
Surafel Melaku Lakew
Matteo Negri
Marco Turchi
AIMat
17
27
0
31 Mar 2020
High Performance Sequence-to-Sequence Model for Streaming Speech Recognition
T. Nguyen
Ngoc-Quan Pham
S. Stueker
A. Waibel
11
7
0
22 Mar 2020
TNT-KID: Transformer-based Neural Tagger for Keyword Identification
Matej Martinc
Blaž Škrlj
Senja Pollak
24
38
0
20 Mar 2020
Enhancing Factual Consistency of Abstractive Summarization
Chenguang Zhu
William Fu-Hinthorn
Ruochen Xu
Qingkai Zeng
Michael Zeng
Xuedong Huang
Meng Jiang
HILM
KELM
193
40
0
19 Mar 2020
Document Ranking with a Pretrained Sequence-to-Sequence Model
Rodrigo Nogueira
Zhiying Jiang
Jimmy J. Lin
31
560
0
14 Mar 2020
Video Caption Dataset for Describing Human Actions in Japanese
Yutaro Shigeto
Yuya Yoshikawa
Jiaqing Lin
A. Takeuchi
20
3
0
10 Mar 2020
Morfessor EM+Prune: Improved Subword Segmentation with Expectation Maximization and Pruning
Stig-Arne Gronroos
Sami Virpioja
M. Kurimo
VLM
39
21
0
06 Mar 2020
What the [MASK]? Making Sense of Language-Specific BERT Models
Debora Nozza
Federico Bianchi
Dirk Hovy
92
106
0
05 Mar 2020
PhoBERT: Pre-trained language models for Vietnamese
Dat Quoc Nguyen
A. Nguyen
174
343
0
02 Mar 2020
SkinAugment: Auto-Encoding Speaker Conversions for Automatic Speech Translation
Arya D. McCarthy
Liezl Puzon
J. Pino
33
24
0
27 Feb 2020
Language-Independent Tokenisation Rivals Language-Specific Tokenisation for Word Similarity Prediction
Danushka Bollegala
Ryuichi Kiryo
K. Tsujino
Haruki Yukawa
16
7
0
25 Feb 2020
Semi-Supervised Speech Recognition via Local Prior Matching
Wei-Ning Hsu
Ann Lee
Gabriel Synnaeve
Awni Y. Hannun
SSL
27
31
0
24 Feb 2020