1808.06226
Cited By
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
19 August 2018
Taku Kudo
John Richardson
Github (10925★)
Papers citing
"SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"
50 / 1,950 papers shown
A Masked Segmental Language Model for Unsupervised Natural Language Segmentation
C.M. Downey
Fei Xia
Gina-Anne Levow
Shane Steinert-Threlkeld
30
13
0
16 Apr 2021
Sublanguage: A Serious Issue Affects Pretrained Models in Legal Domain
Nguyen Ha Thanh
Le-Minh Nguyen
ELM
AILaw
19
2
0
15 Apr 2021
Cross-Domain Label-Adaptive Stance Detection
Momchil Hardalov
Arnav Arora
Preslav Nakov
Isabelle Augenstein
92
73
0
15 Apr 2021
Span Pointer Networks for Non-Autoregressive Task-Oriented Semantic Parsing
Akshat Shrivastava
P. Chuang
Arun Babu
Shrey Desai
Abhinav Arora
Alexander Zotov
Ahmed Aly
78
21
0
15 Apr 2021
Statistically significant detection of semantic shifts using contextual word embeddings
Yang Liu
A. Medlar
D. Głowacka
55
19
0
08 Apr 2021
Uppsala NLP at SemEval-2021 Task 2: Multilingual Language Models for Fine-tuning and Feature Extraction in Word-in-Context Disambiguation
Huiling You
Xingran Zhu
Sara Stymne
56
2
0
08 Apr 2021
Capturing Multi-Resolution Context by Dilated Self-Attention
Niko Moritz
Takaaki Hori
Jonathan Le Roux
32
7
0
07 Apr 2021
Relaxing the Conditional Independence Assumption of CTC-based ASR by Conditioning on Intermediate Predictions
Jumon Nozaki
Tatsuya Komatsu
86
75
0
06 Apr 2021
LT-LM: a novel non-autoregressive language model for single-shot lattice rescoring
Anton Mitrofanov
Mariya Korenevskaya
Ivan Podluzhny
Yuri Y. Khokhlov
A. Laptev
A. Andrusenko
A. Ilin
M. Korenevsky
Ivan Medennikov
A. Romanenko
KELM
LRM
26
2
0
06 Apr 2021
Non-autoregressive Mandarin-English Code-switching Speech Recognition
Shun-Po Chuang
Heng-Jui Chang
Sung-Feng Huang
Hung-yi Lee
82
15
0
06 Apr 2021
Dissecting User-Perceived Latency of On-Device E2E Speech Recognition
Yuan Shangguan
Rohit Prabhavalkar
Hang Su
Jay Mahadeokar
Yangyang Shi
...
Chunyang Wu
Duc Le
Ozlem Kalinli
Christian Fuegen
M. Seltzer
55
29
0
06 Apr 2021
Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion
Duc Le
Mahaveer Jain
Gil Keren
Suyoun Kim
Yangyang Shi
...
Yuan Shangguan
Christian Fuegen
Ozlem Kalinli
Yatharth Saraf
M. Seltzer
87
102
0
05 Apr 2021
Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency
Yangyang Shi
Varun K. Nagaraja
Chunyang Wu
Jay Mahadeokar
Duc Le
...
Ching-Feng Yeh
Julian Chan
Christian Fuegen
Ozlem Kalinli
M. Seltzer
55
15
0
05 Apr 2021
Semantic Distance: A New Metric for ASR Performance Analysis Towards Spoken Language Understanding
Suyoun Kim
Abhinav Arora
Duc Le
Ching-Feng Yeh
Christian Fuegen
Ozlem Kalinli
M. Seltzer
70
28
0
05 Apr 2021
SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition
Patrick K. O’Neill
Vitaly Lavrukhin
Somshubra Majumdar
Vahid Noroozi
Yuekai Zhang
...
Keenan Freyberg
Michael D. Shulman
Boris Ginsburg
Shinji Watanabe
Georg Kucsko
AI4TS
90
64
0
05 Apr 2021
Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers
Loren Lugosch
Piyush Papreja
Mirco Ravanelli
A. Heba
Titouan Parcollet
68
14
0
04 Apr 2021
On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR
Tsz Kin Lam
Mayumi Ohta
Shigehiko Schamoni
Stefan Riezler
93
27
0
03 Apr 2021
Convex Aggregation for Opinion Summarization
Hayate Iso
Xiaolan Wang
Yoshihiko Suhara
Stefanos Angelidis
W. Tan
76
35
0
03 Apr 2021
Sampling and Filtering of Neural Machine Translation Distillation Data
Vilém Zouhar
24
2
0
01 Apr 2021
UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
Mingyang Zhou
Luowei Zhou
Shuohang Wang
Yu Cheng
Linjie Li
Zhou Yu
Jingjing Liu
MLLM
VLM
94
92
0
01 Apr 2021
Sample size estimation for comparing dynamic treatment regimens in a SMART: a Monte Carlo-based approach and case study with longitudinal overdispersed count outcomes
Jamie Yap
John J. Dziak
David Kabiito
Claire Babirye
J. McKay
Bibhas Chakraborty
J. Nakatumba‐Nabende
64
0
0
31 Mar 2021
Leveraging Neural Machine Translation for Word Alignment
Vilém Zouhar
Daria Pylypenko
15
2
0
31 Mar 2021
Augmenting Poetry Composition with Verse by Verse
David C. Uthus
M. Voitovich
R. Mical
159
10
0
31 Mar 2021
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS
Ye Jia
Heiga Zen
Jonathan Shen
Yu Zhang
Yonghui Wu
SSL
103
84
0
28 Mar 2021
Variable Name Recovery in Decompiled Binary Code using Constrained Masked Language Modeling
Pratyay Banerjee
Kuntal Kumar Pal
Fish Wang
Chitta Baral
57
13
0
23 Mar 2021
Transformer-based ASR Incorporating Time-reduction Layer and Fine-tuning with Self-Knowledge Distillation
Md. Akmal Haidar
Chao Xing
Mehdi Rezagholizadeh
118
6
0
17 Mar 2021
Multi-view Subword Regularization
Xinyi Wang
Sebastian Ruder
Graham Neubig
82
46
0
15 Mar 2021
Optimal Embedding Calibration for Symbolic Music Similarity
Xinran Zhang
Maosong Sun
Jiafeng Liu
Xiaobing Li
28
1
0
13 Mar 2021
Text Mining of Stocktwits Data for Predicting Stock Prices
Mukul Jaggi
Priyanka Mandal
Shreya Narang
Usman Naseem
Matloob Khushi
AIFin
73
41
0
13 Mar 2021
Comparing the Performance of NLP Toolkits and Evaluation measures in Legal Tech
Muhammad Zohaib Khan
ELM
AILaw
29
3
0
12 Mar 2021
Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition
A. Laptev
A. Andrusenko
Ivan Podluzhny
Anton Mitrofanov
Ivan Medennikov
Yuri N. Matveev
VLM
45
14
0
12 Mar 2021
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
J. Clark
Dan Garrette
Iulia Turc
John Wieting
117
224
0
11 Mar 2021
Towards Continual Learning for Multilingual Machine Translation via Vocabulary Substitution
Xavier Garcia
Noah Constant
Ankur P. Parikh
Orhan Firat
138
46
0
11 Mar 2021
Unified Pre-training for Program Understanding and Generation
Wasi Uddin Ahmad
Saikat Chakraborty
Baishakhi Ray
Kai-Wei Chang
147
774
0
10 Mar 2021
Fine-tuning of Pre-trained End-to-end Speech Recognition with Generative Adversarial Networks
Md. Akmal Haidar
Mehdi Rezagholizadeh
111
9
0
10 Mar 2021
Variable-rate discrete representation learning
Sander Dieleman
C. Nash
Jesse Engel
Karen Simonyan
BDL
DRL
82
24
0
10 Mar 2021
Self-Learning for Zero Shot Neural Machine Translation
Surafel Melaku Lakew
Matteo Negri
Marco Turchi
38
1
0
10 Mar 2021
Overcoming Poor Word Embeddings with Word Definitions
Christopher Malon
36
3
0
05 Mar 2021
Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
Yihe Dong
Jean-Baptiste Cordonnier
Andreas Loukas
161
388
0
05 Mar 2021
Meta-Curriculum Learning for Domain Adaptation in Neural Machine Translation
Runzhe Zhan
Xuebo Liu
Derek F. Wong
Lidia S. Chao
81
46
0
03 Mar 2021
Data Augmentation for Abstractive Query-Focused Multi-Document Summarization
Ramakanth Pasunuru
Asli Celikyilmaz
Michel Galley
Chenyan Xiong
Yizhe Zhang
Joey Tianyi Zhou
Jianfeng Gao
89
41
0
02 Mar 2021
Cryptonite: A Cryptic Crossword Benchmark for Extreme Ambiguity in Language
Avia Efrat
Uri Shaham
D. Kilman
Omer Levy
ELM
63
18
0
01 Mar 2021
OmniNet: Omnidirectional Representations from Transformers
Yi Tay
Mostafa Dehghani
V. Aribandi
Jai Gupta
Philip Pham
Zhen Qin
Dara Bahri
Da-Cheng Juan
Donald Metzler
113
30
0
01 Mar 2021
Unbiased Sentence Encoder For Large-Scale Multi-lingual Search Engines
Mahdi Hajiaghayi
Monir Hajiaghayi
Mark R. Bolin
34
0
0
01 Mar 2021
RuSentEval: Linguistic Source, Encoder Force!
Vladislav Mikhailov
Ekaterina Taktasheva
Elina Sigdel
Ekaterina Artemova
VLM
36
6
0
28 Feb 2021
Gradient-guided Loss Masking for Neural Machine Translation
Xinyi Wang
Ankur Bapna
Melvin Johnson
Orhan Firat
65
9
0
26 Feb 2021
Automated essay scoring using efficient transformer-based language models
C. Ormerod
Akanksha Malhotra
Amir Jafari
46
31
0
25 Feb 2021
Do Transformer Modifications Transfer Across Implementations and Applications?
Sharan Narang
Hyung Won Chung
Yi Tay
W. Fedus
Thibault Févry
...
Wei Li
Nan Ding
Jake Marcus
Adam Roberts
Colin Raffel
100
127
0
23 Feb 2021
Evaluating Contextualized Language Models for Hungarian
Judit Ács
Dániel Lévai
D. Nemeskey
András Kornai
27
1
0
22 Feb 2021
End-to-End Neural Systems for Automatic Children Speech Recognition: An Empirical Study
Prashanth Gurunath Shivakumar
Shrikanth Narayanan
53
54
0
19 Feb 2021