ResearchTrend.AI
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

19 August 2018
Taku Kudo
John Richardson
ArXiv (abs) | PDF | HTML | Github (10925★)

Papers citing "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"

50 / 1,950 papers shown
Title
A Masked Segmental Language Model for Unsupervised Natural Language Segmentation
C.M. Downey
Fei Xia
Gina-Anne Levow
Shane Steinert-Threlkeld
30
13
0
16 Apr 2021
Sublanguage: A Serious Issue Affects Pretrained Models in Legal Domain
Nguyen Ha Thanh
Le-Minh Nguyen
ELM, AILaw
19
2
0
15 Apr 2021
Cross-Domain Label-Adaptive Stance Detection
Momchil Hardalov
Arnav Arora
Preslav Nakov
Isabelle Augenstein
92
73
0
15 Apr 2021
Span Pointer Networks for Non-Autoregressive Task-Oriented Semantic Parsing
Akshat Shrivastava
P. Chuang
Arun Babu
Shrey Desai
Abhinav Arora
Alexander Zotov
Ahmed Aly
78
21
0
15 Apr 2021
Statistically significant detection of semantic shifts using contextual word embeddings
Yang Liu
A. Medlar
D. Głowacka
55
19
0
08 Apr 2021
Uppsala NLP at SemEval-2021 Task 2: Multilingual Language Models for Fine-tuning and Feature Extraction in Word-in-Context Disambiguation
Huiling You
Xingran Zhu
Sara Stymne
56
2
0
08 Apr 2021
Capturing Multi-Resolution Context by Dilated Self-Attention
Niko Moritz
Takaaki Hori
Jonathan Le Roux
32
7
0
07 Apr 2021
Relaxing the Conditional Independence Assumption of CTC-based ASR by Conditioning on Intermediate Predictions
Jumon Nozaki
Tatsuya Komatsu
86
75
0
06 Apr 2021
LT-LM: a novel non-autoregressive language model for single-shot lattice rescoring
Anton Mitrofanov
Mariya Korenevskaya
Ivan Podluzhny
Yuri Y. Khokhlov
A. Laptev
A. Andrusenko
A. Ilin
M. Korenevsky
Ivan Medennikov
A. Romanenko
KELM, LRM
26
2
0
06 Apr 2021
Non-autoregressive Mandarin-English Code-switching Speech Recognition
Shun-Po Chuang
Heng-Jui Chang
Sung-Feng Huang
Hung-yi Lee
82
15
0
06 Apr 2021
Dissecting User-Perceived Latency of On-Device E2E Speech Recognition
Yuan Shangguan
Rohit Prabhavalkar
Hang Su
Jay Mahadeokar
Yangyang Shi
...
Chunyang Wu
Duc Le
Ozlem Kalinli
Christian Fuegen
M. Seltzer
55
29
0
06 Apr 2021
Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion
Duc Le
Mahaveer Jain
Gil Keren
Suyoun Kim
Yangyang Shi
...
Yuan Shangguan
Christian Fuegen
Ozlem Kalinli
Yatharth Saraf
M. Seltzer
87
102
0
05 Apr 2021
Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency
Yangyang Shi
Varun K. Nagaraja
Chunyang Wu
Jay Mahadeokar
Duc Le
...
Ching-Feng Yeh
Julian Chan
Christian Fuegen
Ozlem Kalinli
M. Seltzer
55
15
0
05 Apr 2021
Semantic Distance: A New Metric for ASR Performance Analysis Towards Spoken Language Understanding
Suyoun Kim
Abhinav Arora
Duc Le
Ching-Feng Yeh
Christian Fuegen
Ozlem Kalinli
M. Seltzer
70
28
0
05 Apr 2021
SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition
Patrick K. O’Neill
Vitaly Lavrukhin
Somshubra Majumdar
Vahid Noroozi
Yuekai Zhang
...
Keenan Freyberg
Michael D. Shulman
Boris Ginsburg
Shinji Watanabe
Georg Kucsko
AI4TS
90
64
0
05 Apr 2021
Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers
Loren Lugosch
Piyush Papreja
Mirco Ravanelli
A. Heba
Titouan Parcollet
68
14
0
04 Apr 2021
On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR
Tsz Kin Lam
Mayumi Ohta
Shigehiko Schamoni
Stefan Riezler
93
27
0
03 Apr 2021
Convex Aggregation for Opinion Summarization
Hayate Iso
Xiaolan Wang
Yoshihiko Suhara
Stefanos Angelidis
W. Tan
76
35
0
03 Apr 2021
Sampling and Filtering of Neural Machine Translation Distillation Data
Vilém Zouhar
24
2
0
01 Apr 2021
UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
Mingyang Zhou
Luowei Zhou
Shuohang Wang
Yu Cheng
Linjie Li
Zhou Yu
Jingjing Liu
MLLM, VLM
94
92
0
01 Apr 2021
Sample size estimation for comparing dynamic treatment regimens in a SMART: a Monte Carlo-based approach and case study with longitudinal overdispersed count outcomes
Jamie Yap
John J. Dziak
David Kabiito
Claire Babirye
J. McKay
Bibhas Chakraborty
J. Nakatumba‐Nabende
64
0
0
31 Mar 2021
Leveraging Neural Machine Translation for Word Alignment
Vilém Zouhar
Daria Pylypenko
15
2
0
31 Mar 2021
Augmenting Poetry Composition with Verse by Verse
David C. Uthus
M. Voitovich
R. Mical
159
10
0
31 Mar 2021
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS
Ye Jia
Heiga Zen
Jonathan Shen
Yu Zhang
Yonghui Wu
SSL
103
84
0
28 Mar 2021
Variable Name Recovery in Decompiled Binary Code using Constrained Masked Language Modeling
Pratyay Banerjee
Kuntal Kumar Pal
Fish Wang
Chitta Baral
57
13
0
23 Mar 2021
Transformer-based ASR Incorporating Time-reduction Layer and Fine-tuning with Self-Knowledge Distillation
Md. Akmal Haidar
Chao Xing
Mehdi Rezagholizadeh
118
6
0
17 Mar 2021
Multi-view Subword Regularization
Xinyi Wang
Sebastian Ruder
Graham Neubig
82
46
0
15 Mar 2021
Optimal Embedding Calibration for Symbolic Music Similarity
Xinran Zhang
Maosong Sun
Jiafeng Liu
Xiaobing Li
28
1
0
13 Mar 2021
Text Mining of Stocktwits Data for Predicting Stock Prices
Mukul Jaggi
Priyanka Mandal
Shreya Narang
Usman Naseem
Matloob Khushi
AIFin
73
41
0
13 Mar 2021
Comparing the Performance of NLP Toolkits and Evaluation measures in Legal Tech
Muhammad Zohaib Khan
ELM, AILaw
29
3
0
12 Mar 2021
Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition
A. Laptev
A. Andrusenko
Ivan Podluzhny
Anton Mitrofanov
Ivan Medennikov
Yuri N. Matveev
VLM
45
14
0
12 Mar 2021
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
J. Clark
Dan Garrette
Iulia Turc
John Wieting
117
224
0
11 Mar 2021
Towards Continual Learning for Multilingual Machine Translation via Vocabulary Substitution
Xavier Garcia
Noah Constant
Ankur P. Parikh
Orhan Firat
138
46
0
11 Mar 2021
Unified Pre-training for Program Understanding and Generation
Wasi Uddin Ahmad
Saikat Chakraborty
Baishakhi Ray
Kai-Wei Chang
147
774
0
10 Mar 2021
Fine-tuning of Pre-trained End-to-end Speech Recognition with Generative Adversarial Networks
Md. Akmal Haidar
Mehdi Rezagholizadeh
111
9
0
10 Mar 2021
Variable-rate discrete representation learning
Sander Dieleman
C. Nash
Jesse Engel
Karen Simonyan
BDL, DRL
82
24
0
10 Mar 2021
Self-Learning for Zero Shot Neural Machine Translation
Surafel Melaku Lakew
Matteo Negri
Marco Turchi
38
1
0
10 Mar 2021
Overcoming Poor Word Embeddings with Word Definitions
Christopher Malon
36
3
0
05 Mar 2021
Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
Yihe Dong
Jean-Baptiste Cordonnier
Andreas Loukas
161
388
0
05 Mar 2021
Meta-Curriculum Learning for Domain Adaptation in Neural Machine Translation
Runzhe Zhan
Xuebo Liu
Derek F. Wong
Lidia S. Chao
81
46
0
03 Mar 2021
Data Augmentation for Abstractive Query-Focused Multi-Document Summarization
Ramakanth Pasunuru
Asli Celikyilmaz
Michel Galley
Chenyan Xiong
Yizhe Zhang
Joey Tianyi Zhou
Jianfeng Gao
89
41
0
02 Mar 2021
Cryptonite: A Cryptic Crossword Benchmark for Extreme Ambiguity in Language
Avia Efrat
Uri Shaham
D. Kilman
Omer Levy
ELM
63
18
0
01 Mar 2021
OmniNet: Omnidirectional Representations from Transformers
Yi Tay
Mostafa Dehghani
V. Aribandi
Jai Gupta
Philip Pham
Zhen Qin
Dara Bahri
Da-Cheng Juan
Donald Metzler
113
30
0
01 Mar 2021
Unbiased Sentence Encoder For Large-Scale Multi-lingual Search Engines
Mahdi Hajiaghayi
Monir Hajiaghayi
Mark R. Bolin
34
0
0
01 Mar 2021
RuSentEval: Linguistic Source, Encoder Force!
Vladislav Mikhailov
Ekaterina Taktasheva
Elina Sigdel
Ekaterina Artemova
VLM
36
6
0
28 Feb 2021
Gradient-guided Loss Masking for Neural Machine Translation
Xinyi Wang
Ankur Bapna
Melvin Johnson
Orhan Firat
65
9
0
26 Feb 2021
Automated essay scoring using efficient transformer-based language models
C. Ormerod
Akanksha Malhotra
Amir Jafari
46
31
0
25 Feb 2021
Do Transformer Modifications Transfer Across Implementations and Applications?
Sharan Narang
Hyung Won Chung
Yi Tay
W. Fedus
Thibault Févry
...
Wei Li
Nan Ding
Jake Marcus
Adam Roberts
Colin Raffel
100
127
0
23 Feb 2021
Evaluating Contextualized Language Models for Hungarian
Judit Ács
Dániel Lévai
D. Nemeskey
András Kornai
27
1
0
22 Feb 2021
End-to-End Neural Systems for Automatic Children Speech Recognition: An Empirical Study
Prashanth Gurunath Shivakumar
Shrikanth Narayanan
53
54
0
19 Feb 2021