SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

19 August 2018
Taku Kudo
John Richardson
ArXiv (abs) | PDF | HTML | GitHub (10925★)

Papers citing "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"

50 / 1,950 papers shown
Neural Networks for Entity Matching: A Survey
Nils Barlaug
J. Gulla
132
96
0
21 Oct 2020
Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition
Yangyang Shi
Yongqiang Wang
Chunyang Wu
Ching-Feng Yeh
Julian Chan
Frank Zhang
Duc Le
M. Seltzer
187
172
0
21 Oct 2020
Towards End-to-End In-Image Neural Machine Translation
Elman Mansimov
Mitchell Stern
Mengzhao Chen
Orhan Firat
Jakob Uszkoreit
Puneet Jain
75
26
0
20 Oct 2020
Revisiting Modularized Multilingual NMT to Meet Industrial Demands
Sungwon Lyu
Bokyung Son
Kichang Yang
Jaekyoung Bae
MoE
64
20
0
19 Oct 2020
Mixed-Lingual Pre-training for Cross-lingual Summarization
Ruochen Xu
Chenguang Zhu
Yu Shi
Michael Zeng
Xuedong Huang
57
26
0
18 Oct 2020
HABERTOR: An Efficient and Effective Deep Hatespeech Detector
T. Tran
Yifan Hu
Changwei Hu
Kevin Yen
Fei Tan
Kyumin Lee
Serim Park
VLM
90
32
0
17 Oct 2020
Cross-Lingual Relation Extraction with Transformers
Jian Ni
Taesun Moon
Parul Awasthy
Radu Florian
ViT
35
6
0
16 Oct 2020
Lightweight End-to-End Speech Recognition from Raw Audio Data Using Sinc-Convolutions
Ludwig Kurzinger
Nicolas Lindae
Palle Klewitz
Gerhard Rigoll
54
5
0
15 Oct 2020
Pretrained Transformers for Text Ranking: BERT and Beyond
Jimmy J. Lin
Rodrigo Nogueira
Andrew Yates
VLM
387
628
0
13 Oct 2020
The Tatoeba Translation Challenge -- Realistic Data Sets for Low Resource and Multilingual MT
Jörg Tiedemann
234
171
0
13 Oct 2020
Mathematical Word Problem Generation from Commonsense Knowledge Graph and Equations
Tianqiao Liu
Qian Fang
Wenbiao Ding
Hang Li
Zhongqin Wu
Zitao Liu
120
29
0
13 Oct 2020
Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models
Zirui Wang
Yulia Tsvetkov
Orhan Firat
Yuan Cao
79
202
0
12 Oct 2020
Controlled Hallucinations: Learning to Generate Faithfully from Noisy Data
Katja Filippova
68
113
0
12 Oct 2020
TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation
Dongxu Li
Chenchen Xu
Xin Yu
Kaihao Zhang
Ben Swift
H. Suominen
Hongdong Li
SLR
54
124
0
12 Oct 2020
fairseq S2T: Fast Speech-to-Text Modeling with fairseq
Changhan Wang
Yun Tang
Xutai Ma
Anne Wu
Sravya Popuri
Dmytro Okhonko
J. Pino
VLM
LRM
107
276
0
11 Oct 2020
Multichannel Generative Language Model: Learning All Possible Factorizations Within and Across Channels
Harris Chan
J. Kiros
William Chan
LRM
21
0
0
09 Oct 2020
On the importance of pre-training data volume for compact language models
Vincent Micheli
Martin d'Hoffschmidt
François Fleuret
67
42
0
08 Oct 2020
Super-Human Performance in Online Low-latency Recognition of Conversational Speech
T. Nguyen
S. Stueker
A. Waibel
BDL
74
38
0
07 Oct 2020
Cross-lingual Extended Named Entity Classification of Wikipedia Articles
Viet The Bui
Hong Phuong Le
13
2
0
07 Oct 2020
Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers
Yimeng Wu
Peyman Passban
Mehdi Rezagholizadeh
Qun Liu
MoE
60
35
0
06 Oct 2020
An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks
Kyubyong Park
Joohong Lee
Seongbo Jang
Dawoon Jung
64
65
0
06 Oct 2020
Multi-task Learning for Multilingual Neural Machine Translation
Yiren Wang
Chengxiang Zhai
Hany Awadalla
88
69
0
06 Oct 2020
Efficient Inference For Neural Machine Translation
Y. Hsu
Sarthak Garg
Yi-Hsiu Liao
Ilya Chatsviorkin
AI4CE
50
12
0
06 Oct 2020
An Ensemble Approach for Automatic Structuring of Radiology Reports
Morteza Pourreza Shahri
A. Tahmasebi
Bingyang Ye
Henghui Zhu
J. Aslam
T. Ferris
13
2
0
05 Oct 2020
Gauravarora@HASOC-Dravidian-CodeMix-FIRE2020: Pre-training ULMFiT on Synthetically Generated Code-Mixed Data for Hate Speech Detection
Gaurav Arora
41
27
0
05 Oct 2020
The Grammar of Emergent Languages
Oskar van der Wal
S. D. Boer
Elia Bruni
Dieuwke Hupkes
76
16
0
05 Oct 2020
Improving Target-side Lexical Transfer in Multilingual Neural Machine Translation
Luyu Gao
Xinyi Wang
Graham Neubig
102
7
0
04 Oct 2020
Differentiable Weighted Finite-State Transducers
Awni Y. Hannun
Vineel Pratap
Jacob Kahn
Wei-Ning Hsu
118
29
0
02 Oct 2020
STIL -- Simultaneous Slot Filling, Translation, Intent Classification, and Language Identification: Initial Results using mBART on MultiATIS++
Jack G. M. FitzGerald
69
13
0
02 Oct 2020
Nearest Neighbor Machine Translation
Urvashi Khandelwal
Angela Fan
Dan Jurafsky
Luke Zettlemoyer
M. Lewis
RALM
78
288
0
01 Oct 2020
On Romanization for Model Transfer Between Scripts in Neural Machine Translation
Chantal Amrhein
Rico Sennrich
80
15
0
30 Sep 2020
Rethinking Attention with Performers
K. Choromanski
Valerii Likhosherstov
David Dohan
Xingyou Song
Andreea Gane
...
Afroz Mohiuddin
Lukasz Kaiser
David Belanger
Lucy J. Colwell
Adrian Weller
194
1,605
0
30 Sep 2020
Generative latent neural models for automatic word alignment
Anh Khoa Ngo Ho
François Yvon
DRL
56
2
0
28 Sep 2020
iNLTK: Natural Language Toolkit for Indic Languages
Gaurav Arora
VLM
61
66
0
26 Sep 2020
A little goes a long way: Improving toxic language classification despite data scarcity
Mika Juuti
Tommi Gröndahl
Adrian Flanagan
Nirmal Asokan
94
25
0
25 Sep 2020
The importance of fillers for text representations of speech transcripts
Tanvi Dinkar
Pierre Colombo
Matthieu Labeau
Chloé Clavel
150
24
0
23 Sep 2020
Harnessing Multilinguality in Unsupervised Machine Translation for Rare Languages
Xavier Garcia
Aditya Siddhant
Orhan Firat
Ankur P. Parikh
86
31
0
23 Sep 2020
A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline
Yerbolat Khassanov
Saida Mussakhojayeva
A. Mirzakhmetov
A. Adiyev
Mukhamet Nurpeiissov
H. A. Varol
57
31
0
22 Sep 2020
Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation
Tahmid Hasan
Abhik Bhattacharjee
Kazi Samin Mubasshir
Masum Hasan
Madhusudan Basak
M. Rahman
Rifat Shahriyar
VLM
82
77
0
20 Sep 2020
Will it Unblend?
Yuval Pinter
Cassandra L. Jacobs
Jacob Eisenstein
66
14
0
18 Sep 2020
Automated Source Code Generation and Auto-completion Using Deep Learning: Comparing and Discussing Current Language-Model-Related Approaches
Juan Cruz-Benito
Sanjay Vishwakarma
Francisco Martín-Fernández
Ismael Faro (IBM Quantum)
66
31
0
16 Sep 2020
Extremely Low Bit Transformer Quantization for On-Device Neural Machine Translation
Insoo Chung
Byeongwook Kim
Yoonjung Choi
S. Kwon
Yongkweon Jeon
Baeseong Park
Sangha Kim
Dongsoo Lee
MQ
91
27
0
16 Sep 2020
Multi-span Style Extraction for Generative Reading Comprehension
Junjie Yang
Zhuosheng Zhang
Hai Zhao
SyDa
51
14
0
15 Sep 2020
Iterative Refinement in the Continuous Space for Non-Autoregressive Neural Machine Translation
Jason D. Lee
Raphael Shu
Kyunghyun Cho
61
26
0
15 Sep 2020
KoSpeech: Open-Source Toolkit for End-to-End Korean Speech Recognition
Soohwan Kim
Seyoung Bae
Cheolhwang Won
VLM
24
5
0
07 Sep 2020
UPB at SemEval-2020 Task 8: Joint Textual and Visual Modeling in a Multi-Task Learning Architecture for Memotion Analysis
G. Vlad
George-Eduard Zaharia
Dumitru-Clementin Cercel
Costin-Gabriel Chiru
Stefan Trausan-Matu
69
31
0
06 Sep 2020
GREEK-BERT: The Greeks visiting Sesame Street
John Koutsikakis
Ilias Chalkidis
Prodromos Malakasiotis
Ion Androutsopoulos
70
92
0
27 Aug 2020
Multi-Label Sentiment Analysis on 100 Languages with Dynamic Weighting for Label Imbalance
Selim F. Yilmaz
E. Kaynak
Aykut Koç
H. Dibeklioğlu
Suleyman S. Kozat
64
28
0
26 Aug 2020
JokeMeter at SemEval-2020 Task 7: Convolutional humor
Martin Docekal
Martin Fajcik
Josef Jon
Pavel Smrz
58
2
0
25 Aug 2020
HinglishNLP: Fine-tuned Language Models for Hinglish Sentiment Detection
Meghana Bhange
Nirant Kasliwal
38
7
0
22 Aug 2020