ResearchTrend.AI
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

19 August 2018
Taku Kudo
John Richardson
ArXiv (abs) | PDF | HTML | GitHub (10925★)

Papers citing "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"

50 / 1,950 papers shown
Title
Simultaneous Speech Translation for Live Subtitling: from Delay to Display
Alina Karakanta
Sara Papi
Matteo Negri
Marco Turchi
57
10
0
19 Jul 2021
Integrating Unsupervised Data Generation into Self-Supervised Neural Machine Translation for Low-Resource Languages
Dana Ruiter
Dietrich Klakow
Josef van Genabith
C. España-Bonet
72
9
0
19 Jul 2021
Stock Movement Prediction with Financial News using Contextualized Embedding from BERT
Qinkai Chen
AIFin
50
19
0
19 Jul 2021
FST: the FAIR Speech Translation System for the IWSLT21 Multilingual Shared Task
Yun Tang
Hongyu Gong
Xian Li
Changhan Wang
J. Pino
Holger Schwenk
Naman Goyal
61
10
0
14 Jul 2021
Between Flexibility and Consistency: Joint Generation of Captions and Subtitles
Alina Karakanta
Marco Gaido
Matteo Negri
Marco Turchi
68
9
0
13 Jul 2021
The IWSLT 2021 BUT Speech Translation Systems
Hari Krishna Vydana
Martin Karafiát
L. Burget
J. Černocký
30
2
0
13 Jul 2021
Zero-shot Speech Translation
Tu Anh Dinh
79
6
0
13 Jul 2021
Improving Speech Translation by Understanding and Learning from the Auxiliary Text Translation Task
Yun Tang
J. Pino
Xian Li
Changhan Wang
Dmitriy Genzel
175
84
0
12 Jul 2021
Noisy Training Improves E2E ASR for the Edge
Dilin Wang
Yuan Shangguan
Haichuan Yang
P. Chuang
Jiatong Zhou
Meng Li
Ganesh Venkatesh
Ozlem Kalinli
Vikas Chandra
66
4
0
09 Jul 2021
A Survey on Low-Resource Neural Machine Translation
Rui Wang
Xu Tan
Renqian Luo
Tao Qin
Tie-Yan Liu
3DV
98
61
0
09 Jul 2021
On lattice-free boosted MMI training of HMM and CTC-based full-context ASR models
Xiaohui Zhang
Vimal Manohar
David C. Zhang
Frank Zhang
Yangyang Shi
Nayan Singhal
Julian Chan
Fuchun Peng
Yatharth Saraf
M. Seltzer
83
14
0
09 Jul 2021
Advancing CTC-CRF Based End-to-End Speech Recognition with Wordpieces and Conformers
Huahuan Zheng
Wenjie Peng
Zhijian Ou
Jinsong Zhang
94
5
0
07 Jul 2021
What Helps Transformers Recognize Conversational Structure? Importance of Context, Punctuation, and Labels in Dialog Act Recognition
Piotr Żelasko
R. Pappagari
Najim Dehak
60
14
0
05 Jul 2021
Arabic Code-Switching Speech Recognition using Monolingual Data
Ahmed M. Ali
Shammur A. Chowdhury
A. Hussein
Yasser Hifny
64
24
0
04 Jul 2021
Relaxed Attention: A Simple Method to Boost Performance of End-to-End Automatic Speech Recognition
Timo Lohrenz
P. Schwarz
Zhengyang Li
Tim Fingscheidt
50
11
0
02 Jul 2021
Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition
Niko Moritz
Takaaki Hori
Jonathan Le Roux
59
21
0
02 Jul 2021
A Primer on Pretrained Multilingual Language Models
Sumanth Doddapaneni
Gowtham Ramesh
Mitesh M. Khapra
Anoop Kunchukuttan
Pratyush Kumar
LRM
115
76
0
01 Jul 2021
The USTC-NELSLIP Systems for Simultaneous Speech Translation Task at IWSLT 2021
Dan Liu
Mengge Du
Xiaoxi Li
Yuchen Hu
Lirong Dai
96
21
0
01 Jul 2021
XLM-E: Cross-lingual Language Model Pre-training via ELECTRA
Zewen Chi
Shaohan Huang
Li Dong
Shuming Ma
Bo Zheng
...
Payal Bajaj
Xia Song
Xian-Ling Mao
Heyan Huang
Furu Wei
104
121
0
30 Jun 2021
IMS' Systems for the IWSLT 2021 Low-Resource Speech Translation Task
Pavel Denisov
Manuel Mager
Ngoc Thang Vu
41
6
0
30 Jun 2021
Overview of BioASQ 2020: The eighth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering
A. Nentidis
Anastasia Krithara
K. Bougiatiotis
Martin Krallinger
Carlos Rodríguez-Penagos
Marta Villegas
George Giannakopoulos
127
32
0
28 Jun 2021
Multimodal Few-Shot Learning with Frozen Language Models
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
MLLM
218
791
0
25 Jun 2021
DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders
Shuming Ma
Li Dong
Shaohan Huang
Dongdong Zhang
Alexandre Muzio
Saksham Singhal
Hany Awadalla
Xia Song
Furu Wei
SLR, AI4CE
81
81
0
25 Jun 2021
Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature
Yu Wang
Jinchao Li
Tristan Naumann
Chenyan Xiong
Hao Cheng
...
Yang Qin
Eric Horvitz
Paul N. Bennett
Jianfeng Gao
Hoifung Poon
OOD
85
14
0
25 Jun 2021
Where are we in semantic concept extraction for Spoken Language Understanding?
Sahar Ghannay
Antoine Caubrière
Salima Mdhaffar
G. Laperriere
Bassam Jabaian
Yannick Esteve
46
18
0
24 Jun 2021
Charformer: Fast Character Transformers via Gradient-based Subword Tokenization
Yi Tay
Vinh Q. Tran
Sebastian Ruder
Jai Gupta
Hyung Won Chung
Dara Bahri
Zhen Qin
Simon Baumgartner
Cong Yu
Donald Metzler
155
162
0
23 Jun 2021
Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding
Shengjie Luo
Shanda Li
Tianle Cai
Di He
Dinglan Peng
Shuxin Zheng
Guolin Ke
Liwei Wang
Tie-Yan Liu
95
50
0
23 Jun 2021
End-to-End Lexically Constrained Machine Translation for Morphologically Rich Languages
Josef Jon
João Paulo Aires
Dušan Variš
Ondrej Bojar
58
14
0
23 Jun 2021
Information Retrieval for ZeroSpeech 2021: The Submission by University of Wroclaw
J. Chorowski
Grzegorz Ciesielski
Jaroslaw Dzikowski
Adrian Łańcucki
R. Marxer
Mateusz Opala
P. Pusz
Paweł Rychlikowski
Michal Stypulkowski
72
12
0
22 Jun 2021
CPM-2: Large-scale Cost-effective Pre-trained Language Models
Zhengyan Zhang
Yuxian Gu
Xu Han
Shengqi Chen
Chaojun Xiao
...
Minlie Huang
Wentao Han
Yang Liu
Xiaoyan Zhu
Maosong Sun
MoE
90
88
0
20 Jun 2021
JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs
Pei Ke
Haozhe Ji
Yuanyuan Ran
Xin Cui
Liwei Wang
Linfeng Song
Xiaoyan Zhu
Minlie Huang
111
97
0
19 Jun 2021
Transformers for Headline Selection for Russian News Clusters
Pavel Voropaev
Olga Sopilnyak
38
0
0
19 Jun 2021
An Improved Single Step Non-autoregressive Transformer for Automatic Speech Recognition
Ruchao Fan
Wei Chu
Peng Chang
Jing Xiao
Abeer Alwan
75
15
0
18 Jun 2021
Layer Pruning on Demand with Intermediate CTC
Jaesong Lee
Jingu Kang
Shinji Watanabe
40
18
0
17 Jun 2021
Specializing Multilingual Language Models: An Empirical Study
Ethan C. Chau
Noah A. Smith
127
27
0
16 Jun 2021
Collaborative Training of Acoustic Encoders for Speech Recognition
Varun K. Nagaraja
Yangyang Shi
Ganesh Venkatesh
Ozlem Kalinli
M. Seltzer
Vikas Chandra
88
12
0
16 Jun 2021
Consistency Regularization for Cross-Lingual Fine-Tuning
Bo Zheng
Li Dong
Shaohan Huang
Wenhui Wang
Zewen Chi
Saksham Singhal
Wanxiang Che
Ting Liu
Xia Song
Furu Wei
55
58
0
15 Jun 2021
Language Tags Matter for Zero-Shot Neural Machine Translation
Liwei Wu
Shanbo Cheng
Mingxuan Wang
Lei Li
3DV
83
37
0
15 Jun 2021
SynthASR: Unlocking Synthetic Data for Speech Recognition
A. Fazel
Wei Yang
Yulan Liu
Roberto Barra-Chicote
Yi Meng
Roland Maas
J. Droppo
SyDa
110
51
0
14 Jun 2021
Kaizen: Continuously improving teacher using Exponential Moving Average for semi-supervised speech recognition
Vimal Manohar
Tatiana Likhomanenko
Qiantong Xu
Wei-Ning Hsu
R. Collobert
Yatharth Saraf
Geoffrey Zweig
Abdel-rahman Mohamed
105
26
0
14 Jun 2021
Can BERT Dig It? -- Named Entity Recognition for Information Retrieval in the Archaeology Domain
Alex Brandsen
Suzan Verberne
K. Lambers
M. Wansleeben
55
38
0
14 Jun 2021
Evaluating Various Tokenizers for Arabic Text Classification
Zaid Alyafeai
Maged S. Al-Shaibani
Mustafa Ghaleb
Irfan Ahmad
84
44
0
14 Jun 2021
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio
Guoguo Chen
Shuzhou Chai
Guan-Bo Wang
Jiayu Du
Weiqiang Zhang
...
Xuchen Yao
Yongqing Wang
Yujun Wang
Zhao You
Zhiyong Yan
125
385
0
13 Jun 2021
Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR
Junkun Chen
Mingbo Ma
Renjie Zheng
Liang Huang
75
33
0
11 Jun 2021
Dynamic Language Models for Continuously Evolving Content
Spurthi Amba Hombaiah
Tao Chen
Mingyang Zhang
Michael Bendersky
Marc Najork
CLL, KELM
103
38
0
11 Jun 2021
Bridging Subword Gaps in Pretrain-Finetune Paradigm for Natural Language Generation
Xin Liu
Baosong Yang
Dayiheng Liu
Haibo Zhang
Weihua Luo
Min Zhang
Haiying Zhang
Jinsong Su
63
18
0
11 Jun 2021
Balanced End-to-End Monolingual pre-training for Low-Resourced Indic Languages Code-Switching Speech Recognition
A. Hussein
Shammur A. Chowdhury
Najim Dehak
Ahmed M. Ali
39
2
0
10 Jun 2021
Linguistically Informed Masking for Representation Learning in the Patent Domain
Sophia Althammer
Mark Buckley
Sebastian Hofstatter
Allan Hanbury
59
11
0
10 Jun 2021
Exploring Unsupervised Pretraining Objectives for Machine Translation
Christos Baziotis
Ivan Titov
Alexandra Birch
Barry Haddow
AAML, AI4CE
49
8
0
10 Jun 2021
Shades of BLEU, Flavours of Success: The Case of MultiWOZ
Tomás Nekvinda
Ondrej Dusek
74
59
0
10 Jun 2021