ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

19 August 2018
Taku Kudo
John Richardson
arXiv: 1808.06226 · GitHub: 10925★

Papers citing "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"

50 / 1,950 papers shown
Title
Latent Programmer: Discrete Latent Codes for Program Synthesis
Joey Hong
David Dohan
Rishabh Singh
Charles Sutton
Manzil Zaheer
131
23
0
01 Dec 2020
Bootstrap an end-to-end ASR system by multilingual training, transfer learning, text-to-text mapping and synthetic audio
Manuel Giollo
Deniz Gunceler
Yulan Liu
D. Willett
45
12
0
25 Nov 2020
Streaming Multi-speaker ASR with RNN-T
Ilya Sklyar
A. Piunova
Yulan Liu
80
37
0
23 Nov 2020
Using Synthetic Audio to Improve The Recognition of Out-Of-Vocabulary Words in End-To-End ASR Systems
Xianrui Zheng
Yulan Liu
Deniz Gunceler
D. Willett
129
79
0
23 Nov 2020
Evaluating Input Representation for Language Identification in Hindi-English Code Mixed Text
Ramchandra Joshi
Raviraj Joshi
43
14
0
23 Nov 2020
Facebook AI's WMT20 News Translation Task Submission
Peng-Jen Chen
Ann Lee
Changhan Wang
Naman Goyal
Angela Fan
Mary Williamson
Jiatao Gu
VLM
99
37
0
16 Nov 2020
End-to-end spoken language understanding using transformer networks and self-supervised pre-trained features
E. Morais
H. Kuo
Samuel Thomas
Zoltán Tüske
Brian Kingsbury
41
12
0
16 Nov 2020
Deep Shallow Fusion for RNN-T Personalization
Duc Le
Gil Keren
Julian Chan
Jay Mahadeokar
Christian Fuegen
M. Seltzer
76
80
0
16 Nov 2020
EDITOR: an Edit-Based Transformer with Repositioning for Neural Machine Translation with Soft Lexical Constraints
Weijia Xu
Marine Carpuat
KELM
64
47
0
13 Nov 2020
Towards Semi-Supervised Semantics Understanding from Speech
Cheng-I Jeff Lai
Jin Cao
S. Bodapati
Shang-Wen Li
SSL
90
7
0
11 Nov 2020
Low-Resource Adaptation of Neural NLP Models
Farhad Nooralahzadeh
78
0
0
09 Nov 2020
Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations
Shahram Ghorbani
Yashesh Gaur
Yu Shi
Jinyu Li
69
14
0
08 Nov 2020
Dual Application of Speech Enhancement for Automatic Speech Recognition
Ashutosh Pandey
Chunxi Liu
Yun Wang
Yatharth Saraf
86
37
0
07 Nov 2020
Improving RNN Transducer Based ASR with Auxiliary Tasks
Chunxi Liu
Frank Zhang
Duc Le
Suyoun Kim
Yatharth Saraf
Geoffrey Zweig
87
49
0
05 Nov 2020
Alignment Restricted Streaming Recurrent Neural Network Transducer
Jay Mahadeokar
Yuan Shangguan
Duc Le
Gil Keren
Hang Su
Thong Le
Ching-Feng Yeh
Christian Fuegen
M. Seltzer
AI4TS
70
66
0
05 Nov 2020
Investigating Societal Biases in a Poetry Composition System
Emily Sheng
David C. Uthus
83
53
0
05 Nov 2020
Indic-Transformers: An Analysis of Transformer Language Models for Indian Languages
Kushal Kumar Jain
Adwait Deshpande
Kumar Shridhar
F. Laumann
Ayushman Dash
78
52
0
04 Nov 2020
Data Augmentation for End-to-end Code-switching Speech Recognition
Chenpeng Du
Hao Li
Yizhou Lu
Lan Wang
Y. Qian
57
28
0
04 Nov 2020
PheMT: A Phenomenon-wise Dataset for Machine Translation Robustness on User-Generated Contents
Ryoske Fujii
Masato Mita
Kaori Abe
Kazuaki Hanawa
Makoto Morishita
Jun Suzuki
Kentaro Inui
43
5
0
04 Nov 2020
SimulMT to SimulST: Adapting Simultaneous Text Translation to End-to-End Simultaneous Speech Translation
Xutai Ma
J. Pino
Philipp Koehn
72
97
0
03 Nov 2020
Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis
Desh Raj
Pavel Denisov
Zhuo Chen
Hakan Erdogan
Zili Huang
...
Yi Luo
Naoyuki Kanda
Jinyu Li
Scott Wisdom
J. Hershey
63
88
0
03 Nov 2020
Streaming Attention-Based Models with Augmented Memory for End-to-End Speech Recognition
Ching-Feng Yeh
Yongqiang Wang
Yangyang Shi
Chunyang Wu
Frank Zhang
Julian Chan
M. Seltzer
AI4TS RALM
76
8
0
03 Nov 2020
Multitask Learning and Joint Optimization for Transformer-RNN-Transducer Speech Recognition
J. Jeon
Eesung Kim
39
13
0
02 Nov 2020
SMRT Chatbots: Improving Non-Task-Oriented Dialog with Simulated Multiple Reference Training
Huda Khayrallah
João Sedoc
OffRL
33
1
0
01 Nov 2020
VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation
Fuli Luo
Wei Wang
Jiahao Liu
Yijia Liu
Bin Bi
Songfang Huang
Fei Huang
Luo Si
106
52
0
30 Oct 2020
Semi-Supervised Speech Recognition via Graph-based Temporal Classification
Niko Moritz
Takaaki Hori
Jonathan Le Roux
107
28
0
29 Oct 2020
CASS-NAT: CTC Alignment-based Single Step Non-autoregressive Transformer for Speech Recognition
Ruchao Fan
Wei Chu
Peng Chang
Jing Xiao
53
36
0
28 Oct 2020
Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora
Takashi Wada
Tomoharu Iwata
Yuji Matsumoto
Timothy Baldwin
Jey Han Lau
107
7
0
27 Oct 2020
Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus
Isaac Caswell
Theresa Breiner
D. Esch
Ankur Bapna
92
90
0
27 Oct 2020
Recent Developments on ESPnet Toolkit Boosted by Conformer
Pengcheng Guo
Florian Boyer
Xuankai Chang
Tomoki Hayashi
Yosuke Higuchi
...
Jing Shi
Shinji Watanabe
Kun Wei
Wangyou Zhang
Yuekai Zhang
89
263
0
26 Oct 2020
Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer
Suyoun Kim
Yuan Shangguan
Jay Mahadeokar
A. Bruguier
Christian Fuegen
M. Seltzer
Duc Le
63
29
0
26 Oct 2020
Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining
Cheng-I Jeff Lai
Yung-Sung Chuang
Hung-yi Lee
Shang-Wen Li
James R. Glass
VLM SSL
90
60
0
26 Oct 2020
Revisiting Neural Language Modelling with Syllables
Arturo Oncevay
Kervy Rivas Rojas
50
2
0
24 Oct 2020
When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models
Benjamin Muller
Antonis Anastasopoulos
Benoît Sagot
Djamé Seddah
LRM
209
170
0
24 Oct 2020
Rethinking embedding coupling in pre-trained language models
Hyung Won Chung
Thibault Févry
Henry Tsai
Melvin Johnson
Sebastian Ruder
172
143
0
24 Oct 2020
Measuring the 'I don't know' Problem through the Lens of Gricean Quantity
Huda Khayrallah
João Sedoc
43
4
0
24 Oct 2020
Improving Multilingual Models with Language-Clustered Vocabularies
Hyung Won Chung
Dan Garrette
Kiat Chuan Tan
Jason Riesa
VLM
129
65
0
24 Oct 2020
Rapid Domain Adaptation for Machine Translation with Monolingual Data
Mahdis Mahdieh
Mengzhao Chen
Yuan Cao
Orhan Firat
75
7
0
23 Oct 2020
Event-Driven Learning of Systematic Behaviours in Stock Markets
Xianchao Wu
AIFin
49
7
0
23 Oct 2020
Pretraining and Fine-Tuning Strategies for Sentiment Analysis of Latvian Tweets
Gaurish Thakkar
Marcis Pinnis
95
9
0
23 Oct 2020
BARThez: a Skilled Pretrained French Sequence-to-Sequence Model
Moussa Kamal Eddine
A. Tixier
Michalis Vazirgiannis
BDL
137
65
0
23 Oct 2020
UniCase -- Rethinking Casing in Language Models
Rafal Powalski
Tomasz Stanislawek
36
4
0
22 Oct 2020
mT5: A massively multilingual pre-trained text-to-text transformer
Linting Xue
Noah Constant
Adam Roberts
Mihir Kale
Rami Al-Rfou
Aditya Siddhant
Aditya Barua
Colin Raffel
162
2,569
0
22 Oct 2020
XOR QA: Cross-lingual Open-Retrieval Question Answering
Akari Asai
Jungo Kasai
J. Clark
Kenton Lee
Eunsol Choi
Hannaneh Hajishirzi
129
152
0
22 Oct 2020
Towards Fully Bilingual Deep Language Modeling
Li-Hsin Chang
S. Pyysalo
Jenna Kanerva
Filip Ginter
67
3
0
22 Oct 2020
SlimIPL: Language-Model-Free Iterative Pseudo-Labeling
Tatiana Likhomanenko
Qiantong Xu
Jacob Kahn
Gabriel Synnaeve
R. Collobert
VLM
136
65
0
22 Oct 2020
Self-training and Pre-training are Complementary for Speech Recognition
Qiantong Xu
Alexei Baevski
Tatiana Likhomanenko
Paden Tomasello
Alexis Conneau
R. Collobert
Gabriel Synnaeve
Michael Auli
SSL VLM
141
173
0
22 Oct 2020
Stronger Transformers for Neural Multi-Hop Question Generation
Devendra Singh Sachan
Lingfei Wu
Mrinmaya Sachan
William L. Hamilton
34
8
0
22 Oct 2020
Cascaded Models With Cyclic Feedback For Direct Speech Translation
Tsz Kin Lam
Shigehiko Schamoni
Stefan Riezler
94
13
0
21 Oct 2020
Beyond English-Centric Multilingual Machine Translation
Angela Fan
Shruti Bhosale
Holger Schwenk
Zhiyi Ma
Ahmed El-Kishky
...
Vitaliy Liptchinsky
Sergey Edunov
Edouard Grave
Michael Auli
Armand Joulin
LRM
98
861
0
21 Oct 2020