ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

19 August 2018
Taku Kudo
John Richardson
arXiv:1808.06226 (abs) | PDF | HTML | GitHub (10,925★)

Papers citing "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"

50 / 1,950 papers shown
LongFNT: Long-form Speech Recognition with Factorized Neural Transducer
Xun Gong
Yu-Huan Wu
Jinyu Li
Shujie Liu
Rui Zhao
Xie Chen
Y. Qian
RALM
67
11
0
17 Nov 2022
Speaker Adaptation for End-To-End Speech Recognition Systems in Noisy Environments
Dominik Wagner
Ilja Baumann
Sebastian P. Bayerl
Korbinian Riedhammer
Tobias Bocklet
77
2
0
16 Nov 2022
A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive Coding Networks
Tommaso Salvatori
Yuhang Song
Yordan Yordanov
Beren Millidge
Zheng R. Xu
Lei Sha
Cornelius Emde
Rafal Bogacz
Thomas Lukasiewicz
99
13
0
16 Nov 2022
Findings of the Covid-19 MLIA Machine Translation Task
F. Casacuberta
Alexandru Ceausu
K. Choukri
Miltos Deligiannis
Miguel Domingo
...
V. Papavassiliou
Stelios Piperidis
Prokopis Prokopidis
Dimitris Roussis
M. Salah
30
0
0
14 Nov 2022
Calibrated Interpretation: Confidence Estimation in Semantic Parsing
Elias Stengel-Eskin
Benjamin Van Durme
UQLM
160
25
0
14 Nov 2022
ALBERT with Knowledge Graph Encoder Utilizing Semantic Similarity for Commonsense Question Answering
Byeongmin Choi
Yong-Sook Lee
Yeunwoong Kyung
Eunchan Kim
55
10
0
14 Nov 2022
Addressing Segmentation Ambiguity in Neural Linguistic Steganography
Jumon Nozaki
Yugo Murawaki
44
5
0
12 Nov 2022
Speech-to-Speech Translation For A Real-world Unwritten Language
Peng-Jen Chen
Ke M. Tran
Yilin Yang
Jingfei Du
Justine T. Kao
...
Sravya Popuri
Changhan Wang
J. Pino
Wei-Ning Hsu
Ann Lee
93
26
0
11 Nov 2022
Using Developer Discussions to Guide Fixing Bugs in Software
Sheena Panthaplackel
Miloš Gligorić
Junyi Jessy Li
Raymond J. Mooney
56
5
0
11 Nov 2022
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop:
Teven Le Scao
Angela Fan
Christopher Akiki
...
Zhongli Xie
Zifan Ye
M. Bras
Younes Belkada
Thomas Wolf
VLM
474
2,398
0
09 Nov 2022
Self-conditioned Embedding Diffusion for Text Generation
Robin Strudel
Corentin Tallec
Florent Altché
Yilun Du
Yaroslav Ganin
...
Will Grathwohl
Nikolay Savinov
Sander Dieleman
Laurent Sifre
Rémi Leblond
DiffM
89
88
0
08 Nov 2022
Conciseness: An Overlooked Language Task
Felix Stahlberg
Aashish Kumar
Chris Alberti
Shankar Kumar
40
1
0
08 Nov 2022
Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition
Yashesh Gaur
Nick Kibre
Jian Xue
Kangyuan Shu
Yuhui Wang
Issac Alphonso
Jinyu Li
Jiawei Liu
34
7
0
07 Nov 2022
Predictive Coding beyond Gaussian Distributions
Luca Pinchetti
Tommaso Salvatori
Yordan Yordanov
Beren Millidge
Yuhang Song
Thomas Lukasiewicz
UQCV, BDL
74
11
0
07 Nov 2022
Biased Self-supervised learning for ASR
Florian Kreyssig
Yangyang Shi
Jinxi Guo
Leda Sari
Abdel-rahman Mohamed
P. Woodland
SSL
87
3
0
04 Nov 2022
Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system
Li Li
Dongxing Xu
Haoran Wei
Yanhua Long
98
2
0
03 Nov 2022
Continual Learning of Neural Machine Translation within Low Forgetting Risk Regions
Shuhao Gu
Bojie Hu
Yang Feng
CLL
85
15
0
03 Nov 2022
Conversation-oriented ASR with multi-look-ahead CBS architecture
Huaibo Zhao
S. Fujie
Tetsuji Ogawa
Jin Sakuma
Yusuke Kida
Tetsunori Kobayashi
92
3
0
02 Nov 2022
Fast and parallel decoding for transducer
Wei Kang
Liyong Guo
Fangjun Kuang
Long Lin
Mingshuang Luo
Zengwei Yao
Xiaoyu Yang
Piotr Żelasko
Daniel Povey
AI4TS
80
17
0
31 Oct 2022
Efficient Speech Translation with Dynamic Latent Perceivers
Ioannis Tsiamas
Gerard I. Gállego
José A. R. Fonollosa
Marta R. Costa-jussà
52
3
0
28 Oct 2022
Modeling structure-building in the brain with CCG parsing and large language models
Miloš Stanojević
Jonathan Brennan
Donald Dunagan
Mark Steedman
John T. Hale
44
14
0
28 Oct 2022
Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition
Yist Y. Lin
Tao Han
Haihua Xu
Van Tung Pham
Yerbolat Khassanov
Tze Yuang Chong
Yi He
Lu Lu
Zejun Ma
65
2
0
28 Oct 2022
Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation
Nobuyuki Morioka
Heiga Zen
Nanxin Chen
Yu Zhang
Yifan Ding
98
16
0
28 Oct 2022
Domain Adaptation of Machine Translation with Crowdworkers
Makoto Morishita
Jun Suzuki
Masaaki Nagata
44
3
0
28 Oct 2022
Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models
Siddhant Arora
Siddharth Dalmia
Brian Yan
Florian Metze
A. Black
Shinji Watanabe
37
12
0
27 Oct 2022
ACES: Translation Accuracy Challenge Sets for Evaluating Machine Translation Metrics
Chantal Amrhein
Nikita Moghe
Liane Guillou
ELM
106
23
0
27 Oct 2022
Make More of Your Data: Minimal Effort Data Augmentation for Automatic Speech Recognition and Translation
Tsz Kin Lam
Shigehiko Schamoni
Stefan Riezler
VLM
86
10
0
27 Oct 2022
Can language models handle recursively nested grammatical structures? A case study on comparing models and humans
Andrew Kyle Lampinen
ReLM, ELM
121
36
0
27 Oct 2022
Weight Averaging: A Simple Yet Effective Method to Overcome Catastrophic Forgetting in Automatic Speech Recognition
Steven Vander Eeckt
Hugo Van hamme
CLL, MoMe
113
15
0
27 Oct 2022
Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models
Harshita Diddee
Sandipan Dandapat
Monojit Choudhury
T. Ganu
Kalika Bali
79
5
0
27 Oct 2022
End-to-End Speech to Intent Prediction to improve E-commerce Customer Support Voicebot in Hindi and English
Abhinav Goyal
Ashutosh Kumar Singh
Nikesh Garera
42
4
0
26 Oct 2022
Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning
Barun Patra
Saksham Singhal
Shaohan Huang
Zewen Chi
Li Dong
Furu Wei
Vishrav Chaudhary
Xia Song
127
24
0
26 Oct 2022
Towards automatic generation of Piping and Instrumentation Diagrams (P&IDs) with Artificial Intelligence
Edwin Hirtreiter
Lukas Schulze Balhorn
Artur M. Schweidtmann
AI4CE
48
20
0
26 Oct 2022
Towards Better Few-Shot and Finetuning Performance with Forgetful Causal Language Models
Hao Liu
Xinyang Geng
Lisa Lee
Igor Mordatch
Sergey Levine
Sharan Narang
Pieter Abbeel
KELM, CLL
89
2
0
24 Oct 2022
ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition
Sanchit Gandhi
Patrick von Platen
Alexander M. Rush
70
25
0
24 Oct 2022
Finding Memo: Extractive Memorization in Constrained Sequence Generation Tasks
Vikas Raunak
Arul Menezes
71
14
0
24 Oct 2022
Bootstrapping meaning through listening: Unsupervised learning of spoken sentence embeddings
Jian Zhu
Zuoyu Tian
Yadong Liu
Cong Zhang
Chia-wen Lo
SSL
82
2
0
23 Oct 2022
Translation Word-Level Auto-Completion: What can we achieve out of the box?
Yasmin Moslem
Rejwanul Haque
Andy Way
100
5
0
23 Oct 2022
Additive Interventions Yield Robust Multi-Domain Machine Translation Models
Elijah Matthew Rippeth
Matt Post
20
0
0
23 Oct 2022
Information-Transport-based Policy for Simultaneous Translation
Shaolei Zhang
Yang Feng
104
52
0
22 Oct 2022
Guided contrastive self-supervised pre-training for automatic speech recognition
Aparna Khare
Minhua Wu
Saurabhchand Bhati
J. Droppo
Roland Maas
SSL
59
0
0
22 Oct 2022
Audio-to-Intent Using Acoustic-Textual Subword Representations from End-to-End ASR
Pranay Dighe
Prateeth Nayak
Oggi Rudovic
Erik Marchi
Xiaochuan Niu
Ahmed H. Tewfik
84
4
0
21 Oct 2022
$m^4Adapter$: Multilingual Multi-Domain Adaptation for Machine Translation with a Meta-Adapter
Wen Lai
Alexandra Chronopoulou
Alexander Fraser
80
3
0
21 Oct 2022
SIT at MixMT 2022: Fluent Translation Built on Giant Pre-trained Models
A. Khan
Hrishikesh Kanade
G. Budhrani
Preet Jhanglani
Jia Xu
138
2
0
21 Oct 2022
Separating Grains from the Chaff: Using Data Filtering to Improve Multilingual Translation for Low-Resourced African Languages
Idris Abdulmumin
Michael Beukman
Jesujoba Oluwadara Alabi
Chris C. Emezue
Everlyn Asiko
...
Shamsuddeen Hassan Muhammad
Mofetoluwa Adeyemi
Oreen Yousuf
Sahib Singh
T. Gwadabe
105
9
0
19 Oct 2022
Simultaneous Translation for Unsegmented Input: A Sliding Window Approach
Sukanta Sen
Ondrej Bojar
Barry Haddow
21
4
0
18 Oct 2022
Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding
Ruchao Fan
Guoli Ye
Yashesh Gaur
Jinyu Li
38
4
0
16 Oct 2022
A Policy-based Approach to the SpecAugment Method for Low Resource E2E ASR
Rui Li
Guodong Ma
Dexin Zhao
Ranran Zeng
Xiaoyu Li
Haolin Huang
69
2
0
16 Oct 2022
HashFormers: Towards Vocabulary-independent Pre-trained Transformers
Huiyin Xue
Nikolaos Aletras
51
4
0
14 Oct 2022
On Compressing Sequences for Self-Supervised Speech Models
Yen Meng
Hsuan-Jui Chen
Jiatong Shi
Shinji Watanabe
Paola García
Hung-yi Lee
Hao Tang
SSL
56
15
0
13 Oct 2022