ResearchTrend.AI
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

19 August 2018
Taku Kudo
John Richardson
arXiv:1808.06226

Papers citing "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"

50 / 1,926 papers shown
Title
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
Mostafa Dehghani
Basil Mustafa
Josip Djolonga
Jonathan Heek
Matthias Minderer
...
Avital Oliver
Piotr Padlewski
A. Gritsenko
Mario Lučić
N. Houlsby
ViT
36
107
0
12 Jul 2023
PolyLM: An Open Source Polyglot Large Language Model
Xiangpeng Wei
Hao-Ran Wei
Huan Lin
Tianhao Li
Pei Zhang
...
Yu Bowen
Dayiheng Liu
Baosong Yang
Fei Huang
Jun Xie
LRM
48
57
0
12 Jul 2023
Large Language Models as General Pattern Machines
Suvir Mirchandani
F. Xia
Peter R. Florence
Brian Ichter
Danny Driess
Montse Gonzalez Arenas
Kanishka Rao
Dorsa Sadigh
Andy Zeng
LLMAG
67
187
0
10 Jul 2023
Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing
Tom Sherborne
Tom Hosking
Mirella Lapata
OT
29
4
0
09 Jul 2023
On decoder-only architecture for speech-to-text and large language model integration
Jian Wu
Yashesh Gaur
Zhuo Chen
Long Zhou
Yilun Zhu
...
Jinyu Li
Shujie Liu
Bo Ren
Linquan Liu
Yu-Huan Wu
AuLLM
41
122
0
08 Jul 2023
Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments
Sara Papi
Peidong Wang
Junkun Chen
Jian Xue
Jinyu Li
Yashesh Gaur
43
8
0
07 Jul 2023
Vision Language Transformers: A Survey
Clayton Fields
C. Kennington
VLM
36
5
0
06 Jul 2023
Focused Transformer: Contrastive Training for Context Scaling
Szymon Tworkowski
Konrad Staniszewski
Mikolaj Pacek
Yuhuai Wu
Henryk Michalewski
Piotr Miłoś
39
136
0
06 Jul 2023
Improving Language Plasticity via Pretraining with Active Forgetting
Yihong Chen
Kelly Marchisio
Roberta Raileanu
David Ifeoluwa Adelani
Pontus Stenetorp
Sebastian Riedel
Mikel Artetxe
KELM
AI4CE
CLL
47
24
0
03 Jul 2023
Challenges in Domain-Specific Abstractive Summarization and How to Overcome them
Anum Afzal
Juraj Vladika
Daniel Braun
Florian Matthes
HILM
41
10
0
03 Jul 2023
Make Text Unlearnable: Exploiting Effective Patterns to Protect Personal Data
Xinzhe Li
Ming Liu
Shang Gao
MU
63
8
0
02 Jul 2023
SMILE: Evaluation and Domain Adaptation for Social Media Language Understanding
Vasilisa Bashlovkina
Riley Matthews
Zhaobin Kuang
Simon Baumgartner
Michael Bendersky
46
4
0
30 Jun 2023
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
Lijun Yu
Yong Cheng
Zhiruo Wang
Vivek Kumar
Wolfgang Macherey
...
Yonatan Bisk
Ming-Hsuan Yang
Kevin Patrick Murphy
Alexander G. Hauptmann
Lu Jiang
MLLM
27
52
0
30 Jun 2023
X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents
M. Moradshahi
Tianhao Shen
Kalika Bali
Monojit Choudhury
Gaël de Chalendar
...
Michael Sun
Aditya Yadavalli
Chaobin You
Deyi Xiong
M. Lam
44
8
0
30 Jun 2023
A Formal Perspective on Byte-Pair Encoding
Vilém Zouhar
Clara Meister
Juan Luis Gastaldi
Li Du
Tim Vieira
Mrinmaya Sachan
Ryan Cotterell
28
26
0
29 Jun 2023
Accelerating Transducers through Adjacent Token Merging
Yuang Li
Yu-Huan Wu
Jinyu Li
Shujie Liu
38
4
0
28 Jun 2023
Extending Context Window of Large Language Models via Positional Interpolation
Shouyuan Chen
Sherman Wong
Liangjian Chen
Yuandong Tian
51
503
0
27 Jun 2023
CamemBERT-bio: Leveraging Continual Pre-training for Cost-Effective Models on French Biomedical Data
Rian Touchent
Laurent Romary
Eric Villemonte de la Clergerie
MedIm
39
4
0
27 Jun 2023
YouTube-ASL: A Large-Scale, Open-Domain American Sign Language-English Parallel Corpus
David C. Uthus
Garrett Tanzer
Manfred Georg
SLR
58
40
0
27 Jun 2023
DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome
Zhihan Zhou
Yanrong Ji
Weijian Li
Pratik Dutta
R. Davuluri
Han Liu
27
175
0
26 Jun 2023
MotionGPT: Human Motion as a Foreign Language
Biao Jiang
Xin Chen
Wen Liu
Jingyi Yu
Gang Yu
Tao Chen
MLLM
34
272
0
26 Jun 2023
Synthetic Alone: Exploring the Dark Side of Synthetic Data for Grammatical Error Correction
Chanjun Park
Seonmin Koo
Seolhwa Lee
Jaehyung Seo
Sugyeong Eo
Hyeonseok Moon
Heu-Jeoung Lim
53
0
0
26 Jun 2023
Resume Information Extraction via Post-OCR Text Processing
Selahattin Serdar Helli
Senem Tanberk
Sena Nur Cavsak
21
1
0
23 Jun 2023
AudioPaLM: A Large Language Model That Can Speak and Listen
Paul Kishan Rubenstein
Chulayuth Asawaroengchai
D. Nguyen
Ankur Bapna
Zalan Borsos
...
Neil Zeghidour
Yu Zhang
Zhishuai Zhang
Lukáš Zilka
Christian Frank
LM&MA
AuLLM
VLM
58
268
0
22 Jun 2023
Towards Accurate Translation via Semantically Appropriate Application of Lexical Constraints
Yujin Baek
Ko-tik Lee
Dayeon Ki
Hyoung-Gyu Lee
Cheonbok Park
Jaegul Choo
58
5
0
21 Jun 2023
Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition
Xuefei Wang
Yanhua Long
Yijie Li
Haoran Wei
48
4
0
20 Jun 2023
Rehearsal-Free Online Continual Learning for Automatic Speech Recognition
Steven Vander Eeckt
Hugo Van hamme
CLL
45
3
0
19 Jun 2023
Guiding Language Models of Code with Global Context using Monitors
Lakshya A Agrawal
Aditya Kanade
Navin Goyal
Shuvendu K. Lahiri
S. Rajamani
68
23
0
19 Jun 2023
Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation
Ziyang Ma
Zhisheng Zheng
Guanrou Yang
Yu Wang
Chuxu Zhang
Xie Chen
SSL
40
8
0
15 Jun 2023
Unified model for code-switching speech recognition and language identification based on a concatenated tokenizer
Kunal Dhawan
Dima Rekesh
Boris Ginsburg
27
10
0
14 Jun 2023
Tagged End-to-End Simultaneous Speech Translation Training using Simultaneous Interpretation Data
Yuka Ko
Ryo Fukuda
Yuta Nishikawa
Yasumasa Kano
Katsuhito Sudoh
Satoshi Nakamura
40
6
0
14 Jun 2023
CipherSniffer: Classifying Cipher Types
Brendan Artley
G. Mehdiyev
9
1
0
13 Jun 2023
Tokenization with Factorized Subword Encoding
David Samuel
Lilja Øvrelid
52
1
0
13 Jun 2023
Modality Adaption or Regularization? A Case Study on End-to-End Speech Translation
Yucheng Han
Chen Xu
Tong Xiao
Jingbo Zhu
40
3
0
13 Jun 2023
Measuring Sentiment Bias in Machine Translation
Kai Hartung
Aaricia Herygers
Shubham Kurlekar
Khabbab Zakaria
Taylan Volkan
Sören Gröttrup
Munir Georges
AI4CE
28
5
0
12 Jun 2023
Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition
Belen Alastruey
Lukas Drude
Jahn Heymann
Simon Wiesler
46
1
0
12 Jun 2023
Learning Multilingual Sentence Representations with Cross-lingual Consistency Regularization
Pengzhi Gao
Liwen Zhang
Zhongjun He
Hua Wu
Haifeng Wang
35
6
0
12 Jun 2023
AraMUS: Pushing the Limits of Data and Model Scale for Arabic Natural Language Processing
Asaad Alghamdi
Xinyu Duan
Wei Jiang
Zhenhai Wang
Yimeng Wu
...
Yifei Zheng
Mehdi Rezagholizadeh
Baoxing Huai
Peilun Cheng
Abbas Ghaddar
VLM
34
8
0
11 Jun 2023
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark
Zhen-fei Yin
Jiong Wang
Jianjian Cao
Zhelun Shi
Dingning Liu
...
Lei Bai
Xiaoshui Huang
Zhiyong Wang
Jing Shao
Wanli Ouyang
MLLM
46
157
0
11 Jun 2023
Morphosyntactic probing of multilingual BERT models
Judit Ács
Endre Hamerlik
Roy Schwartz
Noah A. Smith
András Kornai
40
9
0
09 Jun 2023
Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition
Xianzhao Chen
Yist Y. Lin
Kang Wang
Yi He
Zejun Ma
34
2
0
09 Jun 2023
KIT's Multilingual Speech Translation System for IWSLT 2023
Danni Liu
Thai-Binh Nguyen
Sai Koneru
Enes Yavuz Ugan
Ngoc-Quan Pham
Tuan-Nam Nguyen
Tu Anh Dinh
Carlos Mullov
A. Waibel
Jan Niehues
44
7
0
08 Jun 2023
Privately generating tabular data using language models
Alexandre Sablayrolles
Yue Wang
Brian Karrer
LMTD
33
4
0
07 Jun 2023
Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages
Claytone Sikasote
Kalinda Siaminwe
Stanly Mwape
Bangiwe Zulu
Mofya Phiri
Martin Phiri
David Zulu
Mayumbo Nyirenda
Antonios Anastasopoulos
33
6
0
07 Jun 2023
Arabic Dysarthric Speech Recognition Using Adversarial and Signal-Based Augmentation
Massa Baali
Ibrahim Almakky
Shady Shehata
Fakhri Karray
50
1
0
07 Jun 2023
LLMZip: Lossless Text Compression using Large Language Models
Chandra Shekhara Kaushik Valmeekam
Krishna R. Narayanan
D. Kalathil
J. Chamberland
S. Shakkottai
37
32
0
06 Jun 2023
SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning
Zhishen Yang
Raj Dabre
Hideki Tanaka
Naoaki Okazaki
27
18
0
06 Jun 2023
Enhancing Language Representation with Constructional Information for Natural Language Understanding
Lvxiaowei Xu
Jian Wu
Jiawei Peng
Zhilin Gong
Ming Cai
Tianxiang Wang
34
3
0
05 Jun 2023
End-to-End Word-Level Pronunciation Assessment with MASK Pre-training
Yukang Liang
Kaitao Song
Shaoguang Mao
Huiqiang Jiang
Luna Qiu
Yuqing Yang
Dongsheng Li
Linli Xu
Lili Qiu
CVBM
28
4
0
05 Jun 2023
Cross-Lingual Transfer Learning for Phrase Break Prediction with Multilingual Language Model
Hoyeon Lee
Hyun-Wook Yoon
Jong-Hwan Kim
Jae-Min Kim
VLM
37
0
0
05 Jun 2023