arXiv:1808.06226
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
19 August 2018
Taku Kudo
John Richardson
Papers citing
"SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"
50 / 1,923 papers shown
Title
Exploring the Benefits of Tokenization of Discrete Acoustic Units
Avihu Dekel
Raul Fernandez
49
2
0
08 Jun 2024
Large Language Model-guided Document Selection
Xiang Kong
Tom Gunter
Ruoming Pang
41
4
0
07 Jun 2024
Recovering document annotations for sentence-level bitext
R. Wicks
Matt Post
Philipp Koehn
39
4
0
06 Jun 2024
Enhancing CTC-based speech recognition with diverse modeling units
Shiyi Han
Zhihong Lei
Mingbin Xu
Xingyu Na
Zhen Huang
41
0
0
05 Jun 2024
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning
Shaolei Zhang
Qingkai Fang
Shoutao Guo
Zhengrui Ma
Min Zhang
Yang Feng
36
5
0
05 Jun 2024
LCS: A Language Converter Strategy for Zero-Shot Neural Machine Translation
Zengkui Sun
Yijin Liu
Fandong Meng
Jinan Xu
Jie Zhou
47
2
0
05 Jun 2024
Xmodel-LM Technical Report
Yichuan Wang
Yang Liu
Yu Yan
Qun Wang
Xucheng Huang
Ling Jiang
OSLM
ALM
35
1
0
05 Jun 2024
Multi-word Term Embeddings Improve Lexical Product Retrieval
Viktor Shcherbakov
Fedor Krasnov
28
0
0
03 Jun 2024
Applying Intrinsic Debiasing on Downstream Tasks: Challenges and Considerations for Machine Translation
Bar Iluz
Yanai Elazar
Asaf Yehudai
Gabriel Stanovsky
43
1
0
02 Jun 2024
An Early Investigation into the Utility of Multimodal Large Language Models in Medical Imaging
Sulaiman Khan
Md. Rafiul Biswas
Alina Murad
Hazrat Ali
Zubair Shah
37
4
0
02 Jun 2024
μLO: Compute-Efficient Meta-Generalization of Learned Optimizers
Benjamin Thérien
Charles-Étienne Joseph
Boris Knyazev
Edouard Oyallon
Irina Rish
Eugene Belilovsky
AI4CE
51
1
0
31 May 2024
How Multilingual Are Large Language Models Fine-Tuned for Translation?
Aquia Richburg
Marine Carpuat
LRM
48
5
0
30 May 2024
Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning
E. Chimoto
Jay Gala
Orevaoghene Ahia
Julia Kreutzer
Bruce A. Bassett
Sara Hooker
VLM
46
4
0
29 May 2024
X-VILA: Cross-Modality Alignment for Large Language Model
Hanrong Ye
De-An Huang
Yao Lu
Zhiding Yu
Ming-Yu Liu
...
Jan Kautz
Song Han
Dan Xu
Pavlo Molchanov
Hongxu Yin
MLLM
VLM
49
32
0
29 May 2024
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
Ge Zhang
Scott Qu
Jiaheng Liu
Chenchen Zhang
Chenghua Lin
...
Zi-Kai Zhao
Jiajun Zhang
Wanli Ouyang
Wenhao Huang
Wenhu Chen
ELM
43
44
0
29 May 2024
Integrating Multi-scale Contextualized Information for Byte-based Neural Machine Translation
Langlin Huang
Yang Feng
39
1
0
29 May 2024
Optimizing Foundation Model Inference on a Many-tiny-core Open-source RISC-V Platform
Viviane Potocnik
Luca Colagrande
Tim Fischer
L. Bertaccini
Daniele Jahier Pagliari
Luca Bompani
Luca Benini
31
3
0
29 May 2024
Descriptive Image Quality Assessment in the Wild
Zhiyuan You
Jinjin Gu
Zheyuan Li
Xin Cai
Kaiwen Zhu
Chao Dong
Tianfan Xue
EGVM
48
16
0
29 May 2024
Wavelet-Based Image Tokenizer for Vision Transformers
Zhenhai Zhu
Radu Soricut
ViT
54
3
0
28 May 2024
Multi-objective Representation for Numbers in Clinical Narratives: A CamemBERT-Bio-Based Alternative to Large-Scale LLMs
Boammani Aser Lompo
Thanh-Dung Le
33
0
0
28 May 2024
Empowering Character-level Text Infilling by Eliminating Sub-Tokens
Houxing Ren
Mingjie Zhan
Zhongyuan Wu
Hongsheng Li
AI4CE
40
1
0
27 May 2024
Tokenization Matters! Degrading Large Language Models through Challenging Their Tokenization
Dixuan Wang
Yanda Li
Junyuan Jiang
Zepeng Ding
Ziqin Luo
Guochao Jiang
Jiaqing Liang
Deqing Yang
34
11
0
27 May 2024
MoEUT: Mixture-of-Experts Universal Transformers
Róbert Csordás
Kazuki Irie
Jürgen Schmidhuber
Christopher Potts
Christopher D. Manning
MoE
45
6
0
25 May 2024
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Byung-Kwan Lee
Chae Won Kim
Beomchan Park
Yonghyun Ro
MLLM
LRM
48
19
0
24 May 2024
Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training
Xianzhi Du
Tom Gunter
Xiang Kong
Mark Lee
Zirui Wang
Aonan Zhang
Nan Du
Ruoming Pang
MoE
25
0
0
23 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
82
45
0
23 May 2024
Why Not Transform Chat Large Language Models to Non-English?
Xiang Geng
Ming Zhu
Jiahuan Li
Zhejian Lai
Wei Zou
...
Xinglin Lyu
Min Zhang
Jiajun Chen
Hao Yang
Shujian Huang
45
2
0
22 May 2024
Non-autoregressive real-time Accent Conversion model with voice cloning
Vladimir Nechaev
Sergey Kosyakov
42
1
0
21 May 2024
Targeted Multilingual Adaptation for Low-resource Language Families
C.M. Downey
Terra Blevins
Dhwani Serai
Dwija Parikh
Shane Steinert-Threlkeld
40
2
0
20 May 2024
FAME-MT Dataset: Formality Awareness Made Easy for Machine Translation Purposes
Dawid Wiśniewski
Zofia Rostek
Artur Nowakowski
50
0
0
20 May 2024
Chasing COMET: Leveraging Minimum Bayes Risk Decoding for Self-Improving Machine Translation
Kamil Guttmann
Mikołaj Pokrywka
Adrian Charkiewicz
Artur Nowakowski
58
3
0
20 May 2024
Automated Radiology Report Generation: A Review of Recent Advances
Phillip Sloan
Philip Clatworthy
Edwin Simpson
Majid Mirmehdi
34
17
0
17 May 2024
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
Xianzheng Ma
Yash Bhalgat
Brandon Smart
Shuai Chen
Xinghui Li
...
Matthias Nießner
Ian D Reid
Angel X. Chang
Iro Laina
V. Prisacariu
LRM
35
13
0
16 May 2024
Libra: Building Decoupled Vision System on Large Language Models
Yifan Xu
Xiaoshan Yang
Y. Song
Changsheng Xu
MLLM
VLM
43
8
0
16 May 2024
TransMI: A Framework to Create Strong Baselines from Multilingual Pretrained Language Models for Transliterated Data
Yihong Liu
Chunlan Ma
Haotian Ye
Hinrich Schütze
36
4
0
16 May 2024
Unsupervised Extractive Dialogue Summarization in Hyperdimensional Space
Seongmin Park
Kyungho Kim
Jaejin Seo
Jihwa Lee
35
0
0
16 May 2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon Team
MLLM
62
266
0
16 May 2024
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model
Wanting Xu
Yang Liu
Langping He
Xucheng Huang
Ling Jiang
VLM
MLLM
43
2
0
15 May 2024
A Japanese-Chinese Parallel Corpus Using Crowdsourcing for Web Mining
Masaaki Nagata
Makoto Morishita
Katsuki Chousa
Norihito Yasuda
29
2
0
15 May 2024
Challenges and Opportunities in Text Generation Explainability
Kenza Amara
Rita Sevastjanova
Mennatallah El-Assady
SILM
48
2
0
14 May 2024
VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling
Siyuan Li
Zedong Wang
Zicheng Liu
Di Wu
Cheng Tan
Jiangbin Zheng
Yufei Huang
Stan Z. Li
45
7
0
13 May 2024
A Generalist Learner for Multifaceted Medical Image Interpretation
Hong-Yu Zhou
Subathra Adithan
J. N. Acosta
E. Topol
Pranav Rajpurkar
MedIm
43
27
0
13 May 2024
Zero-Shot Tokenizer Transfer
Benjamin Minixhofer
Edoardo Ponti
Ivan Vulić
VLM
49
9
0
13 May 2024
An Empirical Study on the Robustness of Massively Multilingual Neural Machine Translation
Supryadi Supryadi
Leiyu Pan
Deyi Xiong
30
0
0
13 May 2024
Constructing a BPE Tokenization DFA
Martin Berglund
Willeke Martens
Brink van der Merwe
20
2
0
13 May 2024
DEPTH: Discourse Education through Pre-Training Hierarchically
Zachary Bamberger
Ofek Glick
Chaim Baskin
Yonatan Belinkov
67
0
0
13 May 2024
SaudiBERT: A Large Language Model Pretrained on Saudi Dialect Corpora
Faisal Qarah
41
5
0
10 May 2024
Kreyòl-MT: Building MT for Latin American, Caribbean and Colonial African Creole Languages
Nathaniel R. Robinson
Raj Dabre
Ammon Shurtz
Rasul Dent
Onenamiyi Onesi
...
Matthew Dean Stutzman
Bismarck Odoom
Sanjeev Khudanpur
Stephen D. Richardson
Kenton Murray
MoE
49
6
0
08 May 2024
Revisiting character-level adversarial attacks
Elias Abad Rocamora
Yongtao Wu
Fanghui Liu
Grigorios G. Chrysos
V. Cevher
AAML
39
3
0
07 May 2024
Position: Leverage Foundational Models for Black-Box Optimization
Xingyou Song
Yingtao Tian
Robert Tjarko Lange
Chansoo Lee
Yujin Tang
Yutian Chen
42
5
0
06 May 2024