2106.12672
Charformer: Fast Character Transformers via Gradient-based Subword Tokenization
23 June 2021
Yi Tay
Vinh Q. Tran
Sebastian Ruder
Jai Gupta
Hyung Won Chung
Dara Bahri
Zhen Qin
Simon Baumgartner
Cong Yu
Donald Metzler
Papers citing
"Charformer: Fast Character Transformers via Gradient-based Subword Tokenization"
50 / 61 papers shown
SuperBPE: Space Travel for Language Models
Alisa Liu
J. Hayase
Valentin Hofmann
Sewoong Oh
Noah A. Smith
Yejin Choi
102
7
0
17 Mar 2025
FourierNAT: A Fourier-Mixing-Based Non-Autoregressive Transformer for Parallel Sequence Generation
Andrew Kiruluta
Eric Lundy
Andreas Lemos
AI4TS
69
0
0
04 Mar 2025
MoCE: Adaptive Mixture of Contextualization Experts for Byte-based Neural Machine Translation
Langlin Huang
Mengyu Bu
Yang Feng
73
0
0
03 Nov 2024
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Julie Kallini
Shikhar Murty
Christopher D. Manning
Christopher Potts
Róbert Csordás
75
4
0
28 Oct 2024
MiniPLM: Knowledge Distillation for Pre-Training Language Models
Yuxian Gu
Hao Zhou
Fandong Meng
Jie Zhou
Minlie Huang
128
5
0
22 Oct 2024
ByT5: Towards a token-free future with pre-trained byte-to-byte models
Linting Xue
Aditya Barua
Noah Constant
Rami Al-Rfou
Sharan Narang
Mihir Kale
Adam Roberts
Colin Raffel
83
502
0
28 May 2021
Joint Optimization of Tokenization and Downstream Model
Tatsuya Hiraoka
Sho Takase
Kei Uchiumi
Atsushi Keyaki
Naoaki Okazaki
41
17
0
26 May 2021
XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation
Sebastian Ruder
Noah Constant
Jan A. Botha
Aditya Siddhant
Orhan Firat
...
Pengfei Liu
Junjie Hu
Dan Garrette
Graham Neubig
Melvin Johnson
ELM
AAML
LRM
55
187
0
15 Apr 2021
Multi-view Subword Regularization
Xinyi Wang
Sebastian Ruder
Graham Neubig
66
45
0
15 Mar 2021
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
J. Clark
Dan Garrette
Iulia Turc
John Wieting
85
218
0
11 Mar 2021
Long Range Arena: A Benchmark for Efficient Transformers
Yi Tay
Mostafa Dehghani
Samira Abnar
Songlin Yang
Dara Bahri
Philip Pham
J. Rao
Liu Yang
Sebastian Ruder
Donald Metzler
130
717
0
08 Nov 2020
CharBERT: Character-aware Pre-trained Language Model
Wentao Ma
Yiming Cui
Chenglei Si
Ting Liu
Shijin Wang
Guoping Hu
51
106
0
03 Nov 2020
Rethinking embedding coupling in pre-trained language models
Hyung Won Chung
Thibault Févry
Henry Tsai
Melvin Johnson
Sebastian Ruder
144
142
0
24 Oct 2020
mT5: A massively multilingual pre-trained text-to-text transformer
Linting Xue
Noah Constant
Adam Roberts
Mihir Kale
Rami Al-Rfou
Aditya Siddhant
Aditya Barua
Colin Raffel
115
2,533
0
22 Oct 2020
CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters
Hicham El Boukkouri
Olivier Ferret
Thomas Lavergne
Hiroshi Noji
Pierre Zweigenbaum
Junichi Tsujii
101
160
0
20 Oct 2020
Rethinking Attention with Performers
K. Choromanski
Valerii Likhosherstov
David Dohan
Xingyou Song
Andreea Gane
...
Afroz Mohiuddin
Lukasz Kaiser
David Belanger
Lucy J. Colwell
Adrian Weller
167
1,570
0
30 Sep 2020
Efficient Transformers: A Survey
Yi Tay
Mostafa Dehghani
Dara Bahri
Donald Metzler
VLM
146
1,115
0
14 Sep 2020
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
499
2,074
0
28 Jul 2020
Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation
Jungo Kasai
Nikolaos Pappas
Hao Peng
James Cross
Noah A. Smith
63
137
0
18 Jun 2020
Linformer: Self-Attention with Linear Complexity
Sinong Wang
Belinda Z. Li
Madian Khabsa
Han Fang
Hao Ma
185
1,694
0
08 Jun 2020
Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
Zihang Dai
Guokun Lai
Yiming Yang
Quoc V. Le
76
233
0
05 Jun 2020
Dynamic Programming Encoding for Subword Segmentation in Neural Machine Translation
Xuanli He
Gholamreza Haffari
Mohammad Norouzi
46
46
0
03 May 2020
Byte Pair Encoding is Suboptimal for Language Model Pretraining
Kaj Bostrom
Greg Durrett
61
209
0
07 Apr 2020
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
Junjie Hu
Sebastian Ruder
Aditya Siddhant
Graham Neubig
Orhan Firat
Melvin Johnson
ELM
161
970
0
24 Mar 2020
TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages
J. Clark
Eunsol Choi
Michael Collins
Dan Garrette
Tom Kwiatkowski
Vitaly Nikolaev
J. Palomaki
130
607
0
10 Mar 2020
Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT
Lichao Sun
Kazuma Hashimoto
Wenpeng Yin
Akari Asai
Jia Li
Philip Yu
Caiming Xiong
SILM
AAML
48
102
0
27 Feb 2020
GLU Variants Improve Transformer
Noam M. Shazeer
118
989
0
12 Feb 2020
Unsupervised Cross-lingual Representation Learning at Scale
Alexis Conneau
Kartikay Khandelwal
Naman Goyal
Vishrav Chaudhary
Guillaume Wenzek
Francisco Guzmán
Edouard Grave
Myle Ott
Luke Zettlemoyer
Veselin Stoyanov
193
6,538
0
05 Nov 2019
BPE-Dropout: Simple and Effective Subword Regularization
Ivan Provilkov
Dmitrii Emelianenko
Elena Voita
59
285
0
29 Oct 2019
On the Cross-lingual Transferability of Monolingual Representations
Mikel Artetxe
Sebastian Ruder
Dani Yogatama
165
793
0
25 Oct 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
369
20,053
0
23 Oct 2019
PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification
Yinfei Yang
Y. Zhang
Chris Tar
Jason Baldridge
AAML
61
366
0
30 Aug 2019
Combating Adversarial Misspellings with Robust Word Recognition
Danish Pruthi
Bhuwan Dhingra
Zachary Chase Lipton
149
305
0
27 May 2019
Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification
Daniel Borkan
Lucas Dixon
Jeffrey Scott Sorensen
Nithum Thain
Lucy Vasserman
86
487
0
11 Mar 2019
Cross-lingual Language Model Pretraining
Guillaume Lample
Alexis Conneau
73
2,735
0
22 Jan 2019
Mesh-TensorFlow: Deep Learning for Supercomputers
Noam M. Shazeer
Youlong Cheng
Niki Parmar
Dustin Tran
Ashish Vaswani
...
HyoukJoong Lee
O. Milenkovic
C. Young
Ryan Sepassi
Blake Hechtman
GNN
MoE
AI4CE
81
387
0
05 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.5K
94,511
0
11 Oct 2018
Learning to Segment Inputs for NMT Favors Character-Level Processing
Julia Kreutzer
Artem Sokolov
61
31
0
02 Oct 2018
FRAGE: Frequency-Agnostic Word Representation
Chengyue Gong
Di He
Xu Tan
Tao Qin
Liwei Wang
Tie-Yan Liu
OOD
57
144
0
18 Sep 2018
XNLI: Evaluating Cross-lingual Sentence Representations
Alexis Conneau
Guillaume Lample
Ruty Rinott
Adina Williams
Samuel R. Bowman
Holger Schwenk
Veselin Stoyanov
ELM
55
1,379
0
13 Sep 2018
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
Taku Kudo
John Richardson
178
3,514
0
19 Aug 2018
Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates
Taku Kudo
195
1,165
0
29 Apr 2018
Image Transformer
Niki Parmar
Ashish Vaswani
Jakob Uszkoreit
Lukasz Kaiser
Noam M. Shazeer
Alexander Ku
Dustin Tran
ViT
110
1,678
0
15 Feb 2018
Deep contextualized word representations
Matthew E. Peters
Mark Neumann
Mohit Iyyer
Matt Gardner
Christopher Clark
Kenton Lee
Luke Zettlemoyer
NAI
184
11,542
0
15 Feb 2018
Synthetic and Natural Noise Both Break Neural Machine Translation
Yonatan Belinkov
Yonatan Bisk
109
741
0
06 Nov 2017
SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation
Daniel Cer
Mona T. Diab
Eneko Agirre
I. Lopez-Gazpio
Lucia Specia
347
1,880
0
31 Jul 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
628
130,942
0
12 Jun 2017
Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU
Jacob Devlin
54
36
0
04 May 2017
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
Adina Williams
Nikita Nangia
Samuel R. Bowman
497
4,473
0
18 Apr 2017
Sequence Modeling via Segmentations
Chong-Jun Wang
Yining Wang
Po-Sen Huang
Abdel-rahman Mohamed
Dengyong Zhou
Li Deng
51
45
0
24 Feb 2017