2010.10392
CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters
20 October 2020
Hicham El Boukkouri
Olivier Ferret
Thomas Lavergne
Hiroshi Noji
Pierre Zweigenbaum
Junichi Tsujii
Papers citing
"CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters"
30 / 30 papers shown
Title
We're Calling an Intervention: Exploring Fundamental Hurdles in Adapting Language Models to Nonstandard Text
Aarohi Srivastava
David Chiang
57
0
0
10 Apr 2024
The Impact of Word Splitting on the Semantic Content of Contextualized Word Representations
Aina Garí Soler
Matthieu Labeau
Chloé Clavel
VLM
34
2
0
22 Feb 2024
Make Text Unlearnable: Exploiting Effective Patterns to Protect Personal Data
Xinzhe Li
Ming Liu
Shang Gao
MU
25
8
0
02 Jul 2023
Does Manipulating Tokenization Aid Cross-Lingual Transfer? A Study on POS Tagging for Non-Standardized Languages
Verena Blaschke
Hinrich Schütze
Barbara Plank
34
14
0
20 Apr 2023
An Information Extraction Study: Take In Mind the Tokenization!
Christos Theodoropoulos
Marie-Francine Moens
21
6
0
27 Mar 2023
Inducing Character-level Structure in Subword-based Language Models with Type-level Interchange Intervention Training
Jing-ling Huang
Zhengxuan Wu
Kyle Mahowald
Christopher Potts
24
13
0
19 Dec 2022
On the State of the Art in Authorship Attribution and Authorship Verification
Jacob Tyo
Bhuwan Dhingra
Zachary Chase Lipton
32
22
0
14 Sep 2022
Review of Natural Language Processing in Pharmacology
D. Trajanov
Vangel Trajkovski
Makedonka Dimitrieva
Jovana Dobreva
Milos Jovanovik
Matej Klemen
Aleš Žagar
Marko Robnik-Šikonja
LM&MA
21
7
0
22 Aug 2022
Cross-lingual Approaches for the Detection of Adverse Drug Reactions in German from a Patient's Perspective
Lisa Raithel
Philippe E. Thomas
Roland Roller
Oliver Sapina
Sebastian Möller
Pierre Zweigenbaum
16
2
0
03 Aug 2022
Language Modelling with Pixels
Phillip Rust
Jonas F. Lotz
Emanuele Bugliarello
Elizabeth Salesky
Miryam de Lhoneux
Desmond Elliott
VLM
30
46
0
14 Jul 2022
Decorate the Examples: A Simple Method of Prompt Design for Biomedical Relation Extraction
Hui-Syuan Yeh
Thomas Lavergne
Pierre Zweigenbaum
19
10
0
21 Apr 2022
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech
Guangyan Zhang
Kaitao Song
Xu Tan
Daxin Tan
Yuzi Yan
...
G. Wang
Wei Zhou
Tao Qin
Tan Lee
Sheng Zhao
SSL
20
21
0
31 Mar 2022
vTTS: visual-text to speech
Yoshifumi Nakano
Takaaki Saeki
Shinnosuke Takamichi
Katsuhito Sudoh
Hiroshi Saruwatari
9
4
0
28 Mar 2022
Signal in Noise: Exploring Meaning Encoded in Random Character Sequences with Character-Aware Language Models
Mark Chu
Bhargav Srinivasa Desikan
E. Nadler
Ruggerio L. Sardo
Elise Darragh-Ford
Douglas Guilbeault
18
0
0
15 Mar 2022
An Ensemble of Pre-trained Transformer Models For Imbalanced Multiclass Malware Classification
Ferhat Demirkiran
Aykut Çayır
U. Ünal
Hasan Dag
30
42
0
25 Dec 2021
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
Sabrina J. Mielke
Zaid Alyafeai
Elizabeth Salesky
Colin Raffel
Manan Dey
...
Arun Raja
Chenglei Si
Wilson Y. Lee
Benoît Sagot
Samson Tan
30
140
0
20 Dec 2021
Using Distributional Principles for the Semantic Study of Contextual Language Models
Olivier Ferret
17
1
0
23 Nov 2021
Switch Point biased Self-Training: Re-purposing Pretrained Models for Code-Switching
P. Chopra
Sai Krishna Rallabandi
A. Black
Khyathi Raghavi Chandu
10
6
0
01 Nov 2021
Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios?
Arij Riabi
Benoît Sagot
Djamé Seddah
26
15
0
26 Oct 2021
Low Frequency Names Exhibit Bias and Overfitting in Contextualizing Language Models
Robert Wolfe
Aylin Caliskan
85
51
0
01 Oct 2021
BERT Cannot Align Characters
Antonis Maronikolakis
Philipp Dufter
Hinrich Schütze
23
0
0
20 Sep 2021
How Suitable Are Subword Segmentation Strategies for Translating Non-Concatenative Morphology?
Chantal Amrhein
Rico Sennrich
22
13
0
02 Sep 2021
Models In a Spelling Bee: Language Models Implicitly Learn the Character Composition of Tokens
Itay Itzhak
Omer Levy
17
18
0
25 Aug 2021
DravidianCodeMix: Sentiment Analysis and Offensive Language Identification Dataset for Dravidian Languages in Code-Mixed Text
Bharathi Raja Chakravarthi
R. Priyadharshini
Vigneshwaran Muralidaran
Navya Jose
Shardul Suryawanshi
E. Sherly
John P. McCrae
17
104
0
17 Jun 2021
CodemixedNLP: An Extensible and Open NLP Toolkit for Code-Mixing
Sai Muralidhar Jayanthi
Kavya Nerella
Khyathi Raghavi Chandu
A. Black
MoE
23
8
0
10 Jun 2021
IIITT@LT-EDI-EACL2021-Hope Speech Detection: There is always Hope in Transformers
Karthik Puranik
Adeep Hande
R. Priyadharshini
Sajeetha Thavareesan
Bharathi Raja Chakravarthi
15
59
0
19 Apr 2021
AMMU : A Survey of Transformer-based Biomedical Pretrained Language Models
Katikapalli Subramanyam Kalyan
A. Rajasekharan
S. Sangeetha
LM&MA
MedIm
18
164
0
16 Apr 2021
UniParma at SemEval-2021 Task 5: Toxic Spans Detection Using CharacterBERT and Bag-of-Words Model
Akbar Karimi
L. Rossi
Andrea Prati
11
4
0
17 Mar 2021
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
J. Clark
Dan Garrette
Iulia Turc
John Wieting
27
210
0
11 Mar 2021
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,743
0
26 Sep 2016